
A (Sub)graph Isomorphism Identification Theorem

by

Mengjiao Guo
Supervisors: Prof. Jing He,
Prof. Chi Huang Chi

Faculty of Science, Engineering and Technology


in fulfilment of the requirements for the degree of
Doctor of Philosophy
at the

SWINBURNE UNIVERSITY OF TECHNOLOGY


2022
Abstract

It is generally accepted that graphs are natural representations of the relationships
between items in complex systems, such as social networks, chemical compounds, and
biological structures. Subgraph isomorphism is a generalisation of the graph isomorphism
problem: graph isomorphism determines whether the structures of two graphs are
identical, while subgraph isomorphism asks whether one graph is contained within
another. A subgraph isomorphism query is an essential technique for discovering
geometrical patterns in larger networks. Given a query graph Q (also known as a
pattern graph) and a data graph G, this query returns all mappings of nodes in Q to
nodes in G that preserve their corresponding edges. Answering subgraph isomorphism
queries is important for analysing epidemic transmission patterns in social networks,
querying protein interactions in protein networks, and so on. Because subgraph
isomorphism search is NP-hard, several methods have been developed to accelerate
the search, all of them based on the similarity of nodes and subgraphs. Existing
enumeration- and indexing-based subgraph isomorphism methods exploit different
matching orders to search potential candidates, pruning rules to exclude unprofitable
paths, and auxiliary information to eliminate false candidates and speed up the
process. However, they cannot handle matching problems in which both the target
and the query graphs are large. Subgraph querying therefore remains a knotty problem
pressing for a solution. As such, the design objective of our subgraph topological
detection technique is an appreciable improvement in subgraph detection performance
by reducing both the search space and the search time.
Exact graph matching produces a binary result when comparing two graphs: they
are either identical (a match) or dissimilar (a mismatch). By assessing the degree
of similarity or dissimilarity between two graphs, inexact graph matching algorithms
provide a graded matching result. Graph distance, also called approximate graph
isomorphism or error-tolerant graph matching, is a measure of similarity (or
dissimilarity) between two graphs. Defining a graph distance function is an elusive
question that remains a pressing problem in a wide range of fields. In general, graphs
are enriched with node and edge attributes (heterogeneous graphs), so our focus is
on encapsulating node and edge identities in a graph. Graphs preserve both structural
and semantic features, so analysing the row sums and eigenspectra of the vertex and
edge adjacency matrices captures the affinity interactions within graphs both locally
and globally. We therefore estimate the geometrical and semantic dissimilarities
(distances) between graphs extensively and systematically. Several illustrative
chemical compound cases are implemented in practical scenarios, so that the theory
can be easily understood. Our method defines a synoptic quantitative distance (or
dissimilarity) measure for approximate matching; it has good expressiveness and a
suitable polynomial computational complexity, which paves the way for graph analysis.
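The permutation invariance underlying this spectral analysis can be shown with a small sketch (an illustrative aside, not the thesis's full method): relabelling the vertices of a graph conjugates its adjacency matrix by a permutation matrix, which leaves the singular values unchanged.

```python
import numpy as np

# Illustrative sketch: singular values of an adjacency matrix are invariant
# under a simultaneous row/column permutation P @ A @ P.T, so they can serve
# as one spectral signature when comparing graphs. This is only an ingredient
# of the eigenspectrum analysis described above, not the method itself.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)   # a 3-vertex path, centred at vertex 0

P = np.array([[0, 1, 0],                 # a permutation matrix (vertex relabelling)
              [0, 0, 1],
              [1, 0, 0]], dtype=float)

B = P @ A @ P.T                          # the same graph with relabelled vertices
sv_A = np.linalg.svd(A, compute_uv=False)
sv_B = np.linalg.svd(B, compute_uv=False)
print(np.allclose(np.sort(sv_A), np.sort(sv_B)))  # True: the spectra coincide
```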
In this work, our (sub)graph/approximate isomorphism-based method adopts a
three-stage framework for learning and refining structural correspondences. First, an
adequate representation, the graph adjacency matrix, captures the connectivity of
the graph structure. Secondly, we employ the permutation theorem to evaluate the
row sums of the vertex and edge adjacency matrices of the two graphs. Lastly, our
proposed scheme deploys the well-founded equinumerosity theorem to verify the
relationship between the two graphs. The (sub)graph and approximate graph matching
models provide a solid theoretical foundation for practical applications and effective
querying. In this thesis, we close this gap by studying the problem via the permutation
and equinumerosity theorems, which demonstrate the effectiveness of our method and
can be evaluated in polynomial time.
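As a rough illustration of the second stage (a minimal sketch, not the thesis's formal permutation theorem), comparing the sorted row sums of two vertex adjacency matrices gives a fast necessary condition for isomorphism:

```python
# Minimal sketch of the row-sum comparison stage: if two graphs are
# isomorphic, permuting the rows/columns of an adjacency matrix only
# reorders its row sums, so the sorted row sums (degree sequences) must
# coincide. Passing this test is necessary, not sufficient, for isomorphism.
def row_sums(adj):
    """Sorted row sums of an adjacency matrix given as a list of lists."""
    return sorted(sum(row) for row in adj)

def row_sum_test(adj_a, adj_b):
    """Quick necessary condition for graph isomorphism."""
    return len(adj_a) == len(adj_b) and row_sums(adj_a) == row_sums(adj_b)

# A triangle and a 3-vertex path have the same size but different row sums.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path     = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(row_sum_test(triangle, path))  # False
```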

Acknowledgments

I am very thankful to my principal supervisor, Prof. Jing He, who led and encouraged
me during my studies. I particularly want to express my gratitude to Prof. Sheng
Wen and Prof. Chi Huang Chi, who both offered me inspiration and ideas.
I would like to express my appreciation to the Faculty of Science, Engineering
and Technology, Swinburne University of Technology. Without their support, all this
would not have been possible.
Special thanks to Matthew Mitchell and Caslon Chua for their willingness to offer
me a tutoring job, and to Kathy Wallace, Cate O’Dwyer and Kostas Kondelias for their
kindness and patience in discussing my career and helping to solve the puzzles in my life.
I am also taking this opportunity to thank my friends in Melbourne: Afzaal Hassan
and Qinyuan Li, for sharing their teaching knowledge and invaluable advice, which
really helped me polish my teaching skills and cheered me up; Limeng Zhang, for having
lunch and dinner with me, for necessary distractions and for sharing happiness; and
Micheal Baron, Rob Hand, Sue Hand, Rowan Forster, Michael Cutter and Darren
Cronshaw, for guiding me in exploring the local culture and customs.
I would also like to thank all my colleagues in the Algorithms and Chips group, Hui
Zheng, Junfeng Wu and Peng Zhang, for sharing knowledge and research experience,
which really helped me polish my skills.
I would like to thank my beloved parents for encouraging me to pursue a Ph.D.
degree overseas over these past years; they have provided me with both physical and
emotional support.
Lastly, I appreciate every person I met during my Ph.D. study and my life: thank
you for your love, support and encouragement, which enabled me to overcome countless
challenges and made my time in Melbourne unforgettable.

Declaration

I, Mengjiao Guo, declare that this thesis titled, “A (Sub)graph Isomorphism Identi-
fication Theorem” and the work presented in it are my own. I confirm that:

• This work was done wholly or mainly while in candidature for a research degree
at this University.

• Where any part of this thesis has previously been submitted for a degree or
any other qualification at this University or any other institution, this has been
clearly stated.

• Where I have consulted the published work of others, this is always clearly
attributed.

• Where I have quoted from the work of others, the source is always given. With
the exception of such quotations, this thesis is entirely my own work.

• I have acknowledged all main sources of help.

• Where the thesis is based on work done by myself jointly with others, I have
made clear exactly what was done by others and what I have contributed myself.

Publications

• M. Guo, T. Ji, H. Zheng, H. Fa, J. He. A Triangle Framework among Subgraph
Isomorphism, pharmacophore and structure-function relationship. WWW2020
workshop, 2022.

• M. Guo, C. Chi, H. Zheng, J. He. A Subgraph Isomorphism-based Attacks
Towards Social Networks. IEEE/WIC/ACM International Conference on Web
Intelligence, 2021.

• M. Guo, C. Chi, H. Zheng, K. Zhang, J. He. Homomorphic Encryption based
Subgraph Isomorphism Protocol in Blockchain. IEEE/WIC/ACM International
Conference on Web Intelligence, 2021.

• J. He, J. Tian, Y. Wu, X. Cai, K. Zhang, M. Guo, H. Zheng, J. Wu, Y. Ji. An
Efficient Solution to Detect Common Topologies in Money Launderings Based
on Coupling and Connection. IEEE Intelligent Systems, 2021, 36(1): 64-74.

• J. He, J. Chen, G. Huang, M. Guo, Z. Zhang, H. Zheng, Y. Li, R. Wang, W.
Fan, C. Chi, W. Ding, P. Souza, B. Chen, R. Li, J. Shang, A. Zundert. A Fuzzy
Theory Based Topological Distance Measurement for Undirected Multigraphs.
2020 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Glasgow,
United Kingdom, 2020, pp. 1-10. doi: 10.1109/FUZZ48607.2020.9177559.

• X. Wang, Z. Jiang, W. Li, R. Zarei, G. Huang, A. Ul-Haq, X. Yin, B. Zhang,
P. Shi, M. Guo, J. He. Active contours with local and global energy based-on
fuzzy clustering and maximum a posterior probability for retinal vessel detection.
Concurrency and Computation: Practice and Experience, Vol. 32, no. 7
(Apr 2020), article no. e5599.

• Y. Shen, Y. Mai, X. Shen, W. Ding and M. Guo. Jointly Part-of-Speech
Tagging and Semantic Role Labeling Using Auxiliary Deep Neural Network
Model. Computers, Materials and Continua, 2020.

• W. Fan, J. He, M. Guo, P. Li, Z. Han, R. Wang. Privacy preserving
classification on local differential privacy in data centers. Journal of Parallel and
Distributed Computing, 2020. https://doi.org/10.1016/j.jpdc.2019.09.009.

• Y. Wu, J. He, Y. Ji, G. Huang, H. Yao, P. Zhang, W. Xu, M. Guo and Y. Li.
Enhanced classification models for iris dataset. Proceedings of the 7th
International Conference on Information Technology and Quantitative Management
(ITQM), 2019 (Best Paper).

• J. He, J. Chen, G. Huang, Z. Zhang, H. Zheng, P. Zhang, R. Zarei, F. Sansoto,
R. Wang, Y. Ji, Z. Xie, X. Wang, M. Guo, C.H. Chi, P. Souza, J. Zhang, Y.
Li, X. Chen, Y. Shi, D. Green, T. Kersi, Z. Van, D. Ralph. A polynomial-time
solution for graph isomorphism. Concurrency and Computation: Practice and
Experience, 2019. https://doi.org/10.1002/cpe.5484.

• H. Zheng, J. He, P. Li, M. Guo, H. Jin, J. Shen, Z. Xie, C. Chi. Glucose
Screening Measurements and Noninvasive Glucose Monitor Methods. ITQM,
2018.
Contents

1 Introduction 1
1.1 Research Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Research Concerns and Contributions . . . . . . . . . . . . . . . . . . 4
1.3 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2 Related work 9
2.1 Basics of Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2 A Brief Overview of Graph Matching Problems . . . . . . . . . . . . 12
2.3 Existing Exact Graph and the Subgraph Pattern Querying Algorithms 15
2.3.1 Pure Tree Search Algorithms . . . . . . . . . . . . . . . . . . 16
2.3.2 Index-based Algorithms . . . . . . . . . . . . . . . . . . . . . 18
2.3.3 Constraint Programming . . . . . . . . . . . . . . . . . . . . . 20
2.3.4 Algebraic Graph Theory Techniques . . . . . . . . . . . . . . 21
2.3.5 Miscellaneous Methods and Techniques . . . . . . . . . . . . . 21
2.4 Existing Inexact Graph and the Subgraph Pattern Querying Algorithms 23
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

3 Graph Isomorphism Verification 25


3.1 Graph Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.1.1 Undirected Graph Isomorphism . . . . . . . . . . . . . . . . . 26
3.1.2 Directed Graph Isomorphism . . . . . . . . . . . . . . . . . . 26
3.1.3 Undirected Multigraph Isomorphism . . . . . . . . . . . . . . 27
3.2 Graph Representation Method . . . . . . . . . . . . . . . . . . . . . . 27

3.2.1 Triple Tuple Method . . . . . . . . . . . . . . . . . . . . . . . 28
3.2.2 Undirected Vertex and Edge Adjacency Matrix Representation 29
3.2.3 Undirected Multigraph Vertex and Edge Adjacency Matrix Rep-
resentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2.4 Directed Graph Vertex and Edge Adjacency Matrix Represen-
tation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3 Permutation Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3.1 Calculating Arrays of Row Sum based on the Vertex and Edge
Adjacency Matrix . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3.2 Mathematical Proof for Permutation Theorem . . . . . . . 42
3.4 Equinumerosity Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.4.1 Maximal Linearly Independent Subsets . . . . . . . . . . . . . 51
3.4.2 Mathematical Proof for Equinumerosity Theorem . . . . . 53
3.5 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.6 Performance Studies and Applications . . . . . . . . . . . . . . . . . 59
3.6.1 Complexity Analysis . . . . . . . . . . . . . . . . . . . . . . . 59
3.6.2 Case Study 1 - Undirected Graph Isomorphism . . . . . . . . 63
3.6.3 Case Study 2 - Directed Graph Isomorphism . . . . . . . . . . 72
3.6.4 Case Study 3 - Multigraph Isomorphism . . . . . . . . . . . . 79
3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

4 Subgraph Isomorphism Verification 83


4.1 Subgraph Isomorphism . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.1.1 Induced Subgraph Matching . . . . . . . . . . . . . . . . . . . 85
4.1.2 Partial Subgraph Matching . . . . . . . . . . . . . . . . . . . 89
4.2 Performance Studies and Applications . . . . . . . . . . . . . . . . . 91
4.2.1 Complexity Analysis . . . . . . . . . . . . . . . . . . . . . . . 92
4.2.2 Case Study - Induced Subgraph Matching . . . . . . . . . . . 92
4.2.3 Case Study - Partial Subgraph Matching . . . . . . . . . . . . 95
4.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

5 Quantitative Graph Distance Measurement 99
5.1 Permutation Topological Distance . . . . . . . . . . . . . . . . . . . . 100
5.2 Equinumerosity Topological Distance . . . . . . . . . . . . . . . . . . 103
5.2.1 Equinumerous topological distance of singular values . . . . . 103
5.2.2 Equinumerous topological distance of left and right singular
matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.3 Graphs with Labeled Nodes . . . . . . . . . . . . . . . . . . . . . . . 108
5.4 Performance Studies and Applications . . . . . . . . . . . . . . . . . 109
5.4.1 Complexity Analysis . . . . . . . . . . . . . . . . . . . . . . . 110
5.4.2 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

6 Epilogue 125

List of Figures

1-1 Various types of graphs: (a) undirected and unlabeled, (b) undirected
with labeled nodes (different colors refer to different labels), (c) undi-
rected with labeled nodes and edges (d) directed and unlabeled, (e)
directed with labeled nodes, (f) directed with labeled nodes and edges,
(g) undirected, unlabeled multigraph, (h) undirected with labeled nodes
multigraph and (i) undirected with labeled nodes and edges multigraph. 5
1-2 Graph (a) is an induced subgraph of (c), and graph (b) is a partial
subgraph of (c). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

3-1 Multigraph example . . . . . . . . . . . . . . . . . . . . . . . . . . . 28


3-2 Exp1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3-3 Exp2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3-4 Exp3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3-5 Permutation theorem for vertex adjacency matrix. . . . . . . . . . . 37
3-6 Permutation theorem for edge adjacency matrix. . . . . . . . . . . . 38
3-7 Permutation theorem for vertex and edge adjacency matrix. . . . . . 40
3-8 Duality of vertex and edge for Exp2 as treating edge as vertex . . . . 42
3-9 SVD for vertex or edge adjacency matrix of candidate 1 . . . . . . . . 50
3-10 SVD for vertex or edge adjacency matrix of candidate 2 . . . . . . . . 50
3-11 The geometric explanation for a basis: the same vector can be rep-
resented in two different bases (green and red arrows) . . . . . . . . . 52
3-12 Flow Chart of Graph Isomorphism Matching. . . . . . . . . . . . . . 60
3-13 Undirected Graph Isomorphism Case. . . . . . . . . . . . . . . . . . . 64

3-14 Directed Graph Isomorphism Case. . . . . . . . . . . . . . . . . . . . 72
3-15 Multigraph isomorphism matching case . . . . . . . . . . . . . . . . . 79

4-1 Graphs with vertex and edge labeling. . . . . . . . . . . . . . . . . . 86


4-2 Subgraph matching case. . . . . . . . . . . . . . . . . . . . . . . . . . 87
4-3 Partial subgraph matching case . . . . . . . . . . . . . . . . . . . . . 90

4-4 The Procedure of Generating Graph c . . . . . . . . . . . . . . . . . . 93
4-5 Partial subgraph matching case . . . . . . . . . . . . . . . . . . . . . 95
4-6 Partial matching case. . . . . . . . . . . . . . . . . . . . . . . . . . . 96

5-1 Flow chart of Quantitative Graph Distance algorithm . . . . . . . . . 100


5-2 Graphs with labeled nodes . . . . . . . . . . . . . . . . . . . . . . . . 109
5-3 Chemical structure for Aspirin . . . . . . . . . . . . . . . . . . . . . . 112
5-4 Chemical structure for Oxetane. . . . . . . . . . . . . . . . . . . . . . 114
5-5 Isomer graph generation based on SMILES formulas. . . . . . . . . . 114
5-6 Chemical structure for Thietane . . . . . . . . . . . . . . . . . . . . . 117
5-7 Isomer graph generation based on SMILES formulas. . . . . . . . . . 117
5-8 Chemical structure for Propanal and Methoxyethene . . . . . . . . . 119
5-9 Graph Generation for Propanal and Methoxyethene . . . . . . . . . . 120
5-10 Graph Generation for Chloroethane and Propanol . . . . . . . . . . . 122

List of Tables

3.1 General triple tuple format for undirected Graph . . . . . . . . . . . 28


3.2 General triple tuple format for directed Graph . . . . . . . . . . . . . 28
3.3 Triple tuple for graph Exp1 . . . . . . . . . . . . . . . . . . . . . . . . 29
3.4 Vertex Adjacency Matrix of Exp1 . . . . . . . . . . . . . . . . . . . . 30
3.5 Edge Adjacency Matrix of Exp1 . . . . . . . . . . . . . . . . . . . . . 30
3.6 Triple tuple for Exp2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.7 Vertex Adjacency Matrix of Exp2 . . . . . . . . . . . . . . . . . . . . 32
3.8 Edge adjacency matrix of Exp2 . . . . . . . . . . . . . . . . . . . . . 33
3.9 General triple tuple format for directed Graph . . . . . . . . . . . . . 34
3.10 Vertex Adjacency Matrix of graph Exp2 . . . . . . . . . . . . . . . . . 35
3.11 Edge Adjacency Matrix of graph Exp3 . . . . . . . . . . . . . . . . . 35
3.12 General format to calculate row sum of vertex adjacency matrix based
on triple tuple . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.13 General format to calculate row sum of edge adjacency matrix based
on triple tuple . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.14 General format to calculate row sum of binary vertex adjacency matrix
based on triple tuple . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.15 General format to calculate row sum of binary edge adjacency matrix
based on triple tuple . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.16 The comparison with the existing benchmark algorithms and our pro-
posed algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.17 Triple tuple of graph g1 . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.18 Triple tuple of graph g2 . . . . . . . . . . . . . . . . . . . . . . . . . . 64

3.19 Vertex Adjacency Matrix of g1 . (step 4 marked as G1 ) . . . . . . . . 65
3.20 Vertex Adjacency Matrix of g2 . (step 6 marked as G2 ) . . . . . . . . 65
3.21 Edge adjacency matrix of g. (step 5 marked as G1 ) . . . . . . . . . . 65
3.22 Edge Adjacency Matrix of g1 . (step 7 marked as G2 ) . . . . . . . . . 65
3.23 Computation results of vertex adjacency matrix for graph g1 (step 8) 65
3.24 Computation results of vertex adjacency matrix for graph g2 (step 9) 66
3.25 Computation results of edge adjacency matrix for graph g1 (step 10) . 66
3.26 Computation results of edge adjacency matrix for graph g2 (step 11) . 66
3.27 Left Singular Matrix Uv1 for g1 . (step 18) . . . . . . . . . . . . . . . . 67
3.28 Singular Matrix Σv1 for g1 . (step 19) . . . . . . . . . . . . . . . . . . 67
3.29 Right Singular Matrix VvT1 for g1 . (step 20) . . . . . . . . . . . . . . . 67
3.30 Left Singular Matrix Uv2 for g2 . (step 21) . . . . . . . . . . . . . . . . 68
3.31 Singular Matrix Σv2 for g2 . (step 22) . . . . . . . . . . . . . . . . . . 68
3.32 Right Singular Matrix VvT2 for g2 . (step 23) . . . . . . . . . . . . . . . 68
3.33 Maximally linearly independent system of left singular vector for V1 . 69
3.34 Maximally linearly independent system of left singular vector for V2 . 69
3.35 Maximally linearly independent system of right singular vector for V1 . 69
3.36 Maximally linearly independent system of right singular vector for V2 . 69
3.37 Left Singular Matrix Ue1 for E1 . (step 24) . . . . . . . . . . . . . . . 69
3.38 Singular Matrix Σe1 for E1 . (step 25) . . . . . . . . . . . . . . . . . . 70
3.39 Right Singular Matrix VeT1 for E1 . (step 26) . . . . . . . . . . . . . . . 70
3.40 Left Singular Matrix Ue2 for E2 . (step 27) . . . . . . . . . . . . . . . 70
3.41 Singular Matrix Σe2 for E2 . (step 28) . . . . . . . . . . . . . . . . . . 70
3.42 Right Singular Matrix VeT2 for E2 . (step 29) . . . . . . . . . . . . . . . 71
3.43 Maximally linearly independent system of left singular vector for E1 . 71
3.44 Maximally linearly independent system of left singular vector for E2 . 71
3.45 Maximally linearly independent system of right singular vector for E1 . 72
3.46 Maximally linearly independent system of right singular vector for E2 72
3.47 Triple tuple of graph g3 . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.48 Triple tuple of graph g4 . . . . . . . . . . . . . . . . . . . . . . . . . . 73

3.49 Vertex adjacency matrix of g3 . (step 4 marked as G1 ) . . . . . . . . . 73
3.50 Vertex adjacency matrix of g4 . (step 6 marked as G2 ) . . . . . . . . . 74
3.51 Edge adjacency matrix of g3 . (step 5 marked as G1 ) . . . . . . . . . . 74
3.52 Edge adjacency matrix of g4 . (step 7 marked as G2 ) . . . . . . . . . . 75
3.53 Computation results of binary vertex adjacency matrix for graph g3
(step 8) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.54 Computation results of binary vertex adjacency matrix for graph g4
(step 9) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.55 Computation results of binary edge adjacency matrix for graph g3 (step
10) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.56 Computation results of binary edge adjacency matrix for graph g4 (step
11) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
3.57 Triple tuple of graph g5 . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.58 Triple tuple of graph g2 . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.59 Vertex Adjacency Matrices for induced Matching. . . . . . . . . . . . 80
3.60 c. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3.61 C. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3.62 Vertex Adjacency Matrices for induced Matching. . . . . . . . . . . . 80
3.63 c. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3.64 C. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3.65 Computation results of vertex adjacency matrix for graph g1 (step 8) 80
3.66 Computation results of vertex adjacency matrix for graph g2 (step 9) 80

4.1 Vertex Adjacency Matrix of g. . . . . . . . . . . . . . . . . . . . . . . 86


4.2 Vertex adjacency matrix of G. . . . . . . . . . . . . . . . . . . . . . . 87
4.3 Edge adjacency matrix of g. . . . . . . . . . . . . . . . . . . . . . . . 88
4.4 Edge adjacency matrix of G. . . . . . . . . . . . . . . . . . . . . . . . 88
4.5 Vertex Adjacency Matrix of g1 . . . . . . . . . . . . . . . . . . . . . . 89
4.6 Edge Adjacency Matrix of g1 . . . . . . . . . . . . . . . . . . . . . . . 89
4.7 Query graph c . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

4.8 step 1: enumerate vertices from graph C . . . . . . . . . . . . . . . . 92
4.9 step 2a: enumerate edges from graph C . . . . . . . . . . . . . . . . . 93
4.10 step 2b: enumerate edges from graph C . . . . . . . . . . . . . . . . . 93
4.11 Triple tuple of graph c. . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.12 Triple tuple of graph C . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.13 Vertex adjacency matrices for query graph c. . . . . . . . . . . . . . . 95
4.14 Vertex adjacency matrices for data graph C . . . . . . . . . . . . . . 95
4.15 Edge adjacency matrix for query graph c . . . . . . . . . . . . . . . . 96
4.16 Edge adjacency matrix for data graph C . . . . . . . . . . . . . . . . 96
4.17 Triple tuple of graph c. . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.18 Triple tuple of graph C . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.19 c. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.20 Vertex adjacency matrix of c′ . . . . . . . . . . . . . . . . . . . . . . 97
4.21 Edge adjacency matrix of c′ . . . . . . . . . . . . . . . . . . . . . . . 97
4.22 Edge adjacency matrix of C . . . . . . . . . . . . . . . . . . . . . . . 97
4.23 Computation results of vertex adjacency matrix for graph g1 (step 8) 97
4.24 Computation results of vertex adjacency matrix for graph g1 (step 8) 97

5.1 Row sums of vertex matrices . . . . . . . . . . . . . . . . . . . . . . . 101


5.2 Row sums of edge matrices . . . . . . . . . . . . . . . . . . . . . . . . 102
5.3 Row sums of vertex matrices . . . . . . . . . . . . . . . . . . . . . . . 104
5.4 Row sums of edge matrices . . . . . . . . . . . . . . . . . . . . . . . . 104
5.5 Triple tuple of O1CCC1 . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.6 Triple tuple of C1COC1 . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.7 Vertex adjacency matrix of O1CCC1 . . . . . . . . . . . . . . . . . . 115
5.8 Vertex adjacency matrix of C1COC1. . . . . . . . . . . . . . . . . . . 115
5.9 Edge adjacency matrix of O1CCC1. . . . . . . . . . . . . . . . . . . . 115
5.10 Edge adjacency matrix of C1COC1 . . . . . . . . . . . . . . . . . . . 116
5.11 Triple tuple of Oxetane (O1CCC1) . . . . . . . . . . . . . . . . . . . 117
5.12 Triple tuple of Thietane (S1CCC1) . . . . . . . . . . . . . . . . . . . 117

5.13 Vertex adjacency matrix of Oxetane (O1CCC1) . . . . . . . . . . . . 118
5.14 Vertex adjacency matrix of Thietane (S1CCC1) . . . . . . . . . . . . 118
5.15 Edge adjacency matrix of Oxetane (O1CCC1) . . . . . . . . . . . . . 118
5.16 Edge adjacency matrix of Thietane (S1CCC1) . . . . . . . . . . . . . 119
5.17 Triple tuple of Propanal (CCC=O) . . . . . . . . . . . . . . . . . . . 120
5.18 Triple tuple of Methoxyethene (COC=C) . . . . . . . . . . . . . . . . 120
5.19 Vertex adjacency matrix of Propanal (CCC=O) . . . . . . . . . . . . 121
5.20 Vertex adjacency matrix of Methoxyethene (COC=C) . . . . . . . . . 121
5.21 Edge adjacency matrix of Propanal (CCC=O) . . . . . . . . . . . . . 121
5.22 Edge adjacency matrix of Methoxyethene (COC=C) . . . . . . . . . . 121
5.23 Triple tuple of Chloroethane (CCCl) . . . . . . . . . . . . . . . . . . 122
5.24 Triple tuple of Propanol (CCCO) . . . . . . . . . . . . . . . . . . . . 122
5.25 Vertex adjacency matrix of Chloroethane (CCCl) . . . . . . . . . . . 123
5.26 Vertex adjacency matrix of Propanol (CCCO) . . . . . . . . . . . . . 123
5.27 Edge adjacency matrix of Chloroethane (CCCl) . . . . . . . . . . . . 123
5.28 Edge adjacency matrix of Propanol (CCCO) . . . . . . . . . . . . . . 123

Chapter 1

Introduction

The interrelations and influences among different objects can intuitively form vari-
ous networks, such as disease propagation networks [1], transportation systems [2],
social networks [3], biological networks [4], interlinked documents with citations [5],
recommendation systems [6] and criminal investigations [7]. These networks can all
be cast in graph form, a potent modelling tool that naturally represents associations:
nodes represent anything with a unique identity, and edges express the relationships
among them. Therefore, graphs can model and express nearly every system involving
connections between things. Some traditional data mining and management tasks,
such as frequent pattern mining and classification, have been recast in the graph
setting because the semantic expression of a graph is clearer and more flexible. Thus,
graphs are conducive to summarising and displaying, in a concise and orderly manner,
large amounts of information that would otherwise be too numerous or complicated
to describe adequately.
With the continuous development of theoretical computer science, the importance
of graph data processing has become increasingly prominent. Graph matching is the
process of evaluating the similarity of graphs and includes two approaches: exact
and inexact graph matching. The former aims at finding a perfect correspondence
between the two graphs being matched, while in other cases it is common practice
to measure the dissimilarity of two graphs instead. In other words, a graph query can
be processed by searching for subgraphs using the relation of inclusion, or by graph
similarity (checking for structural similarity).

In theoretical computer science, a wide variety of data mining problems involve
identifying isomorphism relationships between graphs. A reasonably operational
procedure for checking graphs for isomorphism could readily be applied in information
retrieval and pattern matching, and is widely useful in applications such as image
processing, biology, medical data analytics, computer and information systems,
chemical bond structures, and social networks, as well as in other fields handling
comparably structured data. Therefore, an efficient query processing method can
improve the analytical efficiency of downstream tasks.

In chemistry, the definition of similarity measures between graphs is a crucial
problem. We therefore also consider a relaxation of the exact problem: inexact
pattern-matching techniques are intriguing because, when the data graph is noisy,
discovering comparable matches that are not exact is more beneficial. For approximate
matching, metrics are specified to quantify how closely the results resemble the
pattern. Subgraph pattern matching techniques may also be classified as optimal or
approximate. An optimal algorithm guarantees the discovery of a proper solution:
for exact matching, all correct matches; for inexact matching, the closest match or
a correctly sorted list of matches. In comparison, an approximate algorithm does not
guarantee a correct solution; for exact matching it may return some, but not all, of
the matches, and for inexact matching a close match, but not the closest. Accordingly,
our algorithm can also handle quantitative graph distance measurement. Thus, our
work involves both exact and inexact matching.

Note that, to prevent ambiguity, the following pairs of terms are used interchangeably in this thesis: ‘node’ and ‘vertex’, ‘edge’ and ‘link’, ‘graph’ and ‘candidate’. In addition, because the vertex and edge adjacency matrices are symmetric, we write ‘row’ instead of ‘row/column’ for simplicity.

1.1 Research Scope

This thesis considers the following related problems:

• nine basic types of graph matching are studied: (1) undirected and unlabeled; (2) undirected with labelled nodes; (3) undirected with labelled nodes and edges; (4) directed and unlabeled; (5) directed with labelled nodes; (6) directed with labelled nodes and edges; (7) undirected, unlabeled multigraph; (8) undirected multigraph with labelled nodes; and (9) undirected multigraph with labelled nodes and edges. The different kinds of graphs are shown in Figure 1-1.

• graphs with different numbers of nodes;

• graph representation methods;

• induced and partial subgraph isomorphism matching;

• two fundamental verification theorems;

• quantitative graph distance measurement.

The details are introduced below.


Graph representations are studied to ensure better performance in subgraph isomorphism search, especially in the case of massive data. We work on two main approaches: the vertex adjacency matrix and the edge adjacency matrix.
Graph isomorphism problems require considering the relationships among objects. The vertex and edge adjacency matrices can offer a satisfying description of a graph by encoding the relationships between nodes and edges in expressive and intuitive forms, thanks to their mathematical properties.
Quantitative graph distance measurement defines a metric space by using the Euclidean distance and then takes advantage of the discriminative properties provided by the permutation and equinumerosity theorems.

The aim of this research problem can be stated as understanding the structure of
graphs, modeling the complex interactions between entities, and designing algorithms
for estimating the similarity between two graphs (structural and labeled graphs).
Given a query graph Q and a graph database D = {g1 , g2 , . . . , gn }, graph search returns a query answer set DQ = {G | C(Q, G) = 1, G ∈ D}, where C can be a function testing graph isomorphism (full structure search), subgraph isomorphism (substructure search), approximate matching (full structure similarity search), or approximate subgraph matching (substructure similarity search); C returns 1 when G satisfies the test and 0 otherwise. Detailed illustrations will be discussed in the following chapters.
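The formulation above amounts to a generic filtering loop in which the predicate C is pluggable. The following minimal sketch illustrates this; the toy graphs, the helper name `graph_search`, and the exact-match predicate are all illustrative assumptions, not the algorithms developed in this thesis:

```python
# Sketch of the graph-search formulation DQ = {G | C(Q, G) = 1, G in D}.
# The predicate C is pluggable: graph isomorphism, subgraph isomorphism,
# or an approximate match could each play its role.

def graph_search(query, database, predicate):
    """Return all graphs G in the database with predicate(query, G) == 1."""
    return [g for g in database if predicate(query, g) == 1]

# Toy graphs as (vertex set, edge set) pairs -- hypothetical data.
q = ({1, 2}, {(1, 2)})
d = [
    ({1, 2}, {(1, 2)}),
    ({1, 2, 3}, {(1, 2), (2, 3)}),
]

# Example predicate: exact match on vertex and edge sets.
same_graph = lambda a, b: 1 if a == b else 0
print(graph_search(q, d, same_graph))  # only the first database graph
```

In practice the predicate would be one of the (sub)graph isomorphism or similarity tests discussed in later chapters; only the surrounding filtering loop stays the same.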

1.2 Research Concerns and Contributions

Working with graphs is very challenging; the main issues are as follows:

• Querying graph data is one of the most challenging operations in graph theory. Graph query problems are associated with subgraph isomorphism searching in almost all large-scale graph database applications. It is knotty to determine the isomorphic relation between any two graphs, since under the brute-force method there are approximately n! node mappings between two graphs with n vertices each, leading to high computational complexity. It is even harder to manage subgraph matching, because of the need to enumerate candidate subgraphs in the data graph at the same scale as the query graph, which is also a significant obstruction to practical use. However, the existing subgraph isomorphism methods offer very unsatisfactory query performance, so a solid data retrieval protocol is in demand. Consequently, to carry out this task elegantly and promptly, several pivotal issues need to be addressed: i) how to model the query and data graphs; ii) how to manage and store the data; and iii) how to map the data for efficient query processing.

• Graphs arise in a vast range of practical fields. Due to the greater expressive power
(a) g1 (b) g2 (c) g3 (d) g4 (e) g5 (f) g6 (g) g7 (h) g8 (i) g9

Figure 1-1: Various types of graphs: (a) undirected and unlabeled, (b) undirected
with labeled nodes (different colors refer to different labels), (c) undirected with
labeled nodes and edges (d) directed and unlabeled, (e) directed with labeled nodes,
(f) directed with labeled nodes and edges, (g) undirected, unlabeled multigraph, (h)
undirected with labeled nodes multigraph and (i) undirected with labeled nodes and
edges multigraph.

(a) g10 (b) g11 (c) g12

Figure 1-2: Graph (a) is an induced subgraph of (c), and graph (b) is a partial
subgraph of (c).

associated with graphs, a related cost arises. This cost concerns the complexity of data representation. Nodes and edges can be accessed and processed in arbitrary order, and the structural nature of graph data makes the intermediate representation and the interpretability of mining results much more challenging. The structural storage and presentation of graphs come at a price because of their flexible and clear expressive power. A simplified representation of graphs is essential for the subsequent filtering and verification processes.

• Filtering is a means of reducing the search space by eliminating non-relevant paths. Since reducing the search space is very important, the subgraph isomorphism problem demands a prompt and efficient way to identify fruitful vertices as early as possible.

• Structural data can explicitly model networks of relationships between the substructures of a particular object, and they are more complex than traditional data types. However, conventional arithmetic operations cannot be introduced naturally for graphs as they can for traditional relational databases. For instance, computing the similarity of two objects, a critical task in many areas, is linear in the number of data items when vectors are employed. The same task for graphs, however, is much more complex, since one cannot simply compare the sets of nodes and edges, which are generally unordered and of different sizes. More formally, when computing graph dissimilarity or similarity, one has to identify the common parts of the graphs by considering all of their subgraphs. Given that there are C(n, k) candidate subgraphs (for a graph pattern with k nodes and a large graph with n nodes), the inherent difficulty of graph comparison becomes apparent. A well-designed computational method is required to improve retrieval performance.

• In practice, graphs carry not only geometric information but also attribute information. The design of labelled structures for graphs is a considerable challenge. Structural graphs can be modelled as attributed graphs, where vertices are associated with specific properties and edges represent relations between two vertices. Advocating a more realistic subgraph model on attributed graphs that goes beyond simple structure-based graph models is highly necessary.
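The brute-force search-space sizes underlying these challenges (roughly n! vertex mappings between two n-vertex graphs, and C(n, k) candidate subgraphs for a k-node pattern in an n-node data graph) can be made concrete with Python's standard library; the numbers below are purely illustrative:

```python
import math

# Brute-force graph isomorphism must consider up to n! vertex mappings.
for n in (5, 10, 20):
    print(n, math.factorial(n))

# Matching a k-node pattern against an n-node data graph can involve
# C(n, k) candidate vertex subsets before any edge is even checked.
n, k = 100, 5
print(math.comb(n, k))  # 75287520
```

Even for a modest 100-node data graph and a 5-node pattern there are over 75 million candidate vertex subsets, which is why filtering and pruning are the central concerns of the algorithms in this thesis.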

Our new methods are built upon a mathematical foundation bridging theory and practice. We summarize our principal contributions and innovations as follows:

• Our proposed algorithm can handle not only multiple types of graphs but also graphs of different sizes;

• Some useful graph invariants of the vertex and edge adjacency matrices are introduced to reveal the inherent nature of graphs;

• We develop an efficient exact graph algorithm to find the query graph within structural and labeled graphs. A novel approach to measuring similarity between geometric graphs is also introduced: the distance between two geometric graphs is defined, which includes permutation and equinumerosity distance measurements;

• With the aid of a well-defined permutation theorem, the effectiveness of its verification is shown by a considerable improvement in robustness over the greedy-search-based topology algorithm;

• We introduce a new concept, the equinumerosity theorem, which can be applied to all models in the graph simulation family to improve their expressiveness;

• We formalize a threefold verification scheme, which can remarkably reduce the matching calculation complexity. The quantitative and qualitative performance in our detailed empirical studies shows the effectiveness of our algorithm, combined with filtering and validation rules, on graph datasets, given that these algorithms enjoy polynomial complexity.

1.3 Thesis Organization
The remaining chapters of this thesis are structured as follows:

• Chapter 2 provides the theoretical foundations about graphs and about exact and inexact graph matching problems required to understand the contents of the following chapters. It also thoroughly reviews the most recent and well-known work conducted in the field of graph and subgraph problems (graph distance). The pros and cons of those works are pointed out.

• Chapter 3 introduces the undirected and directed graph representation methods, which capture the fundamental topological information of graphs. Our proposed graph isomorphism verification employs the permutation and equinumerosity theorems, and the mathematical proofs are discussed. We compare our proposed algorithm with state-of-the-art graph isomorphism algorithms and demonstrate its effectiveness in terms of temporal and spatial complexity.

• Chapter 4 introduces the induced and partial subgraph algorithms, which deal with subgraph matching problems under different constraints. We also speed up subgraph isomorphism verification with the permutation and equinumerosity theorems. The algorithms' performance is thoroughly analyzed in theory and examined through experimentation.

• Chapter 5 presents the Quantitative Graph Distance Measurement algorithm to


explore graph dissimilarity or distance problems. We take graphs with labelled
nodes into consideration, and comprehensive experiments are carried out to
evaluate the algorithm performance in different scenarios.

• Chapter 6 summarises the contributions in this thesis, then discusses several


open issues associated with this work. Finally, the suggestions for possible
future research directions for the subgraph isomorphism problems are outlined.

Chapter 2

Related work

Structural data have played a prominent role in various domain technologies, such as
social networks, criminal tracking, web link analysis, epidemic spreading, biological
structures, and cheminformatics. Graphs are natural representations of relations be-
tween entities in complex systems. Therefore, effectively managing graph-structured
data is of great significance in different domains.

Graph matching refers to the process of evaluating the similarity between two graphs. It is broadly categorized into exact and inexact graph matching. Exact graph matching has more theoretical implications in graph theory, while inexact graph matching has more practical implications in areas like pattern recognition, graphics, computer vision, and bioinformatics.

In this chapter, we present a detailed literature review of the works that are
related to exact and inexact graph matching problems. Section 2.1 presents a brief
overview of the basic concepts of graphs. Section 2.2 discusses the different types of graph isomorphism problems, including undirected graphs, directed graphs, and subgraphs. Then, in Sections 2.3 and 2.4, different types of exact and inexact graph matching
algorithms and techniques are discussed in detail. Finally, Section 2.5 provides a
summary of the existing literature on graph isomorphism problems and the research
concerns of this thesis.

2.1 Basics of Graphs

A graph is a finite collection of vertices (or nodes) connected through a collection of edges, each joining a pair of nodes. It is generally represented as G and consists of two types of elements: vertices and edges, where V denotes the vertex set (equivalently, node set) and E denotes the edge set (connection or join set). Each edge has two endpoints u and v, written as {u, v}, which belong to the vertex set. A graph is termed connected if for every pair of vertices u and v there is a path from u to v.
An edge (or link) of a graph is one of the connections between the vertices (or nodes) of the graph. Edges can be undirected, directed, looped, multiple, unlabeled or labelled.
A vertex (or node) of a graph is one of the points joined by the edges (or links) of the graph. Vertices can be unlabeled or labelled.
Undirected Graph An undirected graph has unordered pairs of vertices. In other words, all vertices are connected by bidirectional arcs, or equivalently all edges are undirected (without arrows). When an edge is represented as (u, v), it is possible to traverse from u to v or from v to u, as there is no specific direction.
Directed Graph A directed graph (or digraph) G = (V, E) consists of a vertex set V and an edge set E of ordered pairs of elements of the vertex set. The connected vertices have specific directions: each edge represents a direction from one vertex to another. If there is a directed edge from u to v, the first element u is the initial (start) node and the second element v is the terminal (end) node. The direction is from u to v, so the edge cannot be traversed from v to u.
Multigraph It is possible to have a single edge or multiple (two or more) edges joining the same pair of vertices; the latter are called parallel or multiple edges (pointing in the same direction if the graph is directed). An edge connecting a vertex to itself is called a loop. A graph with neither self-loops nor multiple edges is called a simple graph.
Labeled/Weighted Graph A weighted graph assigns a numerical weight to each edge; the total weight of a path is the sum of the edge weights for the edges in the path. In a weighted graph, both the length of a path and its total weight may be of interest, depending on the graph. In an unweighted graph, the concept of “total weight” does not exist, since individual weights do not exist either. Note that the path with the least weight and the path with the fewest edges are not always the same.
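The distinction between least weight and fewest edges can be seen in a tiny sketch (the edge weights and the helper name `path_weight` are hypothetical):

```python
# Total weight of a path = sum of its edge weights.
weights = {("a", "b"): 4, ("b", "c"): 1, ("a", "c"): 7}

def path_weight(path, weights):
    """Sum the weights of consecutive edges along a vertex sequence."""
    return sum(weights[(u, v)] for u, v in zip(path, path[1:]))

# The two-edge path a-b-c (weight 5) beats the single edge a-c (weight 7),
# so least weight and fewest edges disagree here.
print(path_weight(["a", "b", "c"], weights))  # 5
print(path_weight(["a", "c"], weights))       # 7
```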
Degree The degree of a node v, denoted by d(v), is the number of links connecting it with other nodes. For directed graphs, the in-degree of a node is the number of edges entering it, and the out-degree is the number of edges leaving it.
Incidence In graph theory, a vertex is incident to an edge if the vertex is one of the endpoints of that edge; two edges are incident if they share a common vertex.
Adjacency If two vertices are connected by an edge in a graph, we say the vertices are adjacent. The set of vertices adjacent to v is called the neighbourhood of v, denoted N (v).
Data Graph The data graph is a large graph and is the pre-saved data to be queried on. Given a particular context, we sometimes use ‘data’ to refer to the data graph for short.
Query Graph In contrast, the query graph is much smaller than the data graph. We also use ‘query’ or ‘query pattern’ in the same sense as query graph. The general idea is to identify an explicit pattern of the query graph within the data graph.
Subgraph and Supergraph If there exists an embedding of Q = (VS , ES ) in
G = (V, E), then Q is a subgraph of G, denoted by Q ⊆ G, and G is said to be a
supergraph of Q. The vertex set VS is a subset of the vertex set V , that is VS ⊆ V , and the edge set ES is a subset of the edge set E, that is ES ⊆ E.
Induced Subgraph An induced subgraph can be constructed by deleting vertices (and, with them, all the incident edges), but no further edges. If additional edges are deleted, the subgraph is not induced. That is, an induced subgraph keeps all edges of the original graph between the given set of vertices.
Partial Subgraph In contrast to the induced subgraph, a partial (non-induced) subgraph may have fewer edges than the induced subgraph on the same vertex set.
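The two subgraph notions can be illustrated with a short sketch; the toy vertex/edge sets and the helper name `induced_subgraph` are illustrative assumptions:

```python
# Induced vs. partial subgraph on a vertex subset.
def induced_subgraph(vertices, edges, vs):
    """Keep ALL original edges whose endpoints both lie in vs."""
    return vs, {(u, v) for (u, v) in edges if u in vs and v in vs}

V = {1, 2, 3, 4}
E = {(1, 2), (2, 3), (1, 3), (3, 4)}

vs = {1, 2, 3}
print(induced_subgraph(V, E, vs))  # keeps edges (1,2), (2,3), (1,3)

# A partial (non-induced) subgraph may drop some of those edges,
# e.g. keeping only (1,2) and (2,3) over the same vertex set.
partial = (vs, {(1, 2), (2, 3)})
```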

The induced and partial subgraphs are shown in Figure 1-2.
Graph Representation Graphs form a complex and expressive data type. We
need methods for representing graphs in databases and manipulating and querying
them. In algebraic graph theory, a matrix can be used to encode some relationships
between entities and obtain the structural properties of a graph.
Adjacency Matrix The adjacency matrix representation is a dominant method for storing graph data in memory to optimize graph algorithms. For a finite graph G = (V, E), the adjacency matrix is a |V | × |V | (0,1)-matrix with zeros on its diagonal. The connections are represented via the adjacency matrix A, where Aij ≠ 0 denotes (vi , vj ) ∈ E, while Aij = 0 denotes (vi , vj ) ∉ E. The degree of node vi is d(vi ). If the edges between nodes are directed, the in-degree and out-degree are denoted d+ and d− , respectively. The numbers of vertices and edges of a network are |V | = n and |E| = m, respectively. In this thesis, we assume a network is unweighted and undirected unless specified explicitly.
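As a small illustration of these properties (a toy 4-node graph, not the representation developed later in the thesis), the degree of each vertex of an undirected simple graph can be read off as a row sum of A; for a directed graph, row sums would give out-degrees and column sums in-degrees:

```python
# Build the adjacency matrix of a toy undirected 4-node graph and
# recover vertex degrees as row sums.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4
A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[u][v] = 1
    A[v][u] = 1  # symmetric because the graph is undirected

degrees = [sum(row) for row in A]
print(degrees)  # [2, 2, 3, 1]
```

Row sums of this kind are exactly the quantities evaluated by the permutation theorem introduced in Chapter 3.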

2.2 A Brief Overview of Graph Matching Problems

A graph is a prominent mathematical structure, and the process of finding structural mappings between the nodes and edges of two graphs under different constraints is referred to as graph matching. This inherent flexibility makes graph comparison suffer from high computational complexity. The proximity between two graphs is measured in two main categories: exact graph matching and inexact matching. In this section, we discuss graph isomorphism and graph similarity problems.
Exact matching is also called perfect matching. The aim of exact graph matching is to find a strict one-to-one alignment (bijection) between the nodes and edges of two structures, which is closely related to graph isomorphism problems: for each edge connecting two nodes in one graph, there must also exist an edge between the corresponding pair of nodes in the other graph. The graph isomorphism problem is one of the few problems in computational complexity theory that belongs to NP but is not known to be solvable in polynomial time nor to be NP-complete. The subgraph isomorphism problem is a generalization of the graph isomorphism problem. Subgraph isomorphism is a subproblem of subgraph matching, which finds all subgraph isomorphisms from a query graph to a data graph, and it is NP-complete [8].
Because of the stringent conditions imposed for matching, exact matching is normally inapplicable in real-world applications. Exact matching can only identify whether two graphs are exactly alike; it does not explore the similarity space between dissimilar graphs. For example, fault-tolerant mapping is an essential research topic in the graph domain because graph data may undergo modifications. Therefore, it is necessary to compute approximate matches between two non-isomorphic graphs. This problem is also referred to as inexact graph matching. Most variants of the graph matching problem are well known to be NP-hard. Inexact graph matching accounts for errors, noise, and distortions during the matching process and is able to accommodate these differences between graphs. Compared with exact matching, the structural constraint is relaxed to some extent to quantify the closeness of non-identical graphs.
A large number of well-studied works have paid attention to both exact and approximate subgraph queries [9], [10], [11]. Next, the different kinds of subgraph query methods, which can be grouped into three classes [12], are presented in detail: (1) subgraph/supergraph containment query; (2) graph pattern matching; and (3) graph similarity search.

1. In the first category, the input is a graph database consisting of a number of small graphs. Given a query graph Q and a graph database D = {g1 , g2 , . . . , gn }, the containment search problem is formulated as recognizing all graphs gi ∈ D that stand in a containment relation with Q. It can be divided into two basic search problems: subgraph containment search, which retrieves the database graphs that contain the query graph as a subgraph, and supergraph containment query, which returns the database graphs that are subgraphs contained in the query graph. The matching subgraphs/supergraphs of the query graph are returned from the database. Subgraph containment search requires indexing the data graphs, after which filtering and verification procedures are launched. The filtering process narrows down the search scope to decrease the computing overhead. Each candidate is compared with the query graph in the verification phase, and the candidates corresponding to the query are saved. Several works address the subgraph containment search problem, such as gIndex [51], Tree+delta [59], FG-index [28], gCode [62] and others.

2. In the second category, the input is a single large target data graph, and all occurrences matching a user-given graph pattern should be identified in it. As large-scale networks continuously emerge, graph pattern matching is increasingly used in many domains, such as dynamic network traffic, protein interaction, knowledge discovery and social networks. The fundamental task of effectively and efficiently finding all subgraph patterns in a large target network is in demanding need. Graph pattern matching query processing algorithms include enumeration, graph indexing and partitioning, optimization, distributed graph processing systems, and query decomposition, which have received a lot of attention from both academia and industry [36]. This strategy is more complicated than the subgraph containment search because subgraph matching requires enumerating all occurrences of a given query graph in the data graph. Generally, the crucial principle for solving this problem is to eliminate the irrelevant nodes as early as possible, making the verification more effective.

3. In the third category, the input is a graph database with a number of graphs, and the graph similarity search tries to obtain a series of graphs that are isomorphic (or similar) to the given query graph in the database. Graph similarity search needs the aid of a graph proximity function to measure the similarity of graphs. Current works mainly rely on a definition of a similarity measure between two graphs, so that unqualified candidate graphs can be excluded by a filtering mechanism; the costly graph search operations are then performed to verify the remaining candidates. Several different ways of modelling and computing similarity between graphs have been proposed, such as graph edit distances, maximum common subgraphs, edge/feature misses, graph alignment and graph kernels.
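As one concrete and deliberately simple example of a proximity function in this spirit (not the measure developed in this thesis), the Jaccard similarity over edge sets can serve as a cheap pre-filter before costly verification; the function name and toy graphs are illustrative:

```python
# Jaccard similarity over edge sets: |E1 n E2| / |E1 u E2|.
# A cheap proximity function usable as a filter before expensive
# graph search operations.
def jaccard_edges(e1, e2):
    e1, e2 = set(e1), set(e2)
    if not e1 and not e2:
        return 1.0  # two empty graphs are trivially identical
    return len(e1 & e2) / len(e1 | e2)

g1 = {(1, 2), (2, 3), (3, 4)}
g2 = {(1, 2), (2, 3), (3, 1)}
print(jaccard_edges(g1, g2))  # 0.5
```

Real similarity-search systems replace this toy measure with graph edit distance, maximum common subgraph size, or a graph kernel, but the filter-then-verify pipeline around it is the same.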

The isomorphism problem is to devise an algorithm for determining whether two given graphs are isomorphic. (Two graphs G1 and G2 are isomorphic if there exists a one-to-one correspondence between their vertices and edges such that the incidence relationship is preserved; each pair of corresponding vertices shares the same degree for undirected graphs, and the same in-degree and out-degree for directed graphs.)
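One cheap necessary (but not sufficient) condition implied by this definition is that isomorphic graphs must share the same sorted degree sequence; a sketch with illustrative toy graphs:

```python
from collections import Counter

# Necessary-but-not-sufficient isomorphism check: isomorphic graphs
# must have identical sorted degree sequences.
def degree_sequence(edges):
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sorted(deg.values())

g1 = [(1, 2), (2, 3), (3, 1)]              # triangle
g2 = [("a", "b"), ("b", "c"), ("c", "a")]  # triangle, relabelled
g3 = [(1, 2), (2, 3), (3, 4)]              # path on 4 nodes

print(degree_sequence(g1) == degree_sequence(g2))  # True
print(degree_sequence(g1) == degree_sequence(g3))  # False
```

Matching degree sequences do not prove isomorphism, but a mismatch rules it out immediately, which is why degree-based invariants are popular filtering tools.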
In our thesis, we mainly survey graph matching problems based on the second
and third categories that relate to a research concern in this thesis.

2.3 Existing Exact Graph and Subgraph Pattern Querying Algorithms
Since the graph isomorphism problem is a special case of the subgraph isomorphism problem, graph isomorphism will not be discussed individually. Subgraph isomorphism is also known as subgraph pattern matching, subgraph search, subgraph query or subgraph mapping.
A large number of approximate and suboptimal algorithms have been studied for graph and subgraph isomorphism problems, and no polynomial-time efficient solutions are available. This section first discusses the general characteristics of the representative algorithms and then presents the main categorisation of these algorithms and techniques. We have identified five main paradigms.

2.3.1 Pure Tree Search Algorithms

The most common technique to establish a subgraph isomorphism is based on backtracking in a search tree. In order to prevent the search tree from growing unnecessarily large, different refinement procedures are used. The tree search with a backtracking scheme obtains solutions by adopting a depth-first search that incrementally builds matches between query vertices and candidate data vertices. A partial matching (initially empty) is maintained in a state space representation. At each step, new node pairs are added to the solution iteratively; a potential solution is either accepted or ruled out according to whether it is consistent with the constraints of subgraph isomorphism, and, generally, specific heuristics prune unfruitful search paths as early as possible. In the latter case, the algorithm backtracks, removing the previous pair and probing new search paths. The lookup complexity of tree-search-based algorithms grows linearly with the size of the data graph.
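The scheme described above can be sketched as a minimal depth-first matcher with a single degree-based pruning rule. This is an illustrative sketch under simplifying assumptions (adjacency-dict graphs, monomorphism rather than induced matching), not any specific published algorithm:

```python
# Minimal backtracking subgraph matcher (finds a subgraph monomorphism).
# Graphs are adjacency dicts: node -> set of neighbours.
def find_embedding(query, data):
    qnodes = list(query)  # fixed order in which query vertices are matched

    def extend(mapping):
        if len(mapping) == len(qnodes):
            return dict(mapping)          # complete: all query nodes matched
        u = qnodes[len(mapping)]
        for v in data:
            if v in mapping.values():
                continue                  # v already used
            if len(data[v]) < len(query[u]):
                continue                  # prune: v cannot host u's degree
            # Prune: every already-matched neighbour of u must map to a
            # neighbour of v.
            if all(mapping[w] in data[v] for w in query[u] if w in mapping):
                mapping[u] = v
                result = extend(mapping)
                if result:
                    return result
                del mapping[u]            # backtrack and try another v
        return None

    return extend({})

query = {0: {1}, 1: {0, 2}, 2: {1}}  # path on 3 nodes
data = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(find_embedding(query, data))  # {0: 'a', 1: 'b', 2: 'c'}
```

Published algorithms such as Ullmann's and VF2, discussed next, refine exactly this skeleton with stronger pruning rules and carefully chosen matching orders.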
Ullmann's algorithm [13], proposed in 1976, was the pioneering search method developed to find isomorphisms as well as monomorphisms; it is still among the most popular graph matching algorithms and the basic framework for a number of subsequent subgraph isomorphism algorithms. It is composed of two main tasks: tree searching and a refinement procedure. The refinement procedure, which eliminates unfruitful matches inconsistent with the current partial matching, works on a matrix to match the nodes recursively until no possible matching matrix exists. Ullmann subsequently updated this algorithm in 2010 to achieve higher performance.
VF Cordella et al. [14] presented an algorithm named VF in 1998, which handles both isomorphism and subgraph isomorphism using a state-space representation. In this framework, each state is associated with a partial mapping of the graph matching process. It uses several pruning strategies to discard, as far as possible, the unpromising partial mappings that do not satisfy the requirements of graph mapping. This heuristic is very fast to compute, leading in many cases to a significant improvement over Ullmann's algorithm, as shown in [15] in 1999.
VF2 The same authors as [14] proposed an improved version of the VF algorithm named VF2 [16] in 2001, which reduced the memory requirement from O(n²) to O(n) with respect to the number of nodes in the graphs, thus making the algorithm work well with large graphs. Unlike VF, the VF2 algorithm defines orders in which query vertices are selected. VF2 adopts a depth-first strategy with backtracking and contains two main phases: search and refinement. The first step is basically the same as Ullmann's algorithm; the main difference lies in the refinement phase. The algorithm deals with the first vertex initially, then selects a vertex connected with the already matched query nodes, searches for a subgraph match, and backtracks if none is found. The real innovation of VF2 is that it introduces feasibility rules to prune in advance.
RI The RI algorithm [17] is similar to VF2. It has complexity linear in space and quadratic in time in the average case, and can be used for both graph isomorphism and subgraph isomorphism. RI applies a static pattern preordering procedure before the search, which guarantees that the next node visited involves more constraints with the matched nodes, and considers extra constraints to reduce the search space as early as possible.
VF2+ The VF2+ algorithm [18] is a modification of VF2 that optimizes search paths by providing the prior probability of a candidate node and a search sequence, instead of the depth-first search strategy that VF2 employs.
VF2++ The VF2++ algorithm [19] also determines the matching order of a search sequence with cutting rules rather than the depth-first ordering applied in VF2. The search sequence is improved based on breadth-first search to identify the most unfruitful branches. Furthermore, pruning rules are provided to cut off fruitless searches: the algorithm tends to recognize fruitless searches and apply pruning rules at the highest levels of the tree, so fewer wasted searches occur as the exploration goes deeper.
RI-DS RI-DS [20] is a variant of RI for dense graphs. The difference between them is that RI-DS precomputes a compatibility map between pattern and target vertices, which can reduce the total running time dramatically.
VF3 VF3 [21] is a variant of the VF2+ algorithm specially designed for larger and denser graphs. It keeps a structure based on a state space representation and uses depth-first search with backtracking and several heuristic rules to prune the search space. VF3 markedly reduces the number of nodes explored to obtain candidates and the time spent on each state by computing the coverage tree during the preprocessing of the data graph.

2.3.2 Index-based Algorithms

Graph indexing-based algorithms mainly adopt two critical steps, a node pruning strategy and matching order selection, and leverage auxiliary indices to accelerate the exploration. First, every possible index path from a database graph up to a maximum length is enumerated and indexed to compute a graph index, that is, a vector or a tree of features representing the structural and semantic information of a graph; then, all the indices of the subgraphs containing the same index inside the data graph are searched. The size of the index path set can increase drastically with the size of the graph database.
QuickSI QuickSI [22] proposes a search sequence that tries to access nodes with infrequent node labels and infrequent adjacent edge labels as early as possible. Specifically, instead of using label frequency information from a query graph as in VF2, QuickSI pre-processes data graphs to compute the frequencies of node labels and of triples (source node label, edge label, target node label). Using the calculated edge label frequencies, a weight is assigned to each query edge, and a minimum spanning tree is obtained using a modified Prim algorithm. QuickSI creates the sequence from the order in which vertices are inserted into this tree. QuickSI is designed for handling small graphs; it has neither a filtering phase nor auxiliary indices for matching on a single data graph.
GraphQL GraphQL [23] mainly employs two techniques for filtering: neighbourhood signatures and pseudo subgraph isomorphism. The neighbourhood signatures of data graph vertices prune the initial candidate set; the pseudo-isomorphism technique narrows the search space globally. In particular, GraphQL uses a simple greedy method that computes a bipartite matching between the query graph and the candidates in the data graph until the specific requirement is achieved. These strategies are costly but efficient. This algorithm is designed to work with large graphs.
GADDI [24] indexes a data graph based on a neighbourhood discriminating substructure distance between node pairs in the data graph. Then, a two-way pruning dynamic subgraph matching procedure is launched. It mainly deals with small and medium-sized graphs.
The SPath [25] algorithm uses a path-based indexing technique, with paths as the patterns of comparison in the data graph, and neighbourhood signatures to reduce the search space. It converts the query graph into a set of shortest paths for querying. Because it relies on a disk-based indexing technique, it is suited for handling large graphs.
TurboIso [26] proposes the new concept of the neighbourhood equivalence class. It provides a method for merging similar nodes in the query graph, transforms the query graph into a spanning tree, and then filters out unnecessary intermediate results by a path filtering method.
BoostIso [27] optimizes TurboIso, and its proposed preprocessing techniques are applicable to all subgraph isomorphism algorithms. It first merges the nodes with similar attributes in the data graph and then further reduces the infeasible candidate nodes by the path filtering method. However, TurboIso and BoostIso are inapplicable when the query graph and the data graph have dissimilar nodes. In addition, the running time of the path filtering method increases exponentially with the growth of path length.
CFL-Match [28] presents the core-forest-leaf decomposition of the query graph and a compact path-based index aimed at postponing Cartesian products. The query graph is decomposed into three substructures, core, forest and leaf, before the BFS tree construction, and subgraph matching is executed on each of these substructures. The decomposition contains three procedures: it constructs a spanning tree of the query graph; then it calculates all non-tree edges in the minimal connected subgraph; lastly, it iteratively rules out the nodes of degree 1 and counts the degrees of each node.
CECI [29] is similar to TurboIso, and the intersection-based method included in this work makes it perform better than the edge-verification-based CFL-Match for the most part. This algorithm does not work well with large graphs.

DAF [30] proposes several novel pruning techniques. It also adopts the intersection-based method to find the candidates. However, it is difficult to apply to large graphs due to its inherent sequential nature.

2.3.3 Constraint Programming

Subgraph isomorphism problems can be formulated as constraint programming problems in a straightforward way: each node of the pattern graph is given a variable whose domain is the set of possible target nodes, subject to the constraint that adjacent vertices must be mapped to adjacent vertices. In detail, the method first computes a domain of compatible nodes for each node of the pattern; then the domains are iteratively reduced by constraint propagation interleaved with a backtracking search until the majority of successor nodes have been ruled out.
Forward Checking pioneered the constraint programming approach [31] in 1979 with a backtrack search algorithm for the maximal common subgraph problem. It searches for the maximum common subgraph by employing a simple backtracking search over the space of possible common subgraphs, inspecting the potential solution set and then determining whether the current solution needs to be included.
nRF+ One of the most recent constraint satisfaction frameworks for isomorphism was proposed by Larrosa and Valiente [32] in 2002, in the context of discrete optimization and operational research.
LAD The LAD algorithm [33] by C. Solnon studies a new filtering algorithm based on the global constraints that the mapping between the nodes of the two graphs must be injective and edge-preserving. The algorithm iteratively selects a node couple and propagates the constraints through the neighbours of the two nodes until no further nodes can be removed. The space complexity of LAD is quadratic in the average case, while the time complexity is between n^2 and n^4, where n is the size of the query graph.
Improved Ullmann [34] Ullmann proposed a new bit-vector algorithm in 2011 that computes subgraph isomorphism via binary constraint satisfaction; it depends primarily on search and partially on domain editing. It is an updated version of the earlier work [13].

2.3.4 Algebraic Graph Theory Techniques

Algebraic methods applied to graph problems are known as algebraic graph theory.
The structure and properties of a graph G can be revealed by studying the matri-
ces associated with G, such as the adjacency matrix, the incidence matrix, and the
Laplacian.
Nauty [35] is based on group theory. This work constructs an efficient way of computing the generators of the automorphism group of a graph. A canonical labelling of the input graphs can be derived from the automorphism group, and Nauty determines whether two graphs are isomorphic by comparing the adjacency matrices of the canonical forms of the corresponding graphs. The equality verification can be performed in O(n^2) time, but the computation of the canonical labelling cannot be solved in subexponential time in the worst case. In most cases, this algorithm achieves the desired performance.
Traces The same authors as Nauty [36] improved the Nauty program in 2014 and introduced a novel technique: a global vertex order for the whole graph and a local order for each patch, computed in a preprocessing step. The experimental results indicated that it is more efficient than most other existing graph isomorphism tools.

2.3.5 Miscellaneous Methods and Techniques

Messmer and Bunke [37], in 1998, proposed a new technique that uses a preprocessing step to convert a graph into a decision tree, so that matching can be solved in polynomial time. However, the time of the preprocessing step and the space for the decision tree construction increase with the number of nodes in the graphs. Messmer and Bunke [38], in 2000, proposed a technique to find a subgraph matching the query graph in an extensive database of preprocessed graphs. This method follows the rule of the subgraph containment query. The dataset may contain multiple subgraphs matching the query graph, but only one of them is represented in the final result, which saves execution time. In [39], in 2011, the same authors described an extension of the above approaches that reduces the storage space and indexing time dramatically.

STwig [40] achieves higher efficiency without considering graph structure indices, but it uses parallel technology. This method decomposes the query graph into two-level trees with a sophisticated algorithm and adopts exploration and join mechanisms, which reduce both the number of two-way join operations and the size of the join parameters. However, time-consuming join operations and unevenly distributed graph data leave the space and time complexity of STwig relatively high.

Quasipolynomial GI-algorithm [41] László Babai, in 2016, put forward that the graph isomorphism problem is solvable within a quasipolynomial time complexity bound, with running time exp(O(log(n)^c)) for a graph with n vertices and some constant c. It is currently considered the asymptotically fastest algorithm. (While the general graph isomorphism problem's complexity is not resolved, polynomial-time algorithms are presented for particular classes of graphs.)

Many algorithms, such as VF2 [16], VF2+ [18], VF2++ [19], VF3 [21], QuickSI [22], GADDI [24], GraphQL [23] and SPath [25], have been presented in recent years to enhance the performance of the Ullmann algorithm [13]. These algorithms exploit different join orders to search for potential objects, together with pruning rules and auxiliary information to exclude false positive candidates and speed up progress. They lack a well-defined filtering scheme to find optimal solutions and scalable validation rules to decrease the computational complexity as the data scale grows, and none of these algorithms is designed to handle all types of graphs of all sizes. In addition, no polynomial runtime algorithm is known for the graph isomorphism problem, although algorithms have been devised for special structures such as trees [42], bounded-valence graphs [43], ordered graphs [44], graphs with unique node labels [45] and planar graphs [46]. Therefore, new strategies are needed to compete with the challenges that have recently arisen.

2.4 Existing Inexact Graph and the Subgraph Pattern Querying Algorithms

Inexact matching is also known as error-tolerant matching or error-correcting graph matching. Several other important error-tolerant graph matching algorithms include pure tree search algorithms, relaxation labelling, spectral decompositions, graph edit distance, artificial neural networks and graph kernels. We will review some of the existing literature on graph edit distance and the research concerns of this thesis.
Spectral methods are among the most important and widely used techniques for graph matching [47], [48], [49], [50]. The general idea is to exploit the properties of the structural matrices associated with a graph. Several natural matrices can be associated with a graph, such as the adjacency matrix, the Laplacian matrix and the degree matrix. A common technique is to deduce a proximity relation by means of the eigendecomposition of the structural matrix; the eigenvalues and eigenvectors are invariant with respect to node permutation and can be calculated efficiently in polynomial time. The theoretical support for the spectral method is that the structural matrices of a graph remain unchanged under node rearrangement, and therefore the structural matrices of two isomorphic graphs have the same eigendecomposition. Hence, the similarity between graphs can be calculated from the relationship between the spectra of their graph matrices.
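The invariance of the spectrum under relabelling can be illustrated with a short, self-contained sketch (illustrative only; the 4-vertex matrix and the permutation are arbitrary examples, not taken from the cited works). Isomorphic graphs have similar adjacency matrices, A2 = P A1 P^T, so they share the same characteristic polynomial and hence the same eigenvalues; the sketch compares characteristic polynomials exactly, using the Faddeev-LeVerrier recurrence in integer arithmetic:

```python
# Isomorphic graphs have similar adjacency matrices (A2 = P A1 P^T) and
# therefore identical characteristic polynomials and spectra.  The
# Faddeev-LeVerrier recurrence computes det(xI - A) exactly in integers.
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly(A):
    # Returns [c1, ..., cn] with det(xI - A) = x^n + c1 x^(n-1) + ... + cn.
    n = len(A)
    M = [[0] * n for _ in range(n)]
    coeffs, c = [], 1
    for k in range(1, n + 1):
        for i in range(n):       # M <- M + c I
            M[i][i] += c
        M = matmul(A, M)         # M <- A (M + c I)
        c = -sum(M[i][i] for i in range(n)) // k
        coeffs.append(c)
    return coeffs

A1 = [[0, 1, 1, 1], [1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]]
perm = [1, 2, 0, 3]  # vertex i of A1 becomes vertex perm[i] of A2
A2 = [[A1[perm.index(i)][perm.index(j)] for j in range(4)] for i in range(4)]

assert char_poly(A1) == char_poly(A2)  # relabelling preserves the spectrum
```

The converse does not hold in general: cospectral non-isomorphic graphs exist, which is one reason purely spectral tests give necessary rather than sufficient conditions.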
A number of graph matching techniques based on the spectral method of algebraic
graph theory are as follows:
Umeyama [47] in 1988 proposed a polynomial-time eigendecomposition-based technique to compute a bijection between two weighted graphs of the same size. As pioneering work, Umeyama's method has become a commonly used reference. If the graphs are close to each other, a good solution quality can be achieved. Although eigendecomposition graph matching (EDGM) is elegant in its formulation, easy to use and computationally efficient, it is subject to certain restrictions: 1) the input graphs should be of the same size; 2) it is susceptible to weight errors for weighted graphs; 3) the eigenvalues of the adjacency matrix of each graph have to be single and sufficiently isolated from each other; 4) the rows of the matrix of the corresponding absolute eigenvectors cannot be very similar to each other. [51] improves Umeyama's work to match two graphs with different numbers of nodes by choosing the largest k eigenvalues as the projection space. In 2004, Caelli and Kosinov [52] presented an algorithm using eigen-subspace projections and vertex clustering techniques to handle graph matching, aimed at inexact many-to-many graph mapping relations instead of one-to-one matching. In 2005, Shokoufandeh et al. [53] provided an effective mechanism for indexing hierarchical image structures in large databases of directed acyclic graphs by using spectral characterization.
The main problem of spectral methods in these existing works is that they are extremely sensitive to the structure of the matrix. Moreover, most of these methods are structurally oriented, which makes them applicable only to unlabeled graphs.

2.5 Summary
This chapter discussed the available algorithms for exact and approximate graph matching problems. We identified the strategies of the different frameworks and the advantages and limitations of the existing algorithms.
The above works have contributed substantially to the development of subgraph matching. At present, no polynomial runtime algorithms are known for the graph isomorphism problem, except for some special classes of graphs.

Chapter 3

Graph Isomorphism Verification

This chapter introduces two theorems for graph isomorphism verification. It focuses on undirected graph, undirected multigraph and directed graph isomorphism verification. The isomorphism verification algorithm comprises the permutation theorem and the equinumerosity theorem. So far, however, no specific polynomial-bound algorithm for graph isomorphism has been recognized, owing to the inherent difficulties of this particularly elusive and challenging problem. The problem has aroused theoretical interest due to its affiliation with the concept of NP-completeness, and because of its tricky essence, many graph theorists have committed significant time to it; even so, algorithms on the subject have been scarce, and progress has been slight. We propose a simple polynomial-time algorithm for directed graph isomorphism. The theoretical analysis and experimental results show that the proposed protocol is effective and efficient.

This chapter is organized as follows. Section 3.1 describes undirected and directed
graph isomorphism problems. The graph representation method is discussed in Sec-
tion 3.2. Then, we elaborate on the permutation theorem in Section 3.3 and the
equinumerosity theorem in Section 3.4. In Section 3.5, we also discuss some extended
contents of our proposed algorithm. After that, we provide the algorithm complexity analysis and three case studies in Section 3.6. Finally, we summarise this chapter in Section 3.7.

3.1 Graph Matching
In the graph domain, the characteristics of different types of graphs are not uniform. We now present the definitions of graph isomorphism in this section.

3.1.1 Undirected Graph Isomorphism

Definition 1. Two graphs G1 and G2 are isomorphic if there is a bijective map f from the vertices of G1 to the vertices of G2 that preserves the "edge structure", in the sense that there is an edge from vertex u to vertex v in G1 if and only if there is an edge from f(u) to f(v) in G2 [54].

Definition 2. An isomorphism of graphs G1 and G2 is a bijection (f : a bijective function) between the vertex sets of G1 and G2 ,

f : V (G1 ) → V (G2 )

where V (G1 ) is the vertex set of G1 and V (G2 ) is the vertex set of G2 [54].

Definitions 1 and 2 are equivalent. Both definitions indicate that any two vertices u and v of G1 are adjacent in G1 if and only if f(u) and f(v) are adjacent in G2 . This kind of bijection is commonly described as an "edge-preserving bijection", in accordance with the general notion of an isomorphism being a structure-preserving bijection [54]. If an isomorphism exists between two graphs, the graphs are called isomorphic and denoted G1 ≃ G2 .

3.1.2 Directed Graph Isomorphism

Definition 3. Two directed graphs G1 and G2 are isomorphic if there is a bijective map f from the vertices of G1 to the vertices of G2 that preserves the "directed edge structure", in the sense that there is a directed edge from vertex u to vertex v in G1 if and only if there is a directed edge from f(u) to f(v) in G2 [55], [54]. An isomorphism of directed graphs G1 and G2 is a bijection between the vertex sets of G1 and G2 , f : V (G1 ) → V (G2 ), where V (G1 ) is the vertex set of G1 and V (G2 ) is the vertex set of G2 [55], [54].

The definition indicates that any two vertices u and v of G1 are adjacent in G1 if and only if f (u) and f (v) are adjacent in G2 . This kind of bijection is commonly described as an "edge-preserving bijection", in accordance with the general notion of an isomorphism being a structure-preserving bijection [55], [54]. If an isomorphism exists between two directed graphs, the directed graphs are called isomorphic and denoted G1 ≈ G2 [55], [54].

3.1.3 Undirected Multigraph Isomorphism

Multigraph isomorphism has opened a wide area of extensive research due to its well-known NP-complete (nondeterministic polynomial-complete) nature [54]. In exact graph matching, two graphs are isomorphic if there exists a bijective mapping between their vertices and the edges on them; thus, each pair of isomorphic graphs shares a common structure. A multigraph may also contain directed and undirected edges. Multigraphs are more generic than simple graphs: simple graphs usually are not rich with multi-edge information, while a multigraph permits multiple edges/relations between a pair of vertices, and many real-world datasets can be modelled as networks whose nodes are interconnected by multiple relations. So the crucial difference is capturing the multi-edge information.

Multigraph isomorphism based on edge structure: two multigraphs G1 and G2 are isomorphic if there exists a bijective mapping f that corresponds the vertices of G1 to the vertices of G2 and keeps the "edge structure", in the sense that there is an edge between vertex u and vertex v in G1 if and only if there is an edge between f (u) and f (v) in G2 [55].

3.2 Graph Representation Method

In this section, we present the graph representation mechanisms, in terms of the triple tuple, the array of row sums, and the undirected and directed binary vertex and edge adjacency matrix representation methods.

Figure 3-1: Multigraph example

3.2.1 Triple Tuple Method

Consider a simple graph where the number of nodes is n and the number of edges is m. We employ a triple tuple e = (j, vs , vt ) for each edge, where j = 1, 2, . . . , m. For an undirected graph, vs and vt are the two endpoint nodes of edge ej . For a directed graph, vs is the starting node and vt the ending node of edge ej . The general format of the triple tuple for a graph is shown in Tables 3.1 and 3.2.

Table 3.1: General triple tuple format for undirected Graph

Edge ID Node1 ID Node2 ID


l1 l1−1 l1−2
l2 l2−1 l2−2
... ... ...
lm−1 l(m−1)−1 l(m−1)−2
lm lm−1 lm−2

Table 3.2: General triple tuple format for directed Graph

Edge ID Starting Node ID Ending Node ID


l1 l1−s l1−e
l2 l2−s l2−e
... ... ...
lm−1 l(m−1)−s l(m−1)−e
lm lm−s lm−e

The matrix generation methods for undirected and directed graphs are different. Sections 3.2.2 and 3.2.4 elaborate on the process.
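As a concrete illustration (a sketch only; the four-edge list is an arbitrary example in the general format of Table 3.1), the triple tuple can be held as a plain list of (edge ID, node1, node2) entries:

```python
# Triple-tuple representation: one (edge_id, v_s, v_t) entry per edge.
# For an undirected graph the order of v_s and v_t carries no meaning;
# for a directed graph, v_s is the starting node and v_t the ending node.
triples = [
    (1, 1, 2),
    (2, 2, 3),
    (3, 3, 1),
    (4, 1, 4),
]

m = len(triples)                               # number of edges
n = max(max(vs, vt) for _, vs, vt in triples)  # number of vertices
assert (n, m) == (4, 4)
```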

3.2.2 Undirected Vertex and Edge Adjacency Matrix Rep-
resentation

Figure 3-2: Exp1

A graph is endowed with distinct positive integers as subscripts of its vertices v1 , v2 , . . . , vn , where n is the number of vertices of the graph [54]; we call v1 , v2 , . . . , vn the vertex labels. Likewise, the edges are endowed with unique labels, commonly taken from the positive integers, as subscripts e1 , e2 , . . . , em , where m is the number of edges of the graph; these natural numbers are used only to uniquely identify the edges, and we call e1 , e2 , . . . , em the edge labels. Both the vertex and the edge labelling methods can represent an undirected graph uniquely.

Triple Tuple for graph Exp1

One triple tuple is produced to represent the finite graph Exp1 of Figure 3-2, as shown in Table 3.3.

Table 3.3: Triple tuple for graph Exp1 .

Edge ID Node1 ID Node2 ID


1 1 2
2 2 3
3 3 1
4 1 4

Vertex Adjacency Matrix Representation Method

The vertex adjacency matrix is a Boolean square matrix representing a finite graph. The elements (valued 0 and 1) in the matrix denote whether pairs of vertices are connected in graph Exp1 (Figure 3-2). For example, in graph Exp1 , v2 is adjacent with v1 and v3 ; hence, in the vertex adjacency matrix, (v1 , v2 ) = 1 and (v2 , v3 ) = 1. The corresponding vertex adjacency matrix for graph Exp1 is shown in Table 3.4.

Table 3.4: Vertex Adjacency Matrix of Exp1

v1 v2 v3 v4
v1 0 1 1 1
v2 1 0 1 0
v3 1 1 0 0
v4 1 0 0 0
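The construction of Table 3.4 from the triple tuple of Table 3.3 can be sketched as follows (an illustrative Python sketch, not the thesis implementation):

```python
# Build the n x n vertex adjacency matrix from a triple tuple
# (edge_id, v_s, v_t); entries are 1 where the two vertices share an edge.
triples = [(1, 1, 2), (2, 2, 3), (3, 3, 1), (4, 1, 4)]  # Table 3.3
n = 4

V = [[0] * n for _ in range(n)]
for _, vs, vt in triples:
    V[vs - 1][vt - 1] = 1   # vertex IDs are 1-based
    V[vt - 1][vs - 1] = 1   # undirected: the matrix is symmetric

# Matches Table 3.4.
assert V == [[0, 1, 1, 1],
             [1, 0, 1, 0],
             [1, 1, 0, 0],
             [1, 0, 0, 0]]
```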

Edge Adjacency Matrix Representation Method

We create the edge adjacency matrix, a Boolean square matrix which also represents a finite graph. The elements in the matrix denote whether pairs of edges are connected with each other in graph Exp1 (Figure 3-2). For example, in graph Exp1 , e2 is adjacent with e1 and e3 ; hence, in the edge adjacency matrix, (e1 , e2 ) = 1 and (e2 , e3 ) = 1. The corresponding edge adjacency matrix for graph Exp1 is shown in Table 3.5.

Table 3.5: Edge Adjacency Matrix of Exp1

e1 e2 e3 e4
e1 0 1 1 1
e2 1 0 1 0
e3 1 1 0 1
e4 1 0 1 0
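Since two edges are adjacent exactly when they share an endpoint, Table 3.5 can likewise be derived mechanically from the triple tuple (an illustrative sketch):

```python
# Build the m x m edge adjacency matrix: entry (i, j) is 1 when edges
# e_i and e_j (i != j) share an endpoint.
triples = [(1, 1, 2), (2, 2, 3), (3, 3, 1), (4, 1, 4)]  # Table 3.3
m = len(triples)

E = [[0] * m for _ in range(m)]
for i in range(m):
    for j in range(m):
        if i != j and set(triples[i][1:]) & set(triples[j][1:]):
            E[i][j] = 1

# Matches Table 3.5.
assert E == [[0, 1, 1, 1],
             [1, 0, 1, 0],
             [1, 1, 0, 1],
             [1, 0, 1, 0]]
```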

3.2.3 Undirected Multigraph Vertex and Edge Adjacency
Matrix Representation

Graphs with parallel edges and/or loops are called multigraphs. A multigraph, in contrast to a simple graph, may have multiple edges and several loops. For an undirected graph, if there is more than one undirected edge associated with a pair of vertices, these edges are called parallel edges, such as edges 7 and 8 of graph Exp2 in Figure 3-3. If an edge has the same node as its starting and ending node, that is, an edge that connects a vertex to itself, the edge is called a loop, such as edge 3 of graph Exp2 in Figure 3-3. (In this situation, the corresponding diagonal entry of the vertex adjacency matrix of the graph is a positive nonzero integer.) The proposed algorithm also works for partially non-simple graphs (graphs in which only one vertex has a loop).

Figure 3-3: Exp2

Triple Tuple for Exp2

One triple tuple is produced to represent the finite graph Exp2 of Figure 3-3, as shown in Table 3.6.

Vertex Adjacency Matrix Representation Method

The vertex adjacency matrix is an n × n square matrix that represents a finite multigraph, where n is the number of vertices of the graph; the natural-number subscripts are used only to identify the vertices uniquely, and we call v1 , v2 , . . . , vn the vertex labels. Elements

Table 3.6: Triple tuple for Exp2

Edge ID Node1 ID Node2 ID


1 1 2
2 2 3
3 3 3
4 4 3
5 2 4
6 1 4
7 1 5
8 1 5
9 2 5

(valued 0, 1, 2, . . . , each a non-negative integer) in the matrix denote the number of edges connecting the two given vertices in the graph. Parallel edges, such as those in graph Exp2 of Figure 3-3, are therefore counted with their multiplicity, and a loop appears as a nonzero diagonal entry. For example, in graph Exp2 , v3 is adjacent with v2 , v3 and v4 ; thus, in the vertex adjacency matrix, (v2 , v3 ) = 1, (v3 , v3 ) = 1 and (v3 , v4 ) = 1. Likewise, v5 is adjacent with v1 and v2 , so (v1 , v5 ) = 2 (parallel edges 7 and 8) and (v2 , v5 ) = 1. The vertex adjacency matrix for graph Exp2 is shown in Table 3.7.

Table 3.7: Vertex Adjacency Matrix of Exp2

v1 v2 v3 v4 v5
v1 0 1 0 1 2
v2 1 0 1 1 1
v3 0 1 1 1 0
v4 1 1 1 0 0
v5 2 1 0 0 0
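Table 3.7 can be reproduced from the triple tuple of Table 3.6 by counting edge multiplicities (a sketch; consistent with the table, the loop (3, 3) contributes 1 to the diagonal, while the parallel edges 7 and 8 raise the (v1, v5) entry to 2):

```python
# Multigraph vertex adjacency matrix: entries count the number of edges
# between each vertex pair, so parallel edges raise an entry above 1 and
# a loop produces a nonzero diagonal entry.
triples = [(1, 1, 2), (2, 2, 3), (3, 3, 3), (4, 4, 3), (5, 2, 4),
           (6, 1, 4), (7, 1, 5), (8, 1, 5), (9, 2, 5)]  # Table 3.6
n = 5

V = [[0] * n for _ in range(n)]
for _, vs, vt in triples:
    if vs == vt:                 # loop: one diagonal entry
        V[vs - 1][vs - 1] += 1
    else:
        V[vs - 1][vt - 1] += 1
        V[vt - 1][vs - 1] += 1

# Matches Table 3.7.
assert V == [[0, 1, 0, 1, 2],
             [1, 0, 1, 1, 1],
             [0, 1, 1, 1, 0],
             [1, 1, 1, 0, 0],
             [2, 1, 0, 0, 0]]
```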

Edge Adjacency Matrix Representation Method

The edge adjacency matrix is an m × m Boolean square matrix that represents a finite multigraph, where m is the number of edges of the graph; the natural-number subscripts e1 , e2 , . . . , em are used only to identify the edges uniquely. The elements (valued 0 or 1) denote whether pairs of edges are connected in the multigraph. For example, in graph Exp2 , e4 is adjacent with e2 , e3 , e5 and e6 ; thus, in the edge adjacency matrix, (e2 , e4 ) = 1, (e3 , e4 ) = 1, (e4 , e5 ) = 1 and (e4 , e6 ) = 1. The edge adjacency matrix for graph Exp2 is shown in Table 3.8.

Table 3.8: Edge adjacency matrix of Exp2

     e1 e2 e3 e4 e5 e6 e7 e8 e9
e1   0  1  0  0  1  1  1  1  1
e2   1  0  1  1  1  0  0  0  1
e3   0  1  1  1  0  0  0  0  0
e4   0  1  1  0  1  1  0  0  0
e5   1  1  0  1  0  1  0  0  1
e6   1  0  0  1  1  0  1  1  0
e7   1  0  0  0  0  1  0  1  1
e8   1  0  0  0  0  1  1  0  1
e9   1  1  0  0  1  0  1  1  0

3.2.4 Directed Graph Vertex and Edge Adjacency Matrix Representation

Each vertex of a directed graph is endowed with a unique label, commonly taken from the positive integers as subscript, together with a superscript '+' or '-': v1+, v1−, v2+, v2−, . . . , vn+, vn−, where n is the number of vertices of the graph. The subscripts uniquely identify the vertices, and '+' and '-' indicate the in-link or out-link side of each vertex; we call v1+, v1−, v2+, v2−, . . . , vn+, vn− the vertex labels. Similarly, each edge is endowed with a unique label e1+, e1−, e2+, e2−, . . . , em+, em−, where m is the number of edges of the graph; the subscripts uniquely identify the edges, and '+' and '-' indicate the in-link or out-link side of each edge. We call e1+, e1−, e2+, e2−, . . . , em+, em− the edge labels. Both the vertex and the edge labelling methods can represent a directed graph uniquely.

Triple Tuple for Exp3

One triple tuple is produced to represent the finite graph Exp3 of Figure 3-4, as shown in Table 3.9.

Figure 3-4: Exp3

Table 3.9: Triple tuple for graph Exp3

Edge ID Starting Node ID Ending Node ID


1 1 2
2 2 3
3 3 1
4 1 4

Binary Vertex Adjacency Matrix Representation Method

The binary vertex adjacency matrix is a square matrix used to represent a finite directed/weighted graph. The matrix elements indicate whether pairs of vertices are adjacent in the graph. Each vertex is represented by two vertices with superscripts + and - in the binary vertex adjacency matrix: we create binary vi+ (in-link) and vi− (out-link) to represent one vertex vi , where i = 1, 2, . . . , n. The superscript + indicates a link going into the vertex (an in-link), and the superscript - indicates a link going out of the vertex (an out-link). In this case, the matrix is a 2n × 2n matrix. For example, in the directed graph Exp3 shown in Figure 3-4, v1− (out-link of vertex 1) is adjacent with v2+ (in-link of vertex 2), and v3+ (in-link of vertex 3) is adjacent with v2− (out-link of vertex 2). Then, in the binary vertex adjacency matrix, av1−,v2+ = 1 and av3+,v2− = 1. The corresponding binary vertex adjacency matrix for the directed graph Exp3 is shown in Table 3.10.

Table 3.10: Binary Vertex Adjacency Matrix of graph Exp3

      v1+ v1− v2+ v2− v3+ v3− v4+ v4−
v1+    0   0   0   0   0   1   0   0
v1−    0   0   1   0   0   0   1   0
v2+    0   1   0   0   0   0   0   0
v2−    0   0   0   0   1   0   0   0
v3+    0   0   0   1   0   0   0   0
v3−    1   0   0   0   0   0   0   0
v4+    0   1   0   0   0   0   0   0
v4−    0   0   0   0   0   0   0   0

Binary Edge Adjacency Matrix Representation Method

We apply binary ej+ (in-link) and ej− (out-link) to represent one edge ej , where j = 1, 2, . . . , m, and m is the number of edges. The maximum number of edges for a simple directed graph is n × (n − 1). The superscript + indicates the end at which the edge goes into a vertex (an in-link), and the superscript - indicates the end at which the edge goes out of a vertex (an out-link). If two edges are connected by a vertex, the two edges are adjacent. For example, in graph Exp3 shown in Figure 3-4, e2+ is adjacent with e1−, and e4+ is adjacent with e3−. Then, in the binary edge adjacency matrix, ae1−,e2+ = 1 and ae3−,e4+ = 1. The matrix is a 2m × 2m matrix. The corresponding binary edge adjacency matrix for the directed graph Exp3 is shown in Table 3.11.

Table 3.11: Edge Adjacency Matrix of graph Exp3

      e1+ e1− e2+ e2− e3+ e3− e4+ e4−
e1+    0   0   0   0   0   1   0   0
e1−    0   0   1   0   0   0   0   0
e2+    0   1   0   0   0   0   0   0
e2−    0   0   0   0   1   0   0   0
e3+    0   0   0   1   0   0   0   0
e3−    1   0   0   0   0   0   1   0
e4+    0   0   0   0   0   1   0   0
e4−    0   0   0   0   0   0   0   0

The adjacency matrix is not an invariant representation of an unlabelled graph: the adjacency matrix representations of two isomorphic graphs can differ, because the vertices can be relabelled while the graphs remain isomorphic. Isomorphic graphs have adjacency matrices with identical algorithmic information content, as proven in [56]. The eigenvalues and eigenvectors of the adjacency matrix reveal properties of the matrix structure: the eigenvalues measure the distortion induced by the corresponding linear transformation, and the eigenvectors give the orientation of that distortion, among much other information.

3.3 Permutation Theorem

The basic concept of permutation is rearranging the components (vertices and edges) of a structured object (a graph). Given two graphs G1 and G2 , we calculate the row sums of the vertex adjacency matrices V1 and V2 and of the edge adjacency matrices E1 and E2 , producing two arrays of n values, V_v and V_{v'} , and two arrays of m values, E_e and E_{e'} . We then compute and check whether

\sum_{i=1}^{n} V_v^i = \sum_{i=1}^{n} V_{v'}^i, \quad \sum_{i=1}^{n} (V_v^i)^2 = \sum_{i=1}^{n} (V_{v'}^i)^2, \quad \ldots, \quad \sum_{i=1}^{n} (V_v^i)^n = \sum_{i=1}^{n} (V_{v'}^i)^n

for the rows of the vertex adjacency matrices, and whether

\sum_{j=1}^{m} E_e^j = \sum_{j=1}^{m} E_{e'}^j, \quad \sum_{j=1}^{m} (E_e^j)^2 = \sum_{j=1}^{m} (E_{e'}^j)^2, \quad \ldots, \quad \sum_{j=1}^{m} (E_e^j)^m = \sum_{j=1}^{m} (E_{e'}^j)^m

for the rows of the edge adjacency matrices; here n and m are the dimensions of the corresponding vertex and edge adjacency matrices. Namely, the permutation theorem checks whether the two row-sum arrays from the vertex adjacency matrices and the two row-sum arrays from the edge adjacency matrices are respectively bijective: if and only if one array is a permutation of the other are the corresponding two graphs isomorphic. (For non-negative integer arrays, equal power sums for all powers 1, . . . , n imply, by Newton's identities, that the two arrays are permutations of each other.)
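The power-sum conditions can be checked directly; the following Python sketch (illustrative only; B is an arbitrary relabelling of A used as test data, not from the thesis experiments) also verifies that, for non-negative integer arrays, they agree with the cheaper check of comparing sorted row-sum arrays:

```python
# Permutation-theorem check on two row-sum arrays: for non-negative
# integer arrays, the power sums sum(a_i^k) for k = 1..n agree iff one
# array is a permutation of the other, i.e. iff the sorted arrays match.
def row_sums(matrix):
    return [sum(row) for row in matrix]

def power_sums_equal(a, b):
    n = len(a)
    return len(b) == n and all(
        sum(x ** k for x in a) == sum(x ** k for x in b)
        for k in range(1, n + 1))

A = [[0, 1, 1, 1], [1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]]
B = [[0, 1, 0, 1], [1, 0, 1, 1], [0, 1, 0, 0], [1, 1, 0, 0]]  # relabelling of A

ra, rb = row_sums(A), row_sums(B)
assert power_sums_equal(ra, rb)
assert sorted(ra) == sorted(rb)              # the equivalent, cheaper check
assert not power_sums_equal([1, 2], [1, 3])  # detects differing arrays
```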
The basic algorithmic theory of the permutation theorem is shown in Figures 3-5 and 3-6. The computational procedure is shown in Figure 3-7, and the pseudocode is given in Algorithm 1.
The process of calculation is shown in Figure 3-7. If matrices A and B do not satisfy the permutation theorem, there is no need to proceed to step 2. If matrices A and B satisfy the permutation theorem while matrices C and D do not, then candidate 1 and candidate 2 do not satisfy the permutation theorem. If both matrices A and B and matrices C and D satisfy the permutation theorem, then candidate 1 and candidate 2 satisfy the permutation theorem. Note that n represents the number of vertices of the data graph and m the number of edges. For an undirected graph,
Figure 3-5: permutation theorem for vertex adjacency matrix.

Figure 3-6: permutation theorem for edge adjacency matrix.

Algorithm 1 Permutation Theorem Verification

Input: Vertex adjacency matrices A and B, edge adjacency matrices C and D of two candidates.
Output: Whether the two candidates are permutable or non-isomorphic
1: for i = 1 to n, j = 1 to m do
2: if rowsum(A)i = rowsum(B)i & rowsum(C)j = rowsum(D)j then
3: go to Algorithm 2
4: else
5: return False
6: end if
7: end for

the row sum array of the vertex adjacency matrix contains n values and the row sum array of the edge adjacency matrix contains m values, while for a directed graph, the row sum array of the binary vertex adjacency matrix contains 2n values and the row sum array of the binary edge adjacency matrix contains 2m values. The computational scale is based on the size of the corresponding matrix.

3.3.1 Calculating Arrays of Row Sums Based on the Vertex and Edge Adjacency Matrices

The number of links per node is a vital characteristic of a graph and the foundation of the permutation theorem.
For an undirected graph, the array of row sums of the vertex adjacency matrix records each vertex's degree: the sum of row i equals the degree of vertex i. The same holds for the edge adjacency matrix.
For a directed graph, the array of row sums of the binary vertex adjacency matrix records each vertex's in-link degree (sum of the odd rows) and out-link degree (sum of the even rows). The same holds for the binary edge adjacency matrix.
Tables 3.12 and 3.13 give the general formats for the row sums of an undirected graph.

Table 3.12: General format to calculate row sum of vertex adjacency matrix based
on triple tuple

Node ID Row sum for node #ID


p1 The degree of node #p1
p2 The degree of node #p2
... ...
pn−1 The degree of node #pn−1
pn The degree of node #pn

Tables 3.14 and 3.15 give the general formats for the row sums of a directed graph.

[Figure: flowchart. Step 1: for the n ∗ n vertex adjacency matrices A and B of candidate 1 and candidate 2, compare the sums of the k-th powers of the row-sum entries for k = 1, . . . , n. Step 2: for the m ∗ m edge adjacency matrices C and D, compare the sums of the k-th powers of the row-sum entries for k = 1, . . . , m. If any comparison fails, candidate 1 and candidate 2 do not satisfy the permutation theorem; if all comparisons hold, they satisfy it.]

Figure 3-7: Permutation theorem for vertex and edge adjacency matrix.

Table 3.13: General format to calculate row sum of edge adjacency matrix based on
triple tuple

Edge ID Row sum for edge #ID


l1 the number of edges adjacent to edge #l1
l2 the number of edges adjacent to edge #l2
... ...
lm−1 the number of edges adjacent to edge #lm−1
lm the number of edges adjacent to edge #lm

Table 3.14: General format to calculate row sum of binary vertex adjacency matrix
based on triple tuple

Node ID Row sum for node #ID


p1+ the indegree of node #p1
p1− the outdegree of node #p1
p2+ the indegree of node #p2
p2− the outdegree of node #p2
... ...
p(n−1)+ the indegree of node #p(n−1)
p(n−1)− the outdegree of node #p(n−1)
pn+ the indegree of node #pn
pn− the outdegree of node #pn

Table 3.15: General format to calculate row sum of binary edge adjacency matrix
based on triple tuple

Edge ID Row sum for edge #ID


l1+ the indegree of edge #l1
l1− the outdegree of edge #l1
l2+ the indegree of edge #l2
l2− the outdegree of edge #l2
... ...
l(m−1)+ the indegree of edge #l(m−1)
l(m−1)− the outdegree of edge #l(m−1)
lm+ the indegree of edge #lm
lm− the outdegree of edge #lm
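The degree bookkeeping of Tables 3.12 to 3.15 can be sketched as a small helper (illustrative only, assuming the directed graph is given as a list of (source, target) edges):

```python
from collections import Counter

def degree_arrays(edges, n):
    """In-degree and out-degree per node, from a directed edge list.
    Nodes are labelled 0 .. n-1; each edge is a (source, target) pair."""
    indeg = Counter(t for _, t in edges)
    outdeg = Counter(s for s, _ in edges)
    return ([indeg[v] for v in range(n)],
            [outdeg[v] for v in range(n)])

# A small directed triangle: 0 -> 1 -> 2 -> 0.
ins, outs = degree_arrays([(0, 1), (1, 2), (2, 0)], 3)
print(ins, outs)  # [1, 1, 1] [1, 1, 1]
```

The two returned arrays correspond to the interleaved in/out rows of the binary vertex adjacency matrix described above; the edge version of Table 3.15 follows the same pattern with edges in place of nodes.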

3.3.2 Mathematical Proof of the Permutation Theorem

In this section, we discuss several important characteristics of the mathematical proof of the Permutation Theorem.

Duality of Vertex and Edge Adjacency

From the binary vertex and edge adjacency matrices, we can observe that either the binary vertex adjacency representation or the edge adjacency representation can uniquely represent a graph if we treat each edge as a vertex, as shown in Figure 3-3, and create the new edge representation. As Figure 3-8 shows, the edge adjacency labelling method is complicated, so we usually use the binary edge adjacency matrix instead. The following theorems can only prove the "if and only if" conditions when there is at most one exchange among the arrays of the vertex adjacency matrix. If there are duplicate arrays, it is impossible at this stage to identify whether there is only one exchange among the permutation. Therefore, we use the duality of vertex and edge adjacency to guarantee the proposed algorithm.

Figure 3-8: Duality of vertex and edge for Exp2 as treating edge as vertex

Dual Equivalence of Permutation and Bijection

Graph isomorphism requires two bijections, one for the vertex set and one for the edge set. A bijection from a set to itself is equivalent to a permutation of that set. We produce two matrices, a binary vertex adjacency matrix (2n ∗ 2n) and a binary edge adjacency matrix (2m ∗ 2m), where n is the number of vertices and m the number of edges. The row sums of both matrices are computed, giving four arrays for the two graphs based on the vertex/edge adjacency matrices. By comparing these summed arrays under the permutation theorem, we can conclude whether the two graphs are permutable [55] and [54]. Regarding the comparison of two arrays, a permutation refers to the act of arranging all the members of a set into some sequence or order, and a permutation of a set is defined as a bijection from the set to itself. Hence, if we can claim that one array A is a permutation of another array B, then A and B are bijective [55] and [54].
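The claim that "A is a permutation of B" is equivalent to the two arrays agreeing as multisets, which can be illustrated directly in Python (a hedged sketch; the thesis's own test instead compares the power sums of Theorem 2):

```python
from collections import Counter

def is_permutation(a, b):
    """True iff array b is a rearrangement (a bijective relabelling) of a."""
    return Counter(a) == Counter(b)

print(is_permutation([3, 1, 2, 2], [2, 3, 2, 1]))  # True
print(is_permutation([3, 1, 2, 2], [2, 3, 1, 1]))  # False
```

Counter equality compares element multiplicities, which is exactly the bijection-to-itself condition described above.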

Proposed polynomial algorithm for undirected and directed graph isomorphism

Theorem 1. Let G1 and G2 be two undirected or directed graphs. The vertex adjacency matrices of the two graphs are two symmetric matrices A and B. We calculate the row sums and produce two arrays, rowsum(A) and rowsum(B), based on A and B. Since this alone cannot tell whether there is only one exchange, we also build the edge adjacency matrices and go through the same procedure; the core idea is the duality of edge and vertex in the graph. The edge adjacency matrices of the two graphs are two symmetric matrices C and D, from which we calculate the row sums and produce two arrays, rowsum(C) and rowsum(D). If and only if rowsum(A) is a permutation of rowsum(B), and rowsum(C) is a permutation of rowsum(D), G1 and G2 are permutable. This theorem is applicable to both undirected and directed graphs; for a directed graph, the values of n and m should be multiplied by 2.

Proof. For "if" (the sufficient condition): because of the equivalence between permutation and bijection, if the two vertex sets and the two edge sets are bijective, the two graphs are bijective. The duality of edge and vertex guarantees the corresponding relationship between edges and vertices. The "only if" (necessary condition) is straightforward, because two isomorphic directed graphs always yield permutations of the row-sum arrays of both the vertex and edge adjacency matrices. For a simple graph, the diagonal is all zeros and the vertex adjacency matrix is symmetric; the proof for the edge adjacency matrix is similar. For a weighted graph, simply replace 0 and 1 with weights; each weight must be a positive integer, and 0 means there is no link.
A permutation of a set ∆ is defined as a bijection from ∆ to itself [54]: a function from ∆ to ∆ in which every element occurs exactly once as an image value. This corresponds to a rearrangement of the elements of ∆ in which each element δ is replaced by the corresponding f(δ).
The theorem indicates that any two vertices u and v of G1 are adjacent in G1 if and only if f(u) and f(v) are adjacent in G2. This kind of bijection is commonly described as an "edge-preserving bijection", in accordance with the general notion of isomorphism as a structure-preserving bijection [55] and [54]. If an isomorphism exists between two graphs, the graphs are called isomorphic, denoted G1 ≈ G2. A mixed graph contains both directed and undirected edges; in this thesis, each undirected edge is treated as two directed edges.
Therefore, the proposed theorem could be stated as follows:

Assertion 1. Two graphs are isomorphic if and only if the arrays of row sums of both the vertex and edge adjacency matrices are bijective.

Theorem 2. (Permutation theorem) Given two arrays of natural numbers Γ = (γ1 , γ2 , . . . , γk ) and Γ′ = (γ1′ , γ2′ , . . . , γk′ ), Γ is a permutation of Γ′ (and vice versa) if and only if

γ1 + γ2 + · · · + γk = γ1′ + γ2′ + · · · + γk′ ,
γ1² + γ2² + · · · + γk² = γ1′² + γ2′² + · · · + γk′² ,
. . .
γ1ᵏ + γ2ᵏ + · · · + γkᵏ = γ1′ᵏ + γ2′ᵏ + · · · + γk′ᵏ ,

where k ≥ 1.


Assertion 2. Given two arrays of natural numbers Γ = (γ1 , γ2 , . . . , γk ) and Γ′ = (γ1′ , γ2′ , . . . , γk′ ), Γ and Γ′ are bijective and equivalent if and only if γ1 + γ2 + · · · + γk = γ1′ + γ2′ + · · · + γk′ , γ1² + γ2² + · · · + γk² = γ1′² + γ2′² + · · · + γk′² , . . . , and γ1ᵏ + γ2ᵏ + · · · + γkᵏ = γ1′ᵏ + γ2′ᵏ + · · · + γk′ᵏ . The sequences of the two arrays are then bijective and equivalent, where k, γk and γk′ are all integers greater than or equal to 1.

Theorem 3. Based on Theorems 1 and 2, let G1 and G2 be two graphs with the same number of nodes n and the same number of edges m. The vertex and edge adjacency matrices of the two graphs generate four matrices A, B, C and D, whose row sums produce four arrays, rowsum(A), rowsum(B), rowsum(C) and rowsum(D). If

1. ΣVᵥⁱ = ΣVᵥ′ⁱ , Σ(Vᵥⁱ)² = Σ(Vᵥ′ⁱ)² , . . . , Σ(Vᵥⁱ)ⁿ = Σ(Vᵥ′ⁱ)ⁿ (all sums over i = 1, . . . , n), then matrix A is a permutation of matrix B;

2. ΣEₑʲ = ΣEₑ′ʲ , Σ(Eₑʲ)² = Σ(Eₑ′ʲ)² , . . . , Σ(Eₑʲ)ᵐ = Σ(Eₑ′ʲ)ᵐ (all sums over j = 1, . . . , m), then matrix C is a permutation of matrix D;

then G1 and G2 are isomorphic.

Example 1. Given two arrays of natural numbers Γ = {γ1 , γ2 } and Γ′ = {γ1′ , γ2′ }, suppose γ1 + γ2 = γ1′ + γ2′ and γ1² + γ2² = γ1′² + γ2′². Then (γ1 + γ2)² = (γ1′ + γ2′)², so γ1² + γ2² + 2γ1γ2 = γ1′² + γ2′² + 2γ1′γ2′, and hence γ1γ2 = γ1′γ2′.
If any of γ1 , γ2 , γ1′ , γ2′ equals 1, the k = 2 case holds. The proof is as follows: suppose γ1 = 1. Then 1 + γ2 = γ1′ + γ2′ and 1 + γ2² = γ1′² + γ2′², so γ2 = γ1′γ2′. Substituting, 1 + γ1′γ2′ = γ1′ + γ2′, so 1 + γ1′(γ2′ − 1) = γ2′ and (γ1′ − 1)(γ2′ − 1) = 0. Therefore either γ1′ = 1 or γ2′ = 1. When γ1′ = 1, we have 1 + γ2 = 1 + γ2′ and 1 + γ2² = 1 + γ2′², so γ2 = γ2′. When γ2′ = 1, we similarly have γ2 = γ1′. Therefore the k = 2 case holds.

If every one of γ1 , γ2 , γ1′ , γ2′ is a positive integer larger than 1, then, according to the fundamental theorem of arithmetic, each of γ1 , γ2 , γ1′ , γ2′ either is a prime number itself or can be represented as a product of prime numbers; moreover, this representation is unique up to the order of the factors. Let K = γ1γ2 = γ1′γ2′. Then K = p1 p2 p3 . . . pl pl+1 . . . pr = q1 q2 . . . ql′ ql′+1 . . . qs , where γ1 = p1 p2 . . . pl , γ2 = pl+1 pl+2 . . . pr , γ1′ = q1 q2 . . . ql′ , γ2′ = ql′+1 ql′+2 . . . qs , and all pi and qi are prime. Assume that m (m ≠ 0) and n are integers. We say that m divides n if n is a multiple of m, namely if there exists an integer o such that n = m ∗ o; if m divides n, we write m|n. The order of the factors does not affect the result. We have p1|K, so p1|q1 q2 . . . ql′ ql′+1 . . . qs . Since p1 divides at least one qi , after rearranging the qi we have p1|q1 ; because q1 is prime, its factors are 1 and q1 , so p1 = q1 . Remove it from both sides of the equation: p2 p3 . . . pl pl+1 . . . pr = q2 q3 . . . ql′ ql′+1 . . . qs .

Repeat the previous step: p2 divides at least one qi , so after rearranging the qi we have p2|q2 , and since q2 is prime, p2 = q2 . Remove it from both sides: p3 p4 . . . pl pl+1 . . . pr = q3 q4 . . . ql′ ql′+1 . . . qs .

Continue this process until all pi and qi are removed. If all pi are removed, the left side of the equality is 1, so no qi remain; similarly, if all qi are removed, the right side of the equality is 1. Hence the number of pi equals the number of qi . We have thus proved that K = p1 p2 . . . pl pl+1 . . . pr = q1 q2 . . . ql′ ql′+1 . . . qs with all pi and qi prime, r = s, l = l′, and, after rearranging the qi , p1 = q1 , p2 = q2 , . . . , pl = ql′ , . . . , pr = qs . Thus γ1 = γ1′ and γ2 = γ2′; and because γ1 and γ2 are commutative, and γ1′ and γ2′ are commutative, there could instead be γ1 = γ2′ and γ2 = γ1′. Then the set of γ1 and γ2 is a permutation of the set of γ1′ and γ2′.

Next, we prove the uniqueness of this condition. There exists a quaternary quadratic system of equations (3.1):

γ1 + γ2 = γ1′ + γ2′ ,   γ1² + γ2² = γ1′² + γ2′²        (3.1)

This is a system of two equations involving the four variables γ1 , γ2 , γ1′ , γ2′ , where all variables are natural numbers. A solution to this system of integer equations is an assignment of values to the variables such that all the equations are simultaneously satisfied. Two solutions to the system above are given:

solution set A:   γ1 = γ1′ ,  γ2 = γ2′        (3.2)

solution set B:   γ1 = γ2′ ,  γ2 = γ1′        (3.3)

Γ ∪ Γ′, as either solution A or solution B, is a solution of Formula 3.1, since it makes both equations valid. The word "system" indicates that the equations are to be considered collectively rather than individually. Because γ1² + γ2² = γ1′² + γ2′², we have γ1² − γ1′² = γ2′² − γ2², and hence (γ1 − γ1′)(γ1 + γ1′) = (γ2′ − γ2)(γ2′ + γ2). Because (γ1 + k) + (γ2 − k) = (γ1′ + k) + (γ2′ − k), we can construct ζ1 = γ1 + k, ζ2 = γ2 − k, ζ1′ = γ1′ + k, ζ2′ = γ2′ − k, where ζ1 ≠ γ1 , ζ2 ≠ γ2 , ζ1′ ≠ γ1′ , ζ2′ ≠ γ2′ , and ζ1 + ζ2 = ζ1′ + ζ2′ holds. Now ζ1² + ζ2² = (γ1 + k)² + (γ2 − k)² = γ1² + γ2² + 2k² + 2γ1k − 2γ2k, and ζ1′² + ζ2′² = (γ1′ + k)² + (γ2′ − k)² = γ1′² + γ2′² + 2k² + 2γ1′k − 2γ2′k. To make ζ1² + ζ2² = ζ1′² + ζ2′², we must have γ1² + γ2² + 2k² + 2γ1k − 2γ2k = γ1′² + γ2′² + 2k² + 2γ1′k − 2γ2′k, and therefore γ1 − γ2 = γ1′ − γ2′. Combined with γ1 + γ2 = γ1′ + γ2′, this gives γ1 = γ1′ and γ2 = γ2′, and then γ2 − k = γ2′ − k, i.e. ζ2 = ζ2′. That ζ2 must equal ζ2′ results in a contradiction with the construction above, so no solution outside A and B exists; the assumption that Formulae 3.2 and 3.3 are not the only solutions must be false. Thus the k = 2 case is proved.

Therefore, Theorem 2 for k = 2 has been proved. Our mathematical proof for k ≥ 3 is carried out by the following induction-based method.

Questions for k = 3, k = 4, . . . , and k = K for the extended permutation theorem.

When k = 3: if and only if γ1 + γ2 + γ3 = γ1′ + γ2′ + γ3′ , γ1² + γ2² + γ3² = γ1′² + γ2′² + γ3′² and γ1³ + γ2³ + γ3³ = γ1′³ + γ2′³ + γ3′³ , the set of γ1 , γ2 , γ3 is a permutation of the set of γ1′ , γ2′ , γ3′ .

When k = 4: if and only if γ1 + γ2 + γ3 + γ4 = γ1′ + γ2′ + γ3′ + γ4′ , γ1² + γ2² + γ3² + γ4² = γ1′² + γ2′² + γ3′² + γ4′² , . . . , and γ1⁴ + γ2⁴ + γ3⁴ + γ4⁴ = γ1′⁴ + γ2′⁴ + γ3′⁴ + γ4′⁴ , the set of γ1 , γ2 , γ3 , γ4 is a permutation of the set of γ1′ , γ2′ , γ3′ , γ4′ .

For general k = K: if and only if γ1 + γ2 + · · · + γK = γ1′ + γ2′ + · · · + γK′ , γ1² + γ2² + · · · + γK² = γ1′² + γ2′² + · · · + γK′² , . . . , and γ1ᴷ + γ2ᴷ + · · · + γKᴷ = γ1′ᴷ + γ2′ᴷ + · · · + γK′ᴷ , the set of γ1 , γ2 , . . . , γK is a permutation of the set of γ1′ , γ2′ , . . . , γK′ .
Mathematical proof: Let Per(K) be the statement of the permutation theorem. We give a proof by induction on K.
Base case. The statement holds for k = 1 and k = 2: Per(1) is easily seen to be true, and Per(2) is true by the proof above for k = 2.
Inductive step. We show, for any K − 1 ≥ 2, that if Per(K − 1) holds, then Per(K) also holds. Assume the induction hypothesis that Per(K − 1) is true: if and only if γ1 + γ2 + · · · + γK−1 = γ1′ + γ2′ + · · · + γK−1′ , γ1² + γ2² + · · · + γK−1² = γ1′² + γ2′² + · · · + γK−1′² , . . . , and γ1ᴷ⁻¹ + γ2ᴷ⁻¹ + · · · + γK−1ᴷ⁻¹ = γ1′ᴷ⁻¹ + γ2′ᴷ⁻¹ + · · · + γK−1′ᴷ⁻¹ , the set of γ1 , γ2 , . . . , γK−1 is a permutation of the set of γ1′ , γ2′ , . . . , γK−1′ .
Using the induction hypothesis, Per(K − 1) can be restated as: under the above conditions there must exist a permutation matrix P such that [γ1 γ2 . . . γK−1]P = [γ1′ γ2′ . . . γK−1′]. Then

[γ1 γ2 . . . γK−1 γK] [P 0; 0 1] = [[γ1 γ2 . . . γK−1]P  γK] = [γ1′ γ2′ . . . γK−1′  γK],

where [P 0; 0 1] denotes the block matrix with P in the upper-left block and 1 in the lower-right entry: a square binary matrix that has exactly one entry of 1 in each row and each column and 0s elsewhere. Because γ1 + γ2 + · · · + γK−1 = γ1′ + γ2′ + · · · + γK−1′ and γ1 + γ2 + · · · + γK−1 + γK = γ1′ + γ2′ + · · · + γK−1′ + γK′ , we have γK = γK′ ; thus [γ1 γ2 . . . γK−1 γK] [P 0; 0 1] = [γ1′ γ2′ . . . γK−1′ γK′]. Then the set of γ1 , γ2 , . . . , γK is a permutation of the set of γ1′ , γ2′ , . . . , γK′ . Therefore Per(K) is true: if and only if γ1 + γ2 + · · · + γK−1 + γK = γ1′ + γ2′ + · · · + γK−1′ + γK′ , γ1² + γ2² + · · · + γK² = γ1′² + γ2′² + · · · + γK′² , . . . , and γ1ᴷ + γ2ᴷ + · · · + γKᴷ = γ1′ᴷ + γ2′ᴷ + · · · + γK′ᴷ , the set of γ1 , γ2 , . . . , γK is a permutation of the set of γ1′ , γ2′ , . . . , γK′ . This shows that, indeed, Per(K) holds. Since the base case and the inductive step have both been performed, by mathematical induction the statement Per(K) holds for all natural numbers K.
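The block-matrix step of the inductive argument can be checked numerically (an illustrative sketch with one concrete permutation matrix P; the values 10, 20, 30, 40 are arbitrary):

```python
import numpy as np

# P permutes (γ1, γ2, γ3) -> (γ2, γ3, γ1); the block matrix [P 0; 0 1]
# extends it to a 4x4 permutation matrix that leaves the last entry fixed.
P = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])
gammas = np.array([10, 20, 30, 40])

block = np.block([[P, np.zeros((3, 1), dtype=int)],
                  [np.zeros((1, 3), dtype=int), np.ones((1, 1), dtype=int)]])
print(gammas @ block)   # [20 30 10 40] -> still a permutation, γK fixed
```

Each row and column of the extended matrix still contains exactly one 1, which is the property the proof relies on.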

Example 2. To check whether two arrays are permutations of one another, such as Γ = (2, 3, 3, 2, 2, 3, 3, 2) and Γ′ = (2, 3, 2, 3, 2, 3, 2, 3), we calculate the power sums of each array:

Σ Γ = 2 + 3 + 3 + 2 + 2 + 3 + 3 + 2 = 20 and Σ Γ′ = 2 + 3 + 2 + 3 + 2 + 3 + 2 + 3 = 20;
Σ Γ² = 2² + 3² + 3² + 2² + 2² + 3² + 3² + 2² = 52 and Σ Γ′² = 52;
Σ Γ³ = 140 and Σ Γ′³ = 140;
Σ Γ⁴ = 388 and Σ Γ′⁴ = 388;
Σ Γ⁵ = 1100 and Σ Γ′⁵ = 1100;
Σ Γ⁶ = 3172 and Σ Γ′⁶ = 3172;
Σ Γ⁷ = 9260 and Σ Γ′⁷ = 9260;
Σ Γ⁸ = 27268 and Σ Γ′⁸ = 27268.

Since Σ Γᵏ = Σ Γ′ᵏ for every k = 1, . . . , 8, we conclude that array Γ is a permutation of array Γ′.
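Example 2 can be reproduced in a few lines of Python (an illustrative check using exact integer arithmetic):

```python
def power_sums(arr):
    """Power sums sum(x**k) for k = 1 .. len(arr)."""
    return [sum(x**k for x in arr) for k in range(1, len(arr) + 1)]

gamma   = [2, 3, 3, 2, 2, 3, 3, 2]
gamma_p = [2, 3, 2, 3, 2, 3, 2, 3]

print(power_sums(gamma)[:4])                      # [20, 52, 140, 388]
print(power_sums(gamma) == power_sums(gamma_p))   # True
```

All eight pairs of sums agree, matching the hand calculation above.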

3.4 Equinumerosity Theorem

Suppose the vertex or edge adjacency matrices of graphs G1 and G2 have equinumerous eigenvalues, and there exist elementary row interchange matrices Rn∗n and R′n∗n which satisfy the following: for every eigenvalue λ1P of G1 and λ2P of G2 (of multiplicity P) and the corresponding left singular vector sets U1P and U2P , there exists a square matrix MP∗P with U1P M = RU2P ; likewise, for every eigenvalue λ1P and λ2P of G1 and G2 and the corresponding right singular vector sets V1P and V2P , there exists a square matrix M′P∗P with V1P M′ = R′V2P . Then the two graphs are isomorphic. The equinumerosity theorem executes a singular value decomposition of the vertex and edge adjacency matrices and determines whether the eigenvalue sequences are equinumerous. If not, the two graphs are not isomorphic. If the eigenvalue sequences are equinumerous, it produces the maximally linearly independent systems of the left and right singular vectors for the P-multiple eigenvalues. If these are not equinumerous, the two graphs are not isomorphic; if they are, the two graphs are isomorphic.

The basic algorithmic theory of the equinumerosity theorem is shown in Figures 3-9 and 3-10, and the pseudo-code is shown in Algorithm 2.

Figure 3-9: SVD for vertex or edge adjacency matrix of candidate 1

Figure 3-10: SVD for vertex or edge adjacency matrix of candidate 2

The mathematical proof can be found in Section 3.4.2.

Algorithm 2 Equinumerosity Theorem Verification
Input: Vertex adjacency matrices A and B, edge adjacency matrices C and D of two
candidates.
Output: whether the two candidates are isomorphic
1: A = Uv1 Σv1 Vv1ᵀ
2: B = Uv2 Σv2 Vv2ᵀ
3: C = Ue1 Σe1 Ve1ᵀ
4: D = Ue2 Σe2 Ve2ᵀ
5: if map(Σv1 ) = map(Σv2 ) & map(Σe1 ) = map(Σe2 ) then
6: if map(Uv1 ) = map(Uv2 ) & map(Vv1 ) = map(Vv2 ) then
7: if map(Ue1 ) = map(Ue2 ) & map(Ve1 ) = map(Ve2 ) then
8: return True
9: else
10: return False
11: end if
12: else
13: return False
14: end if
15: else
16: return False
17: end if
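The decomposition step of Algorithm 2 can be sketched with NumPy (illustrative only; the map(·) comparison of the pseudo-code is reduced here to comparing the singular-value sequences, which must be equinumerous for isomorphic graphs):

```python
import numpy as np

def singular_values(adj):
    """Singular values of an adjacency matrix, in decreasing order."""
    return np.linalg.svd(np.asarray(adj, dtype=float), compute_uv=False)

# The 3-node path graph under two labellings: B = P A P^T for a
# permutation matrix P, so the two graphs are isomorphic.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
P = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])
B = P @ A @ P.T

# Isomorphic graphs always share the same singular-value sequence.
print(np.allclose(singular_values(A), singular_values(B)))  # True
```

Conjugation by a permutation matrix leaves the singular values unchanged, which is the invariant the equinumerosity check exploits.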

3.4.1 Maximal Linearly Independent Subsets

We say that a set of vectors is a maximal linearly independent set if adding any further vector of the vector space to it results in a set that is not linearly independent. For the converse implication, note that any vector of the vector space can be written as a linear combination of the basis vectors, so adding any vector to the set of basis vectors makes it linearly dependent.

In graph isomorphism, every vector in a vector space W can be expressed uniquely as a finite linear combination of the vectors of a set B; the coefficients are referred to as the components or coordinates of the vector with respect to B. The elements of a basis are called basis vectors.
Equivalently, a set B is a basis if its elements are linearly independent and each element of W is a linear combination of elements of B. Namely, a basis is a set of linearly independent vectors that span the space, as shown in Figure 3-11. It is evident that orthogonal basis vectors all point in different directions.

Figure 3-11: The geometric explanation of a basis. The same vector can be represented in two different bases, (a) b1 and (b) b2 (green and red arrows).

In the theory of vector spaces, linear dependence means that some vectors can be written as a linear combination of the remaining vectors. At the same time, a linearly independent set of vectors (arranged as a square matrix) has a non-zero determinant. A finite-dimensional vector space has several bases, all with the same number of elements; this number is called the dimension of the space. A maximal linearly independent set L is guaranteed to be a basis, because if its span missed some element, we could add that element to the set and it would still be linearly independent.
A vector group s = {a1 , a2 , . . . , ai } is a subgroup of the vector group S = {a1 , a2 , . . . , as }, i.e. s ⊆ S, and s satisfies:

• s is linearly independent;

• any vector in S can be represented linearly by s; that is, adding any vector of S to s would make s linearly dependent.

Then s is said to be the maximally linearly independent system of S. Among the linearly independent subsets of the linear space, a maximal linearly independent vector group has the greatest number of vectors, and it is not unique.
Given a set of vectors, we can compute the number of independent vectors by calculating the rank of the set and finding a maximal linearly independent subset:

1. Place the vectors as the columns of a matrix; call this matrix A.

2. Use Gaussian elimination to reduce A to a matrix B in row-echelon form.

3. Identify the columns of B that contain the leading 1s (the pivots).

4. The columns of A that correspond to the columns identified in step (3) form a maximal linearly independent set of the original set of vectors.

Example 3. Let a1 = (2, 4, 2)ᵀ , a2 = (1, 1, 0)ᵀ , a3 = (2, 3, 1)ᵀ , a4 = (3, 5, 2)ᵀ . To find the maximally linearly independent system, apply elementary row transformations to the matrix:

$$A=\begin{pmatrix}2&1&2&3\\4&1&3&5\\2&0&1&2\end{pmatrix}\rightarrow\begin{pmatrix}2&1&2&3\\0&1&1&1\\0&-1&-1&-1\end{pmatrix}\rightarrow\begin{pmatrix}2&1&2&3\\0&1&1&1\\0&0&0&0\end{pmatrix}$$

Hence a1 and a2 form a maximally linearly independent system, and a3 = ½ a1 + a2 , a4 = a1 + a2 .

3.4.2 Mathematical Proof of the Equinumerosity Theorem

In this section, we analyse the properties of the proposed equinumerosity theorem in terms of singular value decomposition, maximal linearly independent subsets and their computation.

Singular value decomposition of the vertex and edge adjacency matrices

Let M be a real symmetric n ∗ n matrix. Then there is a singular value decomposition such that

M = U Σ Vᵀ        (3.4)

where U is an n ∗ n unitary matrix, Σ is an n ∗ n real diagonal matrix, and Vᵀ, the conjugate transpose of V, is also an n ∗ n unitary matrix. The elements Σii on the diagonal of Σ are the singular values of M. With the singular values ordered from large to small, Σ is uniquely determined by M; U and V, of course, are not. A non-negative real number σ is a singular value of M only if there are unit vectors u and v such that

M v = σu and Mᵀu = σv        (3.5)

where u and v are the left and right singular vectors of σ, respectively.
The elements on the diagonal of the matrix Σ are equal to the singular values of M, and the columns of U and V are the left and right singular vectors, respectively. Therefore, the above definition of the SVD states:

1. A set of orthogonal basis vectors U consisting of the left singular vectors of M can always be found.

2. A set of orthogonal basis vectors V consisting of the right singular vectors of M can always be found.

Without loss of generality, the columns of U and the rows of Vᵀ are defined and used in this thesis as the left and right singular vectors.

Definition 4. P-multiple eigenvalue. An nth-order matrix has n eigenvalues. If P of these eigenvalues are the same, then these P eigenvalues are called P-multiple eigenvalues.

Definition 5. A maximally independent vector set is defined as a vector group satisfying the following:

1. a1 , a2 , . . . , ar are not collinear, that is, they are linearly independent;

2. any other vector in the space can be expressed as a linear combination of the elements of the maximal set, the basis a1 , a2 , . . . , ar .

Hence a set of vectors is maximally linearly independent if including any other vector of the vector space would make it linearly dependent.

Definition 6. A maximally independent vector system. Under a linear transformation, the maximally linearly independent subset is transformed so that each vector has exactly one entry equal to 1 and all other entries equal to 0. The resulting format of the original vector set is called the maximally independent vector system.

Property 1. Let M be a real symmetric matrix. There exists an orthogonal matrix A and a diagonal matrix Σ such that AᵀM A = Σ, where the diagonal elements of Σ are the eigenvalues of M and the column vectors of A are the eigenvectors of M.

Proof: By mathematical induction, the statement is true for one-dimensional square matrices. Suppose the proposition is true for square matrices of dimension n − 1. An n-dimensional square matrix Mn has at least one eigenvalue λ. Let x be the eigenvector corresponding to λ, extend x to a set of orthogonal basis vectors of Rⁿ, and arrange them into a matrix A1 = [x y1 . . . yn−1]. Then

$$A_1^{T} M A_1=\begin{pmatrix}\lambda & 0\\ 0 & M_{n-1}\end{pmatrix}$$

(here xᵀM yk = ykᵀM x = 0 and xᵀM x = λ). Since Mn−1 is a real symmetric matrix of dimension n − 1, by the induction hypothesis there is an orthogonal matrix A2 such that A2ᵀ Mn−1 A2 = diag(λ2 , . . . , λn), and then

$$\begin{pmatrix}1 & 0\\ 0 & A_2\end{pmatrix}^{T}\begin{pmatrix}\lambda_1 & 0\\ 0 & M_{n-1}\end{pmatrix}\begin{pmatrix}1 & 0\\ 0 & A_2\end{pmatrix}=\begin{pmatrix}\lambda_1 & &\\ & \ddots &\\ & & \lambda_n\end{pmatrix}$$

Let A be the orthogonal matrix A = A1 [1 0; 0 A2]. Since A1ᵀM A1 and M have the same eigenvalues, the eigenvalues λm of Mn−1 are also eigenvalues of M. Finally, it only remains to prove that the column vectors of A are the eigenvectors of M. Set A = [x1 x2 . . . xn] and substitute into M A = A diag(λ1 , . . . , λn); then [M x1 M x2 . . . M xn] = [λ1 x1 λ2 x2 . . . λn xn], and comparing the columns, each xn is an eigenvector of M.
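Property 1 can be verified numerically with numpy.linalg.eigh, which returns exactly the orthogonal A and diagonal Σ of the statement (an illustrative check on a small symmetric adjacency matrix):

```python
import numpy as np

M = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])   # a real symmetric (adjacency) matrix

eigvals, A = np.linalg.eigh(M)    # M = A diag(eigvals) A^T
Sigma = A.T @ M @ A               # should be (numerically) diagonal

print(np.allclose(Sigma, np.diag(eigvals)))   # True
print(np.allclose(A.T @ A, np.eye(3)))        # True: A is orthogonal
```

Both checks pass to floating-point precision, confirming that A diagonalises M and that its columns form an orthonormal eigenvector basis.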

Property 2. There must be P corresponding linearly independent eigenvectors for a P-multiple eigenvalue λ of the vertex adjacency matrix and the edge adjacency matrix. (An eigenvalue of multiplicity P of a real symmetric matrix has exactly P linearly independent eigenvectors.)

Lemma 1. For a general matrix, there are at most k linearly independent eigenvectors corresponding to a k-multiple eigenvalue. Suppose a specific eigenvalue λ has m linearly independent eigenvectors; the following proves that m ≤ k. Using Schmidt's method, obtain m orthogonal eigenvectors ϵ1 , ϵ2 , . . . , ϵm with M ϵ1 = λϵ1 , . . . , M ϵm = λϵm . Extend them to a set of orthogonal basis vectors ϵ1 , ϵ2 , . . . , ϵm , θm+1 , . . . , θn , and let L = [ϵ1 , ϵ2 , . . . , ϵm , θm+1 , . . . , θn ]. Then

$$L^{T} M L=\begin{pmatrix}\lambda I_m & 0\\ 0 & M'\end{pmatrix}$$

(it can be seen from the form of this matrix that the eigenspace of λ is at least m-dimensional). Since LᵀM L and M have the same eigenvalues, the multiplicity of the eigenvalue λ of M satisfies m ≤ k.

Thus Lemma 1 is proved. According to the fundamental theorem of algebra, the total number of algebraic multiplicities of the roots of the nth-degree characteristic equation is n, so the sum of the numbers of linearly independent eigenvectors over all eigenvalues is ≤ n; and Property 1 shows that a real symmetric matrix has n independent eigenvectors. That is, the equal sign holds in the above inequality, and it holds exactly when there are k linearly independent eigenvectors corresponding to each k-multiple eigenvalue.
Combining Definitions 4, 5 and 6 with Property 2, it is proved that the rank of the maximally linearly independent subset Un∗P /VP∗n of left/right singular vectors corresponding to the P-multiple eigenvalues is P.
Any two graphs satisfying the permutation theorem above must have Mn∗n En∗1 = M′n∗n E′n∗1 ; then from M x = λx and M′y = λ′y we have yᵀM′ = λ′yᵀ, and thus yᵀλx = yᵀλ′x.

Property 3. Two graphs are isomorphic if the eigenvalue sequences of the vertex
adjacency matrix and edge adjacency matrix are equinumerous.

Lemma 2. Two sequences satisfying the permutation theorem must be equinumerous, but two equinumerous sequences do not necessarily have a permutation relation.

Property 4. The two graphs satisfying the permutation theorem must have equinu-
merous eigenvalues.

Theorem 4. (Equinumerosity Theorem) Two graphs are isomorphic if and only if the eigenvalues of the two graphs' vertex and edge adjacency matrices are equinumerous, and the maximally linearly independent vector systems of the left and right singular vectors corresponding to the P-multiple eigenvalues are equinumerous.

To prove Theorem 4, the following lemmas are put forward.

Lemma 3. Let VP∗n and V′P∗n be the right singular vector sets for the P-multiple eigenvalues of graphs G1 and G2 , respectively. For an elementary row interchange operation matrix S, if there exists QP∗P such that SV = V′Q, then Q is invertible.

Proof: Rank(SV) = Rank(V) = Rank(V′) = r. According to matrix theory, r = Rank(V′Q) ≤ min(Rank(V′), Rank(Q)), so Rank(Q) ≥ r; and since Q is an r ∗ r matrix, Rank(Q) ≤ r. Therefore Rank(Q) = r and Q is invertible.

Lemma 4. (Equinumerosity theorem for graph isomorphism) Suppose graphs G1 and G2 have equinumerous eigenvalues, and there exists an elementary row interchange matrix En∗n which satisfies the following: for every eigenvalue λ (of multiplicity r) of G1 and G2 and the corresponding left singular vector sets Uλ and U′λ (each of size n ∗ P), there exists a square matrix Hλ of size P ∗ P with UλEλ = HU′λ; and for every eigenvalue λ (of multiplicity r) of G1 and G2 and the corresponding right singular vector sets Vλ and V′λ (each of size n ∗ P), there exists a square matrix Eλ of size P ∗ P with EλVλᵀ = V′λHᵀ. Then the two graphs are isomorphic.

Proof: Both G1 and G2 are similar to the diagonal matrix Σ (whose diagonal
elements form the eigenvalue sequence). Suppose there are t distinct eigenvalues. Then

M = (U^λ1 , . . . , U^λt ) Σ (V^λ1 , . . . , V^λt )^−1        (3.6)

M′ = (U′^λ1 , . . . , U′^λt ) Σ′ (V′^λ1 , . . . , V′^λt )^−1
   = H^T (V′^λ1 E^λ1 , . . . , V′^λt E^λt ) Σ′ (V′^λ1 E^λ1 , . . . , V′^λt E^λt )^−1 (H^T)^−1        (3.7)

From Lemma 3, E^λ is invertible, so U^λ E^λ also gives r linearly independent
eigenvectors of M; substituting U^λ in (3.6) by U^λ E^λ gives

M = (U^λ1 E^λ1 , . . . , U^λt E^λt ) Σ (U^λ1 E^λ1 , . . . , U^λt E^λt )^−1        (3.8)

Substituting (3.8) into (3.7) and using H^T = H^−1, if the left singular vector set
also applies, then M′ = H M H^T. By the same argument, the edge adjacency matrices
satisfy N′ = H N H^T, so the two graphs are isomorphic.
Theorems 1 and 4 lay the foundations for the algorithm in this thesis.
Theorem 5. There exists a row interchange matrix E such that an∗P and a′n∗P satisfy
a = Ea′ if and only if they are equinumerous. Similarly, there exists a column
interchange matrix E such that an∗P and a′n∗P satisfy a = a′E if and only if they are
equinumerous.

Proof: The necessity is obvious; for sufficiency: from Definition 8, both vector
sets a and a′ correspond to the equinumerous sequence η, so η = ∏_{l=1..P} Q_l a = Qa
and η = ∏_{l=1..P} Q′_l a′ = Q′a′, hence Qa = Q′a′. Since Q_l and Q′_l (1 ≤ l ≤ P),
and therefore Q and Q′, are row interchange operation matrices, a = Q^T Q′ a′; then
E = Q^T Q′, which is still an elementary row interchange matrix. This completes the
sufficiency proof.
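Theorem 5's sufficiency direction can be illustrated concretely: whenever two vectors are equinumerous, a row interchange (permutation) matrix E with a = Ea′ can be constructed greedily. The following is a minimal sketch (the function name is ours), assuming plain numeric vectors:

```python
import numpy as np

def permutation_matrix(a, a_prime):
    """Return a row interchange (permutation) matrix E with a = E @ a_prime,
    or None if the two vectors are not equinumerous (Theorem 5: no such E)."""
    a = np.asarray(a)
    a_prime = np.asarray(a_prime)
    if sorted(a.tolist()) != sorted(a_prime.tolist()):
        return None  # not equinumerous
    n = len(a)
    E = np.zeros((n, n), dtype=int)
    used = set()
    for i, value in enumerate(a):
        # pick the first unused position j of a_prime holding the same value
        j = next(k for k in range(n) if a_prime[k] == value and k not in used)
        used.add(j)
        E[i, j] = 1
    return E

a = np.array([3, 1, 2, 1])
ap = np.array([1, 2, 3, 1])
E = permutation_matrix(a, ap)
assert (E @ ap == a).all()
```

Since E has exactly one 1 per row and per column, it is a product of elementary row interchanges, matching the matrix constructed in the sufficiency proof.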

3.5 Extensions

Edge Weighted and Labeled graph

The proposed algorithm also works for weighted graphs. In the unweighted graph
representation, we use 0 or 1 to indicate adjacency. For a weighted graph, we change
the 1 entries to weights 1, 2, . . . , k, where k is a natural number; 0 still indicates
a non-link. Because the method only works with natural numbers, weights that are
not natural numbers must first be standardised and rounded. Both the vertex and
the binary edge adjacency matrices of a weighted graph remain symmetric.
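The standardisation and rounding step can be sketched as below. The min–max scaling onto 1..k is an assumed scheme, since the thesis does not fix a particular mapping; any scheme that lands link weights in 1, 2, . . . , k and keeps 0 for non-links would do:

```python
import numpy as np

def to_natural_weights(W, k=5):
    """Map arbitrary positive edge weights onto the natural numbers 1..k,
    keeping 0 for non-links, so that the permutation/equinumerosity checks
    (which assume natural-number entries) still apply.
    The min-max scaling and the scale k are assumptions for illustration."""
    W = np.asarray(W, dtype=float)
    out = np.zeros_like(W, dtype=int)
    linked = W > 0
    if linked.any():
        lo, hi = W[linked].min(), W[linked].max()
        if hi == lo:
            out[linked] = 1  # all links share a single weight class
        else:
            # standardise to [0, 1], then round into the classes 1..k
            out[linked] = 1 + np.rint((W[linked] - lo) / (hi - lo) * (k - 1)).astype(int)
    return out

W = np.array([[0.0, 0.37, 0.0],
              [0.37, 0.0, 2.5],
              [0.0, 2.5, 0.0]])
print(to_natural_weights(W))  # 0 stays a non-link; link weights land in 1..5
```

Because the mapping is applied entry-wise to a symmetric matrix, the result stays symmetric, as the extension requires.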

Creation of an isomorphic group for a certain graph

There is a simple method to create all of the isomorphic graphs for a certain graph
G1 . In the n ∗ n vertex adjacency matrix, we select any two vertices r1 and r2 and
swap the corresponding pair of rows and the corresponding pair of columns. The new
vertex adjacency matrix corresponds to another graph G2 , and we can prove that
G1 ≈ G2 . We can also prove that in this process, both the vertex adjacency matrix
and the edge adjacency matrix of one graph are permutations of the other's if and
only if G1 ≈ G2 .
For the binary vertex adjacency matrix, if we swap any two rows and the same
two columns, the newly formed graph is isomorphic to the original one (referring to
Theorem 1). We can pick any two indices from 1 to n to swap, say 2 and n − 1, in
the binary vertex adjacency matrix.
The calculation procedure is the same as in the examples above, so we do not
repeat it here.
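The row-and-column swap can be sketched directly (the helper name is ours):

```python
import numpy as np

def swap_vertices(A, r1, r2):
    """Swap vertices r1 and r2 in a vertex adjacency matrix: exchange the two
    rows, then the same two columns. The resulting matrix describes a graph
    isomorphic to the original one (referring to Theorem 1)."""
    B = A.copy()
    B[[r1, r2], :] = B[[r2, r1], :]   # row swap
    B[:, [r1, r2]] = B[:, [r2, r1]]   # column swap
    return B

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]])
B = swap_vertices(A, 0, 2)
# the swapped matrix stays symmetric, and swapping again restores A
assert (B == B.T).all()
assert (swap_vertices(B, 0, 2) == A).all()
```

Iterating over all pairs (r1, r2), and over compositions of such swaps, enumerates the isomorphic group described above.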

3.6 Performance Studies and Applications


Theoretical analysis of the algorithm's time and space requirements is carried out
in Section 3.6.1. We show the detailed verification process for undirected graph,
undirected multigraph and directed graph isomorphism. The algorithm flow chart,
Figure 3-12 [57], is presented below.

3.6.1 Complexity Analysis

In this section, we present the algorithm execution process and analyze the compu-
tational complexity of the proposed isomorphism algorithm. The graph isomorphism
algorithm is based on the permutation theorem and equinumerosity theorem [54],
which is shown in the following pseudocode:

1. Generate the triple tuple for two undirected (or directed) graphs G1 and G2 .

2. Generate (binary) vertex and edge adjacency matrices V1 and V2 , E1 and E2
for the undirected (or directed) graphs G1 and G2 . If either the number of nodes
or the number of edges differs, output that they are not isomorphic; otherwise,
proceed to the next step. If the number of vertices of the matrix is n, then the
space complexity for a graph is n^2.

Figure 3-12: Flow Chart of Graph Isomorphism Matching.

3. Examine the permutation theorem-based isomorphism for the (binary) vertex and
edge adjacency matrices. If the condition is satisfied, move to the subsequent
step; if not, the graphs are not isomorphic. The permutation theorem verification
process is shown in Algorithm 1. The temporal complexity of calculating the
row sums of the vertex adjacency matrix and edge adjacency matrix is n^6.

- Calculate the row sums over the rows (even/odd rows for directed graphs) of V1 and
V2 and produce n-element arrays Vv and Vv′ . Compute and check whether
Σ_{i=1..n} Vv^i = Σ_{i=1..n} Vv′^i , Σ_{i=1..n} (Vv^i)^2 = Σ_{i=1..n} (Vv′^i)^2 , . . . ,
Σ_{i=1..n} (Vv^i)^n = Σ_{i=1..n} (Vv′^i)^n
for the rows (and Σ_{i=1..2n} Vv^i = Σ_{i=1..2n} Vv′^i , Σ_{i=1..2n} (Vv^i)^2 = Σ_{i=1..2n} (Vv′^i)^2 , . . . ,
Σ_{i=1..2n} (Vv^i)^{2n} = Σ_{i=1..2n} (Vv′^i)^{2n} for the even rows and odd rows).
These arrays correspond to the node degree (or indegree and outdegree) arrays.
Check whether one array is a permutation of the other by the permutation theorem.
If so, go to the next step; if not, output that the undirected (or directed)
graphs G1 and G2 are not isomorphic.

- Calculate the row sums over the rows (even/odd rows for directed graphs) of E1 and
E2 and produce m-element arrays Ee and Ee′ . Compute and check whether
Σ_{j=1..m} Ee^j = Σ_{j=1..m} Ee′^j , Σ_{j=1..m} (Ee^j)^2 = Σ_{j=1..m} (Ee′^j)^2 , . . . ,
Σ_{j=1..m} (Ee^j)^m = Σ_{j=1..m} (Ee′^j)^m
for the rows (and Σ_{j=1..2m} Ee^j = Σ_{j=1..2m} Ee′^j , Σ_{j=1..2m} (Ee^j)^2 = Σ_{j=1..2m} (Ee′^j)^2 , . . . ,
Σ_{j=1..2m} (Ee^j)^{2m} = Σ_{j=1..2m} (Ee′^j)^{2m} for the even rows and odd rows).
These arrays correspond to the edge degree (or indegree and outdegree) arrays.
Check whether one array is a permutation of the other by the permutation theorem.
If so, go to the next step; if not, output that the undirected (or directed)
graphs G1 and G2 are not isomorphic.

4. Check the isomorphism based on the equinumerosity theorem for the (binary)
vertex and edge adjacency matrices. If the requirement has been met, they are
isomorphic; otherwise, they are not isomorphic. The equinumerosity theorem
verification process is shown in Algorithm 2. The temporal complexity of
calculating the eigenvalues of the (binary) vertex adjacency matrix and edge
adjacency matrix is n^3.

P-multiple eigenvalues. A matrix of order n has n eigenvalues. If P of these
eigenvalues are identical, then these P eigenvalues are referred to as P-multiple
eigenvalues.

- Apply singular value decomposition to the two (binary) vertex adjacency
matrices of the respective graphs.

- Determine whether the singular value sequences of the two matrices are
equinumerous. If so, go to the following step; otherwise, the undirected (or
directed) graphs G1 and G2 are not isomorphic. The temporal complexity
is n.

- Verify that the maximally independent vector sets of the left and right
singular vectors corresponding to the P-multiple eigenvalues are
equinumerous. If so, proceed to the next stage; otherwise, the findings
indicate that graphs G1 and G2 are not isomorphic. The temporal complexity is n^3.

- Perform the singular value decomposition for the two corresponding
edge adjacency matrices.

- Verify that the singular values of the two matrices are equinumerous. If
so, proceed to the next stage; otherwise, the results indicate that graphs
G1 and G2 are not isomorphic. The temporal complexity is n.

- Check whether the maximally independent vector sets of the left and right
singular vectors corresponding to the P-multiple eigenvalues are
equinumerous. If so, the graphs G1 and G2 are isomorphic; otherwise, the
findings indicate that they are not isomorphic. The temporal complexity
is n^3.
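As an illustration of the singular value comparison in step 4 (not the full vector-system check), the following sketch compares singular value sequences with NumPy; `np.linalg.svd` returns them sorted in descending order, so an element-wise comparison within a tolerance suffices. The function name is ours:

```python
import numpy as np

def singular_values_equinumerous(M1, M2, tol=1e-8):
    """Compare the singular value sequences of two adjacency matrices.
    np.linalg.svd returns them in descending order, so no sorting is needed."""
    s1 = np.linalg.svd(M1, compute_uv=False)
    s2 = np.linalg.svd(M2, compute_uv=False)
    return s1.shape == s2.shape and bool(np.allclose(s1, s2, atol=tol))

# Two adjacency matrices of isomorphic graphs (one is a simultaneous
# row/column permutation of the other) share all singular values.
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
P = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0]])  # permutation matrix
B = P @ A @ P.T
print(singular_values_equinumerous(A, B))  # True
```

A floating-point tolerance is needed here because, unlike the integer row-sum checks, singular values are computed numerically.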

Temporal and spatial complexity are commonly used characteristics for examining
a graph or subgraph isomorphism algorithm. The comparison between the existing
benchmark algorithms and our proposed graph isomorphism algorithm is shown below:

Table 3.16: The comparison with the existing benchmark algorithms and our proposed
algorithm.

Algorithm                            Case     Temporal       Spatial
Quasipolynomial GI-algorithm [41]    Best     O(log(n)^c)    O(n)
                                     Worst    n^O(log(n))    O(log(2n))
Ullmann [13]                         Best     O(n^3)         O(n^3)
                                     Worst    O(n! n^2)      O(n^3)
VF2 [16]                             Best     O(n^2)         O(n)
                                     Worst    O(n! n)        O(n)
Proposed method                      Best     O(n)           O(m)
                                     Worst    O(n^6)         O(n^2)

Table 3.16 summarizes the time and spatial complexity of our algorithm compared
with those of Ullmann's and VF2's algorithms, as can be deduced from [13] and [14],
for the best and worst cases. For our method, the time complexity of the best case is
O(n^2) and of the worst case O(n^6), including the storage. The space complexity for
a graph is just O(3 ∗ m) in triple tuple format, and O(n^2) in the worst case. Note that n
represents the number of vertices of the pattern graph and m represents the number of
edges; the maximum number of edges is n(n − 1)/2 in an undirected graph and there are
twice as many in a directed graph, that is n(n − 1), as each node can at most have
edges to every other node.

As can be seen by comparing our proposed algorithm with the algorithms in
Table 3.16, our detection method can lead to a noticeable efficiency gain in graph
matching by integrating the search space and search time.

3.6.2 Case Study 1 - Undirected Graph Isomorphism

In the experiment, the procedure for checking whether the undirected graphs g1 and g2
are isomorphic is shown in Figure 3-13.

(a) g1 (b) g2

Figure 3-13: Undirected Graph Isomorphism Case.

Triple Tuples for g1 and g2

Two triple tuples are produced to represent the undirected graphs g1 and g2 as below
in Table 3.17 and 3.18:

Table 3.17: Triple tuple of graph g1 . Table 3.18: Triple tuple of graph g2

Edge ID Node1 ID Node2 ID Edge ID Node1 ID Node2 ID

1 2 3 1 1 2
2 1 2 2 2 3
3 1 4 3 3 4
4 4 5 4 4 5
5 1 5 5 2 4
6 1 3 6 1 4
7 3 5 7 1 5

Vertex and Edge Adjacency Matrix Generation for g1 and g2

Vertex adjacency matrices of graph g1 and g2 are shown in Table 3.19 and 3.20.

Edge adjacency matrices of graph g1 and g2 are shown in Table 3.21 and 3.22.

Table 3.19: Vertex Adjacency Matrix of g1 . Table 3.20: Vertex Adjacency Matrix of g2 .
(step 4 marked as G1 ) (step 6 marked as G2 )
v1 v2 v3 v4 v5          v1′ v2′ v3′ v4′ v5′

v1 0 1 1 1 1 v1 0 1 0 1 1

v2 1 0 0 1 0 v2 1 0 1 1 0

v3 1 0 0 0 1 v3 0 1 0 1 0

v4 1 1 0 0 1 v4 1 1 1 0 1

v5 1 0 1 1 0 v5 1 0 0 1 0

Table 3.21: Edge adjacency matrix of g1 . Table 3.22: Edge Adjacency Matrix of g2 .
(step 5 marked as G1 ) (step 7 marked as G2 )
e1 e2 e3 e4 e5 e6 e7          e1′ e2′ e3′ e4′ e5′ e6′ e7′

e1 0 1 1 1 0 0 0 e1 0 1 0 0 1 1 1

e2 1 0 1 0 1 1 0 e2 1 0 1 0 1 0 0

e3 1 1 0 1 1 1 0 e3 0 1 0 1 1 1 0

e4 1 0 1 0 1 0 1 e4 0 0 1 0 1 1 1

e5 0 1 1 1 0 1 1 e5 1 1 1 1 0 1 0

e6 0 1 1 0 1 0 1 e6 1 0 1 1 1 0 1

e7 0 0 0 1 1 1 0 e7 1 0 0 1 0 1 0
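The conversion from a triple tuple to the two adjacency matrices used throughout this case study can be sketched as follows. The function names are ours, and the edge adjacency here simply marks pairs of edges sharing an endpoint:

```python
import numpy as np

# Triple tuple of g1 (Table 3.17): (edge_id, node1_id, node2_id), 1-based IDs.
g1 = [(1, 2, 3), (2, 1, 2), (3, 1, 4), (4, 4, 5), (5, 1, 5), (6, 1, 3), (7, 3, 5)]

def vertex_adjacency(triples, n):
    """Vertex adjacency matrix of an undirected graph from its triple tuple."""
    A = np.zeros((n, n), dtype=int)
    for _, u, v in triples:
        A[u - 1, v - 1] = A[v - 1, u - 1] = 1
    return A

def edge_adjacency(triples):
    """Edge adjacency matrix: two edges are adjacent when they share an endpoint."""
    m = len(triples)
    E = np.zeros((m, m), dtype=int)
    for i in range(m):
        for j in range(i + 1, m):
            if {triples[i][1], triples[i][2]} & {triples[j][1], triples[j][2]}:
                E[i, j] = E[j, i] = 1
    return E

V = vertex_adjacency(g1, 5)
E = edge_adjacency(g1)
assert (V == V.T).all() and (E == E.T).all()  # both matrices are symmetric
```

Both matrices are symmetric with zero diagonals, which is what the permutation and equinumerosity checks below assume.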

Permutation check for V1 and V2

Following the permutation theorem, the row sums of the vertex adjacency
matrices are shown in Tables 3.23 and 3.24.

Table 3.23: Computation results of vertex adjacency matrix for graph g1 (step 8)

Sum of row  1st power  2nd power  3rd power  4th power  5th power
v1 4 16 64 256 1024
v2 2 4 8 16 32
v3 2 4 8 16 32
v4 3 9 27 81 243
v5 3 9 27 81 243
Sum 14 42 134 450 1574

According to the permutation theorem [28], [29], [37], [38], the row-sum
computations for the vertex adjacency matrices of g1 and g2 are highlighted in the
last line of each table; the two arrays have the same values, so one is a permutation
of the other. Therefore, the two vertex adjacency matrices are permutations of each other.

Table 3.24: Computation results of vertex adjacency matrix for graph g2 (step 9)

Sum of row  1st power  2nd power  3rd power  4th power  5th power

v1 3 9 27 81 243

v2 3 9 27 81 243

v3 2 4 8 16 32

v4 4 16 64 256 1024

v5 2 4 8 16 32
Sum 14 42 134 450 1574

Permutation check for E1 and E2

Following the permutation theorem, the row sums of the edge adjacency
matrices are shown in Tables 3.25 and 3.26.

Table 3.25: Computation results of edge adjacency matrix for graph g1 (step 10)
Sum of row  1st power  2nd power  3rd power  4th power  5th power  6th power  7th power
e1 3 9 27 81 243 729 2187
e2 4 16 64 256 1024 4096 16384
e3 5 25 125 625 3125 15625 78125
e4 4 16 64 256 1024 4096 16384
e5 5 25 125 625 3125 15625 78125
e6 4 16 64 256 1024 4096 16384
e7 3 9 27 81 243 729 2187
Sum 28 116 496 2180 9808 41306 209776

Table 3.26: Computation results of edge adjacency matrix for graph g2 (step 11)
Sum of row  1st power  2nd power  3rd power  4th power  5th power  6th power  7th power

e1 4 16 64 256 1024 4096 16384

e2 3 9 27 81 243 729 2187

e3 4 16 64 256 1024 4096 16384

e4 4 16 64 256 1024 4096 16384

e5 5 25 125 625 3125 15625 78125

e6 5 25 125 625 3125 15625 78125

e7 3 9 27 81 243 729 2187
Sum 28 116 496 2180 9808 41306 209776

According to the permutation theorem [28], [29], [37], [38], the row-sum
computations for the edge adjacency matrices of g1 and g2 are highlighted in the
last line of each table; the two arrays have the same values, so one is a permutation
of the other. Therefore, the two edge adjacency matrices are permutations of each other.
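The permutation checks above can be reproduced in a few lines. The sketch below (function names are ours) uses the vertex adjacency matrices of g1 and g2 from Tables 3.19 and 3.20 and recomputes the power sums highlighted in the last line of Tables 3.23 and 3.24; equal power sums for every power 1..n imply, by Newton's identities, that one row-sum array is a permutation of the other:

```python
import numpy as np

# Vertex adjacency matrices of g1 and g2 (Tables 3.19 and 3.20).
V1 = np.array([[0, 1, 1, 1, 1],
               [1, 0, 0, 1, 0],
               [1, 0, 0, 0, 1],
               [1, 1, 0, 0, 1],
               [1, 0, 1, 1, 0]])
V2 = np.array([[0, 1, 0, 1, 1],
               [1, 0, 1, 1, 0],
               [0, 1, 0, 1, 0],
               [1, 1, 1, 0, 1],
               [1, 0, 0, 1, 0]])

def power_sums(M):
    """Sums of the 1st..nth powers of the row sums (the 'Sum' line of
    Tables 3.23 and 3.24)."""
    rows = [int(s) for s in M.sum(axis=1)]  # Python ints avoid overflow
    n = len(rows)
    return [sum(r ** p for r in rows) for p in range(1, n + 1)]

def permutation_check(M1, M2):
    """Permutation theorem check on the row-sum arrays of two matrices."""
    return power_sums(M1) == power_sums(M2)

print(power_sums(V1))             # [14, 42, 134, 450, 1574], as in Table 3.23
print(permutation_check(V1, V2))  # True
```

The same two functions apply unchanged to the edge adjacency matrices of Tables 3.21 and 3.22.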

Equinumerosity check for V1 and V2

In this part, we apply the singular value decomposition (SVD) to the vertex adjacency
matrices V1 and V2 . First, we check whether the two corresponding eigenvalue
sequences are equinumerous. Then, we check whether the maximally independent vector
sets of the left and right singular vectors corresponding to the P-multiple
eigenvalues are equinumerous.

Table 3.27: Left Singular Matrix Uv1 for g1 . (step 18)

-0.55903 1.94E-16 0.770242 1.67E-16 -0.30694

-0.35054 0.371748 -0.42937 -0.6015 -0.43904
-0.35054 -0.37175 -0.42937 0.601501 -0.43904
-0.46996 -0.6015 -0.13784 -0.37175 0.510036
-0.46996 0.601501 -0.13784 0.371748 0.510036

Table 3.28: Singular Matrix Σv1 for g1 . (step 19)

2.935432 0 0 0 0
0 1.618034 0 0 0
0 0 1.472834 0 0
0 0 0 0.618034 0
0 0 0 0 0.462598

Table 3.29: Right Singular Matrix VvT1 for g1 . (step 20)

-0.55903 0 -0.77024 0 0.306936

-0.35054 -0.37175 0.429374 -0.6015 0.439042
-0.35054 0.371748 0.429374 0.601501 0.439042
-0.46996 0.601501 0.137845 -0.37175 -0.51004
-0.46996 -0.6015 0.137845 0.371748 -0.51004

1. Checking the singular values by the equinumerosity theorem for V1 and V2 (step 30):
In Tables 3.28 and 3.31, Σv1 and Σv2 are checked by the equinumerosity theorem for
g1 and g2 . The eigenvalues are equinumerous: there are no equal items within
the eigenvalue sets, and the two arrays could be changed to (0, 0, 0, 0, 0) and (0, 0, 0,
0, 0) in terms of the equinumerosity degree [54].

Table 3.30: Left Singular Matrix Uv2 for g2 . (step 21)

-0.46996 0.601501 -0.13784 -0.37175 0.510036

-0.46996 -0.6015 -0.13784 0.371748 0.510036
-0.35054 0.371748 -0.42937 0.601501 -0.43904
-0.55903 -4.84E-16 0.770242 7.62E-17 -0.30694
-0.35054 -0.37175 -0.42937 -0.6015 -0.43904

Table 3.31: Singular Matrix Σv2 for g2 . (step 22)

2.935432 0 0 0 0
0 1.618034 0 0 0
0 0 1.472834 0 0
0 0 0 0.618034 0
0 0 0 0 0.462598

2. Checking the left singular vectors by the equinumerosity theorem for V1 and V2
(step 31):

In Tables 3.27 and 3.30, Uv1 and Uv2 are checked by the equinumerosity theorem
for the left singular vectors of g1 and g2 . The maximally linearly independent
systems could be changed to (5^1, 5^2, 5^3, 5^4, 5^5) and (5^1, 5^2, 5^3, 5^4, 5^5) in terms
of the equinumerosity degree. Uv1 and Uv2 are equinumerous.

3. Checking the right singular vectors by the equinumerosity theorem for V1 and V2
(step 32):

In Tables 3.29 and 3.32, VvT1 and VvT2 are checked by the equinumerosity theorem for
the right singular vectors of g1 and g2 . The maximally linearly independent
systems could be changed to (5^1, 5^2, 5^3, 5^4, 5^5) and (5^1, 5^2, 5^3, 5^4, 5^5) in terms
of the equinumerosity degree. VvT1 and VvT2 are equinumerous.

Table 3.32: Right Singular Matrix VvT2 for g2 . (step 23)

-0.46996 -0.60150 0.137845 -0.37175 -0.51004

-0.46996 0.60150 0.137845 0.371748 -0.51004
-0.35054 -0.37175 0.429374 0.601501 0.439042
-0.55903 -0.13859 -0.77024 -6.42E-17 0.306936
-0.35054 0.37175 0.429374 -0.6015 0.439042

Table 3.33: Maximally linearly independent system of left singular vector for V1 . Table 3.34: Maximally linearly independent system of left singular vector for V2 .

1 0 0 0 0 1 0 0 0 0
0 1 0 0 0 0 1 0 0 0
0 0 1 0 0 0 0 1 0 0
0 0 0 1 0 0 0 0 1 0
0 0 0 0 1 0 0 0 0 1

Table 3.35: Maximally linearly independent system of right singular vector for V1 . Table 3.36: Maximally linearly independent system of right singular vector for V2 .

1 0 0 0 0 1 0 0 0 0
0 1 0 0 0 0 1 0 0 0
0 0 1 0 0 0 0 1 0 0
0 0 0 1 0 0 0 0 1 0
0 0 0 0 1 0 0 0 0 1

Equinumerosity check for E1 and E2

In this part, we apply the singular value decomposition (SVD) to the edge adjacency
matrices E1 and E2 . First, we check whether the two corresponding eigenvalue sequences
are equinumerous. Then, we check whether the maximally independent vector sets of the
left and right singular vectors corresponding to the P-multiple eigenvalues are
equinumerous.

Table 3.37: Left Singular Matrix Ue1 for E1 . (step 24)

-0.28975 0.516398 -3.33E-16 -0.0849 0.571073 0.498545 0.2598

-0.38248 -0.5164 2.96E-16 -0.53702 0.212007 0.215531 -0.45526
-0.45272 -0.1291 0.5 0.452112 0.359066 -0.43911 -0.0472
-0.36 -0.3873 -0.5 -1.45E-16 6.81E-17 -0.15609 0.66786
-0.45272 0.516398 -4.88E-16 -0.45211 -0.35907 -0.43911 -0.0472
-0.38248 0.129099 -0.5 0.537016 -0.21201 0.215531 -0.45526
-0.28975 -0.1291 0.5 0.084904 -0.57107 0.498545 0.2598

1. Checking the singular values by the equinumerosity theorem for E1 and E2 (step 33):

In Tables 3.38 and 3.41, Σe1 and Σe2 are checked by the equinumerosity theorem for
g1 and g2 . There are two groups of equal items in the eigenvalue sets, 1 = 1
and 2 = 2, and the two arrays could be changed to (0, 2^1, 2^2, 2^3, 2^4, 0, 0) and
Table 3.38: Singular Matrix Σe1 for E1 . (step 25)

4.124885 0 0 0 0 0 0
0 2 0 0 0 0 0
0 0 2 0 0 0 0
0 0 0 1 0 0 0
0 0 0 0 1 0 0
0 0 0 0 0 0.761557 0
0 0 0 0 0 0 0.636672

Table 3.39: Right Singular Matrix VeT1 for E1 . (step 26)

-0.28975 -0.5164 4.14E-16 -0.0849 0.571073 -0.49855 0.2598

-0.38248 0.516398 -5.10E-16 0.452112 0.359066 -0.21553 -0.45526
-0.45272 0.129099 -0.5 -0.53702 0.212007 0.439108 -0.0472
-0.36 0.387298 0.5 -2.52E-16 3.78E-16 0.156094 0.66786
-0.45272 -0.5164 5.55E-16 0.537016 -0.21201 0.439108 -0.0472
-0.38248 -0.1291 0.5 -0.45211 -0.35907 -0.21553 -0.45526
-0.28975 0.129099 -0.5 0.084904 -0.57107 -0.49855 0.2598

Table 3.40: Left Singular Matrix Ue2 for E2 . (step 27)

-0.36 -8.88E-16 0.632456 0 2.98E-16 -0.15609 -0.66786

-0.28975 0.408248 -0.31623 -0.37169 -0.44179 0.498545 -0.2598
-0.38248 -0.40825 0.316228 0.196752 -0.54279 0.215531 0.455256
-0.38248 0.408248 0.316228 -0.19675 0.542791 0.215531 0.455256
-0.45272 -0.40825 -0.31623 -0.56845 0.101003 -0.43911 0.047196
-0.45272 0.408248 -0.31623 0.568447 -0.101 -0.43911 0.047196
-0.28975 -0.40825 -0.31623 0.371695 0.441788 0.498545 -0.2598

Table 3.41: Singular Matrix Σe2 for E2 . (step 28)

4.124885 0 0 0 0 0 0
0 2 0 0 0 0 0
0 0 2 0 0 0 0
0 0 0 1 0 0 0
0 0 0 0 1 0 0
0 0 0 0 0 0.761557 0
0 0 0 0 0 0 0.636672

Table 3.42: Right Singular Matrix VeT2 for E2 . (step 29)

-0.36 9.93E-16 -0.63246 -1.55E-16 0 0.156094 -0.66786

-0.28975 -0.40825 0.316228 -0.37169 -0.44179 -0.49855 -0.2598
-0.38248 0.408248 -0.31623 -0.56845 0.101003 -0.21553 0.455256
-0.38248 -0.40825 -0.31623 0.568447 -0.101 -0.21553 0.455256
-0.45272 0.408248 0.316228 0.196752 -0.54279 0.439108 0.047196
-0.45272 -0.40825 0.316228 -0.19675 0.542791 0.439108 0.047196
-0.28975 0.408248 0.316228 0.371695 0.441788 -0.49855 -0.2598

(0, 2^1, 2^2, 2^3, 2^4, 0, 0), in terms of the equinumerosity degree. Σe1 and Σe2 are
equinumerous.

2. Checking the left singular vectors by the equinumerosity theorem for E1 and E2
(step 34):

In Tables 3.37 and 3.40, Ue1 and Ue2 are checked by the equinumerosity theorem
for the left singular vectors of E1 and E2 . The maximally linearly independent
systems could be changed to (5^1, 5^2, 5^3, 5^4, 5^5) and (5^1, 5^2, 5^3, 5^4, 5^5) in terms
of the equinumerosity degree. Ue1 and Ue2 are equinumerous.

Table 3.43: Maximally linearly independent system of left singular vector for E1 . Table 3.44: Maximally linearly independent system of left singular vector for E2 .

1 0 0 0 0 0 0 1 0 0 0 0 0 0
0 1 0 0 0 0 0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0 0 1 0 0 0 0
0 0 0 1 0 0 0 0 0 0 1 0 0 0
0 0 0 0 1 0 0 0 0 0 0 1 0 0
0 0 0 0 0 1 0 0 0 0 0 0 1 0
0 0 0 0 0 0 1 0 0 0 0 0 0 1

3. Checking the right singular vectors by the equinumerosity theorem for E1 and E2 (step 35):

In Tables 3.39 and 3.42, VeT1 and VeT2 are checked by the equinumerosity theorem for
the right singular vectors of E1 and E2 . The maximally linearly independent
systems could be changed to (5^1, 5^2, 5^3, 5^4, 5^5) and (5^1, 5^2, 5^3, 5^4, 5^5) in terms
of the equinumerosity degree [37], [38]. VeT1 and VeT2 are equinumerous.

Therefore, we can conclude that g1 and g2 are isomorphic.

Table 3.45: Maximally linearly independent system of right singular vector for E1 . Table 3.46: Maximally linearly independent system of right singular vector for E2 .

1 0 0 0 0 0 0 1 0 0 0 0 0 0
0 1 0 0 0 0 0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0 0 1 0 0 0 0
0 0 0 1 0 0 0 0 0 0 1 0 0 0
0 0 0 0 1 0 0 0 0 0 0 1 0 0
0 0 0 0 0 1 0 0 0 0 0 0 1 0
0 0 0 0 0 0 1 0 0 0 0 0 0 1

3.6.3 Case Study 2 - Directed Graph Isomorphism

The following experiment was designed to check whether g3 and g4 are isomorphic,
as shown in Figure 3-14.

(a) g3 (b) g4

Figure 3-14: Directed Graph Isomorphism Case.

Triple Tuples for g3 and g4

Vertex and Edge Adjacency Matrix Generation for g3 and g4

Permutation check for V3 and V4

According to the permutation theorem, due to space limitations we only present the
array of row sums from the last line of each table: [10, 10, 20, 28, 44, 76, 140, 268,
524, 1036, 2060, 4108, 8204, 16396, 32780, 65548, 131084, 262156]. The row sums of
the vertex adjacency matrices are the same, so we can conclude that the binary vertex
Table 3.47: Triple tuple of graph g3 . Table 3.48: Triple tuple of graph g4

Edge ID  Starting node ID  Ending node ID        Edge ID  Starting node ID  Ending node ID
1 1 2 1 6 5
2 3 1 2 1 3
3 6 5 3 4 2
4 7 5 4 8 7
5 8 7 5 3 4
6 6 8 6 2 1
7 6 2 7 7 5
8 2 4 8 4 6
9 8 4 9 7 1
10 4 3 10 6 8

Table 3.49: Vertex adjacency matrix of g3 . (step 4 marked as G1 )

      v1+ v1− v2+ v2− v3+ v3− v4+ v4− v5+ v5− v6+ v6− v7+ v7− v8+ v8−
v1+    0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0
v1−    0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0
v2+    0   1   0   0   0   0   0   0   0   0   0   1   0   0   0   0
v2−    0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0
v3+    0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0
v3−    1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
v4+    0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   1
v4−    0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0
v5+    0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0
v5−    0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0
v6+    0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0
v6−    0   0   1   0   0   0   0   0   0   0   0   0   0   0   1   0
v7+    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1
v7−    0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0
v8+    0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0
v8−    0   0   0   0   0   0   1   0   0   0   0   0   1   0   0   0

Table 3.50: Vertex adjacency matrix of g4 . (step 6 marked as G2 )
v1′+ v1′− v2′+ v2′− v3′+ v3′− v4′+ v4′− v5′+ v5′− v6′+ v6′− v7′+ v7′− v8′+ v8′−

v1+ 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0

v1− 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0

v2+ 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0

v2− 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

v3+ 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0

v3− 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0

v4+ 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0

v4− 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0

v5+ 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0

v5− 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0

v6+ 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0

v6− 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0

v7+ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1

v7− 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0

v8+ 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0

v8− 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0

Table 3.51: Edge adjacency matrix of g3 . (step 5 marked as G1 )

       e1+ e1− e2+ e2− e3+ e3− e4+ e4− e5+ e5− e6+ e6− e7+ e7− e8+ e8− e9+ e9− e10+ e10−
e1+     0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   1   0   0   0   0
e1−     0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
e2+     0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
e2−     0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0
e3+     0   0   0   0   0   0   0   0   0   0   0   1   0   1   0   0   0   0   0   0
e3−     0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0
e4+     0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0
e4−     0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0
e5+     0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0
e5−     0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   1   0   0
e6+     0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   1   0   0
e6−     0   0   0   0   1   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0
e7+     1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0
e7−     0   0   0   0   1   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0
e8+     0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   1
e8−     1   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0
e9+     0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   1
e9−     0   0   0   0   0   0   0   0   0   1   1   0   0   0   0   0   0   0   0   0
e10+    0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
e10−    0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   1   0   0   0

Table 3.52: Edge adjacency matrix of g4 . (step 7 marked as G2 )
e1′+ e1′− e2′+ e2′− e3′+ e3′− e4′+ e4′− e5′+ e5′− e6′+ e6′− e7′+ e7′− e8′+ e8′− e9′+ e9′− e10′+ e10′−
e1′+ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1

e1− 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0

e2+ 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0

e2− 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0

e3+ 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0

e3− 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0

e4+ 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0

e4− 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0

e5+ 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0

e5− 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

e6+ 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0

e6− 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

e7+ 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

e7− 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0

e8+ 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1

e8− 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0

e9+ 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0

e9− 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0

e10+ 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

e10− 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0

Table 3.53: Computation results of binary vertex adjacency matrix for graph g3 (step 8)

       Sum of odd row   Sum of even row   Sum of row   2nd power   ...   16th power
v1+    1   1   1   ...   1
v1−    1   1   1   ...   1
v2+    1   1   1   ...   1
v2−    2   2   4   ...   65536
v3+    1   1   1   ...   1
v3−    1   1   1   ...   1
v4+    1   1   1   ...   1
v4−    2   2   4   ...   65536
v5+    1   1   1   ...   1
v5−    1   1   1   ...   1
v6+    2   2   4   ...   65536
v6−    1   1   1   ...   1
v7+    1   1   1   ...   1
v7−    1   1   1   ...   1
v8+    2   2   4   ...   65536
v8−    1   1   1   ...   1
Sum    10  10  20  28  ...   262156

Table 3.54: Computation results of binary vertex adjacency matrix for graph g4 (step
9)

Sum of odd row   Sum of even row   Sum of row   2nd power   ...   16th power

v1+ 2 2 4 ... 65536

v1− 1 1 1 ... 1

v2+ 1 1 1 ... 1

v2− 1 1 1 ... 1

v3+ 1 1 1 ... 1

v3− 1 1 1 ... 1

v4+ 1 1 1 ... 1

v4− 2 2 4 ... 65536

v5+ 1 1 1 ... 1

v5− 1 1 1 ... 1

v6+ 2 2 4 ... 65536

v6− 1 1 1 ... 1

v7+ 1 1 1 ... 1

v7− 2 2 4 ... 65536

v8+ 1 1 1 ... 1

v8− 1 1 1 ... 1
Sum 10 10 20 28 ... 262156

adjacency matrices for graphs g3 and g4 are permutable, so we continue the calculation
on the edge adjacency matrices.

Permutation check for E3 and E4

According to the permutation theorem, due to space limitations we only present the
arrays of row sums from the last line of each table: [17, 15, 32, 58, 116, 250, 572, 1378,
3476, 9130, 24812, 69298, 197636, 572410, 1676252, 4946818, 14676596, 43702090,
130450892, 390041938, 1167504356, 3497270170] for g3 and [16, 16, 32, 56, 104, 200,
392, 776, 1544, 3080, 6152, 12296, 24584, 49160, 98312, 196616, 393224, 786440,
1572872, 3145736, 6291464, 12582920] for g4. The row sums of the edge adjacency
matrices are not the same, so we can conclude that the binary edge adjacency matrices
for graphs g3 and g4 are not permutable. We obtain the final result that graphs g3
and g4 are not isomorphic, so there is no need to verify the equinumerosity theorem
with respect to the vertex and edge matrices.

Table 3.55: Computation results of binary edge adjacency matrix for graph g3 (step 10)

        Sum of odd row   Sum of even row   Sum of row   2nd power   ...   20th power
e1+     2   2   4   ...   1048576
e1−     1   1   1   ...   1
e2+     1   1   1   ...   1
e2−     1   1   1   ...   1
e3+     2   2   4   ...   1048576
e3−     1   1   1   ...   1
e4+     1   1   1   ...   1
e4−     1   1   1   ...   1
e5+     1   1   1   ...   1
e5−     2   2   4   ...   1048576
e6+     2   2   4   ...   1048576
e6−     2   2   4   ...   1048576
e7+     3   3   9   ...   3486784401
e7−     1   1   1   ...   1
e8+     2   2   4   ...   1048576
e8−     2   2   4   ...   1048576
e9+     2   2   4   ...   1048576
e9−     2   2   4   ...   1048576
e10+    1   1   1   ...   1
e10−    2   2   4   ...   1048576
Sum     17  15  32  58  ...   3497270170

Table 3.56: Computation results of binary edge adjacency matrix for graph g4 (step
11)

Sum of odd row   Sum of even row   Sum of row   2nd power   ...   20th power

e1+ 2 2 4 ... 1048576

e1− 1 1 1 ... 1

e2+ 1 1 1 ... 1

e2− 2 2 4 ... 1048576

e3+ 1 1 1 ... 1

e3− 2 2 4 ... 1048576

e4+ 2 2 4 ... 1048576

e4− 1 1 1 ... 1

e5+ 2 2 4 ... 1048576

e5− 1 1 1 ... 1

e6+ 2 2 4 ... 1048576

e6− 1 1 1 ... 1

e7+ 1 1 1 ... 1

e7− 2 2 4 ... 1048576

e8+ 2 2 4 ... 1048576
e8− 2 2 4 ... 1048576

e9+ 2 2 4 ... 1048576

e9− 2 2 4 ... 1048576

e10+ 1 1 1 ... 1

e10− 2 2 4 ... 1048576
Sum 16 16 32 56 ... 12582920

3.6.4 Case Study 3 - Multigraph Isomorphism

(a) g5 (b) g6

Figure 3-15: Multigraph isomorphism matching case

To handle the loop and parallel edges in the multigraph, we follow the graph
representation rules in Section 3.2.3 of this Chapter.

Triple Tuples for g5 and g6

Table 3.57: Triple tuple of graph g5 . Table 3.58: Triple tuple of graph g6

Edge ID Node1 ID Node2 ID Edge ID Node1 ID Node2 ID

1 1 2 1 1 2
2 2 2 2 2 2
3 2 3 3 2 3
4 1 3 4 1 3
5 1 4 5 1 3
6 1 4 6 1 4
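The multigraph representation can be sketched as follows; counting parallel edges in the off-diagonal entries and a loop once on the diagonal is our reading of the Section 3.2.3 rules, and it reproduces the 4×4 vertex adjacency matrices of g5 and g6 used below:

```python
import numpy as np

# Triple tuples of g5 and g6 (Tables 3.57 and 3.58).
g5 = [(1, 1, 2), (2, 2, 2), (3, 2, 3), (4, 1, 3), (5, 1, 4), (6, 1, 4)]
g6 = [(1, 1, 2), (2, 2, 2), (3, 2, 3), (4, 1, 3), (5, 1, 3), (6, 1, 4)]

def multigraph_vertex_adjacency(triples, n):
    """Vertex adjacency matrix of a multigraph: parallel edges accumulate
    in the off-diagonal entries, and a loop (u == v) marks the diagonal."""
    A = np.zeros((n, n), dtype=int)
    for _, u, v in triples:
        if u == v:
            A[u - 1, u - 1] += 1        # loop on vertex u
        else:
            A[u - 1, v - 1] += 1        # parallel edges count up
            A[v - 1, u - 1] += 1
    return A

A5 = multigraph_vertex_adjacency(g5, 4)
A6 = multigraph_vertex_adjacency(g6, 4)
print(A5.sum(axis=1))  # row sums [4 3 2 2] of g5
print(A6.sum(axis=1))  # row sums [4 3 3 1] of g6
```

The row sums printed here are exactly the first-power column of the computation tables below, where the two graphs are shown not to be permutable.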

Vertex and Edge Adjacency Matrix Generation for g5 and g6

Permutation check for V5 and V6

According to the permutation theorem, we obtain the final results in the last line
of Tables 3.65 and 3.66: [11, 33, 107, 369] for g5 and [11, 35, 119, 419] for
g6. These arrays are not the same, so graphs g5 and g6 are not isomorphic, and there is no need

Table 3.59: Vertex Adjacency Matrices of g5 and g6 .

Table 3.60: Vertex adjacency matrix of g5 . Table 3.61: Vertex adjacency matrix of g6 .

v1 v2 v3 v4          v1′ v2′ v3′ v4′

v1 0 1 1 2 v1 0 1 2 1

v2 1 1 1 0 v2 1 1 1 0

v3 1 1 0 0 v3 2 1 0 0

v4 2 0 0 0 v4 1 0 0 0

Table 3.62: Vertex Adjacency Matrices for induced Matching.

Table 3.63: c. Table 3.64: C.
v1 v2 v3 v4 v5 v6          v1′ v2′ v3′ v4′ v5′ v6′

v1 0 1 1 1 1 1 v1 0 1 1 1 1 1

v2 1 1 1 0 0 0 v2 1 1 1 0 0 0

v3 1 1 0 1 0 0 v3 1 1 0 1 1 0

v4 1 0 1 0 1 1 v4 1 0 1 0 1 1

v5 1 0 0 1 0 1 v5 1 0 1 1 0 1

v6 1 1 0 1 1 0 v6 1 0 0 1 0 0

Table 3.65: Computation results of vertex adjacency matrix for graph g5 (step 8)

Sum of row  1st power  2nd power  3rd power  4th power
v1 4 16 64 256
v2 3 9 27 81
v3 2 4 8 16
v4 2 4 8 16
Sum 11 33 107 369

Table 3.66: Computation results of vertex adjacency matrix for graph g6 (step 9)

Sum of row  1st power  2nd power  3rd power  4th power


v2 3 9 27 81

v3 3 9 27 81

v4 1 1 1 1
Sum 11 35 119 419

to verify the permutation theorem on edge adjacency matrices and equinumerosity
theorem on the vertex and edge matrices further.

3.7 Summary
In this chapter, we focused on the undirected and directed graph isomorphism problem.
An efficient mechanism is proposed to support and guide related downstream
applications. Unlike many other approaches, in which backtracking is commonly
used, this procedure maps vertices by composing rows and columns of the distance
matrix to obtain an initial partitioning of the vertices of the directed graphs,
which reduces the size of the search tree. The proposed algorithm guarantees returning
the best matching candidates at the stage of permutation theorem verification.
We ran comprehensive and representative illustrations to demonstrate the efficiency
of our isomorphism algorithm. The complexity analysis, temporal and spatial,
demonstrates that the proposed algorithm significantly outperforms the state-of-the-art
algorithms.

Chapter 4

Subgraph Isomorphism Verification

Subgraph isomorphism is another fundamental topic in graph isomorphism. In Chapter 3,
we discussed the general graph isomorphism problem. A subgraph isomorphism
query is an important means of detecting structural patterns in larger graphs.
Specifically, given a query graph Q (also named a pattern graph) and a data graph G,
such a query returns all the mappings of nodes in Q to nodes in G that preserve their
respective edges. Answering subgraph isomorphism queries is useful, for example,
to analyse epidemic propagation patterns in social networks [3] or to query protein
interactions in protein networks [58]. Because the subgraph isomorphism
search problem is NP-hard, various heuristics have been proposed to speed up the
search [59], [60]. What these algorithms have in common is that they are all measures
based on node similarity and subgraph similarity. They exploit different representation
orders to search potential objects, as well as pruning rules to exclude unprofitable
paths and auxiliary information to eliminate false candidates to speed up progress.
Subgraph isomorphism is a known NP-complete problem.
Consequently, the design objective of our subgraph topological detection technique
is to achieve an appreciable improvement in subgraph detection performance by
reducing both the search space and the search time. The algorithm designs a new
component encoding method to represent a graph and studies the rules to determine
whether one query graph is sub-isomorphic to another.

This chapter is organised as follows. In Section 4.1, we introduce the concepts
of subgraph isomorphism and of partial and induced subgraphs. Complexity analysis is
discussed in Section 4.2.1. Then, case studies are presented in Sections 4.2.2 and 4.2.3.
The work of this chapter is summarised in Section 4.3.

4.1 Subgraph Isomorphism

The subgraph isomorphism problem is known to be NP-complete [61],
and the definition can be stated as follows: given two graphs, a query graph Q and a
database graph G, the task is to identify all exact matching instances of Q in G.
To be specific, a query graph Q = (VS , ES ) is regarded as a subgraph of G = (V, E)
when VS ⊂ V and ES ⊂ E. The induced subgraph is the subgraph that contains every
edge of G whose endpoints both lie in VS . In other words, graph G = (V, E) contains
graph Q = (VS , ES ). Simply put, we enumerate all subgraphs of G and check
whether each of them is isomorphic to Q.
To achieve substructure matching across a graph in this study, we must
build a set of candidate matches and compare the query graph to these candidate
graphs, filtering by the numbers of vertices and edges, which may significantly
decrease the number of candidate graphs. Clearly, two isomorphic graphs must have
the same numbers of vertices and edges. The candidate subgraphs with the
same number of vertices are enumerated on the basis of the query graph scale.
Then, the arrays of the vertex adjacency matrices and of the edge adjacency
matrices are generated according to the triple-tuple subset. Thirdly, the sums
of the arrays are calculated according to the rules of the permutation theorem and
the equinumerosity theorem in sequence to check whether they are isomorphic. Our
proposed algorithm works effectively and efficiently for subgraph matching and is
able to solve both induced and partial substructure searching.
A subset of the vertices of the data graph forms an induced subgraph, which
contains all of the edges that have both endpoints in the subset, whereas a partial
subgraph may omit some edges within the subset. An induced subgraph is a special case
of a subgraph. Also, this subgraph isomorphism method covers both directed and
undirected graphs, and it supports weighted graphs simply by replacing 1 with the
corresponding weight value, as mentioned in Section 3.5 of Chapter 3.
We have already provided detailed illustrations of the graph representation
method and of permutation and equinumerosity theorem verification in
Sections 3.2, 3.3 and 3.4 of Chapter 3, and they are equally applicable to subgraph
isomorphism verification. Readers are therefore referred back to Chapter 3 for
the relevant content. In order to clearly elaborate the idea of our algorithm, we first
provide some common definitions and then give two examples as illustration.

4.1.1 Induced Subgraph Matching

An induced subgraph (also named a vertex-induced subgraph) is a special case of a
subgraph: it consists of a subset of the vertices of the data graph together with any
edges whose endpoints are both in this subset.
Figure 4-1 shows a data graph G and a query graph g. Several instances of the
query graph g exist in G, as the dotted lines in Figure 4-1 illustrate; one is the
subgraph consisting of the nodes 11, 10, 9, 13, 12 and the relationships
(11, 10), (10, 9), (10, 12), (9, 13), (9, 12), (13, 12), (12, 11).

1. Select the same number of vertices from G as in g; term the result g′.

2. Check whether the numbers of vertices and edges are the same. If not, the graphs
are not isomorphic. If yes, go to the next step.

3. Generate the vertex adjacency matrices for the two graphs g and g′ (the matrices
of g′ are built from the corresponding row and column indices of the vertex and
edge adjacency matrices of G).

4. Check the isomorphism based on the permutation theorem. If it holds, go to the
next step. If not, they are not isomorphic.

5. Check the isomorphism based on the equinumerosity theorem. If it holds, they
are isomorphic. If not, they are not isomorphic.
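The enumeration and filtering in steps 1-3 can be sketched in a few lines of Python. This is an illustrative sketch, not the thesis implementation; `adj_G` and `adj_g` are assumed to be plain 0/1 adjacency matrices (lists of lists), and the function names are made up for the example.

```python
from itertools import combinations

def induced_submatrix(adj, vertices):
    """Adjacency matrix of the subgraph induced by a vertex subset."""
    return [[adj[i][j] for j in vertices] for i in vertices]

def edge_count(adj):
    """Number of undirected edges encoded in a 0/1 adjacency matrix."""
    n = len(adj)
    return sum(adj[i][j] for i in range(n) for j in range(i + 1, n))

def induced_candidates(adj_G, adj_g):
    """Yield vertex subsets of G whose induced subgraph has the same
    vertex and edge counts as g (steps 1-2); the permutation and
    equinumerosity checks (steps 4-5) would then run on each candidate."""
    k, target = len(adj_g), edge_count(adj_g)
    for subset in combinations(range(len(adj_G)), k):
        sub = induced_submatrix(adj_G, subset)
        if edge_count(sub) == target:
            yield subset, sub
```

For the running example below, `combinations(range(13), 5)` enumerates every 5-vertex subset of the 13-vertex data graph G.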

Figure 4-1: Graphs with vertex and edge labeling.

In this example, we performed a strict subgraph searching process, which means
we included only the subgraphs with the same number of vertices as the query graph,
while those with a different number of edges were excluded.

Vertex and Edge Adjacency Matrix Generation for g and G

Table 4.1: Vertex Adjacency Matrix of g.

v1 v2 v3 v4 v5
v1 0 1 0 1 1
v2 1 0 1 1 0
v3 0 1 0 1 0
v4 1 1 1 0 1
v5 1 0 0 1 0

We have graph G and subgraph g shown in Figure 4-1 for checking. In total, 1287
subgraphs of G were created according to the full combination of 5 (the number of
nodes in g) out of 13 (the number of nodes in G). For example, the subgraphs g1
shown in Figure 4-2a, g2 shown in Figure 4-2b, and g3 shown in Figure 4-2c were
created from G. We input g1 and g into the algorithm in Figure 3-12 to demonstrate
the process (Step 1). The vertex and edge adjacency matrices of G are shown in

(a) g1 (b) g2

(c) g3

Figure 4-2: Subgraph matching case.

Table 4.2: Vertex adjacency matrix of G.


v1′ v2′ v3′ v4′ v5′ v6′ v7′ v8′ v9′ v10′ v11′ v12′ v13′

v1 0 0 1 0 0 0 1 1 0 0 1 0 0

v2 0 0 1 1 1 0 0 0 1 0 0 0 1

v3 1 1 0 0 1 0 0 0 0 0 1 1 0

v4 0 1 0 0 0 1 1 0 1 1 0 0 0

v5 0 1 1 0 0 0 0 0 0 0 0 1 1

v6 0 0 0 1 0 0 1 1 0 0 0 0 0

v7 1 0 0 1 0 1 0 1 0 1 1 0 0

v8 1 0 0 0 0 1 1 0 0 0 0 0 0

v9 0 1 0 1 0 0 0 0 0 1 0 1 1

v10 0 0 0 1 0 0 1 0 1 0 1 1 0

v11 1 0 1 0 0 0 1 0 0 1 0 1 0

v12 0 0 1 0 1 0 0 0 1 1 1 0 1

v13 0 1 0 0 1 0 0 0 1 0 0 1 0

Table 4.3: Edge adjacency matrix of g.

e1 e2 e3 e4 e5 e6 e7
e1 0 1 0 0 1 1 1
e2 1 0 1 0 1 0 0
e3 0 1 0 1 1 1 0
e4 0 0 1 0 1 1 1
e5 1 1 1 1 0 1 0
e6 1 0 1 1 1 0 1
e7 1 0 0 1 0 1 0

Table 4.4: Edge adjacency matrix of G.


e′1 e′2 e′3 e′4 e′5 e′6 e′7 e′8 e′9 e′10 e′11 e′12 e′13 e′14 e′15 e′16 e′17 e′18 e′19 e′20 e′21 e′22 e′23 e′24 e′25 e′26 e′27 e′28 e′29 e′30

e1 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 1

e2 1 0 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

e3 1 1 0 1 0 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

e4 0 1 1 0 1 1 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

e5 0 1 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

e6 0 0 1 1 1 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

e7 1 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0

e8 0 0 1 1 0 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0

e9 0 0 1 1 0 1 0 1 0 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

e10 0 0 0 0 1 1 0 0 1 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

e11 0 0 1 1 0 1 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0

e12 0 0 0 0 0 0 0 0 1 1 1 0 1 1 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0

e13 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 1 0 0 0 0 1 1 0 1 0 0 0 0 0

e14 0 0 0 0 0 0 0 0 1 1 0 1 1 0 1 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0

e15 0 0 0 0 0 0 0 0 1 1 0 1 0 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 1

e16 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 1 1 0 0 1 0 0 0 0 0 0 0 1

e17 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 0 1 1 1 0 0 0 0 0 0 0 0

e18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 1 1 0 0 0 0 0 0 0 0 1

e19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 0 0 0 0 0 0 0 1 1 1

e20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 0 0 0 0 0 0 1 1 0

e21 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 1 1 0 0 1 1 0 0

e22 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 1 0 0 0 1 0 1 1 0 0 1 1 0 0

e23 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 1 1 0 1 1 0 1 1 0 0

e24 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 1 1 1 0 0

e25 0 0 0 0 0 0 1 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0

e26 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 1

e27 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1 0 1 1 1

e28 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 1 0 1 0

e29 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 1 1 0 1

e30 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 1 0 0 0 0 0 0 1 1 0 1 0

Tables 4.2 and 4.4, and those of g in Tables 4.1 and 4.3.
The pairs (g, g1 ) and (g, g2 ) have the same numbers of vertices and edges, so g1 and
g2 can go into the permutation theorem checking process, while g and g3 do not have
the same numbers of vertices and edges, so there is no need to check g3 . Next,
we use the results of g and g1 for demonstration purposes. The vertex and edge
adjacency matrices of g1 are shown in Tables 4.5 and 4.6.
According to the permutation theorem [54], the row sums of the vertex adjacency
matrices of g and g1 are permutations of each other, and so are the row sums of their
edge adjacency matrices. According to the equinumerosity theorem, the vertex and edge adjacency
Table 4.5: Vertex Adjacency Matrix of g1 .
v2′ v4′ v5′ v9′ v13′

v2 0 1 1 1 1

v4 1 0 0 1 0

v5 1 0 0 0 1

v9 1 1 0 0 1

v13 1 0 1 1 0

Table 4.6: Edge Adjacency Matrix of g1 .


e14′ e15′ e16′ e17′ e18′ e19′ e20′

e14 0 1 1 1 0 0 0

e15 1 0 1 0 1 1 0

e16 1 1 0 1 1 1 0

e17 1 0 1 0 1 0 1

e18 0 1 1 1 0 1 1

e19 0 1 1 0 1 0 1

e20 0 0 0 1 1 1 0

matrices are equinumerous. For the detailed calculation procedure, refer to Section 3.6.2
in the previous chapter.
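The row-sum permutation condition for g (Table 4.1) and g1 (Table 4.5) can be checked directly; a short sketch, with the matrices transcribed from the tables and an illustrative helper name:

```python
def rows_permuted(adj_a, adj_b):
    """True when the row-sum array of one matrix is a permutation of the other's."""
    return sorted(map(sum, adj_a)) == sorted(map(sum, adj_b))

# Vertex adjacency matrices of g (Table 4.1) and g1 (Table 4.5).
adj_g = [[0, 1, 0, 1, 1],
         [1, 0, 1, 1, 0],
         [0, 1, 0, 1, 0],
         [1, 1, 1, 0, 1],
         [1, 0, 0, 1, 0]]
adj_g1 = [[0, 1, 1, 1, 1],
          [1, 0, 0, 1, 0],
          [1, 0, 0, 0, 1],
          [1, 1, 0, 0, 1],
          [1, 0, 1, 1, 0]]
print(rows_permuted(adj_g, adj_g1))  # True: both row-sum arrays sort to [2, 2, 3, 3, 4]
```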
In the data graph G of Figure 4-1, there are 34 subgraphs isomorphic to the query graph g: (1, 3, 5,
11, 12), (1, 3, 7, 8, 11), (1, 3, 7, 10, 11), (1, 3, 7, 11, 12), (1, 3, 10, 11, 12), (1, 4, 6,
7, 8), (1, 4, 7, 10, 11), (1, 6, 7, 8, 11), (1, 7, 8, 10, 11), (1, 7, 10, 11, 12), (2, 3, 5, 9,
13), (2, 3, 5, 11, 12), (2, 4, 5, 9, 13), (2, 4, 7, 9, 10), (2, 4, 9, 10, 12), (2, 4, 9, 10, 13),
(2, 4, 9, 12, 13), (2, 9, 10, 12, 13), (3, 5, 9, 12, 13), (3, 5, 10, 11, 12), (3, 5, 11, 12,
13), (3, 7, 10, 11, 12), (3, 9, 10, 11, 12), (4, 6, 7, 8, 10), (4, 6, 7, 9, 10), (4, 6, 7, 10,
11), (4, 7, 9, 10, 11), (4, 7, 9, 10, 12), (4, 7, 10, 11, 12), (4, 9, 10, 11, 12), (4, 9, 10,
12, 13), (5, 9, 10, 12, 13), (7, 9, 10, 11, 12), (9, 10, 11, 12, 13).

4.1.2 Partial Subgraph Matching

A partial subgraph may contain fewer edges between the same vertices than the
induced subgraph does.
Figure 4-3 below illustrates the subgraph spanned on the data graph by the
vertex subset 2, 5, 3, 12, 13. The induced subgraph searching process follows a
Algorithm 3 Subgraph Selection Scheme 1
Input: A pattern graph G and a subgraph g
Output: Whether g is a subgraph candidate of G

1: while choosing the same number of vertices as g from G, termed g′ do
2: if g′ has the same number of edges as g then
3: go to Algorithm 1
4: else
5: return false
6: end if
7: end while

relatively strict rule compared with partial subgraph matching.

(a) g (b) g4

Figure 4-3: Partial subgraph matching case.

Based on the algorithm presented in the previous Section 4.1.1, g is not a subgraph
of g4 because g4 has an extra edge between vertices 3 and 4. In some scenarios, however,
this verdict is too strict. For ease of understanding, we use a simple case to illustrate
the proposed scheme for dealing with this issue. The pseudo-code of inexact subgraph
selection is shown in Algorithm 4.

1. We have a pattern graph C and a subgraph c. Select the same number of
vertices from C as in c; term the result c′.

2. Check whether the number of edges of c′ is equal to or greater than that of c.
If not, they are not isomorphic. If yes, go to the next step.

3. Regenerate the vertex and edge adjacency matrices of c′ based on the corresponding
row and column indices of the vertex and edge adjacency matrices of C, and
go to the next step. Ensure the vertex and edge adjacency matrices of c and c′
have the same size.

4. Check the isomorphism based on the permutation theorem. If it holds, go to
the next step. If not, they are not isomorphic.

5. Check the isomorphism based on the equinumerosity theorem. If it holds,
they are isomorphic. If not, they are not isomorphic.

Algorithm 4 Subgraph Selection Scheme 2

Input: A pattern graph C and a subgraph c
Output: Whether c is a subgraph candidate of C
1: while choosing the same number of vertices as c from C to generate a subgraph,
termed c′ do
2: if the number of edges of c is less than or equal to the number of edges of c′ then
3: enumerate all subsets of the edges of c′ with the same number of edges as c
4: generate the final c′
5: go to Algorithm 1
6: else
7: return false
8: end if
9: end while
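Algorithm 4's two-level enumeration can be rendered in a few lines of Python. This is a sketch with illustrative names, not the thesis implementation; `adj_C` and `adj_c` are assumed to be 0/1 adjacency matrices.

```python
from itertools import combinations

def edges_of(adj):
    """List the undirected edges encoded in a 0/1 adjacency matrix."""
    n = len(adj)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if adj[i][j]]

def partial_candidates(adj_C, adj_c):
    """Per Algorithm 4: for each vertex subset of C of size |V(c)| whose induced
    subgraph has at least |E(c)| edges, enumerate every subset of exactly |E(c)|
    induced edges as a candidate for the permutation/equinumerosity checks."""
    k, m = len(adj_c), len(edges_of(adj_c))
    for verts in combinations(range(len(adj_C)), k):
        induced = [(i, j) for (i, j) in edges_of(adj_C)
                   if i in verts and j in verts]
        if len(induced) >= m:  # "equal to or more edges than c"
            for chosen in combinations(induced, m):
                yield verts, chosen
```

On the running example (c a 4-vertex path, C the 5-vertex pattern graph of the next case study), only the vertex subset corresponding to (1, 2, 4, 5) is rejected outright, since its induced subgraph has just 2 edges.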

Based on Algorithm 4, the vertex combination (2, 3, 5, 12, 13) also yields a
subgraph matching graph g of Figure 4-3a. We can conclude that induced subgraph
isomorphism matches in a stricter way than partial subgraph isomorphism matching.

4.2 Performance Studies and Applications

This section presents the experimental results on temporal and spatial performance
(see Section 4.2.1). We then show the detailed verification processes for undirected
induced and partial subgraph isomorphism in Sections 4.2.2 and 4.2.3. The overall
algorithm flow chart was presented in Figure 3-12.

4.2.1 Complexity Analysis

The computational procedure mainly contains 3 parts:

1. Subgraph selection algorithm analysis (refer to Algorithms 3 and 4). The temporal
complexity is O(min(n^k, n^(n−k))); when k ≥ n/2, this is O(n^(n−k)). Here n
indicates the number of vertices of the data graph and k the number of vertices of
the query graph; alternatively, n indicates the number of edges of the candidate
subgraph and k the number of edges of the query graph.

2. The permutation theorem algorithm analysis (refer to Algorithm 1).

3. The equinumerosity theorem algorithm analysis (refer to Algorithm 2).
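The O(min(n^k, n^(n−k))) term in part 1 comes from the C(n, k) vertex subsets that Algorithms 3 and 4 enumerate; Python's `math.comb` makes the growth concrete. This is a quick sanity check, not part of the thesis implementation.

```python
from math import comb

# Candidate vertex subsets for a k-vertex query in an n-vertex data graph.
print(comb(5, 4))   # 5 subsets, as in the case studies of Section 4.2
print(comb(13, 5))  # 1287 subsets for the 13-vertex data graph of Figure 4-1
```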

The temporal and spatial complexities of the baseline algorithms (Ullmann [13], VF2 [16]
and the quasipolynomial GI algorithm [41]) apply to both graph isomorphism and
subgraph isomorphism. For the detailed analysis and demonstration of the permutation
and equinumerosity theorems, refer to Sections 3.3 and 3.4.
For ease of understanding, we use two simple cases to illustrate further.

4.2.2 Case Study - Induced Subgraph Matching

The vertex adjacency matrix of graph c′ for induced matching is generated by choosing
the corresponding indices, according to the enumerated vertex combinations, from the
vertex adjacency matrix of graph C.

Table 4.7: Query graph c

    v1  v2  v3  v4
v1  0   1   0   0
v2  1   0   1   0
v3  0   1   0   1
v4  0   0   1   0

Table 4.8: Step 1: enumerate vertices from graph C

     v1′ v2′ v3′ v4′ v5′
v1′  0   1   1   0   0
v2′  1   0   1   0   0
v3′  1   1   0   1   0
v4′  0   0   1   0   1
v5′  0   0   0   1   0

(a) c (b) step 1: enumerate vertices from graph C.

(c) step 2a: enumerate edges from graph C. (d) step 2b: enumerate edges from graph C.

Figure 4-4: The procedure of generating graph c′.

The edge adjacency matrix of graph c′ for induced matching is generated by choosing
the corresponding indices, according to the enumerated edge combinations, from the
edge adjacency matrix of graph C.

Table 4.9: Step 2a: enumerate edges from graph C

     e1′ e2′ e3′ e4′ e5′
e1′  0   1   0   1   0
e2′  1   0   1   1   0
e3′  0   1   0   1   1
e4′  1   1   1   0   0
e5′  0   0   1   0   0

Table 4.10: Step 2b: enumerate edges from graph C

     e1′ e2′ e3′ e4′ e5′
e1′  0   1   0   1   0
e2′  1   0   1   1   0
e3′  0   1   0   1   1
e4′  1   1   1   0   0
e5′  0   0   1   0   0

The subgraph c (Figure 4-4a) has 4 vertices and 3 edges, so we choose the same
number of vertices from the pattern graph C (Figure 4-4b). In this example, 5 subgraphs
are generated, with vertex IDs (1, 2, 3, 4), (1, 2, 3, 5), (1, 2, 4, 5), (1, 3, 4, 5) and
(2, 3, 4, 5). Among these subgraphs, only the one with vertex combination (1, 2, 4, 5)
has 2 edges, fewer than the 3 edges of c. Therefore, only (1, 2, 3, 4), (1, 2, 3, 5),
(1, 3, 4, 5) and (2, 3, 4, 5) can be potential subgraph candidates. Here, we
use the subgraph c′ with vertex IDs (1, 2, 3, 4) in C, together with c, for demonstration
purposes. The vertex adjacency matrix of subgraph c′ is shown in Table 4.8. In
Algorithm 3, subgraph c′ would be excluded in the initial selection process because
there are 4 edges between the corresponding vertices.

In Algorithm 4, by contrast, we consider a relatively more complex situation: a
subgraph with the same number of vertices may have more edges than the query graph c
(Figure 4-4a). If this applies, the matching process should continue. Four subgraphs
with 3 edges are created according to the full combination of 3 (the number of edges
in c) out of 4 (the number of edges in c′); the subgraphs with edge combinations
(1, 2, 3), (1, 2, 4), (1, 3, 4) and (2, 3, 4) are potential candidates.

The vertex and edge matrices of c′ are generated by selecting the corresponding
rows and columns from those of C; refer to Tables 4.8, 4.9 and 4.10. By going through
the permutation and equinumerosity theorem checking process, only the subgraphs with
edge IDs (1, 2, 3) and (1, 3, 4) are isomorphic to graph c. The specific calculation
process follows Section 3.6.2 of Chapter 3.

The procedure of generating a subgraph contains 2 steps:

1. Enumerate the same number of vertices as subgraph c from graph C.

2. Enumerate the same number of edges as subgraph c among the vertices based
on step 1.

Based on the principle of induced subgraph matching, we can conclude that the
subgraphs with vertex combinations (1, 2, 3, 4), (1, 3, 4, 5) and (2, 3, 4, 5) in Figure
4-6c are isomorphic to the graph in Figure 4-6a.

(a) c (b) C

Figure 4-5: Partial subgraph matching case.

Table 4.11: Triple tuples of graph c.

Edge ID  Node1 ID  Node2 ID
1        1         2
2        2         3
3        3         4

Table 4.12: Triple tuples of graph C.

Edge ID  Node1 ID  Node2 ID
1        1         2
2        2         3
3        3         4
4        1         3
5        4         5

4.2.3 Case Study - Partial Subgraph Matching

Table 4.13: Vertex adjacency matrix for query graph c.

    v1  v2  v3  v4
v1  0   1   0   0
v2  1   0   1   0
v3  0   1   0   1
v4  0   0   1   0

Table 4.14: Vertex adjacency matrix for data graph C.

     v1′ v2′ v3′ v4′ v5′
v1′  0   1   1   0   0
v2′  1   0   1   0   0
v3′  1   1   0   1   0
v4′  0   0   1   0   1
v5′  0   0   0   1   0

The subgraph c (Figure 4-5a) has 4 vertices and 3 edges. In Algorithm 3, we consider
as potential candidates only those subgraphs that have the same number of vertices and
the same number of edges as the query graph c (Figure 4-5a). So we choose the same
number of vertices from the pattern graph C (Figure 4-5b). If this applies, the matching
process should continue. In this example, 5 subgraphs with 4 vertices are generated
according to the full combination of 4 (the number of vertices in c) out of 5 (the number of
Table 4.15: Edge adjacency matrix for query graph c.

    e1  e2  e3
e1  0   1   0
e2  1   0   1
e3  0   1   0

Table 4.16: Edge adjacency matrix for data graph C.

     e1′ e2′ e3′ e4′ e5′
e1′  0   1   0   1   0
e2′  1   0   1   1   0
e3′  0   1   0   1   1
e4′  1   1   1   0   0
e5′  0   0   1   0   0

vertices in C): the subgraphs with vertex IDs (1, 2, 3, 4), (1, 2, 3, 5), (1, 2, 4, 5),
(1, 3, 4, 5) and (2, 3, 4, 5). Among these subgraphs, (1, 2, 3, 4) has 4 edges and
(1, 2, 4, 5) has 2 edges, neither equal to the 3 edges of the query graph c. Therefore,
only (1, 2, 3, 5), (1, 3, 4, 5) and (2, 3, 4, 5) can be potential subgraph candidates.
Here, we use the subgraph c′ with vertex IDs (1, 2, 3, 5) in C, together with c, for
demonstration purposes.

(a) c (b) C

(c) C

Figure 4-6: Partial matching case.

The vertex and edge matrices of c′ are generated by selecting the corresponding
rows and columns from those of C. Based on the principle of partial subgraph matching,
and by going through the permutation and equinumerosity theorem checking process, we
can conclude that the subgraphs with vertex combinations (1, 3,
Table 4.17: Triple tuples of graph c.

Edge ID  Node1 ID  Node2 ID
1        1         2
2        2         3
3        3         4

Table 4.18: Triple tuples of graph C.

Edge ID  Node1 ID  Node2 ID
1        1         2
2        2         3
4        1         3

Table 4.19: c.

    v1  v2  v3  v4
v1  0   1   0   0
v2  1   0   1   0
v3  0   1   0   1
v4  0   0   1   0

Table 4.20: Vertex adjacency matrix of c′

    v1′ v2′ v3′ v5′
v1  0   1   1   0
v2  1   0   1   0
v3  1   1   0   0
v5  0   0   0   0

Table 4.21: Edge adjacency matrix of c′

    e1  e2  e3
e1  0   1   0
e2  1   0   1
e3  0   1   0

Table 4.22: Edge adjacency matrix of C

     e1′ e2′ e4′
e1′  0   1   1
e2′  1   0   1
e4′  1   1   0

Table 4.23: Computation results of the vertex adjacency matrix for graph c

Sum of row  1st power  2nd power  3rd power  4th power
v1          1          1          1          1
v2          2          4          8          16
v3          2          4          8          16
v4          1          1          1          1
Sum         6          10         18         34

Table 4.24: Computation results of the vertex adjacency matrix for graph c′

Sum of row  1st power  2nd power  3rd power  4th power
v1′         2          4          8          16
v2′         2          4          8          16
v3′         2          4          8          16
v5′         0          0          0          0
Sum         6          12         24         48

4, 5) and (2, 3, 4, 5), with edge IDs (1, 2, 3), (1, 3, 4) and (2, 3, 5), in Figure 4-6c are
isomorphic to graph c (Figure 4-6a). Readers are referred to Section 3.6.2 of the previous
chapter for more details of the calculation process.
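The power sums compared in Tables 4.23 and 4.24 can be reproduced from the matrices of Tables 4.19 and 4.20. This is a sketch assuming plain list-of-lists matrices; the helper name is illustrative.

```python
def power_sums(adj):
    """Sum of the k-th powers of the row sums, for k = 1 .. n."""
    row = [sum(r) for r in adj]
    return [sum(s ** k for s in row) for k in range(1, len(adj) + 1)]

adj_c = [[0, 1, 0, 0],   # Table 4.19
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
adj_c1 = [[0, 1, 1, 0],  # Table 4.20: candidate c' on vertices (1, 2, 3, 5)
          [1, 0, 1, 0],
          [1, 1, 0, 0],
          [0, 0, 0, 0]]
print(power_sums(adj_c))   # [6, 10, 18, 34], matching Table 4.23
print(power_sums(adj_c1))  # [6, 12, 24, 48], matching Table 4.24
```

The arrays already differ at the second power (10 vs 12), so this candidate fails the permutation condition.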

4.3 Summary
In this chapter, we introduced the concept of subgraph isomorphism and then discussed
subgraph matching problems in which both partial and induced subgraph scenarios
are considered. On the theoretical basis of the permutation and equinumerosity theorems
proposed in Chapter 3, the experimental results suggest that our proposed subgraph
isomorphism algorithm achieves good solution quality and is effective in both
situations.

Chapter 5

Quantitative Graph Distance


Measurement

In Chapters 3 and 4, we discussed undirected and directed graph and subgraph
isomorphism verification. There, the distance between any two graphs takes only two
values: 0 and 1. This representation, however, fails to show the real distance between
two graphs. A real distance could express the degree of difference, and its range should
be [0, 1] rather than only {0, 1}. In this chapter, we propose a quantitative graph
distance measurement that extends the distance between two graphs from {0, 1} to [0, 1].
Because the two graphs may have different vertices and edges, we transform them into
two graphs with the same number of vertices and edges, so that they become comparable.
Our method defines a synopsis quantitative distance (or dissimilarity) measure for
approximate matching.
The distance calculation between different chemical structures will be used as the
running example in this chapter. Figuring out the connection between structure
and function is an essential problem in biology, where biological structures are
usually represented by graphs. Graph-isomorphism-based representation is making
inroads in bioinformatics: chemical compound identification can be converted into a
graph querying problem, a hard problem pressing for a solution.
We adopt adjacency-matrix graph representations, simplifying molecular structures
to detect isomorphic topological patterns and to further improve substructure
retrieval efficiency. Inspired by the graph isomorphism algorithm [54] and the SMILES
representation, we created another representation for chemical compounds, as
follows.

Flow chart of Graph Isomorphism Matching.

In this section, we illustrate the procedure of our proposed Quantitative Graph Dis-
tance model in detail. The fundamental structure of this algorithm is depicted in
Figure 5-1.

Figure 5-1: Flow chart of Quantitative Graph Distance algorithm

This chapter is organised as follows. Section 5.1 describes the Permutation Topo-
logical Distance calculation process of the vertex and edge adjacency matrices. In
Section 5.2, we will then describe the equinumerosity topological distance calculation
process of the vertex and edge adjacency matrices. In Section 5.3, we illustrate the
procedure of constructing the vertex and edge adjacency matrix of labelled graphs.
In Section 5.4, the complexity of our proposed graph topological distance algorithm and
a number of real-world applications are discussed. The work of this chapter is
summarised in Section 5.5.

5.1 Permutation Topological Distance


The permutation distance between two arrays is the Euclidean distance computed over
the sum, the squared sum, and so on up to the nth-power sum, where n indicates the
size of the corresponding vertex/edge adjacency matrices. If the distance is 0, they are
isomorphic. If a graph's array of row sums of the vertex/edge adjacency matrix is a
permutation of that of another graph, then the two graphs are isomorphic. The
computational procedure of the Permutation Topological Distance is as follows:
1. Construct an edge-vertex map: 〈ID of edge e : (starting point of edge e, ending
point of edge e)〉.

2. Construct a vertex-edge map based on the edge-vertex map: 〈ID of vertex v :
all adjacent edges of vertex v〉.

3. The row sum sequence of the vertex adjacency matrix is determined according
to the vertex-edge map.

4. The row sum sequence of the edge adjacency matrix is determined according to
the edge-vertex map.

5. Obtain the sum of the rows, the sum of the squares of the rows, and so on up to
the sum of the nth powers of the rows for the vertex adjacency matrix, and likewise
up to the mth power for the edge adjacency matrix, as shown in Tables 5.1 and 5.2.

Table 5.1: Row sums of vertex matrices

Power of row sum   Vertex adjacency matrix of G1   Vertex adjacency matrix of G2
1st power          α1                              α1′
2nd power          α2                              α2′
...                ...                             ...
(n − 1)th power    αn−1                            α′n−1
nth power          αn                              αn′

Note, the numbers n and m in Tables 5.1 and 5.2 indicate the sizes of the vertex
and edge adjacency matrices, respectively.
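Steps 1-2 above build the two maps from the triple-tuple edge list (the format of Tables 4.11 and 4.17); a minimal sketch with illustrative names:

```python
def build_maps(triples):
    """triples: (edge_id, node1_id, node2_id) rows.
    Returns the edge-vertex map and the vertex-edge map derived from it."""
    edge_vertex = {e: (u, v) for e, u, v in triples}
    vertex_edge = {}
    for e, (u, v) in edge_vertex.items():
        vertex_edge.setdefault(u, []).append(e)
        vertex_edge.setdefault(v, []).append(e)
    return edge_vertex, vertex_edge

# Triple tuples of graph c from Table 4.17.
ev, ve = build_maps([(1, 1, 2), (2, 2, 3), (3, 3, 4)])
print(ev)  # {1: (1, 2), 2: (2, 3), 3: (3, 4)}
print(ve)  # {1: [1], 2: [1, 2], 3: [2, 3], 4: [3]}
```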
Euclidean distance can easily compute the dissimilarity or distance between two
candidates. The formula 5.1 shows the row sum distance of vertex adjacency matrices:

Table 5.2: Row sums of edge matrices

Power of row sum   Edge adjacency matrix of G1   Edge adjacency matrix of G2
1st power          β1                            β1′
2nd power          β2                            β2′
...                ...                           ...
(m − 1)th power    βm−1                          β′m−1
mth power          βm                            βm′

$$ S_1 = \sqrt{\frac{(\alpha_1-\alpha'_1)^2 + (\alpha_2-\alpha'_2)^2 + \cdots + (\alpha_n-\alpha'_n)^2}{\alpha_1^2 + {\alpha'_1}^2 + \alpha_2^2 + {\alpha'_2}^2 + \cdots + \alpha_n^2 + {\alpha'_n}^2}} \tag{5.1} $$

The formula 5.2 shows the row sum distance of edge adjacency matrices:
$$ S_2 = \sqrt{\frac{(\beta_1-\beta'_1)^2 + (\beta_2-\beta'_2)^2 + \cdots + (\beta_m-\beta'_m)^2}{\beta_1^2 + {\beta'_1}^2 + \beta_2^2 + {\beta'_2}^2 + \cdots + \beta_m^2 + {\beta'_m}^2}} \tag{5.2} $$

If S1 = 0 and S2 = 0, the calculation should continue. If S1 ≠ 0 or S2 ≠ 0, the
final distance is S = 0.5 ∗ S1 + 0.5 ∗ S2 .
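Formulas 5.1 and 5.2 are the same normalised Euclidean distance applied to the two power-sum arrays; a direct translation into Python (a sketch, assuming the arrays are already computed as in the tables above):

```python
from math import sqrt

def permutation_distance(a, b):
    """Normalised Euclidean distance between two power-sum arrays,
    per formulas 5.1 and 5.2; 0 means the permutation condition holds."""
    num = sum((x - y) ** 2 for x, y in zip(a, b))
    den = sum(x ** 2 + y ** 2 for x, y in zip(a, b))
    return sqrt(num / den) if den else 0.0

print(permutation_distance([6, 10, 18, 34], [6, 10, 18, 34]))      # 0.0
print(permutation_distance([6, 10, 18, 34], [6, 12, 24, 48]) < 1)  # True
```

The denominator bounds the result, so the distance always lies in [0, 1).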

The pseudo-code for the Permutation Topological Distance Measurement is shown in
Algorithm 5.

Algorithm 5 Permutation Topological Distance Measurement

Input: Vertex adjacency matrices A and B, edge adjacency matrices C and D of two
candidates
Output: Permutation Topological Distance
for i = 1 to n, j = 1 to m do
if rowsum(A)i = rowsum(B)i and rowsum(C)j = rowsum(D)j then
S1 = 0
S2 = 0
go to Algorithm 6
else
compute S1 and S2 by formulas 5.1 and 5.2
return S = 0.5 ∗ S1 + 0.5 ∗ S2
end if
end for

5.2 Equinumerosity Topological Distance
This section presents the computational procedure of the equinumerosity topological
distance, which contains two parts: 1) the equinumerous topological distance of singular
values and 2) the equinumerous topological distance of the left and right singular matrices.

5.2.1 Equinumerous topological distance of singular values

The computing process of the equinumerous topological distance of the singular value
sequence is as follows:

1. With the eigendecomposition of the vertex and edge matrices, we convert a singular
value sequence into an equinumerous sequence. First, group the singular
values of matrix S that share the same value; the number of elements in each
group is termed a P -multiple eigenvalue (P > 1). (This step is named the mapping
function.)

For example, consider the singular value sequence:

(3.6, 3.6, 3.2, 3.2, 2.4, 2.4, 2.4, 2.4, 2.4, 2.4).

〈singular value : indices of the corresponding singular value〉: {〈3.6: (1, 2)〉, 〈3.2:
(3, 4)〉, 〈2.4: (5, 6, 7, 8, 9, 10)〉}.

〈frequency of singular values : total number of values with the corresponding
frequency〉: {〈2: 4〉, 〈6: 6〉}.

Equinumerous sequence: {21, 22, 23, 24, 61, 62, 63, 64, 65, 66}.

Calculate the sums of the 1st through 10th powers of the equinumerous sequence:
471, 26241, 1585521, 99018069, 6263167401, 398174714421, 25371059927481,
1618644679753029, 103360482666590121, 6605253359089495701.

2. Perform step 1) on the corresponding vertex and edge matrices of the two given
graphs to obtain the power sums of the equinumerous sequences of their singular
values, respectively.
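One way to read the mapping-function example of step 1 is: group the singular values, bucket them by multiplicity, and concatenate each multiplicity with a running index. A sketch under that reading (the construction is the thesis's; the decimal-concatenation detail is an assumption inferred from the example):

```python
from collections import Counter

def equinumerous_sequence(singular_values):
    """Bucket singular values by multiplicity, then emit, for each multiplicity f
    covering t values in total, the numbers f1, f2, ..., ft (multiplicity
    concatenated with a running index)."""
    multiplicity = Counter(singular_values)   # e.g. {3.6: 2, 3.2: 2, 2.4: 6}
    per_freq = Counter()                      # values covered per multiplicity
    for m in multiplicity.values():
        per_freq[m] += m                      # e.g. {2: 4, 6: 6}
    seq = []
    for f in sorted(per_freq):
        seq += [int(f"{f}{i}") for i in range(1, per_freq[f] + 1)]
    return seq

seq = equinumerous_sequence([3.6, 3.6, 3.2, 3.2, 2.4, 2.4, 2.4, 2.4, 2.4, 2.4])
print(seq)  # [21, 22, 23, 24, 61, 62, 63, 64, 65, 66]
print(sum(seq), sum(s ** 2 for s in seq), sum(s ** 3 for s in seq))  # 471 26241 1585521
```

The first three power sums reproduce the values listed in step 1.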

Table 5.3: Power sums of the equinumerous eigenvalue arrays of vertex matrices

Power of equinumerous array of eigenvalues   Vertex adjacency matrix of G1   Vertex adjacency matrix of G2
1st power          γ1                              γ1′
2nd power          γ2                              γ2′
...                ...                             ...
(n − 1)th power    γn−1                            γ′n−1
nth power          γn                              γn′

Table 5.4: Power sums of the equinumerous eigenvalue arrays of edge matrices

Power of equinumerous array of eigenvalues   Edge adjacency matrix of G1   Edge adjacency matrix of G2
1st power          δ1                              δ1′
2nd power          δ2                              δ2′
...                ...                             ...
(m − 1)th power    δm−1                            δ′m−1
mth power          δm                              δm′

The formula for the equinumerous sequence distance of the singular values of the vertex
adjacency matrix is as follows:

$$ S_3 = \sqrt{\frac{(\gamma_1-\gamma'_1)^2 + (\gamma_2-\gamma'_2)^2 + \cdots + (\gamma_n-\gamma'_n)^2}{\gamma_1^2 + {\gamma'_1}^2 + \gamma_2^2 + {\gamma'_2}^2 + \cdots + \gamma_n^2 + {\gamma'_n}^2}} \tag{5.3} $$

The formula for the equinumerous sequence distance of the singular values of the edge
adjacency matrix is as follows:

$$ S_4 = \sqrt{\frac{(\delta_1-\delta'_1)^2 + (\delta_2-\delta'_2)^2 + \cdots + (\delta_m-\delta'_m)^2}{\delta_1^2 + {\delta'_1}^2 + \delta_2^2 + {\delta'_2}^2 + \cdots + \delta_m^2 + {\delta'_m}^2}} \tag{5.4} $$

If S3 = 0 and S4 = 0, the calculation should continue. If S3 ≠ 0 or S4 ≠ 0, the
final distance is S = 0.25 ∗ S1 + 0.25 ∗ S2 + 0.25 ∗ S3 + 0.25 ∗ S4 = 0.25 ∗ S3 + 0.25 ∗ S4 ,
since S1 = S2 = 0 at this stage.

5.2.2 Equinumerous topological distance of left and right singular matrices

The detailed computing process of the equinumerous topological distance of the
left and right singular matrices is as follows:

1. Get the P -multiple eigenvalues of the vertex adjacency matrices of candidate 1
and candidate 2 sequentially, noted as (value, num) and (value′, num′), where
value is the P -multiple eigenvalue and num is its multiplicity.

• If no P -multiple eigenvalue exists in candidate 1 or candidate 2, then
move to the last step and return the results.

• If num ̸= num′ : the algorithm returns to step 1) to check the next P -multiple
eigenvalue.

• If num = num′ :

1. Extracting P columns from the left singular matrix U according to the corresponding
indices of the P -multiple eigenvalues after singular value decomposition
of the vertex adjacency matrix of candidate 1, noted as UP .

2. Extracting P columns from the left singular matrix U according to the corresponding
indices of the P -multiple eigenvalues after singular value decomposition
of the vertex adjacency matrix of candidate 2, noted as UP′ .

3. Extracting P rows from the right singular matrix V according to the corresponding
indices of the P -multiple eigenvalues after singular value decomposition
of the vertex adjacency matrix of candidate 1, noted as VP .

4. Extracting P rows from the right singular matrix V according to the corresponding
indices of the P -multiple eigenvalues after singular value decomposition
of the vertex adjacency matrix of candidate 2, noted as VP′ .

2. Performing Gaussian elimination on the maximally independent vector sets of
the corresponding P -multiple eigenvalues of the left and right singular vectors of
candidate 1 and candidate 2, namely UP , UP′ , VP and VP′ .

3. The calculation should continue until the comparisons of all P -multiple
eigenvectors of the P -multiple eigenvalues between candidate 1 and candidate 2
are complete.

If the left singular vectors of the vertex adjacency matrices of the two candidates
are equinumerous, then S5 = 0; otherwise, S5 = 1. If the right singular vectors of
the vertex adjacency matrices of the two candidates are equinumerous, then S6 = 0;
otherwise, S6 = 1. If S5 = 0 and S6 = 0, the calculation should be continued. If S5 ̸= 0 or
S6 ̸= 0, the final distance is S = 0.125 ∗ S1 + 0.125 ∗ S2 + 0.125 ∗ S3 + 0.125 ∗ S4 +
0.125∗S5 +0.125∗S6 = 0+0+0+0+0.125∗S5 +0.125∗S6 = 0.125∗S5 +0.125∗S6 .

4. Similarly, we can apply the same calculating process to the edge adjacency
matrix, thus obtaining the equinumerous topological distances of the left and right
singular matrices of the edge adjacency matrix, S7 and S8 .

If the left singular vectors of the edge adjacency matrices of the two candidates
are equinumerous, then S7 = 0; otherwise, S7 = 1. If the right singular vectors of the
edge adjacency matrices of the two candidates are equinumerous, then S8 = 0; otherwise,
S8 = 1. If S7 = 0 and S8 = 0, the calculation should be continued. If S7 ̸= 0
or S8 ̸= 0, the final distance is S = 0.125 ∗ S1 + 0.125 ∗
S2 + 0.125 ∗ S3 + 0.125 ∗ S4 + 0.125 ∗ S5 + 0.125 ∗ S6 + 0.125 ∗ S7 + 0.125 ∗ S8 =
0 + 0 + 0 + 0 + 0 + 0 + 0.125 ∗ S7 + 0.125 ∗ S8 = 0.125 ∗ S7 + 0.125 ∗ S8 .

5. If two graphs are isomorphic, the topological distance is 0. The similarity is 1.
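The Gaussian elimination of step 2 can be sketched as below; the names rref and same_span are ours, and we model the equinumerosity check of the extracted vector sets UP and UP′ as comparing their column spaces via reduced row echelon forms (a canonical form, so equal spans yield identical RREFs). The thesis' exact comparison criterion may differ:

```python
import numpy as np

def rref(M, tol=1e-10):
    """Gauss-Jordan elimination to reduced row echelon form."""
    A = np.array(M, dtype=float)
    r = 0
    for c in range(A.shape[1]):
        if r == A.shape[0]:
            break
        p = r + int(np.argmax(np.abs(A[r:, c])))  # partial pivoting
        if abs(A[p, c]) < tol:
            continue                              # no pivot in this column
        A[[r, p]] = A[[p, r]]
        A[r] /= A[r, c]
        for i in range(A.shape[0]):
            if i != r:
                A[i] -= A[i, c] * A[r]
        r += 1
    return A

def same_span(U_p, U_p2):
    """True if the two column sets span the same subspace."""
    return bool(np.allclose(rref(np.array(U_p).T), rref(np.array(U_p2).T)))
```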

The pseudo-code for the Equinumerosity Topological Distance Measurement is shown in
Algorithm 6.
The distance based on the permutation theorem [54] counts for a weight of 0.5. If
two graphs are isomorphic according to the equinumerosity theorem [54], the 0.5 is
multiplied by 0. Otherwise, the 0.5 is multiplied by the Euclidean distance based on the equinumerosity

Algorithm 6 Equinumerosity Topological Distance Measurement
Input: Vertex adjacency matrices A and B, edge adjacency matrices C and D of two
candidates
Output: Equinumerosity Topological Distance
A = Uv1 Σv1 Vv1ᵀ
B = Uv2 Σv2 Vv2ᵀ
C = Ue1 Σe1 Ve1ᵀ
D = Ue2 Σe2 Ve2ᵀ
if map(Σv1 ) = map(Σv2 ) & map(Σe1 ) = map(Σe2 ) then
S3 = 0
S4 = 0
if map(Uv1 ) = map(Uv2 ) & map(Vv1 ) = map(Vv2 ) then
S5 = 0
S6 = 0
if map(Ue1 ) = map(Ue2 ) & map(Ve1 ) = map(Ve2 ) then
S=0
else
S = 0.125 ∗ S7 + 0.125 ∗ S8
end if
else
S = 0.125 ∗ S5 + 0.125 ∗ S6
end if
else
S = 0.25 ∗ S3 + 0.25 ∗ S4
end if
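A compressed Python view of the singular-value test at the top of Algorithm 6; we model the map(·) comparison as comparing sorted singular values with multiplicity (the function name is ours, and the vector-set checks behind S5–S8 are omitted):

```python
import numpy as np

def singular_values_equinumerous(A, B, tol=1e-8):
    """True if A and B share the same multiset of singular values."""
    sa = np.sort(np.linalg.svd(np.array(A, dtype=float), compute_uv=False))
    sb = np.sort(np.linalg.svd(np.array(B, dtype=float), compute_uv=False))
    return sa.shape == sb.shape and bool(np.allclose(sa, sb, atol=tol))
```

For adjacency matrices of isomorphic graphs the check is necessarily true, since relabelling vertices permutes rows and columns without changing the singular values.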

theorem [20]. The mathematical proof is shown in 3.3.2. Therefore, the range of the
topological distance should be (0, 1).
The value of the quantitative graph distance measurement is the dissimilarity value
for every pair of graphs, and the formulas mentioned above are the corresponding
distance functions. The calculated difference between two graphs therefore indicates
the degree to which the two graphs are non-isomorphic, offering a detailed and
accurate measurement of graph isomorphism. The design of the topological distance
measurement follows the Euclidean distance between two graphs. Furthermore, a real-time
application is shown in 5.4.

5.3 Graphs with Labeled Nodes

Labels are a kind of naming, and so labelled nodes are either present or absent
in graphs. We discussed the shape-isomorphism scenarios in the previous chapters
because, without labelling, the only thing that differentiates two graphs is their shape;
thus the number of unlabeled graphs with n vertices is the number of graph shapes
with n vertices. Graphs with labelled nodes, however, are commonly used in practical
applications. If you are modelling a situation in which nodes represent something
distinguishable, it matters how those things are connected by whatever the
edges represent, and labelled vertices characterise different objects. Two graphs
can have the same shape but not be the same. Take Chloroethane,
for example, which is shown in Figure 5-2a: if we split the chemical structure of
Chloroethane down the middle and do not label the left node as ‘H’ and the right
node as ‘Cl’, the two halves are symmetrical. That is, without labelled nodes, these two
possible subgraphs are isomorphic.
Chemical compounds can be represented as graph structures in which the atoms
represent the nodes, and the bonds represent the links.
The SMILES code for Chloroethane is CCCl and Propanol is CCCO (it can also
be written as OCCC). Before calculating the distance between Chloroethane and
Propanol, we have to convert the SMILES code into graph form. We summarize

(a) Chloroethane (b) Propanol

Figure 5-2: Graphs with labeled nodes

several problems that need to be solved as follows:

1. How to deal with SMILES strings containing different attributes of nodes;

2. How to identify different chemical compounds that are represented by many different

but equally valid SMILES strings;

3. How to compare two SMILES strings with dissimilar numbers of atoms, that is,
two graphs with different numbers of nodes;

4. How to compare two SMILES strings with dissimilar numbers of bonds, that is,
two graphs with different numbers of edges.

In the next section, we will solve the problems above by considering characteristic
chemical instances.

5.4 Performance Studies and Applications

In this section, we evaluate the computational complexity of our proposed quantitative
graph distance measurement while using some examples from chemistry to
illustrate the discussion. We also take vertex attributes into consideration in these
instances. The algorithm can also be made applicable to graphs with edge attributes.

5.4.1 Complexity Analysis

The calculation process of the topological structure comparison between two given graphs
is shown below:

1. Based on the two supplied graphs, generate the vertex and edge adjacency matrices.
If the number of vertices is n, then the space complexity for a
graph is O(n⁴).

2. The Permutation Topological Distance Measurement: the permutation topological
distance measurement process is shown in Algorithm 5.

S1 = Euclidean distance between the pairwise permutation arrays of row sums
of the vertex adjacency matrices. S2 = Euclidean distance between the pairwise
permutation arrays of row sums of the edge adjacency matrices. The temporal complexity for
calculating the row sums of the vertex adjacency matrix and the edge adjacency
matrix is O(n⁶).

3. The Equinumerosity Topological Distance Measurement: the equinumerosity
topological distance measurement process is shown in Algorithm 6.

S3 = Euclidean distance between the equinumerosity arrays of eigenvalues of
the two vertex adjacency matrices. S4 = Euclidean distance between the equinumerosity
arrays of eigenvalues of the two edge adjacency matrices. The temporal
complexity for calculating the eigenvalues of the vertex adjacency matrix and
the edge adjacency matrix is O(n³).

S5 = Euclidean distance between the equinumerosity arrays of the left singular
vectors of the two vertex adjacency matrices. S6 = Euclidean distance between the
equinumerosity arrays of the right singular vectors of the two vertex adjacency matrices.
S7 = Euclidean distance between the equinumerosity arrays of the left singular
vectors of the two edge adjacency matrices. S8 = Euclidean distance between the
equinumerosity arrays of the right singular vectors of the two edge adjacency matrices.
The temporal complexity for calculating the singular vectors of the vertex adjacency
matrix and the edge adjacency matrix is O(n³).
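Step 1, building the vertex and edge adjacency matrices from a triple-tuple edge list, can be sketched as below (the function name is ours; vertices are 1-indexed as in the tables of Section 5.4.2, and two edges are taken to be adjacent when they share an endpoint):

```python
import numpy as np

def adjacency_matrices(n, edges):
    """Vertex (n x n) and edge (m x m) adjacency matrices from an edge list."""
    m = len(edges)
    V = np.zeros((n, n), dtype=int)
    E = np.zeros((m, m), dtype=int)
    for u, v in edges:
        V[u - 1, v - 1] = V[v - 1, u - 1] = 1
    for i in range(m):
        for j in range(i + 1, m):
            if set(edges[i]) & set(edges[j]):  # edges share an endpoint
                E[i, j] = E[j, i] = 1
    return V, E
```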

A detailed discussion of various practical instances may be found in the next
section.

5.4.2 Applications

The extension of the application domain for our proposed graph isomorphism algorithm
includes numerous fields: chemical data analysis, computational biology
and drug discovery. Chemical compound identification can be converted into
(sub)graph querying problems, where a node and an edge correspond to an atom
and a chemical bond, respectively; this can facilitate the process of drug design. An
assumption is that the chemical compound structure is correlated with its properties
and biological activities, a powerful concept in drug discovery research known as the
structure-activity relationship (SAR) principle. Therefore, graph isomorphism can
benefit the uncovering of chemical and biological characteristics such as stability, toxicity,
metabolism, absorption and activity. An essential feature of chemical data graphs
is that they are relatively small, with each node and edge carrying a distinct label (the
names of the corresponding atoms and bonds may be repeated many times in one molecule).

Equivalent SMILES Strings

Chemical compounds may be represented in a variety of ways, including fingerprint
representations, SMILES string representations, 3D structures, and other representations.
SMILES data may be organised in a variety of data storage formats; it is
extremely compact and readily saved and read. SMILES,
the “Simplified Molecular Input Line Entry System,” is used to convert a
chemical’s three-dimensional structure into a line formula. A SMILES string is often
used in chemistry to represent the connections between elements. It seeks to describe
the essential information contained in a molecular system, such as elements, connectivity,
and connection attributes, in the shortest possible formula. Using SMILES
strings for the description of chemical compounds is very convenient.
SMILES are generated from a depth-first traversal of the corresponding molecular

111
graph. Different starting atoms and different neighbour-atom traversals will yield
different SMILES strings. Generally, one molecular structure has a unique
graph but a number of equally valid SMILES representations. For example, the
structure of ethanol could be specified as CCO, OCC or C(O)C. The process of
generating canonical SMILES by a canonicalisation algorithm is inconsistent among
chemical toolkits, so it is hard to tell whether two given SMILES strings represent the
same chemical structure. A SMILES string encodes a molecule, but the structure
search problem cannot be transformed into a simple string matching problem. For example,
the chemical structure of Aspirin is shown in Figure 5-3:

Figure 5-3: Chemical structure for Aspirin

The chemical structure for Aspirin (C9H8O4) can be encoded into 10 very different SMILES
strings, shown below:

1. C1CCCC(C1OC(=O)C)C(O)=O

2. C(=O)(C1CCCCC1OC(C)=O)O

3. C1C(C(OC(=O)C)CCC1)C(=O)O

4. C1(OC(C)=O)C(CCCC1)C(O)=O

5. C1CC(C(C(O)=O)CC1)OC(C)=O

6. C1C(C(=O)O)C(OC(C)=O)CCC1

7. C1(C(CCCC1)C(O)=O)OC(C)=O

8. C1CCCC(C(=O)O)C1OC(=O)C

9. C1CCCC(C1C(=O)O)OC(C)=O

10. C1CC(C(CC1)C(=O)O)OC(C)=O

Transformed graphs generation

This algorithm exploits molecular topological similarity and functional compound
similarity, which can be made reliable via the application of the concept of bioisosterism.

Algorithms designed for chemical data need to take into account
repetitions in node and edge labels, which is a knotty problem pressing for a solution.
The design philosophy is to identify the structural symbols and the chemical atom and
bond symbols and convert the corresponding chemical label information into structural
information; the results are called transformed graphs. SMILES expressions are
transformed into undirected graphs, and vertex and edge adjacency matrices are then
generated from the graphs. Chemical constitution matching verification in this study also uses two
theorems: the permutation theorem and the equinumerosity theorem.

The methodology for generating a transformed graph is to create two lists: an

index list and a SMILES element list. That is, the features of atoms
and bonds are projected onto a vertex and an edge adjacency matrix. The index list must include all
unduplicated items from the union of the two given SMILES expressions. In the
index list, all atom symbols and bond characteristics are represented by vertices. A
SMILES list comprises all of the atoms and bonds in a SMILES expression. If the
lengths of the two SMILES lists are mismatched, empty vertices and edges
must be added to the shorter list to ensure that they are of equal length.
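A toy illustration of the two lists (all names are ours, and the tokenizer handles only a tiny SMILES fragment: upper-case single-letter atoms, Cl/Br, bond symbols and ring-closure digits; the real SMILES grammar is considerably richer):

```python
import re

def tokenize(smiles):
    """Split a simple SMILES string into atom/bond/ring-closure tokens."""
    return re.findall(r"Cl|Br|[A-Z]|=|#|\d", smiles)

def build_lists(smiles_a, smiles_b):
    ta, tb = tokenize(smiles_a), tokenize(smiles_b)
    index_list = sorted(set(ta) | set(tb))       # unduplicated union
    width = max(len(ta), len(tb))
    pad = lambda t: t + [""] * (width - len(t))  # empty vertices for the shorter list
    return index_list, pad(ta), pad(tb)
```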

The next section presents 4 chemical instances to further illustrate our graph
distance measurement method, which the aforementioned algorithm can support. The
calculation process of the algorithm is shown below:

Case Study 1 - The Identification of Multiple SMILES Notations for One
Chemical Compound

Molecules with identical formulas may not have a unique structure; such variations
in the structure of a molecule are referred to as isomers. Geometrical isomers
need to be distinguished from each other, and checking these structural isomers is
a challenging problem. In this set of experiments, we look into the algorithm's
performance at identifying equivalent SMILES strings for Oxetane. Oxetane can be
represented by four equivalent SMILES strings: O1CCC1, C1OCC1, C1COC1
and C1CCO1. The presence of cycles makes these cases hard to deal with; the isomer graphs
are generated as in Figure 5-5.

Figure 5-4: Chemical structure for Oxetane.

(a) O1CCC1 (b) C1COC1

Figure 5-5: Isomer graph generation based on SMILES formulas.

Based on the permutation theorem, the row sums of the vertex adjacency matrices of
O1CCC1 and C1COC1 are both [18, 56, 180], and the row sums of their edge adjacency
matrices are likewise equal. In this section, we only carry the row-sum calculation up
to the third power, and we refer the reader to 3.6.2 for more computational details;
the two graphs are therefore permutated.
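The tuple [18, 56, 180] can be reproduced from Table 5.7: take the row sums rᵢ of the vertex adjacency matrix and report Σrᵢ, Σrᵢ² and Σrᵢ³ (a sketch under that reading of the row-sum arrays):

```python
import numpy as np

# Vertex adjacency matrix of O1CCC1 (Table 5.7)
A = np.array([
    [0, 1, 1, 1, 0, 1],
    [1, 0, 0, 0, 1, 0],
    [1, 0, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0],
])
r = A.sum(axis=1)  # row sums: [4, 2, 3, 3, 3, 3]
power_tuple = [int((r ** k).sum()) for k in (1, 2, 3)]  # → [18, 56, 180]
```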

Table 5.5: Triple tuple of O1CCC1

Edge ID   Node1 ID   Node2 ID
1         1          2
2         3          4
3         4          5
4         5          6
5         3          6
6         1          3
7         1          5
8         2          4
9         2          6

Table 5.6: Triple tuple of C1COC1

Edge ID   Node1 ID   Node2 ID
1         1          2
2         3          4
3         4          5
4         5          6
5         3          6
6         1          5
7         2          3
8         2          4
9         2          6

Table 5.7: Vertex adjacency matrix of O1CCC1

     v1  v2  v3  v4  v5  v6
v1   0   1   1   1   0   1
v2   1   0   0   0   1   0
v3   1   0   0   1   0   1
v4   1   0   1   0   1   0
v5   0   1   0   1   0   1
v6   1   0   1   0   1   0

Table 5.8: Vertex adjacency matrix of C1COC1

     v1′ v2′ v3′ v4′ v5′ v6′
v1′  0   1   1   1   0   1
v2′  1   0   0   0   1   0
v3′  1   0   0   1   0   1
v4′  1   0   1   0   1   0
v5′  0   1   0   1   0   1
v6′  1   0   1   0   1   0

Table 5.9: Edge adjacency matrix of O1CCC1.

e1 e2 e3 e4 e5 e6 e7 e8 e9
e1 0 1 1 1 1 0 0 0 0
e2 1 0 1 1 0 1 1 0 0
e3 1 1 0 1 0 1 0 1 0
e4 1 1 1 0 0 0 1 0 1
e5 1 0 0 0 0 0 0 1 1
e6 0 1 1 0 0 0 1 1 0
e7 0 1 0 1 0 1 0 0 1
e8 0 0 1 0 1 1 0 0 1
e9 0 0 0 1 1 0 1 1 0

Table 5.10: Edge adjacency matrix of C1COC1
     e1′  e2′  e3′  e4′  e5′  e6′  e7′  e8′  e9′
e1′  0    1    1    1    1    0    0    0    0
e2′  1    0    1    1    0    1    1    0    0
e3′  1    1    0    1    0    1    0    1    0
e4′  1    1    1    0    0    0    1    0    1
e5′  1    0    0    0    0    0    0    1    1
e6′  0    1    1    0    0    0    1    1    0
e7′  0    1    0    1    0    1    0    0    1
e8′  0    0    1    0    1    1    0    0    1
e9′  0    0    0    1    1    0    1    1    0

Graphs G1 (O1CCC1) and G2 (C1COC1) satisfy the permutation theorem. The singular
values of the two matrices are equinumerous, and the maximally independent
vector sets of the corresponding P -multiple eigenvalues of the left and right singular
vectors of the vertex and edge adjacency matrices are equinumerous, so graphs G1 and G2
satisfy the equinumerosity theorem. Graphs G1 and G2 are therefore isomorphic.

Case Study 2 - Why Index List Matters?

We clarify why we add an index list to the graph generation procedure by means
of the examples in Figure 5-7. The index list maps corresponding attributes to each
vertex.
The chemical structures of Oxetane and Thietane are shown in Figures 5-4 and
5-6. It is evident that they share the same topological structure except for one node
with a different label. This is why we created an index list to the left of
the element list: this mapping relationship can distinguish nodes with different
types of atoms, and therefore we can calculate the distance between graphs not just at
their topological level.
The distance between Oxetane (O1CCC1) and Thietane (S1CCC1) is 0.064865.
The row sums of Oxetane and Thietane for the vertex adjacency matrix are [20, 62,
200] and [20, 60, 188], so S1 = 0.042078. The row sums of Oxetane and Thietane for
the edge adjacency matrix are [42, 184, 828] and [40, 168, 730], and the final distance is S = 0.064865.

Figure 5-6: Chemical structure for Thietane

(a) Oxetane (O1CCC1) (b) Thietane (S1CCC1)

Figure 5-7: Isomer graph generation based on SMILES formulas.

Table 5.11: Triple tuple of Oxetane (O1CCC1)

Edge ID   Node1 ID   Node2 ID
1         1          2
2         2          3
3         4          5
4         5          6
5         6          7
6         4          7
7         1          4
8         2          5
9         2          6
10        2          7

Table 5.12: Triple tuple of Thietane (S1CCC1)

Edge ID   Node1 ID   Node2 ID
1         1          2
2         2          3
3         4          5
4         5          6
5         6          7
6         4          7
7         3          4
8         2          5
9         2          6
10        2          7

Table 5.13: Vertex adjacency matrix of Oxetane (O1CCC1)

     v1  v2  v3  v4  v5  v6  v7
v1   0   1   0   1   1   0   1
v2   1   0   1   0   0   1   0
v3   0   1   0   0   0   0   0
v4   1   0   0   0   1   0   1
v5   1   0   0   1   0   1   0
v6   0   1   0   0   1   0   1
v7   1   0   0   1   0   1   0

Table 5.14: Vertex adjacency matrix of Thietane (S1CCC1)

     v1′ v2′ v3′ v4′ v5′ v6′ v7′
v1′  0   1   0   1   1   0   1
v2′  1   0   1   0   0   0   0
v3′  0   1   0   0   0   1   0
v4′  1   0   0   0   1   0   1
v5′  1   0   0   1   0   1   0
v6′  0   0   1   0   1   0   1
v7′  1   0   0   1   0   1   0

Table 5.15: Edge adjacency matrix of Oxetane (O1CCC1)

e1 e2 e3 e4 e5 e6 e7 e8 e9 e10
e1 0 1 1 1 1 1 0 0 0 0
e2 1 0 1 1 0 0 1 1 0 0
e3 1 1 0 1 0 0 1 0 1 0
e4 1 1 1 0 0 0 0 1 0 1
e5 1 0 0 0 0 1 0 0 0 0
e6 1 0 0 0 1 0 0 0 1 1
e7 0 1 1 0 0 0 0 1 1 0
e8 0 1 0 1 0 0 1 0 0 1
e9 0 0 1 0 0 1 1 0 0 1
e10 0 0 0 1 0 1 0 1 1 0

Table 5.16: Edge adjacency matrix of Thietane (S1CCC1)

e′1 e′2 e′3 e′4 e′5 e′6 e′7 e′8 e′9 e′10
e′1 0 1 1 1 1 0 0 0 0 0
e′2 1 0 1 1 0 0 1 1 0 0
e′3 1 1 0 1 0 0 1 0 1 0
e′4 1 1 1 0 0 0 0 1 0 1
e′5 1 0 0 0 0 1 0 0 0 0
e′6 0 0 0 0 1 0 0 0 1 1
e′7 0 1 1 0 0 0 0 1 1 0
e′8 0 1 0 1 0 0 1 0 0 1
e′9 0 0 1 0 0 1 1 0 0 1
e′10 0 0 0 1 0 1 0 1 1 0

Case Study 3 - Isomer Distance Measurement

In Figure 5-8, we show two chemical molecular structures of C3H6O for further explanation.
The atom list on the left side (marked in blue) represents the index list,
and the SMILES element list is on the right side (marked in purple). In this set
of experiments, we look into the algorithm's performance at identifying the SMILES
formulas for Propanal (CCC=O) and Methoxyethene (COC=C), respectively. In this
case, we treat the double bond as a labelled node.

(a) Propanal (b) Methoxyethene

Figure 5-8: Chemical structure for Propanal and Methoxyethene

(a) CCC=O (b) COC=C

Figure 5-9: Graph Generation for Propanal and Methoxyethene

Table 5.17: Triple tuple of Propanal (CCC=O)

Edge ID   Node1 ID   Node2 ID
1         1          2
2         2          3
3         4          5
4         5          6
5         6          7
6         7          8
7         1          4
8         1          5
9         1          6
10        2          7
11        3          8

Table 5.18: Triple tuple of Methoxyethene (COC=C)

Edge ID   Node1 ID   Node2 ID
1         1          2
2         2          3
3         4          5
4         5          6
5         6          7
6         7          8
7         1          4
8         1          6
9         3          5
10        2          7
11        1          8

The distance between Propanal (CCC=O) and Methoxyethene (COC=C) is 0.029399.

The row sums of Propanal and Methoxyethene for the vertex adjacency matrix are both
[18, 50, 150], so S1 = 0. The row sums of Propanal and Methoxyethene for the edge
adjacency matrix are [32, 124, 512] and [32, 120, 470], so S2 = 0.058798.

Table 5.19: Vertex adjacency matrix of Propanal (CCC=O)

     v1  v2  v3  v4  v5  v6  v7
v1   0   1   1   1   1   0   0
v2   1   0   0   0   0   0   1
v3   1   0   0   1   0   0   0
v4   1   0   1   0   1   0   0
v5   1   0   0   1   0   1   0
v6   0   0   0   0   1   0   1
v7   0   1   0   0   0   1   0

Table 5.20: Vertex adjacency matrix of Methoxyethene (COC=C)

     v1′ v2′ v3′ v4′ v5′ v6′ v7′
v1′  0   1   1   0   1   0   1
v2′  1   0   0   0   0   1   0
v3′  1   0   0   1   0   0   0
v4′  0   0   1   0   1   0   0
v5′  1   0   0   1   0   1   0
v6′  0   1   0   0   1   0   1
v7′  1   0   0   0   0   1   0

Table 5.21: Edge adjacency matrix of Propanal (CCC=O)

e1 e2 e3 e4 e5 e6 e7 e8 e9
e1 0 1 1 1 1 0 0 0 0
e2 1 0 1 1 0 1 0 0 0
e3 1 1 0 1 0 1 1 0 0
e4 1 1 1 0 0 0 1 1 0
e5 1 0 0 0 0 0 0 0 1
e6 0 1 1 0 0 0 1 0 0
e7 0 0 1 1 0 1 0 1 0
e8 0 0 0 1 0 0 1 0 1
e9 0 0 0 0 1 0 0 1 0

Table 5.22: Edge adjacency matrix of Methoxyethene (COC=C)

e′1 e′2 e′3 e′4 e′5 e′6 e′7 e′8 e′9


e′1 0 1 1 1 1 0 0 0 0
e′2 1 0 1 1 0 1 0 0 0
e′3 1 1 0 1 0 0 1 1 0
e′4 1 1 1 0 0 0 0 0 1
e′5 1 0 0 0 0 0 0 1 1
e′6 0 1 0 0 0 0 1 0 0
e′7 0 0 1 0 0 1 0 1 0
e′8 0 0 1 0 1 0 1 0 1
e′9 0 0 0 1 1 0 0 1 0

Case Study 4 - Chemical Compound with Different Number of Atoms

Chloroethane (CCCl) and Propanol (CCCO) are shown in Figure 5-2. Because the two
given SMILES strings are not equal in length, we must create an empty node and
two empty edges for Chloroethane, ensuring that the vertex and edge
adjacency matrices generated from the graphs in Figure 5-10 have the same dimensions,
which means adding all-zero rows and all-zero columns at the end of the corresponding
matrices.
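The padding step can be sketched with numpy (the helper name pad_to is ours); it appends all-zero rows and columns until the matrix reaches the target dimension:

```python
import numpy as np

def pad_to(M, size):
    """Append all-zero rows and columns until M is size x size."""
    M = np.array(M)
    extra = size - M.shape[0]
    return np.pad(M, ((0, extra), (0, extra)))  # constant mode pads with zeros
```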

(a) Chloroethane (CCCl) (b) Propanol (CCCO)

Figure 5-10: Graph Generation for Chloroethane and Propanol

Table 5.23: Triple tuple of Chloroethane (CCCl)

Edge ID   Node1 ID   Node2 ID
1         1          2
2         2          3
3         4          5
4         5          6
5         1          4
6         1          5
7         2          6

Table 5.24: Triple tuple of Propanol (CCCO)

Edge ID   Node1 ID   Node2 ID
1         1          2
2         2          3
3         4          5
4         5          6
5         6          7
6         1          4
7         1          5
8         1          6
9         3          7

The distance between Chloroethane (CCCl) and Propanol (CCCO) is 0.522018.


The row sums of Chloroethane and Propanol for vertex adjacency matrix are [14, 34,

Table 5.25: Vertex adjacency matrix of Chloroethane (CCCl)

     v1  v2  v3  v4  v5  v6  v7
v1   0   1   0   1   1   0   0
v2   1   0   1   0   0   0   0
v3   0   1   0   0   0   1   0
v4   1   0   0   0   1   0   0
v5   1   0   0   1   0   1   0
v6   0   0   1   0   1   0   0
v7   0   0   0   0   0   0   0

Table 5.26: Vertex adjacency matrix of Propanol (CCCO)

     v1′ v2′ v3′ v4′ v5′ v6′ v7′
v1′  0   1   0   1   1   1   0
v2′  1   0   1   0   0   0   1
v3′  0   1   0   0   0   0   0
v4′  1   0   0   0   1   0   0
v5′  1   0   0   1   0   1   0
v6′  1   0   0   0   1   0   1
v7′  0   1   0   0   0   1   0

Table 5.27: Edge adjacency matrix of Chloroethane (CCCl)

e1 e2 e3 e4 e5 e6 e7 e8 e9
e1 0 1 1 1 0 0 0 0 0
e2 1 0 1 0 0 1 0 0 0
e3 1 1 0 0 0 1 1 0 0
e4 1 0 0 0 1 0 0 0 0
e5 0 0 0 1 0 0 1 0 0
e6 0 1 1 0 0 0 1 0 0
e7 0 0 1 0 1 1 0 0 0
e8 0 0 0 0 0 0 0 0 0
e9 0 0 0 0 0 0 0 0 0

Table 5.28: Edge adjacency matrix of Propanol (CCCO)

e′1 e′2 e′3 e′4 e′5 e′6 e′7 e′8 e′9


e′1 0 1 1 1 1 1 0 0 0
e′2 1 0 1 1 0 0 1 0 0
e′3 1 1 0 1 0 0 1 1 0
e′4 1 1 1 0 0 0 0 1 1
e′5 1 0 0 0 0 1 0 0 0
e′6 1 0 0 0 1 0 0 0 1
e′7 0 1 1 0 0 0 0 1 0
e′8 0 0 1 1 0 0 1 0 1
e′9 0 0 0 1 0 1 0 1 0

86] and [18, 52, 162], so S1 = 0.401077. The row sums of Chloroethane and Propanol
for the edge adjacency matrix are [20, 60, 188] and [34, 138, 592], so S2 = 0.642959.

5.5 Summary
In this chapter, we explored the problem of distance between graphs with labelled
nodes. We elaborated on the theory of converting a chemical structure into the
corresponding vertex and edge adjacency matrix representations. Built upon
the permutation and equinumerosity theorems proposed in Chapter 3, we proposed
the permutation and equinumerosity distance measurements to quantify the two given
candidates. We considered several possible situations with instances of different
structural features, and concrete results are given.

Chapter 6

Epilogue

Graphs reflect potential relationships among different entities of data and have been
widely recognized as an effective mechanism in response to proliferating demands
across industry and academia, helping practitioners to make operational and strategic
decisions.
(Sub)graph isomorphism is one of the most important practical applications of
graph mining, and it has been widely applied in a diversity of real-world applications.
We propose a tripartite-architecture graph isomorphism-based algorithm to verify that
two graphs are identical, and a subgraph isomorphism-based approach to obtain query
subgraphs across huge graphs. First, we significantly reduce the computational complexity:
for graph isomorphism, by comparing the number of vertices and edges between two graphs;
and for subgraph isomorphism, by extracting a succession of subgraphs to be matched and then
comparing the number of vertices and edges between the putative isomorphic subgraphs
and the query graph. Following this, the vertex and edge matrices, including vertex and
edge positional, directional, and distance relations, are constructed. Then, using the
permutation theorem, the row sums of the vertex and edge adjacency matrices of the query
graph and the prospective sample are computed. According to the equinumerosity theorem,
we next determine whether the eigenvalues of the vertex and edge adjacency matrices of
the two graphs are equal. The topological distance can be computed based on graph
isomorphism, and subgraph isomorphism can be implemented after subgraph combination.

A wealth of applications can benefit from our subgraph matching technique.
Even though the applications mentioned in this thesis are derived from the
chemical domain, the technique can be commonly leveraged in other fields. Because of
the linear structure of SMILES notations, we parse these strings into geometrical
forms; namely, our proposed method is able to handle linear data structures by
translating them into graph structures, which can subsequently be translated into
two-dimensional matrix forms using the quantitative graph distance measurement.
Many approximate and sub-optimal polynomial algorithms have been proposed
lately, but they do not guarantee efficient polynomial-time solutions to graph
matching problems under every type of graph scenario. In order to enhance the
effectiveness of checking whether two graphs are isomorphic, this thesis puts forward a
polynomial-time algorithm to compute the metrics. Theoretical analysis of this algorithm
has been carried out on its time and space requirements, and it has been successfully
applied to measure the similarity between two structures. The computational complexity
is O(n⁶) in the worst case and O(n²) in the best case, where n is the number of
vertices. The experimental results show that our proposed method achieves a much lower
computational complexity than the state-of-the-art algorithms, demonstrating
that accuracy, effectiveness, and efficiency have been dramatically enhanced by
our proposed method. Our proposed algorithms can handle exact and
inexact graph matching. Therefore, the newly proposed subgraph isomorphism and
graph distance algorithms can spur further research of breakthrough significance
in the field of artificial intelligence theory.
This thesis studied graph isomorphism and graph distance problems and provided
solutions to some downstream applications. There are several potential research
directions to explore and further opportunities to build upon this research, such as the
maximal common subgraph search problem, which attempts to match the maximum
number of nodes between two graphs; extending the work of this thesis in that
direction is a promising research avenue.

Bibliography

[1] Chengyi Xia, Zhishuang Wang, Chunyuan Zheng, Quantong Guo, Yongtang
Shi, Matthias Dehmer, and Zengqiang Chen. A new coupled disease-awareness
spreading model with mass media on multiplex networks. Information Sciences,
471:185–200, 2019.

[2] Yang Liu, Yun Yuan, Jieyi Shen, and Wei Gao. Emergency response facility
location in transportation networks: a literature review. Journal of traffic and
transportation engineering (English edition), 8(2):153–169, 2021.

[3] Sancheng Peng, Yongmei Zhou, Lihong Cao, Shui Yu, Jianwei Niu, and Weijia
Jia. Influence analysis in social networks: A survey. Journal of Network and
Computer Applications, 106:17–32, 2018.

[4] Mikaela Koutrouli, Evangelos Karatzas, David Paez-Espino, and Georgios A
Pavlopoulos. A guide to conquer the biological network era using graph theory.
Frontiers in bioengineering and biotechnology, page 34, 2020.

[5] Fen Zhao, Yi Zhang, Jianguo Lu, and Ofer Shai. Measuring academic influence
using heterogeneous author-citation networks. Scientometrics, 118(3):1119–1140,
2019.

[6] Xingchen Li, Xiang Wang, Xiangnan He, Long Chen, Jun Xiao, and Tat-Seng
Chua. Hierarchical fashion graph network for personalized outfit recommendation.
In Proceedings of the 43rd International ACM SIGIR Conference on
Research and Development in Information Retrieval, pages 159–168, 2020.

[7] Lucia Cavallaro, Ovidiu Bagdasar, Pasquale De Meo, Giacomo Fiumara, and
Antonio Liotta. Graph and network theory for the analysis of criminal networks.
In Data Science and Internet of Things, pages 139–156. Springer, 2021.

[8] Stephen A Cook. The complexity of theorem-proving procedures. In Proceedings
of the third annual ACM symposium on Theory of computing, pages 151–158,
1971.

[9] Donatello Conte, Pasquale Foggia, Carlo Sansone, and Mario Vento. Thirty
years of graph matching in pattern recognition. International journal of pattern
recognition and artificial intelligence, 18(03):265–298, 2004.

[10] Kaspar Riesen, Xiaoyi Jiang, and Horst Bunke. Exact and inexact graph matching:
Methodology and applications. Managing and Mining Graph Data, pages
217–247, 2010.

[11] Tibério S Caetano, Julian J McAuley, Li Cheng, Quoc V Le, and Alex J Smola.
Learning graph matching. IEEE transactions on pattern analysis and machine
intelligence, 31(6):1048–1058, 2009.

[12] Yinghui Wu and Arijit Khan. Graph Pattern Matching, pages 871–875. Springer
International Publishing, Cham, 2019.

[13] Julian R Ullmann. An algorithm for subgraph isomorphism. Journal of the ACM
(JACM), 23(1):31–42, 1976.

[14] Luigi P Cordella, Pasquale Foggia, Carlo Sansone, Francesco Tortorella, and
Mario Vento. Graph matching: a fast algorithm and its evaluation. In Proceedings.
Fourteenth International Conference on Pattern Recognition (Cat. No.
98EX170), volume 2, pages 1582–1584. IEEE, 1998.

[15] Luigi P Cordella, Pasquale Foggia, Carlo Sansone, and Mario Vento. Performance
evaluation of the vf graph matching algorithm. In Proceedings 10th International
Conference on Image Analysis and Processing, pages 1172–1177. IEEE, 1999.

[16] Luigi Pietro Cordella, Pasquale Foggia, Carlo Sansone, and Mario Vento. An
improved algorithm for matching large graphs. In 3rd IAPR-TC15 workshop on
graph-based representations in pattern recognition, pages 149–159, 2001.

[17] Vincenzo Bonnici, Rosalba Giugno, Alfredo Pulvirenti, Dennis Shasha, and
Alfredo Ferro. A subgraph isomorphism algorithm and its application to
biochemical data. BMC bioinformatics, 14(7):1–13, 2013.

[18] Vincenzo Carletti, Pasquale Foggia, and Mario Vento. Vf2 plus: An improved
version of vf2 for biological graphs. In International Workshop on Graph-Based
Representations in Pattern Recognition, pages 168–177. Springer, 2015.

[19] Alpár Jüttner and Péter Madarasi. Vf2++—an improved subgraph isomorphism
algorithm. Discrete Applied Mathematics, 242:69–81, 2018.

[20] Vincenzo Bonnici and Rosalba Giugno. On the variable ordering in subgraph
isomorphism algorithms. IEEE/ACM transactions on computational biology and
bioinformatics, 14(1):193–203, 2016.

[21] Vincenzo Carletti, Pasquale Foggia, Alessia Saggese, and Mario Vento. Introducing
vf3: A new algorithm for subgraph isomorphism. In International Workshop
on Graph-Based Representations in Pattern Recognition, pages 128–139.
Springer, 2017.

[22] Haichuan Shang, Ying Zhang, Xuemin Lin, and Jeffrey Xu Yu. Taming verifi-
cation hardness: an efficient algorithm for testing subgraph isomorphism. Pro-
ceedings of the VLDB Endowment, 1(1):364–375, 2008.

[23] Huahai He and Ambuj K Singh. Graphs-at-a-time: query language and access
methods for graph databases. In Proceedings of the 2008 ACM SIGMOD inter-
national conference on Management of data, pages 405–418, 2008.

[24] Shijie Zhang, Shirong Li, and Jiong Yang. GADDI: Distance index based sub-
graph matching in biological networks. In Proceedings of the 12th International
Conference on Extending Database Technology: Advances in Database Technology,
pages 192–203, 2009.

[25] Peixiang Zhao and Jiawei Han. On graph query optimization in large networks.
Proceedings of the VLDB Endowment, 3(1-2):340–351, 2010.

[26] Wook-Shin Han, Jinsoo Lee, and Jeong-Hoon Lee. TurboISO: Towards ultrafast
and robust subgraph isomorphism search in large graph databases. In Proceedings
of the 2013 ACM SIGMOD International Conference on Management of Data,
pages 337–348, 2013.

[27] Xuguang Ren and Junhu Wang. Exploiting vertex relationships in speeding up
subgraph isomorphism over large graphs. Proceedings of the VLDB Endowment,
8(5):617–628, 2015.

[28] Fei Bi, Lijun Chang, Xuemin Lin, Lu Qin, and Wenjie Zhang. Efficient sub-
graph matching by postponing cartesian products. In Proceedings of the 2016
International Conference on Management of Data, pages 1199–1214, 2016.

[29] Bibek Bhattarai, Hang Liu, and H Howie Huang. CECI: Compact embedding
cluster index for scalable subgraph matching. In Proceedings of the 2019 Inter-
national Conference on Management of Data, pages 1447–1462, 2019.

[30] Myoungji Han, Hyunjoon Kim, Geonmo Gu, Kunsoo Park, and Wook-Shin
Han. Efficient subgraph matching: Harmonizing dynamic programming, adap-
tive matching order, and failing set together. In Proceedings of the 2019 Inter-
national Conference on Management of Data, pages 1429–1446, 2019.

[31] James J McGregor. Relational consistency algorithms and their application in
finding subgraph and graph isomorphisms. Information Sciences, 19(3):229–250,
1979.

[32] Javier Larrosa and Gabriel Valiente. Constraint satisfaction algorithms for graph
pattern matching. Mathematical Structures in Computer Science, 12(4):403–422,
2002.

[33] Christine Solnon. AllDifferent-based filtering for subgraph isomorphism. Artificial
Intelligence, 174(12-13):850–864, 2010.

[34] Julian R Ullmann. Bit-vector algorithms for binary constraint satisfaction and
subgraph isomorphism. Journal of Experimental Algorithmics (JEA), 15:1–1,
2011.
[35] Brendan D McKay. Practical graph isomorphism. Congressus Numerantium,
30:45–87, 1981.
[36] Brendan D McKay and Adolfo Piperno. Practical graph isomorphism, II. Journal
of Symbolic Computation, 60:94–112, 2014.
[37] Bruno T Messmer and Horst Bunke. A decision tree approach to graph and
subgraph isomorphism detection. Pattern Recognition, 32(12):1979–1998, 1999.
[38] Bruno T Messmer and Horst Bunke. Efficient subgraph isomorphism detection: A
decomposition approach. IEEE Transactions on Knowledge and Data Engineering,
12(2):307–323, 2000.
[39] Markus Weber, Marcus Liwicki, and Andreas Dengel. Indexing with well-founded
total order for faster subgraph isomorphism detection. In International Work-
shop on Graph-Based Representations in Pattern Recognition, pages 185–194.
Springer, 2011.
[40] Zhao Sun, Hongzhi Wang, Haixun Wang, Bin Shao, and Jianzhong Li. Efficient
subgraph matching on billion node graphs. arXiv preprint arXiv:1205.6691, 2012.
[41] László Babai. Graph isomorphism in quasipolynomial time. In Proceedings of the
Forty-Eighth Annual ACM Symposium on Theory of Computing, pages 684–697,
2016.
[42] Alfred V Aho and John E Hopcroft. The design and analysis of computer algo-
rithms. Pearson Education India, 1974.
[43] Eugene M Luks. Isomorphism of graphs of bounded valence can be tested in
polynomial time. Journal of Computer and System Sciences, 25(1):42–65, 1982.
[44] Xiaoyi Jiang and Horst Bunke. Optimal quadratic-time isomorphism of ordered
graphs. Pattern Recognition, 32(7):1273–1283, 1999.
[45] Peter J Dickinson, Horst Bunke, Arek Dadej, and Miro Kraetzl. Matching graphs
with unique node labels. Pattern Analysis and Applications, 7(3):243–254, 2004.
[46] John E Hopcroft and Jin-Kue Wong. Linear time algorithm for isomorphism
of planar graphs (preliminary report). In Proceedings of the Sixth Annual ACM
Symposium on Theory of Computing, pages 172–184, 1974.
[47] Shinji Umeyama. An eigendecomposition approach to weighted graph match-
ing problems. IEEE Transactions on Pattern Analysis and Machine Intelligence,
10(5):695–703, 1988.
[48] Bin Luo, Richard C Wilson, and Edwin R Hancock. Spectral embedding of
graphs. Pattern Recognition, 36(10):2213–2230, 2003.

[49] Richard C Wilson, Edwin R Hancock, and Bin Luo. Pattern vectors from alge-
braic graph theory. IEEE Transactions on Pattern Analysis and Machine Intelli-
gence, 27(7):1112–1124, 2005.

[50] Antonio Robles-Kelly and Edwin R Hancock. A Riemannian approach to graph
embedding. Pattern Recognition, 40(3):1042–1056, 2007.

[51] Terry Caelli and Serhiy Kosinov. An eigenspace projection clustering method
for inexact graph matching. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 26(4):515–519, 2004.

[52] Terry Caelli and Serhiy Kosinov. Inexact graph matching using eigen-subspace
projection clustering. International Journal of Pattern Recognition and Artificial
Intelligence, 18(03):329–354, 2004.

[53] Ali Shokoufandeh, Diego Macrini, Sven Dickinson, Kaleem Siddiqi, and Steven W
Zucker. Indexing hierarchical structures using graph spectra. IEEE Transactions
on Pattern Analysis and Machine Intelligence, 27(7):1125–1140, 2005.

[54] Jing He, Jinjun Chen, Guangyan Huang, Jie Cao, Zhiwang Zhang, Hui Zheng,
Peng Zhang, Roozbeh Zarei, Ferry Sansoto, Ruchuan Wang, et al. A polynomial-
time algorithm for simple undirected graph isomorphism. Concurrency and Com-
putation: Practice and Experience, 33(7):1–1, 2021.

[55] Jing He, Jinjun Chen, Guangyan Huang, Mengjiao Guo, Zhiwang Zhang, Hui
Zheng, Yunyao Li, Ruchuan Wang, Weibei Fan, Chi-Huang Chi, et al. A fuzzy
theory based topological distance measurement for undirected multigraphs. In
2020 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pages 1–
10. IEEE, 2020.

[56] H Zenil, F Soler-Toscano, K Dingle, and A Louis. Graph automorphisms and
topological characterization of complex networks by algorithmic information con-
tent. Physica A: Statistical Mechanics and its Applications, 404:341–358, 2014.

[57] Mengjiao Guo, Chi-Hung Chi, Hui Zheng, Jing He, and Xiaoting Zhang. A sub-
graph isomorphism-based attack towards social networks. In IEEE/WIC/ACM
International Conference on Web Intelligence and Intelligent Agent Technology,
pages 520–528, 2021.

[58] Sheng-Yong Yang. Pharmacophore modeling and applications in drug discovery:
challenges and recent advances. Drug Discovery Today, 15(11-12):444–450, 2010.

[59] John W Raymond, Eleanor J Gardiner, and Peter Willett. Heuristics for sim-
ilarity searching of chemical graphs using a maximum common edge subgraph
algorithm. Journal of Chemical Information and Computer Sciences, 42(2):305–
316, 2002.

[60] Daniel Rehfeldt and Thorsten Koch. Combining NP-hard reduction techniques
and strong heuristics in an exact algorithm for the maximum-weight connected
subgraph problem. SIAM Journal on Optimization, 29(1):369–398, 2019.

[61] Anthony Mansfield. Determining the thickness of graphs is NP-hard. In Math-
ematical Proceedings of the Cambridge Philosophical Society, volume 93, pages
9–23. Cambridge University Press, 1983.
