Substructure Aware Graph Neural Networks
Dingyi Zeng1 * , Wanlong Liu1 * , Wenyu Chen1 , Li Zhou1 , Malu Zhang1 , Hong Qu1†
1 School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
zengdingyi@std.uestc.edu.cn, liuwanlong@std.uestc.edu.cn, cwy@uestc.edu.cn, li zhou@std.uestc.edu.cn,
maluzhang@uestc.edu.cn, hongqu@uestc.edu.cn
…selectively removing edges. Unlike most methods using MPNN to encode subgraphs, we extend random walks to the return probabilities in subgraphs to encode the structural information of subgraphs, which reduces time complexity and improves expressiveness. Then we propose a graph neural network framework based on subgraph encoding injection called Substructure Aware Graph Neural Network (SAGNN), which greatly enhances the expressiveness of GNNs without increasing complexity. We further theoretically prove that any 1-WL GNNs equipped with any components of our framework are strictly more powerful than 1-WL. Our extensive experiments validate the state-of-the-art performance of our framework on various base model networks, tasks and datasets, especially on the graph regression task of drug constrained solubility prediction (ZINC-FULL). Our framework achieves a maximum MAE reduction of 83% compared to the base model and a maximum MAE reduction of 32% compared to the previous state-of-the-art model.
In summary, our main contributions are as follows:
• We propose the Cut subgraph, which can be obtained from the original graph by continuously and selectively removing edges, to help solve the graph isomorphism problem. We then extend random walks to the return probabilities in subgraphs to encode the structural information of subgraphs.
• We propose a GNN framework based on subgraph encoding injection called Substructure Aware Graph Neural Network (SAGNN), which greatly enhances the expressiveness and performance of GNNs.
• Extensive and diverse experiments demonstrate the state-of-the-art performance of our framework on various tasks and datasets¹.

¹Our implementation is available at https://github.com/BlackHalo-Drake/SAGNN-Substructure-Aware-Graph-Neural-Networks
Preliminaries and Related Work

Notations and Background
Let G = (V, E, X) be a simple, undirected, connected graph with a finite set of nodes V and a finite set of edges E, where the node feature matrix is X = [x_1, x_2, ..., x_n]^⊤.

Graph Neural Networks
The concept of GNNs is not new (Sperduti and Starita 1997; Baskin, Palyulin, and Zefirov 1997), and the idea of using neural networks to model graph data and extract features has long existed and achieved certain results (Gori, Monfardini, and Scarselli 2005; Scarselli et al. 2008). A more modern version is the graph convolutional network (GCN) (Kipf and Welling 2017), a graph neural network used for semi-supervised node classification, which laid the foundation for advanced graph neural networks. Besides, researchers also introduce graph neural networks to unsupervised schemes such as graph contrastive learning and node clustering (Liu et al. 2022c,d,e, 2023; Chen and Kou 2023). Based on the message passing mechanism (Gilmer et al. 2017) similar to GCN, a series of advanced graph neural networks have been proposed to solve the problems of over-smoothing and over-squashing (Chen et al. 2020a; Topping et al. 2021; Xu et al. 2018b; Zeng et al. 2022).

Message Passing Neural Networks
MPNNs (Gilmer et al. 2017) provide a methodology to abstract GNNs with a unified view of message passing mechanisms. In this paradigm, node-to-node information is propagated by iteratively aggregating neighboring node information to a central node. Formally, given a node i in graph G, its hidden representation h_i^t at iteration t, its neighboring nodes N(i), and the edge e_{ij} connecting node i to node j, an iteration of the standard message passing paradigm can be expressed as:

c_i^t = \sum_{j \in N(i)} \phi^t(h_i^t, h_j^t, e_{ij}),    (1)

h_i^{t+1} = \sigma^t(c_i^t, h_i^t),    (2)

where ϕ^t and σ^t are the aggregate and update functions at iteration t.
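To make the paradigm of Eqs. (1)-(2) concrete, the following minimal sketch runs one message passing iteration over an adjacency-list graph. It is illustrative only: the names are ours rather than the SAGNN implementation's, edge features e_{ij} are omitted, and the toy ϕ and σ stand in for the learnable functions of a real MPNN.

```python
import numpy as np

def message_passing_step(adj_list, h, phi, sigma):
    """One iteration of Eqs. (1)-(2): aggregate neighbor messages, then update.

    adj_list : dict mapping node index i -> list of neighbor indices N(i)
    h        : (n, d) array of node representations h_i^t
    phi      : message function phi(h_i, h_j) -> d-dim message
    sigma    : update function sigma(c_i, h_i) -> new representation
    """
    n, _ = h.shape
    c = np.zeros_like(h)
    for i, neighbors in adj_list.items():
        # Eq. (1): c_i^t = sum over j in N(i) of phi(h_i^t, h_j^t)
        for j in neighbors:
            c[i] += phi(h[i], h[j])
    # Eq. (2): h_i^{t+1} = sigma(c_i^t, h_i^t)
    return np.stack([sigma(c[i], h[i]) for i in range(n)])

# Toy instantiation: phi forwards the neighbor state, sigma averages with the old state.
phi = lambda h_i, h_j: h_j
sigma = lambda c_i, h_i: 0.5 * (c_i + h_i)
adj_list = {0: [1], 1: [0, 2], 2: [1]}   # a path graph 0-1-2
h0 = np.eye(3)                            # one-hot initial features
h1 = message_passing_step(adj_list, h0, phi, sigma)
```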
Weisfeiler-Leman Algorithm
The Weisfeiler-Lehman (WL) algorithm (Weisfeiler and Leman 1968) is a computationally efficient heuristic algorithm for testing graph isomorphism proposed by Weisfeiler and Lehman. The main idea of the WL algorithm is to continuously relabel the root node through the neighbor nodes until the labels converge, and to judge whether two graphs are isomorphic by comparing the labels of the two graphs. Significantly, the WL algorithm provides a theoretical basis for many further graph methods (Morris, Rattan, and Mutzel 2020; Bevilacqua et al. 2022) due to its low computational complexity. The Weisfeiler-Lehman subtree kernel (Shervashidze et al. 2011), one of the most successful graph kernel approaches, utilizes the WL algorithm to generate node features through iterative relabeling. Moreover, based on the WL algorithm, a more hierarchical graph isomorphism testing framework (Grohe 2017) has been proposed, and higher-order WL tests are also instantiated from this framework.

Limitations of MPNNs
Compared with traditional graph algorithms, GNNs exhibit better adaptability and generalization capabilities, which are mainly reflected in data-driven training methods for classification and regression. The similarity of the message passing mechanism of MPNNs to the WL algorithm makes it regarded as a neural network implementation of the WL algorithm (Gilmer et al. 2017), which brings about the problem of limited expressiveness of MPNNs. Specifically, the upper limit of the expressiveness of MPNNs is 1-WL (Xu et al. 2019a), which means they cannot distinguish a large class of graphs (Cai, Fürer, and Immerman 1992) and have certain defects in structure perception. Both MPNNs and 2nd-order Invariant Graph Networks (Maron et al. 2019c) cannot count induced subgraphs of any connected pattern of 3 or more nodes other than star-shaped patterns (Chen et al. 2020b); however, such structures have a strong impact on certain downstream tasks, such as functional groups in organic chemistry (Lemke 2003).

Beyond 1-WL by inductive coloring  Some task-specific inductive node coloring methods (Zhang and Chen 2018; Veličković et al. 2019; Xu et al. 2019b) have been proposed to improve the performance of existing GNNs, and are mainly used for tasks such as link prediction and algorithm execution. An inductive node coloring framework (You et al. 2021) has been proposed to fill the gaps of coloring methods on graph-level and node-level tasks. Moreover, CLIP (Dasoulas et al. 2020) uses colors to disambiguate identical node attributes and is capable of capturing structural characteristics that traditional MPNNs fail to distinguish.

Beyond 1-WL by positional encoding  There are absolute positions in text and images, so explicit position encoding for them is efficient. Due to the non-Euclidean spatial characteristics of graph structures, it becomes particularly difficult to explicitly encode graph data, often resulting in loss of information (Liu et al. 2022a). Direct index encoding of all nodes requires n! possible index permutations (Murphy et al. 2019), so it is difficult for neural networks to inductively generalize such encoding methods. Laplacian eigenvectors are also used for positional encoding (Dwivedi et al. 2020; Dwivedi and Bresson 2020) to encode local and global structure information of the graph, but this encoding does not address the global sign ambiguity problem. The method using random anchor sets of nodes (You, Ying, and Leskovec 2019) has no sign ambiguity problem, but its random selection strategy for anchor nodes limits its inductive generalization ability. A more generalized positional encoding method based on random walks and Laplacian eigenvectors has been proposed and instantiated as a framework (Dwivedi et al. 2021), which achieves great performance.

Beyond 1-WL by substructure perceiving  Since specific substructures are extremely important for some graph tasks, directly encoding the substructures as priors (Bouritsas et al. 2022; Bodnar et al. 2021a; Barceló et al. 2021) becomes an option, which achieves state-of-the-art performance. Pattern-specific subgraph encodings (Fey, Yuen, and Weichert 2020; Thiede, Zhou, and Kondor 2021) are also used directly for highly related downstream tasks. Similar hand-crafted substructure count information is useful in global tasks for graph kernels (Zhang et al. 2018b). Theoretical analysis (Tahmasebi, Lim, and Jegelka 2020) verifies the effectiveness of substructures on graph tasks. G-Meta (Huang and Zitnik 2020) conducts message passing on subgraphs to improve the efficiency of message passing in meta-learning. Some works (Chen et al. 2020b; Nikolentzos, Dasoulas, and Vazirgiannis 2020; Sandfelder, Vijayan, and Hamilton 2021) also focus on encoding structure information of k-hop subgraphs and injecting it into nodes for propagation.

Methodology
Generally speaking, MPNNs still cannot exceed 1-WL in terms of expressiveness even with sufficient depth and width (Loukas 2020), while distinguishing larger graphs requires stronger expressiveness. Based on the fact that subgraphs are more easily distinguished than the original graphs, we generalize the problem of graph isomorphism to subgraph isomorphism. In this section we introduce our SAGNN framework, which includes (1) Subgraph Extraction Strategies, (2) Subgraph Random Walk Return Probability Encoding, and (3) Subgraph Information Injection.

Subgraph Extraction Strategies
For subgraph-based methods, the subgraph extraction strategy has a crucial impact on the expressiveness of the model. In this section, we classify subgraph extraction strategies into node-based strategies and graph-based strategies.

Node-based Strategies  We define a node-based strategy as a single-node-based subgraph extraction strategy that does not require information about the entire graph. Generally speaking, the number of subgraphs produced by a node-based extraction strategy is equal to the number of nodes of the original graph, and the most common strategy is the EgoNetwork. Specifically, N(v) denotes the set of neighboring nodes of the root node v, and the more generalized notation N_k(v) denotes the set of nodes within k hops of the root node v. Then Ego(v)_k is the k-hop EgoNetwork rooted at node v, and its corresponding nodes are the neighbors N_k(v) within k hops of the root node v.
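As a sketch of the node-based strategy described above, the snippet below extracts one k-hop EgoNetwork per node using networkx; the helper name extract_ego_subgraphs is ours and not part of the SAGNN code.

```python
import networkx as nx

def extract_ego_subgraphs(G, k=2):
    """Node-based strategy sketch: one k-hop EgoNetwork Ego(v)_k per node v.

    Returns a dict mapping each root node v to the subgraph of G induced by
    N_k(v), the nodes within k hops of v (including v itself).
    """
    return {v: nx.ego_graph(G, v, radius=k) for v in G.nodes}

# Example: on a 10-node cycle, every 2-hop ego network has 5 nodes.
G = nx.cycle_graph(10)
egos = extract_ego_subgraphs(G, k=2)
assert all(sub.number_of_nodes() == 5 for sub in egos.values())
```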
Graph-based Strategies  Different from node-based strategies, a graph-based strategy requires information about the entire graph to obtain subgraphs, and the number of subgraphs obtained is not directly related to the number of nodes in the original graph. In practice, the simplest graph-based strategy is to randomly delete a single edge or a single node, which is difficult to stably improve the expressiveness of the model and performs poorly on high-density strongly regular graphs. To better improve the expressiveness of GNNs, we propose the Cut subgraph, a subgraph obtained from the original graph by continuously and selectively removing edges.

Definition 1. Formally, we define the block containing node v among b blocks as the Cut(v)_b subgraph, obtained by removing edges from the original graph using a specific method. To obtain the Cut(v)_b subgraph, we first calculate the Edge Betweenness Centrality (EBC) (Girvan and Newman 2002) of all edges in the original graph, and then continuously remove the edge with the largest EBC until the original graph is split into b blocks.
1. Cut(v)_b is equal to the original graph G when b = 1. As b increases, both the size of Cut(v)_b and the level of information it contains decrease.
2. Cut(v)_b is equal to the single node i when b is equal to the number of nodes in the graph G and v = i.
3. Any Cut(v)_b is a connected graph containing node v.
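The following sketch illustrates the extraction in Definition 1 using networkx. The function name is ours, and we assume the EBC is recomputed after every removal (as in the Girvan-Newman procedure); the definition above only fixes that the edge with the largest EBC is removed until b blocks appear.

```python
import networkx as nx

def cut_subgraph(G, v, b):
    """Sketch of Cut(v)_b from Definition 1: repeatedly remove the edge with
    the largest edge betweenness centrality (EBC) until the graph splits into
    b blocks, then return the block (connected component) containing node v.
    """
    H = G.copy()
    while nx.number_connected_components(H) < b and H.number_of_edges() > 0:
        ebc = nx.edge_betweenness_centrality(H)
        edge = max(ebc, key=ebc.get)       # edge with the largest EBC
        H.remove_edge(*edge)
    block_nodes = next(c for c in nx.connected_components(H) if v in c)
    return H.subgraph(block_nodes).copy()  # the block of the edge-removed graph

# Example: two triangles joined by a bridge; for b = 2 the bridge is removed first.
G = nx.Graph([(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 3)])
assert sorted(cut_subgraph(G, 0, b=2).nodes) == [0, 1, 2]
```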
Figure 1: SAGNN's main framework. The main components are (1) Subgraph Encoding, (2) Subgraph Information Injection, (3) Message Passing.
Figure 2: Two non-isomorphic graphs A and B that cannot be distinguished by the WL test but can be distinguished by an MPNN with Ego subgraph information injection.

Subgraph Random Walk Return Probability Encoding
The use of random walks for graph encoding is not new, but existing methods (Zhang et al. 2018b; Li et al. 2020; Dwivedi et al. 2021) have shortcomings in complexity or expressiveness. To address these issues, we extend random walk encoding to return probabilities in subgraphs, while reducing time complexity and improving expressiveness. Formally, the random walk return probability encoding of node v in the subgraph G_sub is defined as:

p_v^{G_{sub}} = \left[ R_1^{G_{sub}}(v, v), R_2^{G_{sub}}(v, v), \ldots, R_S^{G_{sub}}(v, v) \right]^T,    (3)

where R_s^{G_{sub}}(v, v), s = 1, 2, ..., S, is the return probability of an s-step random walk starting from the root node v in the subgraph G_sub. We apply the subgraph encoding to the Ego subgraph and the Cut subgraph and obtain the corresponding return probabilities p_v^{Ego} and p_v^{Cut}. We then use a linear layer to encode the subgraph random walk return probabilities into a subgraph hidden representation vector h_v^{G_sub}. Similarly, the subgraph hidden representations of node v corresponding to the Ego subgraph² and the Cut subgraph are h_v^{Ego} and h_v^{Cut}.
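A minimal numpy sketch of Eq. (3) follows: the S return probabilities of a simple random walk rooted at v in a given subgraph. It assumes the usual row-normalized transition matrix P = D^{-1}A, which the text does not spell out, and the function name is ours.

```python
import numpy as np
import networkx as nx

def return_probability_encoding(subgraph, v, S=8):
    """Sketch of Eq. (3): p_v = [R_1(v, v), ..., R_S(v, v)]^T for node v, where
    R_s(v, v) is the probability that a simple random walk started at v is back
    at v after s steps.  Assumes the transition matrix P = D^{-1} A.
    """
    nodes = list(subgraph.nodes)
    idx = nodes.index(v)
    A = nx.to_numpy_array(subgraph, nodelist=nodes)
    deg = A.sum(axis=1)
    P = A / np.maximum(deg, 1)[:, None]   # row-stochastic transition matrix
    p, P_s = np.zeros(S), np.eye(len(nodes))
    for s in range(S):
        P_s = P_s @ P                     # (s+1)-step transition probabilities
        p[s] = P_s[idx, idx]              # return probability R_{s+1}(v, v)
    return p

# Example: encode the 2-hop Ego subgraph of a node in a cycle graph.
G = nx.cycle_graph(10)
ego = nx.ego_graph(G, 0, radius=2)
p_ego = return_probability_encoding(ego, 0, S=4)  # p_ego[1] = P(back at v in 2 steps)
```

Applying a linear layer to p_ego (and to the analogous vector from the Cut subgraph) would then give the hidden vectors h_v^{Ego} and h_v^{Cut} that are concatenated in Eq. (4) below.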
Message Passing with Subgraph Information Injection
We concatenate our subgraph hidden representations with the initial features of the nodes to obtain the Ego subgraph information injection feature h_v^{E,0} and the Cut subgraph information injection feature h_v^{C,0}. In order to capture global structural information, we also concatenate h_v^G, the structural hidden representation of node v corresponding to the original graph G. Formally, h_v^{E,0} and h_v^{C,0} are defined as follows:

h_v^{E,0} = [x_v, h_v^{Ego}, h_v^G],  h_v^{C,0} = [x_v, h_v^{Cut}, h_v^G].    (4)

We adopt two message passing channels, the Ego channel and the Cut channel, defined as follows:

c_v^{E,t} = \sum_{u \in N(v)} \phi_E^t(h_v^{E,t}, h_u^{E,t}, e_{vu}),    (5)

c_v^{C,t} = \sum_{u \in N(v)} \phi_C^t(h_v^{C,t}, h_u^{C,t}, e_{vu}),    (6)

h_v^{E,t+1} = \sigma_E^t(c_v^{E,t}, h_v^{E,t}),  h_v^{C,t+1} = \sigma_C^t(c_v^{C,t}, h_v^{C,t}),    (7)

where ϕ_E^t, ϕ_C^t and σ_E^t, σ_C^t are the aggregate and update functions of the two channels, respectively, at iteration t, with t = 0, ..., L-1. Finally, the whole-graph representation h_G is obtained through a pooling operation defined as follows:

h_G = POOL\left( h_v^{E,L}, h_v^{C,L} \mid v \in V \right),    (8)

where POOL is a global pooling function over all nodes.

²In our implementation, the Ego encoding of a single node is the aggregation of all Ego subgraph encodings that contain this node.
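For completeness, here is a compact sketch of the injection and two-channel pipeline of Eqs. (4)-(8), again with illustrative names: the sum aggregation, the averaging update, and the final mean pooling over the concatenated channel outputs are stand-ins for the learnable aggregate, update, and POOL functions of the actual model.

```python
import numpy as np

def inject_and_pool(x, h_ego, h_cut, h_g, adj_list, num_layers=3):
    """Sketch of Eqs. (4)-(8): build the two injected features, run an Ego and
    a Cut message passing channel (sum aggregation, averaging update), then
    mean-pool over all nodes.

    x, h_ego, h_cut, h_g : (n, d_*) arrays of initial node features and the
    Ego / Cut / whole-graph structural encodings; shapes are illustrative.
    """
    # Eq. (4): channel inputs are concatenations of raw features and encodings.
    h_e = np.concatenate([x, h_ego, h_g], axis=1)
    h_c = np.concatenate([x, h_cut, h_g], axis=1)
    for _ in range(num_layers):                  # Eqs. (5)-(7), t = 0, ..., L-1
        for h in (h_e, h_c):
            c = np.zeros_like(h)
            for v, nbrs in adj_list.items():
                c[v] = h[nbrs].sum(axis=0)       # sum over u in N(v)
            h[:] = 0.5 * (c + h)                 # toy update in place of sigma
    # Eq. (8): global pooling over all nodes of both channels.
    return np.concatenate([h_e, h_c], axis=1).mean(axis=0)

# Toy usage with random encodings for a path graph 0-1-2.
rng = np.random.default_rng(0)
adj = {0: [1], 1: [0, 2], 2: [1]}
hg = inject_and_pool(rng.normal(size=(3, 4)), rng.normal(size=(3, 2)),
                     rng.normal(size=(3, 2)), rng.normal(size=(3, 2)), adj)
```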
Method MUTAG PTC PROTEINS NCI1 IMDB-B
RWK (Gärtner, Flach, and Wrobel 2003) 79.2±2.1 55.9±0.3 59.6±0.1 >3 days N/A
GK (k = 3) (Shervashidze et al. 2009) 81.4±1.7 55.7±0.5 71.4±0.3 62.5±0.3 N/A
PK (Neumann et al. 2016) 76.0±2.7 59.5±2.4 73.7±0.7 82.5±0.5 N/A
WL kernel (Shervashidze et al. 2011) 90.4±5.7 59.9±4.3 75.0±3.1 86.0±1.8 73.8±3.9
DCNN (Atwood and Towsley 2016) N/A N/A 61.3±1.6 56.6±1.0 49.1±1.4
DGCNN (Zhang et al. 2018a) 85.8±1.8 58.6±2.5 75.5±0.9 74.4±0.5 70.0±0.9
IGN (Maron et al. 2019b) 83.9±13.0 58.5±6.9 76.6±5.5 74.3±2.7 72.0±5.5
GIN (Xu et al. 2018a) 89.4±5.6 64.6±7.0 76.2±2.8 82.7±1.7 75.1±5.1
PPGNs (Maron et al. 2019a) 90.6±8.7 66.2±6.6 77.2±4.7 83.2±1.1 73.0±5.8
Natural GN (de Haan, Cohen, and Welling 2020) 89.4±1.6 66.8±1.7 71.7±1.0 82.4±1.3 73.5±2.0
GSN (Bouritsas et al. 2022) 92.2±7.5 68.2 ± 7.2 76.6±5.0 83.5±2.0 77.8±3.3
SIN (Bodnar et al. 2021b) N/A N/A 76.4±3.3 82.7±2.1 75.6±3.2
CIN (Bodnar et al. 2021a) 92.7±6.1 68.2±5.6 77.0±4.3 83.6±1.4 75.6±3.7
GIN-AK+ (Zhao et al. 2022) 91.3±7.0 67.7±8.8 77.1±5.7 85.0±2.0 75.0±4.2
ESAN-GIN (Bevilacqua et al. 2022) 91.0±7.1 69.2±6.5 77.1±4.6 83.8±2.4 77.1±3.0
DropGIN (Papp et al. 2021) 90.4±7.0 66.3±8.6 76.3±6.1 N/A 75.7±4.2
SAGIN(Ours) 95.2±3.0 72.1±7.6 79.8±3.8 85.3±1.7 75.9±3.8
SAPNA*(Ours) 92.0±6.8 70.3±6.5 79.3±3.6 85.0±1.1 77.4±3.7
Table 1: Test results for TUDatasets. The first section of the table includes the results of graph kernel methods, while the second
includes the results of GNNs, and the third part includes the results of the GNNs boosted by our framework. The top three are
highlighted by red, green, and blue.
Figure 3: Two non-isomorphic graphs A and B that cannot be distinguished by the WL test but can be distinguished by an MPNN with Cut subgraph information injection.

Expressiveness Analysis
Proposition 1. With Ego subgraph information injection, a 1-WL MPNN is strictly more powerful than the 1-WL test.

Proof. The proof of Proposition 1 consists of two parts. We first prove that if two graphs are judged to be isomorphic by the 1-WL MPNN with Ego subgraph information injection, then the 1-WL test must also judge the two graphs to be isomorphic, which proves that the 1-WL MPNN with Ego subgraph information injection is at least as powerful as 1-WL. Then we give two graphs that are judged to be isomorphic by 1-WL but not by the 1-WL MPNN with Ego subgraph information injection, thus proving that the 1-WL MPNN with Ego subgraph information injection is strictly more powerful than 1-WL.
Assume that there are graphs A and B that are judged to be isomorphic by the 1-WL MPNN with Ego subgraph information injection. Then the sets generated for A and B by the 1-WL MPNN with Ego subgraph information injection after t iterations, {HASH^t(N(h_i^{h,(0)})) | h_i^{h,(0)} ∈ V_A} and {HASH^t(N(h_i^{h,(0)})) | h_i^{h,(0)} ∈ V_B}, are identical. So there exists an ordering of nodes h_1^{h,(0),A}, h_2^{h,(0),A}, ..., h_n^{h,(0),A} and h_1^{h,(0),B}, h_2^{h,(0),B}, ..., h_n^{h,(0),B} such that for any i = 1, 2, ..., n, {HASH^t(N(h_i^{h,(0),A}))} and {HASH^t(N(h_i^{h,(0),B}))} are identical. If two node features with Ego subgraph information injection are identical, their original features are identical; hence there exists an ordering of nodes x_1^A, x_2^A, ..., x_n^A and x_1^B, x_2^B, ..., x_n^B such that for any i = 1, 2, ..., n, HASH^t(N(x_i^A)) and HASH^t(N(x_i^B)) are identical. Finally, it can be deduced that the sets {HASH^t(N(x_v^A)) | v ∈ V_A} and {HASH^t(N(x_v^B)) | v ∈ V_B} are identical, which means the 1-WL MPNN with Ego subgraph information injection is at least as powerful as the 1-WL test.
In Figure 2, two graphs are judged to be isomorphic by the 1-WL test but not by the 1-WL MPNN with Ego subgraph information injection, which demonstrates that the 1-WL MPNN with Ego subgraph information injection is strictly more powerful than the 1-WL test.
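As a concrete illustration of this argument (our own example, not the graph pair shown in Figures 2 and 3), consider K_{3,3} and the triangular prism: two connected 3-regular graphs on six nodes that the 1-WL test cannot separate, because every node receives the same color at every refinement step. Their 1-hop Ego subgraphs, however, already differ under an Eq. (3)-style return probability encoding:

```python
import numpy as np
import networkx as nx

def three_step_return_prob(G, v):
    """Return probability of a 3-step simple random walk from v back to v."""
    nodes = list(G.nodes)
    A = nx.to_numpy_array(G, nodelist=nodes)
    P = A / A.sum(axis=1, keepdims=True)
    i = nodes.index(v)
    return np.linalg.matrix_power(P, 3)[i, i]

# Two connected 3-regular graphs on 6 nodes that 1-WL cannot tell apart.
K33 = nx.complete_bipartite_graph(3, 3)     # bipartite, triangle-free
prism = nx.circular_ladder_graph(3)         # triangular prism, contains triangles

# Their 1-hop Ego subgraphs already look different through the encoding:
p_k33 = three_step_return_prob(nx.ego_graph(K33, 0, radius=1), 0)      # 0.0
p_prism = three_step_return_prob(nx.ego_graph(prism, 0, radius=1), 0)  # > 0
assert p_k33 == 0.0 and p_prism > 0.0
```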
Proposition 2. With Cut subgraph information injection, a 1-WL MPNN is strictly more powerful than the 1-WL test.

Proof. Similarly to the previous proof, this proof for Proposition 2 …
Method EXP (ACC) SR25 (ACC) ZINC (MAE) ZINC-FULL (MAE) MolPCBA (AP) MolHIV (ROC)
HIMP (Fey, Yuen, and Weichert 2020) N/A N/A 0.151±0.006 0.036±0.002 N/A 78.80±0.82
PNA (Corso et al. 2020) N/A N/A 0.188±0.004 N/A 28.38±0.35 79.05±1.32
GSN (Bouritsas et al. 2022) N/A N/A 0.108±0.018 N/A N/A 77.99±1.00
CIN (Bodnar et al. 2021a) N/A N/A 0.079±0.006 0.022±0.002 N/A 80.94±0.57
GIN (Xu et al. 2018a) 50% 6.67% 0.163±0.004 0.088±0.002 26.82±0.06 78.81±1.19
GIN-AK+ (Zhao et al. 2022) 100% 6.67% 0.080±0.001 N/A 29.30±0.44 79.61±1.19
ESAN-GIN (Bevilacqua et al. 2022) 100% N/A 0.102±0.003 N/A N/A 78.00±1.42
SAGIN(Ours) 100% 100% 0.072±0.001 0.016±0.002 28.53±0.30 80.64±0.42
PNA* (Corso et al. 2020) 50% 6.67% 0.140±0.006 N/A 27.37±0.09 79.05±1.02
PNA*-AK+ (Zhao et al. 2022) 100% 6.67% 0.085±0.003 N/A 28.85±0.06 78.80±1.53
SAPNA*(Ours) 100% 100% 0.073±0.001 0.016±0.003 27.84±0.03 79.44±1.44
Table 2: Test results for expressiveness datasets and large scale datasets. The first section of the table includes the results of some specific methods, while the other sections include the results of the base models and of those boosted by different methods. The top three are highlighted by red, green, and blue.
Table 3: Results with different numbers of layers on ZINC and SR25.
Layers  ZINC Valid (MAE)  ZINC Test (MAE)  SR25 Test (ACC)
2   0.1006±0.0024  0.0822±0.0063  100%
4   0.0974±0.0045  0.0778±0.0001  100%
6   0.0825±0.0023  0.0721±0.0025  100%
8   0.0897±0.0023  0.0782±0.0038  100%
10  0.0945±0.0020  0.0790±0.0057  100%

Figure 4: Hyperparameter study on ZINC, PROTEINS and MOLHIV.
Methods MUTAG PTC PROTEINS NCI1 IMDB-B
SAGIN 95.23±2.99 72.07±7.64 79.79±3.75 85.28±1.68 75.90±3.84
SAGIN w/o Ego 94.68±4.38 71.21±6.95 78.53±2.49 84.94±1.53 75.30±3.97
SAGIN w/o Cut 94.12±3.09 70.91±7.47 78.98±2.44 84.90±1.37 75.50±3.89
SAGIN w/o Global 93.10±5.62 72.38±5.82 79.54±3.67 84.60±1.46 75.40±3.75
… models from (Bodnar et al. 2021a), ESAN (Bevilacqua et al. 2022), k-Reconstruction GNNs (Cotta, Morris, and Ribeiro 2021) and DropGNN (Papp et al. 2021). We reference the results of PNA* (Corso et al. 2020), CIN (Bodnar et al. 2021a), GNN-AK+ (Zhao et al. 2022), GraphSNN (Wijesinghe and Wang 2022), MPGNNs-LSPE (Dwivedi et al. 2021), GatedGCN (Dwivedi et al. 2020), and HIMP (Fey, Yuen, and Weichert 2020) on the other real-world datasets from their literature for comparison.

Results and Discussion
Small real-world datasets  Table 1 presents the results of our framework and other competitive models on small real-world datasets. Our framework achieves state-of-the-art performance on all TUDataset datasets, demonstrating the great advantage of our framework in structural awareness.

Expressiveness datasets  Table 2 presents the performance of our framework on two expressiveness datasets, where our framework achieves 100% accuracy on both the EXP and SR25 datasets with different base models. Compared with other methods, our framework shows a superior boosting effect, obtaining more than 3-WL capability from base models with less than 1-WL capability.

Large scale datasets  We further validate the performance of our framework on large-scale datasets and achieve significant performance improvements with different base models, as shown in Table 2. Our framework achieves a maximum MAE reduction of 83% compared to the base model and a maximum MAE reduction of 32% compared to the previous state-of-the-art model on the graph regression task of drug constrained solubility prediction (ZINC-FULL). On other, relatively structure-insensitive graph-level tasks, our framework still outperforms other frameworks and competitive models on multiple datasets.

Hyperparameter effect  We first conduct an ablation study to analyze the impact of the values of Ego and Cut on model performance for different tasks and data schemas. Figure 4 visualizes the results of our hyperparameter experiments on three datasets. All three datasets show the same performance trend, that is, the best performance is achieved with Ego=3 and Cut=4, which shows our model's great hyperparameter stability. Our models perform optimally with relatively similar hyperparameters under different data structures and tasks, which allows for less hyperparameter tuning when deploying our model on real-world tasks. We then investigate the impact of the number of layers on SR25 and ZINC. As shown in Table 3, the accuracy reaches 100% when the number of layers is only 2 and remains 100% as the number of layers increases, which illustrates that the expressiveness of our model does not depend on model depth. However, on the real-world dataset (ZINC), our model needs a proper number of layers to achieve good performance.

Ablation study  To better demonstrate the capabilities of our components, we conduct ablation studies on all datasets of our experiments. Table 4 and Table 5 present all results of our ablation experiments, where different datasets exhibit different dependencies on different components. For example, the ZINC dataset pays more attention to low-level local structure information, so the performance loss of ablating the Cut component is not as large as that of ablating the Ego component.

Conclusion
In this paper, we first propose the Cut subgraph, which can be obtained from the original graph by continuously and selectively removing edges, to help solve the graph isomorphism problem. Then we further propose a GNN framework called Substructure Aware Graph Neural Network, which enhances the expressiveness and performance of GNNs by encoding subgraphs at different levels and injecting the information into nodes. Our extensive and diverse experiments demonstrate the state-of-the-art performance of our framework on various tasks and datasets.

Acknowledgements
This work was supported by the National Science Foundation of China under Grant 61976043, and in part by the Science and Technology Support Program of Sichuan Province under Grant 2022YFG0313.
References
Abboud, R.; Ceylan, I. I.; Grohe, M.; and Lukasiewicz, T. 2021. The Surprising Power of Graph Neural Networks with Random Node Initialization. In Proc. of IJCAI.
Atwood, J.; and Towsley, D. 2016. Diffusion-convolutional neural networks. Proc. of NeurIPS.
Balcilar, M.; Héroux, P.; Gauzere, B.; Vasseur, P.; Adam, S.; and Honeine, P. 2021. Breaking the limits of message passing graph neural networks. In Proc. of ICML.
Barceló, P.; Geerts, F.; Reutter, J.; and Ryschkov, M. 2021. Graph neural networks with local graph parameters. Proc. of NeurIPS.
Baskin, I. I.; Palyulin, V. A.; and Zefirov, N. S. 1997. A neural device for searching direct correlations between structures and properties of chemical compounds. Journal of Chemical Information and Computer Sciences.
Battaglia, P.; Pascanu, R.; Lai, M.; Jimenez Rezende, D.; et al. 2016. Interaction networks for learning about objects, relations and physics. Proc. of NeurIPS.
Bevilacqua, B.; Frasca, F.; Lim, D.; Srinivasan, B.; Cai, C.; Balamurugan, G.; Bronstein, M. M.; and Maron, H. 2022. Equivariant subgraph aggregation networks. In Proc. of ICLR.
Bodnar, C.; Frasca, F.; Otter, N.; Wang, Y. G.; Liò, P.; Montufar, G. F.; and Bronstein, M. 2021a. Weisfeiler and Lehman go cellular: CW networks. Proc. of NeurIPS.
Bodnar, C.; Frasca, F.; Wang, Y.; Otter, N.; Montufar, G. F.; Lio, P.; and Bronstein, M. 2021b. Weisfeiler and Lehman go topological: Message passing simplicial networks. In Proc. of ICML.
Bouritsas, G.; Frasca, F.; Zafeiriou, S. P.; and Bronstein, M. 2022. Improving graph neural network expressivity via subgraph isomorphism counting. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Cai, J.-Y.; Fürer, M.; and Immerman, N. 1992. An optimal lower bound on the number of variables for graph identification. Combinatorica.
Chen, J.; and Kou, G. 2023. Attribute and Structure Preserving Graph Contrastive Learning. In Proc. of AAAI.
Chen, M.; Wei, Z.; Huang, Z.; Ding, B.; and Li, Y. 2020a. Simple and Deep Graph Convolutional Networks. In Proc. of ICML.
Chen, Z.; Chen, L.; Villar, S.; and Bruna, J. 2020b. Can graph neural networks count substructures? Proc. of NeurIPS.
Corso, G.; Cavalleri, L.; Beaini, D.; Liò, P.; and Veličković, P. 2020. Principal neighbourhood aggregation for graph nets. Proc. of NeurIPS.
Cotta, L.; Morris, C.; and Ribeiro, B. 2021. Reconstruction for powerful graph representations. Proc. of NeurIPS.
Dasoulas, G.; Santos, L. D.; Scaman, K.; and Virmaux, A. 2020. Coloring Graph Neural Networks for Node Disambiguation. In Proc. of IJCAI.
de Haan, P.; Cohen, T. S.; and Welling, M. 2020. Natural graph networks. Proc. of NeurIPS.
Dwivedi, V. P.; and Bresson, X. 2020. A generalization of transformer networks to graphs. arXiv preprint arXiv:2012.09699.
Dwivedi, V. P.; Joshi, C. K.; Laurent, T.; Bengio, Y.; and Bresson, X. 2020. Benchmarking graph neural networks. arXiv preprint arXiv:2003.00982.
Dwivedi, V. P.; Luu, A. T.; Laurent, T.; Bengio, Y.; and Bresson, X. 2021. Graph neural networks with learnable structural and positional representations. arXiv preprint arXiv:2110.07875.
Fey, M.; Yuen, J.-G.; and Weichert, F. 2020. Hierarchical inter-message passing for learning on molecular graphs. arXiv preprint arXiv:2006.12179.
Gainza, P.; Sverrisson, F.; Monti, F.; Rodola, E.; Boscaini, D.; Bronstein, M.; and Correia, B. 2020. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nature Methods.
Gärtner, T.; Flach, P.; and Wrobel, S. 2003. On graph kernels: Hardness results and efficient alternatives. In Learning Theory and Kernel Machines.
Gilmer, J.; Schoenholz, S. S.; Riley, P. F.; Vinyals, O.; and Dahl, G. E. 2017. Neural message passing for quantum chemistry. In Proc. of ICML.
Girvan, M.; and Newman, M. E. 2002. Community structure in social and biological networks. Proceedings of the National Academy of Sciences.
Gómez-Bombarelli, R.; Wei, J. N.; Duvenaud, D.; Hernández-Lobato, J. M.; Sánchez-Lengeling, B.; Sheberla, D.; Aguilera-Iparraguirre, J.; Hirzel, T. D.; Adams, R. P.; and Aspuru-Guzik, A. 2018. Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science.
Gori, M.; Monfardini, G.; and Scarselli, F. 2005. A new model for learning in graph domains. In Proc. of IJCNN.
Grohe, M. 2017. Descriptive Complexity, Canonisation, and Definable Graph Structure Theory. Cambridge University Press.
Hu, W.; Fey, M.; Ren, H.; Nakata, M.; Dong, Y.; and Leskovec, J. 2021. OGB-LSC: A Large-Scale Challenge for Machine Learning on Graphs. In Proc. of NeurIPS.
Hu, W.; Fey, M.; Zitnik, M.; Dong, Y.; Ren, H.; Liu, B.; Catasta, M.; and Leskovec, J. 2020. Open graph benchmark: Datasets for machine learning on graphs. Proc. of NeurIPS.
Huang, K.; and Zitnik, M. 2020. Graph meta learning via local subgraphs. Proc. of NeurIPS.
Jin, W.; Barzilay, R.; and Jaakkola, T. 2018. Junction tree variational autoencoder for molecular graph generation. In Proc. of ICML.
Kipf, T. N.; and Welling, M. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In Proc. of ICLR.
Kishan, K.; Li, R.; Cui, F.; and Haake, A. R. 2022. Predicting biomedical interactions with higher-order graph convolutional networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics.
Lemke, T. L. 2003. Review of Organic Functional Groups: Introduction to Medicinal Organic Chemistry. Lippincott Williams & Wilkins.
Li, H.; Zhang, L.; Zhang, D.; Fu, L.; Yang, P.; and Zhang, J. 2022. TransVLAD: Focusing on Locally Aggregated Descriptors for Few-Shot Learning. In Proc. of ECCV.
Li, J.; Peng, H.; Cao, Y.; Dou, Y.; Zhang, H.; Yu, P.; and He, L. 2021. Higher-order attribute-enhancing heterogeneous graph neural networks. IEEE Transactions on Knowledge and Data Engineering.
Li, P.; Wang, Y.; Wang, H.; and Leskovec, J. 2020. Distance encoding: Design provably more powerful neural networks for graph representation learning. Proc. of NeurIPS.
Liu, C.; Yang, Y.; Ding, Y.; and Lu, H. 2022a. EDEN: A Plug-in Equivariant Distance Encoding to Beyond the 1-WL Test.
Liu, Y.; Long, C.; Zhang, Z.; Liu, B.; Zhang, Q.; Yin, B.; and Yang, X. 2022b. Explore Contextual Information for 3D Scene Graph Generation. IEEE Transactions on Visualization and Computer Graphics.
Liu, Y.; Tu, W.; Zhou, S.; Liu, X.; Song, L.; Yang, X.; and Zhu, E. 2022c. Deep Graph Clustering via Dual Correlation Reduction. In Proc. of AAAI, volume 36, 7603–7611.
Liu, Y.; Xia, J.; Zhou, S.; Wang, S.; Guo, X.; Yang, X.; Liang, K.; Tu, W.; Li, Z. S.; and Liu, X. 2022d. A Survey of Deep Graph Clustering: Taxonomy, Challenge, and Application. arXiv preprint arXiv:2211.12875.
Liu, Y.; Yang, X.; Zhou, S.; and Liu, X. 2022e. Simple Contrastive Graph Clustering. arXiv preprint arXiv:2205.07865.
Liu, Y.; Yang, X.; Zhou, S.; Liu, X.; Wang, Z.; Liang, K.; Tu, W.; Li, L.; Duan, J.; and Chen, C. 2023. Hard Sample Aware Network for Contrastive Deep Graph Clustering. In Proc. of AAAI.
Loukas, A. 2020. What graph neural networks cannot learn: depth vs width. In Proc. of ICLR.
Maron, H.; Ben-Hamu, H.; Serviansky, H.; and Lipman, Y. 2019a. Provably powerful graph networks. Proc. of NeurIPS.
Maron, H.; Ben-Hamu, H.; Shamir, N.; and Lipman, Y. 2019b. Invariant and Equivariant Graph Networks. In Proc. of ICLR.
Maron, H.; Fetaya, E.; Segol, N.; and Lipman, Y. 2019c. On the universality of invariant networks. In Proc. of ICML.
Morris, C.; Kriege, N. M.; Bause, F.; Kersting, K.; Mutzel, P.; and Neumann, M. 2020. TUDataset: A collection of benchmark datasets for learning with graphs. arXiv preprint arXiv:2007.08663.
Morris, C.; Rattan, G.; and Mutzel, P. 2020. Weisfeiler and Leman go sparse: Towards scalable higher-order graph embeddings. Proc. of NeurIPS.
Morris, C.; Ritzert, M.; Fey, M.; Hamilton, W. L.; Lenssen, J. E.; Rattan, G.; and Grohe, M. 2019. Weisfeiler and Leman go neural: Higher-order graph neural networks. In Proc. of AAAI.
Murphy, R.; Srinivasan, B.; Rao, V.; and Ribeiro, B. 2019. Relational pooling for graph representations. In Proc. of ICML.
Neumann, M.; Garnett, R.; Bauckhage, C.; and Kersting, K. 2016. Propagation kernels: efficient graph kernels from propagated information. Machine Learning.
Nikolentzos, G.; Dasoulas, G.; and Vazirgiannis, M. 2020. k-hop graph neural networks. Neural Networks.
Pan, Z.; Sharma, A.; Hu, J. Y.-C.; Liu, Z.; Li, A.; Liu, H.; Huang, M.; and Geng, T. T. 2023. Ising-Traffic: Using Ising Machine Learning to Predict Traffic Congestion under Uncertainty. In Proc. of AAAI.
Papp, P. A.; Martinkus, K.; Faber, L.; and Wattenhofer, R. 2021. DropGNN: random dropouts increase the expressiveness of graph neural networks. Proc. of NeurIPS.
Sandfelder, D.; Vijayan, P.; and Hamilton, W. L. 2021. Ego-GNNs: Exploiting ego structures in graph neural networks. In Proc. of ICASSP.
Sato, R.; Yamada, M.; and Kashima, H. 2021. Random features strengthen graph neural networks. In Proc. of SDM.
Scarselli, F.; Gori, M.; Tsoi, A. C.; Hagenbuchner, M.; and Monfardini, G. 2008. The graph neural network model. IEEE Transactions on Neural Networks.
Shervashidze, N.; Schweitzer, P.; Van Leeuwen, E. J.; Mehlhorn, K.; and Borgwardt, K. M. 2011. Weisfeiler-Lehman graph kernels. Journal of Machine Learning Research.
Shervashidze, N.; Vishwanathan, S.; Petri, T.; Mehlhorn, K.; and Borgwardt, K. 2009. Efficient graphlet kernels for large graph comparison. In Proc. of AISTATS.
Sperduti, A.; and Starita, A. 1997. Supervised neural networks for the classification of structures. IEEE Transactions on Neural Networks.
Sterling, T.; and Irwin, J. J. 2015. ZINC 15 – ligand discovery for everyone. Journal of Chemical Information and Modeling.
Tahmasebi, B.; Lim, D.; and Jegelka, S. 2020. Counting substructures with higher-order graph neural networks: Possibility and impossibility results. arXiv preprint arXiv:2012.03174.
Thiede, E.; Zhou, W.; and Kondor, R. 2021. Autobahn: Automorphism-based graph neural nets. Proc. of NeurIPS.
Topping, J.; Giovanni, F. D.; Chamberlain, B. P.; Dong, X.; and Bronstein, M. M. 2021. Understanding over-squashing and bottlenecks on graphs via curvature. CoRR.
Veličković, P.; Ying, R.; Padovano, M.; Hadsell, R.; and Blundell, C. 2019. Neural Execution of Graph Algorithms. In Proc. of ICLR.
Vignac, C.; Loukas, A.; and Frossard, P. 2020. Building powerful and equivariant graph neural networks with structural message-passing. Proc. of NeurIPS.
Wang, H.; Zhao, M.; Xie, X.; Li, W.; and Guo, M. 2019. Knowledge graph convolutional networks for recommender systems. In Proc. of WWW.
Wang, L.; and Chen, L. 2023. FTSO: Effective NAS via First Topology Second Operator. Preprints.
Wang, L.; Gong, Y.; Ma, X.; Wang, Q.; Zhou, K.; and Chen, L. 2022. IS-MVSNet: Importance Sampling-Based MVSNet. In Proc. of ECCV, 668–683. Springer.
Wang, L.; Gong, Y.; Wang, Q.; Zhou, K.; and Chen, L. 2023. Flora: dual-Frequency LOss-compensated ReAl-time monocular 3D video reconstruction. In Proc. of AAAI.
Weisfeiler, B.; and Leman, A. 1968. The reduction of a graph to canonical form and the algebra which appears therein. NTI, Series.
Wijesinghe, A.; and Wang, Q. 2022. A New Perspective on "How Graph Neural Networks Go Beyond Weisfeiler-Lehman?". In Proc. of ICLR.
Xu, K.; Hu, W.; Leskovec, J.; and Jegelka, S. 2018a. How Powerful are Graph Neural Networks? In Proc. of ICLR.
Xu, K.; Hu, W.; Leskovec, J.; and Jegelka, S. 2019a. How Powerful are Graph Neural Networks? In Proc. of ICLR.
Xu, K.; Li, C.; Tian, Y.; Sonobe, T.; Kawarabayashi, K.; and Jegelka, S. 2018b. Representation Learning on Graphs with Jumping Knowledge Networks. In Proc. of ICML.
Xu, K.; Li, J.; Zhang, M.; Du, S. S.; Kawarabayashi, K.-i.; and Jegelka, S. 2019b. What Can Neural Networks Reason About? In Proc. of ICLR.
You, J.; Gomes-Selman, J. M.; Ying, R.; and Leskovec, J. 2021. Identity-aware Graph Neural Networks. In Proc. of AAAI.
You, J.; Ying, R.; and Leskovec, J. 2019. Position-aware graph neural networks. In Proc. of ICML.
Zeng, D.; Zhou, L.; Liu, W.; Qu, H.; and Chen, W. 2022. A Simple Graph Neural Network via Layer Sniffer. In Proc. of ICASSP.
Zhang, D.; Li, C.; Li, H.; Huang, W.; Huang, L.; and Zhang, J. 2022. Rethinking Alignment and Uniformity in Unsupervised Image Semantic Segmentation. arXiv preprint arXiv:2211.12875.
Zhang, M.; and Chen, Y. 2018. Link prediction based on graph neural networks. Proc. of NeurIPS.
Zhang, M.; Cui, Z.; Neumann, M.; and Chen, Y. 2018a. An end-to-end deep learning architecture for graph classification. In Proc. of AAAI.
Zhang, Z.; Wang, M.; Xiang, Y.; Huang, Y.; and Nehorai, A. 2018b. RetGK: Graph kernels based on return probabilities of random walks. Proc. of NeurIPS.
Zhao, L.; Jin, W.; Akoglu, L.; and Shah, N. 2022. From Stars to Subgraphs: Uplifting Any GNN with Local Structure Awareness. In Proc. of ICLR.