
A Comprehensive Survey on Trustworthy Graph Neural Networks: Privacy, Robustness, Fairness, and Explainability

ENYAN DAI, The Pennsylvania State University, USA
TIANXIANG ZHAO, The Pennsylvania State University, USA
HUAISHENG ZHU, The Pennsylvania State University, USA


JUNJIE XU, The Pennsylvania State University, USA
ZHIMENG GUO, The Pennsylvania State University, USA
HUI LIU, Michigan State University, USA
JILIANG TANG, Michigan State University, USA
SUHANG WANG, The Pennsylvania State University, USA
Graph Neural Networks (GNNs) have developed rapidly in recent years. Because of their strong ability to
model graph-structured data, GNNs are widely used in various applications, including high-stakes scenarios
such as financial analysis, traffic prediction, and drug discovery. Despite their great potential to benefit
humans in the real world, recent studies show that GNNs can leak private information, are vulnerable to
adversarial attacks, can inherit and magnify societal bias from training data, and lack interpretability, all of
which risk causing unintentional harm to users and society. For example, existing works demonstrate that
attackers can fool GNNs into giving the outcome they desire with unnoticeable perturbations on the training graph.
GNNs trained on social networks may embed discrimination in their decision process, strengthening
undesirable societal bias. Consequently, trustworthy GNNs in various aspects are emerging to prevent
harm from GNN models and increase users' trust in GNNs. In this paper, we give a comprehensive survey
of GNNs in the computational aspects of privacy, robustness, fairness, and explainability. For each aspect, we
give a taxonomy of the related methods and formulate general frameworks for the multiple categories
of trustworthy GNNs. We also discuss future research directions for each aspect and the connections between
these aspects that help achieve trustworthiness.
Additional Key Words and Phrases: Graph Neural Networks; Trustworthy; Privacy; Robustness; Fairness;
Explainability;
ACM Reference Format:
Enyan Dai, Tianxiang Zhao, Huaisheng Zhu, Junjie Xu, Zhimeng Guo, Hui Liu, Jiliang Tang, and Suhang
Wang. 2023. A Comprehensive Survey on Trustworthy Graph Neural Networks: Privacy, Robustness, Fairness,
and Explainability. 1, 1 (September 2023), 60 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

1 INTRODUCTION
Graph-structured data such as bioinformatics networks [110], trading networks [216], and social
networks [84] are pervasive in the real world. Inspired by the great success of deep learning
Authors’ addresses: Enyan Dai, The Pennsylvania State University, USA, emd5759@psu.edu; Tianxiang Zhao, The Pennsyl-
vania State University, USA, tkz5084@psu.edu; Huaisheng Zhu, The Pennsylvania State University, USA, hvz5312@psu.edu;
Junjie Xu, The Pennsylvania State University, USA, jmx5097@psu.edu; Zhimeng Guo, The Pennsylvania State University,
USA, zhimeng@psu.edu; Hui Liu, Michigan State University, USA, liuhui7@msu.edu; Jiliang Tang, Michigan State University,
USA, tangjili@msu.edu; Suhang Wang, The Pennsylvania State University, USA, szw494@psu.edu.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2023 Association for Computing Machinery.
XXXX-XXXX/2023/9-ART $15.00
https://doi.org/10.1145/nnnnnnn.nnnnnnn


Fig. 1. The ethical principles of trustworthy AI.

on independent and identically distributed (i.i.d.) data such as images, Graph Neural Networks
(GNNs) [27, 113, 236, 271] have been investigated to generalize deep neural networks to model graph-structured
data. GNNs have shown strong performance in various applications across domains including
finance [86, 143], healthcare [127], and social analysis [67, 200]. The success of GNNs relies on the
message-passing mechanism, where node representations are updated by aggregating information
from neighbors. With this mechanism, node representations can capture node features, neighborhood
information, and local graph structure, which facilitates various graph mining tasks such as node
classification [113], link prediction [28], and graph classification [102].
Despite their achievements in modeling graphs, concerns about the trustworthiness of GNNs
are rising. Firstly, GNN models are vulnerable to attacks that steal private data or manipulate the
model's behavior. For example, hackers can utilize node embeddings to infer users' attribute
information and friendship information in social networks [90, 265]. They can also easily fool a GNN
into giving a target prediction for a node by injecting malicious nodes into the network [197].
Secondly, GNN models themselves have problems with fairness and interpretability. More specifically,
GNN models can magnify the bias in the training data, resulting in discrimination against people of
certain genders, skin colors, and other protected sensitive attributes [24, 44]. Finally, due to the high
nonlinearity of the model, predictions from GNNs are difficult to understand. The lack of
interpretability also makes GNNs untrustworthy, which largely limits their applications. These
weaknesses significantly hinder the adoption of GNNs in real-world applications, especially
high-stakes scenarios such as finance and healthcare. Therefore, how to build trustworthy GNN
models has become a focal topic.
Recently, a guideline for trustworthy AI systems has been proposed by the European Union [189].
As shown in Figure 1, the guideline indicates that trustworthy AI should obey the following four
ethical principles: respect for human autonomy, prevention of harm, fairness, and explainability.
The principle of respect for human autonomy requires AI systems to follow human-centric design
principles and leave meaningful opportunity for human choice. This principle generally falls in the
domain of human-computer interaction. Therefore, we do not focus on this direction of trustworthy
GNNs in this survey. According to the principle of prevention of harm, AI systems should be
technically robust and safeguarded against malicious use, which corresponds to the robustness and
privacy aspects of our survey. The principle of fairness requires that AI systems ensure individuals
and groups are free from unfair bias, discrimination, and stigmatisation. As for explainability, it
requires the decision process of AI to be transparent and explainable. It is worth mentioning that
these four aspects are not isolated from each other. For instance, an attacker may poison the training
data to degrade the fairness of the model [153, 191] or mislead the GNN explainer model [66].
The explanations from explainable GNN methods can also be helpful for other aspects. Specifically,
based on the explanations from an explainable GNN model, humans can debug the model to avoid
adversarial attacks. In addition, by analyzing the explanations, we can evaluate whether a deployed
model is giving biased predictions. Therefore, it is important to explore the connections among these
aspects to finally achieve trustworthy GNNs that simultaneously address the
concerns in robustness, privacy, fairness, and explainability. In this survey, we also discuss the
interactions among these trustworthiness aspects as part of the future directions.
Due to the demand for trustworthy GNNs, a large body of literature on different aspects of
trustworthy GNNs has emerged in recent years. For example, robust GNNs that resist perturbations
from attackers have been developed [42, 105, 229]. To protect private information, privacy-preserving
GNN models [132, 228] have also been proposed for various real-world applications such as
financial analysis. Fair GNNs [44] and explainable GNNs [45, 248] have also become hot topics to address
the concerns in trustworthiness. There are several surveys of GNNs on robustness [104, 195, 225,
278], explainability [253], and fairness [55]. However, none of them thoroughly discusses the
trustworthiness of GNNs, which should also cover the dimensions of privacy and fairness. For
the aspects of robustness and explainability, they also do not include emerging directions
and techniques such as scalable attacks, backdoor attacks, and self-explainable GNNs, which are
discussed in this survey. A recent survey [136] reviews trustworthy AI systems.
However, it mainly focuses on techniques for trustworthy AI systems on i.i.d. data. Considering the
complexity of graph topology and the message-passing mechanism deployed in GNNs,
trustworthy AI techniques designed for i.i.d. data generally cannot be directly adopted to process graph-structured data.
There is a concurrent survey on trustworthiness in graph neural networks [255]. Compared with [255],
we cover more recent advanced topics of trustworthiness such as machine unlearning, model
ownership verification, scalable adversarial attacks, fair contrastive learning, explanation-enhanced
fairness, and self-explainable GNNs. To summarize, our major contributions are:
• In Section 3, we give a comprehensive survey of existing works on privacy attacks and defenses
on GNNs, followed by future directions. Graph datasets used in the privacy domain are also listed.
• Various categories of adversarial attack and defense methods on GNN models are discussed in
Section 4. Recent advances in the robustness of GNNs, such as scalable attacks, graph backdoor
attacks, and self-supervised learning defense methods, are further introduced.
• The fairness of trustworthy GNNs is thoroughly discussed in Section 5, which covers the biases
and fairness definitions on graph-structured data, various fair GNN models, and the datasets
they are applied to.
• A comprehensive survey of GNN explainability is presented in Section 6, in which we go through
the motivations, challenges, and experimental settings adopted by existing works. A taxonomic
summary of methodologies is also introduced.

2 PRELIMINARIES OF GRAPH NEURAL NETWORKS


To facilitate the discussion of trustworthy GNNs, we first introduce notations, the basic design of
GNNs, and graph analysis tasks in this section.

2.1 Notations
We use $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ to denote a graph, where $\mathcal{V} = \{v_1, \dots, v_N\}$ is the set of $N$ nodes and $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ is the set of edges. A graph can be either attributed or plain. For an attributed graph, node attributes $\mathbf{X} = \{\mathbf{x}_1, \dots, \mathbf{x}_N\}$ are provided, where $\mathbf{x}_i \in \mathbb{R}^d$ corresponds to the $d$-dimensional attributes of node $v_i$. $\mathbf{A} \in \mathbb{R}^{N \times N}$ is the adjacency matrix of the graph $\mathcal{G}$, where $\mathbf{A}_{ij} = 1$ if nodes $v_i$ and $v_j$ are connected and $\mathbf{A}_{ij} = 0$ otherwise.

2.2 Inner Working of Graph Neural Networks


Apart from node features, the graph topology also offers crucial information for
representation learning. Generally, GNNs adopt the message-passing mechanism to learn node
representations that capture both node features and graph topology information. Specifically, in
each layer, a GNN updates the representation of a node by aggregating information from its
neighboring nodes. As a result, a $k$-layer GNN model captures the information of the local graph
containing the $k$-hop neighbors of the central node. The general form of updating representations
in the $k$-th layer of a GNN can be formulated as:

$$\mathbf{h}_v^{(k)} = \mathrm{COMBINE}^{(k)}\big(\mathbf{h}_v^{(k-1)}, \mathrm{AGGREGATE}^{(k-1)}\big(\{\mathbf{h}_u^{(k-1)} : u \in \mathcal{N}(v)\}\big)\big), \quad (1)$$

where $\mathbf{h}_v^{(k)}$ stands for the representation of node $v \in \mathcal{V}$ after the $k$-th GNN layer and $\mathcal{N}(v)$ denotes
the set of neighbors of node $v$. For node classification, a linear classifier can be applied to the
representation $\mathbf{h}_v$ to predict the label of node $v$. For graph classification, a READOUT function
summarizes the node embeddings into a graph embedding $\mathbf{h}_G$ for the final prediction:

$$\mathbf{h}_G = \mathrm{READOUT}\big(\{\mathbf{h}_v^{(K)} \mid v \in \mathcal{G}\}\big), \quad (2)$$

where READOUT can be one of various graph pooling functions such as max pooling or average pooling.
Similar to node classification, graph classification can be conducted by applying a linear classifier
to the graph embedding $\mathbf{h}_G$.
Numerous graph neural networks that follow Eq. (1) have been proposed. Here, we only
introduce the design of GCN [113], which is one of the most popular GNN architectures. For the
design of other GNNs, please refer to the survey of GNN models [275]. More specifically, each layer
of GCN can be written as:

$$\mathbf{H}^{(k)} = \sigma\big(\tilde{\mathbf{A}} \mathbf{H}^{(k-1)} \mathbf{W}^{(k)}\big), \quad (3)$$

where $\mathbf{H}^{(k)}$ denotes the representations of all nodes after the $k$-th layer and $\mathbf{W}^{(k)}$ stands for the
parameters of the $k$-th layer. $\tilde{\mathbf{A}}$ is the normalized adjacency matrix. Generally, the symmetrically
normalized form is used, which can be written as $\tilde{\mathbf{A}} = \mathbf{D}^{-\frac{1}{2}}(\mathbf{A} + \mathbf{I})\mathbf{D}^{-\frac{1}{2}}$, where $\mathbf{D}$ is the diagonal degree matrix
with $\mathbf{D}_{ii} = \sum_j (\mathbf{A} + \mathbf{I})_{ij}$ and $\mathbf{I}$ is the identity matrix. $\sigma$ is an activation function such as ReLU.
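To make Eq. (3) concrete, the following is a minimal PyTorch sketch of a single GCN layer; the class name, the dense-matrix normalization, and the toy example are illustrative assumptions rather than code from any particular GNN library.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One GCN layer implementing Eq. (3): H' = sigma(A_norm @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)

    @staticmethod
    def normalize(adj):
        # A_norm = D^{-1/2} (A + I) D^{-1/2}, with D the degree matrix of A + I.
        a_hat = adj + torch.eye(adj.size(0))
        deg_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
        d = torch.diag(deg_inv_sqrt)
        return d @ a_hat @ d

    def forward(self, adj, h):
        return torch.relu(self.normalize(adj) @ self.linear(h))

# Example: stacking two layers captures 2-hop neighborhood information.
adj = torch.tensor([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
x = torch.randn(3, 8)
h = GCNLayer(8, 16)(adj, GCNLayer(8, 8)(adj, x))
```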

2.3 Graph Analysis Tasks


The node representations learned by GNNs can facilitate various tasks, such as node classification,
link prediction, community detection, and graph classification. Next, we briefly introduce them.
Node Classification. Many real-world problems can be treated as node classification problems,
such as user attribute prediction in social media [44, 270], fraud detection in transaction
networks [216, 239], and protein function prediction on protein-protein interaction networks [244].
In node-level classification, the GNN model aims to infer the labels of the test nodes V𝑇 given the
graph G = (V, E). Generally, the node classification task is semi-supervised, where only a subset of
nodes V𝐿 ⊂ V is provided with labels Y. Based on whether the test samples V𝑇 are seen
during the training phase, node classification can be split into the transductive setting and the inductive
setting. In the transductive setting, the test nodes V𝑇 are available during the training phase, and the node
features of the test nodes can be utilized for better prediction performance. In contrast, in the inductive
setting the test nodes are entirely unseen by the trained GNN model.
Community Detection. Communities are subgraphs of a network that are more densely
connected internally than to the rest of the network. Formally, the set of communities in
the network can be represented by {C1, . . . , C𝐾 }, where each C𝑖 ⊂ G is a subgraph of the whole graph G.
These communities can be either disjoint or overlapping. The goal of community detection
is to identify which communities each node 𝑣 ∈ G belongs to. Community detection is often
unsupervised [184, 220, 262]. Recently, supervised community detection based on GNNs has also been
investigated [37]. Community detection is useful in various domains such as social network
analysis [77] and functional region identification in the brain [74].


Link Prediction. Given a graph G = (V, E), a link prediction model predicts the
existence of a link between nodes 𝑣𝑖 ∈ V and 𝑣𝑗 ∈ V, where (𝑣𝑖 , 𝑣𝑗 ) ∉ E. A very common form of link
prediction is to make the prediction based on the representations of the two nodes from a GNN model. Let h𝑖
and h𝑗 denote the representations of nodes 𝑣𝑖 and 𝑣𝑗 ; the prediction can be formulated as 𝑔(𝑣𝑖 , 𝑣𝑗 ) = MLP(h𝑖 , h𝑗 ).
Link prediction has various applications such as friend recommendation on social media [177]
and knowledge graph completion [10].
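As a rough illustration of the MLP-based scorer 𝑔(𝑣𝑖 , 𝑣𝑗 ) above, the sketch below concatenates two node embeddings and maps them to a link probability; the `LinkScorer` class, its hidden size, and the random placeholder embeddings are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class LinkScorer(nn.Module):
    """Scores a candidate edge (v_i, v_j) from the two node embeddings."""
    def __init__(self, emb_dim, hidden_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, h_i, h_j):
        # Concatenate the two endpoint embeddings and map to a link probability.
        return torch.sigmoid(self.mlp(torch.cat([h_i, h_j], dim=-1)))

# Usage: embeddings would come from a trained GNN; here they are random placeholders.
h = torch.randn(10, 32)          # embeddings of 10 nodes
scorer = LinkScorer(emb_dim=32)
print(scorer(h[0], h[1]))        # predicted probability that edge (v_0, v_1) exists
```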
Graph Classification. For graph classification, each graph instance belongs to a certain class. The
training set for graph classification can be denoted as $\mathcal{D}_T = \{(\mathcal{G}_i, y_i)\}_{i=1}^{|\mathcal{D}_T|}$, where $y_i$ denotes the
label of graph $\mathcal{G}_i$ and $|\mathcal{D}_T|$ represents the number of graphs in the training set. The goal of
graph classification is to learn a function $f_\theta : \mathcal{G} \rightarrow y$ to classify the unlabeled test graphs $\mathcal{D}_U$. As
mentioned in Section 2.2, a READOUT function is added to the GNN model to obtain a graph
embedding for classification. Graph classification also has many applications, such as
property prediction of drugs [100], where each drug is represented as a graph.

3 PRIVACY OF GRAPH NEURAL NETWORKS


Similar to deep learning algorithms on images and texts, the remarkable achievements of GNNs
also rely on big data. Extensive sensitive data are collected from users to obtain powerful GNN
models for various services in critical domains such as healthcare [128], banking systems [216],
and bioinformatics [127]. For example, GNNs have been applied to brain networks for fMRI
analysis [127]. In addition, the GNN model owner may provide the query API service to share the
knowledge learned by GNNs. It is also very common that the pretrained GNN models are released to
third parties for knowledge distillation or various downstream tasks [139]. However, the collection
and utilization of private data for GNN model training, API services, and model release threaten
the safety of private and sensitive information. First, GNNs are generally trained in a centralized way,
where users' data and models are stored on a centralized server. In the case of an untrustworthy
centralized server, the collected sensitive attributes might be leaked through unauthorized usage or a
data breach. For instance, the personal data of more than half a billion Facebook users was leaked
online for free in a hacker forum in 2021¹. Second, the private information of users can also be leaked
from model release or the provided services due to privacy attacks. Taking an online service for brain
disease classification as an example, a membership inference attack can identify the patients covered
in the training dataset, which severely threatens patients' privacy. Moreover, various types of privacy
attacks [58, 265] such as link inference and attribute inference have proven effective in stealing users'
information from pretrained models. Therefore, it is crucial to develop privacy-preserving GNNs to
achieve trustworthiness.
There are several surveys on the privacy aspects of machine learning models. The work in [173]
comprehensively reviews current privacy attacks on i.i.d. data. Privacy-preserving methods are
reviewed in [6, 99, 107, 245] as well. However, these surveys are overwhelmingly dedicated to privacy
issues of models for i.i.d. data such as images and text, and rarely discuss privacy attacks and
defense methods on graphs, even though such methods are challenged by the topology information in
graphs and the message-passing mechanism of GNN models. Therefore, in this section, we give an
overview of privacy attacks on GNNs and of privacy-preserving GNNs that defend against privacy
attacks. We also include related datasets and applications of privacy-preserving GNNs, followed
by future research directions on privacy-preserving GNNs.

1 https://www.bbc.com/news/technology-56745734


Table 1. Different types of privacy attack methods on GNNs.


Privacy attack types References
Membership inference [58], [161], [91], [227]
Property Inference [265]
Reconstruction attack [90], [265], [58], [267]
Model extraction [226], [186]

3.1 Taxonomy of Privacy Attacks


In this subsection, we introduce the categorization of privacy attacks on GNNs according to
the targeted private information. We also briefly explain two settings of the attacker's accessible
knowledge for conducting privacy attacks. Finally, we present more details of existing privacy
attack methods on GNNs.

3.1.1 Types of Privacy Attacks on GNNs. The goal of privacy attacks on GNNs is to extract information
that is not intended to be shared. The target information can concern the training graph,
such as membership, sensitive attributes of nodes, and connections between nodes. In addition, some
attackers aim to extract the model parameters of GNNs. Based on the target knowledge, privacy
attacks can generally be split into four categories:
• Membership Inference Attack: In a membership inference attack, the attacker tries to determine
whether a target sample is part of the training set. For example, suppose researchers train a
GNN model on a social network of COVID-19 patients to analyze the propagation of the virus. A
membership inference attack can identify whether a target subject is in the training patient network,
resulting in information leakage about the subject. Different from i.i.d. data, the target
samples can be nodes or graphs. For instance, for the node classification task, the target sample can
be a subgraph of the target node's local graph [161] or only contain the node attributes [91]. For the
graph classification task, the target sample is a graph to be classified [227].
• Reconstruction Attack: A reconstruction attack, also known as a model inversion attack, aims to
infer private information about the input graph. Since graph-structured data is composed of
graph topology and node attributes, reconstruction attacks on GNNs can be split into structure
reconstruction, i.e., inferring the structures of target samples, and attribute reconstruction (also known
as attribute inference attack), i.e., inferring the attributes of target samples. Generally, the embeddings
of the target samples are required to conduct a reconstruction attack.
• Property Inference Attack: Different from the attribute reconstruction attack, a property inference
attack aims to infer dataset properties that are not encoded as features. For instance, one may want
to infer the ratio of women to men in a social network, where this information is not contained
in the node attributes. The attacker may also be interested in structure-related properties such as
the degree of a node, i.e., the number of friends of a target user in a social network [265].
• Model Extraction Attack: This attack aims to extract target model information by learning
a model that behaves similarly to the target model. It may focus on different aspects of the
model, which results in two goals of model extraction: (i) the attacker aims to
obtain a model that matches the accuracy of the target model; (ii) the attacker tries to replicate
the decision boundary of the target model. Model extraction attacks can threaten the security
of models offered as API services [159] and can be a stepping stone for various privacy attacks and
adversarial attacks.
Table 1 categorizes existing methods on privacy attacks on GNNs based on the attack types. We
will introduce the details of these methods in Section 3.2.


Table 2. Categorization of attacker’s knowledge.


Knowledge References
White-box [58], [267]
Black-box [58], [161], [91], [227], [226], [186], [265], [90]

3.1.2 Threat Models of Privacy Attacks. To conduct privacy attacks, attackers usually possess auxiliary
knowledge about the target GNN and/or the dataset. In this subsection, we introduce the
categorization of threat models of privacy attacks in terms of the attacker's knowledge. Generally,
based on whether the model parameters of the target GNN are available, the attacker's knowledge
can be split into two settings, i.e., white-box attack and black-box attack:
• White-Box Attack: In a white-box attack, the model parameters or the gradients during training
are accessible to the attacker. Apart from knowledge about the trained GNN, the attacker
may require other knowledge, such as the nodes/graphs to be attacked in inference attacks
and a shadow dataset, i.e., a dataset that follows the same distribution as the training dataset of
the target GNN. The white-box attack can be used to attack pretrained GNNs whose models are
publicly released [175]. It is also practical during the training process of federated learning [58],
where intermediate computations such as gradients are shared among participants.
• Black-Box Attack: In contrast to the white-box attack, the parameters of the target GNN are
unknown in a black-box attack, while the architecture of the target GNN and the hyperparameters
used during training may be known. In this setting, attackers are generally allowed to query the target
GNN model to obtain the prediction vectors or embeddings of the queried samples. Similar to
white-box attacks, shadow datasets and the target nodes/graphs are also required to conduct
black-box attacks. A practical example of a black-box privacy attack is attacking an API service
that returns the outputs of GNN models in response to user queries.
The categorization of existing privacy attack methods according to the assumption on attacker’s
knowledge is shown in Table 2.

3.2 Methods of Privacy Attack on GNNs


The Unified Framework. Supervised privacy attack is a common design strategy of privacy
attacks [58, 90, 91, 161, 186, 226, 227, 265]. The core idea of supervised privacy attack methods is to
utilize a shadow dataset and the outputs of the target model to obtain supervision for training a
privacy attack model, which can be described by the unified framework shown in Figure 2a.
The attacker uses a shadow dataset D𝑆 as input to the target model 𝑓𝑇 to obtain
predictions or embeddings. Then, the ground truth for various types of privacy attacks on the shadow
dataset can be obtained. With these attack labels from the shadow dataset, the attacker can train an
attack model that performs inference based on the outputs of the target model by:

$$\min_{\theta_A} \frac{1}{|\mathcal{D}_S|} \sum_{\mathcal{G}_i \in \mathcal{D}_S} l\big(f_A(f_T(\mathcal{G}_i)), y_i\big), \quad (4)$$

where $\mathcal{G}_i$ indicates a sample from the shadow dataset, which can be a subgraph of node $v_i$'s local
graph for node classification or a sample graph for graph classification, and $y_i$ represents the extracted
attack label of the sample $\mathcal{G}_i$. As illustrated in Fig. 2a, the attack label can vary from attributes to network
properties for different privacy attacks. $l(\cdot)$ denotes the loss function, such as the cross-entropy loss, used to
train the attack model $f_A$, and $\theta_A$ denotes the parameters of the attack model $f_A$. After the attack model
is trained, a privacy attack on a target example $\mathcal{G}_t$ can be conducted by $f_A(f_T(\mathcal{G}_t))$. Next, we
give more details about the methods for each type of privacy attack.
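A minimal sketch of this unified framework is given below, assuming the attacker can query a black-box `target_model` on a shadow dataset; the helper name `train_attack_model`, the MLP attack model, and the tensor shapes are illustrative assumptions, not a prescribed implementation.

```python
import torch
import torch.nn as nn

def train_attack_model(target_model, shadow_graphs, attack_labels,
                       out_dim, num_classes, epochs=100):
    """Train f_A on (f_T(G_i), y_i) pairs built from a shadow dataset (Eq. (4))."""
    attack_model = nn.Sequential(nn.Linear(out_dim, 64), nn.ReLU(),
                                 nn.Linear(64, num_classes))
    opt = torch.optim.Adam(attack_model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    # Query the (black-box) target model once to collect its outputs on the shadow data.
    with torch.no_grad():
        target_outputs = torch.stack([target_model(g) for g in shadow_graphs])
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(attack_model(target_outputs), attack_labels)
        loss.backward()
        opt.step()
    return attack_model
```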


(a) Unified framework of privacy attacks. (b) Shadow training for membership inference.
Fig. 2. The illustration of the privacy attack methods.

Membership Inference Attack. The membership inference attack aims to identify whether a target
sample was used for training the target model 𝑓𝑇 . The privacy leakage of membership is caused by
the overfitting of the model to the training dataset, which causes the prediction vectors (predicted
label distributions) of training and test data to follow different distributions [187, 227]. Thus, an
attacker can utilize the prediction vector to judge whether a data instance was in the training set of 𝑓𝑇 .
To learn a membership inference attack model, the most common approach is shadow training, which
obtains supervision for membership inference and trains an attack model. The process of shadow
training is shown in Fig. 2b. In shadow training, part of the shadow dataset, $\mathcal{D}_S^{train}$, is used to train
a surrogate model $f_S$ to mimic the behavior of the target model as:

$$\min_{\theta_S} \frac{1}{|\mathcal{D}_S^{train}|} \sum_{\mathcal{G}_i \in \mathcal{D}_S^{train}} l\big(f_S(\mathcal{G}_i), f_T(\mathcal{G}_i)\big), \quad (5)$$

where $\mathcal{G}_i$ is a graph for graph classification or the $k$-hop subgraph centered at node $v_i$ for node
classification, $f_T(\mathcal{G}_i)$ denotes the predicted label distribution of $\mathcal{G}_i$, and $l(\cdot)$ is a loss function such as
the cross-entropy loss that ensures $f_S(\mathcal{G}_i)$ is similar to $f_T(\mathcal{G}_i)$. Since $\mathcal{G}_i \in \mathcal{D}_S^{train}$ is used for training $f_S$,
its predictive probability vector from $f_S$ together with its ground-truth class label, i.e., $[f_S(\mathcal{G}_i), y_i]$, can be
labeled as a positive (member) sample; the ground-truth label $y_i$ helps the attack model judge whether $f_S(\mathcal{G}_i)$
is overfitting. Similarly, the probability vectors of the remaining shadow samples $\mathcal{D}_S^{Out} = \mathcal{D}_S - \mathcal{D}_S^{train}$ can be
labeled as negative (non-member) samples for membership inference. Then, the prediction vectors and the obtained
membership labels are used to train an attack model such as logistic regression,
which can infer whether a sample is in the training dataset of the target model or not.
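The labeling step of shadow training can be sketched as follows, assuming a trained surrogate `f_s` that returns prediction vectors as NumPy arrays; the use of scikit-learn's logistic regression and the helper name `build_membership_attack` are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def build_membership_attack(f_s, shadow_train, shadow_out):
    """Label shadow predictions as member (1) / non-member (0) and fit the attack model."""
    # f_s returns a predicted class-probability vector for each shadow sample.
    pos = np.stack([f_s(g) for g in shadow_train])   # samples used to train the surrogate
    neg = np.stack([f_s(g) for g in shadow_out])     # held-out shadow samples
    features = np.concatenate([pos, neg])
    labels = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    return LogisticRegression(max_iter=1000).fit(features, labels)

# At attack time, feeding the target model's prediction vector for a sample into
# the fitted classifier estimates whether that sample was in f_T's training set.
```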
Membership inference attacks have already been extensively investigated on i.i.d. data [98, 169, 187].
Due to the great success of GNNs, membership inference attacks on graphs have recently attracted
increasing attention [58, 91, 161, 227]; these generally follow the same shadow-training scheme.
In [161], membership inference based on subgraphs of the target node's local graph is investigated
for the node classification task. More specifically, the shadow dataset D𝑆 in [161] is a graph that comes
from the same underlying distribution as the graph used for training, and a GCN is deployed as the
shadow model 𝑓𝑆 . In [91], the authors further investigate the attack on target samples for which only
node features are provided. To infer the membership of a single test node, the attack model is
trained to distinguish {𝑓𝑆 (x𝑖 ) : 𝑣𝑖 ∈ D𝑆𝑡𝑟𝑎𝑖𝑛 } from {𝑓𝑆 (x𝑖 ) : 𝑣𝑖 ∈ D𝑆𝑂𝑢𝑡 }, where 𝑓𝑆 (x𝑖 ) denotes the
prediction of 𝑓𝑆 when only the attributes of a single node 𝑣𝑖 are fed into the surrogate GNN. The authors
of [227] also adopt the introduced framework; the major difference is that they achieve the
membership inference attack on the graph classification task.
Reconstruction Attack. As mentioned in Sec. 3.1.1, the reconstruction attack aims to infer the
sensitive attributes and/or links in target datasets. Due to the message passing of GNNs, the learned
node/graph embeddings capture both node attributes and graph structure information. Thus, the
existing reconstruction attack method [58] reconstructs this information from the node embeddings
H = [h1, . . . , h𝑁 ] learned by the target GNN. For attribute reconstruction, 𝑓𝐴 can simply be a
multilayer perceptron (MLP) that reconstructs the attributes as X̂ = MLP(H). For link inference, 𝑓𝐴
generally predicts the link based on the embeddings of nodes 𝑣𝑖 and 𝑣𝑗 by 𝑤 (𝑖, 𝑗) = MLP(h𝑖 , h𝑗 ).
Following the unified framework, a shadow graph G𝑆 is used to provide the adjacency matrix
A𝑆 and sensitive attributes X𝑆 as supervision. The node embeddings H𝑆 of the shadow graph
are assumed to be available. Then, the attack model can be trained in a supervised manner. For the
attribute reconstruction attack, the training loss is given as $\min_{\theta_A} \sum_{v_i \in \mathcal{G}_S} l_{attr}(\mathbf{x}_i, \hat{\mathbf{x}}_i)$, where $l_{attr}$
can be the MSE loss for continuous attributes and the cross-entropy loss for categorical attributes. As
for link inference, the objective function is $\min_{\theta_A} \|\hat{\mathbf{A}}_S - \mathbf{A}_S\|_F^2$, where $\hat{\mathbf{A}}_S$ is the adjacency
matrix reconstructed by $f_A$. Similar to link prediction, negative sampling may also be applied here.
Reconstructing the adjacency matrix from graph embeddings is also investigated in [265]. The main
difference from [58] is that the attack model directly infers the adjacency matrix from the
graph embedding by $\hat{\mathbf{A}} = \mathrm{MLP}(\mathbf{h}_G)$, where $\mathbf{h}_G$ denotes a graph embedding.
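The two attack heads can be sketched as follows, assuming leaked node embeddings `h` and a shadow adjacency matrix as supervision; the module names and the dense pairwise construction are simplifications for illustration.

```python
import torch
import torch.nn as nn

class AttributeReconstructor(nn.Module):
    """Reconstructs node attributes X_hat = MLP(H) from leaked node embeddings."""
    def __init__(self, emb_dim, attr_dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(emb_dim, 128), nn.ReLU(),
                                 nn.Linear(128, attr_dim))

    def forward(self, h):
        return self.mlp(h)

def link_inference_loss(attack_mlp, h, adj_shadow):
    """|| A_hat - A_S ||_F^2 on the shadow graph, with A_hat[i, j] = MLP(h_i, h_j)."""
    n = h.size(0)
    # Build all node pairs (dense, for illustration only) and score each pair.
    pairs = torch.cat([h.repeat_interleave(n, 0), h.repeat(n, 1)], dim=1)
    a_hat = torch.sigmoid(attack_mlp(pairs)).view(n, n)
    return ((a_hat - adj_shadow) ** 2).sum()
```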
For some applications, the embeddings might not be available, while the prediction vector of an
instance can be obtained by querying the target model. Therefore, He et al. [90] propose to infer the
link between two nodes from their prediction vectors as 𝑤 (𝑖, 𝑗) = MLP(y𝑖 , y𝑗 ), where y𝑖 is the prediction
vector of node 𝑣𝑖 from the target model 𝑓𝑇 . Following shadow training, the authors first query the
prediction vectors of the shadow dataset from the target model 𝑓𝑇 . Then, a surrogate model is trained
on the shadow dataset and the queried prediction vectors. Finally, the link inference attack model can be
learned with the supervision generated from the surrogate model.
The aforementioned methods all focus on black-box settings. However, with the development of
pretrained GNNs and federated learning, which share model parameters, white-box reconstruction
attack methods have also started to attract attention. For example, GraphMI [267] proposes to
reconstruct the adjacency matrix in a white-box setting where the trained model parameters are known.
Intuitively, the reconstructed adjacency matrix A will be similar to the original adjacency matrix if
the loss between the true node labels 𝑦𝑖 and the predictions using the reconstructed matrix, 𝑓𝜃 (A, X)𝑖 ,
is minimized. In addition, the graph structure is updated to ensure accurate predictions from the GNN
model under a feature smoothness constraint. Formally, GraphMI aims to solve the following
optimization problem:

$$\min_{\mathbf{A} \in \{0,1\}^{N \times N}} \frac{1}{|\mathcal{V}_L|} \sum_{v_i \in \mathcal{V}_L} l\big(f_\theta(\mathbf{A}, \mathbf{X})_i, y_i\big) + \alpha\, \mathrm{tr}(\mathbf{X}^T \mathbf{L} \mathbf{X}) + \beta \|\mathbf{A}\|_1, \quad (6)$$

where $\mathbf{L} = \mathbf{D} - \mathbf{A}$ is the Laplacian matrix of $\mathbf{A}$ and $\mathbf{D}$ is the diagonal degree matrix of $\mathbf{A}$. $\|\mathbf{A}\|_1$ is the
$\ell_1$-norm that encourages $\mathbf{A}$ to be sparse. $\alpha$ and $\beta$ control the contributions of the feature smoothness
constraint and the sparsity constraint on the adjacency matrix $\mathbf{A}$.
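A simplified sketch of this optimization is shown below, relaxing A to continuous values in [0, 1] and using plain gradient descent; the callable `gnn(a, features)`, the hyperparameters, and the final thresholding are assumptions and omit GraphMI's additional components.

```python
import torch
import torch.nn.functional as F

def graphmi_style_reconstruction(gnn, features, labels, train_idx,
                                 n, alpha=1e-4, beta=1e-3, steps=200):
    """Recover a relaxed adjacency matrix by minimizing a loss in the spirit of Eq. (6)."""
    a_param = torch.zeros(n, n, requires_grad=True)       # logits of the relaxed adjacency
    opt = torch.optim.Adam([a_param], lr=0.1)
    for _ in range(steps):
        a = torch.sigmoid(a_param)
        a = (a + a.t()) / 2                                # keep the graph undirected
        laplacian = torch.diag(a.sum(dim=1)) - a
        logits = gnn(a, features)                          # predictions with the candidate graph
        loss = (F.cross_entropy(logits[train_idx], labels[train_idx])
                + alpha * torch.trace(features.t() @ laplacian @ features)  # feature smoothness
                + beta * a.abs().sum())                    # l1 sparsity
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (torch.sigmoid(a_param) > 0.5).float()          # binarize the recovered adjacency
```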
Property Inference Attack. The property inference attack is still at an early stage. An initial
effort is made in [265] to infer the properties of a target graph from its embedding. The proposed
method also follows the unified framework. Let $p_i$ denote the property of a graph $\mathcal{G}_i$ in the shadow
dataset $\mathcal{D}_S$. The attack model is trained by $\min_{\theta_A} \frac{1}{|\mathcal{D}_S|} \sum_{\mathcal{G}_i \in \mathcal{D}_S} l\big(f_A(\mathbf{h}_i^G), p_i\big)$, where $\mathbf{h}_i^G = f_T(\mathcal{G}_i)$ is
the embedding of $\mathcal{G}_i$. $l(\cdot)$ can be the MSE loss or the cross-entropy loss, depending on the type of property.
Model Extraction Attack. The model extraction attack aims to learn a surrogate model that
behaves similarly to the target model. The process of training a surrogate model is also part of
the membership inference attack, as shown in Fig. 2b. Generally, the attacker first queries
the target model to obtain predictions on the shadow dataset. It then leverages the shadow dataset
and the corresponding predictions to train the surrogate model for the model extraction attack, as
formulated in Eq. (5). Following this framework, recent works [186, 226] have investigated
GNN model extraction attacks with different levels of knowledge about the shadow dataset and the training
graph of the target model. For example, the general framework of model extraction is applied
when the shadow dataset is provided with graph structures [186, 226]. If no structure
is given in the shadow dataset, the missing graph structure can first be learned. For example,
in [186], the graph structure is first initialized by KNN on the node attributes and then updated by a
graph structure learning framework [34].

Table 3. The categorization of privacy-preserving graph neural networks.

Category                          Privacy Attacks Defended Against   References
Differential Privacy (DP)         Membership Inference               [160], [176], [240], [261]
Federated Learning                -                                  [89], [166], [237], [212], [273]
Federated Learning + DP           Membership Inference               [228], [274], [137]
Machine Unlearning                -                                  [33], [231], [230], [39]
Adversarial Privacy-Preserving    Attribute Reconstruction           [125], [132]
Adversarial Privacy-Preserving    Structure Reconstruction           [210]
Model Ownership Verification      Model Extraction                   [241], [272], [209]
Other privacy protection methods  Membership Inference               [41], [161]

3.3 Privacy-Preserving Graph Neural Networks


As GNNs are vulnerable to privacy attacks and may leak the private information of users, privacy-preserving
GNNs have been developed to protect user privacy. Current privacy-preserving GNNs generally
fall into the following categories: differential privacy, federated learning, machine unlearning,
adversarial privacy preservation, and model ownership verification. The categorization of privacy-preserving
GNNs is listed in Table 3. Next, we introduce representative and state-of-the-art
methods in each category.
3.3.1 Differential Privacy for Privacy-Preserving GNNs. Differential Privacy (DP) [60] is a popular
approach that can provide a privacy guarantee for training data. The core idea of differential privacy
is that if two datasets differ only by one record and are processed by the same algorithm, the outputs
of the algorithm on the two datasets should be similar. With differential privacy, the impact of a
single sample is strictly controlled. Thus, the membership inference attack can be defended against by DP
with a theoretical guarantee. Formally, differential privacy is defined as follows:

Definition 3.1 ((𝜖, 𝛿)-Differential Privacy [60]). Given 𝜖 > 0 and 𝛿 ≥ 0, a randomized mechanism
M satisfies (𝜖, 𝛿)-differential privacy if, for any two adjacent datasets 𝐷 and 𝐷′ differing in at most one
record and for any subset of outputs S, the following holds:

$$P(\mathcal{M}(D) \in S) \le e^{\epsilon} P(\mathcal{M}(D') \in S) + \delta, \quad (7)$$

where 𝜖 is the privacy budget that trades off utility and privacy. A smaller 𝜖 leads to a stronger
privacy guarantee but weaker utility. When 𝛿 = 0, (𝜖, 𝛿)-DP reduces to 𝜖-differential privacy; (𝜖,
𝛿)-DP allows plain 𝜖-DP to be broken with a small probability 𝛿.
To achieve (𝜖, 𝛿)-differential privacy, additive noise mechanisms such as the Gaussian mechanism [61]
and the Laplace mechanism [60] are widely adopted. Based on the privacy budget and the sensitivity of
the function to be protected, a certain level of Gaussian or Laplace noise is injected to obtain a
differentially private mechanism. Recently, various differentially private deep learning methods
[1, 8, 165, 176] have been proposed to protect training data privacy. For instance, NoisySGD [1]
adds noise to the gradients during model training so that the trained model parameters will not leak
the training data, with a formal guarantee. PATE [165] first trains an ensemble of teacher
models on disjoint subsets of the sensitive training data. Then, a student model is trained with the
aggregated outputs of the ensemble on public data. As a result, the student model is unlikely to be
affected by the change of a single sensitive record, which satisfies differential privacy. The theoretical
privacy guarantee of PATE is also analyzed in [165]. In differential privacy, a trusted curator is
required to apply calibrated noise. To handle the situation of an untrusted curator, local differential
privacy methods [8, 176], which perturb users' data locally before uploading it to the central server,
have also been investigated for privacy protection.
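As a small illustration of the Laplace mechanism, the sketch below adds calibrated noise to a query answer; the sensitivity value in the example is an assumption that depends on the query and on the chosen (edge- or node-level) notion of adjacency for graph data.

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon):
    """Release value + Laplace(sensitivity / epsilon) noise, giving epsilon-DP
    for a query with the given L1 sensitivity."""
    scale = sensitivity / epsilon
    return value + np.random.laplace(0.0, scale, np.shape(value))

# Example: privately release an edge-count-style statistic whose sensitivity is
# assumed to be 1 (adding/removing one edge changes the answer by at most 1).
noisy_answer = laplace_mechanism(value=42, sensitivity=1.0, epsilon=0.5)
```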
To protect the privacy of graph-structured data, many differentially private network embedding
methods have been investigated [160, 176, 240, 261]. For instance, DPNE [240] applies perturbations
to the objective function of learning network embeddings. In [260], a perturbed gradient descent
method is proposed that guarantees the privacy of graph embeddings learned by matrix factorization.
More recently, several works [160, 176] focusing on differentially private GNNs have been explored.
For example, the locally private GNN [176] adopts local differential privacy [109] to protect
the privacy of node features by perturbing the user's features locally. Furthermore, a robust graph
convolution layer is investigated to reduce the negative effects of the injected noise. PrivGNN [160]
extends the Private Aggregation of Teacher Ensembles (PATE) [165] to graph-structured data to
release GNNs with differential privacy guarantees. In particular, random subsampling of the
teachers' training sets and a noisy labeling mechanism on the public data are used in PrivGNN [160]
to achieve practical privacy guarantees. In [261], both user feature perturbation at the input
stage and loss perturbation at the optimization stage are investigated to achieve a privacy-preserving
GNN for recommendation. In addition, the authors empirically show that their proposed method can
help defend against the attribute inference attack, as the noise added to node features can prevent
the attacker from reconstructing the original sensitive attributes.
3.3.2 Other Methods for Membership Privacy. Beyond differential privacy approaches, recent
developments have given rise to a variety of other methods aimed at preserving membership
privacy. Next, we will delve into a detailed exploration of these newly established works.
LBP and NSD [161]: These two methods are preliminary explorations in defending against membership
inference attacks. Specifically, LBP is an output perturbation method in which noise is added
to the posteriors before they are released to end users. Intuitively, the noise can obfuscate the posteriors,
making it challenging to discern member from non-member node posteriors. As for NSD, it
randomly chooses neighbors of the queried node during inference. This limits the amount of
neighborhood information used by the target model, thereby protecting membership privacy.
RM-GIB [41]. This work represents a pioneering effort to craft a unified graph neural network
framework designed to both preserve membership privacy and defend against adversarial attacks.
In this paper, Dai et al. [41] first connect the Information Bottleneck (IB) with membership
privacy preservation. In particular, the proposed RM-GIB principle can be written as:

$$\min_{\theta} -I(\mathbf{z}_x, \mathcal{N}_S; y) + \beta I(\mathbf{z}_x, \mathcal{N}_S; \mathbf{x}, \mathcal{N}), \quad (8)$$

where $\mathbf{z}_x$ represents the attribute bottleneck encoding the node attribute information, $\mathcal{N}_S$ is a
subset of $v$'s neighbors that bottlenecks the neighborhood information, and $I(\cdot\,;\cdot)$ denotes mutual
information. As analyzed in [41], the regularization in IB (the latter term in Eq. (8)) constrains the
mutual information between representations and labels on the training set. This constraint can
narrow the gap between the training and test sets to avoid membership privacy leakage. In addition,
RM-GIB collects pseudo labels on unlabeled nodes and integrates them with the given labels in
the optimization, which further benefits membership privacy. The proposed RM-GIB also benefits
robustness, as an attribute bottleneck and a neighbor bottleneck are deployed to remove the
redundant information and/or adversarial perturbations in both node attributes and graph topology.


Fig. 3. The illustration of the federated learning.

3.3.3 Federated Learning for Privacy-Preserving GNNs. Currently, the majority of deep learning
methods require centralized storage of user data for training. However, this can be unrealistic due
to privacy concerns. For example, when several companies or hospitals want to combine their data
to train a GNN model, the data of their users are not allowed to be shared according to privacy
terms. Furthermore, users may be unwilling to upload their data to a platform server due to
concerns about information leakage. In such situations, the data remain on users' local devices or with
the data-holding organizations. To address this problem, federated learning [116, 151] is
proposed to collectively learn models with decentralized user data in a privacy-preserving manner.
In particular, it aims to optimize the following objective function:

$$\min_{w} \sum_{k=1}^{n} p_k \mathcal{L}_k(\mathcal{D}_k, w), \quad (9)$$

where $n$ is the total number of devices/clients, $\mathcal{D}_k$ is the local dataset stored on the $k$-th client,
and $\mathcal{L}_k$ is the local objective function of the $k$-th client. The impact of each client is controlled
by $p_k$ with $p_k \ge 0$ and $\sum_k p_k = 1$; $p_k$ is often set to $\frac{1}{n}$ or $\frac{|\mathcal{D}_k|}{|\mathcal{D}|}$, where $|\mathcal{D}| = \sum_k |\mathcal{D}_k|$ is the
total number of samples. A general framework of federated learning for solving Eq. (9) is given in Fig. 3,
where the user data and a local model $f_k(w_t^k)$ are maintained locally on the $k$-th client. At
training step $t$, each client computes local model updates based on its own data. Then, the central
server aggregates the model updates from the clients and updates the global model parameters to
$w_{t+1}$. The updated global model is distributed back to the clients for future iterations. The federated
learning framework was first proposed in FedAvg [151], which remains the most commonly used
federated learning algorithm. Specifically, FedAvg aggregates the model parameters by averaging the
updated parameters from the clients as $w_{t+1} = \sum_{k=1}^{n} \frac{|\mathcal{D}_k|}{|\mathcal{D}|} w_t^k$, where $w_t^k$ denotes the updated
parameters of the $k$-th client at step $t$. A more comprehensive survey of federated machine learning
can be found in [246].
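A minimal sketch of the FedAvg aggregation step is shown below, assuming each client sends back a PyTorch state dict and its local dataset size; local training and client selection are omitted.

```python
import copy

def fedavg_aggregate(client_states, client_sizes):
    """FedAvg: w_{t+1} = sum_k (|D_k| / |D|) * w_t^k over client state dicts."""
    total = float(sum(client_sizes))
    global_state = copy.deepcopy(client_states[0])
    for name in global_state:
        global_state[name] = sum(
            (size / total) * state[name].float()
            for state, size in zip(client_states, client_sizes)
        )
    return global_state

# Each client trains its local GNN on its own graph data; only the updated
# parameters (state dicts) and dataset sizes are sent to the server, e.g.:
# global_model.load_state_dict(fedavg_aggregate(states, sizes))
```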
To protect user privacy, federated learning has been extended to train GNNs [89, 137, 166,
212, 228, 237, 273, 274]. To handle the challenges caused by non-i.i.d. graphs, Xie et al. [237] propose
to dynamically cluster local systems based on GNN gradients, which can reduce the structure and
feature heterogeneity among graphs. The work in [212] proposes a hybrid of meta-learning and a
federated framework: training on a client is viewed as a task in meta-learning, and a global model is
learned to mitigate the issue of non-i.i.d. data; federated learning methods are then leveraged to
further update the global model. In [274], vertical federated learning is proposed, which assumes that
different clients hold different features and neighborhood information of the user data. To protect the
private data, i.e., node attributes, edges, and labels, the computations on private data are carried out
by the data holders, while a semi-honest server focuses on computations on encoded node representations
that are non-private. Apart from federated GNNs for node/graph classification, federated frameworks on
GNNs for recommendation have also been developed [137, 228]. FedGNN [228] incorporates high-order
user-item interactions by building local user-item graphs in a privacy-preserving way. Furthermore,
noise is injected locally during federated learning to satisfy local differential privacy for privacy
protection. FeSoG [137] further extends federated GNNs to social recommendation, which involves
social information for predictions.
Moreover, decentralized federated learning for GNNs is also explored in [89, 166], where a
client can communicate with a set of neighboring clients for aggregation without a central server.
Accordingly, existing algorithms [89, 166] propose to aggregate the local model with the neighboring local
models. For example, decentralized periodic averaging SGD [89] applies SGD on each client locally
and synchronizes parameters with its neighbors every fixed number of iterations.
3.3.4 Machine Unlearning. One important privacy regulation is to ensure the right
to be forgotten [189]. This principle demands not only the deletion of data from storage but also
the removal of the related information from trained AI models. Retraining can be a possible solution
for training data removal. However, this can be overwhelmingly expensive, especially for models
trained on large-scale datasets. Therefore, machine unlearning methods have been developed to
efficiently erase user information from deep models, including graph neural networks. Based on
the target, machine unlearning can be split into exact unlearning and approximate unlearning.
Definition 3.2 (Exact unlearning [23]). Given a learning algorithm $A(\cdot)$, a dataset $\mathcal{D}$, and a forget
set $\mathcal{D}_f \subset \mathcal{D}$, the exact unlearning process $U(\cdot)$ is required to satisfy:

$$P\big(A(\mathcal{D} \setminus \mathcal{D}_f)\big) = P\big(U(\mathcal{D}, \mathcal{D}_f, A(\mathcal{D}))\big), \quad (10)$$

where $P(A(\mathcal{D}))$ denotes the distribution of models trained on $\mathcal{D}$ by the learning algorithm $A(\cdot)$.

SISA (Sharded, Isolated, Sliced and Aggregated) [23] is a popular framework for exact unlearning.
In this approach, the training set $\mathcal{D}$ is first partitioned into $K$ disjoint shards.
Subsequently, $K$ independent models are trained on the $K$ shards, and predictions are given by
ensembling the outputs of these models. When a data point $x$ requests removal, only the
model trained on the shard containing $x$ is retrained on that small shard. For graph-structured
data, naive partitioning destroys the training graph's structure, which largely degrades utility.
Therefore, GraphEraser [33] introduces a balanced graph partition that splits the graph based on
communities/clusters.
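The SISA idea can be sketched as follows; `train_model` and the list-based shards are placeholders, and GraphEraser would replace the naive split with a balanced, community-aware graph partition.

```python
def sisa_train(shards, train_model):
    """Train one independent model per data shard (SISA); predictions are ensembled."""
    return [train_model(shard) for shard in shards]

def sisa_unlearn(models, shards, train_model, removed_point):
    """To forget a point, retrain only the model whose shard contains it."""
    for k, shard in enumerate(shards):
        if removed_point in shard:
            shards[k] = [x for x in shard if x != removed_point]
            models[k] = train_model(shards[k])
    return models

def sisa_predict(models, x):
    """Ensemble the shard models, e.g., by majority vote over predicted labels."""
    votes = [m(x) for m in models]
    return max(set(votes), key=votes.count)
```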
Definition 3.3 ((𝜖, 𝛿)-Approximate Unlearning [80]). Given 𝜖 > 0 and 𝛿 ≥ 0, an unlearning process
$U(\cdot)$ satisfies (𝜖, 𝛿)-approximate unlearning if, for any forget set $\mathcal{D}_f \subset \mathcal{D}$ and any set of models $\mathcal{T}$, the
following holds:

$$P\big(U(\mathcal{D}, \mathcal{D}_f, A(\mathcal{D})) \in \mathcal{T}\big) \le e^{\epsilon} P\big(A(\mathcal{D} \setminus \mathcal{D}_f) \in \mathcal{T}\big) + \delta, \quad (11)$$

where 𝜖 is the budget that trades off utility and privacy. A smaller 𝜖 leads to a stronger privacy
guarantee but weaker utility, and (𝜖, 𝛿)-approximate unlearning allows plain 𝜖-approximate unlearning
to be broken with a small probability 𝛿. Existing methods [39, 230, 231] mainly adopt
influence-function-based parameter updating for approximate unlearning: $\theta^{-} = \theta + H_\theta^{-1}\Delta$,
where $H_\theta^{-1}\Delta$ is the influence function of the training point to be unlearned and $H_\theta$ is the Hessian of the
training loss at $\theta$. This is feasible because influence functions quantify how the model parameters would
change if the loss of a particular data point were excluded from training. However, merely adopting influence
functions does not assure approximate unlearning, because a gradient residual persists after updating the
model parameters. Adding a certain amount of noise during training is necessary to ensure that the gradient
residual does not leak privacy.


3.3.5 Adversarial Privacy-Preserving GNNs. To defend against attacks that leak sensitive attributes/links,
adversarial learning is adopted for privacy-preserving GNNs [125, 132, 210]. Let H be the node
representations learned by the encoder/GNN as H = 𝑓𝐸 (G; 𝜃 𝐸 ). The core idea of adversarial
privacy preservation is to adopt an adversary 𝑓𝐴 that infers sensitive attributes from the node representations
H, while the encoder 𝑓𝐸 aims to learn representations that fool 𝑓𝐴 , i.e., make 𝑓𝐴 unable to infer the
sensitive attributes. It is theoretically shown in [210] that through this min-max game, the mutual
information between the learned representations and the sensitive attribute, 𝑀𝐼 (H, 𝑠), can be minimized,
which protects the sensitive information from leakage. The process can be formally written as

$$\min_{\theta_E} \max_{\theta_A} \mathcal{L}_{utility}\big(f_E(\mathcal{G}; \theta_E)\big) - \beta \mathcal{L}_{Adversarial}\big(f_A(\mathbf{H}; \theta_A)\big), \quad (12)$$

where $\theta_E$ and $\theta_A$ are the parameters of the encoder $f_E$ and the adversary $f_A$, respectively. $\mathcal{L}_{utility}$ is the loss
function that ensures the utility of the learned representations, such as a classification loss or a
reconstruction loss. $\mathcal{L}_{Adversarial}$ is the adversarial loss, which is generally the cross-entropy loss of the adversary's
sensitive attribute prediction based on the node representations H. $\beta$ is a hyperparameter that
balances the contributions of the two loss terms.
The adversarial privacy-preserving GNN is first proposed in [125] to defend against the attribute
inference attack, where the link prediction loss and the node attribute prediction loss are combined
for utility. Let $\mathbf{x}_v^c$ denote the $c$-th attribute of node $v$ and $f_c(\mathbf{h}_v)$ denote the prediction of
the $c$-th attribute based on the representation of node $v$. The utility loss can be written as

$$\mathcal{L}_{utility} = \sum_{i=1}^{N}\sum_{j=1}^{N} l_{edge}\big(\mathbf{A}_{ij}, \hat{\mathbf{A}}_{ij}\big) + \alpha \sum_{v \in \mathcal{V}} \sum_{c \in \mathcal{C}} l_{attr}\big(\mathbf{x}_v^c, f_c(\mathbf{h}_v)\big), \quad (13)$$

where $l_{edge}$ and $l_{attr}$ denote the loss functions for link prediction and attribute reconstruction, and $\mathcal{C}$
denotes the set of attributes to be reconstructed. As for the adversarial loss, it is $\mathcal{L}_{Adversarial} =
-\sum_{v \in \mathcal{V}_S} \big[s_v \log(\hat{s}_v) + (1 - s_v)\log(1 - \hat{s}_v)\big]$, where $\hat{s}_v = f_A(\mathbf{h}_v)$ and $\mathcal{V}_S$ is the set of nodes with sensitive
attributes. GAL [132] deploys WGAN-based adversarial privacy preservation for information
obfuscation of sensitive attributes. Both attribute privacy protection and link privacy protection
with adversarial learning are discussed in [210].
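One alternating update of the min-max game in Eq. (12) can be sketched as below, assuming an `encoder` with a task head (here called `encoder.classify`, an assumed placeholder) and an `adversary` that predicts the sensitive attribute from the representations.

```python
import torch
import torch.nn.functional as F

def adversarial_privacy_step(encoder, adversary, graph, sensitive, labels,
                             opt_enc, opt_adv, beta=1.0):
    """One alternating update of the min-max game in the spirit of Eq. (12)."""
    # 1) Adversary step: learn to predict the sensitive attribute from representations.
    h = encoder(graph).detach()
    adv_loss = F.cross_entropy(adversary(h), sensitive)
    opt_adv.zero_grad()
    adv_loss.backward()
    opt_adv.step()

    # 2) Encoder step: keep task utility while fooling the (fixed) adversary.
    h = encoder(graph)
    utility_loss = F.cross_entropy(encoder.classify(h), labels)   # e.g., node classification
    privacy_loss = F.cross_entropy(adversary(h), sensitive)
    enc_loss = utility_loss - beta * privacy_loss                 # min_E max_A objective
    opt_enc.zero_grad()
    enc_loss.backward()
    opt_enc.step()
    return enc_loss.item(), adv_loss.item()
```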
3.3.6 Model Ownership Verification. Nowadays, it can be overwhelmingly expensive to train a
high-performance model due to the demands of high-quality data collection and the heavy computation
cost. For example, the well-known ChatGPT is reported to cost around 12 million dollars to train once.
Similarly, models for graph-structured data, such as graph contrastive learning on molecules [168],
also tend to require massive computation on large-scale data. Therefore, trained DNNs are considered
deep Intellectual Property (IP) with high business value. It is necessary to protect deep models from being
stolen and abused by adversaries. Extensive model ownership verification methods have been proposed for
deep neural networks for images and text [196]. Recently, model ownership verification methods for graph
neural networks are emerging [209, 241, 272]. In this subsection, we illustrate the general workflow of
model ownership verification, followed by recent advances in protecting GNNs.

Fig. 4. An illustration of model ownership verification.
Workflow of Model Ownership Verification. As shown in Fig. 4, the workflow of protecting
deep IP mainly consists of two stages, i.e., IP construction and IP verification.
• In the IP construction phase, an IP identifier for the deep model is built in either an invasive
or a non-invasive way. Generally, the IP identifier takes the form of a key-message pair. Model owners
possess the secret keys, which can be predefined matrices or special input samples. The IP
messages can be a bit string or model outputs that are triggered by the secret keys.
• In the verification phase, given the secret keys, we test whether the same IP identifier
exists in the suspect model. Specifically, a stolen model will convey the same IP message; on
the contrary, an independently trained model will not output the IP message.

Table 4. Datasets in Privacy

Type             Dataset    #Graphs  Avg. Nodes  Avg. Edges  #Features  References
Citation         Cora       1        2,708       5,429       1,433      [58], [161], [91], [90], [267], [226]
Citation         Citeseer   1        3,312       4,715       3,703      [58], [161], [91], [90], [267], [226], [186]
Citation         PubMed     1        19,717      44,338      500        [58], [161], [90], [226], [186]
Authorship       DBLP       1        17,716      105,734     1,639      [186]
Authorship       Coauthor   1        34,493      247,962     8,415      [186]
Authorship       ACM        1        3,025       26,256      1,870      [186]
Social Networks  Facebook   1        4,039       88,234      -          [58]
Social Networks  LastFM     1        7,624       27,806      7,842      [58], [91]
Social Networks  Reddit     1        232,965     57,307,946  602        [161]
Image            Flickr     1        89,250      449,878     500        [161]
Bioinformatics   PROTEINS   1,113    39.06       72.82       29         [227], [90]
Bioinformatics   DD         1,178    284.32      715.66      89         [227], [265]
Bioinformatics   ENZYMES    600      32.63       62.14       21         [227], [265], [90], [267]
Molecule         NCI1       4,110    29.87       32.30       37         [227], [265]
Molecule         AIDS       2,000    15.69       16.20       42         [265], [90], [267]
Molecule         OVCAR-8H   4,052    46.67       48.70       65         [227], [265]

Based on whether the IP identifier is constructed in an invasive way, the model ownership verifica-
tion for GNNs can be split into watermarking methods [241, 272] and fingerprinting methods [209].
GNN Watermarking. It is an invasive solution that embeds a detectable and unforgeable
IP identifier into the GNN model to obtain a watermarked GNN. Current methods [241, 272] focus
on applying backdoor attacks to obtain the watermarked GNN model. Specifically, backdoor attacks
aim to learn a model that predicts the target class 𝑦𝑡 given an arbitrary sample attached with the
predefined trigger G𝑡 . Therefore, the trigger graph G𝑡 can work as the secret key for watermarking,
and the target class 𝑦𝑡 will be the IP message. The training process in [241, 272] is the same as graph
backdoor attacks. Firstly, a trigger graph G𝑡 will be generated. Then, samples attached with the
trigger G𝑡 will be labeled as the target class 𝑦𝑡 and added to the training set of the GNN. More details
about backdoor attacks are illustrated in Sec. 4.2.5. Watermarking GNN models without
using backdoor attacks still requires further investigation.
GNN Fingerprinting. Fingerprinting is a non-invasive solution which aims to build an IP identifier for
a trained model. The main idea is based on the assumption that a trained neural network model will
exhibit distinct characteristics when compared to an independently trained model. These differences
can be model predictions, decision boundaries, adversarial samples, etc. Fingerprinting techniques
have been deeply studied in the context of i.i.d. data [196]. However, for GNN fingerprinting, there
is only one initial effort named GROVE [209], which proposes a fingerprinting scheme based
on GNN embeddings. Specifically, a set of test samples, denoted as D 𝑓 , serves as
the secret key for fingerprinting. During verification, the similarity between the embeddings
from the protected model and the suspect model is computed as the IP message. If the similarity of
the embeddings from the protected model and the suspect model is high, it indicates a high probability of
model stealing.
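The following is a minimal sketch of how such an embedding-based verification could be implemented. The names protected_model, suspect_model, key_set, and the decision threshold are hypothetical, the embedding dimensions of the two models are assumed to match, and the concrete similarity measure used by GROVE [209] may differ.

import torch
import torch.nn.functional as F

@torch.no_grad()
def verify_ownership(protected_model, suspect_model, key_set, threshold=0.9):
    # key_set holds the secret test samples D_f (features and edges).
    z_owner = protected_model(key_set.x, key_set.edge_index)
    z_suspect = suspect_model(key_set.x, key_set.edge_index)
    # Average cosine similarity of the embeddings over the key samples acts as the IP message.
    sim = F.cosine_similarity(z_owner, z_suspect, dim=-1).mean().item()
    return sim, sim >= threshold   # high similarity suggests the suspect model was stolen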


3.4 Datasets for Privacy-Preserving GNNs


In this subsection, we list the datasets that have been used in the literature on GNN privacy.
The statistics of the datasets, along with the papers that use them, are presented in Table 4.
• Cora, Citeseer, PubMed, DBLP [164, 182]: Cora, Citeseer, PubMed, and DBLP are citation
network datasets. Cora consists of seven classes of machine learning papers. CiteSeer has six
classes. Papers are represented by nodes, while citations between two papers are represented by
edges. Each node has features defined by the words that appear in the paper's abstract. Similarly,
PubMed is a collection of abstracts from three types of medical papers. The data of DBLP comes
from four research areas.
• Facebook, LastFM [122]: Facebook and LastFM are social network datasets. Different user
accounts are represented by nodes that are connected by edges. Each user node in Facebook has
features such as gender, education, hometown, and location. LastFM was collected through the
following relationships in the social network.
• Flickr [149]: Flickr is an online photo management and sharing application. Each node represents
an image submitted to Flickr, and an edge reflects common properties shared between two images.
Node features are specified by users' tags, which indicate their interests.
• Reddit [84]: The Reddit dataset represents the post-to-post interactions of a user. An edge
between two posts indicates that the same user commented on both posts. The labels correspond
to the community that a post is associated with.
• Coauthor [185]: Coauthor is a dataset of co-authorship. Nodes represent authors, and edges
connect two authors if they co-authored a paper. Features are collected by keywords in the
author’s papers. The label of a node is the area that the author focuses on.
• ACM [219]: In ACM, nodes are papers. An edge indicates that two papers share the same author.
Features of a node are keywords of the paper, while labels are the conferences in which the papers
are published.
• PROTEINS, DD, ENZYMES [156]: DD, ENZYMES, and PROTEINS are macromolecule datasets.
Nodes are secondary structural elements labeled with their type and a variety of physical and
chemical data. If two nodes are neighbors along the amino acid sequence or one of the three
nearest neighbors in space, an edge links them. ENZYMES assigns enzymes into six classes,
which reflect the catalyzed chemical reaction. The labels in PROTEINS show whether a protein
is an enzyme. In DD, nodes are amino acids, and edges are their spatial proximity.
• AIDS, NCI1, OVCAR-8H, COX2 [156, 265]: AIDS, NCI1, OVCAR-8H, and COX2 are molecule
datasets. Atoms are represented by nodes, while chemical bonds are represented by edges. The
node features consist of atom types. The label is decided by toxicity or biological activity in drug
discovery projects.

3.5 Applications of Privacy Preserving GNNs


Pretraining and Model Sharing. Nowadays, there is an increasing trend of pretraining models [93,
168] on large-scale datasets to benefit downstream tasks. In practice, the pretrained model will
often be shared with other parties for their use. However, the pretrained model itself has embedded
the information of the training data, which can cause private data leakage by privacy attacks such
as membership inference and attribute inference. The privacy-preserving GNNs can be applied
to address this concern. For instance, differential privacy-preserving GNNs [160, 176, 261] can be
adopted in the pretraining phase to defend against the membership inference attack.
Distributed Learning. Due to the challenges of processing large amounts of data, such as privacy
concerns, computational cost, and memory capacity, the demand for distributed learning is in-
creasing dramatically. In this situation, federated learning on GNNs provides a solution to distributed
learning by processing data on local devices. In addition, by combining other privacy-preserving
methods such as differential privacy [228], the clients' data can be protected from privacy attacks.
Healthcare. Graph-structured data such as protein molecules, brain networks, and patient networks
are pervasive in the healthcare domain. GNNs have been trained on such private healthcare data for
various applications. For example, GNNs have been used to process electronic health records (EHR)
of patients for diagnosis prediction [128]. GNNs are also deployed to better capture the graph
signals of brain activity for medical analysis [5]. To ensure the privacy of patients' sensitive data,
privacy-preserving GNNs are required for applications in the healthcare domain.
Recommendation System. GNNs are widely applied in recommendation systems to incorporate social
context and better utilize high-order neighbor information [228]. Similarly, information
can be leaked from GNN-based recommendation systems. To protect user privacy, various
privacy-preserving recommendation systems have been proposed [137, 228, 261].

3.6 Future Research Directions of Privacy Preserving GNNs


Defense Against Various Privacy Attacks. Though many privacy-preserving GNNs have been
proposed, they mostly focus on defending against membership inference attacks and attribute
reconstruction attacks. Privacy-preserving GNNs against structure attacks, property inference
attacks, and model extraction attacks are less studied. Therefore, it is promising to develop privacy-
preserving GNNs against various privacy attacks.
Privacy Attack and Preservation in GNN Pretraining. Model pretraining has been a common
scheme to benefit downstream tasks that lack labels. Recently, pretraining of GNNs
with supervised tasks [92] and self-supervised tasks [93, 168] has achieved great success. The
parameters of pretrained GNNs will be released for downstream tasks, which may lead to private
information leakage. However, existing privacy attacks mostly focus on black-box settings and do
not investigate the information leakage caused by model releasing. Hence, privacy attacks and the
corresponding defense methods for pretrained GNN models need to be explored.
Trade-off Between Privacy and Utility. Though methods that apply differential privacy, federated
learning, or adversarial learning have been proposed to protect the privacy of training data, the
relation between the privacy protection performance and the prediction accuracy is rarely
discussed. For example, in differentially private GNNs [160, 176, 261], the actual performance in
defending against various privacy attacks is generally not evaluated. In adversarial privacy-
preserving methods [125, 132, 210], how to control the balance between prediction performance and
privacy protection is still not well discussed.

4 ROBUSTNESS OF GRAPH NEURAL NETWORKS


As an extension of neural networks on graph-structured data, GNNs are also vulnerable to adver-
sarial attacks. In addition, due to the message-passing mechanism and graph structure, GNNs can
be negatively affected by adversarial perturbations on both graph structures and node attributes.
For example, Nettack [280] can fool GNNs into giving false predictions on target nodes by poisoning the
training graph with unnoticeable perturbations to the graph structure and node attributes. NIPA [197]
manages to significantly reduce the global node classification performance of GNNs by injecting a
small amount of labeled fake nodes into the training graph. The vulnerability of GNNs has raised
tremendous concerns about adopting GNNs in safety-critical domains such as credit estimation and
healthcare. For example, fraudsters can create several transactions with deliberately chosen high-
credit users to escape GNN-based fraud detectors, which can cause tremendous losses to individuals
and institutions. Hence, developing robust GNNs is another important aspect of trustworthiness, and
many efforts have been made. There are already several comprehensive surveys about adversarial
attacks and defenses on graphs [31, 104, 195, 225]. Therefore, in this section, we briefly give an
overview of adversarial learning on graphs, but focus more on methods in emerging directions
such as scalable attacks, graph backdoor attacks, and recent defense methods.

Table 5. Categorization of representative graph adversarial attacks.

Aspect           Category            References
Knowledge        White-box           [229], [243], [30], [46], [75]
Knowledge        Black-box           [46], [146], [280], [30], [281], [243], [215], [197], [20], [26]
Capability       Evasion Attack      [229], [243], [30], [46], [204], [146], [144], [75]
Capability       Poisoning Attack    [280], [281], [197], [20], [26], [215], [235], [266], [124], [75]
Attackers' Goal  Targeted Attack     [46], [215], [204], [235], [266], [280], [146], [30], [124], [20]
Attackers' Goal  Untargeted Attack   [197], [281], [243], [75], [20], [229], [144], [26]

4.1 Threat Models of Graph Adversarial Attacks


Graph adversarial attacks aim to degrade the performance of GNNs or to make GNN models give
desired output by injecting deliberate perturbations to the graph dataset. Generally, attackers are
constrained in the knowledge about the data and the model they attack, as well as the capability of
manipulations on the graph. In this subsection, we introduce the threat models in various aspects
to show different settings of graph adversarial attacks.
Attackers’ Knowledge. Similar to privacy attacks, attackers need to possess certain knowledge
about the dataset and target model to achieve the adversarial goal. Based on whether the model
parameters are known to the attacker, attacks can be split into white-box and black-box attacks:
• White-box Attack: In this setting, the attacker knows all information about the model parameters
and the training graph such as adjacency matrix, attribute matrix and labels [30, 75, 243]. Since
it is impractical to obtain all information in the real world, the white-box attack is less practical
but often used to show the worst performance of a model under adversarial attacks.
• Black-box Attack: In black-box attack [20, 46, 280], attackers do not have access to the model’s
parameters but can access the graph dataset. More specifically, the full or partial graph structure
and node features could be accessible to attackers. Attackers may be allowed to have the labels
used for training or query the outputs of the target GNN, which could be used to mimic the
predictions of the target model to achieve black-box attack.
Attackers’ Capability. In adversarial attacks, adversarial perturbations are added to data samples
to mislead the target GNN model to give the output desired by the attacker. According to the stage
the attack occurs, the attacks can be divided into poisoning attack and evasion attack:
• Evasion Attack: The perturbations in evasion attacks [46, 243] are added to the graph in the
test stage, where the GNN model parameters have been well trained and cannot be affected by
attackers. Depending on whether the attacker knows the model parameters, evasion attack can
be further categorized into white-box or black-box evasion attack.
• Poisoning Attack: In poisoning attacks [197, 280, 281], the training graph is poisoned before
GNNs are trained. The GNN model trained on the poisoned dataset will exhibit certain designed
behaviors such as misclassifying target nodes or having low overall performance. As poisoning
attack happens before model training, it belongs to the black-box attack where the model
parameters are unknown. Thus, attackers usually train a surrogate model and poison the graph
to reduce the performance of the surrogate model. Due to the transferability of adversarial attack,
the poisoned graph can also reduce the performance of the target GNN trained on it.

Currently, many graph mining tasks such as semi-supervised node classification and link prediction
are transductive, where the test samples participate in the training phase. Therefore, most
existing works focus on poisoning attacks, which are often more practical for graph mining.
Based on the way that the graph data is perturbed, the adversarial attacks can be categorized
into manipulation attacks, node injection attack, and backdoor attacks:
• Manipulation Attack: In manipulation attack, an attacker manipulates either graph structure
or node features to achieve the attack goal. For example, Nettack [280] perturbs the graph
by deliberately adding/deleting edges and revising the node attributes with a greedy search
algorithm based on gradients. To make the perturbation more unnoticeable, ReWatt [146] poisons
the training graph by rewiring edges with reinforcement learning.
• Node Injection Attack: Different from manipulation attack that modifies the original graph,
node injection attack aims to achieve the adversarial goal by injecting malicious nodes into the
graph [197, 204, 215]. Compared to the manipulation attack, the node injection attack is more practical.
For example, in an e-commerce network, attackers need to hack servers or user accounts to
manipulate the network, while injecting new malicious accounts is much easier.
• Backdoor Attack: Backdoor attacks [235, 266] inject backdoor triggers into the training set to
poison the model. The backdoor trigger is a predefined or learned pattern, such as a single node
or a subgraph. The attacker relabels training nodes/graphs attached with the backdoor trigger to
the target label so that a GNN trained on the poisoned dataset will predict any test sample with
the backdoor trigger as the target label. Compared with other types of adversarial attacks, backdoor
attack on GNNs is still in an early stage.
Attackers’ Goal. Based on whether the goal of the attacker is to misclassify a set of target instances
or reduce the overall performance of GNN model, threat models can also be categorized as:
• Targeted Attack: The attacker aims to fool a GNN model to misclassify a set of target nodes [266,
280]. Meanwhile, the attacker might want the performance of the target model on non-targeted
samples to remain unchanged to avoid being detected.
• Untargeted Attack: The untargeted attack [197, 281] aims to reduce the overall performance of
the target GNN model. Since evasion attack cannot affect the parameters, it can only be achieved
by poisoning the dataset in the training stage.
The categorization of existing representative graph adversarial attacks in different aspects is
summarized in Table 5. Next, we will give more details of these methods.

4.2 Graph Adversarial Attack Methods


In this subsection, we first give a unified formulation of adversarial attacks followed by represen-
tative methods in evasion attacks and poisoning attacks. Finally, we survey recent advances in
backdoor attacks and scalable attacks on GNNs.
4.2.1 A Unified Formulation of Adversarial Attack. The adversarial attacks can be conducted on both
node-level and graph-level tasks. Since the majority of the literature focuses on node classification
problem, we give a unified formulation on node-level graph adversarial attacks as an example,
which can be easily extended to other tasks.
Definition 4.1. Given a graph G = {V, E} with adjacency matrix A and attribute matrix X, let
V𝑇 be the set of nodes to be attacked, the goal of adversarial attack is to find a perturbed graph Ĝ
that meets the unnoticeable requirement by minimizing the following objective function:
$\min_{\hat{\mathcal{G}}\in\Phi(\mathcal{G})} \mathcal{L}_{atk}(\hat{\mathcal{G}}) = \sum_{v\in\mathcal{V}_T} l_{atk}(f_{\theta^*}(\hat{\mathcal{G}})_v, y_v) \quad \text{s.t.} \quad \theta^* = \arg\min_{\theta} \mathcal{L}_{train}(f_{\theta}(\mathcal{G}')) \qquad (14)$


where $l_{atk}$ is the loss for attacking, which is typically set as $-H(f_{\theta^*}(\hat{\mathcal{G}})_v, y_v)$ with $H(\cdot)$ being the
cross entropy. $\mathcal{L}_{train}$ is the loss for training the target model. Generally, in node classification, we apply
the classification loss $\mathcal{L}_{train} = \sum_{v\in\mathcal{V}_L} H(f_{\theta}(\hat{\mathcal{G}})_v, y_v)$. As for $\mathcal{G}'$, it can be either $\hat{\mathcal{G}}$ or $\mathcal{G}$, which
correspond to the poisoning attack and the evasion attack, respectively. As for the search space $\Phi(\mathcal{G})$, apart
from the way of perturbing the graph, it is also constrained by a budget to ensure unnoticeable
attacks. Specifically, the budget constraint is typically implemented as $\|\hat{\mathbf{A}}-\mathbf{A}\| + \|\hat{\mathbf{X}}-\mathbf{X}\| \leq \Delta$,
where $\Delta$ denotes the budget value. For the non-targeted attack, $\mathcal{V}_T$ is set to all the unlabeled nodes $\mathcal{V}_U$,
and $y_v$ will be the prediction of the unlabeled nodes from a GNN trained on the clean graph $\mathcal{G}$.
4.2.2 Evasion Attacks. Evasion attacks focus on the inductive setting, which aims to change the
predictions on new nodes/graphs. Based on the applied techniques, evasion attack methods can be
split into Gradient-Based Methods and Reinforcement Learning-Based Methods.
Gradient-Based Methods. Early works on adversarial attacks [30, 46, 229, 243] gener-
ally focus on white-box evasion attacks to demonstrate the vulnerability of GNNs and assess the
robustness of models under worst-case situations. Since the model parameters are available in white-box
evasion attacks, it is natural to optimize the objective function in Eq.(14) by gradient descent. More
specifically, the objective function of the white-box evasion attack can be rewritten as:
$\min_{\hat{\mathcal{G}}\in\Phi(\mathcal{G})} \mathcal{L}_{atk}(\hat{\mathcal{G}}) = \sum_{v\in\mathcal{V}_T} l_{atk}(f_{\theta}(\hat{\mathcal{G}})_v, y_v), \qquad (15)$

where 𝜃 represents the model parameters of the target GNN. However, due to the discreteness of the
graph structure, it is challenging to directly solve the optimization problem in Eq.(15). Therefore,
FGA [30] and GradMax [46] use a gradient-based greedy algorithm to iteratively modify the
connectivity of node pairs within the attack budget. Xu et al. [243] adopt projected gradient descent
to ensure the discreteness of the perturbed adjacency matrix. Instead of directly using derivatives
from the attacks, Wu et al. [229] use integrated gradients to identify the optimal edges and node
attributes to be modified for attack.
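As an illustration of this family of attacks, the following is a minimal sketch of a gradient-based greedy edge-flip evasion attack in the spirit of FGA/GradMax. It assumes a differentiable GNN model that takes a dense adjacency matrix and a LongTensor label for the target node; it is a simplified sketch rather than the exact algorithms of [30, 46].

import torch
import torch.nn.functional as F

def greedy_edge_flip_attack(model, adj, x, target_node, label, budget=5):
    adj = adj.clone().float()
    for _ in range(budget):
        adj.requires_grad_(True)
        logits = model(x, adj)
        # Attack loss of Eq.(15): negative cross entropy on the target node.
        loss = -F.cross_entropy(logits[target_node].unsqueeze(0), label.view(1))
        grad = torch.autograd.grad(loss, adj)[0]
        # First-order estimate of the loss change for flipping A_ij is
        # grad_ij * (1 - 2 * A_ij); pick the flip with the largest decrease.
        score = grad * (1 - 2 * adj)
        i, j = divmod(torch.argmin(score).item(), adj.size(0))
        adj = adj.detach()
        adj[i, j] = 1 - adj[i, j]
        adj[j, i] = adj[i, j]          # keep the graph undirected
    return adj.detach()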
Recently, there have been several attempts at developing gradient-based evasion attacks [144, 204] in
a more practical black-box setting, which can be applied to the graph classification task and node
classification on evolving graphs. By exploiting the connection between the backward propagation
of GNNs and random walks, Ma et al. [144] investigate the connections between the change of
classification loss under perturbations and the random walk transition matrix. Based on that, they
generalize the white-box gradients into model-independent importance scores based on PageRank, which
avoids using model parameters. Tao et al. [204] investigate the black-box evasion attack with single-
node injection. Without knowing the model parameters, they train a surrogate model on the training
graph and attack the surrogate model to inject a fake node into the graph. A parameterized attribute
generator and edge generator are adopted for the node injection attack on unseen test nodes.
Reinforcement Learning-Based Methods. In black-box evasion attacks, the lack of model pa-
rameters challenges gradient-based methods. However, in many scenarios such as drug
property prediction, the attacker is allowed to query the target model. In this situation, reinforce-
ment learning can be employed to conduct evasion attacks to learn the optimal actions for graph
perturbation [46, 146]. RL-S2V [46] is the first work to apply reinforcement learning for black-box
targeted attack. They model the attack process as a Markov Decision Process (MDP) defined as:
• State: The state 𝑠𝑡 at time 𝑡 is represented by the tuple ( Ĝ𝑡 , 𝑣), where Ĝ𝑡 is the intermediate
modified graph at time 𝑡 and 𝑣 is the target node to be attacked.
• Action: The attacker of RL-S2V needs to add or delete an edge in each step, which is equivalent
to select a node pair. It decomposes the node pair selection action 𝑎𝑡 ∈ V × V into a hierarchical
structure of sequentially selecting two nodes, i.e., 𝑎𝑡 = (𝑎𝑡(1) , 𝑎𝑡(2) ), where 𝑎𝑡(1) , 𝑎𝑡(2) ∈ V.

• Reward: The goal of the attack is to fool the target classifier 𝑓 on the target node 𝑣. In RL-S2V,
no reward is given in intermediate steps. The non-zero reward is only given at the end as:

$r(s_m, a_m) = \begin{cases} 1 & \text{if } f(\hat{\mathcal{G}}_m)_v \neq y_v; \\ -1 & \text{if } f(\hat{\mathcal{G}}_m)_v = y_v, \end{cases} \qquad (16)$
• Termination: The process will stop once the agent modifies 𝑚 edges.
RL-S2V adopts the Q-learning algorithm to solve the MDP problem for the targeted attack. A parameterized
Q-learning is implemented for better transferability. A similar reinforcement learning framework is
also applied in ReWatt [146]. To make the perturbations more unnoticeable, ReWatt perturbs the
graphs by rewiring, i.e., breaking an edge 𝑒𝑖 𝑗 and rewiring it to another node to form 𝑒𝑖𝑘 . Hence,
ReWatt employs a different action space that consists of all the valid rewiring operations.
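To illustrate the MDP above, the following is a minimal sketch of the sparse reward of Eq.(16) and the termination rule, assuming a queryable black-box classifier model and a PyG-style perturbed graph object (hypothetical names); the hierarchical action selection and Q-learning updates of RL-S2V are omitted.

import torch

def episode_reward(model, perturbed_graph, target_node, true_label, step, max_edges):
    if step < max_edges:            # intermediate steps receive no reward
        return 0.0, False
    with torch.no_grad():
        pred = model(perturbed_graph.x, perturbed_graph.edge_index)[target_node].argmax()
    reward = 1.0 if pred.item() != true_label else -1.0
    return reward, True             # the episode terminates after m modifications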
4.2.3 Poisoning Attacks by Graph Manipulation. Apart from the evasion attacks, extensive poison-
ing attack methods [46, 146, 197] have been investigated for the transductive learning setting, which
are more practical for semi-supervised node classification. The majority of them focus on perturbing
the graph data by manipulating the original graph [280, 281]. From a technical standpoint, most
poisoning attacks through manipulation can be categorized under gradient-based methods.
As Eq.(14) shows, the poisoning attack can be formulated as a bilevel optimization problem. To
address this problem, various methods [30, 243, 280, 281] that perform gradient-based attacks on
a static/dynamic surrogate model have been investigated. For instance, Nettack [280] deploys a
tractable surrogate model, i.e., 𝑓𝑆 ( Â, X) = softmax( Â2 XW), to conduct poisoning targeted attack.
The surrogate model is firstly trained to capture the major information of graph convolutions. Then,
the poisoning attack is reformulated to learning perturbations by attacking the surrogate model as:
$\arg\max_{(\hat{\mathbf{A}},\hat{\mathbf{X}})\in\Phi(\mathbf{A},\mathbf{X})} \mathcal{L}(\hat{\mathbf{A}}, \hat{\mathbf{X}}; \mathbf{W}, v) = \max_{c\neq y_v} [\hat{\mathbf{A}}^2\hat{\mathbf{X}}\mathbf{W}]_{vc} - [\hat{\mathbf{A}}^2\hat{\mathbf{X}}\mathbf{W}]_{vy_v}, \qquad (17)$

where Φ(A, X) denotes the search space under the unnoticeable constraint, which considers both
attack budgets and maintaining important graph properties. To solve Eq.(17), Nettack proposes
an effective way of evaluating the change of surrogate loss after adding/removing a feature or an
edge. The final poisoned graph is obtained by repeatably selecting the most malicious operation in
a greedy search manner until reaching the budget. Similarly, FGA [30] adopts a static surrogate
model for poisoning attack. Different from Nettack, FGA directly adopts the GCN model and select
the perturbations based on gradients.
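The following is a minimal sketch of how the surrogate margin of Eq.(17) can be evaluated for a single candidate perturbation, assuming the surrogate weights W are already trained and adj_norm is the normalized adjacency after applying the candidate edge/feature change (hypothetical names); the efficient incremental updates of Nettack are omitted.

import torch

@torch.no_grad()
def surrogate_attack_score(adj_norm, x, W, target_node, true_label):
    # Following Eq.(17), the margin is computed on the pre-softmax scores [Â² X W].
    logits = adj_norm @ adj_norm @ x @ W
    row = logits[target_node]
    wrong = torch.cat([row[:true_label], row[true_label + 1:]])
    return (wrong.max() - row[true_label]).item()   # larger score = stronger candidate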
As the static surrogate model is trained on the raw graph, which cannot accurately reflect the
performance of GNN on the poisoned graph, there are also many works [243, 281] that adopt
a dynamic surrogate model to consider the effects of added perturbations to the target model.
In [243], min-max topology attack generation is employed for the untargeted attack. Specifically, the
inner maximization updates the surrogate model on the partially modified graph, and the outer
minimization conducts a projected gradient descent topology attack on the GNN model. Metattack [281]
introduces meta-learning to solve the bi-level optimization on the untargeted attack. Essentially,
the graph structure matrix is treated as a hyperparameter and the meta-gradient is computed as:
$\nabla_{\mathcal{G}}^{meta} = \nabla_{\mathcal{G}} \mathcal{L}_{atk}(f_{\theta^*}(\mathcal{G})) = \nabla_f \mathcal{L}_{atk}(f_{\theta^*}(\mathcal{G})) \cdot \left[\nabla_{\mathcal{G}} f_{\theta^*}(\mathcal{G}) + \nabla_{\theta^*} f_{\theta^*}(\mathcal{G}) \cdot \nabla_{\mathcal{G}} \theta^*\right], \qquad (18)$
where 𝜃 ∗ , i.e., the parameters of surrogate GNN model, is usually obtained by gradient descent
in 𝑇 iterations. For each inner iteration, the gradients of 𝜃 𝑡 +1 is obtained by ∇ G 𝜃 𝑡 +1 = ∇ G 𝜃 𝑡 −
𝛼∇ G ∇𝜃𝑡 L𝑡𝑟𝑎𝑖𝑛 (𝑓𝜃𝑡 (G)), where 𝛼 is the learning rate of the gradient descent in the inner iteration.
Though the surrogate model provides a way for the poisoning attack, if the target model architecture
differs a lot from the surrogate model, the poisoned graph might not be able to significantly reduce
the performance of the target model trained on it. Thus, instead of using surrogate models,
some works [20, 26] design generalized attack loss functions to poison the graph to improve the
transferability. For example, Bojchevski et al. [20] exploit the eigenvalue perturbation theory to
efficiently approximate the unsupervised DeepWalk loss to manipulate the graph structure. They further
demonstrate that the perturbed graph structure is transferable to various GNN models such as
GCN. Graph embedding learning is formulated as a general signal processing operation with a corresponding graph
filter in GF-Attack [26], which enables a general attacker that theoretically provides transferability
of adversarial samples. GF-Attack is able to perturb both adjacency matrix and feature matrix,
which leads to better attacking performance than [20].

4.2.4 Node Injection Attacks. Node injection attacks aim to achieve the adversarial goal by adding
malicious nodes. This attack process will not affect the existing link structures and node attributes.
Hence, node injection is more practical to execute compared with manipulation attacks. A unified
objective function of the node injection attack can be formulated as:
$\min_{\mathcal{E}_A, \mathcal{V}_A} \mathcal{L}_{atk}(\hat{\mathcal{G}}) = \sum_{v\in\mathcal{V}_T} l_{atk}(f_{\theta^*}(\hat{\mathcal{G}})_v, y_v) \quad \text{s.t.} \quad \theta^* = \arg\min_{\theta} \mathcal{L}_{train}(f_{\theta}(\hat{\mathcal{G}})) \qquad (19)$

where E𝐴 is the set of edges that link nodes inside V𝐴 and connect V𝐴 with the clean graph G.
Node injection attack is firstly proposed by NIPA [197], which injects fake nodes to degrade the
overall performance of the target GNN. To achieve stronger attack performance, the attacker in
NIPA will also add the labels of the malicious nodes into the training set. The objective function is
solved by reinforcement learning, similar to RL-S2V described in Sec. 4.2.2, but the
action space and reward are designed for the node injection poisoning attack. Specifically,
in each step, the action is to connect a fake node to the graph and assign a class label to the fake
node to effectively fool the GNN trained on the poisoned graph. The action is decomposed into
three steps: (i) select a fake node; (ii) select a real node and connect it to the selected fake node; and
(iii) assign the label to the selected fake node. The reward is defined based on the performance of
the surrogate model trained on the poisoned graph. Let 𝐴𝑡 denote the attack success rate on the
surrogate model trained on Ĝ𝑡 . The reward is defined as 𝑟𝑡 (𝑠𝑡 , 𝑎𝑡 ) = 1 if 𝐴𝑡+1 > 𝐴𝑡 , where 𝑠𝑡
and 𝑎𝑡 denote the state and action at time 𝑡; otherwise, 𝑟𝑡 (𝑠𝑡 , 𝑎𝑡 ) = 0.
Beyond the practicality, scalability is another advantage of node injection attacks. When we
inject a malicious node for adversarial attacks, we only need to consider 𝑑 · 𝑁 options, where
𝑑 and 𝑁 are the expected degree of the fake node and the graph size. Since 𝑑 is generally very
small and even can be set as one [204], the node injection attack is naturally more scalable than
the manipulation attack. Several following works [204, 215, 279] further investigate the node
injection attacks for better scalability. AFGSM [215] approximately linearizes the target GCN and
derives a closed-form solution for a node injection targeted attack, which has much lower time
cost. Experiments on Reddit with over 100K nodes demonstrate its effectiveness and efficiency for
targeted attack. G-NIA [204] explores conducting the targeted attack by injecting a single node, which
avoids the cost of generating multiple nodes for the attack. To efficiently choose the nodes connecting
with the injected ones, TDGIA [279] introduces a topological defective edge selection strategy.
Specifically, a metric based on basic graph structural information is employed to assess the nodes most
impacted by perturbations in their neighbors. To generate the features for the injected nodes, a
smooth feature optimization objective is designed in TDGIA.
Recently, the limitations of node injection attacks in unnoticeability are explored in [35]. In
this work, the authors observed that the injected nodes and edges can disrupt the homophily
distribution of the original graphs, compromising their unnoticeability. To solve this issue, Chen et
al. propose homophily unnoticeability to regularize the node generation and attachment. In their
experiments, the homophily regularization is applied to various existing node injection attacks.
Extensive results on many massive graphs indicate the effectiveness of the proposed homophily
regularization.

Fig. 5. General framework of graph backdoor attack [43].
4.2.5 Graph Backdoor Attacks. In this subsection, we review the recent emerging graph backdoor
attacks. In backdoor attacks, the attacker aims to make the host system misbehave when the
pre-defined trigger is present. The backdoor attack on graphs can be applied in various scenarios,
which can largely threaten the applications of GNNs. For example, an attacker may inject a backdoor
trigger into a drug evaluation model possessed by the community. Then, even a useless drug
containing the backdoor trigger from the attacker will be classified as an effective medicine. In
addition, the backdoor can be applied to federated learning [242], which threatens the safety of the
final global model. Though successful backdoor attacks may lead to enormous losses, there are only a
few works on backdoor attack methods on graphs [235, 266]. Next, we will introduce a general
framework of graph backdoor attacks followed by details of representative works.
General Framework of Graph Backdoor Attacks. The key idea of backdoor attacks is to
associate the trigger with the target class in the training data to mislead target models. As illustrated
in Fig. 5, during the poisoning phase, the attacker will attach a trigger 𝑔 to a set of poisoned nodes
V𝑃 ⊆ V and associate V𝑃 with the target class label 𝑦𝑡 . This process generates a backdoored
dataset. The GNNs trained on the backdoored dataset will be optimized to predict the poisoned
nodes V𝑃 (attached with the trigger 𝑔) as target class 𝑦𝑡 . This association will force the target GNN
to relate the existence of the trigger 𝑔 in neighbors with the target class. In the test phase, the
attacker can control the prediction on the target node 𝑣 to be 𝑦𝑡 by attaching the trigger 𝑔 to node 𝑣.
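The following is a minimal sketch of the poisoning phase described above for node classification, assuming a PyG-style data object and a fixed trigger described by trigger_x (node features) and trigger_edges (a [2, E] edge list within the trigger); all names are illustrative, and the adaptive trigger generation of GTA or UGBA is not shown.

import torch

def poison_graph(data, poisoned_nodes, trigger_x, trigger_edges, target_class):
    x, edge_index, y = data.x.clone(), data.edge_index.clone(), data.y.clone()
    for v in poisoned_nodes:
        offset = x.size(0)
        x = torch.cat([x, trigger_x], dim=0)                              # append a trigger copy
        y = torch.cat([y, torch.full((trigger_x.size(0),), -1, dtype=y.dtype)])  # trigger nodes stay unlabeled
        inner = trigger_edges + offset                                    # edges inside the trigger copy
        attach = torch.tensor([[v, offset], [offset, v]], dtype=torch.long)  # link trigger node 0 to v
        edge_index = torch.cat([edge_index, inner, attach], dim=1)
        y[v] = target_class                                               # relabel the poisoned node as y_t
    data.x, data.edge_index, data.y = x, edge_index, y
    return data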
SBA [266] proposes a subgraph-based backdoor attack for graph classification. In [266], the sub-
graph trigger is generated by random graph generation algorithms such as Erdős-Rényi model and
small world model. The set of poisoned nodes V𝑃 is randomly sampled. In this paper, the authors
also investigate the impacts of the trigger size, trigger density, and the size of poisoned graphs,
which demonstrates the vulnerability of GNN models to backdoor attacks.
GTA [235] focuses on injecting a backdoor into a pretrained model for node/graph-level tasks. The
backdoor is expected to remain effective even after fine-tuning the pretrained model on downstream
tasks. Hence, GTA formulates it as the following bi-level optimization problem:
$\mathcal{G}_t^* = \arg\min_{\mathcal{G}_t} \mathcal{L}_{atk}(\theta^*(\mathcal{G}_t)) \quad \text{s.t.} \quad \theta^* = \arg\min_{\theta} \mathcal{L}_{train}(\theta, \mathcal{D}_C \cup \hat{\mathcal{D}}_P), \qquad (20)$

where D𝐶 and D̂𝑃 represent the clean training samples and the samples poisoned by G𝑡 , respectively.
In [235], the downstream tasks are assumed to be unknown. They adopt unsupervised training and
design an attack loss that forces the poisoned graphs/nodes to have similar embeddings to a pre-
defined sample. Eq.(20) is solved by iteratively conducting inner and outer optimization with a
first-order approximation. In addition, a parameterized backdoor generator is adopted to obtain
personalized backdoor for each test graph/node.
UGBA [43]. While SBA and GTA demonstrate impressive attack performance, Dai et al. [43]
observed that the triggers they utilize differ significantly from the attached poisoned nodes. This
violates the homophily property of the graph. Hence, trigger attachments are vulnerable, as elimi-
nating edges connecting dissimilar nodes can easily disrupt them. Moreover, existing works require
an expensive budget to backdoor GNNs trained on large datasets. To address these problems, Dai et
al. develop UGBA [43] to execute unnoticeable graph backdoor attacks with a limited budget. To
make more efficient use of the budget, UGBA attaches triggers to deliberately chosen poisoned
nodes denoted as V𝑃 . To ensure unnoticeability, an adaptive trigger generator is deployed to obtain
a trigger 𝑔𝑖 for node 𝑣𝑖 under the following constraint:
$\min_{(u,v)\in\mathcal{E}_{B_i}} sim(u,v) \geq T, \qquad (21)$

where E𝐵𝑖 denotes the edge set that contains the edges inside trigger 𝑔𝑖 and the edge attaching trigger 𝑔𝑖 to
node 𝑣𝑖 . 𝑠𝑖𝑚 represents the cosine similarity on node features. 𝑇 is the threshold of the similarity
score, which can be tuned based on datasets. The objective function of UGBA can be formulated as:
$\min_{\mathcal{V}_P, \theta_g} \mathcal{L}_{atk}(\mathcal{V}_P, \theta_g) = \sum_{v_i\in\mathcal{V}_U} l(f_{\theta^*}(\hat{\mathcal{G}}_i), y_t)$
$\text{s.t. } \theta^* = \arg\min_{\theta} \sum_{v_i\in\mathcal{V}_L} l(f_{\theta}(\mathcal{G}_i), y_i) + \sum_{v_i\in\mathcal{V}_P} l(f_{\theta}(\hat{\mathcal{G}}_i), y_t), \qquad (22)$
$\forall v_i\in\mathcal{V}_P\cup\mathcal{V}_U,\ g_i \text{ meets Eq.(21)},\ |g_i| < \Delta_g, \text{ and } |\mathcal{V}_P| \leq \Delta_P$

where Ĝ𝑖 = 𝑎(G𝑖 , 𝑔𝑖 ) denotes the computation graph of node 𝑣𝑖 after attaching the generated
trigger 𝑔𝑖 . 𝑙 (·) represents the cross entropy loss and 𝜃𝑔 denotes the parameters of the adaptive
trigger generator. UGBA splits the optimization process in Eq.(22) into poisoned node selection
and adaptive trigger generator learning. During the poisoned node selection phase, representative
nodes situated at the clustering center are chosen. As for the training of the adaptive generator, a
bi-level optimization with a surrogate GCN model is applied.
4.2.6 Scalable Attack Methods. In real-world scenarios, the graph to be attacked is often large-
scale. For example, the Facebook social network contains billions of users. It is challenging to
conduct adversarial attacks on such large-scale networks. First, the majority of existing works focus
on manipulation attacks that try to find the optimal node pairs to be manipulated, which
requires a very high computation cost. For example, the time and space complexity of Metattack is
𝑂 (𝑁 2 ) with 𝑁 being the number of nodes in the graph, as it requires computing the meta-gradient
of each node pair. Second, large-scale graphs may exhibit different properties. Therefore, the
attack methods on small graphs could be ineffective on large graphs. However, there are only a few
works investigating the vulnerability of GNNs on large-scale graphs [75]. In this subsection, we present
promising directions of scalable attacks.
Perturbation Sampling: As mentioned, the gradient-based manipulation attack needs to compute
the gradients of each pair of node to decide the perturbation on the topology, resulting in unafford-
able time and space complexity. In [75], Projected Randomized Block Coordinate Descent (PR-BCD)
is proposed to sample the perturbations and update their corresponding probability scores so as
to reduce time complexity. Specifically, the manipulation attack on graph topology is modeled as
$\min_{\mathbf{P}} \mathcal{L}_{atk}(f_\theta(\mathbf{A}\oplus\mathbf{P}), \mathbf{X})$, where $\sum_{ij}\mathbf{P}_{ij} \leq \Delta$ and $\mathbf{P}_{ij} = 1$ denotes an edge flip. The operation $\oplus$ stands
for the operation of element-wise edge flipping. To optimize P, it is relaxed to P ∈ [0, 1] 𝑁 ×𝑁 , where
P𝑖 𝑗 is the probability of flipping the edge for attacks. In each training iteration, PR-BCD randomly
samples a fixed number of perturbations, which lead to a sparse adjacency matrix for both forward
and backward computation. The final perturbations can be obtained by the flipping probability
score matrix P. This method is applicable for both targeted and untargeted attacks. Experiments on
massive graphs with over 100 million nodes empirically show the effectiveness of PR-BCD and the
vulnerability of GNNs on large-scale graphs.
Candidates Reduction: For attack on a target node 𝑣, the search space can actually be reduced as
many candidate topology manipulations are ineffective to affect the prediction on 𝑣. For instance,
linking two nodes far from the target node 𝑣 can hardly change the representation of node 𝑣.
And linking a node 𝑢 that share the same label as 𝑣 to 𝑣 can even lead to a more robust graph
structure [42]. Therefore, SGA [124] develops a mechanism of perturbation candidate reduction
to avoid excessive computation. First, for manipulation on a node pair (𝑣 1, 𝑣 2 ), one of the nodes,
say 𝑣 1 , should be in the computation graph of the target node 𝑣, i.e., 𝑘-hop subgraph of node 𝑣 in
a 𝑘-layer GNN. Second, for the other node 𝑣 2 , its class label should be the one that is most easily
misclassified to the original class of target node 𝑣. Finally, SGA assumes that a node 𝑣 2 is more likely
to be selected if node 𝑣 2 can largely affect the prediction of 𝑣 when the manipulation is directly on
node pair (𝑣, 𝑣 2 ). Therefore, several best candidates of 𝑣 2 for each manipulation can be selected.
With these strategies, the time complexity can be reduced to be linear with the graph size.

4.3 Robust Graph Neural Networks


As GNNs are vulnerable to adversarial attacks, various robust graph neural networks against
adversarial attacks have been proposed, which can be generally categorized into three types:
Adversarial Training, Graph Denoising, and Certifiable Robustness. Next, we will introduce the
representative methods of each category as well as some defense methods that lie in other categories.

4.3.1 Adversarial Training. Adversarial training is a popular and effective approach to defend
against adversarial evasion attacks, which has been widely applied in computer vision [79].
Generally, adversarial training simultaneously generates adversarial
samples that can fool a classifier and forces the classifier to give similar predictions for a clean
sample and its perturbed version so as to improve the robustness of the classifier. Adversarial
training [47, 50, 68, 214, 243] is also investigated to defend against graph adversarial attacks, which
can be generally formulated as the following min max game:

$\min_{\theta} \max_{\Delta_{\mathbf{A}}\in\mathcal{P}_{\mathbf{A}}, \Delta_{\mathbf{X}}\in\mathcal{P}_{\mathbf{X}}} \mathcal{L}_{train}(f_\theta(\mathbf{A}+\Delta_{\mathbf{A}}, \mathbf{X}+\Delta_{\mathbf{X}})), \qquad (23)$

where $\mathcal{L}_{train}$ is the classification loss on the labeled nodes. $\Delta_{\mathbf{A}}$ and $\Delta_{\mathbf{X}}$ stand for the perturbations
on the topology structure and node attributes, respectively. $\mathcal{P}_{\mathbf{A}}$ and $\mathcal{P}_{\mathbf{X}}$ denote the allowable
perturbations within the attack budget. Adversarial training on GNNs is firstly explored in [243],
where the perturbations on graph topology are generated by a PGD algorithm. Some variants
of [243] are investigated in [29, 221]. Considering that node classification is a semi-supervised
learning task by nature, virtual graph adversarial training [50, 68] is applied to further encourage
the smoothness of model predictions on both labeled and unlabeled nodes. In [50, 68], only node
feature perturbations are considered in the virtual graph adversarial training as:
$\min_{\theta} \max_{\Delta_{\mathbf{X}}\in\mathcal{P}(\mathbf{X})} \mathcal{L}_{train}(f_\theta(\mathbf{A}, \mathbf{X}+\Delta_{\mathbf{X}})) + \alpha \sum_{v_i\in\mathcal{V}} D_{KL}\big(p(y|\mathbf{x}_i;\theta) \,\|\, p(y|\mathbf{x}_i+\Delta_{\mathbf{x}_i};\theta)\big), \qquad (24)$


where 𝛼 is the weight of the virtual adversarial training regularizer. $p(y|\mathbf{x}_i;\theta)$ denotes the prediction of
node 𝑣𝑖 . $D_{KL}(p(y|\mathbf{x}_i;\theta)\,\|\,p(y|\mathbf{x}_i+\Delta_{\mathbf{x}_i};\theta))$ enforces the predictions on unlabeled nodes to be similar
with and without perturbation.
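The following is a minimal sketch of how the virtual adversarial regularizer in Eq.(24) can be computed with a single power-iteration step, in the style of virtual adversarial training; model is assumed to map (x, edge_index) to logits, and epsilon/xi are illustrative hyperparameters rather than the exact settings of [50, 68].

import torch
import torch.nn.functional as F

def vat_loss(model, x, edge_index, epsilon=1e-2, xi=1e-6):
    with torch.no_grad():
        p_clean = F.softmax(model(x, edge_index), dim=-1)
    # One power-iteration step to estimate the worst-case perturbation direction.
    d = torch.randn_like(x)
    d = xi * F.normalize(d, dim=-1)
    d.requires_grad_(True)
    kl = F.kl_div(F.log_softmax(model(x + d, edge_index), dim=-1),
                  p_clean, reduction='batchmean')
    grad = torch.autograd.grad(kl, d)[0]
    r_adv = epsilon * F.normalize(grad.detach(), dim=-1)
    # Smoothness term added to the supervised classification loss during training.
    p_adv = F.log_softmax(model(x + r_adv, edge_index), dim=-1)
    return F.kl_div(p_adv, p_clean, reduction='batchmean')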
4.3.2 Certifiable Robustness. Though various approaches such as graph adversarial training have
been proposed to improve robustness against adversarial samples, new attacks may be developed
to invalidate the defense methods, leading to an endless arms race. To address this problem, recent
works [211, 282] analyze the certifiable robustness of GNNs to understand how the worst-case
attacks will affect the model. Certifiable robustness aims to provide certificates to nodes that are
robust to potential perturbations in considered space. For each node 𝑣 ∈ V with label 𝑦, the
certificate 𝑚(𝑣; 𝜃 ) can be obtained by solving the following optimization problem
$m(v;\theta) = \min_{\hat{\mathcal{G}}\in\Phi(\mathcal{G})} \Big( f_\theta(\hat{\mathcal{G}})_{vy} - \max_{i\neq y} f_\theta(\hat{\mathcal{G}})_{vi} \Big), \qquad (25)$

where 𝑓𝜃 ( Ĝ)𝑣𝑖 denotes the predicted logit of node 𝑣 in class 𝑖 and Φ(G) indicates all allowable
perturbed versions of the graph. If 𝑚(𝑣) > 0, the GNN is certifiably robust w.r.t. node 𝑣 in the considered
space Φ(G). In other words, no adversarial sample in Φ(G) can change the target model's prediction on node
𝑣. The work [282] firstly investigates the certifiable robustness against
the perturbations on node features. Some following works [21, 101, 211, 283] further analyze the
certifiable robustness under topology attacks. For instance, [283] proposes a branch-and-bound
algorithm that obtains a tight bound on the global optimum of the certificates for topology attacks.
In [21], the certificates of PageRank and a family of GNNs such as APPNP [115] are efficiently
computed by exploiting connections to PageRank and Markov decision processes. A technique
of randomized smoothing is applied in [211] to give certifiable guarantees to any GNN. The
randomized smoothing injects noise into the test samples to mitigate the negative effects of the
adversarial perturbations, and the obtained certificates are proven to be tight.
Apart from methods of computing certificates of a trained GNN, robust training that aims to
increase the certifiable robustness is also investigated in [21, 282]. The main idea is to directly
maximize the worst-case margin 𝑚(𝑣; 𝜃 ) during training to encourage the model to learn more
robust weights. In particular, a robust hinge loss can be added to the training loss to improve
certifiable robustness, which can be formally written as
$\min_{\theta} \mathcal{L}_{train}(f_\theta(\mathcal{G})) + \sum_{v\in\mathcal{V}} \max(0, M - m(v;\theta)), \qquad (26)$

where L𝑡𝑟𝑎𝑖𝑛 denotes the classification loss, and 𝑀 > 0 is the hyperparameter for the hinge loss.
The worst-case margin, i.e., 𝑚(𝑣; 𝜃 ), is encouraged to be larger than 𝑀 by Eq.(26).
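The following is a minimal sketch of the robust training loss in Eq.(26). The worst-case margins m(v; θ) are assumed to be supplied by a separate certification routine (e.g., the relaxations in [21, 282]); the function and argument names are illustrative.

import torch
import torch.nn.functional as F

def robust_training_loss(logits, labels, train_mask, margins, M=0.1):
    ce = F.cross_entropy(logits[train_mask], labels[train_mask])
    hinge = torch.clamp(M - margins, min=0).sum()   # sum of max(0, M - m(v; theta)) over nodes
    return ce + hinge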
4.3.3 Graph Denoising. Adversarial training and certifiable robustness are effective for training
robust GNNs to defend against evasion attacks, i.e., attacks that happen in the test stage. However, they
cannot deal with poisoned training graphs which have been perturbed by adversarial attacks. In
the early investigations of poisoning attacks [229], topology attacks are found to be more effective
and favored by poisoning attackers, while feature-only perturbations generally fail to change the
predictions of the target node due to the high dimension of node attributes. Therefore, a promising
direction of defending against poisoning attacks is to denoise the graph structure to reduce the negative
effects of the injected perturbations. Based on the way of denoising the graph, existing methods
can be split into Pre-processing, Graph Structure Learning, and Attention-Based methods.
Pre-processing. Pre-processing based approaches first denoise the graph using heuristics about
network properties or attack behaviors. Then, the GNN model can be trained on the denoised
graph to give correct predictions that are not affected by the poisoning attacker. The work [229]

firstly proposes a simple and effective pre-processing defense method based on the following two
observations on graph adversarial attacks: (i) perturbing the graph structure is more effective than
modifying the node attributes; and (ii) attackers tend to add adversarial edges by linking dissimilar
nodes instead of deleting existing edges. Hence, GCN-Jaccard is proposed in [229] to defend against
adversarial attacks by eliminating the edges connecting nodes with low Jaccard similarity of node
features. Experimental results show the effectiveness and efficiency of this defense method. Apart
from the observations about the properties of linked nodes, adversarial attacks are found to mainly affect
the high-rank spectrum of the adjacency matrix, which corresponds to small singular values [63]. Based on
this observation, low-rank approximation with truncated SVD is used to denoise
the graph to resist poisoning attacks. Specifically, they compute a truncated SVD that keeps only
the top-𝑘 singular values of the adjacency matrix. Then, the denoised graph can be reconstructed
from the truncated SVD. Their experiments show that only keeping the top 10 singular values of
the adjacency matrix is able to defend against Nettack [280].
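The following is a minimal sketch of the Jaccard-based pre-processing idea, assuming binary node features and a [2, E] edge_index tensor; the similarity threshold is a dataset-dependent hyperparameter, and the exact procedure of GCN-Jaccard [229] may differ in details.

import torch

def jaccard_prune(edge_index, x, threshold=0.01):
    x = x.float()
    src, dst = edge_index
    inter = (x[src] * x[dst]).sum(dim=1)
    union = ((x[src] + x[dst]) > 0).float().sum(dim=1)
    jaccard = inter / union.clamp(min=1e-8)
    keep = jaccard >= threshold
    return edge_index[:, keep]   # the GNN is then trained on the pruned graph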
Graph Structure Learning. Graph structure learning methods [42, 105, 142] aim to simultaneously
learn a denoised graph and a GNN model that can give accurate predictions based on the denoised
graph. Inspired by the fact that adversarial attacks will lead to high-rank adjacency matrix, Pro-
GNN [105] proposes to learn a clean adjacency matrix S with the constraint that (i) S is low-rank
and close to the raw perturbed adjacency matrix A such that the adversarial edges are likely to be
removed; (ii) S should facilitate node classification; and (iii) S should maintain feature smoothness,
i.e., link nodes of similar features. The overall objective function of Pro-GNN can be written as:
$\min_{\theta,\mathbf{S}} \mathcal{L}_{train}(f_\theta(\mathbf{X},\mathbf{S})) + \alpha\|\mathbf{A}-\mathbf{S}\|_F^2 + \beta\|\mathbf{S}\|_* + \lambda\, tr(\mathbf{X}^T\hat{\mathbf{L}}\mathbf{X}), \qquad (27)$

where L𝑡𝑟𝑎𝑖𝑛 (𝑓𝜃 (X, S)) is the classification loss using S. ∥S∥ ∗ stands for the nuclear norm of the
learned adjacency matrix S, which encourages S to be low-rank. The last term 𝑡𝑟 (X𝑇 L̂X) encourages
the learned adjacency matrix to link nodes of similar features, where L̂ is the Laplacian matrix
of S. Similar to Pro-GNN, PTDNet [142] also adopts a low-rank constraint to learn to drop noisy
edges in an end-to-end manner. But different from Pro-GNN, which directly optimizes the adjacency
matrix of the denoised graph, PTDNet deploys a parameterized denoising network that predicts whether
to remove an edge based on the representations of its two endpoint nodes from a GNN model.
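The following is a minimal sketch of how the graph regularization terms of Eq.(27) can be computed for a dense learnable adjacency S in PyTorch. The classification term and the alternating optimization schedule of Pro-GNN are omitted, the unnormalized Laplacian is used for simplicity, and the coefficient values are illustrative.

import torch

def prognn_regularizers(S, A, X, alpha=1.0, beta=1.5, lam=1.0):
    recon = alpha * torch.norm(A - S, p='fro') ** 2           # stay close to the raw graph
    nuclear = beta * torch.linalg.matrix_norm(S, ord='nuc')   # encourage S to be low-rank
    deg = torch.diag(S.sum(dim=1))
    L_hat = deg - S                                           # (unnormalized) Laplacian of S
    smooth = lam * torch.trace(X.t() @ L_hat @ X)             # feature smoothness tr(X^T L X)
    return recon + nuclear + smooth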
Though the defense methods using the low-rank constraint are proven to be effective, the com-
putation cost of the nuclear norm is too expensive for large-scale graphs. Recently, a robust struc-
tural noise-resistant GNN (RS-GNN) [42] is proposed to learn a link predictor that efficiently
eliminates/down-weights the noisy edges with weak supervision from the adjacency matrix.
In real-world graphs, nodes with similar features and labels tend to be linked, while noisy edges
would link nodes of dissimilar features. Therefore, RS-GNN deploys an MLP to predict the weight
of the link between 𝑣𝑖 and 𝑣 𝑗 by 𝑤 (𝑖, 𝑗) = 𝑓 (h𝑇𝑖 h 𝑗 ), where h𝑖 = 𝑀𝐿𝑃 (x𝑖 ) and 𝑓 is an activation
function such as the sigmoid. A novel feature-similarity-weighted edge-reconstruction loss is used to train
the link predictor, which encourages lower weights to be assigned to noisy edges. The link predictor is
further utilized to predict the missing links in the graph, which can involve more unlabeled nodes
in the training to address the challenge of label sparsity.
Reinforcement learning is also applied in graph structure learning for robust representation
learning [217]. Specifically, Graph Denoising Policy Network (GDPNet) [217] focuses on denoising
the one-hop subgraph of each node. Whether to involve the neighbors of a node is determined
sequentially. Therefore, the action space would be whether the selected node 𝑢𝑡 ∈ N (𝑣) at step
𝑡 should be linked with 𝑣. The state 𝑠𝑡 = [h𝑡𝑣 , h𝑢𝑡 ] contains the representations of node 𝑣 by
aggregating the previously selected neighbors N̂𝑡 (𝑣) and the selected node 𝑢𝑡 . The prediction
scores on the downstream tasks are used as the reward signal for the neighbor selection phase.

, Vol. 1, No. 1, Article . Publication date: September 2023.


28 Enyan Dai, Tianxiang Zhao, Huaisheng Zhu, Junjie Xu, Zhimeng Guo, Hui Liu, Jiliang Tang, and Suhang Wang

Attention Mechanism. Attention-based defense methods [202, 263] aim to penalize the weights
of adversarial edges or nodes in the aggregation of each GNN layer to learn robust representations.
In [202], PA-GNN utilizes auxiliary clean graphs that share the same data distribution with the
target poisoned graph to learn to penalize the adversarial edges. Specifically, adversarial edges are
injected to the clean graphs to provide supervision for the penalized aggregation mechanism. Let
$a_{ij}^{(l)}$ be the attention score assigned to the edge linking 𝑣𝑖 and 𝑣 𝑗 in the 𝑙-th GNN layer. PA-GNN wants
the attention weights of clean edges to be larger than those of the perturbed edges by a margin 𝜂 as
 
$\mathcal{L}_{dist} = -\min\Big(\eta,\ \mathbb{E}_{(v_i,v_j)\in\mathcal{E}_C,\, 1\leq l\leq L}\, a_{ij}^{(l)} - \mathbb{E}_{(v_i,v_j)\in\mathcal{E}_P,\, 1\leq l\leq L}\, a_{ij}^{(l)}\Big), \qquad (28)$

where E𝐶 and E𝑃 are the sets of clean edges and perturbed adversarial edges from the auxiliary graphs.
PA-GNN further adopts meta-learning to transfer the ability of penalizing adversarial edges to the GNN
on the target graph. GNNGuard [263] computes the attention scores based on the cosine similarity
of node representations from the last layer. With the similarity-based attention, the adversarial edges
are likely to be assigned small weights since they generally link dissimilar nodes.
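The following is a minimal sketch of similarity-based edge re-weighting in the spirit of GNNGuard: edges between dissimilar representations are pruned or down-weighted before aggregation. The actual GNNGuard layer additionally smooths weights across layers; the names and the pruning threshold here are illustrative.

import torch
import torch.nn.functional as F

def similarity_attention(edge_index, h, prune_below=0.1):
    src, dst = edge_index
    sim = F.cosine_similarity(h[src], h[dst], dim=-1).clamp(min=0)
    sim = torch.where(sim < prune_below, torch.zeros_like(sim), sim)   # prune dissimilar edges
    # Normalize the remaining weights per target node (simple sum normalization).
    denom = torch.zeros(h.size(0), device=h.device).index_add_(0, dst, sim) + 1e-8
    return sim / denom[dst]   # per-edge attention weights used during aggregation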
4.3.4 Other Types of Defense Methods Against Graph Adversarial Attacks. In this subsection, we
briefly introduce defense methods that do not belong to the aforementioned categories.
Robust Aggregation. Some efforts [32, 75, 276] have been made to design robust aggregation
mechanisms that restrict the negative effects of perturbations in the graphs. For instance, RGCN [276]
adopts Gaussian distributions as the hidden representations of nodes in each graph convolutional
layer. As a result, the proposed RGCN could absorb the effects of adversarial changes in the variances
of the Gaussian distributions. In [32], a median aggregation mechanism is designed to improve the
robustness of GNNs. In median aggregation, the median value of each dimension of the neighbor
embeddings is used to capture the context information. The perturbed values will be selected by
median aggregation only when the portion of clean nodes among the neighbors is less than 0.5, which
implies its benefit to model robustness. Following [32], a soft median aggregation mechanism
is applied for scalable defense [75], which computes a weighted mean where the weight of an
entry is based on its distance to the dimension-wise median. Extensive experiments on large-scale
graphs with up to 100M nodes demonstrate its validity and efficiency.
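The following is a minimal sketch of dimension-wise median aggregation over neighbors, which is less sensitive to a minority of adversarially injected neighbors than mean aggregation. It is written densely with an explicit loop for clarity, whereas [32, 75] provide efficient (soft) implementations.

import torch

def median_aggregate(edge_index, h):
    out = h.clone()   # fall back to a node's own embedding if it has no neighbors
    src, dst = edge_index
    for v in range(h.size(0)):
        neigh = src[dst == v]
        if neigh.numel() > 0:
            out[v] = h[neigh].median(dim=0).values   # robust dimension-wise median
    return out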
Self-Supervised Learning Defense Methods. To address the problem of lacking labels, various
self-supervised learning tasks such as link prediction [112] and contrastive learning [250] have
been proposed to help representation learning of GNNs. In addition to the prediction accuracy, it
is found that some self-supervised tasks can improve the robustness of the GNNs. For instance,
SimP-GCN [103] employs a self-supervised similarity preserving task which enforces similar rep-
resentations for nodes with similar attributes. Therefore, the nodes whose local graph structures
are perturbed can still preserve useful representations. In contrastive learning, maximizing the
representation consistency between the original graphs and the augmented views of edge perturba-
tion [45, 250] can also result in a more robust model. Some adversarial graph contrastive learning
methods and variants [69, 81, 133, 214] are developed to further improve the robustness by introducing an
adversarial view of graphs.

4.4 Applications of Robust GNNs


Since robust GNNs can defend against adversarial attacks, applications in safety-critical domains
will particularly benefit from robust GNNs. For instance, the investigations of bioinformatics graphs
such as protein-protein networks [244] and brain networks [110] require robust GNNs to defend against
attacks in bioinformatics [201] and guarantee safety. In addition, recent research also shows the
vulnerability of GNNs in knowledge graph modeling [256]. Nowadays, GNNs have been widely
applied to learn useful representations from knowledge graphs to facilitate various downstream
tasks such as recommender systems [213]. Therefore, robust GNNs are required for applications
on knowledge graphs. Some works have also attempted to apply GNNs to financial analysis such as
credit estimation and fraud detection [216], where robust GNNs are urgently needed to guarantee
the security of real-world financial analysis.

4.5 Future Research Directions of Robust GNNs


Scalable Robust GNNs. As discussed in Sec.4.2.6, some initial efforts have verified that adversarial
attacks can be applied to extremely large graphs to achieve the attackers' goal. However, scalable
defense methods are rather limited [75]. Though some methods such as GCN-Jaccard are efficient, their
defense performance is generally not good enough. For more advanced methods such as Pro-GNN,
the computational cost is unaffordable for large-scale graphs. Thus, it is an emerging and promising
direction to develop scalable robust GNNs.
Robust GNNs on Heterogeneous Graphs. Many real-world graphs such as product-user networks
are heterogeneous, containing diverse types of objects and relations. Numerous Heterogeneous
Graph Neural Networks (HGNNs) have been investigated for heterogeneous graphs. However,
recent analysis [258] also shows that adversarial attacks bring more negative effects to metapath-
based HGNNs than to general GNNs. Despite extensive works on robust GNNs, most are dedicated to
homogeneous graphs and can rarely handle heterogeneous graphs. Thus, developing robust
HGNNs still remains an open problem.
Robust GNNs Against Label Noises. Existing works mainly focus on defending against adversarial
attacks on graph structure and node features, while noises and attacks on labels such as the
label-flipping attack [257] can also significantly degrade the performance of GNNs. Several initial
efforts [45, 129, 257] have been conducted to address the challenge of label noise. For instance, the authors
in [40] first develop a label noise-resistant GNN (NRGNN) by linking the unlabeled nodes with
(pseudo) labeled nodes that have similar features, which can improve the performance of GNNs against
label noise or label-flipping attacks. Though promising, robust GNNs against label noise are still
at an early stage and need further investigation.
Robust Pre-training GNNs. Recently, various pre-training GNN frameworks [92, 250] have been
investigated to leverage large-scale data for downstream tasks. Adversarial attacks can
also be applied to pre-trained GNNs. For example, a backdoor can be injected into a self-supervised
learning GNN to mislead the model into giving the target prediction for the target instance even after
fine-tuning [235]. Considering that a pre-trained GNN model will be utilized on various datasets and
downstream tasks, a successful adversarial attack on pre-training GNNs could cause huge losses. Thus,
it is necessary to develop robust pre-training GNNs.

5 FAIRNESS OF GRAPH NEURAL NETWORKS


Fairness is one of the most important aspects of trustworthy graph neural networks. With the rapid
development of graph neural networks, GNNs have been adopted in various applications. However,
recent evidence shows that, similar to machine learning models on i.i.d data, GNNs can also give
unfair predictions due to the societal bias in the data. The bias in the training data can even be
magnified by the graph topology and message-passing mechanism of GNNs [44]. For example,
recommendation systems based on random walks are found to prevent females from rising to the most
commented and liked profiles [170, 194]. A similar issue has been found in book recommendation
with graph neural networks, where the GNN methods could be biased towards suggesting books
with male authors [24]. These examples imply that GNNs could discriminate against minority
groups and hurt cultural diversity. Moreover, such discrimination would largely limit

the wide adoption of GNNs in other domains such as ranking of job applicants [152] and loan fraud
detection [239], and can even cause legal issues. Therefore, it is crucial to ensure that GNNs do
not exhibit discrimination towards users. Hence, many works have emerged recently to develop
fair GNNs to achieve various types of fairness on different tasks. In this section, we will give a
comprehensive review of the cutting-edge works on fair GNNs. Specifically, we first introduce
the major biases that challenge fairness of GNNs. We then describe various concepts of fairness
that are widely adopted in literature, followed by categorization and introduction of methods for
achieving fairness on graph-structured data. Finally, we present public datasets and applications
and discuss future directions that need further investigation.

5.1 Bias Issues in Graph-Structured Data and Graph Neural Networks


Biases widely exist in real-world datasets, which can lead to unfair predictions of machine learning
models. Olteanu et al. listed various biases that exist in social data [162]. Suresh et al. further
discussed various types of biases that cause discrimination issues of machine learning models [199].
According to [152], the bias in machine learning can appear in different stages such as data,
algorithm and user interaction. In this paper, we mainly focus on biases in graph-structured data
and on GNNs. For a comprehensive review of biases that occur in other phases such as training
and evaluation of machine learning models on i.i.d data, please refer to the survey [152].
First, similar to i.i.d data, node attributes/features are often available in graph-structured data. In
addition, the data collection of graph-structured data follows similar procedures to i.i.d data such
as data sampling and label annotation. Thus, the following biases that widely exist on i.i.d data also
exist in graphs.
• Historical Bias. Various biases such as gender bias and race bias exist in the real world due
to historical reasons. These biases can be embedded and reflected in the data. Even for a system
that reflects the world accurately, it can still inflict harm on populations who experience the
historical bias [199]. An example of this type of bias is node embedding learning for link
prediction [170]. In particular, users with the same sensitive attributes such as gender and race
are more likely to be linked in a real-world graph. As a result, the learned node embeddings will
tend to link users with the same gender/race. Then, applications such as friend recommendation
that are built on these types of node embeddings will reinforce the historical bias.
• Representation Bias. Representation bias occurs when the collected samples under-represent some
part of the population, and subsequently fail to generalize well for a subset of the use population [199].
Representation bias can be caused in several ways: (i) the target population does not reflect
the use population; (ii) the target population under-represents certain groups; and (iii) the sampled
data does not reflect the target population.
• Temporal Bias. Temporal bias arises from differences in populations and behaviors over time [162].
For graphs, temporal bias can be caused by both the change of node attributes and graph topology.
One example is the social network, where the attributes and links of users will evolve over time.
• Attribute Bias. Given an attributed network and the corresponding group indicator (w.r.t. the
sensitive attribute) for each node, if the distributions of any attribute differ between demographic
groups, then attribute bias exists in the graph [53]. Attribute bias focuses on the biases in
the node attributes of the graphs.
In addition to the aforementioned biases, there are unique types of biases in graph-structured data
due to the graph topology:
• Structural Bias. Given an undirected attributed network and the corresponding group indicator
(w.r.t. the sensitive attribute) for each node, if any information propagation promotes the distribution

difference between different groups at any attribute dimension, then structural bias exists in the
graph [53].
• Linking Bias. Linking bias arises when network attributes obtained from user connections, activities,
or interactions differ and misrepresent the true behavior of the users [162]. For instance, it is found
that younger people are more closely connected than older generations on social networks [54].
Moreover, low-degree nodes are more likely to be falsely predicted [203], which leads to degree-
related bias.
Generally, GNNs adopt a message-passing mechanism, which aggregates the information of
neighbors to enrich the representation of the target nodes. As a result, the learned representations
can capture both node attributes and the local topology, which can facilitate various tasks such as
node classification. However, due to the biases in topology, the message-passing mechanism of GNNs
can magnify the biases compared with an MLP [44]. In graphs such as social networks, nodes with similar
sensitive attributes are more likely to connect to each other [54, 170]. For example, young people
tend to build friendships with people of similar age on social networks [54]. The message passing
in GNNs will aggregate the neighbor features. Thus, GNNs learn similar representations for nodes
with the same sensitive attributes and different representations for nodes with different sensitive
attributes, leading to severe bias in decision making, i.e., the predictions are highly correlated with
the sensitive attributes of the nodes.

5.2 Fairness Definitions


In this subsection, we introduce the most widely used fairness definitions, which can be generally
split into two categories, i.e., group fairness and individual fairness.
5.2.1 Group Fairness. The principle of group fairness is to ensure that groups of people with different
protected sensitive attributes receive statistically comparable treatments. Various criteria of group
fairness have been proposed. Next, we introduce the most widely used group fairness definitions.
Definition 5.1 (Statistical Parity [59]). Statistical parity, also known as demographic parity, requires
the prediction ŷ to be independent of the sensitive attribute s, i.e., ŷ ⊥ s. The majority of the
literature focuses on binary classification and a binary sensitive attribute, i.e., y ∈ {0, 1} and s ∈ {0, 1}. In this
situation, statistical parity can be formally written as:
𝑃 (𝑦ˆ = 1|𝑠 = 0) = 𝑃 (𝑦ˆ = 1|𝑠 = 1). (29)
According to statistical parity, the membership in the protected sensitive attributes should have
no correlation with the decision from the classifier. Given the definition in Eq.(29), the fairness in
terms of statistical parity can be measured by:
Δ𝑆𝑃 = |𝑃 (𝑦ˆ = 1|𝑠 = 0) − 𝑃 (𝑦ˆ = 1|𝑠 = 1)|. (30)
A lower Δ_SP indicates a fairer classifier. Statistical parity can be easily extended to the multi-
class and multi-category sensitive attribute setting by ensuring ŷ ⊥ s. According to [138], let
y ∈ {y_1, . . . , y_c} and s ∈ {s_1, . . . , s_k} denote the multi-class label and multi-category sensitive
attribute, where c is the number of classes and k is the number of sensitive attribute categories; the
evaluation metric can then be extended to:
Δ_SP = (1/k) ∑_{i=1}^{k} max_{y_j} |P(ŷ = y_j) − P(ŷ = y_j | s = s_i)|.   (31)
Statistical parity is the earliest fairness definition and has been widely adopted. However, some
follow-up works [85] argue that statistical parity often cripples the utility of the model. Hence,
Equalized Odds is proposed to alleviate this issue, which is defined as:

Definition 5.2 (Equalized Odds [85]). A predictor satisfies equalized odds with respect to protected
attribute s and class label y, if the prediction ŷ and s are independent conditioned on y, i.e., ŷ ⊥ s | y.
When y ∈ {0, 1} and s ∈ {0, 1}, this can be formulated as:
𝑃 (𝑦ˆ = 1|𝑦 = 𝑣, 𝑠 = 0) = 𝑃 (𝑦ˆ = 1|𝑦 = 𝑣, 𝑠 = 1), ∀𝑣 ∈ {0, 1}. (32)
Equalized odds enforces that the accuracy is equally high in all demographics, punishing models
that perform well only on the majority. In the binary classification, we often set 𝑦 = 1 as the
“advantaged” outcome, such as “not defaulting on a loan” or “admission to a college”. Hence, we
can relax the equalized odds to achieve fairness within the “advantaged” outcome group, which is
known as Equal Opportunity.
Definition 5.3 (Equal Opportunity [85]). It requires that the probability of an instance in a positive
class being assigned to a positive outcome should be equal for both subgroup members, i.e.,
𝑃 (𝑦ˆ = 1|𝑦 = 1, 𝑠 = 0) = 𝑃 (𝑦ˆ = 1|𝑦 = 1, 𝑠 = 1). (33)
Equal opportunity expects the classifier to give equal true positive rates across the subgroups,
which allows a perfect classifier. Similar to statistical parity, equal opportunity can be measured by
Δ𝐸𝑂 = |𝑃 (𝑦ˆ = 1|𝑦 = 1, 𝑠 = 0) − 𝑃 (𝑦ˆ = 1|𝑦 = 1, 𝑠 = 1)|. (34)
Equalized odds and equal opportunity can be naturally extended to multi-class and multi-category
sensitive attributes setting by changing the range of sensitive attributes and labels.
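As a concrete illustration of the binary-case metrics in Eq.(30) and Eq.(34), the following is a minimal sketch of how Δ_SP and Δ_EO are typically estimated from predictions; the function name and array-based interface are our own assumptions.

```python
import numpy as np

def fairness_gaps(y_pred, y_true, s):
    """Empirical statistical parity gap (Eq. 30) and equal opportunity gap (Eq. 34)
    for binary predictions y_pred, binary labels y_true and a binary sensitive attribute s."""
    y_pred, y_true, s = map(np.asarray, (y_pred, y_true, s))
    # Delta_SP = |P(y_hat=1 | s=0) - P(y_hat=1 | s=1)|
    delta_sp = abs(y_pred[s == 0].mean() - y_pred[s == 1].mean())
    # Delta_EO = |P(y_hat=1 | y=1, s=0) - P(y_hat=1 | y=1, s=1)|
    delta_eo = abs(y_pred[(s == 0) & (y_true == 1)].mean()
                   - y_pred[(s == 1) & (y_true == 1)].mean())
    return delta_sp, delta_eo
```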
Definition 5.4 (Dyadic Fairness [126]). This can be viewed as an extension of statistical parity to
link prediction. It requires the link predictor to give predictions that are independent of the sensitive
attributes of the two target nodes. A link prediction algorithm satisfies dyadic fairness if the predictive
score satisfies:
𝑃 (𝑔(𝑢, 𝑣)|𝑠 (𝑢) = 𝑠 (𝑣)) = 𝑃 (𝑔(𝑢, 𝑣)|𝑠 (𝑢) ≠ 𝑠 (𝑣)), (35)
where 𝑔(·) is the link predictor, 𝑠 (𝑢) and 𝑠 (𝑣) denote the sensitive attributes of node 𝑢 and 𝑣,
respectively.
Since dyadic fairness extends statistical parity to link prediction, its evaluation metric can be
simply obtained from Δ_SP by replacing the classification probability with the link prediction
probability, i.e., Δ_DE = |P(g(u, v)|s(u) = s(v)) − P(g(u, v)|s(u) ≠ s(v))|.
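A minimal sketch of the corresponding empirical estimate is given below; it assumes the predicted link scores and the endpoint sensitive attributes are available as arrays, which is our own illustrative interface.

```python
import numpy as np

def dyadic_fairness_gap(link_scores, s_u, s_v):
    """Empirical dyadic fairness gap: the difference between the average predicted
    link score of intra-group pairs (s_u == s_v) and inter-group pairs (s_u != s_v)."""
    link_scores, s_u, s_v = map(np.asarray, (link_scores, s_u, s_v))
    intra = link_scores[s_u == s_v].mean()
    inter = link_scores[s_u != s_v].mean()
    return abs(intra - inter)
```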
5.2.2 Individual Fairness. While group fairness can maintain fair outcomes for a group of people,
a model can still behave discriminatorily at the individual level. Individual fairness is based on the
understanding that similar individuals should be treated similarly.
Definition 5.5 (Fairness Through Awareness [59]). Any two individuals who are similar should
receive similar algorithmic outcomes. Let u, v ∈ X be two data points in dataset X, and let f(·) denote
a mapping function. Fairness through awareness can be formulated as:
𝐷 (𝑓 (𝑢), 𝑓 (𝑣)) ≤ 𝑑 (𝑢, 𝑣), (36)
where 𝐷 (·) and 𝑑 (·) are two distance metrics required to be defined in the application context.
Definition 5.6 (Counterfactual Fairness [120]). Counterfactual fairness enforces that predictions
for an individual in the real world should remain unchanged in a counterfactual world where
the individual’s protected attributes had been different. Let 𝑌ˆ𝑆←𝑠 (𝑈 ) and 𝑌ˆ𝑆←𝑠 ′ (𝑈 ) denote the
predictions of a sample with background variable 𝑈 whose sensitive attributes are set as 𝑠 and 𝑠 ′ ,
respectively. A predictor is counterfactually fair if under any context 𝑋 = 𝑥 and 𝑆 = 𝑠:
𝑃 (𝑌ˆ𝑆←𝑠 (𝑈 ) = 𝑦|𝑋 = 𝑥, 𝑆 = 𝑠) = 𝑃 (𝑌ˆ𝑆←𝑠 ′ (𝑈 ) = 𝑦|𝑋 = 𝑥, 𝑆 = 𝑠), (37)
for all 𝑦 and any value 𝑠 ′ of protected attribute 𝑆.

Table 6. Categorization of fair models on graphs according to the stage at which debiasing is applied.


Category References
Pre-processing [108], [53], [192]
In-processing [44], [22], [148], [53], [218], [4], [126], [108], [52], [24], [117], [111], [170]
Post-processing [108]

The counterfactual fairness can be viewed as individual fairness whose similarity metric treats
the individual and its counterfactual sample as a similar pair. How to evaluate individual fairness
remains an under-explored research direction. In [108], a measure of individual fairness is proposed.
The metrics of group fairness such as Δ𝑆𝑃 are also utilized for evaluating the counterfactually fair
GNNs [4].

5.3 Fairness-Aware Graph Neural Networks


Extensive attempts have been made to eliminate discrimination in machine learning models
on i.i.d data [152]. However, these methods cannot be directly applied to graph-structured data
because of the unique biases brought by the graph topology and message-passing mechanism.
Recently, with the remarkable success of GNNs, the concern about fairness issues of GNNs is attracting
increasing attention. In this section, we introduce the debiasing methods for achieving fairness in
GNNs. Following the categorization of fair machine learning algorithms on i.i.d data [136, 152],
existing fairness-aware algorithms can be split into pre-processing methods, in-processing methods,
and post-processing methods, based on the stage at which the debiasing is conducted. Pre-processing
approaches eliminate the bias in the data with fair pre-processing methods. In-processing
approaches revise the training of machine learning models to ensure the predictions
meet the target fairness definition. Post-processing methods directly change the predicted labels
to ensure fairness. Table 6 categorizes existing works on fair GNNs into these three categories. Based
on the techniques they apply, we further categorize the debiasing methods on graph-structured data into
Adversarial Debiasing, Fairness Constraints, and others. Next, we introduce the details of the methods
following this technique-based categorization.
5.3.1 Adversarial Debiasing. Using adversarial learning [78] to eliminate bias was first in-
vestigated for fair machine learning models on i.i.d data [19, 62, 131, 147]. Several efforts [22,
44, 53, 148] have been made to extend adversarial debiasing to graph-structured data. An
illustration of adversarial debiasing is presented in Figure 6a. Generally, adversarial debiasing
adopts an adversary f_A to predict sensitive attributes from the representations H of an encoder f_E,
while the encoder aims to learn representations that can fool the adversary and still give accurate
predictions for the task at hand, say node classification. With this min-max game, the final learned
representations will contain no sensitive information, resulting in fair predictions that are independent
of the sensitive attributes. Thus, statistical parity or dyadic fairness can be guaranteed with
adversarial debiasing in node classification and link prediction, respectively. The objective function
of adversarial debiasing can be formulated as
min_{θ_E} max_{θ_A} L_utility(f_E(G; θ_E)) − β L_Adversarial(f_A(H; θ_A)),   (38)

where θ_E and θ_A are the parameters of the encoder f_E and the adversary f_A, respectively. L_utility is the loss
function that ensures the utility of the learned representations, such as a node classification loss or a
link reconstruction loss. L_Adversarial is the adversarial loss, which is generally the cross-entropy loss of
the adversary's sensitive attribute prediction based on the learned node representations H. β is
the hyperparameter that balances the contributions of these two loss terms.
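The following is a minimal sketch of one training step under the objective in Eq.(38); the module names, optimizers, and binary-attribute assumption are illustrative choices of ours rather than any specific published implementation.

```python
import torch

def adversarial_debias_step(encoder, classifier, adversary, opt_model, opt_adv,
                            graph, x, y, s, beta=1.0):
    """One min-max training step of the generic adversarial debiasing objective (Eq. 38).
    encoder/classifier/adversary are user-defined PyTorch modules that output logits."""
    bce = torch.nn.BCEWithLogitsLoss()

    # 1) Update the adversary to predict the sensitive attribute from the embeddings.
    h = encoder(graph, x).detach()
    loss_adv = bce(adversary(h).squeeze(), s.float())
    opt_adv.zero_grad(); loss_adv.backward(); opt_adv.step()

    # 2) Update encoder + classifier: be accurate on the task, fool the adversary.
    h = encoder(graph, x)
    loss_utility = bce(classifier(h).squeeze(), y.float())
    loss_fool = bce(adversary(h).squeeze(), s.float())
    loss = loss_utility - beta * loss_fool
    opt_model.zero_grad(); loss.backward(); opt_model.step()
    return loss_utility.item(), loss_adv.item()
```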

Table 7. Fair graph neural networks that adopt adversarial debiasing.


Methods | Task | Fairness | L_Adversarial
FairGNN [44] | Node Classification | Statistical Parity | Cross entropy between ŝ and s̃
EDITS [53] | Node Classification | Statistical Parity | Wasserstein distance
Compositional [22] | Node Classification & Link Prediction | Statistical Parity & Dyadic Fairness | Cross entropy between ŝ and s
FLIP [148] | Link Prediction | Dyadic Fairness | Cross entropy between ŝ and s

Fig. 6. An illustration of fairness-aware GNNs: (a) the framework of Adversarial Debiasing models, where a GNN encoder f_E is trained with a classifier (L_utility) against an adversary model f_A (L_Adversarial); (b) the framework of Fairness Constraint models, where a GNN classifier is trained with L_utility together with a fairness regularizer (L_fairness).

The seminal work [22] first applies adversarial debiasing on graph-structured data to learn fair
node embeddings. Specifically, it adopts adjacency matrix reconstruction with negative sampling
as the utility loss to learn node embeddings. Let g(v_i, v_j) be the predicted probability that v_i and
v_j are linked. The utility loss can be written as: L_utility = ∑_{v_i ∈ V} ∑_{v_j ∈ N(v_i)} −[log(g(v_i, v_j)) +
∑_{n=1}^{Q} E_{v_n ∼ P_n(v_i)} log(1 − g(v_i, v_n))], where N(v) represents the neighbors of node v, P_n(v_i) is the
distribution for sampling negative nodes for v_i, and Q is the number of negative samples. An MLP is deployed
as the adversary f_A to predict the sensitive attributes from the node embeddings H. The adversarial
loss is given as the binary cross-entropy loss between the predictions from the adversary and the real
sensitive attributes, i.e., L_Adversarial = ∑_{v ∈ V_S} −[s_v log(ŝ_v) + (1 − s_v) log(1 − ŝ_v)]. Masrour et al. [148]
investigate adversarial debiasing with a similar implementation of the losses to learn representations for
fair link prediction and avoid the separation of users. In addition, they propose a metric that determines
whether the predicted links will lead to further separation of the network to evaluate the model.
Though the aforementioned methods achieve fairness with adversarial debiasing, they focus
on learning node embeddings and do not use a GNN model as the encoder. Recently, FairGNN [44]
proposes a framework for fair node classification with graph neural networks. It uses the node
classification loss as the utility loss, i.e., L_utility = ∑_{v ∈ V_L} −[y_v log(ŷ_v) + (1 − y_v) log(1 − ŷ_v)], where
V_L is the set of labeled nodes. For many real-world applications such as user attribute prediction
in social media, obtaining the sensitive attributes of nodes is difficult. To address the challenge
of lacking sensitive attributes for adversarial debiasing, FairGNN adopts a GCN-based sensitive
attribute estimator to estimate sensitive attributes s̃ for nodes with missing sensitive attributes. It
then uses the output of the adversary f_A and the estimated sensitive attribute s̃ for the adversarial loss.
In addition, theoretical analysis in [44] demonstrates that statistical parity can be guaranteed
with adversarial debiasing given the estimated sensitive attributes.
All the aforementioned methods focus on debiasing the node representations. Alternatively,
adversarial debiasing can also be applied to debias the original graph data. For example, EDITS [53]
utilizes a WGAN-based [9] framework to eliminate attribute bias and structural bias. The attribute
matrix X is revised to ensure the same attribute distribution between different demographic groups.
, Vol. 1, No. 1, Article . Publication date: September 2023.


A Comprehensive Survey on Trustworthy Graph Neural Networks: Privacy, Robustness, Fairness, and Explainability 35

Table 8. Fair graph neural networks using fairness constraints.


Method | Task | Fairness | L_fairness
FairGNN [44] | Node classification | Statistical Parity | |Cov(ŷ, ŝ)|
UGE [218] | Node classification | Statistical Parity | |g(x_u, x_v) − g(x̃_u, x̃_v)|
FairAdj [126] | Link prediction | Dyadic fairness | |E[g(u, v)|s_u = s_v] − E[g(u, v)|s_u ≠ s_v]|
InFoRM [108] | PageRank, Spectral Clustering, Embedding Learning | Individual Fairness | ∑_i ∑_j S_ij ‖y_i − y_j‖²
REDRESS [52] | Node & Link | Individual Fairness | Consistency between S_G and S_Ŷ
NIFTY [4] | Node classification | Counterfactual fairness | |f(ũ) − f(u)|

Similarly, an adjacency matrix A that will not cause bias after information propagation is learned.
More details of the representative adversarial debiasing methods are listed in Table 7.
5.3.2 Fairness Constraints. In addition to adversarial debiasing, directly adding fairness constraints
to the objective function of machine learning models is another popular direction. These constraints
are usually derived from the fairness definitions introduced in Section 5.2. As shown in the general
framework in Figure 6b, the constraints work as a regularization term and balance the
prediction performance and fairness. The overall objective function can be written as
min_θ L_utility + β L_fairness,   (39)
where θ is the set of model parameters to be learned, L_utility is the loss function for the utility of the
model, L_fairness denotes the applied fairness constraint, and β controls the trade-off between utility
and fairness. To enforce different notions of fairness, various constraints have been investigated for
fair GNNs [4, 44, 52, 108, 126, 218]. The details of these methods are given in Table 8.
Statistical Parity & Dyadic Fairness. In FairGNN [44], apart from adversarial debiasing, a covariance
constraint for statistical parity is also adopted to further enforce fairness. Specifically, the
covariance constraint minimizes the absolute covariance between the estimated sensitive attribute
s̃ and the prediction ŷ, i.e., |Cov(s̃, ŷ)| = |E[(s̃ − E(s̃))(ŷ − E(ŷ))]|. Enforcing the predictions to have
no correlation with the estimated sensitive attributes helps learn a classifier that gives
predictions independent of the protected attributes. UGE [218] assumes that a bias-free graph
can be generated from pre-defined non-sensitive attributes. Then, a regularization term pushes
the embeddings to satisfy the properties of the bias-free graph to eliminate bias. In particular, it
enforces g(x_u, x_v), i.e., the probability of predicting a link between nodes u and v with complete
attributes, to be the same as g(x̃_u, x̃_v), i.e., the probability of predicting a link between u and v with
bias-free attributes. In FairAdj [126], a regularization term, |E[g(u, v)|s_u = s_v] − E[g(u, v)|s_u ≠ s_v]|,
directly derived from dyadic fairness in Eq.(35), is used to debias the adjacency matrix.
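As a small illustration, the covariance constraint above can be implemented as a fairness regularizer plugged into Eq.(39); the sketch below uses an illustrative function name and assumes ŷ and s̃ are given as 1-D tensors.

```python
import torch

def covariance_constraint(y_hat, s_hat):
    """Fairness regularizer |Cov(s_hat, y_hat)| that can serve as L_fairness in Eq. (39).

    y_hat: 1-D tensor of predicted probabilities
    s_hat: 1-D tensor of (estimated) sensitive attribute values
    """
    return torch.abs(torch.mean((s_hat - s_hat.mean()) * (y_hat - y_hat.mean())))

# Total training loss, with beta trading off utility and fairness:
# loss = loss_utility + beta * covariance_constraint(y_hat, s_hat)
```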
Individual Fairness. Moreover, two constraints for individual fairness are explored in InFoRM [108]
and REDRESS [52]. InFoRM [108] proposes a regularization term ∑_i ∑_j S_ij ‖y_i − y_j‖², where y_i ∈ R^c
and y_j ∈ R^c denote the prediction vectors of nodes v_i and v_j, and S_ij ∈ [0, 1] is the similarity
score between v_i and v_j. In this way, the predictions of two similar nodes are encouraged
to be similar. As for REDRESS [52], it aims to optimize the consistency between the prediction
similarity matrix S_Ŷ and the oracle similarity matrix S_G from a ranking perspective. For each node,
the relative order of each node pair given by S_Ŷ and that given by S_G are enforced to be the same.
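A minimal sketch of the InFoRM-style individual fairness regularizer is given below; the dense pairwise formulation is chosen for clarity and would need a sparse similarity matrix in practice.

```python
import torch

def individual_fairness_reg(y, S):
    """InFoRM-style individual fairness regularizer: sum_ij S_ij * ||y_i - y_j||^2.

    y: [n, c] prediction vectors of the n nodes
    S: [n, n] pairwise node similarity scores in [0, 1]
    """
    dist = torch.cdist(y, y, p=2) ** 2   # squared distances between prediction vectors
    return (S * dist).sum()
```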
Counterfactual Fairness. To achieve counterfactual fairness, NIFTY [4] proposes to maximize the
agreement between the original graph and its counterfactual augmented views. More specifically,
the counterfactual sample ũ of an initial sample u is generated by (i) modifying the value of the sensitive
attribute; and (ii) randomly masking the other attributes and perturbing the graph structure. Then, the

, Vol. 1, No. 1, Article . Publication date: September 2023.


36 Enyan Dai, Tianxiang Zhao, Huaisheng Zhu, Junjie Xu, Zhimeng Guo, Hui Liu, Jiliang Tang, and Suhang Wang

Table 9. Fair models on graphs belonging to other categories


Methods Task Fairness
FairWalk [170] Link Prediction Statistical parity
CrossWalk [111] Node & Link Statistical parity
FairDrop [192] Node Classification Statistical parity
Debayes [24] Link Prediction Statistical parity
GMMD [277] Node Classification Statistical parity

constraint described in Table 8 can be applied to reduce the gap between the predictions on the
original graph and its counterfactual samples, which enables counterfactual fairness. Our
survey [83] on counterfactual learning gives more details about counterfactual fairness.
5.3.3 Fairness-aware GNNs in Other Categories. Apart from the aforementioned fair GNNs, there are
several methods that belong to neither adversarial debiasing nor fairness constraints, which
are presented in Table 9. More specifically, users in the same sensitive attribute group are more
likely to be sampled into the same trace by a general random walk, resulting in a correlation between
the sensitive attributes and predictions. Therefore, unbiased sampling strategies for the random
walk are investigated in [111, 170] to learn unbiased embeddings for downstream tasks such as
node classification and link prediction. FairDrop [192] proposes to drop more connections between
nodes sharing the same sensitive attribute to reduce the bias caused by homophily in sensitive attributes.
Debayes [24] investigates a Bayesian method that is capable of learning debiased embeddings by
using a biased prior. GMMD [277] designs a fairness-aware message-passing mechanism that
encourages a node to aggregate representations of nodes from diverse sensitive groups,
resulting in fair representations.

Table 10. Recent advances in fair graph neural networks.


Category | Task | Reference
Fair Augmentation View Methods | Node Classification | [4], [114], [82], [224]
Fair Augmentation View Methods | Contrastive Learning | [118], [117], [135]
Explanation-Enhanced Fairness | Node Classification | [57], [56]

5.3.4 Recent Advances. In addition to the previously discussed approaches adopting adversarial
debiasing and fairness constraints, methods are emerging that build fair augmentation views and
enhance fairness through model explanations, which are listed in Table 10. Next, we introduce
these cutting-edge categories of fair GNNs in detail.
Fair Augmentation View for Prediction and Contrastive Learning. The main idea of
fair augmentation is to generate fairness-aware augmentation views and enforce agreement
between the fair views and the original graphs. In this way, the learned representations achieve
fairness while maintaining useful information from the original graphs. NIFTY [4] is one of the
earliest methods that fall into this category. Specifically, counterfactual views of graphs are generated
by random perturbations on node attributes, sensitive attributes, and graph structures. Then, the
similarity between the original graph and its counterfactual fair augmented representations is
maximized to ensure fairness for node classification. Similar fairness-aware augmented views are
further extended to graph contrastive learning for fair node representation learning [117, 118].
Recently, further improvements have been made in fair augmentation view generation [82,
135, 145, 224]. For example, GEAR [145] considers the sensitive information in the latent space
and uses a GraphVAE [114] to model the potential biases from neighboring nodes and the overall

graph structure. Through counterfactual data augmentation, GEAR generates perturbed data based
on sensitive attributes, aiming to minimize discrepancies between original and counterfactual
representations. CAF-GNN [82] uses a matching-based method to find potential views with
different sensitive attributes and labels, so that the additional views can be more realistic. Then,
CAF-GNN derives proper constraints to ensure the fairness and completeness of the learned
representations.
Explanation-Enhanced Fairness. It has been empirically shown that utilizing graph structure
with the message-passing mechanism in GNNs can yield bias in predictions [44]. Gaining insights
into the origins of such discriminatory predictions is very useful for developing strategies to mitigate
biases in GNNs. There are several initial efforts [56, 57] in explaining the sources of model biases, aiming
to enhance the fairness of GNNs through interpretations of the unfairness. Next, we present the
two existing works on explanation-enhanced fairness in more detail:
• REFEREE [57]: While biased GNN predictions can arise from various factors, the biased network
topology plays a pivotal role in originating and amplifying the discrimination of GNNs. Hence,
REFEREE focuses on explaining the unfairness by exploring the edges that maximally account
for the bias. Let P(Ŷ_0) and P(Ŷ_1) denote the distributions of predictions on group 0 and group
1, respectively. The problem of finding the biased edge set in node v_i's computation graph can be
formulated as max_{E_i} W_1(P(Ŷ_0), P(Ŷ_1)), where W_1 measures the Wasserstein-1 distance (a sketch
of this bias measure is given after this list). Similarly, a fair edge set can be found by minimizing the
same loss function. Some other regularization terms are applied to obtain biased and fair edge sets of
higher quality. Fair predictions can then be given by removing the biased edges while keeping the fair edges.
• BIND [56]: This approach aims to ascertain the extent to which a GNN model's bias is influenced
by the presence of a specific training node in the graph. Specifically, BIND uses the influence
function to compute the change in the GNN parameters, denoted as ΔW, when node v_i is removed
from training. With ΔW, the contribution of node v_i to the predictive bias can be
quantified. Then, fairness can be improved by eliminating the nodes that
contribute most to the model biases.
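The group-level bias measure used above can be estimated directly from prediction scores; the following is a minimal sketch under the assumption of a binary sensitive attribute and 1-D prediction scores.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def prediction_bias_w1(scores, s):
    """Wasserstein-1 distance between the prediction-score distributions of the two
    sensitive groups, i.e. the quantity W1(P(Y_0), P(Y_1)) that REFEREE maximizes
    (to locate biased edges) or minimizes (to locate fair edges)."""
    scores, s = np.asarray(scores), np.asarray(s)
    return wasserstein_distance(scores[s == 0], scores[s == 1])
```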

5.4 Datasets for Fair GNNs


Generally, graph datasets utilized to evaluate the performance of GNNs in terms of fairness need
to (i) exhibit bias issues; and (ii) have both node labels and node sensitive attributes available if the
task is node classification. Below, we list some of the widely used datasets that are suitable for
evaluating the performance of fair GNNs on node classification and/or link prediction problems.
The statistics of the datasets, along with the papers using them, are presented in Table 11.
• Pokec-n & Pokec-z [44]: The Pokec-n and Pokec-z datasets collect users' data from the Pokec social
network (similar to Facebook) of two provinces in Slovakia in 2012. Each node in the
graph contains attributes such as gender, age, hobbies, interests, education, working field, etc.
The datasets target predicting the occupations of users, and the sensitive attribute is the region.
• NBA [44]: The NBA dataset utilizes 400 NBA players and their social relations on Twitter to
construct the graph. The performance statistics of players in the 2016-2017 season and other
information, e.g., nationality, age, and salary, are provided. The task is to predict whether a
player's salary is over the median. The sensitive attribute for this dataset is nationality, which is
binarized into two categories, i.e., U.S. players and overseas players.
• German Credit [4]: The German Credit dataset collects data from a German bank [11]. Nodes
in the graph represent clients and edges are built between clients if their credit accounts are
similar. With clients' gender as the sensitive attribute, the node classification task aims to predict
whether the credit risk of a client is high.
, Vol. 1, No. 1, Article . Publication date: September 2023.


38 Enyan Dai, Tianxiang Zhao, Huaisheng Zhu, Junjie Xu, Zhimeng Guo, Hui Liu, Jiliang Tang, and Suhang Wang

Table 11. Datasets for fair graph neural networks.


Task | Dataset | Labels | Sens. | #Nodes | #Edges | #Features | References
Node Classification | Pokec-n | Job | Region | 66,569 | 729,129 | 59 | [44] [218] [117]
Node Classification | Pokec-z | Job | Region | 67,797 | 882,765 | 59 | [44] [218] [117]
Node Classification | NBA | Salary | Nationality | 403 | 16,570 | 39 | [44]
Node Classification | German Credit | Credit Risk | Gender | 1,000 | 22,242 | 27 | [4] [53]
Node Classification | Recidivism | Bail | Race | 18,876 | 321,308 | 18 | [4] [53]
Node Classification | Credit Def. | Default | Age | 30,000 | 1,436,858 | 13 | [4] [53]
Link Prediction | MovieLens | - | Multi-attribute | 9,940 | 1,000,209 | - | [22] [24] [163]
Link Prediction | Reddit | - | Multi-attribute | 385,735 | 7,255,096 | - | [22]
Link Prediction | Polblog | - | Community | 1,107 | 19,034 | - | [163]
Link Prediction | Twitter | - | Politics | 3,560 | 6,677 | - | [111]
Link Prediction | Facebook | - | Gender | 22,470 | 171,002 | - | [148] [192] [108] [52] [126]
Link Prediction | Google+ | - | Gender | 4,938 | 547,923 | - | [148]
Link Prediction | Dutch | - | Gender | 26 | 221 | - | [148]

• Recidivism [4]: In the Recidivism dataset, nodes are defendants released on bail during 1990-
2009 [106]. Two nodes are connected if two defendants’ past criminal records and demographics
are similar. The task is to predict a defendant as bail (i.e., unlikely to commit a violent crime if
released) or no bail (i.e., likely to commit a violent crime) with race being the sensitive attribute.
• Credit Defaulter [4]: In this dataset, nodes represent credit card users and they are connected
based on the similarity of their purchase and payment records. The sensitive attribute of this
dataset is age and the task is to classify whether a user will default on credit card payment.
• MovieLens [22]: The MovieLens dataset is a recommender system benchmark [87], whose target
task is to predict the ratings that users assign to movies. Sensitive attributes about the user features,
such as age, gender, and occupation, are covered in the dataset.
• Reddit [22]: The Reddit dataset is based on the social media website Reddit where users can
comment on content in different topical communities, called “subreddits”. For sensitive attributes,
this dataset treats certain subreddit nodes as sensitive nodes, and the sensitive attributes for
users are whether they have an edge connecting to these sensitive nodes.
• Polblog [163]: Polblog is a blog website network [2]. Nodes represent blogs and links denote
hyperlinks between blogs. The sensitive attributes for this dataset are blog affiliation communities.
• Twitter [111]: This is a subgraph of the Twitter dataset [13, 25]. The sensitive attribute is the
political leaning of each user, including neutrals, liberals and conservatives.
• Facebook [108]: The dataset is collected from the Facebook website. Nodes are users and edges
represent friendship between users [150]. The sensitive attribute for this dataset is gender.
• Google+ [148]: It is collected from Google+ [150]. Nodes in the dataset are users and they are
connected concerning their social relationships. The sensitive attribute for this dataset is gender.
• Dutch [148]: This is from the school network [190] with gender as the sensitive attribute. It
corresponds to friendship relations among 26 freshmen at a secondary school in the Netherlands.

5.5 Applications of Fair GNNs


Social Network Analysis. With the emergence of social media platforms such as Facebook, Twitter,
and Instagram, social network analysis is widely conducted to provide better services to
users. For example, the platforms may use GNNs to recommend new friends to a user [67]. Node
classification is also widely conducted on social networks to further complete user profiles for
better service [44]. However, recent works [44, 170, 194] indicate that GNNs can be biased against
minorities in friend recommendation and node classification on social networks. For instance, such
algorithms have been found to prevent minorities from becoming influencers, and the message-passing
on graphs can magnify the bias [44]. Therefore, several fair GNNs [44, 111, 117, 148, 218] for social
network analysis have been proposed.

Recommender System. The interactions of users with products such as books can link users and
products to compose a bipartite graph. In addition, the social context of users may also be utilized for
recommendation. Because of the great power of GNNs in processing graphs, many platforms have applied
GNNs in recommender systems [84, 249]. However, fairness issues have also been reported in recommender
systems. For instance, it is found that a GNN-based algorithm for book recommendation may
be biased towards suggesting books with male authors [24]. Hence, it is necessary to develop fair
graph neural networks for recommender systems.
Financial Analysis. Recently, there is growing interest in applying GNNs to financial applications
such as loan default risk prediction [38, 130] and fraud detection [171]. In loan default risk prediction, the
guarantee network [38] or user relational graph [130] can be applied to learn more powerful
representations for prediction. In fraud detection, GNNs on transaction graphs [171] are also investigated.
Similar to the applications in other domains, GNNs also exhibit bias towards protected attributes
such as gender and age in financial analysis [4]. Using fair GNNs [4, 53] in finance can ensure
fairness to users and avoid the social and legal issues caused by the bias in the GNN model.

5.6 Future Research Directions of Fair GNNs


Though many fair models on graph-structured data have been investigated, there are still many
important and challenging directions to be explored. Next, we list some promising research directions.
Attack and Defense in Fairness. Recent works have shown that a poisoning attacker can fool
a fair machine learning model to exacerbate the algorithmic bias [153, 191, 205]. For instance, one
can generate poisoned data samples by maximizing the covariance between the sensitive attributes
and the decision outcome to affect the fairness of the model. Thus, a seriously biased model
caused by the attacker might be treated as a fair model by the end-user due to the deployment of
fair algorithms, which can result in social, ethical and legal issues. Since GNNs are an extension of
deep learning to graphs, fair GNNs are also at risk of being attacked. Without understanding
the vulnerability and robustness of fair GNNs, we cannot fully trust a fair GNN. Despite the initial
efforts on attacking fair models [153, 191, 205], all of them focus on i.i.d data, while studies on the
vulnerability of fair GNNs are rather limited. Note that to achieve a trustworthy GNN, robustness
and fairness should be simultaneously met. However, as discussed in Section 4, current robust
GNNs generally focus on robustness in terms of performance and rarely investigate models that are
robust to attacks on both accuracy and fairness. Therefore, it is crucial to investigate the
vulnerability of fair GNNs and develop robust fair GNNs.
Fairness on Heterogeneous Graphs. Many real-world graphs such as social networks, knowledge
graphs, and biological networks are heterogeneous graphs, i.e., networks containing diverse types
of nodes and/or relationships. Various GNNs have been proposed to address the challenge of
representation learning on heterogeneous graphs, such as learning with meta-paths [94, 219] and
designing new message-passing mechanisms for heterogeneous graphs [178]. Recently, it has been reported
that the representations learned by heterogeneous GNNs can contain discrimination [254], which
could result in societal prejudice in applications. For example, social biases have been
identified in knowledge graphs [188], and the knowledge graph representations learned by
heterogeneous GNNs are widely adopted to facilitate search and recommender systems. Hence,
the biases encoded in the representations could lead to detrimental societal consequences. However,
existing fair algorithms are generally designed for GNNs on homogeneous graphs, and they are not
able to mitigate the bias brought by meta-path neighbors or the message-passing mechanisms
specifically designed for heterogeneous information networks. Therefore, it is necessary to develop
fair GNNs to address the unique challenges brought by heterogeneous graphs.

Fairness without Sensitive Attributes. Despite the ability of the aforementioned methods to
alleviate bias issues, they generally require abundant sensitive attributes to achieve fairness;
however, for many real-world applications, it is difficult to collect the sensitive attributes of subjects due
to various reasons such as privacy issues and legal and regulatory restrictions. As a result, most
existing fair GNNs are challenged by the lack of sensitive attributes in the training data.
Though investigating fair models without sensitive attributes is important and challenging, only
some initial efforts on i.i.d data have been conducted [88, 121, 269]. How to learn fair GNNs without
sensitive attributes is a promising research direction.

6 EXPLAINABILITY OF GRAPH NEURAL NETWORKS


Parallel to the effectiveness and prevalence of deep graph learning systems, the lack of
interpretability is shared by most deep neural networks (DNNs). DNNs typically stack multiple
complex nonlinear layers [259], resulting in predictions that are difficult to understand. To expose the black
box of these highly complex deep models in a systematic and interpretable manner, explainable
DNNs [158] have been explored recently. However, most of these works focus on images or texts
and cannot be directly applied to GNNs due to the discreteness of graph topology and the
message-passing of GNNs. It is very important to understand GNNs' predictions for two
reasons. First, it enhances practitioners' trust in GNN models by enriching their understanding
of the network characteristics. Second, it increases the models' transparency to enable trusted
applications in decision-critical fields sensitive to fairness, privacy and safety challenges. High-
quality explanations can expose the knowledge captured, helping users to evaluate the existence of
possible biases and making the model more trustworthy. For example, counterfactual explanations
are utilized in [183] to analyze the fairness and robustness of black-box models, in order to build a
responsible artificial intelligence system. A model-agnostic explanation interface is also designed
in [18] to continuously monitor model performance and validate its fairness. Therefore, explainable
GNNs are attracting increasing attention and many efforts have been made.
In this section, we provide a comprehensive survey of the current progress on the explainability
of GNNs. First, we introduce the background of explaining GNN models and provide a
motivating example. Following that, a comprehensive review of existing explanation methods is
presented. Popular datasets and evaluation metrics in this domain are also introduced. Finally,
we go through some future research directions. Compared to the existing review in [252], the
main improvement of this survey is that we cover more recent progress such as self-explainable
GNNs [45, 268] and discuss more reliable evaluation settings [64].

6.1 Backgrounds
6.1.1 Aspects of Explanation. As discussed in [48, 155], the term explainability itself needs to be
explained. Generally, explainability in GNNs should (i) guide end-users or model designers to
understand how the model arrives at its results; (ii) enable users to have an expectation on the
decisions; and (iii) provide information on how and when the trained model might break. To cope
with the diverse needs for explanations, explainability must be considered within the context of
particular disciplines. As a result, developed explanation methods often show a high level of variety
and provide explainability at different levels. Generally, explanations can be categorized from the
following perspectives:
• Global or Local explanations. A local explanation (or instance-level explanation) provides justifi-
cation for prediction on each specific instance. Currently, most GNN explanation works [141, 248]
fall within this group. On the other hand, global explanations (or model-level explanations) [251]
reveal how the model’s inference process works, independently of any particular input.

• Self-explainable or Post-hoc explanations. Self-explainable GNNs design specific GNN models
that are intrinsically interpretable, which can simultaneously give the prediction and the corresponding
explanation. The explainability arises as part of the prediction process for self-explaining
methods. On the contrary, post-hoc explanations focus on providing explanations for an already trained
model. An additional explainer model is generally adopted for post-hoc explanations [248].
However, due to the adoption of the explainer model, post-hoc explanations may misinterpret
the actual inner workings of the target model.
• Explanation forms to be presented to the end users. Explanations should help users to un-
derstand GNNs’ behaviors. Various explanation forms have been investigated such as bag-of-
edges [17], attributes importance [248], sub-graphs [253], etc. Different explanation forms can
give different visualizations and offer different information to end users.
• Techniques for deriving the explanations. To enable the explainability, various explanation
techniques have been developed including perturbing the input [248], analyzing internal inference
process [17], designing intrinsically interpretable GNN models [45], etc. These methods make
different assumptions about the model and their advantages may vary across datasets.
A comprehensive taxonomy and detailed introduction of methods are presented in Section 6.3.

6.2 Desired Qualities of Explanations


With the explanation model and its produced explanations obtained, different aspects regarding
explanation quality can be evaluated. Whereas a gold standard exists for comparing predictive
models, there is no agreed-upon evaluation strategy for explainable AI methods. As argued in [158],
evaluating the plausibility and convincingness of an explanation to humans is different from
evaluating its correctness, and those criteria should not be conflated. In this part, we systematically
analyze certain properties that good explanations should satisfy. Considerations of these qualities
motivate the design of different explanation methods and evaluation metrics, which will be discussed
in Section 6.3 and Section 6.5 respectively.
• Correctness: The obtained explanations should be correct, and truthfully reflect the reasoning
of the target predictive model (either locally or globally). This quality addresses the faithfulness
of explanations and requires that descriptive accuracy of the explanation is high [157].
• Completeness: Completeness addresses the extent to which identified explanations explain the
target model. Ideal explanations should contain “the whole truth” [119]. High completeness is
desired to provide enough details, and it should be balanced with correctness [158].
• Consistency: The obtained explanations should be consistent with respect to the inputs. In other
words, explaining should be deterministic and the same explanation should be provided for
identical inputs [174]. It has also been argued that explanations should be invariant toward small
perturbations [7].
• Contrastivity: Contrastivity facilitates distinctions of obtained explanations of different predic-
tions. Explanation of a certain event should be discriminative in comparison to those of other
events [154]. For models taking different decision strategies, explanations of their behaviors
should also be distinct [3].
• User-friendly: An ideal explanation is expected to be user-friendly. Explanations should be
presented in a form that is clear, easy to interpret, and “agree with human rationales” [12]. For
example, it is argued that explanations should be compact, sparse and avoid redundancy [158].
Some formats are also found to be more easily interpretable than others [97].
• Causality: A causal explanation provides insights into the cause-and-effect relationships that
determine model decisions [16]. Such explanations not only describe the correlations or patterns
identified by the model but also delve into the reasons or mechanisms driving those patterns [207].

Ideally, a causal explanation should help users differentiate between spurious correlations and
actual causal influences, thereby enabling more reasonable explanations [134].
6.2.1 Explanation Example. Due to diverse settings and the complexity of existing algorithms,
discussing and comparing GNN explanation methods can often become abstract. To make this
explanation task more concrete, we give an example of instance-level explanation, which has been
widely taken in various works [141, 248, 253]. As shown in Figure 7, the explanation objective is
to find discriminative substructures, including edges and node attributes, that are important for
the prediction of to-be-explained GNN. Note that there are works investigating other explanation
forms, like class-wise prototypical structures [268] and interpretable surrogate models [95].
Fig. 7. An example of instance-level explanation, where important nodes, attributes and edges for predictions are highlighted.
6.2.2 Challenges of Explainability of GNNs. Besides common difficulties in explaining deep models,
certain properties make explanation in the graph domain particularly challenging compared with
other domains like images or texts. First, a GNN captures both node attributes and structural topology
information, relying upon the message-passing [76] along edges. It is difficult to estimate the
contribution of edges as they are discrete, and they could appear in the computational graphs multiple
times at different layers. In turn, identifying important structures is even more complicated as
interactions among nodes and edges are involved. Second, graphs are less intuitive than images or texts.
It is difficult for users to analogize the commonalities and dissimilarities among graphs, making the
evaluation and interpretation of obtained explanations (still in the form of graphs) challenging. Domain knowledge
is often required to understand the obtained graph "explanations".

6.3 Taxonomy of Explainability of Graph Neural Networks


To make deep models applicable in real scenarios, researchers have made extensive attempts to
extract explanations from deep models, especially in the image and text domains. However, due to
the complexity of graph data and the less human-understandable message-passing mechanism
in GNNs, it is difficult to directly extend explanation methods for image or text data to graph
data. Recently, to address these challenges, researchers have begun to focus on the explainability of
GNN models and have proposed many dedicated methods. In this section, we provide a high-level summary
of existing GNN explanation methods and categorize them into three groups: (1) instance-level
post-hoc explanations, (2) model-level post-hoc explanations, and (3) self-explainable methods
that are intrinsically interpretable. Most existing GNN explanation methods are designed for the
instance-level post-hoc setting, and we further arrange them based on their adopted techniques for
achieving explainability. A summary of the method taxonomy is shown in Table 12.
6.3.1 Instance-level Post-hoc Explanation. Instance-level post-hoc explanation identifies elements
(like node attributes and edges) that are crucial for the model's prediction on each specific input
instance. Typically, given an input graph G, which could be a graph sample for graph-level tasks or
the local graph of a node for node-level tasks, it aims to find a sub-graph G_s ⊂ G that accounts
for the prediction output of the target GNN model. Based on different strategies for identifying input
substructures as explanations, we can further summarize existing methods into three groups: (1)
Attribution Methods, which directly analyze the influence of input elements on the prediction using
gradients or perturbations; (2) Decomposition Methods, which examine the inference of

Table 12. Categorization of explanation models on graphs


Category | Technique | References
Instance-level Post-hoc | Gradient | [17, 167]
Instance-level Post-hoc | Perturbation | [141, 248, 253]
Instance-level Post-hoc | Decomposition | [17, 167, 181]
Instance-level Post-hoc | Surrogate | [95, 208, 264]
Model-level Post-hoc | - | [251, 268]
Self-explainable | - | [45, 268]

deep models by decomposing the prediction result into importance mass on the input; (3) Surrogate
Methods, which train an interpretable model that mimics the behavior of the to-be-explained deep
model within the neighborhood of the current input. Next, we introduce the details of these three
categories.
Attribution Methods. Attribution, also referred to as relevance [238], aims to reveal components
of high importance in the input. These methods provide explanations by measuring the
contribution of input elements to the target decision and taking the subgraph with the top contribution
weights as the explanation G𝑠 . Based on their strategies in estimating each element's contribution, we
can further categorize them into two types: Gradient-based and Perturbation-based.
Gradient-based Attribution. Gradient-based attribution methods estimate importance weights
by back-propagated gradients. Based on Taylor expansion, the gradients of the model output w.r.t.
the input elements reflect the model's sensitivity to them, which can be utilized as an importance estimator [15].
For example, the importance of node $v$ with attribute $\mathbf{x}_v$ for prediction $y_c$ can be computed as
$\|\mathrm{ReLU}(\partial y_c / \partial \mathbf{x}_v)\|_1$ [167]. One weakness of these methods is that local gradients could be unreliable
because of the gradient saturation problem [198]. Therefore, integrated gradients (IG) [198], which accumulate
the gradients along a path, were proposed to address this problem. On graphs, the IG score of node
attributes can be computed as:
$IG(\mathbf{x}_v) = (\mathbf{x}_v - \mathbf{x}'_v) \times \int_{\alpha=0}^{1} \frac{\partial f\big(\mathbf{x}'_v + \alpha \cdot (\mathbf{x}_v - \mathbf{x}'_v)\big)}{\partial \mathbf{x}_v} \, d\alpha, \qquad (40)$
where x′𝑣 represents the baseline attributes which can be set to the global average. Essentially,
Eq.(40) integrates the gradients at all points along the path from x′𝑣 to x𝑣 , instead of relying on
gradient at x𝑣 which may suffer from saturated gradients. Existing works have investigated various
gradient-based methods to explain GNNs, such as SA [17], Guided BP [17], CAM [167] and Grad-
CAM [167]. These methods share similar ideas to identify important input elements. The main
difference lies in the procedure of gradient back-propagation and how different hidden feature
maps are combined [252]. For example, Guided BP [17] clips negative gradients during
back-propagation to estimate contribution weights. CAM [167] requires that a linear layer is
used for classification, and calculates heat maps over nodes using node embeddings from the last
GNN layer along with weights of that linear classification layer. Grad-CAM [167] generalizes it to
the model-agnostic setting by using average class-wise gradients in place of linear classification
parameters. It is straightforward to estimate importance weights of node attributes using these
methods, and edges connecting important nodes would also be taken as important [73].
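To make the use of Eq.(40) concrete, below is a minimal PyTorch sketch that approximates the integrated gradients of a node-classification GNN with a Riemann sum; the interface `model(x, edge_index)` returning per-node logits is an assumption for illustration, not a specific library API.

```python
import torch

def integrated_gradients(model, x, x_baseline, edge_index, node_idx,
                         target_class, steps=50):
    """Riemann-sum approximation of the integral in Eq.(40) over the whole
    feature matrix; `model(x, edge_index)` is an assumed interface."""
    total_grad = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, steps):
        # Interpolate between the baseline and the actual attributes
        x_interp = (x_baseline + alpha * (x - x_baseline)).detach().requires_grad_(True)
        score = model(x_interp, edge_index)[node_idx, target_class]
        total_grad += torch.autograd.grad(score, x_interp)[0] / steps
    # Attribution of every node attribute to the prediction on node_idx
    return (x - x_baseline) * total_grad
```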
Perturbation-based Attribution. Perturbation-based attribution methods try to learn a perturbation
mask and examine prediction variations w.r.t. the perturbations. Specifically, the mask is optimized
to maximize the perturbation (mask out as many edges and nodes as possible) while preserving
the original predictions. The remaining unperturbed nodes and edges are taken as the most important
elements contributing to the prediction, which correspond to the explanations. Let M𝐴 ∈ {0, 1}𝑛×𝑛
and M𝑋 ∈ {0, 1}𝑛×𝑑 denote binary masks on edges and node attributes, respectively. It can be

formulated as the following optimization problem:



$\min_{\mathbf{M}_A, \mathbf{M}_X} \; \mathcal{L}_{dif}\big(f(\mathbf{A}, \mathbf{X}), f(\hat{\mathbf{A}}, \hat{\mathbf{X}})\big) + \beta \cdot \mathcal{R}(\mathbf{M}_X, \mathbf{M}_A), \quad \hat{\mathbf{A}} \sim \mathcal{P}(\mathbf{A}, \mathbf{M}_A), \; \hat{\mathbf{X}} \sim \mathcal{P}(\mathbf{X}, \mathbf{M}_X), \qquad (41)$

where P denotes the perturbation on the original input under the given importance masks, and we
use Â to represent the perturbed A, and similarly X̂ for X. With this optimization objective, explanations
are found by identifying input elements that preserve the model predictions. For example, in GNNEx-
plainer [248], P (A, M𝐴 ) = A ⊙ M𝐴 and P (X, M𝑋 ) = Z + (X − Z) ⊙ M𝑋 , where Z is sampled from
the marginal distribution F of node attributes and ⊙ denotes element-wise multiplication. $\mathcal{L}_{dif}$ is usually
implemented as the cross-entropy loss [248], which encourages consistency of the prediction outputs. R
regularizes the identified explanations and is usually implemented as a sparsity constraint. This objective
promotes the correctness and non-redundancy of found explanations.
Eq.(41) is a discrete optimization problem, which is difficult to solve directly. Various learning
paradigms have been proposed to efficiently find the masks [141, 248, 253]. Based on the adopted
strategy in learning perturbation masks, these approaches can be further divided into three groups.
The first group identifies effective perturbation masks by conducting Searching [71, 253]. These
methods optimize Eq.(41) and learn perturbation masks with an explicit test-and-run paradigm,
where the optimization directions are found by search algorithms. For example, SubgraphX [253]
employs the Monte Carlo Tree Search (MCTS) algorithm to search for the most important subgraph as
the explanation of predictions, using the Shapley value to measure a component's importance
during the search phase. ZORRO [71] revises the search process by explicitly encoding fidelity
into the objective. Causal Screening [222] also falls into this group, which incrementally selects
input elements by maximizing individual causal effects at each search step. CF-GNNExplainer [140]
focuses on counterfactual explanations, which are derived by identifying minimal perturbations
that can change the prediction results.
The second group uses Attention mechanism to learn perturbation masks [206, 248]. By relaxing
binary masks M𝐴 ∈ {0, 1}𝑛×𝑛 and M𝑋 ∈ {0, 1}𝑛×𝑑 into soft ones, i.e., M𝐴 ∈ [0, 1] 𝑛×𝑛 and M𝑋 ∈
[0, 1] 𝑛×𝑑 , the soft perturbation masks can be directly optimized in an end-to-end manner. For
example, GNNExplainer [248] employs a soft mask on attributes by element-wise multiplication
and a mask on edges with Gumbel softmax. Then, these two masks are directly optimized with the
objective of minimizing the size of the unperturbed parts while preserving the prediction results.
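As a concrete illustration of this relaxed version of Eq.(41), the minimal sketch below (in the spirit of GNNExplainer) optimizes a soft edge mask with a prediction-consistency loss and a sparsity regularizer; the interface `model(x, edge_index, edge_weight=...)`, which applies the soft mask as edge weights during message passing, is an assumption.

```python
import torch
import torch.nn.functional as F

def learn_soft_edge_mask(model, x, edge_index, node_idx, target_class,
                         epochs=200, lr=0.01, beta=0.005):
    """Optimize a relaxed edge mask for Eq.(41); `model` is an assumed interface."""
    mask_logits = torch.randn(edge_index.size(1), requires_grad=True)
    optimizer = torch.optim.Adam([mask_logits], lr=lr)
    target = torch.tensor([target_class])
    for _ in range(epochs):
        edge_mask = torch.sigmoid(mask_logits)                    # relax {0,1} to [0,1]
        logits = model(x, edge_index, edge_weight=edge_mask)
        pred_loss = F.cross_entropy(logits[node_idx].unsqueeze(0), target)  # consistency term
        loss = pred_loss + beta * edge_mask.sum()                 # sparsity regularizer R(.)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return torch.sigmoid(mask_logits).detach()                    # edge importance scores
```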
The last group [141, 223] uses an Auxiliary Model to predict effective perturbation masks from
the information of the graph and the target model. The auxiliary model is optimized with Eq.(41) on
training samples, and is assumed to generalize so that it can explain newly-coming graphs after
training. For example, PGExplainer [141] adopts an explanation network 𝑔𝜙 to predict the preserving
probability of each given edge based on node embeddings derived from the target model 𝑓𝑡 (G). It
can be formally written as M𝐴 ∼ 𝑔𝜙 (G, 𝑓𝑡 (G)). Unlike the previous two strategies, the perturbation
mask does not need to be re-learned from scratch for each to-be-explained graph, and this group of
methods is much faster to use at test time. Another representative method is GraphMask [179]
which trains a model to produce layer-wise edge perturbation masks. It takes node embeddings
from the corresponding layer as input, and provides different edge importance masks at different
propagation steps. Gem [134] introduces the notion of causality into the explanation generation
process, which trains a causal explanation model equipped with a loss function based on Granger
causality. It has better generalization ability as it has no requirement w.r.t internal GNN structures
or knowledge of the learning tasks. RCExplainer [16] aims to model the common decision logic of
GNNs across similar input graphs. This approach ensures noise resistance by leveraging shared
decision boundaries, and guarantees counterfactual integrity by ensuring prediction changes upon
the removal of identified edges.
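To illustrate the auxiliary-model idea, the sketch below shows a PGExplainer-style network 𝑔𝜙 that scores each edge from the embeddings of its two endpoints; the exact architecture and the way node embeddings are obtained from the target model 𝑓𝑡 are assumptions for illustration.

```python
import torch
import torch.nn as nn

class EdgeMaskPredictor(nn.Module):
    """Sketch of an auxiliary explanation network g_phi: it predicts a keep
    probability for every edge from the endpoint embeddings of the target GNN."""
    def __init__(self, emb_dim, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * emb_dim, hidden),
                                 nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, node_emb, edge_index):
        # Concatenate the embeddings of the two endpoints of every edge
        src, dst = edge_index
        edge_feat = torch.cat([node_emb[src], node_emb[dst]], dim=-1)
        return torch.sigmoid(self.mlp(edge_feat)).squeeze(-1)   # preserving probability per edge
```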

Decomposition Methods. These methods seek to decompose the prediction of the target GNN model
into contributions of the input features. A contribution score is assigned to each input element,
and an explanation is obtained by identifying the inputs with the highest scores. Concretely, the influ-
ence mass is back-propagated layer by layer onto each input element, and the influence mass from
input to output is decomposed based on the neural excitation at each layer. During this process,
nonlinear components of the GNN model are generally neglected to ease the problem. Popular
decomposition strategies on image data include Layer-wise Relevance Propagation (LRP) [14] and
Excitation BP [193]. Several efforts have been made to extend them to graph data [17, 167, 181].
For example, the 𝛼𝛽-rule and the 𝜖-stabilized decomposition rule in the original LRP algorithm are extended
to work on the message-passing mechanism of GNNs [17]. GNN-LRP [180] evaluates contributions of
bag-of-edges and deduces back-propagation rules on graph walks with high-order Taylor decompo-
sition. For example, in GCN, as shown in [180], the back-propagation rule decomposing the contribution
mass of node 𝐾 at layer 𝑙 + 1 to node 𝐽 at layer 𝑙 can be written as Eq.(42). The algorithm starts by
assigning the full contribution mass to the target output and redistributes it with a backward pass through
the GNN layer by layer. For simplicity, we use $R^{l+1,k}_{K}$ to represent the contribution mass of the $k$-th dimension
of node $K$'s embedding at layer $l+1$, and use $R^{l,j}_{JK}$ to denote its decomposed contribution to the $j$-th
dimension of node $J$'s embedding at layer $l$. Assuming the node embedding dimension is $d$, we have:
$R^{l,j}_{JK} = \sum_{k=1}^{d} \frac{\lambda_{JK} \, h^{l,j}_{J} \, w_{jk}}{\sum_{J \in \mathcal{V}} \sum_{j=1}^{d} \lambda_{JK} \, h^{l,j}_{J} \, w_{jk}} \, R^{l+1,k}_{K}, \qquad (42)$
where $h^{l,j}_{J}$ is the $j$-th dimension of node $J$'s embedding at layer $l$, $w_{jk}$ is the scalar weight linking
neuron $j$ in layer $l$ to neuron $k$ in layer $l+1$, and $\lambda_{JK}$ is the edge weight connecting
node $J$ to node $K$. Following this rule, contributions can be back-propagated to the inputs, and
those nodes with the highest $R$ values are preserved as the explanation G𝑠 .
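The redistribution rule in Eq.(42) can be implemented directly with dense tensors. The minimal sketch below performs one backward step for a GCN-style layer; it ignores nonlinearities and the numerical stabilization tricks used in the full GNN-LRP algorithm, so it is only an illustration of the rule.

```python
import torch

def gnn_lrp_layer(R_next, H_l, W, lam, eps=1e-9):
    """One backward redistribution step following Eq.(42) (dense sketch).
    R_next: [n, d_out] relevance of node embeddings at layer l+1
    H_l:    [n, d_in]  node embeddings at layer l
    W:      [d_in, d_out] layer weight matrix
    lam:    [n, n] edge weights, lam[J, K] connects node J to node K
    Returns R_l: [n, d_in], the relevance redistributed to layer l."""
    # C[J, j, K, k] = lam[J, K] * H_l[J, j] * W[j, k]
    C = torch.einsum('pq,pa,ab->paqb', lam, H_l, W)
    # Normalize per receiving neuron (K, k), then redistribute R_next backwards
    denom = C.sum(dim=(0, 1)) + eps                  # [n, d_out]
    R_l = torch.einsum('paqb,qb->pa', C / denom, R_next)
    return R_l
```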
Surrogate Methods. Neural networks are treated as black-box models due to their deep architec-
tures and nonlinear operations. It is observed that they have a highly complex loss landscape [123]
and are challenging to explain directly. To circumvent the nonlinear classification boundary of a
trained DNN model, many attempts have been made [172] to approximate the DNN's local predictions around
each instance x with simple interpretable models such as logistic regression. Specifically, for the
target instance x, a group of prediction records can be obtained by applying small random noises
to it and collecting the target model’s prediction on these perturbed inputs. Then, the interpretable
surrogate model can be trained on these records to mimic target model’s behavior locally, which
serves as the explanation. The search for an interpretable local model $\xi(x)$ can be written as:
$\xi(x) = \arg\min_{f' \in F'} \; \mathcal{L}(f, f', \pi_x) + \Omega(f'). \qquad (43)$

where 𝐹 ′ represents candidate interpretable model families and 𝜋𝑥 denotes local neighborhood
around instance 𝑥. L (𝑓 , 𝑓 ′, 𝜋𝑥 ) measures faithfulness of the surrogate model 𝑓 ′ in approximating
the target model 𝑓 in the locality 𝜋𝑥 . Ω() measures model complexity to encourage simple surrogate
models [172]. Once the surrogate model 𝜉 (x) is trained, explanation on the prediction of 𝑓 on x
can be obtained by examining the interpretable function 𝜉 (x).
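A minimal sketch of this local surrogate idea is given below, using random feature perturbations and a sparse linear model; `predict_fn` is an assumed wrapper around the target GNN, and the choice of Lasso as the interpretable family is only one possible instantiation of Eq.(43).

```python
import numpy as np
from sklearn.linear_model import Lasso

def local_linear_surrogate(predict_fn, x, num_samples=500, noise=0.1, alpha=0.01):
    """Fit a sparse linear surrogate around instance x (sketch of Eq.(43)).
    `predict_fn(X)` is assumed to return the target model's probability of the
    class of interest for each perturbed input row."""
    rng = np.random.default_rng(0)
    X_local = x + noise * rng.standard_normal((num_samples, x.shape[0]))  # locality pi_x
    y_local = predict_fn(X_local)                          # target model's local behavior
    surrogate = Lasso(alpha=alpha).fit(X_local, y_local)   # L1 penalty plays the role of Omega(f')
    return surrogate.coef_                                 # interpretable feature weights
```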
There are several works extending this idea to explain GNNs [95, 208, 264]. They differ from
each other mainly in two aspects: the strategy of obtaining local prediction records (𝜋𝑥 ), and the
interpretable model family (𝐹 ′ ) selected as candidate surrogates. For example, GraphLime [95]
takes neighboring nodes as perturbed inputs and employs a nonlinear surrogate model that
can assign large weights to features that are important in inference. However, GraphLime ignores
graph structures and can only find important node attributes. PGM-Explainer [208] randomly

perturbs node attributes to collect local records, and trains a probabilistic graph model (PGM) to
fit them. The PGM is interpretable and can show the dependencies among nodes inside the input graph.
RelEx [264] randomly samples subgraphs as inputs and uses a GCN as the surrogate model. It can
assign importance weights to edges, but requires an additional run of explanation methods on
the GCN, as the surrogate itself is non-interpretable.
6.3.2 Model-level Post-hoc Explanation. Compared to instance-level methods, model-level methods
focus more on providing general insights for deep graph models by giving high-level explanations
that are independent of specific inputs. Model-level explanations can be representative and dis-
criminative instances for each class [96, 251, 268] or logic rules that depict the knowledge captured by
the deep model [247]. However, due to the highly diverse topology and complex semantics in the
graph domain, this is a very challenging task and few attempts have been made to provide model-level
post-hoc explanations for GNNs. XGNN [251] aims to expose what input graph patterns can trigger
certain predictions of the target GNN model, and adopts a graph generation module to achieve that.
It employs input optimization and trains a graph generator with reinforcement learning to generate
graphs that maximize the target predictions. After training, the generated graphs are expected to be
representative of each class and to provide a global view of the knowledge captured by the target
GNN model. Concretely, the desired prototypical explanation for class 𝑐 is
obtained by solving the following objective:
$\mathcal{G}^{*} = \arg\max_{\mathcal{G}} \; P\big(f_t(\mathcal{G}) = c\big), \qquad (44)$

where 𝑓𝑡 denotes the target GNN model, and a graph generator is trained for finding G ∗ for
class 𝑐. Another work, GCFExplainer [96], explores the global explainability of GNNs using global
counterfactual reasoning, aiming to identify a concise set of representative counterfactual graphs
that elucidate all input graphs. To achieve this, it employs vertex-reinforced random walks on a
graph edit map and uses a greedy technique to obtain the summarization for each class.
6.3.3 Self-explainable Approaches. Different from the post-hoc explanation, self-explainable ap-
proaches [45, 232, 268] aim to give predictions and provide explanations for each prediction
simultaneously. Specific GNN architectures are adopted to support built-in interpretability, similar
to the attention mechanism in GAT [206]. However, although these methods are explainable by
design, they are often restricted in their modeling space and struggle to generalize across tasks.
There are several representative self-explainable GNNs proposed recently, i.e.,
SE-GNN [45], GIB [232], and ProtGNN [268]. Many causal-based methods can also be categorized
into self-explainable methods, i.e., DIR [233], DisC [65] and CIGA [36].
SE-GNN [45] focuses on instance-level self-explanation. It obtains the self-explanation for node
classification via identifying interpretable 𝐾-nearest labeled nodes for each node and utilizes the
𝐾-nearest labeled nodes to simultaneously give label prediction and explain why such prediction is
given. More specifically, SE-GNN [45] adopts an interpretable similarity modeling to compute the
attribute similarity and local structure similarity between the target nodes and labeled nodes. A
contrastive pretext task is further deployed in SE-GNN to provide self-supervision for interpretable
similarity metric learning. GIB [232] balances expressiveness and robustness of the learned graph
representation by learning the minimal sufficient representation for a given task. Following the
general information bottleneck, it maximizes the mutual information between the representation
and the target, and simultaneously constrains the mutual information between the representation
and the input data. ProtGNN [268] is better at global-level explanations by finding several prototypes
for each class. Newly-coming instances are classified by comparing them with those prototypes in the
embedding space. A conditional subgraph sampling module is designed to conduct subgraph-level

matching and several regularization terms are used to promote diversity of prototypical embeddings.
DIR [233] utilizes the intrinsic interpretability of graph neural networks and aims to identify a subset
of input graph features, termed the "rationale", to guide the model predictions. It formulates this as
an invariant learning problem and designs a distribution intervener to obtain the rationales.
DisC [65] is a disentangled GNN framework that separates input graphs into causal and bias
substructures, using a parameterized edge mask generator and training two GNN modules with
respective loss functions. CIGA [36] provides an alternative causal-based framework, which can
capture graph invariance for reliable OOD generalization across diverse distribution shifts. CIGA
utilizes an information-theoretic objective to identify subgraphs enriched with invariant intra-class
information, ensuring resilience to distribution shifts.

6.4 Datasets for Explainability of GNNs


To compare various explanation methods, it is desirable to have datasets where the rationale
between input graphs and output labels is intuitive and easy to obtain, so that the identified
explanatory substructures can be evaluated more easily. In this subsection, we summarize popular
datasets used by existing works on explainable GNNs, which can roughly be categorized into
synthetic datasets and real-world datasets. Several representative benchmarks of both groups will
be introduced in detail.

6.4.1 Synthetic Data. With carefully-designed graph generation mechanisms, we can constrain
unique causal relations between input elements and the provided labels in synthetic datasets. GNN
models must capture such patterns for successful training, and the obtained explanations are evaluated
against those ground-truth causal substructures. Several common synthetic datasets are listed below:
• BA-Shapes [248]: It is a single graph consisting of a base Barabasi-Albert (BA) graph (containing
300 nodes) and 80 “house”-structured motifs (5 nodes each). “House” motifs are randomly
attached to the base BA graph. Nodes in the base graph are labeled as 0 and those in the motifs
are labeled as 1, 2, and 3 based on their positions. Explanations are evaluated on attached nodes
of motifs, with edges inside the corresponding motif as ground-truth.
• Tree-Cycles [248]: It is a single network with an 8-layer balanced binary tree as the base graph.
80 cycle motifs (6 nodes each) are randomly attached to the base graph. Nodes in the base
graph are labeled as 0 and those in the motifs are labeled as 1. Ground-truth explanations for
nodes within cycle motifs are provided for evaluation.
• BA-2motifs [248]: This is a graph classification dataset containing 800 graphs. Half of the graphs
are constructed by attaching a “house” motif to BA base graphs, while the other half graphs
attach a five-node cycle motif. A binary label is assigned to each graph according to its attached
motif. The motif serves as the ground-truth explanation.
• Infection [64]: This is a single network initialized with an ER random graph. 5% of the nodes are
labeled as infected (class 0), and the remaining nodes are labeled as their shortest distances to
those infected ones. In evaluation, nodes with multiple shortest paths are discarded. All remaining
nodes have one distinct path as the oracle explanation towards their labels.
• Syn-Cora [45]: This is synthesized from Cora [113] to provide ground-truth explanations,
i.e., 𝐾-nearest labeled nodes and edge matching results. To construct the graph, motifs are
obtained by sampling the local graphs of nodes from Cora. Various levels of noise are applied to the
motifs in attributes and structures to generate similar local graphs.
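As an example of how such synthetic benchmarks are built, the sketch below constructs a BA-Shapes-style graph with NetworkX; the exact house-motif wiring and label assignment are assumptions for illustration and may differ from the original construction.

```python
import random
import networkx as nx

def ba_shapes_style_graph(base_nodes=300, num_motifs=80, m=5, seed=0):
    """Sketch of a BA-Shapes-style benchmark: a Barabasi-Albert base graph with
    house motifs attached at random base nodes. Base nodes get label 0 and motif
    nodes get labels 1-3 by their (assumed) position in the house."""
    random.seed(seed)
    g = nx.barabasi_albert_graph(base_nodes, m, seed=seed)
    labels = {v: 0 for v in g.nodes}
    for _ in range(num_motifs):
        offset = g.number_of_nodes()
        # Five-node "house": a square (0-1-2-3) with a roof node (4) on top
        house_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 4), (1, 4)]
        g.add_edges_from([(offset + u, offset + v) for u, v in house_edges])
        for i, lab in enumerate([1, 1, 2, 2, 3]):        # assumed role-based labels
            labels[offset + i] = lab
        g.add_edge(random.randrange(base_nodes), offset)  # attach motif to the base graph
    return g, labels
```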

6.4.2 Real-world Data. Due to the high complexity of patterns and the possible existence of noise, it is
challenging to obtain a human-understandable rationale from node features and graph topology to

labels for real-world graphs. Typically, strong domain knowledge is needed. Thus, real-world graph
datasets with ground-truth explanations are limited. Below are two benchmark datasets:
• Molecule Data [234]: This is a graph classification dataset. Each graph corresponds to a molecule
with nodes representing atoms and edges for chemical bonds. Molecules are labeled with con-
sideration of their chemical properties, and discriminative chemical groups are identified using
prior domain knowledge. Chemical groups 𝑁 𝐻 2 and 𝑁𝑂 2 are used as ground-truth explanations.
• Sentiment Graphs [252]: It contains three graph classification datasets created from text datasets
for sentiment analysis, i.e., Graph-SST3, Graph-SST5 and Graph-Twitter. Each graph is a text
document where nodes represent words and edges represent relationships between word pairs
constructed from parsing trees. Node attributes are set as word embeddings from BERT [51]. There
is no ground-truth explanation provided. Heuristic metrics are usually adopted for evaluation.
A summary of representative benchmark datasets is provided in Table 13, along with their key
statistics, tasks, and the papers that used them.

Table 13. Datasets for Explainability of GNNs

Tasks                  Dataset         Graphs   Avg. Nodes   Avg. Edges   Features   References
Node Classification    BA-Shapes       1        700          4,110        10         [248], [141], [253], [208], [45], [268]
                       BA-Community    1        1,400        8,920        1          [248], [141], [70]
                       Tree-Cycles     1        871          1,950        10         [248], [141], [208]
                       Tree-Grid       1        1,231        3,410        10         [248], [141], [208]
                       Syn-Cora        1        1,895        2,769        1,433      [45]
Graph Classification   BA-2motifs      1,000    25           51.4         10         [141], [253]
                       Infection       10       1,000        3,996        2          [64], [141]
                       Graph-SST2      70,042   10.199       9.20         768        [252], [253], [268]
                       Graph-SST5      11,855   19.849       18.849       768        [252]
                       Graph-Twitter   6,940    21.103       21.10        768        [252], [268]
                       MUTAG           188      19.79        17.93        14         [49], [248], [141], [253], [264], [251], [268]

6.5 Evaluation Metrics


Besides visualizing identified explanations and conducting expert examinations, several metrics have
been proposed for quantitatively evaluating explanation approaches from different perspectives.
Next, we present the major categories of metrics used and introduce their distinctions.
• Explanation Accuracy. For graphs with ground-truth rationale known, one direct evaluation
method is to compare identified explanatory parts with the real causes of the label [141, 248]. F1
score and ROC-AUC score can both be computed on identified edges. The higher these scores
are, the more accurate the obtained explanation is.
• Explanation Fidelity. In the absence of ground-truth explanations, heuristic metrics can be
designed to measure the fidelity of identified substructures [252]. The basic idea is that explanatory
substructures should play a more important role in predictions. Concretely, Fidelity+ [252] is
computed by first removing all input elements and then gradually adding the features with the highest
explanation scores; heuristically, a faster increase in the GNN's prediction indicates stronger fidelity
of the obtained explanations. In turn, Fidelity- [252] is computed by sequentially removing edges
following their assigned importance weights, where a faster performance drop represents stronger
fidelity of the removed explanations (a computation sketch is given after this list).
• Sparsity. Good explanations are expected to be minimal structure explanations, as only the most
important input features should be identified. This criterion directly measures the sparsity of the obtained
explanation weights [72], and a better explanation should be sparser.
• Explanation Stability. As good explanations should capture the intrinsic rationale between input
graphs and their labels, this criterion requires identified explanations to be stable with respect
to small perturbations [4]. The stability score can be computed by comparing the explanation changes
after attaching new nodes or edges to the original graph. However, the GNN's prediction is sensitive
to the input, so it is challenging to select a proper amount of perturbation.
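The sketch below illustrates how two of these metrics could be computed: explanation accuracy as ROC-AUC over edges with known ground truth, and a Fidelity+-style insertion curve following the procedure described above; `predict_fn` is an assumed wrapper around the target GNN, not a specific library API.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def explanation_accuracy(gt_edge_labels, edge_scores):
    """Explanation accuracy as ROC-AUC of edge importance scores against
    ground-truth explanatory edges (when such ground truth is available)."""
    return roc_auc_score(gt_edge_labels, edge_scores)

def insertion_fidelity(predict_fn, x, ranking, target_class, baseline=None):
    """Sketch of a Fidelity+-style score: start from a fully masked input,
    re-insert features in order of explanation score, and report the area under
    the resulting prediction curve (a faster increase gives a larger value)."""
    baseline = np.zeros_like(x) if baseline is None else baseline
    current, probs = baseline.copy(), []
    for idx in ranking:                       # feature indices sorted by importance
        current[idx] = x[idx]                 # add back the next most important feature
        probs.append(predict_fn(current)[target_class])
    return float(np.mean(probs))              # area under the insertion curve
```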

6.6 Future Research Directions on Explainability of GNNs


In this part, we provide our opinions on some promising future research directions. We hope it can
inspire and encourage the community to work on bridging the gaps in GNN interpretation.
Class-level Explanations. Despite many interpretation approaches, class-wise explanations
remain an under-explored area. Instance-level explanations provide a local view of GNN’s prediction.
However, it is important to recognize that they may provide anecdotal evidence and scale poorly to
large graph sets. On the other hand, class explanations could provide users with both a global view
of model’s behavior and a discriminative view grounded in each class, making it easier to expose
and evaluate learned knowledge.
Benchmark Datasets for Interpretability. One great obstacle in designing and evaluating
interpretation methods is the measurement of provided interpretability. To a large extent, the
difficulty is inherent. For example, it is impossible to provide gold labels for what is a correct
explanation. After all, we would not need to design an explanation approach in the first place had
we known those real explanations. Currently, we rely upon heuristic metrics and approximations,
but a set of principled proxy measurements and benchmark datasets are yet to be established.
User-Oriented Explanations. The purpose of designing explanation approaches is to use them
on real-world tasks to expose learned knowledge of trained models. Based on the needs of users,
different requirements may arise in the form of explanation and levels of interpretability. Hence,
we encourage researchers to consider real user cases, select suitable design choices, and evaluate
explanation algorithms in real-world scenarios for the integrity of the interpretability field. One
promising direction is to provide flexible fine-grained multi-level explanations so that end-users
can select the level of explanations they can understand or satisfy their criteria.
Causal and Counterfactual Explanations. While traditional explanations offer insights into
model behavior, causal and counterfactual explanations dig deeper into understanding the cause-
and-effect relationships that drive predictions. Causal explanations focus on identifying the direct
influences or drivers behind a particular prediction, while counterfactual explanations provide
insights by answering "what if" scenarios, showcasing how a prediction might change if input
features were altered. Incorporating such explanations in GNNs can be beneficial, especially in
critical applications where understanding the underlying causes or potential outcomes of a decision
is crucial. It is paramount to develop methods that can reliably extract and present these types
of explanations, ensuring that users not only understand what the model predicts but also the
underlying reasons and potential alternatives.

7 CONCLUSION
In this survey, we conduct a comprehensive review on the trustworthy GNNs from the aspects of
privacy, robustness, fairness, and explainability. This fills the gap of a missing systematic summary
of privacy-preserving GNNs and fairness-aware GNNs. For robustness and explainability, we
introduce the recent trends in more detail in addition to the representative methods reviewed
before. More specifically, for each aspect, we introduce the core definitions and concepts to help
the readers to understand the defined problems. The introduced methods are categorized from
various perspectives. The unified framework of each category is generally given followed by the
detailed implementations of the representative methods. In addition, we also list the used datasets
in privacy, fairness, and explainability, where the proposed methods have special requirements on
the datasets for training or evaluation. Numerous real-world applications of trustworthy GNNs

are also provided to encourage researchers to develop practical trustworthy GNNs. Finally, we
discuss the future research directions of each aspect at the end of each section, which includes
promising directions in a single aspect and interactions between aspects for trustworthy GNNs.

REFERENCES
[1] Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., and Zhang, L. Deep learning with
differential privacy. In Proceedings of the 2016 ACM SIGSAC conference on computer and communications security
(2016), pp. 308–318.
[2] Adamic, L. A., and Glance, N. The political blogosphere and the 2004 us election: divided they blog. In Proceedings
of the 3rd international workshop on Link discovery (2005), pp. 36–43.
[3] Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I., Hardt, M., and Kim, B. Sanity checks for saliency maps.
Advances in neural information processing systems 31 (2018).
[4] Agarwal, C., Lakkaraju, H., and Zitnik, M. Towards a unified framework for fair and stable graph representation
learning. arXiv preprint arXiv:2102.13186 (2021).
[5] Ahmedt-Aristizabal, D., Armin, M. A., Denman, S., Fookes, C., and Petersson, L. Graph-based deep learning for
medical diagnosis and analysis: past, present and future. Sensors 21, 14 (2021), 4758.
[6] Al-Rubaie, M., and Chang, J. M. Privacy-preserving machine learning: Threats and solutions. IEEE Security &
Privacy 17, 2 (2019), 49–58.
[7] Alvarez-Melis, D., and Jaakkola, T. S. On the robustness of interpretability methods. arXiv preprint arXiv:1806.08049
(2018).
[8] Arachchige, P. C. M., Bertok, P., Khalil, I., Liu, D., Camtepe, S., and Atiqzzaman, M. Local differential privacy
for deep learning. IEEE Internet of Things Journal 7, 7 (2019), 5827–5842.
[9] Arjovsky, M., Chintala, S., and Bottou, L. Wasserstein generative adversarial networks. In International conference
on machine learning (2017), PMLR, pp. 214–223.
[10] Arora, S. A survey on graph neural networks for knowledge graph completion. arXiv preprint arXiv:2007.12374
(2020).
[11] Asuncion, A., and Newman, D. Uci machine learning repository, 2007.
[12] Atanasova, P., Simonsen, J. G., Lioma, C., and Augenstein, I. A diagnostic study of explainability techniques for
text classification. arXiv preprint arXiv:2009.13295 (2020).
[13] Babaei, M., Grabowicz, P., Valera, I., Gummadi, K. P., and Gomez-Rodriguez, M. On the efficiency of the
information networks in social media. In Proceedings of the Ninth ACM International Conference on Web Search and
Data Mining (2016), pp. 83–92.
[14] Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.-R., and Samek, W. On pixel-wise explanations for
non-linear classifier decisions by layer-wise relevance propagation. PloS one 10, 7 (2015), e0130140.
[15] Baehrens, D., Schroeter, T., Harmeling, S., Kawanabe, M., Hansen, K., and Müller, K.-R. How to explain
individual classification decisions. The Journal of Machine Learning Research 11 (2010), 1803–1831.
[16] Bajaj, M., Chu, L., Xue, Z. Y., Pei, J., Wang, L., Lam, P. C.-H., and Zhang, Y. Robust Counterfactual Explanations on
Graph Neural Networks. In Conference on Neural Information Processing Systems (NeurIPS) (2021), vol. 34, pp. 5644–
5655.
[17] Baldassarre, F., and Azizpour, H. Explainability techniques for graph convolutional networks. arXiv preprint
arXiv:1905.13686 (2019).
[18] Baniecki, H., Kretowicz, W., Piatyszek, P., Wisniewski, J., and Biecek, P. dalex: Responsible machine learning
with interactive explainability and fairness in python. arXiv preprint arXiv:2012.14406 (2020).
[19] Beutel, A., Chen, J., Zhao, Z., and Chi, E. H. Data decisions and theoretical implications when adversarially learning
fair representations. arXiv preprint arXiv:1707.00075 (2017).
[20] Bojchevski, A., and Günnemann, S. Adversarial attacks on node embeddings via graph poisoning. In International
Conference on Machine Learning (2019), pp. 695–704.
[21] Bojchevski, A., and Günnemann, S. Certifiable robustness to graph perturbations. Advances in Neural Information
Processing Systems 32 (2019).
[22] Bose, A., and Hamilton, W. Compositional fairness constraints for graph embeddings. In International Conference
on Machine Learning (2019), PMLR, pp. 715–724.
[23] Bourtoule, L., Chandrasekaran, V., Choqette-Choo, C. A., Jia, H., Travers, A., Zhang, B., Lie, D., and
Papernot, N. Machine unlearning. In 2021 IEEE Symposium on Security and Privacy (SP) (2021), IEEE, pp. 141–159.
[24] Buyl, M., and De Bie, T. Debayes: a bayesian method for debiasing network embeddings. In International Conference
on Machine Learning (2020), PMLR, pp. 1220–1229.
[25] Cha, M., Haddadi, H., Benevenuto, F., and Gummadi, K. Measuring user influence in twitter: The million follower

fallacy. In Proceedings of the international AAAI conference on web and social media (2010), vol. 4.
[26] Chang, H., Rong, Y., Xu, T., Huang, W., Zhang, H., Cui, P., Zhu, W., and Huang, J. A restricted black-box adversarial
framework towards attacking graph embedding models. In Proceedings of the AAAI Conference on Artificial Intelligence
(2020), vol. 34, pp. 3389–3396.
[27] Chen, J., Ma, T., and Xiao, C. Fastgcn: fast learning with graph convolutional networks via importance sampling.
arXiv preprint arXiv:1801.10247 (2018).
[28] Chen, J., Shi, Z., Wu, Y., Xu, X., and Zheng, H. Link prediction adversarial attack. arXiv preprint arXiv:1810.01110
(2018).
[29] Chen, J., Wu, Y., Lin, X., and Xuan, Q. Can adversarial network attack be defended? arXiv preprint arXiv:1903.05994
(2019).
[30] Chen, J., Wu, Y., Xu, X., Chen, Y., Zheng, H., and Xuan, Q. Fast gradient attack on network embedding. arXiv
preprint arXiv:1809.02797 (2018).
[31] Chen, L., Li, J., Peng, J., Xie, T., Cao, Z., Xu, K., He, X., and Zheng, Z. A survey of adversarial learning on graphs.
arXiv preprint arXiv:2003.05730 (2020).
[32] Chen, L., Li, J., Peng, Q., Liu, Y., Zheng, Z., and Yang, C. Understanding structural vulnerability in graph convolutional
networks. arXiv preprint arXiv:2108.06280 (2021).
[33] Chen, M., Zhang, Z., Wang, T., Backes, M., Humbert, M., and Zhang, Y. Graph unlearning. In Proceedings of the
2022 ACM SIGSAC Conference on Computer and Communications Security (2022), pp. 499–513.
[34] Chen, Y., Wu, L., and Zaki, M. Iterative deep graph learning for graph neural networks: Better and robust node
embeddings. Advances in Neural Information Processing Systems 33 (2020), 19314–19326.
[35] Chen, Y., Yang, H., Zhang, Y., Ma, K., Liu, T., Han, B., and Cheng, J. Understanding and improving graph injection
attack by promoting unnoticeability. arXiv preprint arXiv:2202.08057 (2022).
[36] Chen, Y., Zhang, Y., Bian, Y., Yang, H., Kaili, M., Xie, B., Liu, T., Han, B., and Cheng, J. Learning Causally Invariant
Representations for Out-of-Distribution Generalization on Graphs. In Conference on Neural Information Processing
Systems (NeurIPS) (2022), vol. 35, pp. 22131–22148.
[37] Chen, Z., Li, X., and Bruna, J. Supervised community detection with line graph neural networks. arXiv preprint
arXiv:1705.08415 (2017).
[38] Cheng, D., Tu, Y., Ma, Z.-W., Niu, Z., and Zhang, L. Risk assessment for networked-guarantee loans using high-order
graph attention representation. In IJCAI (2019), pp. 5822–5828.
[39] Chien, E., Pan, C., and Milenkovic, O. Efficient model updates for approximate unlearning of graph-structured data.
In The Eleventh International Conference on Learning Representations (2022).
[40] Dai, E., Aggarwal, C., and Wang, S. Nrgnn: Learning a label noise resistant graph neural network on sparsely and
noisily labeled graphs. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
(2021), pp. 227–236.
[41] Dai, E., Cui, L., Wang, Z., Tang, X., Wang, Y., Cheng, M., Yin, B., and Wang, S. A unified framework of graph
information bottleneck for robustness and membership privacy. arXiv preprint arXiv:2306.08604 (2023).
[42] Dai, E., Jin, W., Liu, H., and Wang, S. Towards robust graph neural networks for noisy graphs with sparse labels.
arXiv preprint arXiv:2201.00232 (2022).
[43] Dai, E., Lin, M., Zhang, X., and Wang, S. Unnoticeable backdoor attacks on graph neural networks. In Proceedings
of the ACM Web Conference 2023 (2023), pp. 2263–2273.
[44] Dai, E., and Wang, S. Say no to the discrimination: Learning fair graph neural networks with limited sensitive
attribute information. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining (2021),
pp. 680–688.
[45] Dai, E., and Wang, S. Towards self-explainable graph neural network. In Proceedings of the 30th ACM International
Conference on Information & Knowledge Management (2021), pp. 302–311.
[46] Dai, H., Li, H., Tian, T., Huang, X., Wang, L., Zhu, J., and Song, L. Adversarial attack on graph structured data.
arXiv preprint arXiv:1806.02371 (2018).
[47] Dai, Q., Shen, X., Zhang, L., Li, Q., and Wang, D. Adversarial training methods for network embedding. In The
World Wide Web Conference (2019), pp. 329–339.
[48] Danilevsky, M., Qian, K., Aharonov, R., Katsis, Y., Kawas, B., and Sen, P. A survey of the state of explainable ai
for natural language processing. arXiv preprint arXiv:2010.00711 (2020).
[49] Debnath, A. K., Lopez de Compadre, R. L., Debnath, G., Shusterman, A. J., and Hansch, C. Structure-activity
relationship of mutagenic aromatic and heteroaromatic nitro compounds. correlation with molecular orbital energies
and hydrophobicity. Journal of medicinal chemistry 34, 2 (1991), 786–797.
[50] Deng, Z., Dong, Y., and Zhu, J. Batch virtual adversarial training for graph convolutional networks. arXiv preprint
arXiv:1902.09192 (2019).
[51] Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. Bert: Pre-training of deep bidirectional transformers for

language understanding. arXiv preprint arXiv:1810.04805 (2018).


[52] Dong, Y., Kang, J., Tong, H., and Li, J. Individual fairness for graph neural networks: A ranking based approach. In
Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (2021), pp. 300–310.
[53] Dong, Y., Liu, N., Jalaian, B., and Li, J. Edits: Modeling and mitigating data bias for graph neural networks. arXiv
preprint arXiv:2108.05233 (2021).
[54] Dong, Y., Lizardo, O., and Chawla, N. V. Do the young live in a “smaller world” than the old? age-specific degrees
of separation in a large-scale mobile communication network. arXiv preprint arXiv:1606.07556 (2016).
[55] Dong, Y., Ma, J., Wang, S., Chen, C., and Li, J. Fairness in graph mining: A survey. IEEE Transactions on Knowledge
and Data Engineering (2023).
[56] Dong, Y., Wang, S., Ma, J., Liu, N., and Li, J. Interpreting unfairness in graph neural networks via training node
attribution. In Proceedings of the AAAI Conference on Artificial Intelligence (2023), vol. 37, pp. 7441–7449.
[57] Dong, Y., Wang, S., Wang, Y., Derr, T., and Li, J. On structural explanation of bias in graph neural networks. In
Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2022), pp. 316–326.
[58] Duddu, V., Boutet, A., and Shejwalkar, V. Quantifying privacy leakage in graph embedding. arXiv preprint
arXiv:2010.00906 (2020).
[59] Dwork, C., Hardt, M., Pitassi, T., Reingold, O., and Zemel, R. Fairness through awareness. In Proceedings of the
3rd innovations in theoretical computer science conference (2012), pp. 214–226.
[60] Dwork, C., McSherry, F., Nissim, K., and Smith, A. Calibrating noise to sensitivity in private data analysis. In
Theory of cryptography conference (2006), Springer, pp. 265–284.
[61] Dwork, C., Roth, A., et al. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9,
3-4 (2014), 211–407.
[62] Edwards, H., and Storkey, A. Censoring representations with an adversary. arXiv preprint arXiv:1511.05897 (2015).
[63] Entezari, N., Al-Sayouri, S. A., Darvishzadeh, A., and Papalexakis, E. E. All you need is low (rank) defending
against adversarial attacks on graphs. In Proceedings of the 13th International Conference on Web Search and Data
Mining (2020), pp. 169–177.
[64] Faber, L., K. Moghaddam, A., and Wattenhofer, R. When comparing to ground truth is wrong: On evaluating gnn
explanation methods. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
(2021), pp. 332–341.
[65] Fan, S., Wang, X., Mo, Y., Shi, C., and Tang, J. Debiasing Graph Neural Networks via Learning Disentangled Causal
Substructure. In Conference on Neural Information Processing Systems (NeurIPS) (2022), vol. 35, pp. 24934–24946.
[66] Fan, W., Jin, W., Liu, X., Xu, H., Tang, X., Wang, S., Li, Q., Tang, J., Wang, J., and Aggarwal, C. Jointly attacking
graph neural network and its explanations. arXiv preprint arXiv:2108.03388 (2021).
[67] Fan, W., Ma, Y., Li, Q., He, Y., Zhao, E., Tang, J., and Yin, D. Graph neural networks for social recommendation. In
The World Wide Web Conference (2019), pp. 417–426.
[68] Feng, F., He, X., Tang, J., and Chua, T.-S. Graph adversarial training: Dynamically regularizing based on graph
structure. arXiv preprint arXiv:1902.08226 (2019).
[69] Feng, S., Jing, B., Zhu, Y., and Tong, H. Adversarial graph contrastive learning with information regularization. In
Proceedings of the ACM Web Conference 2022 (2022), pp. 1362–1371.
[70] Funke, T., Khosla, M., and Anand, A. Hard masking for explaining graph neural networks.
[71] Funke, T., Khosla, M., and Anand, A. Hard masking for explaining graph neural networks, 2021.
[72] Funke, T., Khosla, M., and Anand, A. Zorro: Valid, sparse, and stable explanations in graph neural networks. arXiv
preprint arXiv:2105.08621 (2021).
[73] Gao, Y., Sun, T., Bhatt, R., Yu, D., Hong, S., and Zhao, L. Gnes: Learning to explain graph neural networks. In 2021
IEEE International Conference on Data Mining (ICDM) (2021), pp. 131–140.
[74] Garcia, J. O., Ashourvan, A., Muldoon, S., Vettel, J. M., and Bassett, D. S. Applications of community detection
techniques to brain graphs: Algorithmic considerations and implications for neural function. Proceedings of the IEEE
106, 5 (2018), 846–867.
[75] Geisler, S., Schmidt, T., Şirin, H., Zügner, D., Bojchevski, A., and Günnemann, S. Robustness of graph neural
networks at scale. Advances in Neural Information Processing Systems 34 (2021).
[76] Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. Neural message passing for quantum
chemistry. In International conference on machine learning (2017), PMLR, pp. 1263–1272.
[77] Girvan, M., and Newman, M. E. Community structure in social and biological networks. Proceedings of the national
academy of sciences 99, 12 (2002), 7821–7826.
[78] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y.
Generative adversarial nets. Advances in neural information processing systems 27 (2014).
[79] Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. arXiv preprint
arXiv:1412.6572 (2014).

[80] Guo, C., Goldstein, T., Hannun, A., and Van Der Maaten, L. Certified data removal from machine learning models.
arXiv preprint arXiv:1911.03030 (2019).
[81] Guo, J., Li, S., Zhao, Y., and Zhang, Y. Learning robust representation through graph adversarial contrastive learning.
In International Conference on Database Systems for Advanced Applications (2022), Springer, pp. 682–697.
[82] Guo, Z., Li, J., Xiao, T., Ma, Y., and Wang, S. Towards fair graph neural networks via graph counterfactual. arXiv
preprint arXiv:2307.04937 (2023).
[83] Guo, Z., Xiao, T., Aggarwal, C., Liu, H., and Wang, S. Counterfactual learning on graphs: A survey. arXiv preprint
arXiv:2304.01391 (2023).
[84] Hamilton, W. L., Ying, R., and Leskovec, J. Inductive representation learning on large graphs. In Proceedings of the
31st International Conference on Neural Information Processing Systems (2017), pp. 1025–1035.
[85] Hardt, M., Price, E., and Srebro, N. Equality of opportunity in supervised learning. Advances in neural information
processing systems 29 (2016), 3315–3323.
[86] Harl, M., Weinzierl, S., Stierle, M., and Matzner, M. Explainable predictive business process monitoring using
gated graph neural networks. Journal of Decision Systems 29, sup1 (2020), 312–327.
[87] Harper, F. M., and Konstan, J. A. The movielens datasets: History and context. Acm transactions on interactive
intelligent systems (tiis) 5, 4 (2015), 1–19.
[88] Hashimoto, T., Srivastava, M., Namkoong, H., and Liang, P. Fairness without demographics in repeated loss
minimization. In International Conference on Machine Learning (2018), PMLR, pp. 1929–1938.
[89] He, C., Ceyani, E., Balasubramanian, K., Annavaram, M., and Avestimehr, S. Spreadgnn: Serverless multi-task
federated learning for graph neural networks. arXiv preprint arXiv:2106.02743 (2021).
[90] He, X., Jia, J., Backes, M., Gong, N. Z., and Zhang, Y. Stealing links from graph neural networks. In 30th {USENIX}
Security Symposium ({USENIX} Security 21) (2021).
[91] He, X., Wen, R., Wu, Y., Backes, M., Shen, Y., and Zhang, Y. Node-level membership inference attacks against graph
neural networks. arXiv preprint arXiv:2102.05429 (2021).
[92] Hu, W., Liu, B., Gomes, J., Zitnik, M., Liang, P., Pande, V., and Leskovec, J. Strategies for pre-training graph neural
networks. arXiv preprint arXiv:1905.12265 (2019).
[93] Hu, Z., Dong, Y., Wang, K., Chang, K.-W., and Sun, Y. Gpt-gnn: Generative pre-training of graph neural networks.
In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2020),
pp. 1857–1867.
[94] Hu, Z., Dong, Y., Wang, K., and Sun, Y. Heterogeneous graph transformer. In Proceedings of The Web Conference
2020 (2020), pp. 2704–2710.
[95] Huang, Q., Yamada, M., Tian, Y., Singh, D., Yin, D., and Chang, Y. Graphlime: Local interpretable model explanations
for graph neural networks. arXiv preprint arXiv:2001.06216 (2020).
[96] Huang, Z., Kosan, M., Medya, S., Ranu, S., and Singh, A. Global Counterfactual Explainer for Graph Neural
Networks. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining (2023), ACM.
[97] Huysmans, J., Dejaeger, K., Mues, C., Vanthienen, J., and Baesens, B. An empirical evaluation of the comprehensi-
bility of decision table, tree and rule based predictive models. Decision Support Systems 51, 1 (2011), 141–154.
[98] Jayaraman, B., and Evans, D. Evaluating differentially private machine learning in practice. In 28th {USENIX}
Security Symposium ({USENIX} Security 19) (2019), pp. 1895–1912.
[99] Ji, Z., Lipton, Z. C., and Elkan, C. Differential privacy and machine learning: a survey and review. arXiv preprint
arXiv:1412.7584 (2014).
[100] Jiang, D., Wu, Z., Hsieh, C.-Y., Chen, G., Liao, B., Wang, Z., Shen, C., Cao, D., Wu, J., and Hou, T. Could graph
neural networks learn better molecular representation for drug discovery? a comparison study of descriptor-based
and graph-based models. Journal of cheminformatics 13, 1 (2021), 1–23.
[101] Jin, H., Shi, Z., Peruri, V. J. S. A., and Zhang, X. Certified robustness of graph convolution networks for graph
classification under topological attacks. Advances in Neural Information Processing Systems 33 (2020), 8463–8474.
[102] Jin, M., Chang, H., Zhu, W., and Sojoudi, S. Power up! robust graph convolutional network against evasion attacks
based on graph powering. arXiv preprint arXiv:1905.10029 (2019).
[103] Jin, W., Derr, T., Wang, Y., Ma, Y., Liu, Z., and Tang, J. Node similarity preserving graph convolutional networks.
In Proceedings of the 14th ACM International Conference on Web Search and Data Mining (2021), pp. 148–156.
[104] Jin, W., Li, Y., Xu, H., Wang, Y., Ji, S., Aggarwal, C., and Tang, J. Adversarial attacks and defenses on graphs: A
review, a tool and empirical studies. arXiv preprint arXiv:2003.00653 (2020).
[105] Jin, W., Ma, Y., Liu, X., Tang, X., Wang, S., and Tang, J. Graph structure learning for robust graph neural networks.
In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2020),
pp. 66–74.
[106] Jordan, K. L., and Freiburger, T. L. The effect of race/ethnicity on sentencing: Examining sentence type, jail length,
and prison length. Journal of Ethnicity in Criminal Justice 13, 3 (2015), 179–196.

[107] Kairouz, P., McMahan, H. B., Avent, B., Bellet, A., Bennis, M., Bhagoji, A. N., Bonawitz, K., Charles, Z., Cormode,
G., Cummings, R., et al. Advances and open problems in federated learning. arXiv preprint arXiv:1912.04977 (2019).
[108] Kang, J., He, J., Maciejewski, R., and Tong, H. Inform: Individual fairness on graph mining. In Proceedings of the
26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2020), pp. 379–389.
[109] Kasiviswanathan, S. P., Lee, H. K., Nissim, K., Raskhodnikova, S., and Smith, A. What can we learn privately?
SIAM Journal on Computing 40, 3 (2011), 793–826.
[110] Kawahara, J., Brown, C. J., Miller, S. P., Booth, B. G., Chau, V., Grunau, R. E., Zwicker, J. G., and Hamarneh, G.
Brainnetcnn: Convolutional neural networks for brain networks; towards predicting neurodevelopment. NeuroImage
146 (2017), 1038–1049.
[111] Khajehnejad, A., Khajehnejad, M., Babaei, M., Gummadi, K. P., Weller, A., and Mirzasoleiman, B. Crosswalk:
Fairness-enhanced node representation learning. arXiv preprint arXiv:2105.02725 (2021).
[112] Kim, D., and Oh, A. How to find your friendly neighborhood: Graph attention design with self-supervision. In
International Conference on Learning Representations (2020).
[113] Kipf, T. N., and Welling, M. Semi-supervised classification with graph convolutional networks. arXiv preprint
arXiv:1609.02907 (2016).
[114] Kipf, T. N., and Welling, M. Variational graph auto-encoders. arXiv abs/1611.07308 (2016).
[115] Klicpera, J., Bojchevski, A., and Günnemann, S. Predict then propagate: Graph neural networks meet personalized
pagerank. arXiv preprint arXiv:1810.05997 (2018).
[116] Konečnỳ, J., McMahan, H. B., Yu, F. X., Richtárik, P., Suresh, A. T., and Bacon, D. Federated learning: Strategies
for improving communication efficiency. arXiv preprint arXiv:1610.05492 (2016).
[117] Köse, Ö. D., and Shen, Y. Fairness-aware node representation learning. arXiv preprint arXiv:2106.05391 (2021).
[118] Kose, O. D., and Shen, Y. Demystifying and mitigating bias for node representation learning. IEEE Transactions on
Neural Networks and Learning Systems (2023).
[119] Kulesza, T., Stumpf, S., Burnett, M., Yang, S., Kwan, I., and Wong, W.-K. Too much, too little, or just right?
ways explanations impact end users’ mental models. In 2013 IEEE Symposium on visual languages and human centric
computing (2013), IEEE, pp. 3–10.
[120] Kusner, M. J., Loftus, J. R., Russell, C., and Silva, R. Counterfactual fairness. arXiv preprint arXiv:1703.06856 (2017).
[121] Lahoti, P., Beutel, A., Chen, J., Lee, K., Prost, F., Thain, N., Wang, X., and Chi, E. H. Fairness without demographics
through adversarially reweighted learning. arXiv preprint arXiv:2006.13114 (2020).
[122] Leskovec, J., and Mcauley, J. Learning to discover social circles in ego networks. Advances in neural information
processing systems 25 (2012).
[123] Li, H., Xu, Z., Taylor, G., Studer, C., and Goldstein, T. Visualizing the loss landscape of neural nets. Advances in
neural information processing systems 31 (2018).
[124] Li, J., Xie, T., Liang, C., Xie, F., He, X., and Zheng, Z. Adversarial attack on large scale graph. IEEE Transactions on
Knowledge and Data Engineering (2021).
[125] Li, K., Luo, G., Ye, Y., Li, W., Ji, S., and Cai, Z. Adversarial privacy-preserving graph embedding against inference
attack. IEEE Internet of Things Journal 8, 8 (2020), 6904–6915.
[126] Li, P., Wang, Y., Zhao, H., Hong, P., and Liu, H. On dyadic fairness: Exploring and mitigating bias in graph
connections. In International Conference on Learning Representations (2020).
[127] Li, X., Zhou, Y., Dvornek, N., Zhang, M., Gao, S., Zhuang, J., Scheinost, D., Staib, L. H., Ventola, P., and Duncan,
J. S. Braingnn: Interpretable brain graph neural network for fmri analysis. Medical Image Analysis 74 (2021), 102233.
[128] Li, Y., Qian, B., Zhang, X., and Liu, H. Graph neural network-based diagnosis prediction. Big Data 8, 5 (2020),
379–390.
[129] Li, Y., Yin, J., and Chen, L. Unified robust training for graph neural networks against label noise. In Pacific-Asia
Conference on Knowledge Discovery and Data Mining (2021), Springer, pp. 528–540.
[130] Liang, T., Zeng, G., Zhong, Q., Chi, J., Feng, J., Ao, X., and Tang, J. Credit risk and limits forecasting in e-commerce
consumer lending service via multi-view-aware mixture-of-experts nets. In Proceedings of the 14th ACM International
Conference on Web Search and Data Mining (2021), pp. 229–237.
[131] Liao, J., Huang, C., Kairouz, P., and Sankar, L. Learning generative adversarial representations (gap) under fairness
and censoring constraints. arXiv preprint arXiv:1910.00411 (2019).
[132] Liao, P., Zhao, H., Xu, K., Jaakkola, T., Gordon, G. J., Jegelka, S., and Salakhutdinov, R. Information obfuscation
of graph neural networks. In International Conference on Machine Learning (2021), PMLR, pp. 6600–6610.
[133] Lin, L., Chen, J., and Wang, H. Spectral augmentation for self-supervised learning on graphs. ICLR (2023).
[134] Lin, W., Lan, H., and Li, B. Generative causal explanations for graph neural networks. In International Conference on
Machine Learning (2021), PMLR, pp. 6666–6679.
[135] Ling, H., Jiang, Z., Luo, Y., Ji, S., and Zou, N. Learning fair graph representations via automated data augmentations.
In The Eleventh International Conference on Learning Representations (2022).

[136] Liu, H., Wang, Y., Fan, W., Liu, X., Li, Y., Jain, S., Liu, Y., Jain, A. K., and Tang, J. Trustworthy ai: A computational
perspective. arXiv preprint arXiv:2107.06641 (2021).
[137] Liu, Z., Yang, L., Fan, Z., Peng, H., and Yu, P. S. Federated social recommendation with graph neural network. arXiv
preprint arXiv:2111.10778 (2021).
[138] Locatello, F., Abbati, G., Rainforth, T., Bauer, S., Schölkopf, B., and Bachem, O. On the fairness of disentangled
representations. arXiv preprint arXiv:1905.13662 (2019).
[139] Long, Y., Wu, M., Liu, Y., Fang, Y., Kwoh, C. K., Luo, J., and Li, X. Pre-training graph neural networks for link
prediction in biomedical networks.
[140] Lucic, A., Ter Hoeve, M. A., Tolomei, G., De Rijke, M., and Silvestri, F. Cf-gnnexplainer: Counterfactual
explanations for graph neural networks. In International Conference on Artificial Intelligence and Statistics (2022),
PMLR, pp. 4499–4511.
[141] Luo, D., Cheng, W., Xu, D., Yu, W., Zong, B., Chen, H., and Zhang, X. Parameterized explainer for graph neural
network. arXiv preprint arXiv:2011.04573 (2020).
[142] Luo, D., Cheng, W., Yu, W., Zong, B., Ni, J., Chen, H., and Zhang, X. Learning to drop: Robust graph neural network
via topological denoising. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining
(2021), pp. 779–787.
[143] Lv, L., Cheng, J., Peng, N., Fan, M., Zhao, D., and Zhang, J. Auto-encoder based graph convolutional networks
for online financial anti-fraud. In 2019 IEEE Conference on Computational Intelligence for Financial Engineering &
Economics (CIFEr) (2019), IEEE, pp. 1–6.
[144] Ma, J., Ding, S., and Mei, Q. Towards more practical adversarial attacks on graph neural networks. Advances in
neural information processing systems 33 (2020), 4756–4766.
[145] Ma, J., Guo, R., Wan, M., Yang, L., Zhang, A., and Li, J. Learning fair node representations with graph counterfactual
fairness. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining (2022), pp. 695–
703.
[146] Ma, Y., Wang, S., Derr, T., Wu, L., and Tang, J. Graph adversarial attack via rewiring. In Proceedings of the 27th
ACM SIGKDD Conference on Knowledge Discovery & Data Mining (2021), pp. 1161–1169.
[147] Madras, D., Creager, E., Pitassi, T., and Zemel, R. Learning adversarially fair and transferable representations.
arXiv preprint arXiv:1802.06309 (2018).
[148] Masrour, F., Wilson, T., Yan, H., Tan, P.-N., and Esfahanian, A. Bursting the filter bubble: Fairness-aware network
link prediction. In Proceedings of the AAAI Conference on Artificial Intelligence (2020), vol. 34, pp. 841–848.
[149] McAuley, J., and Leskovec, J. Image labeling on a network: using social-network metadata for image classification.
In European conference on computer vision (2012), Springer, pp. 828–841.
[150] McAuley, J. J., and Leskovec, J. Learning to discover social circles in ego networks. In NIPS (2012), vol. 2012, Citeseer,
pp. 548–56.
[151] McMahan, B., Moore, E., Ramage, D., Hampson, S., and y Arcas, B. A. Communication-efficient learning of deep
networks from decentralized data. In Artificial intelligence and statistics (2017), PMLR, pp. 1273–1282.
[152] Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., and Galstyan, A. A survey on bias and fairness in machine
learning. ACM Computing Surveys (CSUR) 54, 6 (2021), 1–35.
[153] Mehrabi, N., Naveed, M., Morstatter, F., and Galstyan, A. Exacerbating algorithmic bias through fairness attacks.
arXiv preprint arXiv:2012.08723 (2020).
[154] Miller, T. Explanation in artificial intelligence: Insights from the social sciences. Artificial intelligence 267 (2019),
1–38.
[155] Mittelstadt, B., Russell, C., and Wachter, S. Explaining explanations in ai. In Proceedings of the conference on
fairness, accountability, and transparency (2019), pp. 279–288.
[156] Morris, C., Kriege, N. M., Bause, F., Kersting, K., Mutzel, P., and Neumann, M. Tudataset: A collection of
benchmark datasets for learning with graphs. arXiv preprint arXiv:2007.08663 (2020).
[157] Murdoch, W. J., Singh, C., Kumbier, K., Abbasi-Asl, R., and Yu, B. Definitions, methods, and applications in
interpretable machine learning. Proceedings of the National Academy of Sciences 116, 44 (2019), 22071–22080.
[158] Nauta, M., Trienes, J., Pathak, S., Nguyen, E., Peters, M., Schmitt, Y., Schlötterer, J., van Keulen, M., and
Seifert, C. From anecdotal evidence to quantitative evaluation methods: A systematic review on evaluating
explainable ai. arXiv preprint arXiv:2201.08164 (2022).
[159] Niu, X., Li, B., Li, C., Xiao, R., Sun, H., Deng, H., and Chen, Z. A dual heterogeneous graph attention network to
improve long-tail performance for shop search in e-commerce. In Proceedings of the 26th ACM SIGKDD International
Conference on Knowledge Discovery & Data Mining (2020), pp. 3405–3415.
[160] Olatunji, I. E., Funke, T., and Khosla, M. Releasing graph neural networks with differential privacy guarantees.
arXiv preprint arXiv:2109.08907 (2021).
[161] Olatunji, I. E., Nejdl, W., and Khosla, M. Membership inference attack on graph neural networks. arXiv preprint
arXiv:2101.06570 (2021).
[162] Olteanu, A., Castillo, C., Diaz, F., and Kiciman, E. Social data: Biases, methodological pitfalls, and ethical
boundaries. Frontiers in Big Data 2 (2019), 13.
[163] Palowitch, J., and Perozzi, B. Monet: Debiasing graph embeddings via the metadata-orthogonal training unit.
arXiv preprint arXiv:1909.11793 (2019).
[164] Pan, S., Wu, J., Zhu, X., Zhang, C., and Wang, Y. Tri-party deep network representation. Network 11, 9 (2016), 12.
[165] Papernot, N., Abadi, M., Erlingsson, U., Goodfellow, I., and Talwar, K. Semi-supervised knowledge transfer for
deep learning from private training data. arXiv preprint arXiv:1610.05755 (2016).
[166] Pei, Y., Mao, R., Liu, Y., Chen, C., Xu, S., Qiang, F., and Tech, B. E. Decentralized federated graph neural networks.
In International Workshop on Federated and Transfer Learning for Data Sparsity and Confidentiality in Conjunction
with IJCAI (2021).
[167] Pope, P. E., Kolouri, S., Rostami, M., Martin, C. E., and Hoffmann, H. Explainability methods for graph convolu-
tional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2019),
pp. 10772–10781.
[168] Qiu, J., Chen, Q., Dong, Y., Zhang, J., Yang, H., Ding, M., Wang, K., and Tang, J. Gcc: Graph contrastive coding for
graph neural network pre-training. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge
Discovery & Data Mining (2020), pp. 1150–1160.
[169] Rahman, M. A., Rahman, T., Laganière, R., Mohammed, N., and Wang, Y. Membership inference attack against
differentially private deep learning model. Trans. Data Priv. 11, 1 (2018), 61–79.
[170] Rahman, T., Surma, B., Backes, M., and Zhang, Y. Fairwalk: Towards fair graph embedding.
[171] Rao, S. X., Zhang, S., Han, Z., Zhang, Z., Min, W., Chen, Z., Shan, Y., Zhao, Y., and Zhang, C. xfraud: Explainable
fraud transaction detection on heterogeneous graphs. arXiv preprint arXiv:2011.12193 (2020).
[172] Ribeiro, M. T., Singh, S., and Guestrin, C. "Why should I trust you?": Explaining the predictions of any classifier.
In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (2016),
pp. 1135–1144.
[173] Rigaki, M., and Garcia, S. A survey of privacy attacks in machine learning. arXiv preprint arXiv:2007.07646 (2020).
[174] Robnik-Šikonja, M., and Bohanec, M. Perturbation-based explanations of prediction models. In Human and
machine learning. Springer, 2018, pp. 159–175.
[175] Rong, Y., Bian, Y., Xu, T., Xie, W., Wei, Y., Huang, W., and Huang, J. Self-supervised graph transformer on large-scale
molecular data. arXiv preprint arXiv:2007.02835 (2020).
[176] Sajadmanesh, S., and Gatica-Perez, D. Locally private graph neural networks. arXiv preprint arXiv:2006.05535
(2020).
[177] Sankar, A., Liu, Y., Yu, J., and Shah, N. Graph neural networks for friend ranking in large-scale social platforms. In
Proceedings of the Web Conference 2021 (2021), pp. 2535–2546.
[178] Schlichtkrull, M., Kipf, T. N., Bloem, P., Berg, R. v. d., Titov, I., and Welling, M. Modeling relational data with
graph convolutional networks. In European semantic web conference (2018), Springer, pp. 593–607.
[179] Schlichtkrull, M. S., De Cao, N., and Titov, I. Interpreting graph neural networks for nlp with differentiable edge
masking. arXiv preprint arXiv:2010.00577 (2020).
[180] Schnake, T., Eberle, O., Lederer, J., Nakajima, S., Schütt, K. T., Müller, K.-R., and Montavon, G. Higher-order
explanations of graph neural networks via relevant walks. arXiv preprint arXiv:2006.03589 (2020).
[181] Schwarzenberg, R., Hübner, M., Harbecke, D., Alt, C., and Hennig, L. Layerwise relevance visualization in
convolutional text graph classifiers. arXiv preprint arXiv:1909.10911 (2019).
[182] Sen, P., Namata, G., Bilgic, M., Getoor, L., Galligher, B., and Eliassi-Rad, T. Collective classification in network
data. AI magazine 29, 3 (2008), 93–93.
[183] Sharma, S., Henderson, J., and Ghosh, J. Certifai: A common framework to provide explanations and analyse the
fairness and robustness of black-box models. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society
(2020), pp. 166–172.
[184] Shchur, O., and Günnemann, S. Overlapping community detection with graph neural networks. arXiv preprint
arXiv:1909.12201 (2019).
[185] Shchur, O., Mumme, M., Bojchevski, A., and Günnemann, S. Pitfalls of graph neural network evaluation. arXiv
preprint arXiv:1811.05868 (2018).
[186] Shen, Y., He, X., Han, Y., and Zhang, Y. Model stealing attacks against inductive graph neural networks. arXiv
preprint arXiv:2112.08331 (2021).
[187] Shokri, R., Stronati, M., Song, C., and Shmatikov, V. Membership inference attacks against machine learning
models. In 2017 IEEE Symposium on Security and Privacy (SP) (2017), IEEE, pp. 3–18.
[188] Shrestha, Y. R., and Yang, Y. Fairness in algorithmic decision-making: Applications in multi-winner voting, machine
learning, and recommender systems. Algorithms 12, 9 (2019), 199.
[189] Smuha, N. Ethics guidelines for trustworthy ai. In AI & Ethics (Brussels, Belgium, 2019).
[190] Snijders, T. A., Van de Bunt, G. G., and Steglich, C. E. Introduction to stochastic actor-based models for network
dynamics. Social networks 32, 1 (2010), 44–60.
[191] Solans, D., Biggio, B., and Castillo, C. Poisoning attacks on algorithmic fairness. arXiv preprint arXiv:2004.07401
(2020).
[192] Spinelli, I., Scardapane, S., Hussain, A., and Uncini, A. Biased edge dropout for enhancing fairness in graph
representation learning. arXiv preprint arXiv:2104.14210 (2021).
[193] Springenberg, J. T., Dosovitskiy, A., Brox, T., and Riedmiller, M. Striving for simplicity: The all convolutional
net. arXiv preprint arXiv:1412.6806 (2014).
[194] Stoica, A.-A., Riederer, C., and Chaintreau, A. Algorithmic glass ceiling in social networks: The effects of social
recommendations on network diversity. In Proceedings of the 2018 World Wide Web Conference (2018), pp. 923–932.
[195] Sun, L., Wang, J., Yu, P. S., and Li, B. Adversarial attack and defense on graph data: A survey. arXiv preprint
arXiv:1812.10528 (2018).
[196] Sun, Y., Liu, T., Hu, P., Liao, Q., Ji, S., Yu, N., Guo, D., and Liu, L. Deep intellectual property: A survey. arXiv preprint
arXiv:2304.14613 (2023).
[197] Sun, Y., Wang, S., Tang, X., Hsieh, T.-Y., and Honavar, V. Adversarial attacks on graph neural networks via
node injections: A hierarchical reinforcement learning approach. In Proceedings of the Web Conference 2020 (2020),
pp. 673–683.
[198] Sundararajan, M., Taly, A., and Yan, Q. Axiomatic attribution for deep networks. In International Conference on
Machine Learning (2017), PMLR, pp. 3319–3328.
[199] Suresh, H., and Guttag, J. V. A framework for understanding unintended consequences of machine learning.
[200] Tan, Q., Liu, N., and Hu, X. Deep representation learning for social network analysis. Frontiers in big Data 2 (2019), 2.
[201] Tang, H., Ma, G., Chen, Y., Guo, L., Wang, W., Zeng, B., and Zhan, L. Adversarial attack on hierarchical graph
pooling neural networks. arXiv preprint arXiv:2005.11560 (2020).
[202] Tang, X., Li, Y., Sun, Y., Yao, H., Mitra, P., and Wang, S. Transferring robustness for graph neural network
against poisoning attacks. In Proceedings of the 13th International Conference on Web Search and Data Mining (2020),
pp. 600–608.
[203] Tang, X., Yao, H., Sun, Y., Wang, Y., Tang, J., Aggarwal, C., Mitra, P., and Wang, S. Investigating and mitigating
degree-related biases in graph convolutional networks. In Proceedings of the 29th ACM International Conference on
Information & Knowledge Management (2020), pp. 1435–1444.
[204] Tao, S., Cao, Q., Shen, H., Huang, J., Wu, Y., and Cheng, X. Single node injection attack against graph neural
networks. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management (2021),
pp. 1794–1803.
[205] Van, M.-H., Du, W., Wu, X., and Lu, A. Poisoning attacks on fair machine learning. arXiv preprint arXiv:2110.08932
(2021).
[206] Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., and Bengio, Y. Graph attention networks. arXiv
preprint arXiv:1710.10903 (2017).
[207] Verma, S., Boonsanong, V., Hoang, M., Hines, K. E., Dickerson, J. P., and Shah, C. Counterfactual explanations
and algorithmic recourses for machine learning: A review. arXiv (2020).
[208] Vu, M. N., and Thai, M. T. Pgm-explainer: Probabilistic graphical model explanations for graph neural networks.
arXiv preprint arXiv:2010.05788 (2020).
[209] Waheed, A., Duddu, V., and Asokan, N. Grove: Ownership verification of graph neural networks using embeddings.
arXiv preprint arXiv:2304.08566 (2023).
[210] Wang, B., Guo, J., Li, A., Chen, Y., and Li, H. Privacy-preserving representation learning on graphs: A mutual
information perspective. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
(2021), pp. 1667–1676.
[211] Wang, B., Jia, J., Cao, X., and Gong, N. Z. Certified robustness of graph neural networks against adversarial structural
perturbation. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (2021),
pp. 1645–1653.
[212] Wang, B., Li, A., Li, H., and Chen, Y. Graphfl: A federated learning framework for semi-supervised node classification
on graphs. arXiv preprint arXiv:2012.04187 (2020).
[213] Wang, H., Zhang, F., Xie, X., and Guo, M. Dkn: Deep knowledge-aware network for news recommendation. In
Proceedings of the 2018 world wide web conference (2018), pp. 1835–1844.
[214] Wang, J., Luo, M., Li, J., Liu, Z., Zhou, J., and Zheng, Q. Robust unsupervised graph representation learning via
mutual information maximization. arXiv preprint arXiv:2201.08557 (2022).
[215] Wang, J., Luo, M., Suya, F., Li, J., Yang, Z., and Zheng, Q. Scalable attack on graph data by injecting vicious nodes.
Data Mining and Knowledge Discovery 34, 5 (2020), 1363–1389.
[216] Wang, J., Zhang, S., Xiao, Y., and Song, R. A review on graph neural network methods in financial applications.
arXiv preprint arXiv:2111.15367 (2021).
[217] Wang, L., Yu, W., Wang, W., Cheng, W., Zhang, W., Zha, H., He, X., and Chen, H. Learning robust representations
with graph denoising policy network. In 2019 IEEE International Conference on Data Mining (ICDM) (2019), IEEE,
pp. 1378–1383.
[218] Wang, N., Lin, L., Li, J., and Wang, H. Unbiased graph embedding with biased graph observations. arXiv preprint
arXiv:2110.13957 (2021).
[219] Wang, X., Ji, H., Shi, C., Wang, B., Ye, Y., Cui, P., and Yu, P. S. Heterogeneous graph attention network. In The
World Wide Web Conference (2019), pp. 2022–2032.
[220] Wang, X., Li, J., Yang, L., and Mi, H. Unsupervised learning for community detection in attributed networks based
on graph convolutional network. Neurocomputing 456 (2021), 147–155.
[221] Wang, X., Liu, X., and Hsieh, C.-J. Graphdefense: Towards robust graph convolutional networks. arXiv preprint
arXiv:1911.04429 (2019).
[222] Wang, X., Wu, Y., Zhang, A., He, X., and Chua, T.-S. Causal screening to interpret graph neural networks.
[223] Wang, X., Wu, Y., Zhang, A., He, X., and Chua, T.-S. Towards multi-grained explainability for graph neural networks.
Advances in Neural Information Processing Systems 34 (2021).
[224] Wang, Y., Zhao, Y., Dong, Y., Chen, H., Li, J., and Derr, T. Improving fairness in graph neural networks via
mitigating sensitive attribute leakage. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery
and Data Mining (2022), pp. 1938–1948.
[225] Wu, B., Li, J., Hou, C., Fu, G., Bian, Y., Chen, L., and Huang, J. Recent advances in reliable deep graph learning:
Adversarial attack, inherent noise, and distribution shift. arXiv preprint arXiv:2202.07114 (2022).
[226] Wu, B., Yang, X., Pan, S., and Yuan, X. Model extraction attacks on graph neural networks: Taxonomy and realization.
arXiv preprint arXiv:2010.12751 (2020).
[227] Wu, B., Yang, X., Pan, S., and Yuan, X. Adapting membership inference attacks to gnn for graph classification:
Approaches and implications. arXiv preprint arXiv:2110.08760 (2021).
[228] Wu, C., Wu, F., Cao, Y., Huang, Y., and Xie, X. Fedgnn: Federated graph neural network for privacy-preserving
recommendation. arXiv preprint arXiv:2102.04925 (2021).
[229] Wu, H., Wang, C., Tyshetskiy, Y., Docherty, A., Lu, K., and Zhu, L. Adversarial examples for graph data: deep
insights into attack and defense. In Proceedings of the 28th International Joint Conference on Artificial Intelligence
(2019), AAAI Press, pp. 4816–4823.
[230] Wu, J., Yang, Y., Qian, Y., Sui, Y., Wang, X., and He, X. Gif: A general graph unlearning strategy via influence
function. In Proceedings of the ACM Web Conference 2023 (2023), pp. 651–661.
[231] Wu, K., Shen, J., Ning, Y., Wang, T., and Wang, W. H. Certified edge unlearning for graph neural networks. In
Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2023), pp. 2606–2617.
[232] Wu, T., Ren, H., Li, P., and Leskovec, J. Graph information bottleneck. Advances in Neural Information Processing
Systems 33 (2020), 20437–20448.
[233] Wu, Y., Wang, X., Zhang, A., He, X., and Chua, T.-S. Discovering invariant rationales for graph neural networks.
In International Conference on Learning Representations (ICLR) (2022).
[234] Wu, Z., Ramsundar, B., Feinberg, E. N., Gomes, J., Geniesse, C., Pappu, A. S., Leswing, K., and Pande, V. Moleculenet:
a benchmark for molecular machine learning. Chemical science 9, 2 (2018), 513–530.
[235] Xi, Z., Pang, R., Ji, S., and Wang, T. Graph backdoor. In 30th USENIX Security Symposium (USENIX Security 21) (2021),
pp. 1523–1540.
[236] Xiao, T., Chen, Z., Wang, D., and Wang, S. Learning how to propagate messages in graph neural networks. In
Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (2021), pp. 1894–1903.
[237] Xie, H., Ma, J., Xiong, L., and Yang, C. Federated graph classification over non-iid graphs. Advances in Neural
Information Processing Systems 34 (2021).
[238] Xiong, W., Ni’mah, I., Huesca, J. M., van Ipenburg, W., Veldsink, J., and Pechenizkiy, M. Looking deeper into
deep learning model: Attribution-based explanations of textcnn. arXiv preprint arXiv:1811.03970 (2018).
[239] Xu, B., Shen, H., Sun, B., An, R., Cao, Q., and Cheng, X. Towards consumer loan fraud detection: Graph neural
networks with role-constrained conditional random field. In Proceedings of the AAAI Conference on Artificial Intelligence
(2021), vol. 35, pp. 4537–4545.
[240] Xu, D., Yuan, S., Wu, X., and Phan, H. Dpne: Differentially private network embedding. In Pacific-Asia Conference
on Knowledge Discovery and Data Mining (2018), Springer, pp. 235–246.
[241] Xu, J., Koffas, S., Ersoy, O., and Picek, S. Watermarking graph neural networks based on backdoor attacks. In 2023
IEEE 8th European Symposium on Security and Privacy (EuroS&P) (2023), IEEE, pp. 1179–1197.
[242] Xu, J., Wang, R., Liang, K., and Picek, S. More is better (mostly): On the backdoor attacks in federated graph neural
networks. arXiv preprint arXiv:2202.03195 (2022).
[243] Xu, K., Chen, H., Liu, S., Chen, P.-Y., Weng, T.-W., Hong, M., and Lin, X. Topology attack and defense for graph
neural networks: An optimization perspective. arXiv preprint arXiv:1906.04214 (2019).
[244] Yang, F., Fan, K., Song, D., and Lin, H. Graph-based prediction of protein-protein interactions with attributed signed
graph embedding. BMC bioinformatics 21, 1 (2020), 1–16.
[245] Yang, M., Lyu, L., Zhao, J., Zhu, T., and Lam, K.-Y. Local differential privacy and its applications: A comprehensive
survey. arXiv preprint arXiv:2008.03686 (2020).
[246] Yang, Q., Liu, Y., Chen, T., and Tong, Y. Federated machine learning: Concept and applications. ACM Transactions
on Intelligent Systems and Technology (TIST) 10, 2 (2019), 1–19.
[247] Yang, Y., and Song, L. Learn to explain efficiently via neural logic inductive learning. In International Conference on
Learning Representations (2019).
[248] Ying, R., Bourgeois, D., You, J., Zitnik, M., and Leskovec, J. Gnnexplainer: Generating explanations for graph
neural networks. Advances in neural information processing systems 32 (2019), 9240.
[249] Ying, R., He, R., Chen, K., Eksombatchai, P., Hamilton, W. L., and Leskovec, J. Graph convolutional neural
networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on
Knowledge Discovery & Data Mining (2018), pp. 974–983.
[250] You, Y., Chen, T., Sui, Y., Chen, T., Wang, Z., and Shen, Y. Graph contrastive learning with augmentations. Advances
in Neural Information Processing Systems 33 (2020), 5812–5823.
[251] Yuan, H., Tang, J., Hu, X., and Ji, S. Xgnn: Towards model-level explanations of graph neural networks. In Proceedings
of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2020), pp. 430–438.
[252] Yuan, H., Yu, H., Gui, S., and Ji, S. Explainability in graph neural networks: A taxonomic survey. arXiv preprint
arXiv:2012.15445 (2020).
[253] Yuan, H., Yu, H., Wang, J., Li, K., and Ji, S. On explainability of graph neural networks via subgraph explorations.
arXiv preprint arXiv:2102.05152 (2021).
[254] Zeng, Z., Islam, R., Keya, K. N., Foulds, J., Song, Y., and Pan, S. Fair representation learning for heterogeneous
information networks. arXiv preprint arXiv:2104.08769 (2021).
[255] Zhang, H., Wu, B., Yuan, X., Pan, S., Tong, H., and Pei, J. Trustworthy graph neural networks: Aspects, methods
and trends. arXiv preprint arXiv:2205.07424 (2022).
[256] Zhang, H., Zheng, T., Gao, J., Miao, C., Su, L., Li, Y., and Ren, K. Data poisoning attack against knowledge graph
embedding. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (2019), AAAI Press,
pp. 4853–4859.
[257] Zhang, M., Hu, L., Shi, C., and Wang, X. Adversarial label-flipping attack and defense for graph neural networks. In
2020 IEEE International Conference on Data Mining (ICDM) (2020), IEEE, pp. 791–800.
[258] Zhang, M., Wang, X., Zhu, M., Shi, C., Zhang, Z., and Zhou, J. Robust heterogeneous graph neural networks
against adversarial attacks.
[259] Zhang, Q.-s., and Zhu, S.-C. Visual interpretability for deep learning: a survey. Frontiers of Information Technology
& Electronic Engineering 19, 1 (2018), 27–39.
[260] Zhang, S., and Ni, W. Graph embedding matrix sharing with differential privacy. IEEE Access 7 (2019), 89390–89399.
[261] Zhang, S., Yin, H., Chen, T., Huang, Z., Cui, L., and Zhang, X. Graph embedding for recommendation against
attribute inference attacks. In Proceedings of the Web Conference 2021 (2021), pp. 3002–3014.
[262] Zhang, X., Liu, H., Li, Q., and Wu, X.-M. Attributed graph clustering via adaptive graph convolution. arXiv preprint
arXiv:1906.01210 (2019).
[263] Zhang, X., and Zitnik, M. Gnnguard: Defending graph neural networks against adversarial attacks. Advances in
Neural Information Processing Systems 33 (2020), 9263–9275.
[264] Zhang, Y., Defazio, D., and Ramesh, A. Relex: A model-agnostic relational model explainer. In Proceedings of the
2021 AAAI/ACM Conference on AI, Ethics, and Society (2021), pp. 1042–1049.
[265] Zhang, Z., Chen, M., Backes, M., Shen, Y., and Zhang, Y. Inference attacks against graph neural networks. arXiv
preprint arXiv:2110.02631 (2021).
[266] Zhang, Z., Jia, J., Wang, B., and Gong, N. Z. Backdoor attacks to graph neural networks. In Proceedings of the 26th
ACM Symposium on Access Control Models and Technologies (2021), pp. 15–26.
[267] Zhang, Z., Liu, Q., Huang, Z., Wang, H., Lu, C., Liu, C., and Chen, E. Graphmi: Extracting private graph data from
graph neural networks. arXiv preprint arXiv:2106.02820 (2021).
[268] Zhang, Z., Liu, Q., Wang, H., Lu, C., and Lee, C. Protgnn: Towards self-explaining graph neural networks (2021).
[269] Zhao, T., Dai, E., Shu, K., and Wang, S. You can still achieve fairness without sensitive attributes: Exploring biases
in non-sensitive features. arXiv preprint arXiv:2104.14537 (2021).
[270] Zhao, T., Zhang, X., and Wang, S. Graphsmote: Imbalanced node classification on graphs with graph neural
networks. In Proceedings of the 14th ACM international conference on web search and data mining (2021), pp. 833–841.
[271] Zhao, T., Zhang, X., and Wang, S. Exploring edge disentanglement for node classification. In The Web Conference
(2022).
[272] Zhao, X., Wu, H., and Zhang, X. Watermarking graph neural networks by random graphs. In 2021 9th International
Symposium on Digital Forensics and Security (ISDFS) (2021), IEEE, pp. 1–6.
[273] Zheng, L., Zhou, J., Chen, C., Wu, B., Wang, L., and Zhang, B. Asfgnn: Automated separated-federated graph
neural network. Peer-to-Peer Networking and Applications 14, 3 (2021), 1692–1704.
[274] Zhou, J., Chen, C., Zheng, L., Wu, H., Wu, J., Zheng, X., Wu, B., Liu, Z., and Wang, L. Vertically federated graph
neural network for privacy-preserving node classification. arXiv preprint arXiv:2005.11903 (2020).
[275] Zhou, J., Cui, G., Hu, S., Zhang, Z., Yang, C., Liu, Z., Wang, L., Li, C., and Sun, M. Graph neural networks: A review
of methods and applications. AI Open 1 (2020), 57–81.
[276] Zhu, D., Zhang, Z., Cui, P., and Zhu, W. Robust graph convolutional networks against adversarial attacks.
[277] Zhu, H., Fu, G., Guo, Z., Zhang, Z., Xiao, T., and Wang, S. Fairness-aware message passing for graph neural
networks. arXiv preprint arXiv:2306.11132 (2023).
[278] Zhu, Y., Xu, W., Zhang, J., Liu, Q., Wu, S., and Wang, L. Deep graph structure learning for robust representations: A
survey. arXiv preprint arXiv:2103.03036 (2021).
[279] Zou, X., Zheng, Q., Dong, Y., Guan, X., Kharlamov, E., Lu, J., and Tang, J. Tdgia: Effective injection attacks on
graph neural networks. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
(2021), pp. 2461–2471.
[280] Zügner, D., Akbarnejad, A., and Günnemann, S. Adversarial attacks on neural networks for graph data. In
Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2018), ACM,
pp. 2847–2856.
[281] Zügner, D., and Günnemann, S. Adversarial attacks on graph neural networks via meta learning. arXiv preprint
arXiv:1902.08412 (2019).
[282] Zügner, D., and Günnemann, S. Certifiable robustness and robust training for graph convolutional networks.
In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2019),
pp. 246–256.
[283] Zügner, D., and Günnemann, S. Certifiable robustness of graph convolutional networks under structure perturba-
tions. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining (2020),
pp. 1656–1665.