Chapter 8
Graph Neural Networks: Adversarial Robustness
Stephan Günnemann
Department of Informatics, Technical University of Munich, e-mail: guennemann@in.tum.de
Abstract Graph neural networks have achieved impressive results in various graph
learning tasks and they have found their way into many applications such as molec-
ular property prediction, cancer classification, fraud detection, or knowledge graph
reasoning. With the increasing number of GNN models deployed in scientific ap-
plications, safety-critical environments, or decision-making contexts involving hu-
mans, it is crucial to ensure their reliability. In this chapter, we provide an overview
of the current research on adversarial robustness of GNNs. We introduce the unique
challenges and opportunities that come along with the graph setting and give an
overview of works showing the limitations of classic GNNs via adversarial example
generation. Building upon these insights we introduce and categorize methods that
provide provable robustness guarantees for graph neural networks as well as prin-
ciples for improving robustness of GNNs. We conclude with a discussion of proper
evaluation practices taking robustness into account.
8.1 Motivation
The success story of graph neural networks is astonishing. Within a few years, they
have become a core component of many deep learning applications. Nowadays they
are used in scientific applications such as drug design or medical diagnoses, are
integrated in human-centered applications like fake news detection in social media,
get applied in decision-making tasks, and even are studied in safety-critical environ-
ments like autonomous driving. What unites these domains is their crucial need for
reliable results; misleading predictions are not only unfortunate but indeed might
lead to dramatic consequences – from false conclusions drawn in science to harm
for people. However, can we really trust the predictions resulting from graph neural
networks? What happens when the underlying data is corrupted or even becomes
deliberately manipulated?
Indeed, the vulnerability of classic machine learning models to (deliberate) per-
turbations of the data is well known (Goodfellow et al, 2015): even only slight
changes of the input can lead to wrong predictions. Such instances, for humans
nearly indistinguishable from the original input yet wrongly classified, are also
known as adversarial examples. One of the most well-known and alarming exam-
ples is an image of a stop sign, which is classified as a speed limit sign by a neural
network with only very subtle changes to the input; though, for us as humans it still
clearly looks like a stop sign (Eykholt et al, 2018). Examples like these illustrate
how machine learning models can dramatically fail in the presence of adversarial
perturbations. Consequently, adopting machine learning for safety-critical or sci-
entific application domains is still problematic. To address this shortcoming, many
researchers have started to analyze the robustness of models in domains like images,
natural language, or speech. Only recently, however, GNNs have come into focus.
Here, the first work studying GNNs’ robustness (Zügner et al, 2018) investigated one of the most prominent tasks, node-level classification, and demonstrated the susceptibility of GNNs to adversarial perturbations as well (see Figure 8.1). Since
then, the field of adversarial robustness on graphs has been rapidly expanding, with
many works studying diverse tasks and models, and exploring ways to make GNNs
more robust.
To some degree it is surprising that graphs were not in the focus even earlier.
Corrupted data and adversaries are common in many domains where graphs are
analyzed, e.g., social media and e-commerce systems. Take for example a GNN-
based model for detecting fake news in a social network (Monti et al, 2019; Shu et al, 2020). Adversaries have a strong incentive to fool the system in order to avoid being detected. Similarly, in credit scoring systems, fraudsters try to disguise themselves
by creating fake connections. Thus, robustness is an important concern for graph-
based learning.
It is important to highlight, though, that adversarial robustness is not only a topic
in light of security concerns, where intentional changes, potentially crafted by hu-
mans, are used to try to fool the predictions. Instead, adversarial robustness con-
siders worst-case scenarios in general. Especially in safety-critical or scientific ap-
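In general, adversarial attacks on GNNs can be phrased as an optimization problem over a set of admissible perturbed graphs. The bilevel (poisoning) attack objective that the following discussion refers to as Eq. 8.1 can be sketched as follows (a reconstruction based on the surrounding description; the chapter's exact notation may differ):

\[
\max_{\hat{G} \in \Phi(G)} \; \mathcal{O}_{\mathrm{atk}}\big(f_{\theta^*}(\hat{G})\big)
\qquad \text{s.t.} \qquad
\theta^* = \arg\min_{\theta} \; \mathcal{L}_{\mathrm{train}}\big(f_{\theta}(\hat{G})\big)
\]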
Here Φ(G) denotes the set of all graphs we are treating as indistinguishable from the given graph G at hand, and Ĝ denotes a specific perturbed graph from this set. For example, Φ(G) could capture all graphs which differ from G in at most ten edges or in a few node attributes. The attacker's goal is to find a graph Ĝ that, when passed through the GNN f_θ*, maximizes a specific objective O_atk, e.g., increasing the predicted probability of a certain class for a specific node. Importantly, in a poisoning setting, the weights θ* of the GNN are not fixed but learned based on the perturbed data, leading to the inner optimization problem that corresponds to the usual training procedure on the (now perturbed) graph. That is, θ* is obtained by minimizing some training loss L_train on the perturbed graph Ĝ. This nested optimization makes the problem particularly hard.

1 Again it is worth highlighting that such ‘attacks’ are not always due to human adversaries. Thus, the terms ‘change’ or ‘perturbation’ might be better suited and have a more neutral connotation.
To define an evasion attack, the above equation can simply be changed by assuming the parameter θ* to be fixed. Often it is assumed to be given by minimizing the training loss w.r.t. the given graph G (i.e., θ* = argmin_θ L_train(f_θ(G))). This makes the above scenario a single-level optimization problem.
This general form of an attack enables us to provide a categorization along dif-
ferent aspects and illustrates the space to explore for robustness characteristics of
GNNs in general. While this taxonomy is general, for ease of understanding, it helps
to think about an intentional attacker.
What changes are allowed to the original graph? What do we expect the perturba-
tions to look like? For example, do we want to understand how deleting a few edges
influences the prediction? The space of perturbations under consideration is modeled via Φ(G). It intuitively represents the attacker's capabilities; what and how
much they are able to manipulate. The complexity of the perturbation space for
graphs represents one of the biggest differences to classical robustness studies and
stretches along two dimensions.
(1) What can be changed? Unique to the graph domain are perturbations of the
graph structure. In this regard, most publications have studied the scenarios of re-
moving or adding edges to the graph (Dai et al, 2018a; Wang and Gong, 2019; Zügner et al, 2018; Zügner and Günnemann, 2019; Bojchevski and Günnemann, 2019; Zhang et al, 2019e; Tang et al, 2020b; Chen et al, 2020f;
Chang et al, 2020b; Ma et al, 2020b; Geisler et al, 2021). Focusing on the node level,
some works (Wang et al, 2020c; Sun et al, 2020d; Geisler et al, 2021) have consid-
ered adding or removing nodes from the graph. Beyond the graph structure, GNN
robustness has also been explored for changes to the node attributes (Zügner et al,
2018; Wu et al, 2019b; Takahashi, 2019) and the labels used in semi-supervised
node classification (Zhang et al, 2020b).
An intriguing aspect of graphs is to investigate how the interdependence of in-
stances plays a role in robustness. Due to the message passing scheme, changes to
one node might affect (potentially many) other nodes. Often, for example, a node’s
prediction depends on its k-hop neighborhood, intuitively representing the node’s
receptive field. Thus, it is not only important what type of change can be performed
but also where in the graph this can happen. Consider for example Figure 8.1: to
analyze whether the prediction for the highlighted node can change, we are not lim-
ited to perturbing the node’s own attributes and its incident edges but we can also
achieve our aim by perturbing other nodes. Indeed, this reflects real world scenarios
much better since it is likely that an attacker has access to a few nodes only, and
not to the entire data or the target node itself. Put simply, we also have to consider
which nodes can be perturbed. Multiple works (Zügner et al, 2018; Zhang et al,
2019e; Takahashi, 2019) investigate what they call indirect attacks (or sometimes
influencer attacks), specifically analyzing how an individual node’s prediction can
change when only perturbing other parts of the graph while leaving the target node
untouched.
(2) How much can be changed? Typically, adversarial examples are designed to
be nearly indistinguishable to the original input, e.g., changing the pixel values of an
image so that it stays visually the same. Unlike image data, where this can easily be
verified by manual inspection, this is much more challenging in the graph setting.
Technically, the set of perturbations can be defined based on any graph distance function D measuring the (dis)similarity between graphs. All graphs similar to the given graph G then define the set Φ(G) = {Ĝ ∈ 𝔾 | D(G, Ĝ) ≤ Δ}, where 𝔾 denotes the space of all potential graphs and Δ the largest acceptable distance.
Defining which graph distance functions are suitable is in itself a challenging task. Beyond that, computing these distances and using them within the optimization problem of Eq. 8.1 might be computationally intractable (think, e.g., about the graph edit distance, which itself is NP-hard to compute). Therefore, existing works have mainly focused on so-called budget constraints, limiting the number of changes allowed to be performed. Technically, such budgets correspond to the L_0 pseudo-norm between the clean and perturbed data, e.g., relating to the graphs' adjacency matrix A or its node attributes X.3 To enable more fine-grained control, often such budget constraints are imposed locally per node (e.g., limiting the maximal number of edge deletions per node; Δ_loc) as well as globally (e.g., limiting the overall number of edge deletions; Δ_glob). For example, both types of budget can be combined, as sketched below for the case of structure perturbations.
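A possible form of such a combined constraint (written here as a reconstruction; the constraint the chapter refers to as Eq. 8.2 may differ in its exact form) is:

\[
\Phi(G) = \Big\{\, \hat{G} \;\Big|\; \|\hat{A} - A\|_0 \le \Delta_{\mathrm{glob}}
\;\;\wedge\;\; \|\hat{A}_{v,:} - A_{v,:}\|_0 \le \Delta_{\mathrm{loc}} \;\; \forall v \,\Big\}
\]

where Â denotes the adjacency matrix of the perturbed graph and A_{v,:} the row of A corresponding to node v.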
3 This is a similar approach to image data, where often we take a certain radius as measured by, e.g., an L_p norm around the original input as the allowed perturbation set, assuming that for small radii the semantic meaning of the input does not change.
the ground-truth labels of the target node(s) could additionally be hidden from the
attacker. The knowledge about the model includes many aspects such as knowledge
about the used GNN architecture, the model’s weights, or whether only the output
predictions or the gradients are known. Given all these variations, the most common
ones are white-box settings, where full information is available, and black-box set-
tings, which usually mean that only the graph and potentially the predicted outputs
are available.
Among the three aspects above, the attacker’s knowledge seems to be the one
which most strongly links to human-like adversaries. It should be highlighted,
though, that worst-case perturbations in general are best reflected by the fully white-
box setting, making it the preferred choice for strong robustness results. If a model
performs robustly in a white-box setting, it will also be robust under the limited
scenarios. Moreover, as we will see in Section 8.2.2.1, the transferability of attacks
implies that knowledge about the model is not really required.
Besides the above categorization that focuses on the properties of the attack, an-
other, more technical, view can be taken by considering the algorithmic approach
how the (bi-level) optimization problem is solved. In the discussion of the pertur-
bation space we have seen that graph perturbations often relate to the addition/removal of edges or nodes — these are discrete decisions, making Eq. 8.1 a discrete optimization problem. This is in stark contrast to other data domains where infinitesimal changes are possible. Thus, besides adapting gradient-based approximations, various other techniques can be used to tackle Eq. 8.1 for GNNs, such as reinforcement learning (Sun et al, 2020d; Dai et al, 2018a) or spectral approximations (Bojchevski and Günnemann, 2019; Chang et al, 2020b). Moreover, the attacker's knowledge also has implications for the algorithmic choice. In a black-box setting where, e.g., only the input and output are observed, we cannot use the true GNN f_θ to compute gradients but have to use other principles like first learning some surrogate model.
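To make the gradient-based idea concrete, the following is a minimal sketch of a one-shot gradient-based structure attack in PyTorch. It assumes a dense float adjacency matrix and a model with the (hypothetical) signature model(features, adj) returning per-node logits; real attacks such as Nettack or meta-gradient approaches are considerably more elaborate.

```python
import torch
import torch.nn.functional as F


def gradient_edge_attack(model, adj, features, labels, target_idx, budget):
    """One-shot gradient-based structure attack (illustrative sketch only).

    Treats the dense adjacency matrix as continuous, computes the gradient of
    the attack loss w.r.t. its entries, and flips the `budget` edge entries
    with the highest direction-aware scores.
    """
    adj = adj.clone().requires_grad_(True)
    logits = model(features, adj)  # assumed signature: (features, adj) -> per-node logits
    loss = F.cross_entropy(logits[target_idx].unsqueeze(0),
                           labels[target_idx].unsqueeze(0))
    grad = torch.autograd.grad(loss, adj)[0]

    # Adding an absent edge increases the loss if the gradient is positive;
    # removing an existing edge increases it if the gradient is negative.
    score = grad * (1 - 2 * adj.detach())
    score = torch.triu(score, diagonal=1)  # undirected graph: upper triangle only

    n = adj.shape[0]
    top = torch.topk(score.flatten(), budget).indices
    perturbed = adj.detach().clone()
    for idx in top:
        i, j = divmod(int(idx), n)
        perturbed[i, j] = 1 - perturbed[i, j]  # flip the selected edge
        perturbed[j, i] = perturbed[i, j]
    return perturbed
```

Stronger attacks recompute gradients after every flip, or operate on a separately trained surrogate model when gradients of the true GNN are unavailable.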
The above categorization shows that various kinds of adversarial perturbations under
different scenarios can be investigated. Summarizing the different results obtained
in the literature so far, the trend is clear: standard GNNs trained in the standard way
are not robust. In the following, we give an overview of some key insights.
Figure 8.2 illustrates one of the results of the method Nettack as introduced in
(Zügner et al, 2018). Here, local attacks in an evasion setting focusing on graph
structure perturbations are analyzed for a GCN (Kipf and Welling, 2017b). The
figure shows the classification margin, i.e., the difference between the predicted
Fig. 8.2: Classification margins obtained with a GCN model and the Cora ML data under the Nettack attack (Zügner et al, 2018). If a node is below the dashed line, it is misclassified w.r.t. the ground truth label. As shown, almost any node's prediction can be changed. (Columns: original graph; Nettack and indirect Nettack (Nettack-In.) with budget Δ = ⌊d/2⌋; Nettack and Nettack-In. with budget Δ = d. y-axis: classification margin.)
probability of the node’s true class minus the one of the second highest class. The
left column shows the results for the unperturbed graph where most nodes are cor-
rectly classified as illustrated by the predominantly positive classification margin.
The second column shows the result after perturbing the graph based on the perturbation found by Nettack using a global budget of Δ = ⌊d_v/2⌋ (where d_v is the degree of the node v under attack) and making sure that no singletons occur. Clearly, the
GCN model is not robust: almost every node’s prediction can be changed. Moreover,
the third column shows the impact of indirect attacks. Recall that in these scenarios
the performed perturbations cannot happen at the node we aim to misclassify. Even
in this setting, a large fraction of nodes is vulnerable. The last two columns show
results for an increased budget of Δ = d_v. Not surprisingly, the impact of the attack
becomes even more pronounced.
Considering global attacks in the poisoning setting, similar behavior can be ob-
served. For example, when studying the effect of node additions, the work (Sun et al,
2020d) reports a relative drop in accuracy by up to 7 percentage points with a bud-
get of 1% of additional nodes, without changing the connectivity between existing
nodes. For changes to the edge structure, the work (Zügner and Günnemann, 2019)
reports performance drops on the test sets by around 6 to 16 percentage points when
perturbing 5% of the edges. Notably, on one dataset, these perturbations lead to
a GNN obtaining worse performance than a logistic regression baseline operating
only on the node attributes, i.e., ignoring the graph altogether becomes the better
choice.
The following observation from (Zügner and Günnemann, 2019) is important
to highlight: One core factor for the obtained lower performance on the perturbed
graphs are indeed the learned GNN weights. When using the weights θ_Ĝ trained on the perturbed graph Ĝ obtained by the poisoning attack, not only is the performance on Ĝ low, but even the performance on the unperturbed graph G suffers dramatically. Likewise, when applying the weights θ_G trained on the unperturbed graph G to the graph Ĝ, the classification accuracy barely changes. Thus, the poisoning attack
performed in (Zügner and Günnemann, 2019) indeed derails the training procedure,
i.e., leads to ‘bad’ weights. This result emphasizes the importance of the training
procedure for the performance of graph models. If we are able to find appropriate
weights, even perturbed data might be handled more robustly. We will encounter
this aspect again in Section 8.4.2.
Figure 8.3 compares the distribution of such a property (e.g. node degree) when
considering all nodes of the unperturbed graph with the distribution of the prop-
erty when considering only the nodes incident to the inserted/removed adversarial
edges. The comparison indicates a statistically significant difference between the
distributions. For example, in Figure 8.3 (left) we can see that the Nettack method
tends to connect a target node to low-degree nodes. This could be due to the degree-
normalization performed in GCN, where low-degree nodes have a higher weight
(i.e., influence) on the aggregation of a node. Likewise, considering nodes incident
to edges removed by the adversary we can observe that the Nettack method tends
to disconnect high-degree nodes from the target node. In Figure 8.3 (second and
third plot) we can see that the attack tends to connect the target node with peripheral
nodes, as evidenced by small two-hop neighborhood size and low closeness cen-
trality of the adversarially connected nodes. In Figure 8.3 (right) we can see that
the adversary tends to connect a target node to other nodes which have dissimilar
attributes. As also shown in other works, the adversary appears to try to counter the
homophily property in the graph – which is not surprising, since the GNN has likely
learned to partly infer a node’s class based on its neighbors.
To understand whether such detected patterns are universal, they can be used to design attack principles themselves — indeed, this even leads to black-box attacks since the analyzed properties usually relate to the graph only and not the GNN. In (Zügner et al, 2020) a prediction model was learned estimating the potential impact of a perturbation on unseen graphs using the above-mentioned properties as input features. While this often resulted in finding effective adversarial perturbations, thus highlighting the generality of the regularities uncovered, the attack performance was not on par with the original Nettack attack. Similarly, in (Ma et al, 2020b)
PageRank-like scores have been used to identify potential harmful perturbations.
The aspects along which adversarial attacks on graphs can be studied allow for a
huge variety of scenarios. Only a few of them have been thoroughly investigated
in the literature. One important aspect to consider, for example, is that in real ap-
plications the cost of perturbations differ: while changing node attributes might be
relatively easy, injecting edges might be harder. Thus, designing improved pertur-
bation spaces can make the attack scenarios more realistic and better captures the
robustness properties one might want to ensure. Moreover, many different data do-
mains such as knowledge graphs or temporal graphs need to be investigated.
Importantly, while first steps have been made to understand the patterns that make these perturbations harmful, a clear understanding with a sound theoretical
backing is still missing. In this regard, it is also worth repeating that all these studies
have focused on analyzing perturbations obtained by Nettack; other attacks might
potentially lead to very different patterns. This also implies that exploiting the re-
sulting patterns to design more robust GNNs (see Section 8.4.1) is not necessarily
a good solution. Moreover, finding reliable patterns also requires more research on
how to compute adversarial perturbations in a scalable way (Wang and Gong, 2019;
Geisler et al, 2021), since such patterns might be more pronounced on larger graphs.
Model-specific certificates are designed for a specific class of GNN models (e.g., 2-
layer GCNs) and a specific task such as node-level classification. A common theme
is to phrase certification as a constrained optimization problem: Recall that in a classification task, the final prediction is usually obtained by taking the class with the largest predicted probability or logit. Let c* = argmax_{c∈C} f_θ(G)_c denote the predicted class4 obtained on the unperturbed graph G, where C is the set of classes and f_θ(G)_c denotes the logit obtained for class c. This specifically implies that the margin f_θ(G)_{c*} − f_θ(G)_c between class c* and any other class c is positive.
A particularly useful quantity for robustness certification is the worst-case margin, i.e., the smallest margin possible under any perturbed data Ĝ.
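A sketch of this quantity, reconstructed from the definitions above (the chapter refers to it as Eq. 8.3; the exact notation may differ), is:

\[
\hat{m}(c^*, c) \;=\; \min_{\hat{G} \in \Phi(G)} \; f_{\theta}(\hat{G})_{c^*} - f_{\theta}(\hat{G})_{c}
\]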
4 This could either be the predicted class for a specific target node v in case of node-level classi-
fication; or for the entire graph in case of graph-level classification. We drop the dependency on v
since it is not relevant for the discussion. For simplicity, we assume the maximizer c* to be unique.
Fig. 8.4: Obtaining robustness certificates via the worst-case margin: The predic-
tion obtained from the unperturbed graph Gi is illustrated with a cross, while the
predictions for the perturbed graphs Φ(Gi) are illustrated around it. The worst-case
margin measures the shortest distance to the decision boundary. If it is positive (see
G1 ), all predictions are on the same side of the boundary; robustness holds. If it is
negative (see G2 ), some predictions cross the decision boundary; the class prediction
will change under perturbations, meaning the model is not robust. When using lower
bounds — the shaded regions in the figure — robustness is ensured for positive val-
ues (see G1 ) since the exact worst-case margin can only be larger. If the lower bound
becomes negative, no statement can be made (see G2 and G3 ; robustness unknown).
Both G2 and G3 have a negative lower bound, while the (not tractable to compute)
exact worst-case margin differs in sign.
If this term is positive, c can never be the predicted class for node v. And if the worst-case margin m̂(c*, c) stays positive for all c ≠ c*, the prediction is certifiably robust since the logit for class c* is always the largest – for all perturbed graphs in Φ(G). This idea is illustrated in Figure 8.4.
As shown, obtaining a certificate means solving the (constrained) optimization problem in Eq. 8.3 for every class c. Not surprisingly, however, solving this optimization problem is usually intractable – for similar reasons as computing adversarial attacks is hard. So how can we obtain certificates? Just heuristically solving Eq. 8.3 is not helpful since we aim for guarantees.
The core idea is to obtain tractable lower bounds on the worst-case margin. That is, we aim to find functions m̂_LB that ensure m̂_LB(c*, c) ≤ m̂(c*, c) and are more efficient to compute. One solution is to consider relaxations of the original constrained minimization problem, replacing, for example, the model's nonlinearities and hard discreteness constraints by their convex relaxations. For example, instead of requiring that an edge is either perturbed or not, indicated by variables e ∈ {0, 1}, we can use e ∈ [0, 1]. Intuitively, using such relaxations leads to supersets of the actually reachable predictions, as visualized in Figure 8.4 with the shaded regions.
Overall, if the lower bound m̂_LB stays positive, the robustness certificate holds —
since m̂ is positive by transitivity as well. This is shown in Figure 8.4 for graph G1 .
If m̂_LB is negative, no statement can be made since it is only a lower bound of the
original worst-case margin m̂, which thus can be positive or negative. Compare the
two graphs G2 and G3 in Figure 8.4: While both have a negative lower bound (i.e.,
both shaded regions cross the decision boundary), their actual worst-case margins m̂
differ. Only for graph G2 the actually reachable predictions (which are not efficiently
computable) also cross the decision boundary. Thus, if the lower bound is negative,
the actual robustness remains unknown – similar to an unsuccessful attack, where
it remains unclear whether the model is actually non-robust or the attack simply
not strong enough. Therefore, besides being efficient to compute, the function m̂_LB
should be as close as possible to m̂ to avoid cases where no answer can be given
despite the model being robust.
The above idea, using convex relaxations of the model’s nonlinearities and the
admissible perturbations, is used in the works (Zügner and Günnemann, 2019;
Zügner and Günnemann, 2020) for the class of GCNs and node-level classification.
In (Zügner and Günnemann, 2019), the authors consider perturbations to the node
attributes and obtain lower bounds via a relaxation to a linear program. The work
(Zügner and Günnemann, 2020) considers perturbations in the form of edge dele-
tions and reduces the problem to a jointly constrained bilinear program. Similarly,
also using convex relaxations, Jin et al (2020a) has proposed certificates for graph-
level classification under edge perturbations using GCNs. Beyond GCNs, model-
specific certificates for edge perturbations have also been devised for the class of
GNNs using PageRank diffusion (Bojchevski and Günnemann, 2019), which in-
cludes label/feature propagation and (A)PPNP (Klicpera et al, 2019a). The core idea
of (Bojchevski and Günnemann, 2019) is to treat the problem as a PageRank opti-
mization task which subsequently can be expressed as a Markov decision process.
Using this connection one can indeed show that in scenarios where only local budgets are used (see Section 8.2; Eq. 8.2) the derived certificates are exact, i.e., not merely a lower bound, while we can still compute them in polynomial time w.r.t. the
graph size. In general, all models above consider local and global budget constraints
on the number of changes.
Besides providing certificates, being able to efficiently compute (a differentiable lower bound on) the worst-case margin as in Eq. 8.3 also makes it possible to improve GNN robustness by incorporating the margin during training, i.e., aiming to make it positive for all nodes. We will discuss this in detail in Section 8.4.2.
Overall, a strong advantage of model-specific certificates is their explicit consid-
eration of the GNN model structure within the margin computation. However, the
white-box nature of these certificates is simultaneously their limitation: The pro-
posed certificates capture only a subset of the existing GNN models and any GNN
yet to be developed likely requires a new certification technique as well. This limi-
tation is tackled by model-agnostic certificates.
In other words, g(G) returns the most likely class obtained by first randomly perturbing the graph G using the randomization scheme τ and then classifying the resulting graphs τ(G) with the base classifier f.
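A Monte Carlo estimate of such a smoothed classifier could be sketched as follows. The sparsity-aware randomization (adding absent edges with a small probability and deleting existing edges with a larger one) follows the spirit of (Bojchevski et al, 2020a), but the concrete distribution, the hyper-parameters, and the model signature model(features, adj) are assumptions.

```python
import torch
import torch.nn.functional as F


def smoothed_predict(base_model, adj, features, num_classes,
                     num_samples=1000, p_add=0.001, p_del=0.4):
    """Monte Carlo estimate of the smoothed classifier g (illustrative sketch).

    The randomization tau independently adds absent edges with probability
    p_add and deletes existing edges with probability p_del; a majority vote
    over the sampled predictions approximates g.
    """
    n = adj.shape[0]
    votes = torch.zeros(n, num_classes)
    for _ in range(num_samples):
        add_mask = (torch.rand(n, n) < p_add).float()
        del_mask = (torch.rand(n, n) < p_del).float()
        noisy = adj * (1 - del_mask) + (1 - adj) * add_mask
        noisy = torch.triu(noisy, diagonal=1)
        noisy = noisy + noisy.t()  # keep the sampled graph undirected
        with torch.no_grad():
            preds = base_model(features, noisy).argmax(dim=-1)
        votes += F.one_hot(preds, num_classes).float()
    return votes.argmax(dim=-1), votes / num_samples
```

Up to sampling error, the per-class frequencies votes / num_samples are exactly the probabilities Pr(f(τ(G)) = c) that the certification argument below reasons about.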
As in Section 8.3.1, the goal is to assess whether the prediction does not change under perturbations: denoting with c* = g(G) the class predicted by the smoothed classifier on G, we want g(Ĝ) = c* for all Ĝ ∈ Φ(G). Considering for simplicity the case of binary classification, this is equivalent to ensuring that Pr(f(τ(Ĝ)) = c*) > 0.5 for all Ĝ ∈ Φ(G); or, in short: min_{Ĝ∈Φ(G)} Pr(f(τ(Ĝ)) = c*) > 0.5.
Since, unsurprisingly, the term is intractable to compute, we refer again to a lower
bound to obtain the certificate:
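(A sketch of the bound the text denotes Eq. 8.5, reconstructed from the description that follows; the exact form may differ.)

\[
\min_{\hat{G} \in \Phi(G)} \; \min_{h \in H_f} \; \Pr\!\big(h(\tau(\hat{G})) = c^*\big)
\;\;\le\;\;
\min_{\hat{G} \in \Phi(G)} \; \Pr\!\big(f(\tau(\hat{G})) = c^*\big)
\]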
Here, H_f is the set of all classifiers sharing some properties with f, e.g., often that the smoothed classifier based on h and f would return the same probability for G, i.e., Pr(h(τ(G)) = c*) = Pr(f(τ(G)) = c*). Since f ∈ H_f, the inequality holds
trivially. Accordingly, if the left-hand side of Eq. 8.5 is larger than 0.5, the right-hand side is also guaranteed to be so, implying that G would be certifiably robust.
What does Eq. 8.5 intuitively mean? It aims to find a base classifier h which minimizes the probability that the perturbed sample Ĝ is assigned to class c*. Thus, h represents a kind of worst-case base classifier which, when used within the smoothed classifier, tries to obtain a different prediction for Ĝ. If even this worst-case base classifier leads to certifiable robustness (left-hand side of Eq. 8.5 larger than 0.5), then surely the actual base classifier at hand does as well.
The most important part to make this all useful, however, is the following: given a set of classifiers H_f, finding the worst-case classifier h and minimizing over the perturbation model Φ(G) is often tractable. In some cases, the optima can even be calculated in closed form. This shows an interesting relation to the previous section: there, the intractable minimization over Φ(G) in Eq. 8.3 was replaced by some tractable lower bound, e.g., via relaxations. Now, by finding a worst-case classifier h we not only obtain a lower bound, but the minimization over Φ(G) often also becomes immediately tractable. Note, however, that in Section 8.3.1 we obtain a certificate for the base classifier f, while here we obtain a certificate for the smoothed classifier g.
As said, given a set of classifiers H_f, finding the worst-case classifier h and minimizing over the perturbation model Φ(G) is often tractable. The main computational challenge in practice lies in determining H_f. Let's consider our previous example where we enforced all classifiers h to ensure Pr(h(τ(G)) = c*) = Pr(f(τ(G)) = c*). To determine H_f, one needs to compute Pr(f(τ(G)) = c*).
Clearly, doing this exactly is again usually intractable. Instead, the probability can
be estimated using sampling. To ensure a tight approximation, the base classifier has
to be fed a large number of samples from the smoothing distribution. This becomes
increasingly expensive as the size and complexity of the GNN model increases.
Furthermore, the resulting estimates only hold with a certain probability. Accordingly, the derived guarantees also hold only with this probability, i.e., one obtains only probabilistic robustness certificates. Despite these practical limitations, randomized
smoothing has become widely popular, as it is often still more efficient than model-
specific certificates.
This general idea of model-agnostic certificates has been investigated for discrete
data in (Lee et al, 2019a; Dvijotham et al, 2020; Bojchevski et al, 2020a; Jia et al,
2020), with the latter two focusing also on graph-related tasks. In (Jia et al, 2020),
the authors investigate the robustness of community detection. In (Bojchevski et al,
2020a), the main focus is on node-level and graph-level classification w.r.t. graph
structure and/or attribute perturbations under global budget constraints. Specifically,
Bojchevski et al (2020a) overcomes critical limitations of the other approaches in
two regards: it explicitly accounts for sparsity in the data as present in many graphs,
As we have established, standard GNNs trained in the usual way are not robust to
even small changes to the graph, thus, using them in sensitive and critical applica-
tions might be risky. Certificates can provide us guarantees about their performance.
Interestingly, even before the rise of graph neural networks, such joint approaches
have been investigated, e.g., in (Bojchevski et al, 2017) to improve the robustness of
spectral embeddings. For GNNs, such graph structure learning has been proposed in
(Jin et al, 2020e; Luo et al, 2021), where certain properties like low-rank graph structure and attribute similarity are used to define what the clean graph should preferably look like.
As discussed in Section 8.2.2, one further reason for the non-robustness of GNNs
are the parameters/weights learned during training. Weights resulting from standard
training often lead to models that do not generalize well to slightly perturbed data.
This is illustrated in Figure 8.5 with the orange/solid decision boundary. Note that
the figure shows the input space, i.e., the space of all graphs G; this is in contrast to
Figure 8.4 which shows the predicted probabilities. If we were able to improve our
training procedure to find ‘better’ parameters – taking into account that the data is or
might become potentially perturbed – the robustness of our model would improve
as well. This is illustrated in Figure 8.5 with the blue/dashed decision boundary. There, all perturbed graphs from Φ_1(G) get the same prediction. As seen, in this regard robustness links to the generalization performance of prediction models in
general.
Robust training refers to training procedures that aim at producing models that are
robust to adversarial (and/or other) perturbations. The common theme is to optimize
a worst-case loss (also called robust loss), i.e. the loss achieved under the worst-case
perturbation. Technically, the training objective becomes:
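(The following min-max objective is a reconstruction of what the text refers to as Eq. 8.6; the exact notation in the chapter may differ.)

\[
\min_{\theta} \; \max_{\hat{G} \in \Phi(G)} \; \mathcal{L}_{\mathrm{train}}\big(f_{\theta}(\hat{G})\big)
\]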
where f_θ is the GNN with its trainable weights. As shown, we do not evaluate the loss at the unperturbed graph but instead use the loss achieved in the worst case (compare this to standard training, where we simply minimize L_train(f_θ(G))). The weights are steered to obtain low loss under these worst-case scenarios as well, thus obtaining better generalization.
Not surprisingly, solving Eq. 8.6 is usually not tractable for the same reasons as finding attacks and certificates is hard: we have to solve a discrete, highly
complex (minmax) optimization problem. In particular, for training, e.g., via gradi-
ent based approaches, we also need to compute the gradient w.r.t. the inner maxi-
mization. Thus, for feasibility, one usually has to refer to various surrogate objec-
tives, substituting the worst-case loss and the resulting gradient by simpler ones.
In this regard, the most naïve approach is to randomly draw samples from the perturbation set Φ(G) during each training iteration. That is, during training the loss and
the gradient are computed w.r.t. these randomly perturbed samples; with different
samples drawn in each training iteration. If the perturbation set, for example, con-
tains graphs where up to x edge deletions are admissible, we would randomly create
graphs with up to x edges dropped out. Such edge dropout has been analyzed in
various works but does not improve adversarial robustness substantially (Dai et al,
2018a; Zügner and Günnemann, 2020); a possible explanation is that the random
samples simply do not represent the worst-case perturbations well.
Thus, more common is the approach of adversarial training (Xu et al, 2019c; Feng et al, 2019a; Chen et al, 2020i). Here, we do not randomly sample from the perturbation set, but in each training iteration we create adversarial examples Ĝ and subsequently compute the gradient w.r.t. these. As these samples are expected to lead to a higher loss, the result of the inner max-operation in Eq. 8.6 is approximated much better. Instead of perturbing the input graph, the work (Jin and Zhang, 2019) has investigated a robust training scheme which perturbs the latent embeddings.
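As an illustration, a single adversarial-training step could look as follows. This is only a sketch: attack_fn is a placeholder for any attack that returns a perturbed adjacency matrix within the admissible perturbation set (for instance, the gradient-based sketch above), and the model signature model(features, adj) is an assumption.

```python
import torch
import torch.nn.functional as F


def adversarial_training_step(model, optimizer, adj, features, labels,
                              train_idx, attack_fn, budget):
    """One iteration of adversarial training (illustrative sketch only)."""
    # Approximate the inner maximization: craft a perturbed graph that
    # (hopefully) increases the training loss.
    model.eval()
    adj_pert = attack_fn(model, adj, features, labels, budget)

    # Outer minimization: update the weights on the perturbed graph.
    model.train()
    optimizer.zero_grad()
    logits = model(features, adj_pert)
    loss = F.cross_entropy(logits[train_idx], labels[train_idx])
    loss.backward()
    optimizer.step()
    return loss.item()
```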
It is interesting to note that adversarial training in its standard form requires la-
beled data since the attack aims to steer towards an incorrect prediction. In the typi-
cal transductive graph-learning tasks, however, large amounts of unlabeled data are
available. As a solution, virtual adversarial training has also been investigated (Deng
et al, 2019; Sun et al, 2020d), operating on the unlabeled data as well. Intuitively,
it treats the currently obtained predictions on the unperturbed graph as the ground
truth, making it a kind of self-supervised learning. The predictions on the perturbed
data should not deviate from the clean predictions, thus enforcing smoothness.
Using (virtual) adversarial training has empirically shown some improvements
in robustness, but not consistently. In particular, to approximate the max term in the robust loss of Eq. 8.6 well, we need powerful adversarial attacks, which
are typically costly to compute for graphs (see Section 8.2). Since here attacks need
to be computed in every training iteration, the training process is slowed down sub-
stantially.
At the end of the day, the techniques above perform a costly data augmentation dur-
ing training, i.e., they use altered versions of the graph. Besides being computation-
ally expensive, there is no guarantee that the adversarial examples are indeed good proxies for the max term in Eq. 8.6. An alternative approach, e.g., followed by (Zügner and Günnemann, 2019; Bojchevski and Günnemann, 2019), relies on the idea of certification as discussed previously. Recall that these techniques compute a lower bound m̂_LB on the worst-case margin. If it is positive, the prediction is robust for this node/graph. Thus, the lower bound itself acts like a robustness loss L_rob, for example instantiated as a hinge loss: max(0, δ − m̂_LB). If the lower bound is above δ, then the loss is zero; if it is smaller, a penalty occurs. Combining this loss function with, e.g., the usual cross-entropy loss forces the model not only to obtain good classification performance but also robustness.
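A minimal sketch of such a combined objective is given below. It assumes that a differentiable lower bound on the worst-case margin is already available for every training node (e.g., obtained from the dual of the certification problem, as discussed next); the name margin_lower_bounds and the weighting are illustrative.

```python
import torch
import torch.nn.functional as F


def robust_training_loss(logits, labels, margin_lower_bounds,
                         delta=0.1, rob_weight=1.0):
    """Cross-entropy plus a hinge-style robustness term (illustrative sketch).

    margin_lower_bounds[i] is assumed to be a differentiable lower bound on
    the worst-case classification margin of node i. Nodes whose bound already
    exceeds delta incur no robustness penalty.
    """
    ce = F.cross_entropy(logits, labels)
    hinge = torch.clamp(delta - margin_lower_bounds, min=0).mean()
    return ce + rob_weight * hinge
```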
Crucially, L_rob and, thus, the lower bound need to be differentiable since we need
to compute gradients for training. This, indeed, might be challenging since usually
the lower bound itself is still an optimization problem. While in some special cases
the optimization problem is directly differentiable (Bojchevski and Günnemann,
2019), another general idea is to relate to the principle of duality. Recall that the
worst-case margin m̂ (or a potential corresponding lower bound m̂_LB) is the result of a (primal) minimization problem (see Eq. 8.3). Based on the principle of duality, the result of the dual maximization problem provides, as required, a lower
bound to this value. Even more, any feasible solution of the dual problem provides
a lower bound on the optimal solution. Thus, we actually do not need to solve the
dual program. Instead, it is sufficient to compute the objective function of the dual at
any single feasible point to obtain an (even lower, thus looser) lower bound; no op-
timization is required and computing gradients often becomes straightforward. This
principle of duality has been used in (Zügner and Günnemann, 2019) to perform
robust training in an efficient way.
Robust training is not the only way to obtain ‘better’ GNN weights. In (Tang
et al, 2020b), for example, the idea of transfer learning (besides further architecture
changes; see next section) is exploited. Instead of purely training on a perturbed
target graph, the method adopts clean graphs with artificially injected perturbations
to first learn suitable GNN weights. These weights are later transferred and fine-
tuned to the actual graph at hand. The work (Chen et al, 2020i) exploits smoothing
distillation where one trains on predicted soft labels instead of ground-truth labels
to enhance robustness. The work (Jin et al, 2019b) argues that graph powering en-
hances robustness and proposes to minimize the loss not only on the original graph
but on a set of graphs consisting of the different graph powers. Lastly, the authors
of (You et al, 2021) use a contrastive learning framework using different (graph)
data augmentations. Although adversarial robustness is not their focus, they report increased adversarial robustness against the attacks of (Dai et al, 2018a). In general,
changing the loss function or regularization terms leads to different training, though
the effects on robustness for GNNs are not fully understood yet.
Inspired by the idea of graph cleaning as discussed before, a natural idea is to en-
hance the GNN by mechanisms to reduce the impact of perturbed edges. An obvious
choice for this are edge attention principles. However, it is a false conclusion to as-
sume that standard attention-based GNNs like GAT are immediately suitable for
this task. Indeed, as shown in (Tang et al, 2020b; Zhu et al, 2019a) such models are
non-robust. The problem is that these models still assume clean data to be given;
they are not aware that the graph might be perturbed.
Thus, other attention approaches try to incorporate more information in the pro-
cess. In (Tang et al, 2020b) the attention mechanism is enhanced by taking clean
graphs into account for which perturbations have been artificially injected. Since
now ground-truth information is available (i.e., which edges are harmful), the attention can try to learn to down-weight these while retaining the non-perturbed ones. An alternative idea is used in (Zhu et al, 2019a). Here, the representation of each node in each layer is no longer a vector but a Gaussian distribution. The authors hypothesize that attacked nodes tend to have large variances, and use this information within the attention scores. Further attention mechanisms considering, e.g., the model and data uncertainty or the neighboring nodes' similarity have been proposed in (Feng et al, 2021; Zhang and Zitnik, 2020).
An alternative to edge attention is to enhance the aggregation used in message
passing. In a GNN message passing step, a node’s embedding is updated by aggre-
gating over its neighbors’ embeddings. In this regard, adversarially inserted edges
add additional data points to the aggregation and therefore perturb the output of the
message passing step. Aggregation functions such as sum, weighted mean, or the
max operation used in standard GNNs can be arbitrarily distorted by only a single
outlier. Thus, inspired by the principle of robust statistics, the work (Geisler et al,
2020) proposes to replace the usual GNN’s aggregation function with a differen-
tiable version of the Medoid, a provably robust aggregation operation. The idea of
enhancing the robustness of the aggregation function used during message passing
has further been investigated in (Wang et al, 2020o; Zhang and Lu, 2020).
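To illustrate the underlying robust statistic, the following is a plain (non-differentiable) medoid aggregation over a node's neighbor embeddings. The soft, differentiable variant proposed in (Geisler et al, 2020) is more involved; this sketch only conveys why a medoid resists single outliers better than a sum or mean.

```python
import torch


def medoid_aggregation(neighbor_embeddings):
    """Return the medoid of a (k x d) matrix of neighbor embeddings.

    The medoid is the embedding with the smallest summed distance to all
    other embeddings. Unlike sum or mean, it cannot be dragged arbitrarily
    far away by a single adversarially inserted neighbor.
    """
    # pairwise Euclidean distances between the k embeddings
    dist = torch.cdist(neighbor_embeddings, neighbor_embeddings)
    medoid_idx = dist.sum(dim=1).argmin()
    return neighbor_embeddings[medoid_idx]
```

In an actual GNN layer, such an aggregation would take the place of the sum or mean over the (weighted) neighbor messages.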
Overall, all these methods down-weight the relevance of edges, with one cru-
cial difference to the methods discussed in Section 8.4.1: they are adaptive in the
sense that the relevance of each edge might vary between, e.g., the different lay-
ers of the GNN. Thus, an edge might be excluded/down-weighted in the first layer
but included in the second one, depending on the learned intermediate represen-
tation. This allows a more fine-grained handling of perturbations. In contrast, the
approaches in Section 8.4.1 derive a single cleaned graph that is used in the entire
GNN.
Many further ideas to improve robustness have been proposed, which do not all entirely fit into the previously mentioned categories. For example, in (Shanthamallu et al, 2021) a surrogate classifier is trained which does not access the graph structure but
is aimed to be aligned with the predictions of the GNN, both being jointly trained.
Since the final predictor is not using the graph but only the node’s attributes, higher
robustness to structure perturbations is hypothesized. The work (Miller et al, 2019)
proposes to select the training data in specific ways to increase robustness, and Wu
et al (2020d) uses the principle of information bottleneck, an information theoretic
approach to learn representations balancing expressiveness and robustness. Finally,
also randomized smoothing (Section 8.3.2) can be interpreted as a technique to im-
prove adversarial robustness by using an ensemble of predictors on randomized in-
puts.
erty that diminishes the effect of robust training or whether the generated adversarial
perturbations are not capturing the worst-case; showcasing again the hardness of the
problem. This might also explain why the majority of works have focused on prin-
ciples of weighting/filtering out edges.
In this regard, it is again important to remember that all approaches are typi-
cally designed with a specific perturbation model F(G ) in mind. Indeed, down-
weighting/filtering edges implicitly assumes that adversarial edges have been added to the graph. Adversarial edge deletions, in contrast, would require identifying potential edges to (re)add. This quickly becomes intractable due to the large number of
possible edges and has not been investigated so far. Moreover, only a few methods
so far have provided theoretical guarantees on the methods’ robustness behavior.
Progress in the field of GNN robustness requires sound evaluation of the proposed
techniques. Importantly, we have to be aware of the potential trade-off between
prediction performance (e.g., accuracy) and robustness. For example, we can easily
obtain a highly robust classification model by simply always predicting the same
class. Clearly, such a model has no use at all. Thus, the evaluation always involves
two aspects: (1) Evaluation of the prediction performance. For this, one can simply
refer to the established evaluation metrics such as accuracy, precision, recall, or
similar, as known for the various supervised and unsupervised learning tasks. (2)
Evaluation of the robustness performance.
Perturbation set and radius. Regarding the latter, the first noteworthy point is that robustness always links to a specific perturbation set Φ(·) that defines the perturbations the model should be robust to. To enable a proper evaluation, existing works therefore usually define some parametric form of the perturbation set, e.g., denoted Φ_r(G) where r is the maximal number of changes – the budget – we are allowed to perform (e.g., the maximal number of edges to add). The variable r is often referred to
as the radius. This is because the budget usually coincides with a certain maximal
norm/distance we are willing to accept between graph G and perturbed ones. A gen-
eralization of the above form to consider multiple budgets/radii is straightforward.
Varying the radius enables us to analyze the robustness behavior of the models in de-
tail. Depending on the radius, different robustness results are expected. Specifically,
for a large radius low robustness is expected – or even desired – and accordingly,
the evaluation should also include these cases showing the limits of the models.
Recall that using the methods discussed in Section 8.2 and Section 8.3 together,
we are able to obtain one of the following answers about a prediction’s robustness:
(R) It is robust; the certificate holds since, e.g., the lower bound on the margin
is positive. (NR) It is non-robust; we are able to find an adversarial example. (U)
Unknown; no statement possible since, e.g., the lower bound is negative but the
attack was not successful either.
Figure 8.6 shows such an example analysis providing insights about the robust-
ness properties of a GCN in detail. Here, local attacks and certificates are computed
on standard (left) and robustly (right) trained GCNs for the task of node classifica-
tion. As the result shows, robust training indeed increases the robustness of a GCN
with fewer attacks being successful and more nodes being certifiable.
Fig. 8.6: Share of nodes which are provably robust (blue; R), non-robust via ad-
versarial example construction (orange; NR), or whose robustness is unknown
(“gap”; U), for increasing perturbation radii. For a given radius, the shares of
(R)+(NR)+(U)= 100%. Left: Standard training; Right: robust training as pro-
posed in (Zügner and Günnemann, 2019). Citeseer data and perturbations of node
attributes.
It is worth highlighting that case (U) – the white gap in Figure 8.6 – occurs
only due to the algorithmic inability to solve the attack/certificate problems exactly.
Thus, case (U) does not give a clear indication about the GNN’s robustness but rather
about the performance of the attack/certificate.6 Given this set-up, in the following
we distinguish between two evaluation directions, which are reflected in frequently
used measures.
6 A large gap indicates that the attacks/certificates are rather loose. The gap might become smaller when improved attacks/certificates become available. Thus, attacks/certificates themselves can be evaluated by analyzing the size of the gap, since it shows what the maximal possible improvement in either direction is (e.g., the true share of robust predictions can never exceed 100% − NR for a specific radius).
• The attack success rate, measuring how many predictions were successfully
changed by the attack(s). This simply corresponds to the case (NR), the orange
region shown in Fig 8.6. This metric is typically used in combination with local
attacks where for each prediction a different perturbation can be used. Naturally,
the local attacks’ success rate is higher than the overall performance drop due
to the flexibility in picking different perturbations.
• In the case of classification, the classification margin, i.e., the difference be-
tween the predicted probability of the ‘true’ class minus the second-highest
class, and its drop after the attack. See again Figure 8.2 for an example.
The crucial limitation of this evaluation is its dependence on a specific attack
approach. The power of the attack strongly affects the result. Indeed, it can be re-
garded as an optimistic evaluation of robustness since a non-successful attack is
treated as seemingly robust. However, the conclusion is dangerous since a GNN
might only perform well for one type of attack but not another. Thus, the above
metrics rather evaluate the power of the attack but only weakly the robustness of the
model. Interpreting the results has to be done with care. Consequently, when refer-
ring to empirical robustness evaluation, it is imperative to use multiple different and
powerful attack approaches. Indeed, as also discussed in (Tramer et al, 2020), each
robustification principle should come with its own specifically suited attack method
(also called adaptive attack) to showcase its limitations.
are treated as wrong. The certified performance gives a provable lower bound on the performance of the GNN under any admissible perturbation w.r.t. the current perturbation set Φ_r(G) and the given data.
• Certified radius: While the above metrics assume a fixed Φ_r(G), i.e., a fixed radius r, we can also take another view. For a specific prediction, the largest radius r* for which the prediction can still be certified as robust is called its certified radius. Given the certified radius of a single prediction, one can easily calculate the average certifiable radius over multiple predictions (see the sketch below).
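As an illustration of how these quantities relate, the following sketch computes the per-radius shares of robust (R), non-robust (NR), and unknown (U) predictions as well as the average certified radius from hypothetical per-node results; the input conventions (a certified radius per node, and the smallest radius at which an attack succeeded, with infinity if none was found) are assumptions.

```python
import numpy as np


def robustness_report(certified_radius, attack_radius, max_radius):
    """Per-radius shares of robust/non-robust/unknown predictions (a sketch).

    certified_radius[i]: largest radius for which node i is provably robust
                         (-1 if not even the clean prediction is certified).
    attack_radius[i]:    smallest radius at which an attack on node i
                         succeeded (np.inf if no attack was found).
    """
    certified_radius = np.asarray(certified_radius, dtype=float)
    attack_radius = np.asarray(attack_radius, dtype=float)

    for r in range(max_radius + 1):
        robust = np.mean(certified_radius >= r)      # (R): certificate holds
        non_robust = np.mean(attack_radius <= r)     # (NR): adversarial example found
        unknown = 1.0 - robust - non_robust          # (U): the "gap"
        print(f"radius {r:2d}:  R={robust:.2f}  NR={non_robust:.2f}  U={unknown:.2f}")

    avg_certified = np.mean(np.clip(certified_radius, 0, None))
    print(f"average certified radius: {avg_certified:.2f}")
```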
Fig. 8.7: Certified ratio of different GNN models (APPNP, Soft Medoid, GDC) using the certificate of (Bojchevski et al, 2020a), where Φ_r(G) consists of edge deletion perturbations; x-axis: delete radius r_d, y-axis: certified ratio. The model-agnostic nature of the certificate allows comparing the robustness across models.
Figure 8.7 shows the certified ratio for different GNN architectures for the task
of node-classification when perturbing the graph structure. The smoothed classifier
uses 10,000 randomly drawn graphs and the probabilistic certification is based on a confidence level of α = 0.05, analogously to the set-up in (Geisler et al, 2020). Since
local attacks are considered, the certified ratio is naturally rather low. Still, as shown,
there is a significant difference between the models’ robustness performance.
Provable robustness evaluation provides strong guarantees in the sense that the
evaluation is more pessimistic. E.g., if the certified ratio is high, we know that the actual GNN can only be better. Note again, however, that we still also implicitly evaluate the certificate; with new certificates the result might become even better. Also recall that certificates based on randomized smoothing (Section 8.3.2) evaluate the robustness of the smoothed classifier, thus not providing guarantees for
the base classifier itself. Still, a robust prediction of the smoothed classifier entails
that the base classifier predicts the respective class with a high probability w.r.t. the
randomization scheme.
As it becomes apparent, evaluating robustness is more complex than evaluating
usual prediction performance. To achieve a detailed understanding of the robustness
properties of GNNs it is thus helpful to analyze all aspects introduced above.
8.6 Summary
Along with the increasing relevance of graph neural networks in various application domains comes an increasing demand to ensure their reliability. In this regard,
Acknowledgements