
Can Large Language Models Improve the Adversarial Robustness of Graph Neural Networks?

Zhongjian Zhang* (zhangzj@bupt.edu.cn), Beijing University of Posts and Telecommunications, Beijing, China
Xiao Wang* (xiao_wang@buaa.edu.cn), Beihang University, Beijing, China
Huichi Zhou (huichizhou77@gmail.com), Imperial College London, London, UK
Yue Yu (yuyue1218@bupt.edu.cn), Beijing University of Posts and Telecommunications, Beijing, China
Mengmei Zhang (zhangmengmei@bestpay.com.cn), China Telecom Bestpay, Beijing, China
Cheng Yang (yangcheng@bupt.edu.cn), Beijing University of Posts and Telecommunications, Beijing, China
Chuan Shi† (shichuan@bupt.edu.cn), Beijing University of Posts and Telecommunications, Beijing, China

*Both authors contributed equally to this research. †Corresponding author.

arXiv:2408.08685v1 [cs.LG] 16 Aug 2024

ABSTRACT
Graph neural networks (GNNs) are vulnerable to adversarial perturbations, especially topology attacks, and many methods that improve the robustness of GNNs have received considerable attention. Recently, we have witnessed the significant success of large language models (LLMs), leading many to explore the great potential of LLMs on GNNs. However, they mainly focus on improving the performance of GNNs by utilizing LLMs to enhance the node features. Therefore, we ask: Will the robustness of GNNs also be enhanced with the powerful understanding and inference capabilities of LLMs? By presenting the empirical results, we find that although LLMs can improve the robustness of GNNs, there is still an average decrease of 23.1% in accuracy, implying that GNNs remain extremely vulnerable against topology attacks. Therefore, another question is how to extend the capabilities of LLMs to graph adversarial robustness. In this paper, we propose an LLM-based robust graph structure inference framework, LLM4RGNN, which distills the inference capabilities of GPT-4 into a local LLM for identifying malicious edges and an LM-based edge predictor for finding missing important edges, so as to recover a robust graph structure. Extensive experiments demonstrate that LLM4RGNN consistently improves the robustness across various GNNs. Even in some cases where the perturbation ratio increases to 40%, the accuracy of GNNs is still better than that on the clean graph.

KEYWORDS
graph neural networks, large language models, adversarial robustness

ACM Reference Format:
Zhongjian Zhang, Xiao Wang, Huichi Zhou, Yue Yu, Mengmei Zhang, Cheng Yang, and Chuan Shi. 2024. Can Large Language Models Improve the Adversarial Robustness of Graph Neural Networks?. In . ACM, New York, NY, USA, 14 pages. https://doi.org/XXXXXXX.XXXXXXX

1 INTRODUCTION
Graph neural networks (GNNs), as representative graph machine learning methods, effectively utilize their message-passing mechanism to extract useful information and learn high-quality representations from graph data [20, 35, 42]. Despite great success, a host of studies have shown that GNNs are vulnerable to adversarial attacks [18, 23, 29, 33, 40], especially topology attacks [43, 54, 55], where slightly perturbing the graph structure can lead to a dramatic decrease in performance. Such vulnerability poses significant challenges for applying GNNs to real-world applications, especially in security-critical scenarios such as finance networks [38] or medical networks [26].
Threatened by adversarial attacks, several attempts have been made to build robust GNNs, which can be mainly divided into model-centric and data-centric defenses [52]. From the model-centric perspective, defenders can improve robustness through model enhancement, either by robust training schemes [7, 21] or new model architectures [17, 50, 53]. In contrast, data-centric defenses typically focus on flexible data processing to improve the robustness of GNNs. Treating the attacked topology as noisy, defenders primarily purify graph structures by calculating various similarities between node embeddings [5, 19, 22, 41, 48].
The above methods have received considerable attention in enhancing the robustness of GNNs.
Recently, large language models (LLMs), such as GPT-4 [1], have demonstrated expressive capabilities in understanding and inferring complex texts, revolutionizing the fields of natural language processing [51], computer vision [45] and graphs [25]. The performance of GNNs can be greatly improved by utilizing LLMs to enhance the node features [3, 12, 24]. However, one question remains largely unknown: Considering the powerful understanding and inference capabilities of LLMs, will LLMs enhance or weaken the adversarial robustness of GNNs to a certain extent?
Answering this question not only helps explore the potential capabilities of LLMs on graphs, but also provides a new perspective on the adversarial robustness problem on graphs.

Figure 1: The accuracy of different GNNs combining LLMs/LMs against Mettack with a 20% perturbation rate. (Panels: Cora accuracy (%) and Pubmed accuracy (%); legend: OFA-Llama2-7B, OFA-SBert, GCN-Llama2-7B, Vanilla GCN, TAPE, GCN-e5-large, GCN-SBert; Clean vs. Attack.)

Here, we empirically investigate the robustness of GNNs combining six LLMs/LMs (language models), namely OFA-Llama2-7B [24], OFA-SBert [24], TAPE [12], GCN-Llama2-7B [3], GCN-e5-large [3], and GCN-SBert [3], against Mettack [55] with a 20% perturbation rate on the Cora [27] and PubMed [32] datasets. As shown in Figure 1, the results clearly show that these models suffer from a maximum accuracy decrease of 37.9% and an average of 23.1%, while vanilla GCN [20] experiences a maximum accuracy decrease of 39.1% and an average of 35.5%. This demonstrates that these models remain extremely vulnerable to topology perturbations (more details refer to Section 3).
Consequently, another question naturally arises: How to extend the capabilities of LLMs to improve graph adversarial robustness? This problem is non-trivial because graph adversarial attacks typically perturb the graph structure, while the capabilities of LLMs usually focus on text processing. Considering that graph structures involve complex interactions among a large number of nodes, how to efficiently explore the inference capabilities of LLMs on perturbed structures is a significant challenge.
In this paper, we propose an LLM-based robust graph structure inference framework, called LLM4RGNN, which efficiently utilizes LLMs to purify the perturbed structure, improving the adversarial robustness. Specifically, based on an open-source and clean graph structure, we design a prompt template that enables GPT-4 [1] to infer how malicious an edge is and provide an analysis, so as to construct an instruction dataset. This dataset is used to fine-tune a local LLM (e.g., Mistral-7B [16] or Llama3-8B [34]), so that the inference capability of GPT-4 can be distilled into the local LLM. When given a new attacked graph structure, we first utilize the local LLM to identify malicious edges. By treating the identification results as edge labels, we further distill the inference capability from the local LLM into an LM-based edge predictor, to find missing important edges. Finally, purifying the graph structure by removing malicious edges and adding important edges makes GNNs more robust.
Our contributions can be summarized four-fold:
• To the best of our knowledge, we are the first to explore the potential of LLMs on graph adversarial robustness. Moreover, we verify the vulnerability of GNNs even with the powerful understanding and inference capabilities of LLMs.
• We propose a novel LLM-based robust graph structure inference framework, called LLM4RGNN, which efficiently utilizes LLMs to make GNNs more robust. Additionally, LLM4RGNN is a general framework, suitable for different LLMs and GNNs.
• Extensive experiments demonstrate that LLM4RGNN consistently improves the robustness of various GNNs against topology attacks. Even in some cases where the perturbation ratio increases to 40%, the accuracy of GNNs with LLM4RGNN is still better than that on the clean graph.
• We utilize GPT-4 to construct an instruction dataset, including GPT-4's maliciousness assessments and analyses of 26,000 edges. This dataset will be publicly released, and it can be used to tune any other LLM so that it has the same robust graph structure inference capability as GPT-4.

2 PRELIMINARIES
2.1 Text-attributed Graphs (TAGs)
A text-attributed graph (TAG), defined as G = (V, E, S), is a graph with node-level textual information, where V = {v_1, ..., v_|V|}, E = {e_1, ..., e_|E|} and S = {s_1, ..., s_|V|} are the node set, edge set, and text set, respectively. The adjacency matrix of the graph G is denoted as A ∈ R^{|V|×|V|}, where A_ij = 1 if nodes v_i and v_j are connected, and A_ij = 0 otherwise. In this work, we focus on the node classification task on TAGs. Specifically, each node v_i corresponds to a label y_i that indicates which category the node v_i belongs to. Usually, we encode the text set S as the node feature matrix X = {x_1, ..., x_|V|} via some embedding technique [3, 11, 28] to train GNNs, where x_i ∈ R^d. Given some labeled nodes V_L ⊂ V, the goal is to train a GNN f(A, X) to predict the labels of the remaining unlabeled nodes V_U = V \ V_L.
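To make the TAG setup concrete, the following minimal sketch encodes node texts into the feature matrix X and packages the inputs that f(A, X) expects. The package choices (sentence-transformers, PyTorch Geometric) and the embedding model name are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of preparing a text-attributed graph (TAG) for node classification.
# Library and model choices here are assumptions made for illustration.
import torch
from sentence_transformers import SentenceTransformer
from torch_geometric.data import Data

node_texts = ["Title: ... Abstract: ...", "Title: ... Abstract: ..."]  # S = {s_1, ..., s_|V|}
edge_index = torch.tensor([[0], [1]])                                  # E as a 2 x |E| index tensor
labels = torch.tensor([0, 1])                                          # y_i for each node

# Encode the text set S into the node feature matrix X (each x_i in R^d).
encoder = SentenceTransformer("all-MiniLM-L6-v2")
X = torch.tensor(encoder.encode(node_texts))

# A GNN f(A, X) can now be trained on (X, edge_index) using the labeled split V_L.
graph = Data(x=X, edge_index=edge_index, y=labels)
```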
2.2 Graph Adversarial Robustness
This paper primarily focuses on stronger poisoning attacks, which can lead to extremely low model performance by directly modifying the training data [53, 54]. The formal definition of adversarial robustness against poisoning attacks is as follows:

    max_{δ∈Δ} min_{θ} L(f_θ(G + δ), y_T),    (1)

where δ represents a perturbation to the graph G, which may include perturbations to node features, insertion or deletion of edges, etc., and Δ represents all permitted and effective perturbations. y_T denotes the node labels of the target set T, L denotes the training loss of GNNs, and θ denotes the model parameters of f. Equation 1 indicates that under the worst-case perturbation δ, the adversarial robustness of model f is represented by its performance on the target set T. A smaller loss value suggests stronger adversarial robustness, i.e., better model performance. In this paper, we primarily focus on robustness under two topology attacks: 1) Targeted attacks [54], where attackers aim to mislead the model's prediction on a specific node v by manipulating the adjacent edges of v, thus T = {v}. 2) Non-targeted attacks [40, 55], where attackers aim to degrade the overall performance of GNNs but do not care which node is being targeted, thus T = V_test, where V_test denotes the test set.
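As a concrete reading of Equation (1), the sketch below estimates worst-case robustness by retraining the victim model on each candidate poisoned graph and keeping the worst target-set result. It is a hedged illustration of the protocol, not the paper's evaluation code; the training and evaluation routines are passed in as hypothetical callables.

```python
# Hedged sketch of the poisoning-robustness protocol behind Equation (1):
# the victim GNN is retrained on each poisoned graph (inner min over theta),
# and the worst target-set result over all perturbations is reported (outer max over delta).
def worst_case_accuracy(clean_graph, candidate_perturbations, target_nodes,
                        apply_perturbation, train_gnn, evaluate):
    worst = float("inf")
    for delta in candidate_perturbations:              # delta in Delta
        poisoned = apply_perturbation(clean_graph, delta)   # G + delta
        model = train_gnn(poisoned)                     # min_theta L(f_theta(G + delta), y_T)
        acc = evaluate(model, poisoned, target_nodes)   # performance on the target set T
        worst = min(worst, acc)                         # the attacker picks the worst case
    return worst
```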

3 THE ADVERSARIAL ROBUSTNESS OF GNNS COMBINING LLMS/LMS
In this paper, we empirically investigate whether LLMs enhance or weaken the adversarial robustness of GNNs to a certain extent. Specifically, for the Cora [27] and PubMed [32] datasets, based on non-contextualized embeddings encoded by BoW [11] or TF-IDF [31], we employ Mettack [55] with a 20% perturbation rate to generate the attacked topology. We compare seven representative baselines: TAPE [12] utilizes LLMs to generate extra semantic knowledge relevant to the nodes. OFA [24] employs LLMs to unify different graph data and tasks, where OFA-SBert utilizes Sentence-BERT [30] to encode the text of nodes, training and testing the GNNs on each dataset independently, and OFA-Llama2-7B involves training a single GNN across the Cora, Pubmed, and OGBN-Arxiv [13] datasets. Following the work [3], GCN-Llama2-7B, GCN-e5-large, and GCN-SBert represent the use of Llama2-7B, e5-large, and Sentence-BERT as the nodes' text encoders, respectively. The vanilla GCN directly utilizes the non-contextualized embeddings. We report the node classification accuracy on V_test to evaluate the robustness of the models against Mettack. The implementation details of the baselines refer to Appendix B. The result is depicted in Figure 1. We observe that under the influence of Mettack, GNNs combining LLMs/LMs suffer from a maximum accuracy decrease of 37.9% and an average decrease of 23.1%, while vanilla GCN [20] suffers from a maximum accuracy decrease of 39.1% and an average accuracy decrease of 35.5%. The results demonstrate that GNNs combining LLMs/LMs remain extremely vulnerable against topology perturbations.

4 LLM4RGNN: THE PROPOSED FRAMEWORK
In this section, we propose a novel LLM-based robust graph structure inference framework, LLM4RGNN. As shown in Figure 2, LLM4RGNN distills the inference capabilities of GPT-4 into a local LLM for identifying malicious edges and an edge predictor for finding missing important edges, so as to recover a robust graph structure, making various GNNs more robust.

Figure 2: The framework of LLM4RGNN, which involves three main parts: (a) instruction tuning a local LLM, which distills the inference capability from GPT-4 into a local LLM for identifying malicious edges; (b) training an LM-based edge predictor, which further distills the inference capability from the local LLM into an LM-based edge predictor for finding missing important edges; (c) purifying the graph structure by removing malicious edges and adding important edges, making GNNs more robust.
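The end-to-end flow in Figure 2 can be summarized by the sketch below. The function names are hypothetical placeholders standing in for the three stages; this is not the released implementation, and the instruction-file name and default parameter values are assumptions.

```python
# High-level sketch of the three LLM4RGNN stages from Figure 2.
# All callables and the instruction-file name are hypothetical placeholders.
def llm4rgnn_pipeline(attacked_graph, finetune_llm, score_edges,
                      train_edge_predictor, predict_missing_edges,
                      beta=4, top_k=3):
    # (a) a local LLM fine-tuned on the GPT-4-built instruction dataset
    local_llm = finetune_llm("gpt4_edge_instructions.json")

    # (b) the local LLM scores observed edges; its scores label the edge predictor
    scores = score_edges(local_llm, attacked_graph)          # r_e in {1, ..., 6} per edge
    edge_predictor = train_edge_predictor(attacked_graph, scores)

    # (c) purify: drop low-score edges, add high-confidence missing edges
    kept = {edge for edge, r_e in scores.items() if r_e > beta}
    added = predict_missing_edges(edge_predictor, attacked_graph, top_k)
    return kept | added                                      # purified edge set for any GNN
```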
4.1 Instruction Tuning a Local LLM
Given an attacked graph structure, one straightforward method is to query the powerful GPT-4 to identify malicious edges on the graph. However, this method is extremely expensive, because there are |V|^2 possible perturbation edges on a graph. For example, for the PubMed [32] dataset with 19,717 nodes, the cost is approximately $9.72 million. Thus, we hope to distill the inference capability of GPT-4 into a local LLM to identify malicious edges. To this end, instruction tuning based on GPT-4 is a popular fine-tuning technique [4, 44], which utilizes GPT-4 to construct an instruction dataset and then further trains a local LLM in a supervised fashion. The instruction dataset generally consists of instances (instruction, input, output), where instruction denotes the human instruction (a task definition in natural language) for LLMs, input is used as supplementary content for the instruction, and output denotes the desired output that follows the instruction. Therefore, the key is how to construct an effective instruction dataset for fine-tuning an LLM to identify the malicious edges.
In the local-LLM tuning phase of Figure 2 (a), based on an open-source and clean graph structure A (TAPE-Arxiv23 [12]), we utilize existing attacks (Mettack [55], Nettack [54], and Minmax [43]) to generate the perturbed graph structure A′, and thus we have the modification matrix S as follows:

    S = A − {A_Mettack ∪ A_Nettack ∪ A_Minmax},    (2)

where S ∈ {−1, 0, 1}^{|V|×|V|}, and S_ij = S_ji = −1 when the edge between nodes v_i and v_j is added. Conversely, the edge is removed if and only if S_ij = S_ji = 1, and S_ij = S_ji = 0 implies that the edge remains unchanged. Here the added edges are considered as the negative edge set E_n, i.e., the malicious edge set, and the removed edges are considered as the positive edge set E_p, i.e., the important edge set. Since the attack methods prefer adding edges over removing edges [18], to balance E_n and E_p, we sample a certain number of clean edges from A into E_p. With E_n and E_p, we construct the query edge set E_q = E_n ∪ E_p, which will be used to construct prompts for requesting GPT-4.
Next, based on E_q, we query GPT-4 in an open-ended manner. This involves prompting GPT-4 to make predictions on how malicious an edge is and to provide an analysis for its decisions. With this objective, we design a prompt template that includes a "System prompt", which is an open-ended question about how malicious the edge is, and "User content", which is the textual information of the node pair (v_i, v_j) from E_q. The general structure of the template follows (the "System prompt" and "User content" also respectively correspond to the instruction and input in the instruction dataset):

System prompt: In the context of graph neural networks, attackers manipulate models by adding irrelevant edges or removing relevant ones, leading to incorrect predictions. Your role is crucial in defending against such attacks by evaluating the relevance between pairs of nodes, which will help in identifying and removing the irrelevant edges to mitigate the impact of adversarial attacks on graph-based models. Given textual information about two nodes, analyze the relevance of these two nodes. Provide a concise analysis (approximately 100 words) and assign an integer relevance score from 1 to 6, where 1 indicates completely irrelevant and 6 indicates directly relevant. Your response should be formatted in JSON, with two keys: "Analysis" for your written analysis and "Relevance Score" for your numerical evaluation.
User content: Node v_i → {Title, Abstract}.\n\nNode v_j → {Title, Abstract}.

In the "System prompt", we provide background knowledge about the task and the specific role played by the LLM, which can more effectively harness the inference capability of GPT-4 [12, 46]. Additionally, we require GPT-4 to provide a fine-grained rating of the maliciousness of edges on a scale from 1 to 6, where a lower score indicates more malicious and a higher score indicates more important. The concept of "Analysis" is particularly crucial, as it not only facilitates an inference process in GPT-4 regarding the prediction results, but also serves as a key to distilling the inference capability of GPT-4 into local LLMs. Finally, the output of the instruction dataset is generated by GPT-4 as follows:

Analysis: {Analysis of predicted results}.
Relevance Score: {Predicted integer score from 1-6}.

In fact, it is difficult for GPT-4 to predict with complete accuracy. To construct a cleaner instruction dataset, we design a post-processing filtering operation. Specifically, for the output of GPT-4, we only preserve the edges with relevance scores r_e ∈ {1, 2, 3} from the negative sample set E_n, and the edges with r_e ∈ {4, 5, 6} from the positive sample set E_p. The refined instruction dataset is then used to fine-tune a local LLM, such as Mistral-7B [16] or Llama3-8B [34]. After that, the well-tuned LLM is able to infer the maliciousness of edges similarly to GPT-4. We also provide case studies of GPT-4 and the local LLM (Mistral-7B) in Appendix E.

4.2 Training an LM-based Edge Predictor
Now, given a new attacked graph structure A′, our key idea is to recover a robust graph structure Â. Intuitively, we can input each edge of A′ into the local LLM and obtain its relevance score r_e. By removing edges with lower scores, we can mitigate the impact of malicious edges on model predictions. Meanwhile, considering that attackers can also delete some important edges to reduce model performance, we need to find and add important edges that do not exist in A′. Although the local LLM can identify important edges with higher relevance scores, doing so over |V|^2 candidate edges is still very time- and resource-consuming. Therefore, we further design an LM-based edge predictor, as depicted in Figure 2 (b), which utilizes Sentence-BERT [30] as the text encoder and trains a multilayer perceptron (MLP) to find missing important edges.
Firstly, we introduce how to construct the feature of each edge. Inspired by [3], deep sentence embeddings have emerged as a powerful text encoding method, outperforming non-contextualized embeddings [11, 31]. Furthermore, sentence embedding models offer a lightweight way to obtain representations without fine-tuning. Consequently, for each node v_i, we adopt a sentence embedding model LM as the text encoder to extract the representation h_i from the raw text s_i, i.e., h_i = LM(s_i). We concatenate the representations of nodes v_i and v_j as the feature of the corresponding edge. Then the edge label y_e can be derived from r_e as follows:

    y_e = 1 if r_e > 4, and y_e = 0 if r_e ≤ 4,    (3)

where we utilize the local LLM as an edge annotator to distill its inference capability, and select 4 as the threshold to find the most positive edges. It is noted that there may be a label imbalance problem, where the number of positive edges is much higher than that of negative edges. Thus, based on cosine similarity, we select some node pairs with lower similarity to construct a candidate set. When there are not enough negative edges, we sample from the candidate set to balance the training set.
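A minimal sketch of the LM-based edge predictor of Section 4.2 is shown below, assuming sentence-transformers and PyTorch. The embedding model, hidden size, and training loop are simplified assumptions rather than the paper's exact settings.

```python
# Hedged sketch of the edge predictor: concatenated sentence embeddings of the two
# endpoints feed an MLP trained with binary cross-entropy, using local-LLM relevance
# scores as labels (y_e = 1 iff r_e > 4). Model choice and sizes are assumptions.
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer

class EdgePredictor(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, h_i, h_j):
        return torch.sigmoid(self.mlp(torch.cat([h_i, h_j], dim=-1))).squeeze(-1)

def train_edge_predictor(node_texts, labeled_edges, epochs=100):
    """labeled_edges: list of (i, j, r_e) triples scored by the local LLM."""
    encoder = SentenceTransformer("all-MiniLM-L6-v2")   # stand-in for Sentence-BERT
    h = torch.tensor(encoder.encode(node_texts))
    model = EdgePredictor(h.size(1))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.BCELoss()
    i = torch.tensor([e[0] for e in labeled_edges])
    j = torch.tensor([e[1] for e in labeled_edges])
    y = torch.tensor([1.0 if e[2] > 4 else 0.0 for e in labeled_edges])
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(h[i], h[j]), y)            # cross-entropy loss of Equation (4)
        loss.backward()
        optimizer.step()
    return model, h
```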
Table 1: Node classification accuracy (% ± 𝜎) under non-targeted attack (Mettack). Bolded results indicate improved performance.
Dataset GCN GAT RGCN SimP-GCN
Ptb Rate Vanilla LLM4RGNN Vanilla LLM4RGNN Vanilla LLM4RGNN Vanilla LLM4RGNN
0% 84.25±0.36 84.13±0.33 84.62±0.47 84.61±0.39 84.61±0.46 84.50±0.46 84.70±0.62 84.46±0.67
5% 75.62±1.70 81.76±0.69 79.82±0.52 81.22±0.78 76.65±0.82 82.19±0.59 79.36±0.99 81.78±0.51
Cora

10% 70.72±2.86 81.80±0.76 75.33±1.17 81.86±0.61 69.62±1.09 82.14±0.60 77.00±1.34 82.19±0.75


20% 57.41±2.00 81.41±0.77 63.17±1.82 81.00±0.98 59.27±0.81 81.39±0.44 76.04±1.66 81.20±0.63
0% 73.38±0.72 74.20±0.56 73.95±0.45 73.72±1.07 74.45±0.31 74.07±0.90 72.71±1.14 73.30±0.62
Citeseer

5% 69.69±0.59 73.94±0.56 71.37±0.73 73.22±1.08 72.30±0.25 73.61±0.65 71.07±0.93 73.48±0.58


10% 65.75±0.94 73.62±0.39 68.40±1.14 73.21±1.16 69.36±0.40 73.41±0.90 70.55±0.55 72.84±0.50
20% 58.72±1.00 74.12±0.85 61.92±1.76 73.94±0.67 62.79±0.60 74.04±0.70 69.54±0.78 73.70±0.64
0% 85.62±0.10 86.21±0.13 85.14±0.11 85.10±0.09 85.84±0.10 86.35±0.10 87.26±0.08 87.53±0.14
Pubmed

5% 73.27±1.15 85.27±0.26 80.87±0.91 84.45±0.23 81.61±0.20 85.93±0.10 85.90±0.20 86.08±0.27


10% 67.54±0.42 84.95±0.17 69.94±4.60 84.88±0.19 69.55±0.40 85.79±0.22 85.71±0.11 86.06±0.13
20% 52.12±1.47 84.99±0.32 53.07±0.69 85.20±0.30 48.47±0.71 85.99±0.21 85.70±0.15 86.07±0.24
0% 67.01±0.08 68.16±0.36 65.02±0.37 67.43±0.49 65.53±0.38 67.53±0.38 65.13±0.31 66.20±1.16
Arxiv

5% 50.51±0.29 68.86±0.41 54.68±0.57 68.46±0.47 52.35±0.09 68.71±0.43 51.07±3.87 67.88±0.47


10% 42.91±0.81 68.65±0.43 49.18±0.54 68.56±0.55 44.75±0.36 68.54±0.70 46.52±6.76 66.47±2.54
20% 33.96±0.46 69.17±0.43 34.24±2.12 68.86±0.54 31.69±0.63 69.00±0.59 37.31±7.59 68.15±0.65
0% 79.86±0.15 79.04±0.42 78.75±0.26 77.76±0.62 78.45±0.25 77.83±0.48 75.87±0.36 75.84±0.29
Products

5% 66.62±0.56 76.34±0.39 76.41±0.60 76.70±0.68 71.48±0.35 75.20±0.25 64.84±0.60 72.67±0.66


10% 63.31±0.52 75.80±0.24 74.13±0.37 75.48±0.75 68.98±0.41 74.75±0.38 58.59±1.97 71.80±0.48
20% 57.56±0.64 76.57±0.46 70.25±0.82 74.98±0.53 64.81±0.31 74.32±0.38 50.36±0.96 73.71±0.81

Next, we feed the feature of each edge into an MLP to obtain the prediction probability ŷ_e(v_i, v_j) = MLP(h_i ∥ h_j). The cross-entropy loss function is used to optimize the parameters of the MLP:

    L_CE(y_e, ŷ_e) = −[y_e log(ŷ_e) + (1 − y_e) log(1 − ŷ_e)].    (4)

After training the edge predictor, we input any node pair (v_i, v_j) that does not exist in A′ into it to obtain the prediction probability of the edge's existence. We then have the important edge set for node v_i:

    E_{v_i} = {(v_i, v_j) | j ≠ i, A′_ij = 0, ŷ_e > γ and ŷ_e ∈ Top-K},    (5)

where γ ∈ (0, 1) is the probability threshold and K is the maximum number of added edges. In this way, we select for the current node v_i the top-K neighbors whose predicted scores are greater than the threshold γ, so as to establish the most important edges for v_i. Over all nodes, we obtain the final important edge set E_add = ⋃_{v_i∈V} E_{v_i}.

4.3 Purifying Attacked Graph Structure
In Figure 2 (c), the robust graph structure Â is derived from the purification of A′. Specifically, new edges from E_add are added to A′. Simultaneously, with the relevance score r_e of each edge, we remove the malicious edges in A′ by setting a purification threshold β, i.e., edges with r_e larger than β are preserved, otherwise removed. The resulting Â is adaptive to any GNN, making GNNs more robust.

5 EXPERIMENTS
5.1 Experimental Setup
5.1.1 Dataset. We conduct extensive experiments on four cross-dataset citation networks (Cora [27], Citeseer [6], Pubmed [32], OGBN-Arxiv [13]) and one cross-domain product network (OGBN-Products [13]). We report the average performance and standard deviation over 10 seeds for each result. More dataset details refer to Appendix C.1.
5.1.2 Baseline. First, LLM4RGNN is a general LLM-based framework to enhance the robustness of GNNs. Therefore, we select the classical GCN [20] and three robust GNNs (GAT [35], RGCN [53] and Simp-GCN [17]) as baselines. Moreover, to more comprehensively evaluate LLM4RGNN, we also compare it with existing SOTA robust GNN frameworks, including ProGNN [19], STABLE [22], HANG-quad [50] and GraphEdit [9] (GraphEdit only provides prompts for the Cora, Citeseer and Pubmed datasets), where GCN is selected as the object for improving robustness. More baseline introduction and implementation details refer to Appendix C.2 and C.3, respectively.

5.2 Main Result
In this subsection, we conduct extensive evaluations of LLM4RGNN against three popular poisoning topology attacks: the non-targeted attacks Mettack [55] and DICE [40], and the targeted attack Nettack [54], where we observe remarkable improvements of the proposed LLM4RGNN in defense effectiveness. We report the accuracy (ACC (↑)) on the representative transductive node classification task. More results of inductive poisoning attacks refer to Appendix D.1.
5.2.1 Against Mettack. Non-targeted attacks aim to disrupt the entire graph topology to degrade the performance of GNNs on the test set. We employ the SOTA non-targeted attack method, Mettack [55], and vary the perturbation rate, i.e., the ratio of changed edges, from 0 to 20% with a step of 5%. We have the following observations:
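A minimal sketch of the selection rule in Equation (5) and the purification step of Section 4.3 is given below, assuming the local-LLM relevance scores and the trained predictor's probabilities are already available. The default parameter values are only illustrative.

```python
# Hedged sketch of structure purification: keep observed edges whose local-LLM relevance
# score exceeds beta, and add per node up to K unobserved edges whose predicted
# probability exceeds gamma (Equation 5). Inputs and defaults are assumptions.
import torch

def purify_structure(adj, llm_scores, edge_probs, beta=4, gamma=0.9, top_k=3):
    """adj: dense {0,1} adjacency of A'; llm_scores: dict {(i, j): r_e} for observed edges;
    edge_probs: predictor probabilities for all node pairs, shape |V| x |V|."""
    purified = torch.zeros_like(adj)
    # (1) remove malicious edges: preserve only edges with r_e > beta
    for (i, j), r_e in llm_scores.items():
        if r_e > beta:
            purified[i, j] = purified[j, i] = 1
    # (2) add missing important edges: per-node Top-K with probability > gamma
    candidate = edge_probs.clone()
    candidate[adj.bool()] = 0.0            # only consider pairs absent from A'
    candidate.fill_diagonal_(0.0)
    probs, idx = candidate.topk(top_k, dim=1)
    for i in range(adj.size(0)):
        for p, j in zip(probs[i], idx[i]):
            if p > gamma:
                purified[i, j] = purified[j, i] = 1
    return purified
```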
Table 2: Node classification accuracy (% ± 𝜎) under non-targeted attack (DICE). Bolded results indicate improved performance.
Dataset GCN GAT RGCN SimP-GCN
Ptb Rate Vanilla LLM4RGNN Vanilla LLM4RGNN Vanilla LLM4RGNN Vanilla LLM4RGNN
10% 81.38±0.31 81.84±0.71 81.52±0.49 82.12±0.71 81.71±0.52 82.14±0.52 82.29±0.55 82.35±0.65
Cora

20% 78.38±0.51 80.96±0.61 78.51±0.35 81.02±1.00 78.58±0.55 81.52±0.88 79.65±0.61 81.38±0.56


40% 71.62±0.40 79.46±0.93 71.81±0.72 79.11±1.00 72.77±0.48 79.86±0.86 74.37±0.67 79.64±1.14
10% 71.08±0.63 73.95±0.59 71.62±0.79 73.73±0.58 72.74±0.46 73.85±1.07 71.15±0.75 73.41±0.64
Products Arxiv Pubmed Citeseer

20% 68.79±0.49 73.32±0.58 69.68±0.59 73.26±0.45 70.67±0.33 73.61±0.69 70.67±0.53 72.66±0.73


40% 63.98±1.02 73.35±0.50 64.81±0.87 73.07±0.71 65.46±0.55 73.64±1.02 68.53±1.80 72.80±0.63
10% 82.86±0.17 85.14±0.22 81.85±0.17 83.96±0.22 83.48±0.12 85.24±0.15 86.60±0.07 86.92±0.13
20% 80.11±0.16 84.96±0.19 78.92±0.20 84.33±0.28 80.55±0.11 85.06±0.20 85.98±0.08 86.66±0.18
40% 74.45±0.19 84.89±0.21 72.47±0.24 83.98±0.17 74.78±0.18 84.73±0.25 85.36±0.09 86.42±0.11
10% 64.23±0.10 68.15±0.89 62.39±0.28 67.96±1.05 62.92±0.27 67.99±1.00 62.73±0.27 66.77±1.51
20% 62.35±0.17 68.21±1.01 59.60±0.40 67.84±1.10 60.58±0.27 67.95±1.11 60.52±0.25 66.56±1.19
40% 57.55±0.14 68.73±0.32 54.86±0.38 68.52±0.53 56.33±0.21 68.47±0.40 56.38±0.23 67.37±1.64
10% 76.05±0.16 77.66±0.22 75.12±0.28 76.16±0.31 74.78±0.33 76.41±0.43 71.62±0.61 74.20±0.41
20% 72.48±0.21 77.15±0.62 72.03±0.37 75.62±1.06 71.71±0.27 76.12±0.59 68.03±0.44 74.57±1.00
40% 66.26±0.37 77.02±0.56 65.52±0.62 76.20±0.40 64.70±0.31 75.85±0.46 60.41±0.57 75.01±0.64

Table 3: Node classification accuracy (%±𝜎) under non- Table 4: Node classification accuracy (% ± 𝜎) under non-
targeted attack (Mettack). The best results are in bold. OOT targeted attack (DICE). The best results are in bold. OOT
means that the result could not be obtained in 15 days. means that the result could not be obtained in 15 days.
Dataset Dataset
Pro-GNN STABLE HANG-quad GraphEdit LLM4RGNN Pro-GNN STABLE HANG-quad GraphEdit LLM4RGNN
Ptb Rate Ptb Rate
0% 80.85±0.44 85.09±0.21 79.41±0.55 76.35±0.38 84.13±0.33 0% 80.85±0.44 85.09±0.21 79.41±0.55 76.35±0.38 84.13±0.33
5% 79.81±0.45 80.53±1.13 78.81±1.02 74.97±0.95 81.76±0.69 10% 81.56±0.36 81.33±0.72 78.26±0.29 73.73±0.57 81.84±0.71
Cora
Cora

10% 78.57±0.97 79.53±0.52 78.15±1.14 74.87±0.75 81.80±0.76 20% 78.32±0.33 78.80±0.83 76.45±0.50 69.86±0.71 80.96±0.61
20% 76.07±0.57 78.70±0.89 74.90±0.65 73.82±0.48 81.41±0.77 40% 71.76±0.34 76.72±0.93 74.28±0.42 66.95±0.48 79.46±0.93
0% 71.11±0.45 72.51±1.37 71.18±0.64 72.60±0.57 74.20±0.56 0% 71.11±0.45 72.51±1.37 71.18±0.64 72.60±0.57 74.20±0.56
Citeseer
Citeseer

5% 69.68±0.53 71.23±1.35 71.16±0.71 71.40±0.72 73.94±0.56 10% 70.81±0.68 70.27±1.62 70.29±0.70 68.99±0.52 73.95±0.59
10% 68.73±0.79 70.47±1.47 70.84±1.00 71.35±0.98 73.62±0.39 20% 69.21±0.79 69.11±1.57 70.05±0.77 69.31±0.41 73.32±0.58
20% 68.37±0.84 67.91±1.98 70.00±1.11 69.44±0.86 74.12±0.85 40% 68.04±1.38 67.55±0.86 68.79±1.09 66.59±0.74 73.35±0.50
0% OOT 84.07±0.17 84.98±0.13 85.38±0.07 86.21±0.13 0% OOT 84.07±0.17 84.98±0.13 85.38±0.07 86.21±0.13
Pubmed
Pubmed

5% OOT 79.41±0.86 84.97±0.16 84.77±0.09 85.27±0.26 10% OOT 81.68±0.21 84.97±0.15 82.12±0.07 85.14±0.22
10% OOT 77.65±0.25 84.88±0.19 83.41±0.19 84.95±0.17 20% OOT 79.08±0.23 84.94±0.18 81.11±0.10 84.96±0.19
20% OOT 72.51±1.05 84.94±0.19 82.34±0.27 84.99±0.32 40% OOT 74.51±0.45 84.86±0.20 83.04±0.09 84.89±0.21
0% OOT 66.70±0.19 67.85±0.17 - 68.16±0.36 0% OOT 66.70±0.19 67.85±0.17 - 68.16±0.36
Arxiv

10% OOT 64.93±0.19 65.99±0.19 - 68.15±0.89


Arxiv

5% OOT 61.40±0.50 63.53±0.42 - 68.86±0.41


10% OOT 59.37±0.40 57.49±0.76 - 68.65±0.43 20% OOT 62.64±0.44 63.74±0.23 - 68.21±1.01
20% OOT 58.24±0.60 41.61±1.28 - 69.17±0.43 40% OOT 58.55±0.37 59.23±0.17 - 68.73±0.32
0% OOT 79.23±0.33 80.61±0.18 - 79.04±0.42 0% OOT 79.23±0.33 80.61±0.18 - 79.04±0.42
Products
Products

5% OOT 76.24±0.44 72.41±0.66 - 76.34±0.39 10% OOT 76.72±0.32 77.06±0.29 - 77.66±0.22


10% OOT 74.38±0.40 69.88±0.86 - 75.80±0.24 20% OOT 75.15±0.27 75.52±0.24 - 77.15±0.62
20% OOT 72.21±0.40 62.24±0.28 - 76.57±0.46 40% OOT 73.68±0.26 69.01±0.17 - 77.02±0.56

(1) From Table 1, LLM4RGNN consistently improves the robustness across various GNNs. For GCN, there is an average accuracy improvement of 24.3% and a maximum improvement of 103% across the five datasets. For the robust GNNs, including GAT, RGCN, and Simp-GCN, LLM4RGNN also brings average relative accuracy improvements of 16.6%, 21.4%, and 13.7%. Notably, despite fine-tuning the local LLM on the TAPE-Arxiv23 dataset, which does not include any medical or product samples, there is still a relative accuracy improvement of 18.8% and 11.4% on the Pubmed and OGBN-Products datasets, respectively. (2) Referring to Table 3, compared with existing robust GNN frameworks, LLM4RGNN achieves SOTA robustness, which benefits from the powerful understanding and inference capabilities of LLMs. (3) Combining Table 1 and Table 3, even in some cases where the perturbation ratio increases to 20%, after using LLM4RGNN to purify the graph structure, the accuracy of GNNs is better than that on the clean graph. A possible reason is that the local LLM effectively identifies malicious edges as negative samples, which also helps train a more effective edge predictor to find important edges.
5.2.2 Against DICE. To verify the defense generalization capability of LLM4RGNN, we also evaluate its effectiveness against another non-targeted attack, DICE [40]. Notably, DICE is not involved in the construction process of the instruction dataset. Considering that DICE is not as effective as Mettack, we set higher perturbation rates of 10%, 20% and 40%. The results are reported in Table 2 and Table 4.
Table 5: Node classification accuracy (% ± 𝜎) under targeted attack (Nettack). Bolded results indicate improved performance.
Dataset GCN GAT RGCN Sim-PGCN
Ptb Num Vanilla LLM4RGNN Vanilla LLM4RGNN Vanilla LLM4RGNN Vanilla LLM4RGNN
0 85.92±2.21 87.50±1.79 85.66±1.90 87.11±1.64 86.45±1.87 88.55±2.57 86.97±1.09 86.71±2.39
1 79.87±2.50 85.79±1.75 82.11±2.06 86.71±2.24 82.50±1.87 87.37±2.64 81.71±2.39 86.05±2.37
2 76.32±1.77 85.66±2.08 75.13±4.42 86.18±1.79 77.37±1.29 87.37±1.58 77.76±1.37 84.34±1.99
Cora

3 71.45±2.43 84.47±1.15 70.79±3.85 85.00±2.58 72.11±1.64 86.32±2.44 71.97±2.89 85.39±1.37


4 64.47±3.38 84.61±2.04 65.79±3.86 85.39±1.99 67.24±1.90 86.84±1.95 63.55±3.58 84.61±2.50
5 55.13±4.01 84.08±1.61 60.00±4.68 83.03±1.90 60.26±2.48 85.13±1.56 52.63±4.40 83.55±1.69
0 86.31±1.28 86.77±1.02 86.92±0.77 87.38±1.02 85.23±1.41 86.92±1.72 80.92±2.50 84.62±2.84
1 84.62±2.57 86.46±0.62 85.54±1.23 86.92±1.03 82.15±1.85 86.31±1.45 80.00±3.30 84.31±2.36
Citeseer

2 81.69±2.62 85.38±1.03 83.38±1.79 86.38±1.61 74.46±4.25 85.85±1.51 79.54±4.30 84.62±1.54


3 79.08±3.24 86.00±0.63 81.38±3.11 85.38±1.42 72.00±3.94 85.54±1.41 76.62±5.99 83.85±2.31
4 76.77±5.26 86.00±0.83 77.08±4.54 85.69±1.55 69.08±2.70 85.38±1.58 73.23±7.51 84.00±1.41
5 74.31±5.29 86.00±1.08 73.69±8.44 85.54±1.02 64.31±3.94 84.46±1.45 68.31±10.62 82.46±2.20

Figure 3: Performance comparison in different settings against Mettack with a 20% perturbation rate. (Panels: GCN, GAT, RGCN, SimP-GCN; x-axis: Cora, Citeseer, Pubmed, Arxiv, Products; y-axis: accuracy (%); legend: Vanilla GCN, LLM4RGNN w/o EP, LLM4RGNN Full.)

Similar to the results under Mettack, LLM4RGNN consistently improves the robustness across various GNNs and is superior to the other robust GNN frameworks. For GCN, GAT, RGCN and Simp-GCN, LLM4RGNN on average brings 8.2%, 8.8%, 8.1% and 6.5% relative improvements in accuracy on the five datasets. Remarkably, even in some cases where the perturbation ratio increases to 40%, the accuracy of GNNs is better than that on the clean graph.
5.2.3 Against Nettack. Different from non-targeted attacks, targeted attacks specifically focus on a particular node v, aiming to fool GNNs into misclassifying v. We employ the SOTA targeted attack, Nettack [54]. Following previous work [53], we select nodes with a degree greater than 10 as the target nodes, and vary the number of perturbations applied to the targeted node from 0 to 5 with a step of 1, to generate attacked structures. The results are reported in Table 6 and Table 5. Similar to the results under Mettack and DICE, LLM4RGNN not only consistently improves the robustness of various GNNs, but also surpasses existing robust GNN frameworks, exhibiting exceptional resistance to Nettack.

Table 6: Node classification accuracy (% ± σ) under targeted attack (Nettack). The best results are in bold.
Dataset Ptb Num STABLE HANG-quad LLM4RGNN
Cora 0 88.03±2.24 78.42±4.13 87.50±1.79
Cora 1 86.05±2.14 77.89±2.41 85.79±1.75
Cora 2 84.08±1.61 78.29±3.44 85.66±2.08
Cora 3 82.63±2.19 76.32±4.20 84.47±1.15
Cora 4 79.74±2.37 75.00±3.06 84.61±2.04
Cora 5 75.13±2.24 74.61±3.68 84.08±1.61
Citeseer 0 87.08±0.75 79.23±3.53 86.77±1.02
Citeseer 1 86.46±1.15 78.77±2.98 86.46±0.62
Citeseer 2 84.31±1.66 78.15±3.56 85.38±1.03
Citeseer 3 81.54±1.54 80.15±1.28 86.00±0.63
Citeseer 4 78.92±4.01 78.15±3.21 86.00±0.83
Citeseer 5 78.31±4.10 77.54±3.01 86.00±1.08

In summary, although we fine-tune the local LLM only based on TAPE-Arxiv23, LLM4RGNN still significantly improves the robustness of GNNs in both cross-dataset (Cora, Citeseer, PubMed, OGBN-Arxiv) and cross-domain (OGBN-Products) scenarios.

5.3 Model Analysis
5.3.1 Ablation Study. To assess how the key components of LLM4RGNN benefit the adversarial robustness, we employ Mettack with a perturbation ratio of 20% to generate the attacked structure and conduct ablation experiments. The classical GCN is selected as the training GNN. The experiment result is depicted in Figure 3, where "Vanilla" refers to the setting without any modifications to the attacked structure, the "w/o EP" variant involves only removing malicious edges with the local LLM, and "Full" includes both removing malicious edges and adding missing important edges, i.e., our proposed LLM4RGNN. Across all settings, our proposed method consistently outperforms the other settings. Specifically, utilizing the local LLM to identify and remove the majority of malicious edges significantly reduces the impact of adversarial perturbation, improving the accuracy of GNNs. By further employing the edge predictor to find and add important neighbors for each node, additional information gain is provided to the center nodes, further improving the accuracy of GNNs.
Table 7: Performance comparison with different LLMs against Mettack with a 20% perturbation rate.
LLM — Cora: AdvEdge (↓), ACC (↑) w/o EP, ACC (↑) w/ EP — Citeseer: AdvEdge (↓), ACC (↑) w/o EP, ACC (↑) w/ EP
Closed source GPT-3.5 186(3.52%) 73.56±0.49 75.36±1.73 231(5.47%) 66.69±1.04 68.92±1.63
Closed source GPT-4 90(1.71%) 79.83±0.31 81.76±0.87 73(1.73%) 71.04±0.57 72.81±0.60
Open source Llama2-7B 156(2.96%) 78.76±0.42 80.99±0.78 203(4.80%) 69.83±0.78 72.86±0.93
Open source Llama2-13B 132(2.50%) 79.00±0.30 81.35±1.15 93(2.20%) 70.81±0.73 72.84±0.85
Open source Llama3-8B 107(2.03%) 79.22±0.34 81.71±0.51 91(2.15%) 70.85±0.64 74.06±0.78
Open source Mistral-7B 102(1.93%) 78.68±0.28 81.41±0.77 92(2.18%) 71.07±0.35 74.12±0.85

Figure 4: Analysis of the hyper-parameters γ and K against Mettack. (Panels: (a) Cora-Mettack-5%, (b) Cora-Mettack-20%, (c) Citeseer-Mettack-5%, (d) Citeseer-Mettack-20%; y-axis: accuracy (%).)

Table 8: Average inference times (s) for different datasets.
Component Cora Citeseer Products Pubmed Arxiv
Local LLM 0.54 0.57 0.63 0.64 1.12
Edge Predictor 0.05 0.06 0.24 0.31 0.25

5.3.2 Comparison with Different LLMs. To evaluate the generalizability of LLM4RGNN across different LLMs, we choose four popular open-source LLMs, including Llama2-7B, Llama2-13B, Llama3-8B and Mistral-7B, as the starting checkpoints of the local LLM. We also introduce a direct comparison using the closed-source GPT-3.5 and GPT-4 as malicious edge detectors. Additionally, the metric AdvEdge (↓) is introduced to measure the number and proportion of malicious edges remaining after the LLM performs the filtering operation. We report the results of GCN on the Cora and Citeseer datasets under Mettack with a perturbation ratio of 20% (generating 1053 malicious edges for Cora and 845 for Citeseer). As reported in Table 7, we have the following observations: (1) The well-tuned local LLMs are significantly superior to GPT-3.5 in identifying malicious edges, and the trained LM-based edge predictor consistently improves accuracy, indicating that the inference capability of GPT-4 is effectively distilled into different LLMs and edge predictors. (2) A stronger open-source LLM yields better overall performance. Among them, the performance of fine-tuned Mistral-7B and Llama3-8B is comparable to that of GPT-4. We provide a comprehensive comparison between Mistral-7B and Llama3-8B in Appendix D.3.
5.3.3 Efficiency Analysis. In LLM4RGNN, using 26,000 samples to fine-tune the local LLM is a one-time process, controlled within 6 hours. The edge predictor is only a lightweight MLP, with a training time on each dataset controlled within 1 minute. Furthermore, LLM4RGNN does not increase the complexity compared with existing robust GNN frameworks [19, 22]. The complexity of the LLM inferring edge relationships is O(|E|), that of the edge predictor is O(|V|^2), and their inference processes are parallelizable. We provide the average time for the LLM to infer one edge and for the lightweight edge predictor to infer one node in Table 8. Overall, the average inference times for the local LLM and the edge predictor are 0.7s and 0.2s, respectively, which is acceptable. Each experiment's total inference time is controlled within 90 minutes.

Figure 5: Analysis of the hyper-parameter β against Mettack-20%. (Panels: Cora, Citeseer; curves: accuracy (%) and AdvEdge.)
Figure 6: Analysis of the number of instances against Mettack-20%. (Panels: Cora, Citeseer; curves: accuracy (%) and AdvEdge.)

5.4 Hyper-parameter Analysis
We conduct hyper-parameter analysis on the probability threshold γ, the maximum number of important edges K, the purification threshold β, and the number of instances used in tuning LLMs.
First, we present the accuracy of LLM4RGNN under different combinations of γ and K in Figure 4. The results indicate that the accuracy of LLM4RGNN varies minimally across different hyper-parameter settings, demonstrating its insensitivity to the hyper-parameters γ and K. More results are presented in Appendix D.2.
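The AdvEdge metric from Section 5.3.2 can be computed in a few lines. The sketch below assumes the set of attacker-injected edges is known from the attack log; the denominator (all edges kept after filtering) is our assumption, since the paper only states that AdvEdge reports the number and proportion of surviving malicious edges.

```python
# Hedged sketch of the AdvEdge metric: how many injected (malicious) edges survive
# the local LLM's filtering. The proportion's denominator is an assumption.
def adv_edge(injected_edges, kept_edges):
    """injected_edges / kept_edges: sets of undirected edges as sorted (i, j) tuples."""
    remaining = injected_edges & kept_edges
    proportion = len(remaining) / max(len(kept_edges), 1)
    return len(remaining), proportion
```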
Besides, we report the accuracy and AdvEdge of LLM4RGNN under different values of β, the purification threshold that decides whether an edge is preserved. The results are shown in Figure 5. When β is set to 4, most malicious edges can be identified and the optimal performance is achieved. This is because a low β cannot effectively identify malicious edges, while a high β may delete more malicious edges but could also remove some useful edges.
Lastly, we also analyze the effectiveness of LLM4RGNN under different numbers of instances. Specifically, we set the number of instances for tuning the local LLM to 5000, 15000 and 26000, respectively. The results are shown in Figure 6. Remarkably, with only 5000 instances, the tuned local LLM can effectively identify malicious edges, surpassing the current SOTA method. This also indicates that the excellent robustness of LLM4RGNN can be achieved on a lower budget, approximately $80.

6 CONCLUSION
In this paper, we first explore the potential of LLMs on graph adversarial robustness. Specifically, we propose a novel LLM-based robust graph structure inference framework, LLM4RGNN, which distills the inference capability of GPT-4 into a local LLM for identifying malicious edges and an LM-based edge predictor for finding missing important edges, to efficiently purify attacked graph structures, making GNNs more robust. Extensive experiments demonstrate that LLM4RGNN significantly improves the adversarial robustness of GNNs and achieves SOTA defense results. Considering that some graphs lack textual information, a future plan is to extend LLM4RGNN to graphs without text.

REFERENCES
[1] Josh Achiam, Steven Adler, Sandhini Agarwal, et al. Gpt-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
[2] Yupeng Chang, Xu Wang, Jindong Wang, et al. A survey on evaluation of large language models. ACM Transactions on Intelligent Systems and Technology, 2024.
[3] Zhikai Chen, Haitao Mao, Hang Li, et al. Exploring the potential of large language models (llms) in learning on graphs. ACM SIGKDD Explorations Newsletter, 2024.
[4] Zhikai Chen, Haitao Mao, Hongzhi Wen, et al. Label-free node classification on graphs with large language models (llms). In The Twelfth International Conference on Learning Representations, 2023.
[5] Negin Entezari, Saba A Al-Sayouri, Amirali Darvishzadeh, et al. All you need is low (rank): Defending against adversarial attacks on graphs. In Proceedings of the 13th International Conference on Web Search and Data Mining, 2020.
[6] C Lee Giles, Kurt D Bollacker, and Steve Lawrence. Citeseer: An automatic citation indexing system. In Proceedings of the Third ACM Conference on Digital Libraries, 1998.
[7] Lukas Gosch, Simon Geisler, Daniel Sturm, et al. Adversarial training for graph neural networks: Pitfalls, solutions, and new directions. Advances in Neural Information Processing Systems, 2024.
[8] Lukas Gosch, Daniel Sturm, Simon Geisler, et al. Revisiting robustness in graph machine learning. arXiv preprint arXiv:2305.00851, 2023.
[9] Zirui Guo, Lianghao Xia, Yanhua Yu, et al. Graphedit: Large language models for graph structure learning. arXiv preprint arXiv:2402.15183, 2024.
[10] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. Advances in Neural Information Processing Systems, 2017.
[11] Zellig S Harris. Distributional structure. Word, 1954.
[12] Xiaoxin He, Xavier Bresson, Thomas Laurent, et al. Harnessing explanations: Llm-to-lm interpreter for enhanced text-attributed graph representation learning. In The Twelfth International Conference on Learning Representations, 2023.
[13] Weihua Hu, Matthias Fey, Marinka Zitnik, et al. Open graph benchmark: Datasets for machine learning on graphs. Advances in Neural Information Processing Systems, 2020.
[14] Jin Huang, Xingjian Zhang, Qiaozhu Mei, et al. Can llms effectively leverage graph structural information: When and why, 2023.
[15] Jincheng Huang, Lun Du, Xu Chen, et al. Robust mid-pass filtering graph convolutional networks. In Proceedings of the ACM Web Conference 2023, 2023.
[16] Albert Q Jiang, Alexandre Sablayrolles, Arthur Mensch, et al. Mistral 7b. arXiv preprint arXiv:2310.06825, 2023.
[17] Wei Jin, Tyler Derr, Yiqi Wang, et al. Node similarity preserving graph convolutional networks. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining, 2021.
[18] Wei Jin, Yaxing Li, Han Xu, et al. Adversarial attacks and defenses on graphs. ACM SIGKDD Explorations Newsletter, 2021.
[19] Wei Jin, Yao Ma, Xiaorui Liu, et al. Graph structure learning for robust graph neural networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020.
[20] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
[21] Kuan Li, YiWen Chen, Yang Liu, et al. Boosting the adversarial robustness of graph neural networks: An ood perspective. In The Twelfth International Conference on Learning Representations, 2023.
[22] Kuan Li, Yang Liu, Xiang Ao, et al. Reliable representations make a stronger defender: Unsupervised structure refinement for robust gnn. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022.
[23] Kuan Li, Yang Liu, Xiang Ao, et al. Revisiting graph adversarial attack and defense from a data distribution perspective. In The Eleventh International Conference on Learning Representations, 2022.
[24] Hao Liu, Jiarui Feng, Lecheng Kong, et al. One for all: Towards training one graph model for all classification tasks. arXiv preprint arXiv:2310.00149, 2023.
[25] Jiawei Liu, Cheng Yang, Zhiyuan Lu, et al. Towards graph foundation models: A survey and beyond. arXiv preprint arXiv:2310.11829, 2023.
[26] Chengsheng Mao, Liang Yao, and Yuan Luo. Medgcn: Graph convolutional networks for multiple medical tasks. arXiv preprint arXiv:1904.00326, 2019.
[27] Andrew Kachites McCallum, Kamal Nigam, Jason Rennie, et al. Automating the construction of internet portals with machine learning. Information Retrieval, 2000.
[28] Tomas Mikolov, Ilya Sutskever, Kai Chen, et al. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, 2013.
[29] Felix Mujkanovic, Simon Geisler, Stephan Günnemann, et al. Are defenses for graph neural networks robust? Advances in Neural Information Processing Systems, 2022.
[30] Nils Reimers and Iryna Gurevych. Sentence-bert: Sentence embeddings using siamese bert-networks. arXiv preprint arXiv:1908.10084, 2019.
[31] Stephen Robertson. Understanding inverse document frequency: On theoretical arguments for idf. Journal of Documentation, 2004.
[32] Prithviraj Sen, Galileo Namata, Mustafa Bilgic, et al. Collective classification in network data. AI Magazine, 2008.
[33] Lichao Sun, Yingtong Dou, Carl Yang, et al. Adversarial attack and defense on graph data: A survey. IEEE Transactions on Knowledge and Data Engineering, 2022.
[34] Hugo Touvron, Thibaut Lavril, Gautier Izacard, et al. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.
[35] Petar Veličković, Guillem Cucurull, Arantxa Casanova, et al. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017.
[36] Haishuai Wang, Yang Gao, Xin Zheng, et al. Graph neural architecture search with gpt-4. arXiv preprint arXiv:2310.01436, 2023.
[37] Heng Wang, Shangbin Feng, Tianxing He, et al. Can language models solve graph problems in natural language? Advances in Neural Information Processing Systems, 2024.
[38] Jianian Wang, Sheng Zhang, Yanghua Xiao, et al. A review on graph neural network methods in financial applications. arXiv preprint arXiv:2111.15367, 2021.
[39] Kuansan Wang, Zhihong Shen, Chiyuan Huang, et al. Microsoft academic graph: When experts are not enough. Quantitative Science Studies, 2020.
[40] Marcin Waniek, Tomasz P Michalak, Michael J Wooldridge, et al. Hiding individuals and communities in a social network. Nature Human Behaviour, 2018.
[41] Huijun Wu, Chen Wang, Yuriy Tyshetskiy, et al. Adversarial examples on graph data: Deep insights into attack and defense. arXiv preprint arXiv:1903.01610, 2019.
[42] Zonghan Wu, Shirui Pan, Fengwen Chen, et al. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 2020.
[43] Kaidi Xu, Hongge Chen, Sijia Liu, et al. Topology attack and defense for graph neural networks: An optimization perspective. arXiv preprint arXiv:1906.04214, 2019.
[44] Xiaohan Xu, Ming Li, Chongyang Tao, et al. A survey on knowledge distillation of large language models. arXiv preprint arXiv:2402.13116, 2024.
[45] Shukang Yin, Chaoyou Fu, Sirui Zhao, et al. A survey on multimodal large language models. arXiv preprint arXiv:2306.13549, 2023.
[46] Jianxiang Yu, Yuxiang Ren, Chenghua Gong, et al. Empower text-attributed graphs learning with large language models (llms). arXiv preprint arXiv:2310.09872, 2023.
[47] Mengmei Zhang, Mingwei Sun, Peng Wang, et al. Graphtranslator: Aligning graph model to large language model for open-ended tasks. arXiv preprint arXiv:2402.07197, 2024.
[48] Xiang Zhang and Marinka Zitnik. Gnnguard: Defending graph neural networks
against adversarial attacks. Advances in neural information processing systems,
2020.
[49] Haiteng Zhao, Shengchao Liu, Ma Chang, et al. Gimlet: A unified graph-text
model for instruction-based molecule zero-shot learning. Advances in Neural
Information Processing Systems, 2024.
[50] Kai Zhao, Qiyu Kang, Yang Song, et al. Adversarial robustness in graph neural
networks: A hamiltonian approach. Advances in Neural Information Processing
Systems, 2024.
[51] Wayne Xin Zhao, Kun Zhou, Junyi Li, et al. A survey of large language models.
arXiv preprint arXiv:2303.18223, 2023.
[52] Qinkai Zheng, Xu Zou, Yuxiao Dong, et al. Graph robustness benchmark: Bench-
marking the adversarial robustness of graph machine learning. Neural Information
Processing Systems Track on Datasets and Benchmarks 2021, 2021.
[53] Dingyuan Zhu, Ziwei Zhang, Peng Cui, et al. Robust graph convolutional net-
works against adversarial attacks. In Proceedings of the 25th ACM SIGKDD
international conference on knowledge discovery & data mining, 2019.
[54] Daniel Zügner, Amir Akbarnejad, and Stephan Günnemann. Adversarial attacks
on neural networks for graph data. In Proceedings of the 24th ACM SIGKDD
international conference on knowledge discovery & data mining, 2018.
[55] Daniel Zügner and Stephan Günnemann. Adversarial attacks on graph neural net-
works via meta learning. In International Conference on Learning Representations
(ICLR), 2019.
A RELATED WORK

A.1 Adversarial Attacks and Defenses on Graphs
It has been demonstrated in extensive studies [18, 23, 29, 40] that attackers can catastrophically degrade the performance of GNNs by maliciously perturbing the graph structure. For example, Nettack [54] is the first study of adversarial attacks on graph data; it preserves the degree distribution and imposes constraints on feature co-occurrence to generate small deliberate perturbations. Subsequently, Mettack [55] utilizes meta-learning, while the Minmax and PGD attacks [43] utilize projected gradient descent, to solve the bilevel problem underlying poisoning attacks.
Threatened by adversarial attacks, many methods [7, 8, 15, 21] have been proposed to defend against them. These methods can mainly be categorized into model-centric and data-centric approaches. Model-centric methods improve robustness through model enhancement, either by robust training schemes (e.g., adversarial training [7, 21]) or by designing new model architectures (e.g., RGCN [53], HANG [50], Mid-GCN [15]). Data-centric methods typically focus on flexible data processing to improve the robustness of GNNs. By treating the attacked topology as noisy, defenders primarily purify graph structures by calculating various similarities between node embeddings [5, 19, 22, 41, 48]. For example, ProGNN [19] jointly trains the GNN's parameters and learns a clean adjacency matrix with graph properties. STABLE [22] is a pre-training model which is specifically designed to learn effective representations to refine graph quality. The above methods have received considerable attention in enhancing the robustness of GNNs.

A.2 LLMs for Graphs
Recently, large language models (LLMs) have been widely employed in graph-related tasks, where they outperform traditional GNN-based methods and yield SOTA performance. According to the role played by LLMs in graph-related tasks, some methods utilize LLMs as an enhancer [12, 24, 47], where LLMs are used to improve the quality of node features. Some methods directly utilize LLMs as a predictor [3, 14, 37, 49], where the graph structure is described in natural language and fed to LLMs for prediction. Additionally, some methods employ LLMs as an annotator [4], generator [46], and controller [36]. Although GraphEdit [9] utilizes LLMs for graph structure learning, it focuses on identifying noisy connections in the original graph rather than addressing the adversarial robustness problem in graphs. In this paper, we adopt the LLM as a defender, which utilizes LLMs to purify attacked graph structures, making GNNs more robust.

B IMPLEMENTATION DETAILS OF BASELINES IN SECTION 3
For all baselines, we use their original code and the default hyper-parameter settings in the authors' implementations. The sources are listed as follows:
• TAPE: TAPE Repository
• OFA-Llama2-7B/SBert: OneForAll Repository
• GCN-Llama2-7B/SBert/e5-large: Graph-LLM Repository

C EXPERIMENT DETAILS

C.1 Datasets
In this paper, we use TAPE-Arxiv23 [12], which has up-to-date and rich texts, to construct the instruction dataset, and we use the following popular datasets commonly adopted for node classification: Cora [27], Citeseer [6], Pubmed [32], OGBN-Arxiv [13], and OGBN-Products [13].
Note that for OGBN-Arxiv, given its large scale with 169,343 nodes and 1,166,243 edges, we adopt a node sampling strategy [10] to obtain a subgraph containing 14,167 nodes and 33,502 edges (a sampling sketch is given after the dataset sources below). For the larger OGBN-Products, which has 2 million nodes and 61 million edges, we use the same sampling technique to construct a subgraph containing 12,394 nodes and 29,676 edges. We give a detailed description of each dataset in Table 9. The sources of the datasets are listed as follows:
• Cora, Pubmed, OGBN-Arxiv, TAPE-Arxiv23: TAPE Repository (MIT license)
• OGBN-Products: LLM-Structured-Data Repository (MIT license)
• Citeseer: Graph-LLM Repository (MIT license)
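For reference, a minimal sketch of the subgraph construction above is given below, assuming the sampling reduces to drawing a uniform node set and keeping its induced subgraph (the actual procedure follows [10]); the dataset root path and the sampled size are illustrative.

import torch
from ogb.nodeproppred import PygNodePropPredDataset
from torch_geometric.utils import subgraph

# Load the full OGBN-Arxiv graph (169,343 nodes).
dataset = PygNodePropPredDataset(name="ogbn-arxiv", root="data/")
data = dataset[0]

# Uniformly sample the target number of nodes and keep the induced subgraph.
num_sampled = 14_167  # subset size reported in Table 9
node_idx = torch.randperm(data.num_nodes)[:num_sampled]
edge_index, _ = subgraph(node_idx, data.edge_index,
                         relabel_nodes=True, num_nodes=data.num_nodes)
x, y = data.x[node_idx], data.y[node_idx].squeeze()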
C.2 Baselines
• GCN: GCN is a popular graph convolutional network based on spectral theory.
• GAT: GAT is composed of multiple attention layers, which can learn different weights for different nodes in the neighborhood. It is often used as a baseline for defending against adversarial attacks.
• RGCN: RGCN models node representations as Gaussian distributions to mitigate the impact of adversarial attacks, and employs an attention mechanism to penalize nodes with high variance.
• Sim-PGCN: Sim-PGCN employs a 𝑘NN graph to maintain the proximity of nodes with similar features in the representation space and uses self-learned regularization to preserve the remoteness of nodes with differing features.
• ProGNN: ProGNN adopts three regularizations of graphs, i.e., feature smoothness, low-rank, and sparsity, and learns a clean adjacency matrix to defend against adversarial attacks.
• STABLE: STABLE is a pre-training model which is specifically designed to learn effective representations to refine graph structures.
• HANG-quad: HANG-quad incorporates conservative Hamiltonian flows with Lyapunov stability into various GNNs to improve their robustness against adversarial attacks.
• GraphEdit: GraphEdit utilizes LLMs to identify noisy connections and uncover implicit relations among non-connected nodes in the original graph.

C.3 Implementation Details
We use DeepRobust, an adversarial attack repository, to implement all the attack methods as well as GCN, GAT, RGCN, and Sim-PGCN. We implement ProGNN, STABLE, and HANG-quad with the code provided by the authors.
Table 9: Dataset descriptions.
• Cora: 2,708 nodes, 5,429 edges, 7 classes, 1,433 features (BoW). After stemming and removing stopwords, there is a vocabulary of 1,433 unique words. All words with a document frequency of less than 10 were removed.
• Citeseer: 3,186 nodes, 4,225 edges, 6 classes, 3,113 features (BoW). Stopwords and words that appear fewer than 10 times in the documents are removed, and a 0/1-valued word vector represents the presence of the corresponding words in the dictionary.
• PubMed: 19,717 nodes, 44,338 edges, 3 classes, 500 features (TF-IDF). Each publication in the dataset is described by a TF-IDF weighted word vector from a dictionary consisting of 500 unique words.
• OGBN-Arxiv (subset): 14,167 nodes, 33,520 edges, 40 classes, 128 features (skip-gram). The embeddings of individual words are computed by running the skip-gram model [28] over the MAG [39] corpus.
• OGBN-Products (subset): 12,394 nodes, 29,676 edges, 47 classes, 100 features (BoW). Node features are generated by extracting BoW features from the product descriptions, followed by Principal Component Analysis to reduce the dimension to 100.
• TAPE-Arxiv23: 46,198 nodes, 78,548 edges, 40 classes, 300 features (word2vec). The embeddings of individual words are computed by running the word2vec model.
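The features above ship with the datasets; purely as an illustration, a roughly comparable binary bag-of-words construction (English stopword removal, document-frequency threshold of 10, 0/1-valued vectors) could be built with scikit-learn as follows, where raw_texts is a hypothetical placeholder for the per-node texts.

from sklearn.feature_extraction.text import CountVectorizer

# raw_texts: one raw text (e.g., title plus abstract) per node; placeholder corpus.
raw_texts = ["graph neural network robustness under adversarial attacks"] * 20

# Binary BoW: drop English stopwords and words appearing in fewer than 10 documents.
vectorizer = CountVectorizer(stop_words="english", min_df=10, binary=True)
features = vectorizer.fit_transform(raw_texts)  # sparse 0/1 matrix [num_nodes, vocab_size]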

For each graph, following existing works [19, 22], we randomly split the nodes into 10% for training, 10% for validation, and 80% for testing. We generate attacks on each graph according to the perturbation rate, and all the hyper-parameters of the attack methods are the same as in the authors' implementations.
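As an illustration of this attack-generation step, a minimal DeepRobust sketch for poisoning Cora with Mettack at a 5% perturbation rate could look as follows; the device and rate are illustrative, and argument defaults may differ slightly across DeepRobust versions.

import numpy as np
from deeprobust.graph.data import Dataset
from deeprobust.graph.defense import GCN
from deeprobust.graph.global_attack import Metattack

data = Dataset(root="/tmp/", name="cora")
adj, features, labels = data.adj, data.features, data.labels
idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
idx_unlabeled = np.union1d(idx_val, idx_test)

# Surrogate GCN trained on the clean graph; Mettack uses it to compute meta-gradients.
surrogate = GCN(nfeat=features.shape[1], nhid=16, nclass=labels.max().item() + 1,
                with_relu=False, device="cpu").to("cpu")
surrogate.fit(features, adj, labels, idx_train)

# Flip 5% of the edges (the perturbation rate is varied in the experiments).
n_perturbations = int(0.05 * (adj.sum() // 2))
attacker = Metattack(model=surrogate, nnodes=adj.shape[0], feature_shape=features.shape,
                     attack_structure=True, attack_features=False, device="cpu").to("cpu")
attacker.attack(features, adj, labels, idx_train, idx_unlabeled,
                n_perturbations, ll_constraint=False)
modified_adj = attacker.modified_adj  # poisoned adjacency used to train the victim GNNs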
For LLM4RGNN, we use Mistral-7B as our local LLM. Based on GPT-4, we construct approximately 26,000 instances for tuning LLMs and use the LoRA method to achieve parameter-efficient fine-tuning. To address the potential problem of label imbalance in training the LM-based edge predictor, we select the 4,000 node pairs with the lowest cosine similarity to construct the candidate set. We set the hyper-parameters as follows. For the local LLM, when no purification occurs, the purification threshold 𝛽 is fixed at 2 to prevent deleting too many edges; otherwise, it is set to 4. For the LM-based edge predictor, the threshold 𝛾 is tuned from {0.91, 0.93, 0.95, 0.97, 0.99} and the number of edges 𝐾 is tuned from {1, 3, 5, 7, 9}.
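Purely as an illustration of the parameter-efficient fine-tuning setup, a minimal LoRA sketch with HuggingFace transformers and peft is given below; the checkpoint name, rank, and target modules are assumptions rather than the exact configuration used.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Train low-rank adapters on the GPT-4-distilled instruction data instead of the full model.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,                   # illustrative values
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are updated during tuning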
To facilitate fair comparisons, we tune the parameters of the various baselines using a grid-search strategy. Unless otherwise specified, we adopt the default parameter settings in the authors' implementations. For GCN, we set the hidden size to 256 for OGBN-Arxiv and 16 for the other datasets. For GAT, the hidden size is 128 for OGBN-Arxiv and 8 for the other datasets. For RGCN and Sim-PGCN, the hidden size is 256 for OGBN-Arxiv and 128 for the other datasets. For Sim-PGCN, the weighting parameter 𝜆 is searched from {0.1, 0.5, 1, 5, 10, 50, 100} and 𝛾 is searched from {0.01, 0.1}. For ProGNN, we use the default hyper-parameter settings in the authors' implementation. For STABLE, we tune the Jaccard similarity threshold 𝑡1 from {0.0, 0.01, 0.02, 0.03, 0.04, 0.05}, 𝑘 from {1, 3, 5, 7, 11, 13}, and 𝛼 from −0.5 to 3. For HANG-quad, we tune the time from {3, 6, 8, 15, 20, 25}, the hidden size from {16, 32, 64, 128}, and the dropout from {0.2, 0.4, 0.6, 0.8}. For all experiments, we select the optimal hyper-parameters on the validation set and apply them to the test set. The sources of the baselines are listed as follows:
• GCN: DeepRobust GCN
• GAT: DeepRobust GAT
• RGCN: DeepRobust RGCN
• Sim-PGCN: DeepRobust Sim-PGCN
• ProGNN: ProGNN Repository
• STABLE: STABLE Repository
• HANG-quad: HANG-quad Repository
• GraphEdit: GraphEdit Repository

C.4 Computing Environment and Resources
The checkpoint of the local LLM and the source code will be publicly available after the review. The implementation of the proposed LLM4RGNN utilizes the PyG library. The experiments are conducted in a computing environment with the following specifications:
• OS: Linux Ubuntu 5.15.0-102-generic.
• CPU: Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz.
• GPU: NVIDIA A800 80GB.

D MORE EXPERIMENT RESULTS

D.1 Inductive Poisoning Attack
We further verify the generalization ability of LLM4RGNN under inductive poisoning attacks. We conduct inductive experiments with Mettack on the Cora and Citeseer datasets. Specifically, we randomly split the data into training, validation, and test sets with a 1:8:1 ratio. During training, we ensure the removal of test nodes and their connected edges from the graph. We perform Mettack attacks on the validation set, purify the attacked graph using LLM4RGNN, and use the purified graph to train GNNs. The trained GNNs are then evaluated on the clean test set. In Table 11, we only report the baselines that support the inductive setting. Experimental results show that, under the inductive setting, LLM4RGNN not only consistently improves the robustness of various GNNs but also surpasses the robust GCN framework HANG-quad, demonstrating its superior defensive capability in the inductive setting.
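A minimal sketch of the inductive graph construction (removing test nodes and their incident edges before training) is given below, using a toy PyG graph as a stand-in for Cora; the variable names and toy sizes are illustrative.

import torch
from torch_geometric.data import Data
from torch_geometric.utils import subgraph

# Toy graph standing in for Cora; in practice `data` is the loaded dataset.
data = Data(x=torch.randn(100, 16),
            edge_index=torch.randint(0, 100, (2, 400)),
            y=torch.randint(0, 7, (100,)))
data.test_mask = torch.zeros(100, dtype=torch.bool)
data.test_mask[torch.randperm(100)[:10]] = True  # 1:8:1 split, so 10% of nodes are test nodes

# Training graph: test nodes and their incident edges are removed (inductive setting).
keep = (~data.test_mask).nonzero(as_tuple=True)[0]
train_edge_index, _ = subgraph(keep, data.edge_index,
                               relabel_nodes=False, num_nodes=data.num_nodes)
# Mettack and the LLM4RGNN purification operate on this training graph; the trained GNN
# is then evaluated on the clean graph's test nodes.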
Table 10: Performance comparison between Mistral-7B and Llama3-8B against Mettack. Each cell reports Mistral-7B / Llama3-8B.

Dataset   Ptb Rate   AdvEdge (↓)                 ACC (↑) w/o EP             ACC (↑) Full
Cora      0%         - / -                       83.78±0.38 / 83.72±0.30    84.13±0.33 / 84.34±0.56
Cora      5%         28 (0.53%) / 30 (0.57%)     80.48±0.19 / 80.60±0.27    81.76±0.69 / 81.98±0.55
Cora      10%        60 (1.14%) / 63 (1.19%)     79.83±0.43 / 79.96±0.39    81.80±0.76 / 81.77±0.44
Cora      20%        102 (1.93%) / 107 (2.03%)   78.68±0.28 / 79.22±0.34    81.41±0.77 / 81.71±0.51
Citeseer  0%         - / -                       73.39±0.69 / 73.41±0.65    74.20±0.56 / 73.84±0.50
Citeseer  5%         30 (0.71%) / 25 (0.59%)     72.17±0.57 / 72.44±0.52    73.94±0.56 / 73.28±1.19
Citeseer  10%        55 (1.30%) / 47 (1.11%)     71.50±0.59 / 71.32±0.58    73.62±0.39 / 73.20±0.68
Citeseer  20%        92 (2.18%) / 91 (2.15%)     71.07±0.35 / 70.85±0.64    74.12±0.85 / 74.06±0.78

   


Figure 7: Analysis of the hyper-parameter 𝛾 and 𝐾. (a) Clean Cora. (b) Cora-Mettack-10%. (c) Clean Citeseer. (d) Citeseer-Mettack-10%. Each panel plots node classification accuracy against 𝐾.

Table 11: Node classification accuracy (% ± 𝜎) under Inductive Mettack. The best results are in bold.

Dataset   Ptb Rate   HANG-quad    GCN (Vanilla)   GCN (LLM4RGNN)   GAT (Vanilla)   GAT (LLM4RGNN)
Cora      0%         82.09±0.40   84.36±0.28      84.25±0.16       84.35±0.62      84.39±0.61
Cora      5%         75.85±0.67   77.35±1.56      83.17±0.17       81.20±0.93      83.68±0.26
Cora      10%        70.95±1.26   72.79±1.49      83.02±0.34       78.55±1.12      83.48±0.49
Cora      20%        58.81±1.99   61.56±2.64      82.69±0.65       67.65±2.69      83.14±0.69
Citeseer  0%         73.15±0.94   73.76±0.34      74.07±0.51       73.84±0.57      74.26±0.69
Citeseer  5%         71.50±0.55   70.60±0.56      73.35±0.35       72.47±0.82      73.25±0.91
Citeseer  10%        69.42±0.64   67.64±0.73      72.68±0.41       71.11±1.02      73.25±0.40
Citeseer  20%        67.90±1.21   61.35±1.35      72.54±0.36       65.76±2.85      73.32±0.52

Table 12: Against Mettack-20%, node classification accuracy (% ± 𝜎) of LLM4RGNN under different text quality. Ptb Rate here denotes the text perturbation rate.

Dataset   Ptb Rate   GCN          GAT          RGCN         Sim-PGCN
Cora      0%         81.41±0.77   81.00±0.98   81.39±0.44   81.20±0.63
Cora      10%        80.41±0.71   80.72±0.49   80.54±0.61   81.14±0.47
Cora      20%        79.86±0.42   80.12±0.69   80.14±0.80   80.75±0.40
Cora      40%        79.54±1.51   79.88±1.18   80.07±1.22   80.18±1.11
Citeseer  0%         74.12±0.85   73.94±0.67   74.04±0.70   73.70±0.64
Citeseer  10%        73.21±0.98   73.65±0.71   73.10±1.01   73.68±0.82
Citeseer  20%        73.55±0.66   73.02±0.66   73.30±0.93   73.84±0.78
Citeseer  40%        72.67±0.70   72.62±1.09   73.18±0.94   73.47±0.55

D.2 More Hyper-parameter Sensitivity
We conduct more hyper-parameter experiments on the probability threshold 𝛾 and the number of important edges 𝐾. The results are shown in Figure 7, demonstrating that the performance of the proposed LLM4RGNN is stable under various parameter configurations and consistently outperforms existing SOTA methods.

D.3 Comparative Experiment between Llama3-8B and Mistral-7B
In Section 5.3.2, the newly released Llama3-8B and Mistral-7B achieved close performance. Considering that Llama3-8B was released after all experiments were completed, we report additional comparative experiments between Mistral-7B and Llama3-8B. According to Table 10, we observe that both the well-tuned Mistral-7B and Llama3-8B can effectively identify malicious edges, and the performance gap between them is negligible.

D.4 The Impact of Text Quality
Considering that LLM4RGNN relies on textual information for reasoning, we further analyze the impact of node text quality on the effectiveness of LLM4RGNN. Specifically, against the worst-case scenario of 20% Mettack, we further add random text replacement perturbations to the Cora and Citeseer datasets to reduce the text quality of nodes. As shown in Table 12, experimental results show that under 10%, 20%, and 40% text perturbations, the performance of LLM4RGNN only decreases by an average of 0.54%, 0.77%, and 1.14%, respectively. Its robustness consistently surpasses that of existing robust GNN frameworks, demonstrating that LLM4RGNN maintains superior robustness even with lower text quality. One possible explanation is that LLMs have strong robustness to text perturbations [2], and LLM4RGNN fully inherits this capability.
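One simple way to implement such random text replacement at a given rate is sketched below; this is an illustrative implementation rather than the exact perturbation procedure, and the vocabulary is a placeholder.

import random

def perturb_text(text: str, vocab: list, ptb_rate: float, seed: int = 0) -> str:
    """Replace a ptb_rate fraction of whitespace tokens with random vocabulary words."""
    rng = random.Random(seed)
    tokens = text.split()
    num_replace = int(ptb_rate * len(tokens))
    for pos in rng.sample(range(len(tokens)), k=num_replace):
        tokens[pos] = rng.choice(vocab)
    return " ".join(tokens)

# Example: degrade 20% of the tokens in a node's title and abstract.
noisy = perturb_text("A Neural Network Model of Memory Consolidation relies on the hippocampus",
                     vocab=["graph", "random", "matrix", "spectrum"], ptb_rate=0.2)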
E CASE STUDY
In this section, we show some cases of using GPT-4 and the well-tuned local LLM (Mistral-7B) to infer the relationships between nodes. It can be observed that the well-tuned Mistral-7B matches the edge relation inference ability of GPT-4: both infer the edge relations and provide an analysis by discussing the background, problems, methods, and applications of the two nodes.

E.1 TAPE-Arxiv23 (GPT-4)
User content: Node 1→Title: when renewable energy meets building thermal mass a real time load management scheme\nAbstract: We consider the optimal power management in renewable driven smart building MicroGrid under noise corrupted conditions as a stochastic optimization problem. We first propose our user satisfaction and electricity consumption balanced (USECB) profit model as the objective for optimal power management. We then cast the problem in noise corrupted conditions into the class of expectation maximizing in stochastic optimization problem with convex constraints. For this task, we design a Bregman projection based mirror descent algorithm as an approximation solution to our stochastic optimization problem. Convergence and upper-bound of our algorithm with proof are also provided in our paper. We then conduct a broad type of experiment in our simulation to test the justification of our model as well as the effectiveness of our algorithm.
Node 2→Title: hiding sound in image by k lsb mutation\nAbstract: In this paper a novel approach to hide sound files in a digital image is proposed and implemented such that it becomes difficult to conclude about the existence of the hidden data inside the image. In this approach, we utilize the rightmost k-LSB of pixels in an image to embed MP3 sound bits into a pixel. The pixels are so chosen that the distortion in image would be minimized due to embedding. This requires comparing all the possible permutations of pixel values, which would lead to exponential time computation. To speed up this, Cuckoo Search (CS) could be used to find the most optimal solution. The advantage of using proposed CS is that it is easy to implement and is very effective at converging in relatively less iterations/generations.

Analysis: The two nodes represent research in different domains: Node 1 discusses renewable energy and building thermal mass in the context of smart grids, while Node 2 focuses on hiding sound files in images using k-LSB mutation and Cuckoo Search. There is no direct connection between the optimization of renewable energy in buildings and the embedding of sound files in images. The topics are unrelated, and the research methods and applications are distinct.
Relevance Score: 1

E.2 Cora (Mistral-7B)
User content: Node 1→Title: A Neural Network Model of Memory Consolidation\nAbstract: Some forms of memory rely temporarily on a system of brain structures located in the medial temporal lobe that includes the hippocampus. The recall of recent events is one task that relies crucially on the proper functioning of this system. As the event becomes less recent, the medial temporal lobe becomes less critical to the recall of the event, and the recollection appears to rely more upon the neocortex. It has been proposed that a process called consolidation is responsible for transfer of memory from the medial temporal lobe to the neocortex. We examine a network model proposed by P. Alvarez and L. Squire designed to incorporate some of the known features of consolidation, and propose several possible experiments intended to help evaluate the performance of this model under more realistic conditions. Finally, we implement an extended version of the model that can accommodate varying assumptions about the number of areas and connections within the brain and memory capacity, and examine the performance of our model on Alvarez and Squire's original task.
Node 2→Title: A Preliminary Investigation of Evolution as a Form Design Strategy\nAbstract: We describe the preliminary version of our investigative software, GGE Generative Genetic Explorer, in which genetic operations interact with AutoCAD to generate novel 3D forms for the architect. GGE allows us to assess how evolutionary algorithms should be tailored to suit Architecture CAD tasks.

Analysis: The two nodes represent research in different domains: Node 1 discusses a neural network model of memory consolidation in the brain, while Node 2 focuses on a software tool for generating 3D forms in architecture using evolutionary algorithms. There is no direct connection between the brain's memory consolidation process and the evolution of 3D forms in architecture. The topics are unrelated, and the expertise required for each is distinct.
Relevance Score: 1
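Purely for illustration, the user content shown in these cases can be assembled from a node pair's title and abstract, and the relevance score parsed from the model's reply, along the following lines; the helper names below are ours and not the exact LLM4RGNN prompt template.

import re
from typing import Optional

def build_user_content(node1: dict, node2: dict) -> str:
    """Format a node pair in the 'Node 1→Title/Abstract, Node 2→Title/Abstract' style above."""
    return (f"Node 1→Title: {node1['title']}\nAbstract: {node1['abstract']}\n"
            f"Node 2→Title: {node2['title']}\nAbstract: {node2['abstract']}")

def parse_relevance_score(reply: str) -> Optional[int]:
    """Extract the integer after 'Relevance Score:'; return None if the reply is malformed."""
    match = re.search(r"Relevance Score:\s*(\d+)", reply)
    return int(match.group(1)) if match else None

reply = "Analysis: The two nodes represent research in different domains ...\nRelevance Score: 1"
score = parse_relevance_score(reply)  # 1; low scores flag an edge as likely malicious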
