Chap7 GNN (20240229) - DL4H Practioner Guide

Uploaded by

thesupprter
“output” — 2024/3/4 — 6:48 — page 132 — #135

7 Graph Neural Network

Previously, we discussed how to use convolutional neural networks or recurrent neural
networks to encode regular grid-like data. However, many real-world data are irregular
and are represented as graphs. Specifically, a graph is a data structure that consists of a
set of nodes and a set of edges connecting these nodes. Graph Neural Networks (GNNs)
are a specialized class of neural networks designed for processing such network data.
GNNs have been applied across healthcare domains that involve diverse graphs, including
molecular representations [56, 57, 58], spatiotemporal networks [59], drug-drug
interaction networks [60], protein-protein interaction networks [61], gene expression
networks [62], and biomedical knowledge graphs and medical ontologies [63, 64].
A unique strength of GNNs lies in their capacity to model intricate relationships
among entities. Consider a medical network: its nodes could represent patients, diseases,
medications, or healthcare providers, while the edges encode various relations between
these concepts, such as "patient A has disease B" or "drug A could interact with drug
B." Among deep learning models, GNNs are particularly good at understanding
the connectivity between entities and extracting nuanced information from
graph-structured data based on neighborhood similarity patterns.
One of the most exciting applications of Graph Neural Networks (GNNs) is in
drug discovery. GNNs provide a powerful tool for predicting the characteristics of
prospective drug candidates by representing a molecule as a graph, where nodes
represent atoms and edges represent chemical bonds. With this representation, GNNs
can predict important properties such as the solubility or toxicity of a new molecule,
thus expediting the drug discovery process [65]. This approach can streamline the
identification of promising compounds and is an active area of research in drug
development.
In this chapter, we introduce different variants of graph neural networks, including
graph convolutional networks (GCN) [66], graph attention networks (GAT) [67], and
message passing neural networks (MPNN) [68]. We also present two hands-on GNN
applications in healthcare: 1) GNN application in drug property prediction, and 2)
GNN pipelines for clinical predictive tasks using the PyHealth package.

7.1 Introduction to Graph Neural Networks

Learning representations of graph structures, such as node embeddings or edge
embeddings, has found numerous applications in modeling real-world graphs, including
protein networks [69], social networks [70], and co-author networks [66]. Traditional
graph-based models use well-defined heuristics for learning graph node embeddings.
For example, DeepWalk [71] borrows ideas from random walks on graphs, LINE [72]
leverages local and global node proximity, and Node2Vec [73] relies on both
breadth-first search (BFS) and depth-first search (DFS) over the graphs.
In 2016, Kipf and Welling proposed graph convolutional networks (GCN) [74],
which elegantly inject the graph adjacency information into a neural network architec-
ture (simply aggregating node features by matrix product with adjacency matrix) and
show consistently better performance against traditional baselines.
Extending GCN, graph attention networks (GAT), proposed by Velickovic et al.
(2017) [67], dynamically learn edge weights of the networks. While GCN and GAT
learn embedding vectors for nodes but not for edges, message-passing neural networks
(MPNN), developed by Gilmer et al. (2017) [68], provide a method to learn embedding
vectors for both nodes and edges.
In this chapter, we conduct an in-depth analysis of these models, accompanied by
an illustration depicted in Figure 7.1, aiming to facilitate a better understanding of
different neighborhood aggregation mechanisms.

Figure 7.1 Illustration of three graph neural network models

Graph convolution networks (GCN)


Kipf et al. [66] introduced graph convolution networks (GCN) to extend the concept
of image convolution (i.e., convolution on a grid-like graph) to any arbitrary graph. In this
approach, a graph structure is defined over a set of nodes, denoted by $\mathcal{N}$, where the
number of nodes is $|\mathcal{N}| = N$. Additionally, each node has a feature vector $x \in \mathbb{R}^{d}$,
and we stack the feature vectors of all $N$ nodes to form a matrix $X \in \mathbb{R}^{N \times d}$. For
instance, the graph can represent a molecular graph where each node is an atom with a
$d$-dimensional feature vector (e.g., atomic number, atomic mass, valence, aromaticity,
etc.) and edges denote the chemical bonds.
Let $A \in \{0, 1\}^{N \times N}$ be the adjacency matrix of the graph structure, which
indicates node connections: if node $i$ and node $j$ are connected in the graph, $A[i, j] = 1$,
otherwise $A[i, j] = 0$. Additionally, we denote by $E$ the edge set of the graph:
$(i, j) \in E$ if $A[i, j] = 1$, which means that there is an edge between nodes $i$ and $j$. Note
that GNNs work mostly with undirected graphs, i.e., $A[i, j] = 1$ implies $A[j, i] = 1$,
and thus $A$ is symmetric.
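The adjacency matrix defined above can be sketched in a few lines of plain Python. This is only an illustration: the helper names `build_adjacency` and `is_symmetric` and the three-node toy graph are assumptions made for this example, not part of the chapter's code.

```python
def build_adjacency(num_nodes, edges):
    """Return a 0/1 adjacency matrix (list of lists) for an undirected graph."""
    A = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        A[i][j] = 1
        A[j][i] = 1  # undirected graph: mirror every edge
    return A


def is_symmetric(A):
    """Check A[i][j] == A[j][i] for all pairs."""
    n = len(A)
    return all(A[i][j] == A[j][i] for i in range(n) for j in range(n))


# hypothetical toy graph: node 0 is connected to nodes 1 and 2
edges = [(0, 1), (0, 2)]
A = build_adjacency(3, edges)
print(A)
print(is_symmetric(A))
```

Storing the graph as a dense matrix is convenient for small examples; large sparse graphs are usually kept as edge lists or sparse matrices instead.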
Example. For certain prediction tasks, such as forecasting the COVID-19 cases
at every county in the US, traditional DNN models might treat each county as an
independent training sample and build a multi-layer model to predict the counts based
on county-level features. Differently, GCN models could leverage the spatio-temporal
connections between counties. It first connects nearby counties together to form an
adjacency graph A since they are geographically similar and thus share similar COVID
responses. Then, GCN models will leverage the graph structure A to aggregate the
features among nearby counties to forecast the target.
Formally, one graph convolution layer could be defined as,
$$H^{(t)} = \mathrm{ReLU}\left(\tilde{A} H^{(t-1)} W^{(t)}\right), \tag{7.1}$$
where the initial hidden embeddings, $H^{(0)} = X$, are the county-level features. $\tilde{A}$
represents the normalized adjacency matrix (we will discuss it below) and $W^{(t)}$ represents
the layer-wise parameter matrix ($t$ is the layer index). In comparison, the layer-wise
propagation of a simple DNN can be represented as $H^{(t)} = \mathrm{ReLU}(H^{(t-1)} W^{(t)})$, and
the difference is the multiplication by the adjacency matrix. Essentially, the GNN adds
dependency between different counties when making the predictions.

Normalizing the graph adjacency matrix. For numerical stability purposes, the graph
learning algorithm usually requires a normalized adjacency matrix. There are two
common ways of normalizing the adjacency matrix: random walk normalization and
symmetric normalization.
The matrix $A$ is symmetric by definition. Researchers often add a self-loop to
connect a node with itself and improve the numeric stability of graph operations. This
is achieved by setting $A[i, i] = 1, \forall i$, which is equivalent to adding an identity matrix
to the adjacency.
$$A \leftarrow A + I. \tag{7.2}$$
Random Walk Adjacency Normalization: The first type is named random walk
normalization, which calculates the degree of each node, resulting in a matrix
$D \in \mathbb{N}^{N \times N}$. $D$ is a diagonal matrix and each diagonal element is the
corresponding row sum of the self-looped adjacency matrix.
$$d_{ii} = \sum_{j=1}^{N} a_{ij}. \tag{7.3}$$
Then, the random walk normalization will normalize $A$ over each row, making the
row sum become 1, which aligns with the concept of the random walk transition matrix
in a Markov process (we will discuss that in Chapter 9). Thus, it is named the random walk
normalized adjacency $\tilde{A}_{rw}$.
$$\tilde{A}_{rw} = D^{-1} A. \tag{7.4}$$
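Example: Consider a simple graph with the following adjacency matrix:
$$A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}$$
Adding the self-loop gives:
$$A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}$$
Calculating the degree matrix $D$, we have:
$$D = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}$$
Then, the random walk normalized adjacency $\tilde{A}_{rw}$ is:
$$\tilde{A}_{rw} = D^{-1} A = \begin{pmatrix} \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \frac{1}{2} & \frac{1}{2} & 0 \\ \frac{1}{2} & 0 & \frac{1}{2} \end{pmatrix}$$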
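The worked example can be checked with a short, dependency-free sketch. The function names `add_self_loops` and `random_walk_normalize` are chosen here for illustration only, and exact fractions are used so the result can be compared entry by entry with the matrix above.

```python
from fractions import Fraction


def add_self_loops(A):
    """Set every diagonal entry to 1, i.e. A <- A + I for a graph without self-loops."""
    n = len(A)
    return [[1 if i == j else A[i][j] for j in range(n)] for i in range(n)]


def random_walk_normalize(A):
    """Divide each row by its row sum (the node degree), i.e. D^{-1} A.

    Assumes every row sum is non-zero, which holds once self-loops are added.
    """
    return [[Fraction(a, sum(row)) for a in row] for row in A]


A = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
A_loop = add_self_loops(A)
A_rw = random_walk_normalize(A_loop)
for row in A_rw:
    print([str(x) for x in row])
```

Each row of `A_rw` sums to 1, which is what makes it usable as a random walk transition matrix.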
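Symmetric Adjacency Normalization: The second type uses the same degree matrix
$D$, but applies it on both sides of $A$ to keep the output matrix symmetric.
$$\tilde{A}_{sym} = D^{-\frac{1}{2}} A D^{-\frac{1}{2}}. \tag{7.5}$$
Example: Continuing from the previous example with the self-looped adjacency matrix:
$$A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}$$
and the degree matrix:
$$D = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}$$
The symmetrically normalized adjacency $\tilde{A}_{sym}$ is calculated as follows:
$$\tilde{A}_{sym} = D^{-\frac{1}{2}} A D^{-\frac{1}{2}}
= \begin{pmatrix} \sqrt{\frac{1}{3}} & 0 & 0 \\ 0 & \sqrt{\frac{1}{2}} & 0 \\ 0 & 0 & \sqrt{\frac{1}{2}} \end{pmatrix}
\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \sqrt{\frac{1}{3}} & 0 & 0 \\ 0 & \sqrt{\frac{1}{2}} & 0 \\ 0 & 0 & \sqrt{\frac{1}{2}} \end{pmatrix}
= \begin{pmatrix} \frac{1}{3} & \sqrt{\frac{1}{6}} & \sqrt{\frac{1}{6}} \\ \sqrt{\frac{1}{6}} & \frac{1}{2} & 0 \\ \sqrt{\frac{1}{6}} & 0 & \frac{1}{2} \end{pmatrix}$$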
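The symmetric normalization can be verified numerically in the same spirit. The sketch below uses only the standard library; the name `symmetric_normalize` is an assumption for this example, and it relies on the fact that entry $(i, j)$ of $D^{-1/2} A D^{-1/2}$ equals $a_{ij} / \sqrt{d_{ii} d_{jj}}$ because $D$ is diagonal.

```python
import math


def symmetric_normalize(A):
    """Compute D^{-1/2} A D^{-1/2}, where D holds the row sums of A.

    Assumes every row sum is positive (true after adding self-loops).
    """
    deg = [sum(row) for row in A]
    n = len(A)
    return [[A[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


# self-looped adjacency matrix from the example
A_loop = [[1, 1, 1], [1, 1, 0], [1, 0, 1]]
A_sym = symmetric_normalize(A_loop)
for row in A_sym:
    print([round(x, 4) for x in row])
```

Unlike the random walk version, the rows of `A_sym` do not sum to 1 in general, but the matrix stays symmetric.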

In fact, Equation (7.1) can also be rewritten as a node-level updating formula (with
self-loop and random walk normalization applied to the adjacency matrix),
$$Z^{(t)} = H^{(t-1)} W^{(t)}, \tag{7.6}$$
$$\alpha_{uv}^{(t)} = \frac{1}{|\mathcal{N}(u)| + 1}, \quad \forall (u, v) \in E, \tag{7.7}$$
$$h_{u}^{(t)} = \mathrm{ReLU}\left( \sum_{v \in \mathcal{N}(u)} \alpha_{uv}^{(t)} z_{v}^{(t)} \right). \tag{7.8}$$
Here, $\mathcal{N}(u)$ means the neighborhood set of node $u$.


• Equation (7.6) is the same as in a DNN model: a linear transformation.
• Equation (7.7) is the weight score for all edges connected to node $u$. With the random
walk normalization, the GCN model treats each edge (within the neighborhood of
node $u$) equally and consistently uses $\frac{1}{|\mathcal{N}(u)|+1}$ as the weight (the denominator
is the number of neighbors of $u$, $|\mathcal{N}(u)|$, plus 1 for the self-loop). Essentially,
$\alpha_{uv}^{(t)}$ is the $(u, v)$ element of the random walk normalized adjacency $D^{-1} A$
(after adding the self-loop).
• Equation (7.8) takes the weighted sum over the neighborhood set (including the
node $u$ itself) and applies the ReLU function to add non-linearity.
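To make the equivalence between the matrix form (7.1) and the node-level form (7.6)-(7.8) concrete, here is a minimal sketch in plain Python. The toy features `H`, the weights `W`, and all function names are assumptions made for this illustration; a real model would use learned weights and tensor operations.

```python
def relu(x):
    return max(0.0, x)


def matmul(P, Q):
    """Plain list-of-lists matrix product."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def gcn_layer_matrix(A_rw, H, W):
    """Matrix form, Equation (7.1): ReLU(A_rw H W)."""
    return [[relu(x) for x in row] for row in matmul(A_rw, matmul(H, W))]


def gcn_layer_nodewise(neighbors, H, W):
    """Node-level form, Equations (7.6)-(7.8)."""
    Z = matmul(H, W)                                   # (7.6)
    out = []
    for u, nbrs in enumerate(neighbors):
        members = nbrs + [u]                           # neighborhood plus self-loop
        alpha = 1.0 / len(members)                     # (7.7): 1 / (|N(u)| + 1)
        agg = [alpha * sum(Z[v][k] for v in members) for k in range(len(Z[0]))]
        out.append([relu(x) for x in agg])             # (7.8)
    return out


# toy graph from the normalization example: node 0 is linked to nodes 1 and 2
neighbors = [[1, 2], [0], [0]]
A_rw = [[1 / 3, 1 / 3, 1 / 3], [1 / 2, 1 / 2, 0.0], [1 / 2, 0.0, 1 / 2]]
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # hypothetical node features
W = [[1.0, -1.0], [2.0, 0.5]]              # hypothetical layer weights

out_matrix = gcn_layer_matrix(A_rw, H, W)
out_nodes = gcn_layer_nodewise(neighbors, H, W)
print(out_matrix)
print(out_nodes)
```

Both functions produce the same embeddings up to floating-point rounding, which is the point of the rewriting: the adjacency product is just an average over each node's neighborhood.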
The PyTorch implementation of the GCN model can be found below. Readers may also
refer to this repository 1, where we provide a notebook on applying the GCN
model to Zachary’s karate club dataset 2.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphConvolutionLayer(nn.Module):
    def __init__(self, in_features, out_features):
        super(GraphConvolutionLayer, self).__init__()
        # linear transformation W (nn.Linear also adds a bias term)
        self.transform = nn.Linear(in_features, out_features)

    def forward(self, adjacency_matrix, input_features):
        """
        adjacency_matrix: adjacency matrix, N x N
        input_features: node feature matrix, N x d_i
        """
        # apply the adjacency matrix first
        adj_X = adjacency_matrix @ input_features
        output = self.transform(adj_X)
        return output


class GCN(nn.Module):
    def __init__(self, num_features, hidden_dim, num_classes):
        super(GCN, self).__init__()
        # define two graph convolutional layers
        self.layer1 = GraphConvolutionLayer(num_features, hidden_dim)
        self.layer2 = GraphConvolutionLayer(hidden_dim, num_classes)

    def forward(self, adjacency_matrix, features):
        # add relu between two GCN layers
        h = F.relu(self.layer1(adjacency_matrix, features))
        output = self.layer2(adjacency_matrix, h)
        return output


model = GCN(num_features=10, hidden_dim=128, num_classes=1)

1 https://github.com/sunlabuiuc/pyhealth-book/tree/main/chap7-GNN/notebook
2 https://en.wikipedia.org/wiki/Zachary’s_karate_club

• GraphConvolutionLayer class: This class defines a single graph convolutional
layer. It takes in the number of input features and the number of output features
as parameters. Inside the constructor (__init__), it initializes a linear transfor-
mation (self.transform) which will be applied to the input features. In the
forward method, it takes an adjacency matrix and input features as input, per-
forms the graph convolution operation by multiplying the adjacency matrix with
the input features, and then applies the linear transformation to the result.
• GCN class: This class represents the entire graph convolutional network. It takes
the number of input features, hidden dimension size, and the number of output
classes as parameters. In the constructor, it initializes two graph convolutional
layers (GraphConvolutionLayer) with appropriate input and output feature
dimensions. In the forward method, it performs the forward pass through the
network. It first applies the first graph convolutional layer followed by a ReLU
activation function. Then, it applies the second graph convolutional layer to
produce the final output.
GCN for Directed Graphs. For directed graphs, the adjacency matrix A is not neces-
sarily symmetric: the presence of an edge from node i to node j does not guarantee an
edge in the opposite direction. To adapt GCN to directed graphs, we can, for example,
(1) use different normalized adjacency matrices for inbound and outbound edges, apply
separate convolutional operations for each, and then combine the results; or (2) include
edge features in the convolutional operation. Both modifications can enable the GCN
to capture the directional properties of the edges.
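Option (1) can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: the learned weight matrices are omitted, the two aggregated views are combined by simple concatenation (one of several possible choices), and the helper names are hypothetical.

```python
def matmul(P, Q):
    """Plain list-of-lists matrix product."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def row_normalize(A):
    """Divide each row by its sum; rows without edges stay all-zero."""
    out = []
    for row in A:
        s = sum(row)
        out.append([a / s if s else 0.0 for a in row])
    return out


def transpose(A):
    return [list(col) for col in zip(*A)]


def directed_aggregate(A, H):
    """Aggregate along outbound and inbound edges separately, then concatenate."""
    A_out = row_normalize(A)              # row u averages over nodes that u points to
    A_in = row_normalize(transpose(A))    # row u averages over nodes pointing to u
    H_out = matmul(A_out, H)
    H_in = matmul(A_in, H)
    return [h_o + h_i for h_o, h_i in zip(H_out, H_in)]


# hypothetical directed graph with edges 0 -> 1, 0 -> 2, 1 -> 2
A = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
H = [[1.0], [2.0], [4.0]]                 # 1-d toy node features
print(directed_aggregate(A, H))
```

In a full layer, each of the two aggregated views would typically pass through its own linear transformation before being combined.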

Graph attention networks (GAT)


The GAT model was introduced by Velickovic et al. [67]; it incorporates the attention
mechanism (discussed in Chapter 5) into the GCN model. The main difference is
that GCN implicitly treats all edges as equal, while GAT learns to use the attention
mechanism between two connected nodes to re-weight the edges. The attention score
is used as the edge importance in graph convolution. The layer-wise propagation
in GAT can be formulated similarly at the node level as,
$$Z^{(t)} = H^{(t-1)} W^{(t)}, \tag{7.9}$$
$$e_{uv}^{(t)} = \mathrm{LeakyReLU}\left( \left[ z_{u}^{(t)} \,\|\, z_{v}^{(t)} \right] \theta^{(t)} \right), \quad \forall (u, v) \in E, \tag{7.10}$$
$$\alpha_{uv}^{(t)} = \frac{\exp\left(e_{uv}^{(t)}\right)}{\sum_{k \in \mathcal{N}(u)} \exp\left(e_{uk}^{(t)}\right)}, \tag{7.11}$$
$$h_{u}^{(t)} = \mathrm{ReLU}\left( \sum_{v \in \mathcal{N}(u)} \alpha_{uv}^{(t)} z_{v}^{(t)} \right). \tag{7.12}$$
Here, both the layer-wise weight matrix $W^{(t)}$ and the attention weight $\theta^{(t)}$ are
parameters.
• Equation (7.10) calculates the attention score using a non-linear neural network (the
third approach in Section 5.2.3). Basically, we concatenate the embeddings of
two connected nodes, $u$ and $v$, and then apply a linear transformation with a
LeakyReLU activation.
• Equation (7.11) applies the Softmax activation on the attention score, ensuring that
the normalized attention scores sum up to 1 for the neighborhood of node $u$.
In GCN, these two steps are merged into one with equal weights in the same
neighborhood.
• Equation (7.12) similarly takes the weighted sum over the neighborhood set
(including the node $u$ itself) and applies the ReLU function to add non-linearity.
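The three equations can be traced for a single node with a dependency-free sketch. The transformed embeddings `Z`, the attention parameters `theta`, and the function names are hypothetical values chosen for illustration; the LeakyReLU slope of 0.05 is likewise just an example setting.

```python
import math


def leaky_relu(x, slope=0.05):
    return x if x > 0 else slope * x


def attention_weights(u, neighbors, Z, theta):
    """Equations (7.10)-(7.11): softmax-normalized attention of node u over its neighborhood."""
    scores = {}
    for v in neighbors:
        concat = Z[u] + Z[v]                              # [z_u || z_v]
        scores[v] = leaky_relu(sum(c * t for c, t in zip(concat, theta)))
    top = max(scores.values())                            # subtract the max for numerical stability
    exps = {v: math.exp(s - top) for v, s in scores.items()}
    total = sum(exps.values())
    return {v: e / total for v, e in exps.items()}


def gat_update(u, neighbors, Z, theta):
    """Equation (7.12): attention-weighted sum followed by ReLU."""
    alpha = attention_weights(u, neighbors, Z, theta)
    dim = len(Z[u])
    return [max(0.0, sum(alpha[v] * Z[v][k] for v in neighbors)) for k in range(dim)]


Z = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # hypothetical transformed embeddings
theta = [0.5, -0.5, 1.0, 0.25]             # hypothetical attention parameters (length 2 * dim)
neighborhood = [0, 1, 2]                   # node 0 with its self-loop and two neighbors

alpha = attention_weights(0, neighborhood, Z, theta)
h0 = gat_update(0, neighborhood, Z, theta)
uniform = attention_weights(0, neighborhood, Z, [0.0, 0.0, 0.0, 0.0])
print(alpha, h0, uniform)
```

With all attention parameters set to zero, every neighbor receives the same weight, which recovers the equal-weight behaviour of GCN described above.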
The PyTorch implementation of GAT is presented below, and we provide the note-
book showing its application on Karate dataset in this repository 3 .
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphAttentionLayer(nn.Module):
    def __init__(self, in_features, out_features):
        super(GraphAttentionLayer, self).__init__()
        self.in_features = in_features
        self.out_features = out_features

        self.W = nn.Linear(in_features, out_features)
        self.a = nn.Linear(2 * out_features, 1)
        self.leakyrelu = nn.LeakyReLU(0.05)  # leaky relu has a hyper param

    def forward(self, adj, X):
        """
        adj: adjacency matrix, N x N
        X: node feature matrix, N x d_i
        """
        # step 1
        # self.W is an nn.Linear module, so it is called rather than multiplied
        h = self.W(X)
        N, _ = h.shape

        # step 2
        # attention input: concatenating embedding of node u and node v
        a_input = torch.cat([h.unsqueeze(1).repeat(1, N, 1),
                             h.unsqueeze(0).repeat(N, 1, 1)], 2)
        # calculate attention scores, N x N
        e = self.leakyrelu(self.a(a_input).squeeze(2))
        # add mask to attention matrix
        zero_vec = -1e15 * torch.ones_like(e)
        attention = torch.where(adj > 0, e, zero_vec)
        # apply softmax to attention
        attention = F.softmax(attention, dim=1)

        # step 3
        # use attention to reweight the nodes and sum them up
        h = torch.matmul(attention, h)
        return h


class GAT(nn.Module):
    def __init__(self, num_features, hidden_dim, num_classes=4):
        super(GAT, self).__init__()
        self.layer1 = GraphAttentionLayer(num_features, hidden_dim)
        self.layer2 = GraphAttentionLayer(hidden_dim, num_classes)

    def forward(self, adjacency_matrix, features):
        h = F.relu(self.layer1(adjacency_matrix, features))
        output = self.layer2(adjacency_matrix, h)
        return output


model = GAT(num_features=10, hidden_dim=128, num_classes=1)

3 https://github.com/sunlabuiuc/pyhealth-book/tree/main/chap7-GNN/notebook

The main difference lies in the GraphAttentionLayer class. This class defines a
single graph attention layer. It takes in the number of input features and the number
of output features as parameters. Inside the constructor (__init__), it initializes
two linear transformations (self.W and self.a); self.W is applied to the input
features and self.a to the concatenated embeddings of node pairs. Additionally, it
initializes a leaky ReLU activation function (self.leakyrelu). In the forward method,
it takes an adjacency matrix (adj) and input features (X) as input. It first applies a
linear transformation to the input features (h = self.W(X), step 1). Then, it calculates
attention scores using a learned attention mechanism (step 2): it concatenates the
embeddings of node pairs, calculates attention scores, applies a mask to the attention
matrix, and applies a softmax function to get attention weights. Finally, it reweights
the nodes based on the attention weights (step 3).

Message Passing Neural Network (MPNN)


MPNN, proposed in [68], is a framework that generalizes GCN and GAT. It unifies different
graph convolution variants into two stages: (i) aggregate messages from neighbors; (ii)
use the message to update the node representation.
$$m_{u}^{(t)} = \mathrm{Aggregate}^{(t)}\left( h_{u}^{(t-1)}, \left\{ h_{v}^{(t-1)} : v \in \mathcal{N}(u) \right\} \right), \quad \forall u \in V, \tag{7.13}$$
$$h_{u}^{(t)} = \mathrm{Update}^{(t)}\left( h_{u}^{(t-1)}, m_{u}^{(t)} \right). \tag{7.14}$$
Here, $V$ denotes the node set of the graph.

• Equation (7.13) takes the last-layer embeddings from the neighborhood set of node $u$ and
aggregates them into a message $m_{u}^{(t)}$. For example, the aggregation function in
GCN is an equally weighted sum while the aggregation in GAT is an attention-weighted
sum.
• Then, Equation (7.14) takes the message $m_{u}^{(t)}$ as well as the previous embedding of
node $u$ as two inputs and updates the current embedding by an update function.
For example, the update function in both GCN and GAT adds the previous
embedding to the message and then applies a non-linear transformation to obtain the
updated embedding $h_{u}^{(t)}$.
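The two-stage view can be expressed as a small generic routine in which the aggregate and update functions are passed in as arguments. This is a sketch under simplifying assumptions: there are no learned parameters, the GCN-style choices shown are only one possible instantiation, and all names are introduced here for illustration.

```python
def message_passing_step(H, neighbors, aggregate, update):
    """One round of Equations (7.13)-(7.14) applied to every node."""
    new_H = []
    for u, h_u in enumerate(H):
        m_u = aggregate(h_u, [H[v] for v in neighbors[u]])   # (7.13)
        new_H.append(update(h_u, m_u))                        # (7.14)
    return new_H


def mean_aggregate(h_u, neighbor_states):
    """Equal-weight aggregation over the neighbors and the node itself (GCN-style)."""
    states = neighbor_states + [h_u]
    return [sum(vals) / len(states) for vals in zip(*states)]


def relu_update(h_u, m_u):
    """h_u is unused because mean_aggregate already folded it into the message."""
    return [max(0.0, x) for x in m_u]


H = [[3.0, -3.0], [0.0, 6.0], [3.0, 0.0]]   # hypothetical node embeddings
neighbors = [[1, 2], [0], [0]]               # node 0 is linked to nodes 1 and 2
H1 = message_passing_step(H, neighbors, mean_aggregate, relu_update)
print(H1)
```

Swapping `mean_aggregate` for an attention-weighted sum would give a GAT-style layer without changing `message_passing_step`, which is the sense in which MPNN generalizes both models.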

PyTorch implementations of various MPNN models can be found in this repository 4.
Below, we show one type of MPNN implementation from a recent drug recommendation
paper [75] (refer to Equation 7 and Equation 8 of the paper), which uses MPNN for
drug molecule graph representation. More concrete applications on the Karate dataset
can be found in this repository 5.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MessagePassingLayer(nn.Module):
    def __init__(self, in_features, out_features):
        super(MessagePassingLayer, self).__init__()
        self.message_passing = nn.Linear(2 * in_features, out_features)
        self.read_out = nn.Linear(out_features, out_features)

    def forward(self, adj, X):
        """
        adj: adjacency matrix, N x N
        X: node feature matrix, N x d_i
        """
        N, _ = X.shape
        # combine features of node u and node v
        # X.unsqueeze(1).repeat(1, N, 1): N x N x d_i, repeat X at dimension 1
        # X.unsqueeze(0).repeat(N, 1, 1): N x N x d_i, repeat X at dimension 0
        # Z: N x N x 2d_i, features of every node pair
        Z = torch.cat([X.unsqueeze(1).repeat(1, N, 1),
                       X.unsqueeze(0).repeat(N, 1, 1)], 2)
        # apply message passing to aggregate their features
        # Z: N x N x 2d_i -> N x N x d_o
        Z = self.message_passing(Z)
        # apply graph convolution to sum the embedding of (u,v) pairs
        # N x N x d_o, N x N -> N x N x d_o
        # use Einstein summation to perform a batch matrix multiplication
        Z = torch.einsum("abc,ad->dbc", Z, adj)
        # N x d_o
        # extract the diagonal elements from the tensor Z
        Z = Z[torch.arange(N), torch.arange(N)]
        # apply the read out function
        # the updated representations for each node after message passing
        H = self.read_out(torch.relu(Z))
        return H


class MPNN(nn.Module):
    def __init__(self, num_features, hidden_dim, num_classes=4):
        super(MPNN, self).__init__()
        self.layer1 = MessagePassingLayer(num_features, hidden_dim)
        self.layer2 = MessagePassingLayer(hidden_dim, num_classes)

    def forward(self, adjacency_matrix, features):
        h = F.relu(self.layer1(adjacency_matrix, features))
        output = self.layer2(adjacency_matrix, h)
        return output


model = MPNN(num_features=10, hidden_dim=128, num_classes=1)

4 https://github.com/priba/nmp_qc
5 https://github.com/sunlabuiuc/pyhealth-book/tree/main/chap7-GNN/notebook

Essentially, GCN, GAT, MPNN, and their variants are fundamental tools for learning
graph node embeddings, and the embedding vectors could be used for various different
purposes, such as node classification, node value regression, edge classification, graph
classification, etc.

7.2 Advancements in Graph Neural Networks

Deep learning on graphs has made much exciting progress in both practical deploy-
ments and various application domains. This section will cover some advancements in
graph neural network research.

7.2.1 Improvements on GNN training


To apply common GNN models (including GCN, GAT, etc) on real-world graphs,
some issues require attention from researchers.
• Neighborhood sampling. First, the real-world graph is often large (tens of millions
of nodes on the graph), and each node could connect to many edges (e.g., a dense
graph). Applying graph convolution on such graphs is expensive and sometimes
unnecessary. To approximate the convolution effect from all neighbors, Graph-
SAGE [76] leverages Monte-Carlo methods and samples a small subset of the
neighbors via edge sampling to calculate the neighborhood aggregation effects.
Various experiments have shown that this method can greatly reduce the computational
overhead of graph convolution while providing similar outcomes.
• Distributed graph training. GraphSAGE could alleviate the dense graph issue to
some extent. However, some real-world graphs could be huge (such as social
networks on Facebook) and may not fit into the memory of a single machine, or
a single party could not access the whole graph structure. To address this issue,
researchers have proposed methods of distributed graph learning at scale [77, 78]
that decompose the large graph into different subgraphs and run graph learning
algorithms in a distributed way. Then, the model broadcasts the gradients from
other machines to sync the global parameters.

• Learning on heterogeneous graphs. Real graphs (such as medical concept graphs)
can contain different types of concepts (such as patient node, medication node,
diagnosis node) and various relations among nodes. In contrast, standard graph
learning algorithms can only be applied to simple graphs. Thus, recent researchers
also propose graph learning algorithms [79, 80, 81, 82] on these heterogeneous
graphs for learning embeddings of different types of entities.
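The neighborhood sampling idea from the first item of this list can be illustrated with a small sketch. It is written in the spirit of GraphSAGE-style sampling but is not the GraphSAGE implementation: the function names, the uniform sampling without replacement, and the toy graph are all assumptions made for this example.

```python
import random


def sample_neighbors(neighbors, u, k, rng):
    """Return at most k neighbors of u, drawn uniformly without replacement."""
    nbrs = neighbors[u]
    if len(nbrs) <= k:
        return list(nbrs)
    return rng.sample(nbrs, k)


def sampled_mean(H, neighbors, u, k, rng):
    """Monte-Carlo estimate of the mean neighbor feature of node u."""
    picked = sample_neighbors(neighbors, u, k, rng)
    if not picked:                      # isolated node: fall back to its own features
        return list(H[u])
    dim = len(H[u])
    return [sum(H[v][d] for v in picked) / len(picked) for d in range(dim)]


rng = random.Random(0)                           # fixed seed for reproducibility
neighbors = {0: list(range(1, 101))}             # hypothetical node 0 with 100 neighbors
H = {v: [float(v)] for v in range(101)}          # 1-d toy features
estimate = sampled_mean(H, neighbors, 0, 10, rng)
exact = sampled_mean(H, neighbors, 0, 100, rng)  # k covers all neighbors, so no sampling
print(estimate, exact)
```

The estimate touches only 10 of the 100 neighbors, so its cost is bounded by the sample size rather than by the node degree; the price is some variance around the exact mean.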

7.2.2 Applications of GNN in Different Domains


GNN methods have also benefited many other fields exposed to structured data, such
as computer vision and natural language processing.

GNN in computer vision


In computer vision, researchers use graph neural networks to build the prediction
models on various data modalities, such as 2D natural images [83], videos [84], 3D
graphics data [85], vision + language [86], as well as medical images [87]. Recent
survey papers [88, 89] provide a holistic view of GNN applications in vision.
In 2D images, the graph could be constructed from various perspectives, such as (i)
label space [83], where each node represents a label (using word embedding as initial
features) and their inter-dependencies as edges; (ii) coordinate space graph [90]; (iii)
region graphs [91] considering spatial and temporal similarities; (iv) patch graphs [92]
from the original image by splitting it into patches (such as 4-by-4) and then connect
the patches into graphs via content similarity or local similarity.
Similarly, for 3D videos, various types of graphs have been constructed, such as (i)
[84] connects all objects in the frames as a space-time region graph while the edges are
based on spatial-temporal relations in consecutive frames; (ii) [93] connects joints in
the same frame according to the natural connectivity in the human skeleton as a graph;
(iii) [94] uses object detection graph to encode its temporal structure and predicts
whether an edge is active or non-active to obtain the connected edges.
In medical imaging, different types of graphs are built that could improve the analysis of
human brain activities, such as (i) [95] divides the brain images into regions while the
inter-region connections can be viewed as graph edges; (ii) [87] connects the imaging
features of different subjects as vertices in the graph for population-level brain
analysis; (iii) [96] uses a Transformer to process the dynamic brain graph
constructed from spatial-temporal attention.

GNN in natural language processing (NLP)


In natural language processing (NLP), research on graph neural networks could be
decomposed into two parts, as summarized in this survey paper [97].
First, many methods have been proposed for graph construction, including constructing
static graphs [98] (such as dependency graphs [99], AMR graphs [100], similarity
graphs, co-occurrence graphs, co-reference graphs [101], constituency graphs),
dynamic graphs [102] (such as node embedding based similarity graphs, attention-based
similarity graphs [35], cosine similarity graphs, structure-aware similarity graphs
[103]), or hybrid graphs for the application of GNNs. The nodes in the graphs
could be homogeneous [104] (such as molecular graphs with atoms as nodes),
heterogeneous [81] (such as disease-drug bipartite graphs), or multi-relational [105]
(such as EHR knowledge graphs that contain patients, drugs, diagnosis codes, etc.).
Second, with these diverse graphs, GNN models have benefited various applications
in many different problem settings, such as graph-to-sequence [106], graph-to-tree [107],
and graph-to-graph translations [108], and various NLP tasks, such as natural
language generation [103], question answering [109], information extraction [110],
knowledge graph reasoning [111], etc.

7.3 Graph Neural Networks for Drug Property Prediction

The structural composition of drug molecules plays a pivotal role in determining
their properties, as specific molecular arrangements are intricately linked to distinct
characteristics. For example, particular molecular structures may bolster efficacy, while
others might influence factors such as toxicity or bioavailability.
GNNs are good at leveraging the intricate representations inherent in molecular
graph structures, showcasing promise in predicting diverse drug properties, as evi-
denced by notable studies [112, 113, 114]. Typically, leveraging GNNs for predicting
the properties of drug molecules involves a two-stage procedure:
• Node Embedding: Utilizing multiple graph convolutional or attention layers, GNNs
generate node embeddings that encapsulate the structural characteristics of the
molecule. These embeddings are designed to capture hierarchical representations
by discerning the molecular context at various scales.
• Property Prediction: Post-generation of node embeddings, prior studies typically
implement a pooling layer to aggregate these embeddings into the overall molec-
ular graph embedding. Subsequently, a final one or two-layer neural network
is employed on top of the graph embedding to forecast whether the molecule
exhibits specific properties. This prediction task commonly assumes a binary
classification framework, aiming to discern the presence or absence of specific
properties within the molecule.
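The second stage can be sketched in plain Python as mean pooling followed by a single logistic unit. Everything concrete here is an assumption for illustration: the node embeddings and classifier weights are made-up numbers, mean pooling is only one of several possible pooling choices, and a trained model would learn the weights.

```python
import math


def mean_pool(node_embeddings):
    """Average the node embeddings into one graph-level embedding."""
    n = len(node_embeddings)
    return [sum(col) / n for col in zip(*node_embeddings)]


def predict_active(node_embeddings, weights, bias):
    """Linear layer plus sigmoid on the pooled embedding: probability that the property is present."""
    g = mean_pool(node_embeddings)
    logit = sum(w * x for w, x in zip(weights, g)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# hypothetical embeddings for a 3-atom molecule and hand-picked classifier weights
emb = [[1.0, 2.0], [3.0, 0.0], [2.0, 1.0]]
graph_embedding = mean_pool(emb)
prob = predict_active(emb, weights=[0.5, -1.0], bias=0.0)
print(graph_embedding, prob)
```

Because the pooling step averages over atoms, the graph embedding has the same size for molecules of any length, which is what allows one classifier to serve the whole dataset.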
This section will show a concrete pipeline of applying GNNs on molecular structures
represented as SMILES strings to predict their molecular properties. The dataset is about
SARS coronavirus 3C-like protease, and we will build a model to predict whether a
molecule is active against the 3C-like protease.

7.3.1 Introduction of AID1706 SARS CoV 3CL Dataset


SARS-CoV, identified in 2003, causes severe acute respiratory syndrome (SARS). The
virus encodes a polypeptide that is processed by two main proteases, one being the
3C-like protease (3CLpro). This protease is essential for viral gene expression and is
a potential target for inhibitors that could block SARS-CoV replication. Studies have
shown that 3CLpro is essential for the viral life cycle, making it a critical pathogenic
component of SARS-CoV. The AID1706 SARS CoV 3CL dataset 6 records tens of
thousands of molecules that could potentially be active against 3CLpro.
We load this dataset from the DeepPurpose package 7, which contains 26640 molecules
represented as SMILES strings, each associated with a binary "active or not"
label. For convenience, we only use the first 1000 SMILES to build the training and
test datasets.
from DeepPurpose.dataset import *

# load AID1706 Assay Data
X_drugs, _, y = load_AID1706_SARS_CoV_3CL()

print(len(X_drugs), len(y))
"""
26640 26640
"""

# we use the first 1000 molecules for this demo
X_drugs = X_drugs[:1000]
y = y[:1000]
Let us dive deeper and show what the dataset looks like:
# We look at the first 10 SMILES strings
print(X_drugs[:10])
"""
['CC1=C(SC(=N1)NC(=O)COC2=CC=CC=C2OC)C' 'CC1=CC=C(C=C1)C(=O)NCCCN2CCOCC2'
 'CSC1=CC=C(C=C1)C(=O)NC2CCSC3=CC=CC=C23'
 'CCOC(=O)N1CCC(CC1)N2CC34C=CC(O3)C(C4C2=O)C(=O)NC5=CC=C(C=C5)C'
 'CC1=CC(=NN1C(=O)C2=CC(=CC(=C2)[N+](=O)[O-])[N+](=O)[O-])C'
 'CC1=CC=C(C=C1)C(=O)CSC2=NN=C(N2CC3=CC=CO3)CNC4=C(C=C(C=C4)C)C'
 'CC(C1=CC(=C(C=C1)Cl)Cl)NC(=O)CCl'
 'CCOC(=O)CN1CC23C=CC(O2)C(C3C1=O)C(=O)NC4=CC5=C(C=C4)OCO5'
 'COC(=O)C1=CC=C(C=C1)COC(=O)C2=CC(=C(N=C2)Cl)Cl'
 'C1=CC=C2C(=C1)C=C(C(=O)O2)C3=C(C=C(C=C3)NC(=O)CC4=CC=C(C=C4)Cl)Cl']
"""

# We look at the first 10 labels
print(y[:10])
"""
[0 0 0 0 0 1 1 0 0 0]
"""

7.3.2 Dataset Split and Processing


Given the clean SMILES strings and their labels, we would like to process each
SMILES string into a molecule graph (taking the atoms as nodes and the bonds as edges
to extract the adjacency graph) and then learn the graph representations to predict the
final binary label.
6 https://pubchem.ncbi.nlm.nih.gov/bioassay/1706
7 https://github.com/kexinhuang12345/DeepPurpose

First, we combine all SMILES and labels to be a pandas.DataFrame and split the
data into training and test portions by 80% and 20%.
    import numpy as np
    import pandas as pd

    # Split the dataset 80% : 20%
    data = pd.DataFrame(np.stack([X_drugs, y]).T, columns=["SMILES", "Label"])
    n_train = int(len(data) * 0.8)
    train, test = data.iloc[:n_train], data.iloc[n_train:]

    print(train)

    """
                                                    SMILES  Label
    0                 CC1=C(SC(=N1)NC(=O)COC2=CC=CC=C2OC)C      0
    1                      CC1=CC=C(C=C1)C(=O)NCCCN2CCOCC2      0
    2               CSC1=CC=C(C=C1)C(=O)NC2CCSC3=CC=CC=C23      0
    3    CCOC(=O)N1CCC(CC1)N2CC34C=CC(O3)C(C4C2=O)C(=O)...      0
    4    CC1=CC(=NN1C(=O)C2=CC(=CC(=C2)[N+](=O)[O-])[N+...      0
    ..                                                 ...    ...
    795  COC1=CC=CC=C1N(CC(=O)NC2=CC(=C(C=C2)Cl)C(=O)OC...      1
    796                       COC1=C(C=C(C=C1Cl)C(=O)NN)Cl      1
    797  C1CC(C2=CC=CC=C2C1)NC(=O)CCC(=O)N3CCN(CC3)S(=O...      0
    798  CC(=O)NC1=CC=C(C=C1)N(C(C2=CC=C(C=C2)OC)C(=O)N...      1
    799  CC(C)N(CC1=CC=CC=C1)CC(COC2=CC=CC3=C2C(=CN3)CC...      0

    [800 rows x 2 columns]
    """
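The split above takes rows in their stored order. If the rows happen to be ordered (for example by assay batch), a shuffled split with a fixed seed may be preferable. The following is a minimal plain-Python sketch of that alternative, not the chapter's method; the function name shuffled_split and the toy records are hypothetical.

```python
import random


def shuffled_split(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of `items` with a fixed seed, then cut it in two."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical (SMILES, label) records standing in for the DataFrame rows.
records = [(f"mol_{i}", i % 2) for i in range(10)]
train_records, test_records = shuffled_split(records)
print(len(train_records), len(test_records))
```

Fixing the seed keeps the split reproducible across runs, so results remain comparable.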

To process the SMILES strings, we use the rdkit package (footnote 8). The main entry
point is the create_dataset function below, which takes the training or test
DataFrame and loops over its rows. Within the for-loop, smiles is the SMILES string
of the current molecule, and property is its label.
    def create_dataset(data_in, radius=2):
        dataset = []

        for smiles, property in data_in.values:
            try:
                # Create each data item with the helper functions defined below.
                mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
                atoms = create_atoms(mol, atom_dict)
                molecular_size = len(atoms)
                i_jbond_dict = create_ijbonddict(mol, bond_dict)
                fingerprints = extract_fingerprints(radius, atoms, i_jbond_dict,
                                                    fingerprint_dict, edge_dict)
                adjacency = Chem.GetAdjacencyMatrix(mol)

                # Convert the numpy arrays to PyTorch tensors (on the CPU).
                fingerprints = torch.LongTensor(fingerprints)
                adjacency = torch.FloatTensor(adjacency)

                dataset.append((fingerprints, adjacency, molecular_size,
                                int(property)))
            except Exception:
                # Skip SMILES strings that RDKit fails to process.
                continue

        return dataset

8 https://www.rdkit.org/

We use Chem.MolFromSmiles to parse the SMILES string into an RDKit molecule and
Chem.AddHs to add explicit hydrogen atoms. Then, we use the create_atoms,
create_ijbonddict, and extract_fingerprints functions to extract the nodes, edges,
and initial node features (we use molecular fingerprints) for each molecule graph. The
adjacency matrix is created by the Chem.GetAdjacencyMatrix function. Finally, the
fingerprints (as features), the adjacency matrix, the molecule size, and the label are
combined into a tuple and appended to the dataset list.
The try-except block skips SMILES strings that the Chem module fails to process.
All the auxiliary functions (create_atoms, create_ijbonddict, and
extract_fingerprints) are defined below.
    from collections import defaultdict
    import numpy as np
    from rdkit import Chem
    import torch


    def create_atoms(mol, atom_dict):
        """Transform the atom types in a molecule (e.g., H, C, and O)
        into indices (e.g., H=0, C=1, and O=2).
        Note that each atom index takes aromaticity into account.
        """
        atoms = [a.GetSymbol() for a in mol.GetAtoms()]
        for a in mol.GetAromaticAtoms():
            i = a.GetIdx()
            atoms[i] = (atoms[i], 'aromatic')
        atoms = [atom_dict[a] for a in atoms]
        return np.array(atoms)


    def create_ijbonddict(mol, bond_dict):
        """Create a dictionary, in which each key is a node ID
        and each value is the list of (neighboring node ID,
        chemical bond ID) tuples (e.g., single and double bonds).
        """
        i_jbond_dict = defaultdict(list)
        for b in mol.GetBonds():
            i, j = b.GetBeginAtomIdx(), b.GetEndAtomIdx()
            bond = bond_dict[str(b.GetBondType())]
            i_jbond_dict[i].append((j, bond))
            i_jbond_dict[j].append((i, bond))
        return i_jbond_dict


    def extract_fingerprints(radius, atoms, i_jbond_dict,
                             fingerprint_dict, edge_dict):
        """Extract the fingerprints from a molecular graph
        based on the Weisfeiler-Lehman algorithm.
        """

        if (len(atoms) == 1) or (radius == 0):
            nodes = [fingerprint_dict[a] for a in atoms]

        else:
            nodes = atoms
            i_jedge_dict = i_jbond_dict

            for _ in range(radius):

                # Update each node ID considering its neighboring nodes and
                # edges. The updated node IDs are the fingerprint IDs.
                nodes_ = []
                for i, j_edge in i_jedge_dict.items():
                    neighbors = [(nodes[j], edge) for j, edge in j_edge]
                    fingerprint = (nodes[i], tuple(sorted(neighbors)))
                    nodes_.append(fingerprint_dict[fingerprint])

                # Also update each edge ID considering
                # its two nodes on both sides.
                i_jedge_dict_ = defaultdict(list)
                for i, j_edge in i_jedge_dict.items():
                    for j, edge in j_edge:
                        both_side = tuple(sorted((nodes[i], nodes[j])))
                        edge = edge_dict[(both_side, edge)]
                        i_jedge_dict_[i].append((j, edge))

                nodes = nodes_
                i_jedge_dict = i_jedge_dict_

        return np.array(nodes)
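The relabeling loop in extract_fingerprints can be hard to follow at first read. Below is a minimal plain-Python sketch of a single Weisfeiler-Lehman round on a toy three-atom chain. It is a simplification under stated assumptions: the graph, the labels, and the names wl_round and signature_ids are illustrative, and the edge relabeling step of the full function is omitted.

```python
from collections import defaultdict

# Toy graph: a 3-atom chain C-C-O with single bonds (edge label 0).
# Node labels: C=0, O=1. All values are illustrative assumptions.
node_labels = [0, 0, 1]
neighbors = {0: [(1, 0)], 1: [(0, 0), (2, 0)], 2: [(1, 0)]}

# Each unseen (own label, sorted neighborhood) signature gets the next free ID.
signature_ids = defaultdict(lambda: len(signature_ids))


def wl_round(labels, adjacency_list, table):
    """One Weisfeiler-Lehman relabeling round over all nodes."""
    new_labels = []
    for i in sorted(adjacency_list):
        neighborhood = tuple(sorted((labels[j], e) for j, e in adjacency_list[i]))
        new_labels.append(table[(labels[i], neighborhood)])
    return new_labels


round1 = wl_round(node_labels, neighbors, signature_ids)
print(round1)
```

The two carbon atoms start with the same label but receive different IDs after one round, because only the middle carbon has an oxygen neighbor. Repeating the round lets each ID summarize a larger neighborhood, which is the role of the radius argument.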

Finally, we use the create_dataset function to process the training and test sets.
The atom_dict, bond_dict, fingerprint_dict, and edge_dict dictionaries are shared
between the training and test sets. Only fingerprint_dict is used afterwards: its
size gives the number of distinct fingerprints, which determines the size of the
learnable embedding table for the initial atom features.
    # Initialize each x_dict, in which each key is a symbol type
    # (e.g., atom and chemical bond) and each value is its index.
    atom_dict = defaultdict(lambda: len(atom_dict))
    bond_dict = defaultdict(lambda: len(bond_dict))
    fingerprint_dict = defaultdict(lambda: len(fingerprint_dict))
    edge_dict = defaultdict(lambda: len(edge_dict))

    dataset_train = create_dataset(train[["SMILES", "Label"]])
    dataset_test = create_dataset(test[["SMILES", "Label"]])

    N_fingerprints = len(fingerprint_dict)
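The pattern defaultdict(lambda: len(d)) used for the four dictionaries may look unusual. A minimal sketch of how it behaves, with a hypothetical dictionary name and toy tokens:

```python
from collections import defaultdict

# A dictionary that assigns the next free integer to every new key.
# The name symbol_to_index and the tokens are illustrative only.
symbol_to_index = defaultdict(lambda: len(symbol_to_index))

tokens = ["C", "O", "C", "N", "O"]
indices = [symbol_to_index[t] for t in tokens]

print(indices)
print(len(symbol_to_index))
```

Looking up a missing key calls the factory before the key is stored, so the first new key gets 0, the second gets 1, and so on. Because the test set can introduce fingerprints that never occur in the training set, N_fingerprints is read only after both sets have been processed.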

Let us now look at the first two molecules in the training set. Each entry, as shown
in the example below, includes:


• The fingerprint indices of the atoms in the molecule (as a tensor), which are used
to look up the learnable atom embeddings in the molecular graph neural network
class defined later.
• The adjacency matrix of the molecular graph (as a tensor).
• The molecule size, i.e., the number of atoms (such as 36).
• The molecule property label (such as 0).

    # Look at the first 2 data points in the training set.
    # Each has the structure (fingerprints, adjacency, molecular_size, property).
    print(dataset_train[:2])

    """
    [(tensor([17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30, 29, 28,
              31, 32, 33, 34, 34, 34, 35, 36, 36, 37, 37, 37, 37, 38, 38, 38, 34, 34, 34]),
      tensor([[0., 1., 0., ..., 0., 0., 0.],
              [1., 0., 1., ..., 0., 0., 0.],
              [0., 1., 0., ..., 0., 0., 0.],
              ...,
              [0., 0., 0., ..., 0., 0., 0.],
              [0., 0., 0., ..., 0., 0., 0.],
              [0., 0., 0., ..., 0., 0., 0.]]), 36, 0),

     (tensor([46, 47, 48, 48, 49, 48, 48, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
              58, 57, 34, 34, 34, 37, 37, 37, 37, 60, 61, 61, 62, 62, 61, 61, 61, 61, 36,
              36, 36, 36, 61, 61]),
      tensor([[0., 1., 0., ..., 0., 0., 0.],
              [1., 0., 1., ..., 0., 0., 0.],
              [0., 1., 0., ..., 0., 0., 0.],
              ...,
              [0., 0., 0., ..., 0., 0., 0.],
              [0., 0., 0., ..., 0., 0., 0.],
              [0., 0., 0., ..., 0., 0., 0.]]), 41, 0)]
    """

7.3.3 Graph neural networks for molecules


Next, we define our graph neural network (GNN) for the molecules, which are represented
as graphs with atoms as nodes and chemical bonds as edges. We denote the atom set
(node set) by $\mathcal{N}$, where $|\mathcal{N}| = N$ is the number of nodes in the
molecular graph. We denote the bond set (edge set) by $\mathcal{E}$; any two connected
atoms $u, v \in \mathcal{N}$ form a bond $(u, v) \in \mathcal{E}$. Each atom and bond in
the molecule graph has associated features, such as atom type, solubility, bond type,
electronegativity, or other physicochemical properties.
Assuming the initial node embeddings are represented as
$H^{(0)} \in \mathbb{R}^{N \times d}$, we adopt a simple variant of the GCN model with
the following update function,

$$Z^{(t)} = \mathrm{ReLU}\left(H^{(t-1)} W^{(t)}\right), \tag{7.15}$$
$$H^{(t)} = (A + I)\, Z^{(t)}, \tag{7.16}$$

where we first apply a non-linear projection to the hidden embeddings of the previous
layer and then multiply by the un-normalized adjacency matrix with self-loops, $A + I$,
to aggregate the neighborhood information into the current hidden node embeddings.
(Normalizing the adjacency matrix can prevent potential numerical issues, but the
un-normalized version works fine here.) The output graph embedding is calculated by
summing over all the node embeddings, and a final linear layer then produces the score
from which the predicted probability of the molecule property is obtained.
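To see Eqs. (7.15) and (7.16) at work, the following plain-Python sketch applies one layer to a three-node path graph and then sums the node embeddings into a graph embedding. The embeddings and the identity weight matrix are made-up values chosen to keep the arithmetic easy; the helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    """Apply max(0, v) to every entry."""
    return [[max(0.0, v) for v in row] for row in m]


# Path graph 0 - 1 - 2: A + I has ones on the diagonal and for each bond.
A_plus_I = [[1, 1, 0],
            [1, 1, 1],
            [0, 1, 1]]

H_prev = [[1.0, -1.0],    # H^{(t-1)}: one 2-dimensional embedding per node
          [0.5, 2.0],
          [-1.0, 1.0]]
W = [[1.0, 0.0],          # W^{(t)}: identity, an assumption for easy arithmetic
     [0.0, 1.0]]

Z = relu(matmul(H_prev, W))                        # Eq. (7.15)
H = matmul(A_plus_I, Z)                            # Eq. (7.16)
graph_embedding = [sum(col) for col in zip(*H)]    # sum over all nodes

print(Z)
print(H)
print(graph_embedding)
```

Each row of H is the node's own projected embedding plus those of its bonded neighbors, which is exactly what multiplying by A + I does.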
In the implementation, the forward() method of the model takes two steps: (i) apply
gnn() with the molecule adjacency matrices to obtain the molecular graph embeddings;
(ii) apply a property prediction layer on top of the graph embeddings to predict the
binary property label. The gnn() method is the core of the MolecularGNN class; it
processes a batch of molecule graphs of different sizes by padding their adjacency
matrices into a single block-diagonal matrix.
    import torch.nn as nn


    class MolecularGNN(nn.Module):
        """
        Based on https://github.com/masashitsubaki/molecularGNN_smiles
        """
        def __init__(self, N_fingerprints, dim, layer_gnn_hidden):
            super().__init__()
            # learnable atom initial features
            self.embed_fingerprint = nn.Embedding(N_fingerprints, dim)
            # GNN layers (used together with the adjacency matrix in self.gnn)
            self.W_fingerprint = nn.ModuleList([nn.Linear(dim, dim)
                                                for _ in range(layer_gnn_hidden)])
            # final prediction layer
            self.W_property = nn.Linear(dim, 1)

        def pad(self, matrices, pad_value):
            """Pad the list of matrices
            with a pad_value (e.g., 0) for batch processing.
            For example, given a list of matrices [A, B, C],
            we obtain a new matrix [A00, 0B0, 00C],
            where 0 is the zero (i.e., pad value) matrix.
            """
            shapes = [m.shape for m in matrices]
            M, N = sum(s[0] for s in shapes), sum(s[1] for s in shapes)
            pad_matrices = torch.full((M, N), float(pad_value))
            i, j = 0, 0
            for k, matrix in enumerate(matrices):
                m, n = shapes[k]
                pad_matrices[i:i+m, j:j+n] = matrix
                i += m
                j += n
            return pad_matrices

        def update(self, matrix, vectors, layer):
            # Eq. (7.15): non-linear projection; Eq. (7.16): (A + I) Z
            hidden_vectors = torch.relu(self.W_fingerprint[layer](vectors))
            return hidden_vectors + torch.matmul(matrix, hidden_vectors)

        def sum(self, vectors, axis):
            sum_vectors = [torch.sum(v, 0) for v in torch.split(vectors, axis)]
            return torch.stack(sum_vectors)

        def mean(self, vectors, axis):
            mean_vectors = [torch.mean(v, 0) for v in torch.split(vectors, axis)]
            return torch.stack(mean_vectors)

        def gnn(self, inputs):
            # Concatenate or pad each input for batch processing.
            fingerprints, adjacencies, molecular_sizes = inputs
            fingerprints = torch.cat(fingerprints)
            adjacencies = self.pad(adjacencies, 0)
            molecular_sizes = list(molecular_sizes)

            # GNN layers: each layer's output feeds the next layer.
            fingerprint_vectors = self.embed_fingerprint(fingerprints)
            for l in range(len(self.W_fingerprint)):
                fingerprint_vectors = self.update(adjacencies, fingerprint_vectors, l)

            # Molecular vector as the sum (or mean) of the fingerprint vectors.
            molecular_vectors = self.sum(fingerprint_vectors, molecular_sizes)
            # molecular_vectors = self.mean(fingerprint_vectors, molecular_sizes)
            return molecular_vectors

        def forward(self, data_batch):
            molecular_vectors = self.gnn(data_batch)
            predicted_scores = self.W_property(molecular_vectors)
            return predicted_scores


    model = MolecularGNN(N_fingerprints, 128, 2)
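The batching trick in pad() and sum() is easier to see on tiny inputs. The sketch below re-creates both steps in plain Python for two toy molecules; it mirrors the idea only, and the function names and values are hypothetical rather than part of the model class.

```python
def block_diagonal(matrices):
    """Place square matrices along the diagonal of one zero-filled matrix."""
    total = sum(len(m) for m in matrices)
    out = [[0] * total for _ in range(total)]
    offset = 0
    for m in matrices:
        for r, row in enumerate(m):
            out[offset + r][offset:offset + len(row)] = row
        offset += len(m)
    return out


def sum_pool(node_vectors, sizes):
    """Sum consecutive groups of node vectors, one group per molecule."""
    pooled, start = [], 0
    for size in sizes:
        group = node_vectors[start:start + size]
        pooled.append([sum(col) for col in zip(*group)])
        start += size
    return pooled


adj_a = [[0, 1], [1, 0]]                     # molecule A: 2 atoms
adj_b = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]    # molecule B: 3 atoms
batch_adjacency = block_diagonal([adj_a, adj_b])

node_vectors = [[1, 0], [2, 0], [0, 1], [0, 2], [0, 3]]  # 5 atoms in total
pooled = sum_pool(node_vectors, [2, 3])

print(batch_adjacency)
print(pooled)
```

The off-diagonal blocks are zero, so multiplying by the batched adjacency never mixes atoms from different molecules, and the molecule sizes tell the pooling step where one molecule ends and the next begins.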

7.3.4 GNN training and evaluation


We use the binary cross-entropy loss as the training objective. Specifically, we use
torch.nn.BCEWithLogitsLoss(), which applies a Sigmoid function to our output
internally and is more numerically stable than a plain torch.nn.Sigmoid followed by
torch.nn.BCELoss (footnote 9). We use the AdamW optimizer to update the model
parameters from the gradients computed by backpropagation.
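The stability argument can be checked with a few lines of plain Python. The sketch below compares one common stable formulation of binary cross entropy on logits with the naive "sigmoid, then log" route. It illustrates the idea only; PyTorch's internal implementation may differ in its details, and both function names are hypothetical.

```python
import math


def bce_with_logits(logit, target):
    """Stable binary cross entropy computed directly from a logit."""
    # max(x, 0) - x * y + log(1 + exp(-|x|)) never exponentiates a large
    # positive number, so it cannot overflow.
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def naive_bce(logit, target):
    """Sigmoid followed by cross entropy; breaks down for extreme logits."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


print(bce_with_logits(0.0, 1.0))    # equals log(2)
print(bce_with_logits(50.0, 0.0))   # very close to 50
try:
    naive_bce(50.0, 0.0)
except ValueError as err:
    # The sigmoid rounds to exactly 1.0, so log(1 - p) is log(0).
    print("naive version failed:", err)
```

For moderate logits the two functions agree; the difference only shows up when the model is very confident, which is when a training run would otherwise produce invalid losses.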
The model is trained for 15 epochs, and after each epoch the AUROC score is computed
on the test set. The training procedure is similar to the previous PyTorch model
training pipeline. In the logged run below, the loss decreases quickly and the
reported AUROC climbs gradually to an almost perfect score.
    from sklearn.metrics import roc_auc_score

    batch_size = 32
    criterion = torch.nn.BCEWithLogitsLoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=0.001)

    for epoch in range(15):
        # ---- model training ----
        np.random.shuffle(dataset_train)
        train_loss = 0

        model.train()
        for i in range(0, len(dataset_train), batch_size):
            data_batch = list(zip(*dataset_train[i: i+batch_size]))

            # feed features into the model
            pred = model(data_batch[:3]).squeeze(-1)

            # use the ground-truth property as labels
            label = torch.FloatTensor(data_batch[-1])
            loss = criterion(pred, label)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            train_loss += loss.item()

        # ---- model evaluation ----
        model.eval()

        predicted, groundtruth = [], []
        with torch.no_grad():
            for i in range(0, len(dataset_test), batch_size):
                # evaluate on batches drawn from the test set
                data_batch = list(zip(*dataset_test[i: i+batch_size]))
                pred = model(data_batch[:3]).squeeze(-1).tolist()
                label = data_batch[-1]

                predicted += pred
                groundtruth += label

        print(f"--- epoch: {epoch} ---, train loss: {train_loss}, "
              f"test AUROC: {roc_auc_score(groundtruth, predicted)}")

    """
    Example output (exact values vary between runs):
    --- epoch: 0 ---, train loss: 39.3298280239, test AUROC: 0.5515151515151515
    --- epoch: 1 ---, train loss: 24.1569204330, test AUROC: 0.6504513140027158
    --- epoch: 2 ---, train loss: 18.3505403399, test AUROC: 0.7955907021327583
    --- epoch: 3 ---, train loss: 15.5090635120, test AUROC: 0.8324640937174036
    --- epoch: 4 ---, train loss: 12.8106079697, test AUROC: 0.8664265706282513
    --- epoch: 5 ---, train loss: 11.1026673614, test AUROC: 0.9002656363197294
    --- epoch: 6 ---, train loss: 9.69138415157, test AUROC: 0.9347860791826309
    --- epoch: 7 ---, train loss: 8.68385870754, test AUROC: 0.9528820856254485
    --- epoch: 8 ---, train loss: 7.63521204888, test AUROC: 0.9483418367346939
    --- epoch: 9 ---, train loss: 6.85020373761, test AUROC: 0.969544766004943
    --- epoch: 10 ---, train loss: 6.0492331385, test AUROC: 0.9728867623604466
    --- epoch: 11 ---, train loss: 5.4066562131, test AUROC: 0.989516129032258
    --- epoch: 12 ---, train loss: 4.9491942748, test AUROC: 0.9864766964501328
    --- epoch: 13 ---, train loss: 4.4005933329, test AUROC: 0.9948178266762338
    --- epoch: 14 ---, train loss: 3.9376147910, test AUROC: 0.9946519795657727
    """

9 This is because, by combining the operations into one layer, we take advantage of the
log-sum-exp trick for numerical stability.
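The evaluation passes raw model scores to roc_auc_score without applying a Sigmoid first. The plain-Python sketch below shows why that is fine: AUROC depends only on how positives are ranked against negatives. The function is an illustrative pairwise implementation, not scikit-learn's algorithm, and the labels and scores are made up.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]
scores = [-1.2, 0.3, 0.1, 2.5]   # raw logits: only their ranking matters
result = auroc(labels, scores)
print(result)
```

Three of the four positive-negative pairs are ordered correctly here, giving 0.75. A score of 0.5 corresponds to random ranking and 1.0 to a perfect ranking.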

Readers who want to practice on their own can find a complete notebook in this
public repository (footnote 10).

10 https://github.com/sunlabuiuc/pyhealth-book/tree/main/chap7-GNN/notebook


7.4 Clinical Predictive Modeling with GNNs

In this section, we revisit the ChestXray image classification challenge using Graph
Neural Networks (GNNs). In previous sections, Convolutional Neural Network (CNN)
models were employed for ChestXray image classification, treating each patient's image
as an independent sample. GNNs offer a different approach that leverages demographic
similarities to potentially improve predictions. It operates under the premise that
patients with similar demographic features may exhibit similar labels in their X-ray
images.
The approach constructs a graph in which each node represents a single patient (we
assume each patient has only a single X-ray image). The connections between nodes are
established based on the similarity of patient demographics. Consequently, during
training, the embeddings of neighboring nodes are used to update the embedding of the
current node. To handle cases where many images are connected to a single image, we
employ a neighborhood sampling strategy akin to the one proposed in GraphSAGE by
Hamilton et al. [76]. This strategy keeps the node embedding updates scalable.
We will use pyhealth modules to implement the whole pipeline.
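As a rough illustration of what "connecting patients by demographic similarity" can mean, the plain-Python sketch below links two patients when they share the same sex and are close in age. The patients, the similarity rule, and the function names are all assumptions made for this example; the graph that pyhealth builds internally may use a different rule.

```python
# Hypothetical patient demographics: (age in years, sex). Not from the dataset.
patients = [(34, "F"), (36, "F"), (70, "M"), (68, "M"), (35, "M")]


def similar(a, b, max_age_gap=5):
    """Assumed rule: same sex and ages within `max_age_gap` years."""
    return a[1] == b[1] and abs(a[0] - b[0]) <= max_age_gap


def build_similarity_edges(records):
    """Return undirected edges (i, j) with i < j between similar patients."""
    edges = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similar(records[i], records[j]):
                edges.append((i, j))
    return edges


edges = build_similarity_edges(patients)
print(edges)
```

Patient 4 ends up isolated because no other patient matches both conditions; such a node would be updated from its own features alone.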

Graph structures to improve the prediction accuracy


Compared with straightforward Deep Neural Networks (DNNs), which treat samples
independently, GNN-based models incorporate external information to construct a graph
structure over all data samples. Using graph convolutions, these models then borrow
information from neighboring samples within the graph. This methodology hinges on the
assumption that samples connected in the graph have similar labels, which can improve
the model's predictions.
This approach is not limited to ChestXray image classification; it is applicable
across various healthcare domains. For instance, in spatio-temporal disease prediction,
geographical data serves as the foundation for constructing the graph. The underlying
assumption here is that diseases might spread more easily among nearby geographic
locations. By creating a graph based on spatial proximity, GNNs can capitalize on this
geographical information to improve disease prediction models. This strategy aligns
with the notion that regions with geographic proximity might exhibit similarities in
disease prevalence or transmission patterns, which GNNs can effectively leverage to
enhance predictive accuracy.
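The spatial case can be sketched in the same spirit. The plain-Python example below connects regions whose centroids lie within a chosen radius and then averages each region's case rate with those of its neighbors, which is the simplest form of "borrowing information" from nearby nodes. The coordinates, rates, radius, and function names are hypothetical.

```python
import math

# Hypothetical region centroids (x, y in km) and case rates per 1,000 people.
regions = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (30.0, 40.0)}
case_rate = {"A": 2.0, "B": 4.0, "C": 10.0}


def proximity_graph(coords, radius_km):
    """Connect two regions when their centroids are within `radius_km`."""
    graph = {name: [] for name in coords}
    names = sorted(coords)
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if math.dist(coords[u], coords[v]) <= radius_km:
                graph[u].append(v)
                graph[v].append(u)
    return graph


def neighborhood_mean(graph, values):
    """Average each region's value with those of its neighbors."""
    return {u: sum([values[u]] + [values[v] for v in nbrs]) / (1 + len(nbrs))
            for u, nbrs in graph.items()}


graph = proximity_graph(regions, radius_km=10.0)
smoothed = neighborhood_mean(graph, case_rate)
print(graph)
print(smoothed)
```

Regions A and B are 5 km apart and therefore share information, while the distant region C keeps its own value. A GNN layer replaces this fixed average with a learned transformation.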

7.4.1 Step 1: data loading


The whole pipeline starts with data loading, for which we use the
COVID19CXRDataset API, as in Section 4.3. This API requires only the root path to
the ChestXray data as its argument; everything else is handled by the API.
    from pyhealth.datasets import COVID19CXRDataset

    root = "/srv/local/data/COVID-19_Radiography_Dataset"
    base_dataset = COVID19CXRDataset(root)

7.4.2 Step 2: define machine learning task


As usual, we apply the set_task() function to process the dataset into a list of
dictionaries, where each sample is formatted as a dictionary. This dataset has a
default task, which is to classify the chest X-ray images into different categories,
so we use the default task directly.
    base_dataset.default_task
    # COVID19CXRClassification(task_name='COVID19CXRClassification',
    #     input_schema={'path': 'image'}, output_schema={'label': 'label'})

    sample_dataset = base_dataset.set_task()

Similar to Section 4.3, we add one more sample transformation step to further clean up
the image data, which involves enforcing channel consistency, resizing, and normalizing
the images. The transformation is made possible by the featurizer design in the module.
For simplicity, we do not show the data transformation for the other pipelines.
    from torchvision import transforms

    transform = transforms.Compose([
        # repeat single-channel images so that every image has 3 channels
        transforms.Lambda(lambda x: x if x.shape[0] == 3 else x.repeat(3, 1, 1)),
        transforms.Resize((224, 224)),
        transforms.Normalize(mean=[0.5862785803043838], std=[0.27950088968644304])
    ])

    def encode(sample):
        sample["path"] = transform(sample["path"])
        return sample

    sample_dataset.set_transform(encode)

After data transformation, as usual, we split the data into training, validation, and
test sets by 70% : 10% : 20%. This time, we use a different data splitter,
split_by_sample, which splits at the sample level: it does not enforce that all data
from the same patient end up in the same set (training, validation, or test).
    from pyhealth.datasets import split_by_sample

    # Get the indices of the train, validation, and test sets
    train_index, val_index, test_index = split_by_sample(
        dataset=sample_dataset,
        ratios=[0.7, 0.1, 0.2],
        get_index=True,
    )
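The difference between splitting by sample and splitting by patient can be sketched in plain Python. Both functions below are illustrative stand-ins with hypothetical names and toy data; they are not the pyhealth implementations.

```python
import random

# Hypothetical samples: (sample_id, patient_id). Patient "p1" has two images.
samples = [(0, "p1"), (1, "p1"), (2, "p2"), (3, "p3"), (4, "p4")]


def split_by_sample_sketch(items, train_fraction, seed):
    """Shuffle individual samples, ignoring which patient they belong to."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def split_by_patient_sketch(items, train_fraction, seed):
    """Shuffle patients, then send all samples of a patient to the same side."""
    patients = sorted({pid for _, pid in items})
    random.Random(seed).shuffle(patients)
    cut = round(len(patients) * train_fraction)
    train_patients = set(patients[:cut])
    train = [s for s in items if s[1] in train_patients]
    test = [s for s in items if s[1] not in train_patients]
    return train, test


def shared_patients(train, test):
    """Patients that appear on both sides of a split."""
    return {pid for _, pid in train} & {pid for _, pid in test}


train_s, test_s = split_by_sample_sketch(samples, 0.6, seed=0)
train_p, test_p = split_by_patient_sketch(samples, 0.6, seed=0)
print("by sample, shared patients:", shared_patients(train_s, test_s))
print("by patient, shared patients:", shared_patients(train_p, test_p))
```

A patient-level split can never place one patient on both sides, whereas a sample-level split may. Under this section's assumption that each patient has a single X-ray image, the two strategies coincide, which is why the simpler sample-level split is acceptable here.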


7.4.3 Step 3: initialize ML model


PyHealth provides a set of graph neural network models combined with advanced CNN
backbones in the Graph_TorchvisionModel API. In this example, we initialize the
resnet18 backbone; other choices include resnet34, resnet50, and densenet121.
    from pyhealth.models import Graph_TorchvisionModel

    model = Graph_TorchvisionModel(
        dataset=sample_dataset,
        feature_keys=["path"],
        label_key="label",
        mode="multiclass",
        model_name="resnet18",
        model_config={},
        gnn_config={"input_dim": 256, "hidden_dim": 128},
    )

A new step in this pipeline is building the image graph structure. The demographic
information of each patient is already stored in sample_dataset, and our initialized
model builds a graph based on this information, so we pass sample_dataset as an
argument.
Next, we use the neighborhood sampler to simplify the graph and sparsify its
connections, following the GraphSAGE model [76]. After doing so, the training,
validation, and test dataloaders are ready for training and evaluation.
    from pyhealth.sampler import NeighborSampler

    graph = model.build_graph(sample_dataset, random=True)

    # Define the samplers, which act as dataloaders
    train_dataloader = NeighborSampler(
        sample_dataset, graph["edge_index"], node_idx=train_index,
        sizes=[15, 10], batch_size=64, shuffle=True, num_workers=12,
    )

    # For validation and test, we sample all edges connected to the
    # target node (sizes=[-1, -1])
    valid_dataloader = NeighborSampler(
        sample_dataset, graph["edge_index"], node_idx=val_index,
        sizes=[-1, -1], batch_size=64, shuffle=False, num_workers=12,
    )
    test_dataloader = NeighborSampler(
        sample_dataset, graph["edge_index"], node_idx=test_index,
        sizes=[-1, -1], batch_size=64, shuffle=False, num_workers=12,
    )
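The sizes argument can be read as a cap on how many neighbors are kept at each hop, with -1 meaning "keep all", as the comment in the listing states for validation and test. The plain-Python sketch below illustrates this reading on a toy graph. Interpreting sizes=[15, 10] as per-hop caps is an assumption based on GraphSAGE-style sampling, and sample_neighbors is a hypothetical helper, not the pyhealth sampler.

```python
import random


def sample_neighbors(adjacency, targets, sizes, seed=0):
    """Return, hop by hop, the nodes sampled around `targets`.

    sizes[k] caps the neighbors kept per node at hop k; -1 keeps them all.
    """
    rng = random.Random(seed)
    frontier, hops = list(targets), []
    for size in sizes:
        sampled = set()
        for node in frontier:
            neighbors = adjacency.get(node, [])
            if size != -1 and len(neighbors) > size:
                neighbors = rng.sample(neighbors, size)
            sampled.update(neighbors)
        hops.append(sorted(sampled))
        frontier = hops[-1]
    return hops


# Hypothetical patient graph as an adjacency list.
adjacency = {0: [1, 2, 3, 4], 1: [0, 5], 2: [0], 3: [0], 4: [0, 6], 5: [1], 6: [4]}

capped = sample_neighbors(adjacency, targets=[0], sizes=[2, 1])   # as in training
full = sample_neighbors(adjacency, targets=[0], sizes=[-1, -1])   # as in evaluation
print(capped)
print(full)
```

Capping the neighborhood bounds the cost of each training batch even when a node has many connections, while evaluation uses the full neighborhood so that predictions do not depend on a random draw.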

7.4.4 Step 4&5: model training and inference


As usual, we use pyhealth.trainer to train the model, with "accuracy" as the
monitoring metric. We then evaluate the model on the test set.


    from pyhealth.trainer import Trainer

    resnet_trainer = Trainer(model=model, device="cpu")
    resnet_trainer.train(
        train_dataloader=train_dataloader,
        val_dataloader=valid_dataloader,
        epochs=10,
        monitor="accuracy",
    )

After training the model for 10 epochs, we could evaluate the model performance on
the test set.
    print(resnet_trainer.evaluate(test_dataloader))
    """
    {'accuracy': 0.4786590097780537, 'f1_macro': 0.1618557783142647,
     'f1_micro': 0.4786590097780537, 'loss': 1.256981566770753}
    """
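The gap between accuracy and macro-F1 in this output deserves a closer look. The plain-Python sketch below computes both metrics for a toy four-class problem in which the "model" always predicts the majority class, and then evaluates the closed-form macro-F1 of such a constant predictor at the reported accuracy. The toy labels and the class count of four are assumptions, not values stated in the text.

```python
def f1_per_class(y_true, y_pred, cls):
    """F1 score of one class from true positives, false positives, false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_per_class(y_true, y_pred, c) for c in classes) / len(classes)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy 4-class labels with class 0 as the majority; the "model" always predicts 0.
y_true = [0, 0, 0, 0, 1, 1, 2, 3]
y_pred = [0] * len(y_true)
classes = [0, 1, 2, 3]
toy_accuracy = accuracy(y_true, y_pred)
toy_macro_f1 = macro_f1(y_true, y_pred, classes)
print(toy_accuracy, toy_macro_f1)

# For a constant predictor with accuracy a over K classes:
# macro-F1 = (2a / (1 + a)) / K, since only one class has a non-zero F1.
reported_accuracy = 0.4786590097780537   # from the evaluation output
num_classes = 4                          # assumption, not stated in the text
constant_macro_f1 = (2 * reported_accuracy / (1 + reported_accuracy)) / num_classes
print(constant_macro_f1)
```

Under the four-class assumption, the closed form agrees with the reported f1_macro to about nine decimal places. This is consistent with, though not proof of, a model that predicts a single class for every test image after 10 epochs. It also shows why accuracy alone can be misleading on imbalanced data. For single-label multiclass problems micro-F1 always equals accuracy, which matches the identical accuracy and f1_micro values in the output.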

A complete Jupyter notebook of this example can be found in the repository (footnote 11).

11 https://github.com/sunlabuiuc/pyhealth-book/blob/main/chap6/notebook/graph_torchvision_model.ipynb


7.5 Takeaways

• Graph Data: Many real-world data are irregular and are represented as graphs
(in contrast to images and time series, which are regular grid-like data).
Specifically, a graph is a data structure that consists of a set of nodes and a set
of edges connecting these nodes.
• Common Graph Neural Networks (GNNs): Graph convolutional networks (GCN),
graph attention networks (GAT), and message passing neural networks (MPNN) are
common GNN models. They differ in that GCN treats all edges equally, GAT
learns the edge weights with attention modules, and MPNN allows flexible
aggregation mechanisms.
• Advancements in GNN Training: Several modern GNN training techniques are
useful in practice, such as neighborhood sampling, distributed graph partitioning
and training, and heterogeneous graph modeling.
• GNN Applications in CV and NLP: In computer vision and natural language
processing, researchers construct different types of graph structures from data
and then apply GNN models to learn node or graph embeddings.
• Molecule Property Prediction Example: GNNs are powerful at learning from
molecular graph structures, treating atoms as nodes and bonds as edges, to predict
molecular properties.
• PyHealth GNN Code Examples: GNNs can be applied to similarity graphs
constructed from patient demographics, which could further enhance predictive
models, such as ChestXray image classification.

This chapter presents the Transformer architecture, pre-training strategies employed
by models like BERT, and diverse applications in healthcare enabled by this
breakthrough technique. The key components and innovations of Transformers are the
foundation for many AI applications, including large language models.

Questions

• What are the key elements of a graph? What is the adjacency matrix of a graph?
What is the degree matrix of a graph?
• Compare the two different ways of normalizing the adjacency matrix.
• In this chapter, we discuss undirected graphs. However, the concepts of adjacency
matrix, degree matrix, and normalized adjacency matrix can be generalized to
directed graphs as well. Could you specify these concepts for the graph shown in
Figure 7.2?
• What are graph neural networks? What are the key differences between GNNs and
DNNs? Could you explain the difference between a GCN and a multi-layer perceptron
(both using ReLU as the activation)?


Figure 7.2 A directed graph

• Could you elaborate on the difference between GCN and GAT? Why is MPNN a
general form? Could you use the MPNN equations (i.e., the Message and Update
functions) to formulate the GCN and GAT networks?
• Why are GNNs suitable for molecule property prediction? Could you choose your
favorite molecule graph and your favorite GNN model and explain how to use
this GNN model to model the molecule graph structure?
• In Section 7.4, we connect the graph by patient demographic features. Could you
construct a new X-ray image graph based on pixel-level image similarity and
re-implement the whole pipeline? Show the results and explain why the two
different graphs lead to different final prediction results.


References in Cambridge Author, Date Style

[1] C. C. Tappert, “Who is the father of deep learning?” in 2019 International Conference
on Computational Science and Computational Intelligence (CSCI), 2019, pp. 343–348.
[2] A. E. Johnson, L. Bulgarelli, L. Shen, A. Gayles, A. Shammout, S. Horng, T. J. Pollard,
S. Hao, B. Moody, B. Gow et al., “Mimic-iv, a freely accessible electronic health record
dataset,” Scientific data, vol. 10, no. 1, p. 1, 2023.
[3] C. Yang, Z. Wu, P. Jiang, Z. Lin, J. Gao, B. P. Danek, and J. Sun, “Pyhealth: A deep
learning toolkit for healthcare applications,” in Proceedings of the 29th ACM SIGKDD
Conference on Knowledge Discovery and Data Mining, 2023, pp. 5788–5789.
[4] C. Yang, M. B. Westover, and J. Sun, “ManyDG: Many-domain generalization
for healthcare applications,” in The Eleventh International Conference on Learning
Representations, 2023. [Online]. Available: https://openreview.net/forum?id=lcSfirnflpW
[5] E. Choi, Z. Xu, Y. Li, M. W. Dusenberry, G. Flores, Y. Xue, and A. M. Dai, “Learning the
graphical structure of electronic health records with graph convolutional transformer,”
2020.
[6] C. Xiao, T. Ma, A. B. Dieng, D. M. Blei, and F. Wang, “Readmission prediction via deep
contextual embedding of clinical concepts,” PloS one, vol. 13, no. 4, p. e0195024, 2018.
[7] E. Choi, M. T. Bahadori, J. Sun, J. Kulas, A. Schuetz, and W. Stewart, “Retain: An
interpretable predictive model for healthcare using reverse time attention mechanism,”
Advances in neural information processing systems, vol. 29, 2016.
[8] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation,
vol. 9, no. 8, pp. 1735–1780, 1997.
[9] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent
neural networks on sequence modeling,” arXiv preprint arXiv:1412.3555, 2014.
[10] E. Choi, A. Schuetz, W. F. Stewart, and J. Sun, “Medical concept representation learning
from electronic health records and its application on heart failure prediction,” arXiv
preprint arXiv:1602.03686, 2016.
[11] S. Mallya, M. Overhage, N. Srivastava, T. Arai, and C. Erdman, “Effectiveness of lstms
in predicting congestive heart failure onset,” arXiv preprint arXiv:1902.02443, 2019.
[12] G. Maragatham and S. Devi, “Lstm model for prediction of heart failure in big data,”
Journal of medical systems, vol. 43, pp. 1–13, 2019.
[13] K. Fukushima, “Neocognitron: A self-organizing neural network model for a mechanism
of pattern recognition unaffected by shift in position,” Biological Cybernetics, vol. 36,
no. 4, pp. 193–202, Apr 1980. [Online]. Available: https://doi.org/10.1007/BF00344251
[14] Y. LeCun, B. Boser, J. Denker, D. Henderson, R. Howard, W. Hubbard, and L. Jackel,
“Handwritten digit recognition with a back-propagation network,” Advances in neural
information processing systems, vol. 2, 1989.


[15] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to
document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[16] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep con-
volutional neural networks,” Advances in neural information processing systems, vol. 25,
2012.
[17] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image
recognition,” arXiv preprint arXiv:1409.1556, 2014.
[18] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke,
and A. Rabinovich, “Going deeper with convolutions,” in Proceedings of the IEEE
conference on computer vision and pattern recognition, 2015, pp. 1–9.
[19] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in
Proceedings of the IEEE conference on computer vision and pattern recognition, 2016,
pp. 770–778.
[20] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected
convolutional networks,” in Proceedings of the IEEE conference on computer vision and
pattern recognition, 2017, pp. 4700–4708.
[21] J. Mullenbach, S. Wiegreffe, J. Duke, J. Sun, and J. Eisenstein, “Explainable
prediction of medical codes from clinical text,” in Proceedings of the 2018 Conference of
the North American Chapter of the Association for Computational Linguistics: Human
Language Technologies, Volume 1 (Long Papers), M. Walker, H. Ji, and A. Stent, Eds.
New Orleans, Louisiana: Association for Computational Linguistics, Jun. 2018, pp.
1101–1111. [Online]. Available: https://aclanthology.org/N18-1100
[22] E. Choi, M. T. Bahadori, L. Song, W. F. Stewart, and J. Sun, “GRAM: Graph-based
attention model for healthcare representation learning,” in Proceedings of the 23rd ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, pp.
787–795.
[23] J. Gao, C. Yang, J. Heintz, S. Barrows, E. Albers, M. Stapel, S. Warfield, A. Cross, J. Sun,
and N3C consortium, “MedML: Fusing medical knowledge and machine learning models
for early pediatric COVID-19 hospitalization and severity prediction,” iScience, vol. 25,
no. 9, p. 104970, Sep. 2022.
[24] J. Shang, C. Xiao, T. Ma, H. Li, and J. Sun, “Gamenet: Graph augmented
memory networks for recommending medication combination,” Proceedings of the
AAAI Conference on Artificial Intelligence, vol. 33, no. 01, pp. 1126–1133, Jul. 2019.
[Online]. Available: https://ojs.aaai.org/index.php/AAAI/article/view/3905
[25] J. Shang, T. Ma, C. Xiao, and J. Sun, “Pre-training of graph augmented transformers
for medication recommendation,” in Proceedings of the Twenty-Eighth International
Joint Conference on Artificial Intelligence, IJCAI-19. International Joint Conferences
on Artificial Intelligence Organization, 7 2019, pp. 5953–5959. [Online]. Available:
https://doi.org/10.24963/ijcai.2019/825
[26] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural
networks,” Advances in neural information processing systems, vol. 27, 2014.
[27] Y. Zhang, R. Chen, J. Tang, W. F. Stewart, and others, “LEAP: learning to prescribe
effective and safe treatment combinations for multimorbidity,” proceedings of the 23rd,
2017.
[28] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representa-
tions of words and phrases and their compositionality,” Advances in neural information
processing systems, vol. 26, 2013.

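[29] M. Freitag and Y. Al-Onaizan, “Beam search strategies for neural machine translation,” in
Proceedings of the First Workshop on Neural Machine Translation. Association for
Computational Linguistics, 2017.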
[30] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to
align and translate,” arXiv preprint arXiv:1409.0473, 2014.
[31] M.-T. Luong, H. Pham, and C. D. Manning, “Effective approaches to attention-based
neural machine translation,” 2015.
[32] L. Cui, S. Biswal, L. M. Glass, G. Lever, J. Sun, and C. Xiao, “CONAN: Complementary
pattern augmentation for rare disease detection,” AAAI, vol. 34, no. 01, pp. 614–621, Apr.
2020.
[33] Z. Yang, A. Mitra, W. Liu, D. Berlowitz, and H. Yu, “TransformEHR: transformer-
based encoder-decoder generative model to enhance prediction of disease outcomes using
electronic health records,” Nat. Commun., vol. 14, no. 1, p. 7857, Nov. 2023.
[34] B. Theodorou, C. Xiao, and J. Sun, “Synthesize high-dimensional longitudinal electronic
health records via hierarchical autoregressive language model,” Nat. Commun., vol. 14,
no. 1, p. 5305, Aug. 2023.
[35] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez,
Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural
Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach,
R. Fergus, S. Vishwanathan, and R. Garnett, Eds., vol. 30. Curran Associates,
Inc., 2017. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2017/
file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
[36] J. Devlin, M. Chang, K. Lee, and K. Toutanova, “BERT: pre-training of deep bidirectional
transformers for language understanding,” in Proceedings of the 2019 Conference of
the North American Chapter of the Association for Computational Linguistics: Human
Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019,
Volume 1 (Long and Short Papers), J. Burstein, C. Doran, and T. Solorio, Eds.
Association for Computational Linguistics, 2019, pp. 4171–4186. [Online]. Available:
https://doi.org/10.18653/v1/n19-1423
[37] E. Alsentzer, J. Murphy, W. Boag, W.-H. Weng, D. Jindi, T. Naumann, and
M. McDermott, “Publicly available clinical BERT embeddings,” in Proceedings of
the 2nd Clinical Natural Language Processing Workshop, A. Rumshisky, K. Roberts,
S. Bethard, and T. Naumann, Eds. Minneapolis, Minnesota, USA: Association
for Computational Linguistics, Jun. 2019, pp. 72–78. [Online]. Available:
https://aclanthology.org/W19-1909
[38] N. Kitaev, Ł. Kaiser, and A. Levskaya, “Reformer: The efficient transformer,” arXiv
preprint arXiv:2001.04451, 2020.
[39] I. Beltagy, M. E. Peters, and A. Cohan, “Longformer: The long-document transformer,”
arXiv preprint arXiv:2004.05150, 2020.
[40] Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, and R. Soricut, “Albert:
A lite bert for self-supervised learning of language representations,” arXiv preprint
arXiv:1909.11942, 2019.
[41] S. Wang, B. Z. Li, M. Khabsa, H. Fang, and H. Ma, “Linformer: Self-attention with linear
complexity,” arXiv preprint arXiv:2006.04768, 2020.
[42] K. Choromanski, V. Likhosherstov, D. Dohan, X. Song, A. Gane, T. Sarlos, P. Hawkins,
J. Davis, A. Mohiuddin, L. Kaiser et al., “Rethinking attention with performers,” arXiv
preprint arXiv:2009.14794, 2020.

[43] Z. Dai, Z. Yang, Y. Yang, J. Carbonell, Q. V. Le, and R. Salakhutdinov, “Transformer-XL:
Attentive language models beyond a fixed-length context,” arXiv preprint
arXiv:1901.02860, 2019.
[44] R. Child, S. Gray, A. Radford, and I. Sutskever, “Generating long sequences with sparse
transformers,” arXiv preprint arXiv:1904.10509, 2019.
[45] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed
representations of words and phrases and their compositionality,” in Advances
in Neural Information Processing Systems, C. Burges, L. Bottou, M. Welling,
Z. Ghahramani, and K. Weinberger, Eds., vol. 26. Curran Associates, Inc.,
2013. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2013/file/
9aa42b31882ec039965f3c4923ce901b-Paper.pdf
[46] J. Pennington, R. Socher, and C. Manning, “GloVe: Global vectors for word
representation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural
Language Processing (EMNLP), A. Moschitti, B. Pang, and W. Daelemans, Eds. Doha,
Qatar: Association for Computational Linguistics, Oct. 2014, pp. 1532–1543. [Online].
Available: https://aclanthology.org/D14-1162
[47] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirec-
tional transformers for language understanding,” 2019.
[48] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer,
and V. Stoyanov, “Roberta: A robustly optimized bert pretraining approach,” 2019.
[49] V. Sanh, L. Debut, J. Chaumond, and T. Wolf, “Distilbert, a distilled version of bert:
smaller, faster, cheaper and lighter,” 2020.
[50] Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, and R. Soricut, “Albert: A lite bert
for self-supervised learning of language representations,” 2020.
[51] J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C. H. So, and J. Kang, “Biobert: a
pre-trained biomedical language representation model for biomedical text mining,”
Bioinformatics, vol. 36, no. 4, pp. 1234–1240, Sep. 2019. [Online]. Available:
http://dx.doi.org/10.1093/bioinformatics/btz682
[52] K. Huang, J. Altosaar, and R. Ranganath, “Clinicalbert: Modeling clinical notes and
predicting hospital readmission,” 2020.
[53] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. De-
hghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is
worth 16x16 words: Transformers for image recognition at scale,” 2021.
[54] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer:
Hierarchical vision transformer using shifted windows,” 2021.
[55] C. Yang, M. B. Westover, and J. Sun, “BIOT: Biosignal transformer for cross-data
learning in the wild,” in Thirty-seventh Conference on Neural Information Processing
Systems, 2023. [Online]. Available: https://openreview.net/forum?id=c2LZyTyddi
[56] T. Ma, J. Chen, and C. Xiao, “Constrained generation of semantically valid
graphs via regularizing variational autoencoders,” in Proceedings of the 32nd
International Conference on Neural Information Processing Systems, ser. NIPS’18.
USA: Curran Associates Inc., 2018, pp. 7113–7124. [Online]. Available:
http://dl.acm.org/citation.cfm?id=3327757.3327814
[57] W. Jin, R. Barzilay, and T. S. Jaakkola, “Junction tree variational autoencoder for
molecular graph generation,” CoRR, vol. abs/1802.04364, 2018. [Online]. Available:
http://arxiv.org/abs/1802.04364

[58] Q. Liu, M. Allamanis, M. Brockschmidt, and A. L. Gaunt, “Constrained
graph variational autoencoders for molecule design,” in Proceedings of the 32nd
International Conference on Neural Information Processing Systems, ser. NIPS’18.
USA: Curran Associates Inc., 2018, pp. 7806–7815. [Online]. Available:
http://dl.acm.org/citation.cfm?id=3327757.3327877
[59] J. Gao, J. Heintz, C. Mack, L. Glass, A. Cross, and J. Sun, “Evidence-driven spatiotem-
poral COVID-19 hospitalization prediction with ising dynamics,” Nat. Commun., vol. 14,
no. 1, p. 3093, May 2023.
[60] T. Ma, C. Xiao, J. Zhou, and F. Wang, “Drug similarity integration through attentive
multi-view graph auto-encoders,” in Proceedings of the Twenty-Seventh International
Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm,
Sweden., 2018, pp. 3477–3483.
[61] M. Zitnik, M. Agrawal, and J. Leskovec, “Modeling polypharmacy side effects with graph
convolutional networks,” Bioinformatics, vol. 34, no. 13, pp. i457–i466, Jul. 2018.
[62] J. Li, D. Zhou, Y. Shi, J. Yang, S. Chen, Q. Wang, and P. Hui, “Application of weighted
gene co-expression network analysis for data from paired design,” Scientific Reports,
2018.
[63] E. Choi, M. T. Bahadori, L. Song, W. F. Stewart, and J. Sun, “Gram: Graph-based attention
model for healthcare representation learning,” in SIGKDD, 2017.
[64] J. Shang, C. Xiao, T. Ma, H. Li, and J. Sun, “Gamenet: Graph augmented memory
networks for recommending medication combination,” in AAAI, 2019.
[65] T. Gaudelet, B. Day, A. R. Jamasb, J. Soman, C. Regep, G. Liu, J. B. Hayter, R. Vickers,
C. Roberts, J. Tang et al., “Utilizing graph machine learning within drug discovery and
development,” Briefings in bioinformatics, vol. 22, no. 6, p. bbab159, 2021.
[66] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional
networks,” arXiv preprint arXiv:1609.02907, 2016.
[67] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, “Graph
attention networks,” arXiv preprint arXiv:1710.10903, 2017.
[68] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, “Neural message pass-
ing for quantum chemistry,” in International conference on machine learning. PMLR,
2017, pp. 1263–1272.
[69] Z. Ying, J. You, C. Morris, X. Ren, W. Hamilton, and J. Leskovec, “Hierarchical graph
representation learning with differentiable pooling,” Advances in neural information
processing systems, vol. 31, 2018.
[70] P. Veličković, W. Fedus, W. L. Hamilton, P. Liò, Y. Bengio, and R. D. Hjelm, “Deep
graph infomax,” arXiv preprint arXiv:1809.10341, 2018.
[71] B. Perozzi, R. Al-Rfou, and S. Skiena, “Deepwalk: Online learning of social representa-
tions,” in Proceedings of the 20th ACM SIGKDD international conference on Knowledge
discovery and data mining, 2014, pp. 701–710.
[72] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, “Line: Large-scale information
network embedding,” in Proceedings of the 24th international conference on world wide
web, 2015, pp. 1067–1077.
[73] A. Grover and J. Leskovec, “node2vec: Scalable feature learning for networks,” in Pro-
ceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery
and data mining, 2016, pp. 855–864.

[74] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional
networks,” in International Conference on Learning Representations, 2017. [Online].
Available: https://openreview.net/forum?id=SJU4ayYgl
[75] C. Yang, C. Xiao, F. Ma, L. Glass, and J. Sun, “Safedrug: Dual molecular graph encoders
for recommending effective and safe drug combinations,” 2021.
[76] W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large
graphs,” Advances in neural information processing systems, vol. 30, 2017.
[77] S. Gandhi and A. P. Iyer, “P3: Distributed deep graph learning at scale,” in 15th USENIX
Symposium on Operating Systems Design and Implementation (OSDI 21), 2021, pp.
551–568.
[78] Y. Shao, H. Li, X. Gu, H. Yin, Y. Li, X. Miao, W. Zhang, B. Cui, and L. Chen, “Distributed
graph neural network training: A survey,” ACM Computing Surveys, 2022.
[79] C. Zhang, D. Song, C. Huang, A. Swami, and N. V. Chawla, “Heterogeneous graph
neural network,” in Proceedings of the 25th ACM SIGKDD international conference on
knowledge discovery & data mining, 2019, pp. 793–803.
[80] X. Wang, D. Bo, C. Shi, S. Fan, Y. Ye, and P. S. Yu, “A survey on heterogeneous graph
embedding: methods, techniques, applications and sources,” IEEE Transactions on Big
Data, vol. 9, no. 2, pp. 415–436, 2022.
[81] Z. Hu, Y. Dong, K. Wang, and Y. Sun, “Heterogeneous graph transformer,” in Proceedings
of the web conference 2020, 2020, pp. 2704–2710.
[82] C. Yang, R. Wang, S. Yao, and T. Abdelzaher, “Semi-supervised hypergraph node clas-
sification on hypergraph line expansion,” in Proceedings of the 31st ACM International
Conference on Information & Knowledge Management, 2022, pp. 2352–2361.
[83] Z.-M. Chen, X.-S. Wei, P. Wang, and Y. Guo, “Multi-label image recognition with graph
convolutional networks,” in Proceedings of the IEEE/CVF conference on computer vision
and pattern recognition, 2019, pp. 5177–5186.
[84] X. Wang and A. Gupta, “Videos as space-time region graphs,” in Proceedings of the
European conference on computer vision (ECCV), 2018, pp. 399–417.
[85] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon, “Dynamic
graph cnn for learning on point clouds,” ACM Transactions on Graphics (tog), vol. 38,
no. 5, pp. 1–12, 2019.
[86] D. Teney, L. Liu, and A. van Den Hengel, “Graph-structured representations for visual
question answering,” in Proceedings of the IEEE conference on computer vision and
pattern recognition, 2017, pp. 1–9.
[87] S. Parisot, S. I. Ktena, E. Ferrante, M. Lee, R. G. Moreno, B. Glocker, and D. Rueckert,
“Spectral graph convolutions for population-based disease prediction,” in Medical Image
Computing and Computer Assisted Intervention- MICCAI 2017: 20th International Con-
ference, Quebec City, QC, Canada, September 11-13, 2017, Proceedings, Part III 20.
Springer, 2017, pp. 177–185.
[88] C. Chen, Y. Wu, Q. Dai, H.-Y. Zhou, M. Xu, S. Yang, X. Han, and Y. Yu, “A survey
on graph neural networks and graph transformers in computer vision: a task-oriented
perspective,” arXiv preprint arXiv:2209.13232, 2022.
[89] L. Jiao, J. Chen, F. Liu, S. Yang, C. You, X. Liu, L. Li, and B. Hou, “Graph representation
learning meets computer vision: A survey,” IEEE Transactions on Artificial Intelligence,
vol. 4, no. 1, pp. 2–22, 2022.
[90] L. Zhang, X. Li, A. Arnab, K. Yang, Y. Tong, and P. H. Torr, “Dual graph convolutional
network for semantic segmentation,” arXiv preprint arXiv:1909.06121, 2019.

[91] H. Xu, C. Jiang, X. Liang, and Z. Li, “Spatial-aware graph relation network for large-scale
object detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition, 2019, pp. 9298–9307.
[92] K. Han, Y. Wang, J. Guo, Y. Tang, and E. Wu, “Vision gnn: An image is worth graph
of nodes,” Advances in Neural Information Processing Systems, vol. 35, pp. 8291–8303,
2022.
[93] S. Yan, Y. Xiong, and D. Lin, “Spatial temporal graph convolutional networks for skeleton-
based action recognition,” in Proceedings of the AAAI conference on artificial intelligence,
vol. 32, no. 1, 2018.
[94] G. Brasó and L. Leal-Taixé, “Learning a neural solver for multiple object tracking,” in
Proceedings of the IEEE/CVF conference on computer vision and pattern recognition,
2020, pp. 6247–6257.
[95] A. S. Heinsfeld, A. R. Franco, R. C. Craddock, A. Buchweitz, and F. Meneguzzi, “Iden-
tification of autism spectrum disorder using deep learning and the abide dataset,” Neu-
roImage: Clinical, vol. 17, pp. 16–23, 2018.
[96] B.-H. Kim, J. C. Ye, and J.-J. Kim, “Learning dynamic graph representation of brain
connectome with spatio-temporal attention,” Advances in Neural Information Processing
Systems, vol. 34, pp. 4314–4327, 2021.
[97] L. Wu, Y. Chen, K. Shen, X. Guo, H. Gao, S. Li, J. Pei, B. Long et al., “Graph neural net-
works for natural language processing: A survey,” Foundations and Trends® in Machine
Learning, vol. 16, no. 2, pp. 119–328, 2023.
[98] C. Zhang, Q. Li, and D. Song, “Aspect-based sentiment classification with aspect-specific
graph convolutional networks,” in Proceedings of the 2019 Conference on Empirical
Methods in Natural Language Processing and the 9th International Joint Conference on
Natural Language Processing (EMNLP-IJCNLP), K. Inui, J. Jiang, V. Ng, and X. Wan,
Eds. Hong Kong, China: Association for Computational Linguistics, Nov. 2019, pp.
4568–4578. [Online]. Available: https://aclanthology.org/D19-1464
[99] K. Xu, L. Wu, Z. Wang, Y. Feng, M. Witbrock, and V. Sheinin, “Graph2seq: Graph to se-
quence learning with attention-based neural networks,” arXiv preprint arXiv:1804.00823,
2018.
[100] T. Wang, X. Wan, and H. Jin, “AMR-to-text generation with graph transformer,” Trans-
actions of the Association for Computational Linguistics, vol. 8, 2020.
[101] M. Xu, L. Li, D. Wong, Q. Liu, L. S. Chao et al., “Document graph for neural machine
translation,” arXiv preprint arXiv:2012.03477, 2020.
[102] Y. Chen, L. Wu, and M. J. Zaki, “Graphflow: exploiting conversation flow with graph neu-
ral networks for conversational machine comprehension,” in Proceedings of the Twenty-
Ninth International Joint Conference on Artificial Intelligence, ser. IJCAI’20, 2021.
[103] D. Cai and W. Lam, “Graph transformer for graph-to-sequence learning,” in Proceedings
of the AAAI conference on artificial intelligence, vol. 34, no. 05, 2020, pp. 7464–7471.
[104] C. Zheng and P. Kordjamshidi, “SRLGRN: Semantic role labeling graph reasoning net-
work,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language
Processing (EMNLP), B. Webber, T. Cohn, Y. He, and Y. Liu, Eds. Online: Association
for Computational Linguistics, Nov. 2020.
[105] Z. Guo, Y. Zhang, Z. Teng, and W. Lu, “Densely connected graph convolutional networks
for graph-to-sequence learning,” Transactions of the Association for Computational Lin-
guistics, vol. 7, pp. 297–312, 2019.

[106] L. Song, Y. Zhang, Z. Wang, and D. Gildea, “A graph-to-sequence model for AMR-
to-text generation,” in Proceedings of the 56th Annual Meeting of the Association for
Computational Linguistics (Volume 1: Long Papers), I. Gurevych and Y. Miyao, Eds.
Melbourne, Australia: Association for Computational Linguistics, Jul. 2018.
[107] S. Li, L. Wu, S. Feng, F. Xu, F. Xu, and S. Zhong, “Graph-to-tree neural networks
for learning structured input-output translation with applications to semantic parsing
and math word problem,” in Findings of the Association for Computational Linguistics:
EMNLP 2020, T. Cohn, Y. He, and Y. Liu, Eds. Online: Association for Computational
Linguistics, Nov. 2020.
[108] Q. Fu, L. Song, W. Du, and Y. Zhang, “End-to-end AMR coreference resolution,” in
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics
and the 11th International Joint Conference on Natural Language Processing (Volume
1: Long Papers), C. Zong, F. Xia, W. Li, and R. Navigli, Eds. Online: Association for
Computational Linguistics, Aug. 2021.
[109] J. Han, B. Cheng, and X. Wang, “Open domain question answering based on text en-
hanced knowledge graph with hyperedge infusion,” in Findings of the Association for
Computational Linguistics: EMNLP 2020, T. Cohn, Y. He, and Y. Liu, Eds. Online:
Association for Computational Linguistics, Nov. 2020.
[110] Y. Luo and H. Zhao, “Bipartite flat-graph network for nested named entity recognition,”
in Proceedings of the 58th Annual Meeting of the Association for Computational Lin-
guistics, D. Jurafsky, J. Chai, N. Schluter, and J. Tetreault, Eds. Online: Association for
Computational Linguistics, Jul. 2020.
[111] P. Kapanipathi, V. Thost, S. S. Patel, S. Whitehead, I. Abdelaziz, A. Balakrishnan,
M. Chang, K. Fadnis, C. Gunasekara, B. Makni et al., “Infusing knowledge into the
textual entailment task using graph convolutional networks,” in Proceedings of the AAAI
Conference on Artificial Intelligence, vol. 34, no. 05, 2020, pp. 8074–8081.
[112] O. Wieder, S. Kohlbacher, M. Kuenemann, A. Garon, P. Ducrot, T. Seidel, and T. Langer,
“A compact review of molecular property prediction with graph neural networks,” Drug
Discovery Today: Technologies, vol. 37, pp. 1–12, 2020.
[113] D. Jiang, Z. Wu, C.-Y. Hsieh, G. Chen, B. Liao, Z. Wang, C. Shen, D. Cao, J. Wu, and
T. Hou, “Could graph neural networks learn better molecular representation for drug
discovery? a comparison study of descriptor-based and graph-based models,” Journal of
cheminformatics, vol. 13, no. 1, pp. 1–23, 2021.
[114] J. Lim, S. Ryu, K. Park, Y. J. Choe, J. Ham, and W. Y. Kim, “Predicting drug–target
interaction using a novel graph neural network with 3d structure-embedded graph repre-
sentation,” Journal of chemical information and modeling, vol. 59, no. 9, pp. 3981–3988,
2019.
[115] I. Ghebrehiwet, N. Zaki, R. Damseh, and M. S. Mohamad, “Revolutionizing personalized
medicine with generative ai: A systematic review,” 2024.
[116] W. H. Pinaya, M. S. Graham, E. Kerfoot, P.-D. Tudosiu, J. Dafflon, V. Fernandez,
P. Sanchez, J. Wolleb, P. F. da Costa, A. Patel et al., “Generative ai for medical imaging:
extending the monai framework,” arXiv preprint arXiv:2307.15208, 2023.
[117] X. Zeng, F. Wang, Y. Luo, S.-g. Kang, J. Tang, F. C. Lightstone, E. F. Fang, W. Cornell,
R. Nussinov, and F. Cheng, “Deep generative molecular design reshapes drug discovery,”
Cell Reports Medicine, 2022.

[118] T. Das, Z. Wang, and J. Sun, “Twin: Personalized clinical trial digital twin generation,” in
Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data
Mining, 2023, pp. 402–413.
[119] Z. Wang, C. Gao, L. M. Glass, and J. Sun, “Artificial intelligence for in silico clinical
trials: A review,” arXiv preprint arXiv:2209.09023, 2022.
[120] K. Yu, Y. Wang, Y. Cai, C. Xiao, E. Zhao, L. Glass, and J. Sun, “Rare disease de-
tection by sequence modeling with generative adversarial networks,” arXiv preprint
arXiv:1907.01022, 2019.
[121] B. Yelmen, A. Decelle, L. Ongaro, D. Marnetto, C. Tallec, F. Montinaro, C. Furtlehner,
L. Pagani, and F. Jay, “Creating artificial human genomes using generative neural net-
works,” PLoS genetics, vol. 17, no. 2, p. e1009303, 2021.
[122] B. Theodorou, C. Xiao, and J. Sun, “Synthesize high-dimensional longitudinal electronic
health records via hierarchical autoregressive language model,” Nature communications,
vol. 14, no. 1, p. 5305, 2023.
[123] Z. Wang, Q. She, A. F. Smeaton, T. E. Ward, and G. Healy, “Synthetic-neuroscore: Using
a neuro-ai interface for evaluating generative adversarial networks,” Neurocomputing,
vol. 405, pp. 26–36, 2020.
[124] T. Golany, K. Radinsky, and D. Freedman, “Simgans: Simulator-based generative adver-
sarial networks for ecg synthesis to improve deep ecg classification,” in International
Conference on Machine Learning. PMLR, 2020, pp. 3597–3606.
[125] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville,
and Y. Bengio, “Generative adversarial nets,” Advances in neural information processing
systems, vol. 27, 2014.
[126] ——, “Generative adversarial networks,” Communications of the ACM, vol. 63, no. 11,
pp. 139–144, 2020.
[127] D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” arXiv preprint
arXiv:1312.6114, 2013.
[128] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural
networks,” science, vol. 313, no. 5786, pp. 504–507, 2006.
[129] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, “Extracting and composing
robust features with denoising autoencoders,” in Proceedings of the 25th international
conference on Machine learning, 2008, pp. 1096–1103.
[130] L. Weng, “From autoencoder to beta-vae,” lilianweng.github.io, 2018. [Online].
Available: https://lilianweng.github.io/posts/2018-08-12-vae/
[131] A. Makhzani and B. Frey, “K-sparse autoencoders,” arXiv preprint arXiv:1312.5663,
2013.
[132] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,”
in International conference on machine learning. PMLR, 2017, pp. 214–223.
[133] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation us-
ing cycle-consistent adversarial networks,” in Proceedings of the IEEE international
conference on computer vision, 2017, pp. 2223–2232.
[134] I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, and
A. Lerchner, “beta-vae: Learning basic visual concepts with a constrained variational
framework,” in International conference on learning representations, 2016.
[135] P. Dhariwal and A. Nichol, “Diffusion models beat gans on image synthesis,” Advances
in neural information processing systems, vol. 34, pp. 8780–8794, 2021.

[136] J. Ho, C. Saharia, W. Chan, D. J. Fleet, M. Norouzi, and T. Salimans, “Cascaded diffusion
models for high fidelity image generation,” The Journal of Machine Learning Research,
vol. 23, no. 1, pp. 2249–2281, 2022.
[137] L. Yang, Z. Zhang, Y. Song, S. Hong, R. Xu, Y. Zhao, W. Zhang, B. Cui, and M.-H.
Yang, “Diffusion models: A comprehensive survey of methods and applications,” ACM
Computing Surveys, vol. 56, no. 4, pp. 1–39, 2023.
[138] J. Sohl-Dickstein, E. A. Weiss, N. Maheswaranathan, and S. Ganguli, “Deep unsupervised
learning using nonequilibrium thermodynamics,” 2015.
[139] Y. Song and S. Ermon, “Generative modeling by estimating gradients of the data distri-
bution,” 2020.
[140] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” 2020.
[141] M. Welling and Y. W. Teh, “Bayesian learning via stochastic gradient langevin dynamics,”
in Proceedings of the 28th international conference on machine learning (ICML-11),
2011, pp. 681–688.
[142] L. Weng, “What are diffusion models?” lilianweng.github.io, Jul 2021. [Online].
Available: https://lilianweng.github.io/posts/2021-07-11-diffusion-models/
[143] R. Bellman, “The theory of dynamic programming,” Bulletin of the American Mathemat-
ical Society, vol. 60, no. 6, pp. 503–515, 1954.
[144] ——, “Dynamic programming,” Science, vol. 153, no. 3731, pp. 34–37, 1966.
[145] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Ried-
miller, “Playing atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602,
2013.
[146] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller, “Deterministic
policy gradient algorithms,” in International conference on machine learning. PMLR,
2014, pp. 387–395.
[147] R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, “Policy gradient methods for
reinforcement learning with function approximation,” Advances in neural information
processing systems, vol. 12, 1999.
[148] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver, and
K. Kavukcuoglu, “Asynchronous methods for deep reinforcement learning,” in Inter-
national conference on machine learning. PMLR, 2016, pp. 1928–1937.
[149] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy
optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.
[150] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and
D. Wierstra, “Continuous control with deep reinforcement learning,” arXiv preprint
arXiv:1509.02971, 2015.
[151] J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz, “Trust region policy opti-
mization,” in International conference on machine learning. PMLR, 2015, pp. 1889–
1897.
[152] S. Fujimoto, H. Hoof, and D. Meger, “Addressing function approximation error in actor-
critic methods,” in International conference on machine learning. PMLR, 2018, pp.
1587–1596.
[153] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off-policy maximum
entropy deep reinforcement learning with a stochastic actor,” in International conference
on machine learning. PMLR, 2018, pp. 1861–1870.
[154] G.-Q. Zhang, L. Cui, R. Mueller, S. Tao, M. Kim, M. Rueschman, S. Mariani, D. Mobley,
and S. Redline, “The national sleep research resource: towards a sleep data commons,”
Journal of the American Medical Informatics Association, vol. 25, no. 10, pp. 1351–1358,
2018.
[155] S. F. Quan, B. V. Howard, C. Iber, J. P. Kiley, F. J. Nieto, G. T. O’Connor, D. M. Rapoport,
S. Redline, J. Robbins, J. M. Samet et al., “The sleep heart health study: design, rationale,
and methods,” Sleep, vol. 20, no. 12, pp. 1077–1085, 1997.
