07-theory

CS224W is a course on Machine Learning with Graphs taught by Jure Leskovec at Stanford University. The course covers the design space of Graph Neural Networks (GNNs), focusing on their expressive power and the importance of injective neighbor aggregation functions for distinguishing different graph structures. Key concepts include message passing, aggregation, and the role of computational graphs in generating node embeddings.


CS224W: Machine Learning with Graphs

Jure Leskovec, Stanford University


http://cs224w.stanford.edu
ANNOUNCEMENTS
• Homework 1 due on Thursday (2/2)
• Based on course feedback, we will hold in-person office hours (OHs) every week on Wednesdays, 9-11 AM PT. The location will be updated on the OH calendar.

CS224W: Machine Learning with Graphs


Jure Leskovec, Stanford University
http://cs224w.stanford.edu
J. You, R. Ying, J. Leskovec. Design Space of Graph Neural Networks, NeurIPS 2020

[Figure: the general GNN framework, shown as two stacked GNN layers]
(1) Message
(2) Aggregation
(3) Layer connectivity
(4) Graph augmentation
(5) Learning objective
[Figure: the GNN training pipeline: Input Graph → Graph Neural Network → Node embeddings → Prediction head → Predictions vs. Labels → Loss function / Evaluation metrics, together with the dataset split]

Implementation resources:
PyG provides core modules for this pipeline.
GraphGym further implements the full pipeline to facilitate GNN design.
How powerful are GNNs?
 Many GNN models have been proposed (e.g.,
GCN, GAT, GraphSAGE, design space).
 What is the expressive power (ability to
distinguish different graph structures) of these
GNN models?
 How to design a maximally expressive GNN
model?

 We focus on message passing GNNs:
▪ (1) Message: each node computes a message
$\mathbf{m}_u^{(l)} = \mathrm{MSG}^{(l)}\left(\mathbf{h}_u^{(l-1)}\right),\; u \in \{N(v) \cup v\}$
▪ (2) Aggregation: aggregate messages from neighbors
$\mathbf{h}_v^{(l)} = \mathrm{AGG}^{(l)}\left(\left\{\mathbf{m}_u^{(l)},\, u \in N(v)\right\},\, \mathbf{m}_v^{(l)}\right)$
[Figure: (1) message computation and (2) aggregation illustrated on a node and its neighbors]
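To make the two steps concrete, here is a minimal sketch of one message-passing layer in PyTorch. The class name, the linear message function, and sum aggregation over a dense adjacency matrix are illustrative assumptions, not the course's reference implementation.

```python
import torch
import torch.nn as nn

class SimpleMessagePassingLayer(nn.Module):
    """One GNN layer: MSG is a linear map, AGG is a sum over neighbors (and self)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.msg = nn.Linear(in_dim, out_dim)   # (1) Message function MSG^(l)
        self.act = nn.ReLU()                    # non-linearity applied after aggregation

    def forward(self, h, adj):
        # h:   [num_nodes, in_dim] embeddings h^(l-1) from the previous layer
        # adj: [num_nodes, num_nodes] adjacency matrix with self-loops
        m = self.msg(h)                         # every node u computes its message m_u^(l)
        agg = adj @ m                           # (2) Aggregation: sum messages of N(v) and v itself
        return self.act(agg)                    # new embeddings h^(l)

# Toy usage on a random 5-node graph with 8-dimensional features
adj = ((torch.rand(5, 5) > 0.5).float() + torch.eye(5)).clamp(max=1)
h0 = torch.randn(5, 8)
layer = SimpleMessagePassingLayer(8, 16)
h1 = layer(h0, adj)                             # shape [5, 16]
```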

 Many GNN models have been proposed:
▪ GCN, GraphSAGE, GAT, Design Space, etc.
[Figure: a node's computational graph with the aggregation steps shown as boxes marked "?"]
Different GNN models use different neural networks in the box.
 GCN (mean-pool) [Kipf and Welling ICLR 2017]
[Figure: the computational graph with GCN's aggregation in the boxes]
Element-wise mean pooling + Linear + ReLU non-linearity
 GraphSAGE (max-pool) [Hamilton et al. NeurIPS 2017]
[Figure: the computational graph with GraphSAGE's aggregation in the boxes]
MLP + element-wise max-pooling
 We use the same/different colors to represent nodes with the same/different features.
▪ For example, the graph below assumes all the nodes share the same feature.
[Figure: example graph on nodes 1-5]
 Key question: How well can a GNN distinguish different graph structures?
 We specifically consider local neighborhood structures around each node in a graph.
▪ Example: Nodes 1 and 5 have different neighborhood structures because they have different node degrees.
[Figure: example graph on nodes 1-5]
 We specifically consider local neighborhood structures around each node in a graph.
▪ Example: Nodes 1 and 4 both have the same node degree of 2. However, they still have different neighborhood structures because their neighbors have different node degrees.
[Figure: example graph on nodes 1-5]
Node 1 has neighbors of degrees 2 and 3.
Node 4 has neighbors of degrees 1 and 3.
 We specifically consider local neighborhood structures around each node in a graph.
▪ Example: Nodes 1 and 2 have the same neighborhood structure because they are symmetric within the graph.
[Figure: example graph on nodes 1-5]
Node 1 has neighbors of degrees 2 and 3.
Node 2 has neighbors of degrees 2 and 3.
And even if we go a step deeper to the 2nd-hop neighbors, both nodes see the same degrees (e.g., node 4 of degree 2).
 Key question: Can GNN node embeddings distinguish different nodes' local neighborhood structures?
▪ If so, when? If not, when will a GNN fail?

 Next: We need to understand how a GNN captures local neighborhood structures.
▪ Key concept: Computational graph
 In each layer, a GNN aggregates neighboring node embeddings.
 A GNN generates node embeddings through a computational graph defined by the neighborhood.
▪ Ex: Node 1's computational graph (2-layer GNN)
[Figure: node 1's computational graph is a rooted tree: root 1 with children 2 and 5; node 2's children are 1 and 5, node 5's children are 1, 2, and 4]
 Ex: Nodes 1 and 2's computational graphs.
[Figure: the two rooted trees: node 1 with children 2, 5 (leaves 1, 5 and 1, 2, 4) and node 2 with children 1, 5 (leaves 2, 5 and 1, 2, 4)]
 Ex: Nodes 1 and 2's computational graphs.
 But the GNN only sees node features (not IDs):
[Figure: the same two computational graphs with node IDs replaced by their (identical) node features]
 A GNN will generate the same embedding for nodes 1 and 2 because:
▪ Their computational graphs are the same.
▪ Their node features (colors) are identical.
Note: The GNN does not care about node IDs, it just aggregates the feature vectors of different nodes.
[Figure: the example graph and the two identical computational graphs of nodes 1 and 2]
The GNN won't be able to distinguish nodes 1 and 2.
 In general, different local neighborhoods define different computational graphs.
[Figure: the computational graphs (rooted trees) of all five nodes in the example graph]
 Computational graphs are identical to rooted subtree structures around each node.
Rooted subtree structures are defined by recursively unfolding neighboring nodes from the root nodes.
[Figure: the example graph and the rooted subtrees of nodes 1-5]
 A GNN's node embeddings capture rooted subtree structures.
 The most expressive GNN maps different rooted subtrees into different node embeddings (represented by different colors).
[Figure: the rooted subtrees of nodes 1-5 mapped to embeddings]
 A function $f: X \rightarrow Y$ is injective if it maps different elements into different outputs.
 Intuition: $f$ retains all the information about the input.
[Figure: an injective map $f$ from $X = \{1, 2, 3\}$ into $Y$]
 The most expressive GNN should map subtrees to node embeddings injectively.
[Figure: the rooted subtrees of nodes 1-5 mapped injectively into the embedding space $\mathbb{R}^d$]
 Key observation: Subtrees of the same depth can be recursively characterized from the leaf nodes to the root nodes.
[Figure: the rooted subtrees of nodes 1 and 4 characterized from the leaves to the root; node 1's children have 2 and 3 neighbors, node 4's children have 1 and 3 neighbors; the input features are uniform]
 If each step of a GNN's aggregation can fully retain the neighboring information, the generated node embeddings can distinguish different rooted subtrees.
[Figure: the same two subtrees, with every aggregation step fully retaining the neighboring information]
 In other words, the most expressive GNN would use an injective neighbor aggregation function at each step.
▪ It maps different neighbors to different embeddings.
[Figure: the same two subtrees, aggregated with an injective neighbor aggregation at every step]
 Summary so far
▪ To generate a node embedding, GNNs use a computational graph corresponding to a subtree rooted around each node.
[Figure: input graph → computational graph (= rooted subtree); using injective neighbor aggregation, different subtrees are distinguished]
▪ A GNN can fully distinguish different subtree structures if every step of its neighbor aggregation is injective.
CS224W: Machine Learning with Graphs
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
 Key observation: Expressive power of GNNs
can be characterized by that of neighbor
aggregation functions they use.
▪ A more expressive aggregation function leads to a
more expressive GNN.
▪ Injective aggregation function leads to the most
expressive GNN.
 Next:
▪ Theoretically analyze expressive power of
aggregation functions.

 Observation: Neighbor aggregation can be abstracted as a function over a multi-set (a set with repeating elements).
[Figure: neighbor aggregation and the equivalent multi-set function; the same color indicates the same features]
 Next: We analyze the aggregation functions of two popular GNN models:
▪ GCN (mean-pool) [Kipf & Welling, ICLR 2017]
▪ Uses element-wise mean pooling over neighboring node features:
$\mathrm{Mean}\left(\{x_u\}_{u \in N(v)}\right)$
▪ GraphSAGE (max-pool) [Hamilton et al. NeurIPS 2017]
▪ Uses element-wise max pooling over neighboring node features:
$\mathrm{Max}\left(\{x_u\}_{u \in N(v)}\right)$
 GCN (mean-pool) [Kipf & Welling ICLR 2017]
▪ Takes the element-wise mean, followed by a linear function and ReLU activation, i.e., $\max(0, x)$.
▪ Theorem [Xu et al. ICLR 2019]
▪ GCN's aggregation function cannot distinguish different multi-sets with the same color proportion.
[Figure: failure case: two different multi-sets with the same color proportions]
 Why?
 For simplicity, we assume node features (colors) are represented by one-hot encodings.
▪ Example: if there are two distinct colors, they are encoded as (1, 0) and (0, 1).
▪ This assumption is sufficient to illustrate how GCN fails.
 GCN (mean-pool) [Kipf & Welling ICLR 2017]
▪ Failure case illustration:
[Figure: two different multi-sets with the same color proportions yield the same output after element-wise mean pooling followed by Linear + ReLU]
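A quick numeric check of this failure mode; a sketch assuming one-hot features for two colors as above, with illustrative multi-sets (not the exact ones in the figure):

```python
import torch

yellow = torch.tensor([1., 0.])   # one-hot features for two colors
blue   = torch.tensor([0., 1.])

# Two different multi-sets with the same color proportions (1:1)
set_a = torch.stack([yellow, blue])                    # {yellow, blue}
set_b = torch.stack([yellow, yellow, blue, blue])      # {yellow, yellow, blue, blue}

mean_a = set_a.mean(dim=0)   # tensor([0.5, 0.5])
mean_b = set_b.mean(dim=0)   # tensor([0.5, 0.5])
print(torch.equal(mean_a, mean_b))  # True: mean pooling cannot tell them apart,
                                    # and any Linear + ReLU afterwards sees identical inputs
```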
 GraphSAGE (max-pool) [Hamilton et al. NeurIPS 2017]
▪ Applies an MLP, then takes the element-wise max.
▪ Theorem [Xu et al. ICLR 2019]
▪ GraphSAGE's aggregation function cannot distinguish different multi-sets with the same set of distinct colors.
[Figure: failure case: two different multi-sets containing the same distinct colors]
 Why?
 GraphSAGE (max-pool) [Hamilton et al. NeurIPS 2017]
▪ Failure case illustration:
[Figure: two different multi-sets with the same set of distinct colors yield the same output after the MLP and element-wise max pooling; for simplicity, assume one-hot encodings after the MLP]
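And the analogous check for max pooling; again a sketch with illustrative multi-sets, skipping the MLP by assuming its outputs are already one-hot, as the slide does:

```python
import torch

yellow = torch.tensor([1., 0.])
blue   = torch.tensor([0., 1.])

# Two different multi-sets with the same set of distinct colors
set_a = torch.stack([yellow, blue])              # {yellow, blue}
set_b = torch.stack([yellow, blue, blue])        # {yellow, blue, blue}

max_a = set_a.max(dim=0).values   # tensor([1., 1.])
max_b = set_b.max(dim=0).values   # tensor([1., 1.])
print(torch.equal(max_a, max_b))  # True: max pooling only sees which colors are present
```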
 We analyzed the expressive power of GNNs.
 Main takeaways:
▪ Expressive power of GNNs can be characterized by
that of the neighbor aggregation function.
▪ Neighbor aggregation is a function over multi-sets
(sets with repeating elements)
▪ GCN and GraphSAGE’s aggregation functions fail to
distinguish some basic multi-sets; hence not injective.
▪ Therefore, GCN and GraphSAGE are not maximally
powerful GNNs.

 Our goal: Design maximally powerful GNNs in the class of message-passing GNNs.
 This can be achieved by designing an injective neighbor aggregation function over multi-sets.

 Here, we design a neural network that can model an injective multi-set function.
Theorem [Xu et al. ICLR 2019]
Any injective multi-set function can be expressed as
$\Phi\left(\sum_{x \in S} f(x)\right)$
where $\Phi$ and $f$ are some non-linear functions, $S$ is the multi-set, and the sum $f(\cdot) + f(\cdot) + f(\cdot) + \dots$ runs over the multi-set.
Proof intuition [Xu et al. ICLR 2019]:
$f$ produces one-hot encodings of colors. The summation of the one-hot encodings retains all the information about the input multi-set.
[Figure: example: $f(\cdot) + f(\cdot) + f(\cdot)$, where summing the one-hot encodings counts how many nodes of each color are present]
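Continuing the toy multi-sets used above (a sketch; the multi-sets remain illustrative), the sum of one-hot encodings keeps the count of each color, so the inputs that confused mean- and max-pooling are now separated:

```python
import torch

yellow = torch.tensor([1., 0.])
blue   = torch.tensor([0., 1.])

set_a = torch.stack([yellow, blue])                  # {yellow, blue}
set_b = torch.stack([yellow, yellow, blue, blue])    # {yellow, yellow, blue, blue}
set_c = torch.stack([yellow, blue, blue])            # {yellow, blue, blue}

print(set_a.sum(dim=0))  # tensor([1., 1.])  -> one yellow, one blue
print(set_b.sum(dim=0))  # tensor([2., 2.])  -> two of each
print(set_c.sum(dim=0))  # tensor([1., 2.])  -> counts differ, so the sums differ
```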
 How to model $\Phi$ and $f$ in $\Phi\left(\sum_{x \in S} f(x)\right)$?
 We use a Multi-Layer Perceptron (MLP).
 Theorem: Universal Approximation Theorem [Hornik et al., 1989]
▪ A 1-hidden-layer MLP with sufficiently large hidden dimensionality and an appropriate non-linearity $\sigma(\cdot)$ (including ReLU and sigmoid) can approximate any continuous function to arbitrary accuracy.
[Figure: Input → $\mathbf{W}_1$ → $\sigma$ → $\mathbf{W}_2$ → Output]
 We have arrived at a neural network that can model any injective multi-set function:
$\mathrm{MLP}_\Phi\left(\sum_{x \in S} \mathrm{MLP}_f(x)\right)$
▪ In practice, an MLP hidden dimensionality of 100 to 500 is sufficient.
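A minimal sketch of this aggregator in PyTorch; the class name, layer sizes, and the choice of one hidden layer per MLP are assumptions for illustration:

```python
import torch
import torch.nn as nn

class InjectiveMultiSetAgg(nn.Module):
    """Models Phi( sum_{x in S} f(x) ) with two MLPs, following the expression above."""
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU(),
                               nn.Linear(hidden_dim, hidden_dim))
        self.phi = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                                 nn.Linear(hidden_dim, out_dim))

    def forward(self, neighbor_feats):
        # neighbor_feats: [multi-set size, in_dim]
        return self.phi(self.f(neighbor_feats).sum(dim=0))

# Toy usage on a multi-set of two one-hot "colors"
agg = InjectiveMultiSetAgg(in_dim=2, hidden_dim=128, out_dim=16)
out = agg(torch.stack([torch.tensor([1., 0.]), torch.tensor([0., 1.])]))  # shape [16]
```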
 Graph Isomorphism Network (GIN) [Xu et al. ICLR 2019]
▪ Apply an MLP, an element-wise sum, followed by another MLP:
$\mathrm{MLP}_\Phi\left(\sum_{x \in S} \mathrm{MLP}_f(x)\right)$
 Theorem [Xu et al. ICLR 2019]
▪ GIN's neighbor aggregation function is injective.
 No failure cases!
 GIN is THE most expressive GNN in the class of message-passing GNNs we have introduced!
 So far: We have described the neighbor aggregation part of GIN.

 We now describe the full model of GIN by relating it to the WL graph kernel (a traditional way of obtaining graph-level features).
▪ We will see how GIN is a "neural network" version of the WL graph kernel.
Recall: Color refinement algorithm in the WL kernel.
 Given: A graph $G$ with a set of nodes $V$.
▪ Assign an initial color $c^{(0)}(v)$ to each node $v$.
▪ Iteratively refine node colors by
$c^{(k+1)}(v) = \mathrm{HASH}\left(c^{(k)}(v), \left\{c^{(k)}(u)\right\}_{u \in N(v)}\right),$
where HASH maps different inputs to different colors.
▪ After $K$ steps of color refinement, $c^{(K)}(v)$ summarizes the structure of the $K$-hop neighborhood.
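A short sketch of color refinement, assuming a plain adjacency-list graph; the dictionary-based relabeling below stands in for the injective HASH:

```python
def color_refinement(adj, num_iters):
    """adj: dict mapping each node to a list of its neighbors."""
    colors = {v: 0 for v in adj}                      # uniform initial colors
    for _ in range(num_iters):
        # Each node's signature is (own color, sorted multi-set of neighbor colors)
        signatures = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj}
        # Injectively relabel distinct signatures with fresh integer colors
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: palette[signatures[v]] for v in adj}
    return colors

# The 5-node example graph used in this lecture (edges: 1-2, 1-5, 2-5, 3-4, 4-5)
adj = {1: [2, 5], 2: [1, 5], 3: [4], 4: [3, 5], 5: [1, 2, 4]}
print(color_refinement(adj, num_iters=2))  # nodes 1 and 2 end up with the same color
```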
Example of color refinement given two graphs
▪ Assign initial colors: every node in both graphs starts with color 1.
▪ Aggregate neighboring colors: each node's new signature is its own color together with the multi-set of its neighbors' colors, e.g., (1, 11), (1, 111), (1, 1111), (1, 1).
[Figure: the two example graphs, first with uniform colors and then with the aggregated color signatures]
Example of color refinement given two graphs
▪ Aggregated colors: (1, 1), (1, 11), (1, 111), (1, 1111), as above.
▪ Injectively HASH the aggregated colors:
HASH table (injective):
1,1 --> 2
1,11 --> 3
1,111 --> 4
1,1111 --> 5
[Figure: the two graphs relabeled with the new colors 2-5]
Example of color refinement given two graphs
 The process continues until a stable coloring is reached.
 Two graphs are considered isomorphic if they have the same set of colors.
[Figure: the two graphs after further refinement, showing their final colors]
 GIN uses a neural network to model the injective HASH function:
$c^{(k+1)}(v) = \mathrm{HASH}\left(c^{(k)}(v), \left\{c^{(k)}(u)\right\}_{u \in N(v)}\right)$

 Specifically, we will model the injective function over the tuple
$\left(c^{(k)}(v), \left\{c^{(k)}(u)\right\}_{u \in N(v)}\right),$
where $c^{(k)}(v)$ is the root node's features and $\left\{c^{(k)}(u)\right\}_{u \in N(v)}$ are the neighboring node colors.
Theorem [Xu et al. ICLR 2019]
Any injective function over the tuple
$\left(c^{(k)}(v), \left\{c^{(k)}(u)\right\}_{u \in N(v)}\right)$
(root node features, neighboring node features) can be modeled as
$\mathrm{MLP}_\Phi\left((1 + \epsilon) \cdot \mathrm{MLP}_f\left(c^{(k)}(v)\right) + \sum_{u \in N(v)} \mathrm{MLP}_f\left(c^{(k)}(u)\right)\right),$
where $\epsilon$ is a learnable scalar.
 If the input feature $c^{(0)}(v)$ is represented as a one-hot encoding, direct summation is injective.
[Figure: example: summing the one-hot encodings of the root node's features and the neighboring nodes' features counts each color]
 We only need $\Phi$ to ensure injectivity; this MLP can provide the "one-hot"-like input feature for the next layer.
 GIN's node embedding updates
 Given: A graph $G$ with a set of nodes $V$.
▪ Assign an initial vector $c^{(0)}(v)$ to each node $v$.
▪ Iteratively update node vectors by
$c^{(k+1)}(v) = \mathrm{GINConv}\left(c^{(k)}(v), \left\{c^{(k)}(u)\right\}_{u \in N(v)}\right),$
where GINConv, a differentiable color HASH function, maps different inputs to different embeddings.
▪ After $K$ steps of GIN iterations, $c^{(K)}(v)$ summarizes the structure of the $K$-hop neighborhood.
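A minimal hand-rolled sketch of one GIN update in PyTorch with a learnable epsilon; the dense adjacency matrix and layer sizes are illustrative assumptions (PyG also provides a GINConv module):

```python
import torch
import torch.nn as nn

class GINLayer(nn.Module):
    """c^{(k+1)}(v) = MLP( (1 + eps) * c^{(k)}(v) + sum_{u in N(v)} c^{(k)}(u) )"""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))                 # learnable epsilon
        self.mlp = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU(),
                                 nn.Linear(out_dim, out_dim))

    def forward(self, c, adj):
        # c: [num_nodes, in_dim], adj: [num_nodes, num_nodes] adjacency (no self-loops)
        neighbor_sum = adj @ c                                   # sum of neighbors' vectors
        return self.mlp((1 + self.eps) * c + neighbor_sum)

# The example graph (edges 1-2, 1-5, 2-5, 3-4, 4-5), nodes re-indexed 0..4
adj = torch.zeros(5, 5)
for i, j in [(0, 1), (0, 4), (1, 4), (2, 3), (3, 4)]:
    adj[i, j] = adj[j, i] = 1.0
c0 = torch.ones(5, 3)                                            # uniform input features
layer = GINLayer(3, 8)
c1 = layer(c0, adj)                                              # shape [5, 8]
```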
 GIN can be understood as a differentiable neural version of the WL graph kernel:
▪ WL Graph Kernel: update target is node colors (one-hot); update function is HASH.
▪ GIN: update target is node embeddings (low-dimensional vectors); update function is GINConv.

 Advantages of GIN over the WL graph kernel:
▪ Node embeddings are low-dimensional; hence, they can capture the fine-grained similarity of different nodes.
▪ Parameters of the update function can be learned for the downstream tasks.
 Because of the relation between GIN and the WL graph kernel, their expressive power is exactly the same.
▪ If two graphs can be distinguished by GIN, they can also be distinguished by the WL kernel, and vice versa.
 How powerful is this?
▪ The WL kernel has been both theoretically and empirically shown to distinguish most real-world graphs [Cai et al. 1992].
▪ Hence, GIN is also powerful enough to distinguish most real graphs!
Failure cases for mean and max pooling:
[Figure: example multi-sets that mean pooling and max pooling cannot distinguish; colors represent feature values]
Ranking by discriminative power: sum pooling > mean pooling > max pooling.


 Can the expressive power of GNNs be improved?
▪ There are basic graph structures that the existing GNN framework cannot distinguish, such as differences in cycle length.
[Figure: graphs A and B are cycles of different lengths, yet their nodes $v_1$ and $v_2$ have identical computational graphs]
▪ GNNs' expressive power can be improved to resolve the above problem. [You et al. AAAI 2021, Li et al. NeurIPS 2020]
▪ Stay tuned for Lecture 15: Advanced Topics in GNNs
 We designed a neural network that can model an injective multi-set function.
 We use this neural network as the neighbor aggregation function and arrive at GIN, the most expressive GNN model in the class of message-passing GNNs.
 The key is to use element-wise sum pooling, instead of mean-/max-pooling.
 GIN is closely related to the WL graph kernel.
 Both GIN and the WL graph kernel can distinguish most real-world graphs!
CS224W: Machine Learning with Graphs
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
 Data preprocessing is important:
▪ Node attributes can vary a lot! Use normalization
▪ E.g. probability ranges (0,1), but some inputs could have much
larger range, say (−1000, 1000)
 Optimizer: ADAM is relatively robust to learning rate
 Activation function
▪ ReLU activation function often works well
▪ Other good alternatives: LeakyReLU, PReLU
▪ No activation function at your output layer
▪ Include bias term in every layer
 Embedding dimensions:
▪ 32, 64 and 128 are often good starting points
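A minimal setup sketch reflecting these tips; the model, dimensions, and data are placeholders for illustration, not a prescribed configuration:

```python
import torch
import torch.nn as nn

# Normalize node attributes so all features live on a comparable scale
x = torch.randn(100, 16) * 1000                      # raw attributes with a large range
x = (x - x.mean(dim=0)) / (x.std(dim=0) + 1e-8)      # standardize each feature

# A small model following the tips: ReLU inside, bias in every layer, no activation at the output
model = nn.Sequential(
    nn.Linear(16, 64, bias=True), nn.ReLU(),
    nn.Linear(64, 64, bias=True), nn.ReLU(),
    nn.Linear(64, 1, bias=True),                     # raw output; the loss handles any transform
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # Adam is relatively robust to the learning rate
```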
 Debug issues: Loss/accuracy not converging
during training
▪ Check pipeline (e.g. in PyTorch we need zero_grad)
▪ Adjust hyperparameters such as learning rate
▪ Pay attention to weight parameter initialization
▪ Scrutinize loss function!
 Important for model development:
▪ Overfit on (part of) training data:
▪ With a small training dataset, loss should be essentially
close to 0, with an expressive neural network
▪ Monitor the training & validation loss curve
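For instance, a bare-bones training loop illustrating the zero_grad step and the "overfit a tiny subset" sanity check; the model, data, and loss below are small stand-ins, not part of the course pipeline:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Sanity check: overfit a handful of examples; the loss should drop close to 0
x_small, y_small = torch.randn(8, 16), torch.randn(8, 1)
for step in range(500):
    optimizer.zero_grad()              # clear gradients from the previous step
    loss = loss_fn(model(x_small), y_small)
    loss.backward()
    optimizer.step()
    if step % 100 == 0:
        print(step, loss.item())       # monitor the training loss curve
```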
GraphGym:
Easy and flexible end-to-end GNN pipeline based on PyTorch Geometric (PyG).

GNN frameworks such as PyG, DGL, and GraphNets implement a variety of GNN architectures, and are built on top of auto-differentiation frameworks.
Tutorials and overviews:
▪ Relational inductive biases and graph networks (Battaglia et al., 2018)
▪ Representation learning on graphs: Methods and applications (Hamilton et al., 2017)
Attention-based neighborhood aggregation:
▪ Graph attention networks (Hoshen, 2017; Velickovic et al., 2018; Liu et al., 2018)
Embedding entire graphs:
▪ Graph neural nets with edge embeddings (Battaglia et al., 2016; Gilmer et. al., 2017)
▪ Embedding entire graphs (Duvenaud et al., 2015; Dai et al., 2016; Li et al., 2018) and graph pooling
(Ying et al., 2018, Zhang et al., 2018)
▪ Graph generation and relational inference (You et al., 2018; Kipf et al., 2018)
▪ How powerful are graph neural networks (Xu et al., 2017)
Embedding nodes:
▪ Varying neighborhood: Jumping knowledge networks (Xu et al., 2018), GeniePath (Liu et al., 2018)
▪ Position-aware GNN (You et al. 2019)

Spectral approaches to graph neural networks:
▪ Spectral graph CNN & ChebNet (Bruna et al., 2015; Defferrard et al., 2016)
▪ Geometric deep learning (Bronstein et al., 2017; Monti et al., 2017)
Other GNN techniques:
▪ Pre-training Graph Neural Networks (Hu et al., 2019)
▪ GNNExplainer: Generating Explanations for Graph Neural Networks (Ying et al., 2019)

