07-theory
[Figure: the general GNN pipeline. Input graph → Graph Neural Network (GNN Layer 1, GNN Layer 2; each layer performs (1) Message and (2) Aggregation, connected by (3) Layer connectivity) → Node embeddings → Prediction head → Predictions vs. Labels → Loss function → Evaluation metrics]
Implementation resources:
▪ PyG provides core modules for this pipeline
▪ GraphGym further implements the full pipeline to facilitate GNN design
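As a concrete illustration of this pipeline, here is a minimal PyG (PyTorch Geometric) sketch for node classification. The dataset (Planetoid/Cora), the two GCNConv layers, the linear prediction head, and the hyperparameters are illustrative assumptions, not part of the slides.

# Minimal sketch of the GNN pipeline above, assuming PyTorch Geometric is installed.
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv

dataset = Planetoid(root="/tmp/Cora", name="Cora")  # assumed example dataset
data = dataset[0]

class GNN(torch.nn.Module):
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hid_dim)          # GNN layer 1
        self.conv2 = GCNConv(hid_dim, hid_dim)         # GNN layer 2
        self.head = torch.nn.Linear(hid_dim, out_dim)  # prediction head

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        h = self.conv2(h, edge_index)                  # node embeddings
        return self.head(h)                            # predictions

model = GNN(dataset.num_features, 64, dataset.num_classes)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(100):
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    # loss function: compare predictions against labels on the training split
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()

# evaluation metric: accuracy on the test split
pred = model(data.x, data.edge_index).argmax(dim=-1)
acc = (pred[data.test_mask] == data.y[data.test_mask]).float().mean()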
How powerful are GNNs?
Many GNN models have been proposed (e.g., GCN, GAT, GraphSAGE, the GNN design space).
What is the expressive power (the ability to distinguish different graph structures) of these GNN models?
How do we design a maximally expressive GNN model?
We focus on message passing GNNs:
▪ (1) Message: each node computes a message
$\mathbf{m}_u^{(l)} = \text{MSG}^{(l)}\left(\mathbf{h}_u^{(l-1)}\right),\; u \in \{N(v) \cup v\}$
▪ (2) Aggregation: aggregate messages from neighbors
$\mathbf{h}_v^{(l)} = \text{AGG}^{(l)}\left(\left\{\mathbf{m}_u^{(l)},\, u \in N(v)\right\},\, \mathbf{m}_v^{(l)}\right)$
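A minimal sketch of one such layer in plain PyTorch. The concrete choices here (a linear map as MSG, a sum as AGG) are illustrative assumptions; concrete GNNs such as GCN, GraphSAGE, GAT, and GIN differ exactly in these two choices.

# One message-passing layer following the two steps above (sketch, not from the slides).
import torch

def message_passing_layer(h, neighbors, W_msg):
    """h: (num_nodes, d) embeddings from layer l-1
    neighbors: dict mapping each node v to the list of its neighbors N(v)
    W_msg: (d, d) weight matrix used as the MSG function (assumed choice)."""
    m = h @ W_msg                          # (1) Message: m_u = MSG(h_u) for every node u
    h_new = torch.zeros_like(h)
    for v, nbrs in neighbors.items():
        agg = m[nbrs].sum(dim=0)           # (2) Aggregation over messages from N(v)
        h_new[v] = torch.relu(agg + m[v])  # also include node v's own message
    return h_new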
Many GNN models have been proposed:
▪ GCN, GraphSAGE, GAT, the GNN design space, etc.
We use the same/different node colors to represent nodes with the same/different features.
▪ For example, the graph below assumes all the nodes
share the same feature.
[Figure: example graph with nodes 1–5, all drawn in the same color]
We specifically consider local neighborhood
structures around each node in a graph.
▪ Example: Nodes 1 and 4 both have the same node degree of 2. However, their local neighborhood structures still differ, because their neighbors have different node degrees.
In each layer, a GNN aggregates neighboring node
embeddings.
A GNN generates node embeddings through a
computational graph defined by the neighborhood.
▪ Ex: Node 1’s computational graph (2-layer GNN)
[Figure: node 1's 2-layer computational graph. Root: node 1; level 1: its neighbors 2 and 5; level 2 (leaves): nodes 1, 5 and 1, 2, 4]
Ex: Nodes 1 and 2’s computational graphs.
[Figure: the 2-layer computational graphs rooted at nodes 1 and 2, unfolded from the input graph]
Ex: Nodes 1 and 2’s computational graphs.
But a GNN only sees node features (not node IDs):
[Figure: the same two computational graphs with node IDs replaced by identical node colors]
A GNN will generate the same embedding for nodes 1 and 2 because:
▪ Their computational graphs are the same.
▪ Their node features (colors) are identical.
Note: a GNN does not care about node IDs; it just aggregates the feature vectors of different nodes.
[Figure: the input graph and the 2-layer computational graphs rooted at each of nodes 1–5]
Computational graphs are identical to rooted
subtree structures around each node.
Rooted subtree structures are defined by recursively unfolding neighboring nodes from the root node.
[Figure: the input graph and the rooted subtrees (computational graphs) for nodes 1–5]
A GNN's node embeddings capture rooted subtree structures.
The most expressive GNN maps different rooted subtrees to different node embeddings (represented by different colors).
[Figure: the rooted subtrees for nodes 1–5, each mapped to an embedding]
A function $f: X \rightarrow Y$ is injective if it maps different elements to different outputs.
Intuition: $f$ retains all the information about the input.
[Figure: an injective map $f$ from $X = \{1, 2, 3\}$ into $Y = \{A, B, C, D\}$: distinct inputs map to distinct outputs]
The most expressive GNN should map subtrees to node embeddings injectively.
[Figure: the rooted subtrees for nodes 1–5 mapped injectively into the embedding space $\mathbb{R}^d$]
Key observation: Subtrees of the same depth
can be recursively characterized from the leaf
nodes to the root nodes.
[Figure: the depth-2 subtrees rooted at nodes 1 and 4, characterized from the leaves to the root. Input features at the leaves are uniform; one level up, node 1's children have 2 and 3 neighbors, while node 4's children have 1 and 3 neighbors.]
If each step of GNN’s aggregation can fully
retain the neighboring information, the
generated node embeddings can distinguish
different rooted subtrees.
[Figure: the same two subtrees. If each aggregation step fully retains the neighboring information (the neighbor counts at every level), the resulting embeddings distinguish the two subtrees.]
In other words, the most expressive GNN would use an injective neighbor aggregation function at each step.
▪ It maps different neighbors to different embeddings.
[Figure: the same two subtrees, distinguished by applying injective neighbor aggregation at every level]
Summary so far
▪ To generate a node embedding, GNNs use a
computational graph corresponding to a subtree
rooted around each node.
▪ Using injective neighbor aggregation, the GNN can distinguish different subtrees.
[Figure: the input graph, node 1's computational graph, and the equivalent rooted subtree]
Observation: Neighbor aggregation can be
abstracted as a function over a multi-set (a
set with repeating elements).
[Figure: examples of equivalent multi-sets of node colors]
GCN (mean-pool) [Kipf & Welling ICLR 2017]
▪ Take element-wise mean, followed by linear
function and ReLU activation, i.e., max(0, 𝑥).
▪ Theorem [Xu et al. ICLR 2019]
▪ GCN’s aggregation function cannot distinguish different
multi-sets with the same color proportion.
[Figure: failure case in which two different multi-sets have the same proportion of colors]
For simplicity, we assume node features (colors) are represented by one-hot encodings.
▪ Example: if there are two distinct colors, they are encoded as (1, 0) and (0, 1).
GCN (mean-pool) [Kipf & Welling ICLR 2017]
▪ Failure case illustration: element-wise mean-pooling yields the same output for both multi-sets.
[Figure: the two multi-sets from the failure case average to identical vectors]
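A tiny numeric check of this failure case; the two one-hot color vectors below are assumed for illustration.

# Two different multi-sets with the same color proportions give the same mean.
import numpy as np

yellow, blue = np.array([1.0, 0.0]), np.array([0.0, 1.0])   # assumed one-hot colors

multiset_a = np.stack([yellow, blue])                        # {yellow, blue}
multiset_b = np.stack([yellow, yellow, blue, blue])          # {yellow, yellow, blue, blue}

print(multiset_a.mean(axis=0))  # [0.5 0.5]
print(multiset_b.mean(axis=0))  # [0.5 0.5] -> mean-pool cannot tell them apart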
GraphSAGE (max-pool) [Hamilton et al. NeurIPS 2017]
▪ Apply an MLP, then take element-wise max.
▪ Theorem [Xu et al. ICLR 2019]
▪ GraphSAGE’s aggregation function cannot distinguish
different multi-sets with the same set of distinct colors.
[Figure: failure case in which two different multi-sets contain the same set of distinct colors]
GraphSAGE (max-pool) [Hamilton et al. NeurIPS 2017]
▪ Failure case illustration
[Figure: failure case illustration. For simplicity, assume one-hot encodings after the MLP; element-wise max-pooling then yields the same output for both multi-sets.]
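A tiny numeric check of this failure case as well; the one-hot vectors (standing in for features after the MLP) are assumed for illustration.

# Two different multi-sets with the same set of distinct colors give the same max.
import numpy as np

yellow, blue = np.array([1.0, 0.0]), np.array([0.0, 1.0])   # assumed one-hot features after the MLP

multiset_a = np.stack([yellow, blue])                        # {yellow, blue}
multiset_b = np.stack([yellow, yellow, blue])                # {yellow, yellow, blue}

print(multiset_a.max(axis=0))  # [1. 1.]
print(multiset_b.max(axis=0))  # [1. 1.] -> max-pool cannot tell them apart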
We analyzed the expressive power of GNNs.
Main takeaways:
▪ The expressive power of GNNs can be characterized by that of their neighbor aggregation function.
▪ Neighbor aggregation is a function over multi-sets (sets with repeating elements).
▪ GCN and GraphSAGE's aggregation functions fail to distinguish some basic multi-sets; hence they are not injective.
▪ Therefore, GCN and GraphSAGE are not maximally
powerful GNNs.
Our goal: Design maximally powerful GNNs
in the class of message-passing GNNs.
This can be achieved by designing an injective neighbor aggregation function over multi-sets.
Theorem [Xu et al. ICLR 2019]
Any injective multi-set function can be expressed as
$\Phi\left(\sum_{x \in S} f(x)\right)$,
where $\Phi$ and $f$ are non-linear functions and $S$ is a multi-set.
Proof intuition [Xu et al. ICLR 2019]:
$f$ produces one-hot encodings of colors. Summation of the one-hot encodings retains all the information about the input multi-set.
▪ Example: summing the one-hot encodings of the elements gives the count of each color, which uniquely determines the multi-set.
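A quick numeric check of this intuition; the two one-hot colors are assumed for illustration.

# Summing one-hot encodings yields the count of each color.
import numpy as np

yellow, blue = np.array([1, 0]), np.array([0, 1])   # assumed one-hot colors

print(yellow + yellow + blue)   # [2 1] -> 2 yellow nodes, 1 blue node
print(yellow + blue + blue)     # [1 2] -> a different multi-set gives a different sum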
How do we model $\Phi$ and $f$ in $\Phi\left(\sum_{x \in S} f(x)\right)$?
We use a Multi-Layer Perceptron (MLP).
Theorem: Universal Approximation Theorem [Hornik et al., 1989]
A one-hidden-layer MLP, $\text{MLP}(x) = W_2\,\sigma(W_1 x)$, with a sufficiently large hidden dimensionality and a suitable non-linearity $\sigma(\cdot)$ can approximate any continuous function to arbitrary accuracy.
We have arrived at a neural network that can model any injective multi-set function:
$\text{MLP}_{\Phi}\left(\sum_{x \in S} \text{MLP}_{f}(x)\right)$
Graph Isomorphism Network (GIN) [Xu et al. ICLR 2019]
▪ Apply an MLP to each element, take the element-wise sum, then apply another MLP.
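A minimal PyTorch sketch of this aggregation over a multi-set of neighbor features: an inner MLP applied per element, an element-wise sum, then an outer MLP. The hidden sizes are illustrative assumptions.

# GIN-style injective multi-set aggregation: Phi(sum_x f(x)) with MLPs for f and Phi.
import torch
import torch.nn as nn

class GINAggregation(nn.Module):
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.mlp_f = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU(),
                                   nn.Linear(hid_dim, hid_dim))
        self.mlp_phi = nn.Sequential(nn.Linear(hid_dim, hid_dim), nn.ReLU(),
                                     nn.Linear(hid_dim, out_dim))

    def forward(self, neighbor_feats):
        # neighbor_feats: (num_neighbors, in_dim), i.e., the multi-set S
        summed = self.mlp_f(neighbor_feats).sum(dim=0)  # injective sum over S
        return self.mlp_phi(summed)                     # Phi(sum_x f(x))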
Recall: the color refinement algorithm in the WL kernel.
Given: a graph $G$ with a set of nodes $V$.
▪ Assign an initial color $c^{(0)}(v)$ to each node $v$.
▪ Iteratively refine node colors by
$c^{(k+1)}(v) = \text{HASH}\left(c^{(k)}(v), \left\{c^{(k)}(u)\right\}_{u \in N(v)}\right)$,
where HASH maps different inputs to different colors.
▪ After $K$ steps of color refinement, $c^{(K)}(v)$ summarizes the structure of the $K$-hop neighborhood.
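A minimal Python sketch of color refinement, assuming graphs are given as adjacency lists and using Python's tuple hash as a stand-in for the HASH function (both are illustrative assumptions, and hash collisions are ignored here).

# WL color refinement as described above (sketch).
def wl_color_refinement(adj, num_iters):
    """adj: dict mapping each node v to the list of its neighbors N(v)."""
    colors = {v: 1 for v in adj}                        # initial color c^(0)(v)
    for _ in range(num_iters):
        new_colors = {}
        for v in adj:
            neighbor_colors = sorted(colors[u] for u in adj[v])
            # HASH(c^(k)(v), {c^(k)(u)}): different inputs map to different colors
            new_colors[v] = hash((colors[v], tuple(neighbor_colors)))
        colors = new_colors
    return colors

# The WL test deems two graphs indistinguishable if their final color multi-sets match.
g1 = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}       # triangle with a pendant node
g2 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}             # path on 4 nodes
print(sorted(wl_color_refinement(g1, 3).values()) ==
      sorted(wl_color_refinement(g2, 3).values()))      # False: different structures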
3/7/2023 Jure Les kovec, Stanford CS224W: Ma chine Learning with Graphs, http://cs224w.stanford.edu 45
Example of color refinement given two graphs
▪ Assign initial colors
[Figure: two example graphs in which every node is assigned the initial color 1]
Example of color refinement given two graphs
▪ Aggregated colors:
[Figure: after one round, each node's aggregated color is the pair (own color, neighbor colors), e.g., (1, 11) or (1, 111)]
Example of color refinement given two graphs
The process continues until a stable coloring is reached.
Two graphs are considered isomorphic by the WL test if they end up with the same multi-set of colors.
[Figure: the final stable colorings of the two graphs, with refined colors relabeled as new integers]
GIN uses a neural network to model the
injective HASH function.
$c^{(k+1)}(v) = \text{HASH}\left(c^{(k)}(v), \left\{c^{(k)}(u)\right\}_{u \in N(v)}\right)$
can be modeled as
$\text{MLP}_{\Phi}\left((1+\epsilon)\cdot \text{MLP}_{f}\left(c^{(k)}(v)\right) + \sum_{u \in N(v)} \text{MLP}_{f}\left(c^{(k)}(u)\right)\right)$,
where $\epsilon$ is a learnable scalar.
If the input feature $c^{(0)}(v)$ is represented as a one-hot encoding, direct summation is injective (the sum simply counts each color).
We then only need $\Phi$ to ensure the injectivity; this outer MLP can also provide a "one-hot"-like input feature for the next layer.
[Figure: the GIN update applied to the root node's features and the neighboring nodes' features]
GIN’s node embedding updates
Given: a graph $G$ with a set of nodes $V$.
▪ Assign an initial vector $c^{(0)}(v)$ to each node $v$.
▪ Iteratively update node vectors by
$c^{(k+1)}(v) = \text{GINConv}\left(c^{(k)}(v), \left\{c^{(k)}(u)\right\}_{u \in N(v)}\right)$,
where GINConv is a differentiable color HASH function that maps different inputs to different embeddings.
▪ After $K$ steps of GIN iterations, $c^{(K)}(v)$ summarizes the structure of the $K$-hop neighborhood.
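PyG provides this update as torch_geometric.nn.GINConv, which wraps a user-supplied MLP. A minimal usage sketch follows; the MLP sizes and the toy graph are illustrative assumptions.

# One GINConv update on a small random graph (sketch, assumes PyTorch Geometric).
import torch
import torch.nn as nn
from torch_geometric.nn import GINConv

mlp = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 32))
conv = GINConv(mlp, train_eps=True)    # eps weights the root node's own feature

x = torch.randn(5, 16)                               # 5 nodes, 16-dim features
edge_index = torch.tensor([[0, 1, 1, 2, 2, 4],       # source nodes
                           [1, 0, 2, 1, 4, 2]])      # target nodes
h = conv(x, edge_index)                              # (5, 32) updated embeddings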
GIN can be understood as a differentiable neural version of the WL graph kernel:

                   Update target                        Update function
WL Graph Kernel    Node colors (one-hot)                HASH
GIN                Node embeddings (low-dim vectors)    GINConv
GNN frameworks: DGL and GraphNets implement a variety of GNN architectures, built on top of auto-differentiation frameworks.
Tutorials and overviews:
▪ Relational inductive biases and graph networks (Battaglia et al., 2018)
▪ Representation learning on graphs: Methods and applications (Hamilton et al., 2017)
Attention-based neighborhood aggregation:
▪ Graph attention networks (Hoshen, 2017; Velickovic et al., 2018; Liu et al., 2018)
Embedding entire graphs:
▪ Graph neural nets with edge embeddings (Battaglia et al., 2016; Gilmer et al., 2017)
▪ Embedding entire graphs (Duvenaud et al., 2015; Dai et al., 2016; Li et al., 2018) and graph pooling
(Ying et al., 2018, Zhang et al., 2018)
▪ Graph generation and relational inference (You et al., 2018; Kipf et al., 2018)
▪ How powerful are graph neural networks? (Xu et al., 2019)
Embedding nodes:
▪ Varying neighborhood: Jumping knowledge networks (Xu et al., 2018), GeniePath (Liu et al., 2018)
▪ Position-aware GNN (You et al. 2019)