07 Theory2
Note to other teachers and users of these slides: we would be delighted if you found this material useful for giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. If you make use of a significant portion of these slides in your own lecture, please include this message, or a link to our web site: http://cs224w.stanford.edu
¡ Recitation time moved
We will host our recitations in the evenings from now on to accommodate remote students. Recordings are also available via Ed posts.
¡ Clarification on project feedback
After the project proposal, you will be assigned a TA who will mentor your project and provide detailed feedback.
¡ Lecture pace
We will slow down the pace.
¡ Individual questions about lecture content
Please come to office hours for in-depth Q&A.
[Figure: GNN pipeline: Input Graph → Graph Neural Network → Node embeddings → Prediction head → Predictions vs. Labels → Loss function → Evaluation metrics]
[Figure: example input graphs: (A) a grid graph and (B) the NYC road network, each with highlighted nodes v₁ and v₂]
Different inputs but the same computational graph → GNN fails
[Figure: example input graphs where edges A and B share a node; we look at the embeddings of v₁ and v₂. The existing GNNs' computational graphs rooted at v₁ and v₂ are identical (A = B), so the two nodes get the same embedding.]
Different inputs but the same computational graph → GNN fails
[Figure: example input graphs A and B; for each node, the existing GNNs' computational graphs in A and B are identical (A = B), so the GNN cannot tell the two graphs apart.]
¡ Recall the GIN update (in matrix form, with $H^{(k)}$ stacking the node colors at level $k$):
§ $H^{(k)} = \mathrm{MLP}^{(k)}\!\left(\left(A + (1+\epsilon)\,I\right) H^{(k-1)}\right)$
§ Where $A \in \{0,1\}^{n \times n}$ is the adjacency matrix of the graph, i.e., $A(u,v) = 1$ if $(u,v)$ is an edge and $A(u,v) = 0$ if $(u,v)$ is not an edge.
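A minimal NumPy sketch of this matrix-form update; the two-layer MLP, its weight shapes, and the tiny example graph are illustrative assumptions rather than the lecture's exact setup.

```python
import numpy as np

def gin_layer(A, H, W1, W2, eps=0.0):
    """One GIN layer in matrix form: H' = MLP((A + (1 + eps) I) H).

    A  : (n, n) binary adjacency matrix
    H  : (n, d) node colors/features from the previous level
    W1, W2 : weights of a small two-layer MLP (hypothetical shapes)
    """
    n = A.shape[0]
    agg = (A + (1.0 + eps) * np.eye(n)) @ H   # neighbor sum plus scaled self term
    return np.maximum(agg @ W1, 0.0) @ W2     # MLP: linear -> ReLU -> linear

# Tiny usage example: a 3-node path graph with a constant initial coloring
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.ones((3, 1))
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(1, 4)), rng.normal(size=(4, 2))
print(gin_layer(A, H, W1, W2, eps=0.1))
```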
¡ Let's compute the eigenvalue decomposition of the graph adjacency matrix A.
¡ The weights of the first MLP layer depend on the eigenvalues and the dot product between the eigenvectors and the colors at the previous level.
¡ If we zoom in:
¡ The new node colors only depend on the eigenvectors that are not orthogonal to 1.
¡ Graphs with symmetries admit eigenvectors orthogonal to 1.
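As a concrete, hedged illustration of this point, the NumPy sketch below uses a 6-cycle, a symmetric graph of my choosing: with a constant coloring, only eigenvectors that have a nonzero dot product with the all-ones vector 1 show up in the aggregation, so the new colors stay constant and carry no new information.

```python
import numpy as np

# A 6-cycle: a symmetric graph whose adjacency matrix has eigenvectors orthogonal to 1
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0

eigvals, eigvecs = np.linalg.eigh(A)   # eigenvalue decomposition of A
ones = np.ones(n)                      # constant node coloring

# Dot product between each eigenvector and the constant coloring
overlaps = eigvecs.T @ ones
for lam, ov in zip(eigvals, overlaps):
    print(f"eigenvalue {lam:+.2f}   <eigenvector, 1> = {ov:+.2f}")

# Only eigenvectors with nonzero overlap contribute to (A + (1 + eps) I) @ 1
agg = (A + np.eye(n)) @ ones
print("aggregated constant coloring:", agg)   # still constant: no new information
```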
¡ The WL kernel cannot distinguish between some basic graph structures, e.g.:
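The slide's figure did not survive extraction, so as an illustration (my example, not necessarily the lecture's): two disjoint triangles and a single 6-cycle are both 2-regular graphs on 6 nodes, and the WL test assigns them identical colorings even though they are not isomorphic.

```python
import networkx as nx

# Two 2-regular graphs on 6 nodes the WL test cannot tell apart
G1 = nx.disjoint_union(nx.cycle_graph(3), nx.cycle_graph(3))  # two triangles
G2 = nx.cycle_graph(6)                                        # one hexagon

h1 = nx.weisfeiler_lehman_graph_hash(G1, iterations=3)
h2 = nx.weisfeiler_lehman_graph_hash(G2, iterations=3)
print(h1 == h2)                   # True: identical WL colorings
print(nx.is_isomorphic(G1, G2))   # False: the graphs differ
```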
¡ The WL kernel cannot count basic graph structures:
¡ Summary: The limitations of the WL kernel are limitations of the initial node coloring.
§ These limitations are well understood in the spectral domain.
§ Constant node colorings are orthogonal to some adjacency eigenvectors, so critical spectral components (eigenvalues and eigenvectors) are omitted.
§ At a high level, the colors generated by the WL kernel obey the same symmetries as the graph structure.
§ These joint symmetries lock the message-passing operations into a limited set of representations.
CS224W: Machine Learning with Graphs
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
¡ We use the following thinking:
§ Two different inputs (nodes, edges, graphs) are labeled differently
§ A “failed” model will always assign the same embedding to them
§ A “successful” model will assign different embeddings to them
§ Embeddings are determined by GNN computational graphs:
[Figure: input graphs A and B with a one-hot ID (e.g., 0001) assigned to every node]
§ Issues:
§ Not scalable: Need 𝑂(𝑁) feature dimensions (𝑁 is the
number of nodes)
§ Not inductive: Cannot generalize to new nodes/graphs
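A short sketch contrasting the two options (helper names are hypothetical); it makes the scalability issue visible: the one-hot features grow with the number of nodes and are tied to this particular graph, while the constant features are 1-dimensional and apply to any graph.

```python
import numpy as np

def one_hot_node_features(num_nodes):
    """Unique one-hot ID per node: an (N, N) identity matrix, so O(N) dimensions."""
    return np.eye(num_nodes)

def constant_node_features(num_nodes, dim=1):
    """The same constant feature (all ones) for every node."""
    return np.ones((num_nodes, dim))

X_onehot = one_hot_node_features(6)   # shape (6, 6): grows with the graph, not inductive
X_const = constant_node_features(6)   # shape (6, 1): inductive, but less expressive
print(X_onehot.shape, X_const.shape)
```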
¡ Feature augmentation: constant vs. one-hot
[Figure: the same example graph with a constant node feature (every node gets the value 1) vs. one-hot node features (each node gets a unique ID, 1 through 6)]
§ Expressive power: Constant is medium (all the nodes are identical, but the GNN can still learn from the graph structure); one-hot is high (each node has a unique ID, so node-specific information can be stored).
§ Inductive learning (generalizing to unseen nodes): Constant is high (simple to generalize to new nodes: we assign the constant feature to them, then apply our GNN); one-hot is low (cannot generalize to new nodes: new nodes introduce new IDs, and the GNN doesn't know how to embed unseen IDs).
§ Computational cost: Constant is low (only a 1-dimensional feature); one-hot is high (high-dimensional features cannot be applied to large graphs).
§ Use cases: Constant suits any graph and inductive settings (generalizing to new nodes); one-hot suits small graphs and transductive settings (no new nodes).
J. You, J. Gomes-Selman, R. Ying, J. Leskovec. Identity-aware Graph Neural Networks, AAAI 2021
B !" A !! B !"
𝑣Goal:
! classify !! and !" 𝑣"
computational graph ID-GNN rooted subtrees
1 1
B !" !! A B !"
= ≠ 0
Cycle count 0
at each level 2 2
2 0
… … …
length-3 cycles = 2 length-3 cycles = 0
10/15/24 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu 38
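The cycle-counting intuition can be sketched with powers of the adjacency matrix: diag(A^k) counts length-k closed walks through each node, and for k = 3 that is twice the number of triangles containing the node. This is a simplified stand-in for ID-GNN-Fast's cycle-count features, not the paper's exact computation.

```python
import numpy as np

def closed_walk_counts(A, max_len=4):
    """Count closed walks of length k = 1..max_len at each node via diag(A^k)."""
    n = A.shape[0]
    counts = np.zeros((n, max_len))
    Ak = np.eye(n)
    for k in range(max_len):
        Ak = Ak @ A
        counts[:, k] = np.diag(Ak)   # closed walks of length k+1 through each node
    return counts

# Nodes 0-2 form a triangle; node 3 hangs off node 2 and is on no triangle
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(closed_walk_counts(A))   # length-3 column is 2 for triangle nodes, 0 for node 3
```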
C. Kanatsoulis, A. Ribeiro. Graph Neural Networks Are More Powerful Than We Think, ICASSP 2024
Why do we need feature augmentation?
¡ (2) Certain structures are hard for GNNs to learn
¡ Other commonly used augmented features:
§ Clustering coefficient
§ PageRank
§ Centrality
§ …
¡ Any feature we have introduced in Lecture 1 can be used! (See the sketch below.)
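A minimal NetworkX sketch of this kind of structure-aware feature augmentation; the particular feature set and the Karate Club example graph are illustrative choices.

```python
import networkx as nx
import numpy as np

def structural_features(G):
    """Stack a few classic structural node features to use as augmented GNN inputs."""
    clustering = nx.clustering(G)          # clustering coefficient
    pagerank = nx.pagerank(G)              # PageRank score
    centrality = nx.degree_centrality(G)   # degree centrality
    nodes = sorted(G.nodes())
    return np.array([[clustering[v], pagerank[v], centrality[v]] for v in nodes])

G = nx.karate_club_graph()
X_aug = structural_features(G)   # shape (num_nodes, 3), to concatenate with other features
print(X_aug[:3])
```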
¡ Feature augmentation: constant vs. structure-aware
[Figure: the same example graph with a constant node feature (every node gets the value 1) vs. structure-aware node features (each of the nodes 1 through 6 gets a structural descriptor)]
§ Expressive power: Constant is medium (all the nodes are identical, but the GNN can still learn from the graph structure); structure-aware is high (each node has a structure-aware ID, so node-specific information can be stored).
§ Inductive learning (generalizing to unseen nodes): Constant is high (simple to generalize to new nodes: we assign the constant feature to them, then apply our GNN); structure-aware is also high (simple to generalize to new nodes: we can count triangles or closed loops for any graph).
§ Computational cost: Constant is low (only a 1-dimensional feature); structure-aware is low or high, depending on the structures we are counting.
§ Use cases: Both suit any graph and inductive settings (generalizing to new nodes).
CS224W: Machine Learning with Graphs
Charilaos Kanatsoulis and Jure Leskovec, Stanford University
http://cs224w.stanford.edu
C. Kanatsoulis, A. Ribeiro. Counting Graph Substructures with Graph Neural Networks, ICLR 2024
[Figure: an example 6-node graph; the random samples drawn for node 3 are [0.2, 1.5, -2.3, -10.1]; total number of random samples = 4]
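A hedged sketch of how such random node features could be drawn; the normal distribution, the sample count of 4, and the seed are illustrative assumptions, not the paper's prescribed choices.

```python
import numpy as np

def random_node_features(num_nodes, num_samples, seed=0):
    """Draw num_samples i.i.d. random values per node; each column is one random coloring."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(num_nodes, num_samples))

X = random_node_features(num_nodes=6, num_samples=4)
print("random samples for node 3:", np.round(X[2], 2))   # node 3 corresponds to row index 2
```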
[Figure: the 6-node example graph with 4 random samples per node]
Node 1: [3.3, -1.7, -1.2, -0.1]
Node 2: [-0.1, -5.4, 3.0, -9.8]
Node 3: [0.2, 1.5, -2.3, -10.1]
Node 4: [0.5, 1.9, -12.7, 11.1]
Node 5: [5.1, -0.7, -2.9, -13.5]
Node 6: [-1.2, 7.5, -0.3, -7.9]
[Figure: the same random node samples; the first sample of each node (3.3, -0.1, 0.2, 0.5, 5.1, -1.2 for nodes 1 through 6) is fed into the GNN as a 1-dimensional node feature]
[Figure: each random sample column gets its own GNN pass, e.g., the first samples (3.3, -0.1, 0.2, 0.5, 5.1, -1.2) in one pass and the fourth samples (-0.1, -9.8, -10.1, 11.1, -13.5, -7.9) in another]
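A toy sketch of the pattern these figures show: one GNN pass per random sample column. The single linear aggregation standing in for the GNN and the averaging over samples are my illustrative choices, not the paper's architecture.

```python
import numpy as np

def gnn_pass(A, x, eps=0.0, num_layers=2):
    """Placeholder GNN: repeated (A + (1 + eps) I) aggregation of one random coloring x."""
    n = A.shape[0]
    h = x
    for _ in range(num_layers):
        h = (A + (1.0 + eps) * np.eye(n)) @ h
    return h

def embed_with_random_features(A, X):
    """Run the GNN once per random sample (column of X), then average the outputs."""
    outs = [gnn_pass(A, X[:, [j]]) for j in range(X.shape[1])]
    return np.mean(np.stack(outs, axis=0), axis=0)

A = np.array([[0, 1, 1, 0],   # a small example graph
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 4))   # 4 nodes, 4 random samples per node
print(embed_with_random_features(A, X))
```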
C. Kanatsoulis, A. Ribeiro. Counting Graph Substructures with Graph Neural Networks, ICLR 2024
Position-aware task
¡ Nodes are labeled by their positions in the graph
[Figure: example input graphs where nodes are labeled A or B according to their position in the graph]
[Figure: Relative distances. Pick anchor nodes s₁ and s₂; v₁ is at distance 1 from s₁ and 2 from s₂, while v₂ is at distance 2 from s₁ and 1 from s₂, so the two nodes now get different descriptions.]
§ where
§ $c$ is a constant,
§ $S_{i,j} \subset V$ is chosen by including each node in $V$ independently with probability $1/2^{i}$,
§ $d_{\min}(v, S_{i,j}) \equiv \min_{u \in S_{i,j}} d(v, u)$.
[Figure: with anchors s₁, s₂ and a size-2 anchor-set s₃, v₁'s position encoding is (1, 2, 1) and v₂'s position encoding is (1, 2, 0)]
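A minimal sketch of the anchor-set position encoding described above; the function names are hypothetical, and this only computes the distance features, whereas P-GNN's actual message passing over anchor sets is more involved.

```python
import math
import random
import networkx as nx
import numpy as np

def sample_anchor_sets(G, c=1, seed=0):
    """Sample anchor sets S_{i,j}: include each node independently with probability 1/2^i."""
    random.seed(seed)
    k = max(int(math.log2(G.number_of_nodes())), 1)
    anchor_sets = []
    for i in range(1, k + 1):          # i = 1..log n
        for _ in range(c * k):         # c * log n sets per i
            S = {v for v in G.nodes() if random.random() < 1.0 / 2 ** i}
            if S:
                anchor_sets.append(S)
    return anchor_sets

def position_encoding(G, anchor_sets):
    """Encode each node v by d_min(v, S) = min over u in S of d(v, u), for every anchor set S."""
    dist = dict(nx.all_pairs_shortest_path_length(G))
    nodes = sorted(G.nodes())
    enc = np.zeros((len(nodes), len(anchor_sets)))
    for col, S in enumerate(anchor_sets):
        for row, v in enumerate(nodes):
            enc[row, col] = min(dist[v].get(u, np.inf) for u in S)
    return enc

G = nx.cycle_graph(8)
anchors = sample_anchor_sets(G, c=1)
print(position_encoding(G, anchors)[:2])   # position encodings of the first two nodes
```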