03 GNN1
[Figure: an encoder f(·) maps the input graph to 2D node embeddings; Z is the matrix of embeddings. We still need to define f and the dimension/size of the embeddings.]
¡ Limitations of shallow embedding methods:
§ 𝑶(|𝑽|𝒅) parameters are needed:
§ No sharing of parameters between nodes
§ Every node has its own unique embedding
§ Inherently “transductive”:
§ Cannot generate embeddings for nodes that are not seen
during training
§ Do not incorporate node features:
§ Nodes in many graphs have features that we can and
should leverage
¡ Today: We will now discuss deep learning methods based on graph neural networks (GNNs):
ENC(v) = multiple layers of non-linear transformations based on graph structure
[Figure: Networks vs. Images and Text/Speech.]
¡ Loss function: min_Θ ℒ(𝒚, f_Θ(𝒙))
¡ f can be a simple linear layer, an MLP, or another neural network (e.g., a GNN later)
¡ Sample a minibatch of input 𝒙
¡ Forward propagation: compute ℒ given 𝒙
¡ Back-propagation: obtain the gradient ∇_Θ ℒ using the chain rule
¡ Use stochastic gradient descent (SGD) to optimize ℒ over Θ for many iterations (a minimal sketch follows below)
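The loop below is a minimal PyTorch sketch of this recipe, not the lecture's code: the toy dataset, the small MLP standing in for f_Θ, and the mean-squared-error loss are placeholder assumptions; any differentiable model and loss fit the same pattern.

```python
import torch
import torch.nn as nn

# Placeholder data and model: a toy regression set and a small MLP standing in for f_Theta.
X = torch.randn(256, 16)                       # 256 examples, 16 features
y = torch.randn(256, 1)                        # targets
f_theta = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
loss_fn = nn.MSELoss()                         # stands in for L(y, f_Theta(x))
opt = torch.optim.SGD(f_theta.parameters(), lr=1e-2)

for step in range(100):
    idx = torch.randint(0, X.shape[0], (32,))  # sample a minibatch of input x
    loss = loss_fn(f_theta(X[idx]), y[idx])    # forward propagation: compute L given x
    opt.zero_grad()
    loss.backward()                            # back-propagation: gradient of L w.r.t. Theta
    opt.step()                                 # SGD step: update Theta
```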
CS224W: Machine Learning with Graphs
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
¡ Local network neighborhoods:
§ Describe aggregation strategies
§ Define computation graphs
¡ Assume we have a graph 𝑮:
§ 𝑉 is the vertex set
§ 𝑨 is the adjacency matrix (assume binary)
§ 𝑿 ∈ ℝ^{|V|×m} is a matrix of node features
§ 𝑣: a node in 𝑉; 𝑁(𝑣): the set of neighbors of 𝑣
§ Node features:
§ Social networks: user profile, user image
§ Biological networks: gene expression profiles, gene functional information
§ When there is no node feature in the graph dataset:
§ Indicator vectors (one-hot encoding of a node)
§ Vector of constant 1: [1, 1, …, 1]
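As a concrete illustration (not from the slides), a minimal NumPy sketch that builds A, X, and N(v) for a small hypothetical graph, including the two fallbacks above for the no-feature case:

```python
import numpy as np

num_nodes = 5
edges = [(0, 1), (0, 2), (0, 3), (1, 4), (3, 4)]   # hypothetical undirected edge list

A = np.zeros((num_nodes, num_nodes))               # binary adjacency matrix
for u, v in edges:
    A[u, v] = A[v, u] = 1

# If the dataset provides node features, X is a |V| x m matrix. Otherwise, two common fallbacks:
X_onehot = np.eye(num_nodes)                       # indicator vectors (one-hot per node)
X_const = np.ones((num_nodes, 1))                  # constant-1 feature vector [1, 1, ..., 1]

def neighbors(v):
    """N(v): indices of the neighbors of node v."""
    return np.flatnonzero(A[v])
```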
A naïve approach
• Take adjacency matrix and feature matrix
• Done?

Adjacency matrix joined with the node feature columns for the example graph on nodes A–E:

      A  B  C  D  E | Feat
   A  0  1  1  1  0 | 1  0
   B  1  0  0  1  1 | 0  0
   C  1  0  0  1  0 | 0  1
   D  1  1  1  0  1 | 1  1
   E  0  1  0  1  0 | 1  0

¡ Issues with this idea:
[Figure: a few candidate fully connected networks with ReLU layers for processing the joined matrix.]
§ There is no fixed notion of locality or a sliding window on the graph
§ The graph is permutation invariant: there is no canonical ordering of the nodes
¡ Graph does not have a canonical order of the nodes!
¡ We can have many different order plans.
¡ Graph does not have a canonical order of the nodes!
[Figure: the same six-node graph (nodes A–F) under Order plan 1 and Order plan 2; each order plan yields its own node feature matrix (X₁ vs. X₂) and adjacency matrix (A₁ vs. A₂).]
What does it mean for the graph representation to be "the same for two order plans"?
¡ Suppose we learn a function f that maps a graph G = (A, X) to a vector in ℝ^d, where A is the adjacency matrix and X is the node feature matrix.
[Figure: the same graph under the two order plans; for a graph-level representation we require f(A₁, X₁) = f(A₂, X₂), i.e., permutation invariance.]
For node representation: we learn a function f that maps the nodes of G to a matrix in ℝ^{|V|×d}.
[Figure: Order plan 1 (A₁, X₁) and Order plan 2 (A₂, X₂) of the same graph. f(A₁, X₁) and f(A₂, X₂) each contain one representation vector per node. For the two order plans, the representation vector of a given node is the same, just located at a different row: e.g., the brown node is labeled A under one plan and E under the other, and the green node C under one and D under the other. This is permutation equivariance.]
¡ Examples (a quick numerical check follows below):
§ f(A, X) = 1ᵀX : permutation-invariant
§ Reason: f(PAPᵀ, PX) = 1ᵀPX = 1ᵀX = f(A, X)
§ f(A, X) = X : permutation-equivariant
§ Reason: f(PAPᵀ, PX) = PX = P·f(A, X)
§ f(A, X) = AX : permutation-equivariant
§ Reason: f(PAPᵀ, PX) = PAPᵀPX = PAX = P·f(A, X)
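A small NumPy check of these three identities on a random graph and a random permutation matrix (an illustration, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 3
A = rng.integers(0, 2, size=(n, n))
A = np.triu(A, 1) + np.triu(A, 1).T            # symmetric binary adjacency, zero diagonal
X = rng.normal(size=(n, m))                    # node feature matrix
P = np.eye(n)[rng.permutation(n)]              # random permutation matrix

f_inv = lambda A, X: np.ones(n) @ X            # f(A, X) = 1^T X
f_eq1 = lambda A, X: X                         # f(A, X) = X
f_eq2 = lambda A, X: A @ X                     # f(A, X) = A X

A_p, X_p = P @ A @ P.T, P @ X                  # the same graph under another order plan

assert np.allclose(f_inv(A_p, X_p), f_inv(A, X))      # invariant: output unchanged
assert np.allclose(f_eq1(A_p, X_p), P @ f_eq1(A, X))  # equivariant: output permutes with P
assert np.allclose(f_eq2(A_p, X_p), P @ f_eq2(A, X))  # equivariant: output permutes with P
```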
[Bronstein, ICLR 2021 keynote]
Are other neural network architectures, e.g.,
MLPs, permutation invariant / equivariant?
¡ No.
Switching the order of the
input leads to different
outputs!
A naïve approach, revisited:
• Take adjacency matrix and feature matrix
• Concatenate them: [A, X]
• Feed them into a deep (fully connected) neural net
• Done?
Are other neural network architectures, e.g., MLPs, permutation invariant / equivariant?
¡ No.
Problems: the fully connected network applied to [A, X] is therefore sensitive to the arbitrary node ordering; it is neither permutation invariant nor equivariant.
Next: Design graph neural networks that are permutation invariant / equivariant by passing and aggregating information.
[Kipf and Welling, ICLR 2017]
[Figure: TARGET NODE A in the INPUT GRAPH; its neighbors B, C, D and their neighbors are unrolled into a tree-shaped computation graph rooted at A.]
¡ Intuition: Nodes aggregate information from their neighbors using neural networks
[Figure: the computation graph for target node A; each box that combines neighbor information is a neural network.]
¡ Intuition: Network neighborhood defines a
computation graph
Every node defines a computation
graph based on its neighborhood!
¡ Model can be of arbitrary depth:
§ Nodes have embeddings at each layer
§ Layer-0 embedding of node 𝑣 is its input feature, 𝑥𝑣
§ Layer-𝑘 embedding gets information from nodes that
are 𝑘 hops away
[Figure: the computation graph for target node A has Layer-2 (node A itself), Layer-1 (its neighbors B, C, D), and Layer-0 (the input features x_A, x_B, x_C, x_E, x_F at the leaves).]
¡ Neighborhood aggregation: Key distinctions
are in how different approaches aggregate
information across the layers
[Figure: the '?' boxes in the computation graph for target node A mark the aggregation operators that each approach must define.]
¡ Basic approach: Average information from
neighbors and apply a neural network
[Figure: computation graph for target node A: (1) average messages from neighbors, then (2) apply a neural network.]
¡ Basic approach: Average neighbor messages and apply a neural network
Initial 0-th layer embeddings are equal to the node features:
    h_v^{(0)} = x_v
Update rule, where h_v^{(k)} is the embedding of v at layer k and K is the total number of layers:
    h_v^{(k+1)} = σ( W_k · Σ_{u∈N(v)} h_u^{(k)} / |N(v)| + B_k · h_v^{(k)} ),  ∀k ∈ {0, …, K−1}
Embedding after K layers of neighborhood aggregation:
    z_v = h_v^{(K)}
Here the sum over N(v) is the average of the neighbors' previous-layer embeddings, and σ is a non-linearity (e.g., ReLU). Notice that the summation is a permutation-invariant pooling/aggregation.
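A minimal NumPy sketch of this layer (an assumption-laden illustration rather than the lecture's reference code): it averages the neighbors' embeddings, adds the self-transformation, and applies a ReLU, following the update rule above; the toy graph and random weights are placeholders.

```python
import numpy as np

def gcn_layer(A, H, W, B):
    """One mean-aggregation layer:
    h_v^{(k+1)} = ReLU( W @ mean_{u in N(v)} h_u^{(k)} + B @ h_v^{(k)} ).
    A: (n, n) binary adjacency; H: (n, d_in); W, B: (d_out, d_in)."""
    deg = A.sum(axis=1, keepdims=True)                 # |N(v)| for each node
    neigh_mean = (A @ H) / np.maximum(deg, 1)          # average of neighbors' embeddings
    return np.maximum(0, neigh_mean @ W.T + H @ B.T)   # sigma = ReLU

# Usage sketch with random parameters: K = 2 layers, then z_v = h_v^{(K)}.
rng = np.random.default_rng(0)
n, d = 5, 4
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0],
              [1, 1, 0, 0, 1],
              [0, 1, 0, 0, 1],
              [0, 0, 1, 1, 0]], dtype=float)           # a small toy undirected graph
H = rng.normal(size=(n, d))                            # h^{(0)} = x (node features)
for k in range(2):
    W_k, B_k = rng.normal(size=(d, d)), rng.normal(size=(d, d))
    H = gcn_layer(A, H, W_k, B_k)
Z = H                                                  # final embeddings z_v
```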
What are the invariance and equivariance properties of a GCN?
¡ Given a node, the GCN that computes its embedding is permutation invariant.
[Figure: the computation graph of the target node; the NN weights are shared, and the average of the neighbors' previous-layer embeddings is permutation invariant.]
¡ Considering all nodes in a graph, GCN computation is permutation equivariant.
[Figure: Order plan 1 gives embeddings H₁ and Order plan 2 gives embeddings H₂; permuting the input permutes the output accordingly, i.e., permutation equivariance.]
¡ Considering all nodes in a graph, GCN computation is permutation equivariant. Detailed reasoning:
1. The rows of the input node feature matrix and of the output embedding matrix are aligned.
2. We know that computing the embedding of a given node with a GCN is permutation invariant.
3. So, after a permutation, the location of a given node in the input node feature matrix changes, but the output embedding of that node stays the same (in the figure, the colors of node features and embeddings are matched).
This is permutation equivariance.
How do we train the GCN to generate embeddings z_v?
¡ Re-writing the update function in matrix form:
    H^{(k+1)} = σ( Ã H^{(k)} W_k^T + H^{(k)} B_k^T ),  where Ã = D^{−1}A (D is the diagonal degree matrix) and H^{(k)} = [h_1^{(k)} … h_{|V|}^{(k)}]^T
§ First term (red in the slide): neighborhood aggregation
§ Second term (blue in the slide): self transformation
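A minimal NumPy sketch of this matrix form (an illustration under the notation above, not the lecture's code): row v of ÃH^{(k)} is exactly the mean of v's neighbors' embeddings, so one matrix product performs the aggregation for all nodes at once.

```python
import numpy as np

def gcn_layer_matrix(A, H, W, B):
    """Matrix form of the update: H^{(k+1)} = sigma(A_tilde H^{(k)} W^T + H^{(k)} B^T),
    with A_tilde = D^{-1} A and D the diagonal degree matrix."""
    D_inv = np.diag(1.0 / np.maximum(A.sum(axis=1), 1))   # D^{-1}, guarding isolated nodes
    A_tilde = D_inv @ A                                   # row v holds 1/|N(v)| on v's neighbors
    return np.maximum(0, A_tilde @ H @ W.T + H @ B.T)     # sigma = ReLU
```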
¡ One possible idea: "similar" nodes have similar embeddings:
    min_Θ ℒ = Σ_{z_u, z_v} CE( y_{u,v}, DEC(z_u, z_v) )
§ where y_{u,v} = 1 when nodes u and v are similar
§ z_u = f_Θ(u) and DEC(·,·) is the dot product
§ CE is the cross-entropy loss (a small sketch follows after this list):
§ CE(y, f(x)) = − Σ_{i=1}^{C} y_i · log f_Θ(x)_i
§ y_i and f_Θ(x)_i are the actual and predicted values for the i-th class
§ Intuition: the lower the loss, the closer the prediction is to the one-hot target
¡ Node similarity can be anything from Lecture 2, e.g., a loss based on:
§ Random walks (node2vec, DeepWalk, struc2vec)
§ Matrix factorization
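A minimal PyTorch sketch of this unsupervised objective (the helper name and pair tensors are hypothetical; the positive pairs would come from a similarity notion such as the random walks above): a dot-product decoder scored with a binary cross-entropy loss.

```python
import torch
import torch.nn.functional as F

def similarity_loss(z, pos_pairs, neg_pairs):
    """z: (num_nodes, d) embeddings from the GNN.
    pos_pairs / neg_pairs: (num_pairs, 2) long tensors of node indices with y_{u,v} = 1 / 0."""
    pairs = torch.cat([pos_pairs, neg_pairs], dim=0)
    labels = torch.cat([torch.ones(len(pos_pairs)), torch.zeros(len(neg_pairs))])
    scores = (z[pairs[:, 0]] * z[pairs[:, 1]]).sum(dim=-1)     # DEC(z_u, z_v) = dot product
    return F.binary_cross_entropy_with_logits(scores, labels)  # CE(y_{u,v}, DEC(z_u, z_v))

# Usage sketch: z = gnn(A, X); loss = similarity_loss(z, pos_pairs, neg_pairs); loss.backward()
```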
Directly train the model for a supervised task (e.g., node classification).
[Figure: e.g., in a drug-drug interaction network, classify whether each drug node is safe or toxic.]
Directly train the model for a supervised task (e.g., node classification):
¡ Use the cross-entropy loss defined above.
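For concreteness, a minimal sketch of the supervised node-classification loss on top of the final embeddings; all tensors here are hypothetical placeholders, and in practice z would come from the GCN above.

```python
import torch
import torch.nn.functional as F

num_nodes, d, num_classes = 100, 16, 2             # e.g., 2 classes: safe vs. toxic
z = torch.randn(num_nodes, d, requires_grad=True)  # placeholder for the GNN output embeddings
labels = torch.randint(0, num_classes, (num_nodes,))
train_mask = torch.rand(num_nodes) < 0.5           # which nodes carry training labels

classifier = torch.nn.Linear(d, num_classes)       # prediction head on top of the embeddings
logits = classifier(z)
loss = F.cross_entropy(logits[train_mask], labels[train_mask])  # CE over the labeled nodes
loss.backward()
```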
(1) Define a neighborhood aggregation function.
(4) Generate embeddings
for nodes as needed
Even for nodes we never
trained on!
¡ The same aggregation parameters are shared
for all nodes:
§ The number of model parameters is sublinear in
|𝑉| and we can generalize to unseen nodes!
[Figure: the compute graph for node A and the compute graph for node B use the same shared parameters W_k and B_k.]
[Figure: inductive capability: train on a graph snapshot; when a new node arrives, generate its embedding on the fly.]
¡ How do GNNs compare to prominent
architectures such as Convolutional Neural
Nets?
Convolutional neural networks (on grids)
Single CNN layer with a 3×3 filter:
[Figure: a CNN layer slides a 3×3 filter of weights over the image to produce the output. (Animation by Vincent Dumoulin)]
CNN formulation: h_v^{(l+1)} = σ( Σ_{u ∈ N(v) ∪ {v}} W_l^u h_u^{(l)} ),  ∀l ∈ {0, …, L−1}
Convolutional neural networks (on grids)
Single CNN layer with a 3×3 filter:
[Figure: Image vs. Graph. (Animation by Vincent Dumoulin; slide adapted from Thomas Kipf, "End-to-end learning on graphs with GCNs".)]
• GNN formulation: h_v^{(l+1)} = σ( W_l · Σ_{u ∈ N(v)} h_u^{(l)} / |N(v)| + B_l h_v^{(l)} ),  ∀l ∈ {0, …, L−1}
• CNN formulation (previous slide): h_v^{(l+1)} = σ( Σ_{u ∈ N(v) ∪ {v}} W_l^u h_u^{(l)} ),  ∀l ∈ {0, …, L−1},
  which we can rewrite as: h_v^{(l+1)} = σ( Σ_{u ∈ N(v)} W_l^u h_u^{(l)} + B_l h_v^{(l)} ),  ∀l ∈ {0, …, L−1}
Key difference: the CNN learns a different weight matrix W_l^u for each neighbor position u, which is possible because the 3×3 filter fixes the number and ordering of neighbors on the grid; the GNN shares a single W_l across all neighbors and therefore handles arbitrary graphs. In this sense a CNN can be viewed as a special GNN, but a CNN itself is not permutation invariant / equivariant: switching the order of the pixels leads to different outputs.
A general definition of attention:
Given a set of vector values, and a vector query, attention is a technique to
compute a weighted sum of the values, dependent on the query.
Each token/word has a value vector and a query vector. The value
vector can be seen as the representation of the token/word. We use
the query vector to calculate the attention score (weights in the
weighted sum).
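A minimal NumPy sketch of this definition (an illustration, not the slide's notation): each token contributes a value vector, and the query determines the softmax-normalized weights of the weighted sum. Full Transformer attention additionally uses separate key vectors, omitted here to keep the sketch close to the definition above.

```python
import numpy as np

def attention(query, values):
    """Weighted sum of the value vectors, with weights that depend on the query
    (dot-product scores, softmax-normalized)."""
    scores = values @ query                  # attention score per token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax: the weights sum to 1
    return weights @ values                  # weighted sum of the values

# Usage: 5 tokens (e.g., "I am a Stanford student"), each with an 8-dim value vector.
rng = np.random.default_rng(0)
values = rng.normal(size=(5, 8))             # value vector = representation of each token
query = rng.normal(size=8)                   # query vector of the attending token
out = attention(query, values)
```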
A nice blog post on this: https://towardsdatascience.com/transformers-are-graph-neural-networks-bca9f75412aa
Key component: self-attention. Every token/word attends to all the other tokens/words via matrix calculation.
A Transformer layer can be seen as a special GNN that runs on a fully-connected "word" graph!
Since each word attends to all the other words, the computation graph of a Transformer layer is identical to that of a GNN on the fully-connected "word" graph.
[Figure: the text "I am a Stanford student" vs. the complete graph over its words.]
¡ In this lecture, we introduced
§ Idea for Deep Learning for Graphs
§ Multiple layers of embedding transformation
§ At every layer, use the embedding from the previous layer as the input
§ Aggregation of neighbors and self-embeddings
§ Graph Convolutional Network
§ Mean aggregation; can be expressed in matrix form
§ GNN is a general architecture
§ CNN can be viewed as a special GNN