07 Hetero
material useful for giving your own lectures. Feel free to use these slides verbatim, or to modify
them to fit your own needs. If you make use of a significant portion of these slides in your own
lecture, please include this message, or a link to our web site: http://cs224w.Stanford.edu
11/14/23 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu 2
CS224W: Machine Learning with Graphs
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
¡ So far we have only handled graphs with one edge type
¡ How to handle graphs with multiple node or edge types (a.k.a. heterogeneous graphs)?
¡ Goal: Learning with heterogeneous graphs
§ Relational GCNs
§ Heterogeneous Graph Transformer
§ Design space for heterogeneous GNNs
2 types of nodes:
¡ Node type A: Paper nodes
¡ Node type B: Author nodes
2 types of edges:
¡ Edge type A: Cite
¡ Edge type B: Like
A graph could have multiple types of nodes and
edges! 2 types of nodes + 2 types of edges.
A relation type is a (node type, edge type, node type) triple: with 2 node types and 2 edge types, there are 2 × 2 × 2 = 8 possible relation types!
¡ Example: E-Commerce Graph
§ Node types: User, Item, Query, Location, ...
§ Edge types: Purchase, Visit, Guide, Search, …
§ Different node types can have different feature spaces!
¡ Example: Academic Graph
§ Node types: Author, Paper, Venue, Field, ...
§ Edge types: Publish, Cite, …
§ Benchmark dataset: Microsoft Academic Graph
¡ Observation: We can also treat types of nodes and edges as features
§ Example: Add a one-hot type indicator to nodes and edges
§ Append feature [1, 0] to each “author node”; append feature [0, 1] to each “paper node”
§ Similarly, we can assign type-indicator features to edges of different types
§ Then, a heterogeneous graph reduces to a standard graph
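As a toy NumPy sketch of this reduction (the 3-dim/5-dim author/paper feature sizes and the zero-padding to a common width are illustrative assumptions, not from the slides):

```python
import numpy as np

# Toy heterogeneous graph: 2 author nodes (3-dim features) and
# 2 paper nodes (5-dim features).
author_x = np.array([[1., 2., 3.], [4., 5., 6.]])
paper_x = np.array([[1., 0., 1., 0., 1.], [0., 1., 0., 1., 0.]])

# Pad author features to the common width, then append one-hot type
# indicators: [1, 0] for authors, [0, 1] for papers.
author_pad = np.pad(author_x, ((0, 0), (0, 2)))              # (2, 5)
authors = np.hstack([author_pad, np.tile([1., 0.], (2, 1))])  # (2, 7)
papers = np.hstack([paper_x, np.tile([0., 1.], (2, 1))])      # (2, 7)

# One homogeneous feature matrix: the node type is now just a feature.
x = np.vstack([authors, papers])                              # (4, 7)
```

The same trick applies to edges: append an edge-type indicator to each edge feature vector.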
¡ When do we need a heterogeneous graph?
§ Case 1: Different node/edge types have different shapes of features
§ e.g., an “author node” has a 4-dim feature, while a “paper node” has a 5-dim feature
§ Case 2: We know different relation types represent different types of interactions
§ e.g., (English, translate, French) and (English, translate, Chinese) require different models
¡ Ultimately, a heterogeneous graph is a more expressive graph representation
§ It captures different types of interactions between entities
¡ But it also comes with costs
§ More expensive computation and storage
§ More complex implementation
¡ There are many ways to convert a heterogeneous graph to a standard graph (that is, a homogeneous graph)
Kipf and Welling. Semi-Supervised Classification with Graph Convolutional Networks, ICLR 2017
¡ Recall the GCN layer:

$\mathbf{h}_v^{(l+1)} = \sigma\left(\sum_{u \in N(v)} \mathbf{W}^{(l)} \frac{\mathbf{h}_u^{(l)}}{|N(v)|}\right)$

§ (1) Message: each neighbor $u$ computes $\mathbf{W}^{(l)} \frac{\mathbf{h}_u^{(l)}}{|N(v)|}$
§ (2) Aggregation: sum the messages from all neighbors $u \in N(v)$, then apply the activation $\sigma$
¡ We will extend GCN to handle heterogeneous
graphs with multiple edge/relation types
¡ We start with a directed graph with one relation
§ How do we run GCN and update the representation of
the target node A on this graph?
[Figure: input directed graph with target node A and neighbors B–F]
¡ What if the graph has multiple relation types?
[Figure: the input graph with edges labeled by relation types r₁, r₂, r₃]
¡ Use different neural network weights for different relation types.
[Figure: the same graph, with weights 𝐖_{r₁} for r₁, 𝐖_{r₂} for r₂, and 𝐖_{r₃} for r₃]
[Figure: the relational computation graph for target node A — each neighbor's message passes through the neural network of its relation type before aggregation]
¡ Introduce a set of neural networks for each
relation type!
¡ Relational GCN (RGCN):

$\mathbf{h}_v^{(l+1)} = \sigma\left(\sum_{r \in R} \sum_{u \in N_v^r} \frac{1}{c_{v,r}} \mathbf{W}_r^{(l)} \mathbf{h}_u^{(l)} + \mathbf{W}_0^{(l)} \mathbf{h}_v^{(l)}\right)$
¡ How to write this as Message + Aggregation?
¡ Message:
§ Each neighbor of a given relation, normalized by the node degree of the relation $c_{v,r} = |N_v^r|$:
$\mathbf{m}_{u,r}^{(l)} = \frac{1}{c_{v,r}} \mathbf{W}_r^{(l)} \mathbf{h}_u^{(l)}$
§ Self-loop:
$\mathbf{m}_v^{(l)} = \mathbf{W}_0^{(l)} \mathbf{h}_v^{(l)}$
¡ Aggregation:
§ Sum over messages from neighbors and self-loop, then apply activation:
$\mathbf{h}_v^{(l+1)} = \sigma\left(\mathrm{Sum}\left(\left\{\mathbf{m}_{u,r}^{(l)}, u \in N(v)\right\} \cup \left\{\mathbf{m}_v^{(l)}\right\}\right)\right)$
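A minimal NumPy sketch of one RGCN layer, assuming ReLU stands in for the activation $\sigma$ (the relation names and dimensions are illustrative):

```python
import numpy as np

def rgcn_layer(h, edges_by_rel, W_rel, W_self):
    """One RGCN layer: relation-specific transforms + degree normalization.

    h            : (num_nodes, d_in) input embeddings
    edges_by_rel : {relation: list of directed (u, v) edges}
    W_rel        : {relation: (d_out, d_in) weight matrix W_r}
    W_self       : (d_out, d_in) self-loop weight matrix W_0
    """
    n = h.shape[0]
    out = h @ W_self.T                    # self-loop message for every node
    for r, edge_list in edges_by_rel.items():
        # c_{v,r} = |N_v^r|: in-degree of v under relation r
        deg = np.zeros(n)
        for u, v in edge_list:
            deg[v] += 1
        for u, v in edge_list:
            out[v] += (W_rel[r] @ h[u]) / deg[v]   # normalized message
    return np.maximum(out, 0)             # ReLU as the nonlinearity

# Toy graph: 3 nodes, two relation types.
rng = np.random.default_rng(0)
h = rng.normal(size=(3, 4))
edges = {"cite": [(1, 0), (2, 0)], "like": [(2, 1)]}
W_rel = {r: rng.normal(size=(4, 4)) for r in edges}
h_next = rgcn_layer(h, edges, W_rel, rng.normal(size=(4, 4)))
```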
" # $
¡ Each relation has 𝐿 matrices: 𝐖! , 𝐖! ⋯ 𝐖!
%
¡ The size of each 𝐖! is 𝑑 (%'") ×𝑑 (%) 𝑑 is the hidden (")
dimension in layer 𝑙
𝐖+ =
Limitation: only nearby
neurons/dimensions
can interact through 𝑊
[Figure: the input graph with relation types r₁, r₂, r₃]
¡ Training:
§ 1. Use RGCN to score the training supervision edge (𝑬, 𝒓𝟑, 𝑨)
§ 2. Create a negative edge by perturbing the supervision edge: corrupt the tail of (𝑬, 𝒓𝟑, 𝑨), e.g., (𝑬, 𝒓𝟑, 𝑩) or (𝑬, 𝒓𝟑, 𝑫)
(𝜎 denotes the sigmoid function)
[Figure: the input graph; (𝑬, 𝒓𝟑, 𝑨) is the training supervision edge]
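As a sketch of these two steps, assuming a DistMult-style edge scorer (the slides do not fix the scoring function, and the embeddings and relation vector below are random placeholders):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def score(h_head, h_tail, w_rel):
    # Hypothetical DistMult-style scorer: sum_i h_head[i] * w_rel[i] * h_tail[i]
    return float(np.sum(h_head * w_rel * h_tail))

rng = np.random.default_rng(1)
h = {n: rng.normal(size=8) for n in "ABDE"}  # final-layer RGCN embeddings
w_r3 = rng.normal(size=8)                    # relation-specific weights for r3

pos = score(h["E"], h["A"], w_r3)                 # supervision edge (E, r3, A)
negs = [score(h["E"], h[t], w_r3) for t in "BD"]  # corrupted tails

# Binary cross-entropy: push sigma(pos) toward 1, sigma(neg) toward 0.
loss = -np.log(sigmoid(pos)) - sum(np.log(1 - sigmoid(n)) for n in negs)
```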
¡ Evaluation:
§ Validation time as an example; the same applies at test time
§ Evaluate how well the model can predict the validation edges with their relation types
§ Let's predict the validation edge (𝑬, 𝒓𝟑, 𝑫)
§ Intuition: the score of (𝑬, 𝒓𝟑, 𝑫) should be higher than the score of every (𝑬, 𝒓𝟑, 𝒗) where (𝑬, 𝒓𝟑, 𝒗) is NOT in the training message edges or training supervision edges, e.g., (𝑬, 𝒓𝟑, 𝑩)
[Figure: validation edge (𝑬, 𝒓𝟑, 𝑫) shown as a dashed edge; training message & supervision edges are all existing (solid) edges]
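This ranking intuition can be sketched as follows (the scores are made up for illustration, and `filtered_rank` is a hypothetical helper name):

```python
def filtered_rank(pos_score, neg_scores):
    """Rank of the positive edge among its (filtered) negatives.

    neg_scores should exclude corrupted tails that form real training
    edges. Rank 1 means the positive edge scored highest.
    """
    return 1 + sum(s >= pos_score for s in neg_scores)

# Hypothetical scores: (E, r3, D) vs. corrupted tails (E, r3, v).
pos = 2.5
negs = [0.3, 1.1, 3.0]           # one negative outranks the positive
rank = filtered_rank(pos, negs)  # rank 2
hits_at_1 = int(rank <= 1)       # Hits@1 contribution: 0
rr = 1.0 / rank                  # reciprocal rank: 0.5
```

Averaging `hits_at_1` and `rr` over all validation edges gives the usual Hits@k and MRR metrics.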
¡ Benchmark dataset
§ ogbn-mag from Microsoft Academic Graph (MAG)
¡ Four (4) types of entities
§ Papers: 736k nodes
§ Authors: 1.1m nodes
§ Institutions: 9k nodes
§ Fields of study: 60k nodes
Wang et al. Microsoft academic graph: When experts are not enough. Quantitative Science Studies 2020.
¡ Benchmark dataset
§ ogbn-mag from Microsoft Academic Graph (MAG)
¡ Four (4) directed relations
§ An author is "affiliated with" an institution
§ An author "writes" a paper
§ A paper "cites" a paper
§ A paper "has a topic of" a field of study
¡ Prediction task
§ Each paper has a 128-dimensional word2vec feature vector
§ Given the content, references, authors, and author affiliations from ogbn-mag, predict the venue of each paper
§ 349-class classification problem, since 349 venues are considered
¡ Time-based dataset splitting
§ Training set: papers published before 2018
§ Test set: papers published after 2018
¡ Benchmark results:
[Figure: ogbn-mag benchmark accuracy — R-GCN compared against the state of the art (SOTA)]
¡ Relational GCN, a graph neural network for
heterogeneous graphs
Hu et al. Heterogeneous Graph Transformer. WWW 2020.
" Q-Linear!"#$%
'()*+[-]
Write Cite
/)+[0+] %&& 1--[01, -]
!!"#$
!! K-Linear!"#$%
Paper
'()*+[-] …
…
!" %&&
!'("#$ 1--[02, -]
K-Linear&'()*% /)+[0,]
Author
11/14/23 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu 46
¡ Recall the general GNN layer design: (1) Message, (2) Aggregation, (3) Layer connectivity
[Figure: GNN Layer 1 and GNN Layer 2 unrolled into the computation graph of target node A over the input graph]
¡ (1) Heterogeneous message computation
§ Message function: $\mathbf{m}_u^{(l)} = \mathrm{MSG}_r^{(l)}\left(\mathbf{h}_u^{(l-1)}\right)$
§ Observation: A node could receive multiple types of messages; the number of message types equals the number of relation types
§ Idea: Create a different message function for each relation type $r = (u, e, v)$, i.e., the relation between node $u$ that sends the message, edge type $e$, and node $v$ that receives the message
§ Example: a linear layer $\mathbf{m}_u^{(l)} = \mathbf{W}_r^{(l)} \mathbf{h}_u^{(l-1)}$
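A minimal sketch of relation-specific linear message functions (relation names and dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
d_in, d_out = 4, 3

# One weight matrix W_r per relation type.
W = {r: rng.normal(size=(d_out, d_in)) for r in ("writes", "cites")}

def message(h_u, rel):
    # m_u^(l) = W_r h_u^(l-1): the message depends on the relation type.
    return W[rel] @ h_u

h_u = rng.normal(size=d_in)
m_writes = message(h_u, "writes")
m_cites = message(h_u, "cites")
# Same sender, different relation type -> different message.
```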
¡ (2) Aggregation
§ Intuition: Each node aggregates the messages from node $v$'s neighbors
$\mathbf{h}_v^{(l)} = \mathrm{AGG}^{(l)}\left(\left\{\mathbf{m}_u^{(l)}, u \in N(v)\right\}\right)$
§ Example: $\mathrm{Sum}(\cdot)$, $\mathrm{Mean}(\cdot)$ or $\mathrm{Max}(\cdot)$ aggregator
§ $\mathbf{h}_v^{(l)} = \mathrm{Sum}\left(\left\{\mathbf{m}_u^{(l)}, u \in N(v)\right\}\right)$
[Figure: computation graph for target node A — (1) Message, (2) Aggregation]
¡ (2) Heterogeneous Aggregation
§ Observation: Each node could receive multiple types of messages from its neighbors, and multiple neighbors may belong to each message type
§ Idea: We can define a 2-stage message passing
$\mathbf{h}_v^{(l)} = \mathrm{AGG}_{all}^{(l)}\left(\mathrm{AGG}_r^{(l)}\left(\left\{\mathbf{m}_u^{(l)}, u \in N_r(v)\right\}\right)\right)$
§ Given all the messages sent to a node:
§ First, aggregate the messages that belong to each edge type with $\mathrm{AGG}_r^{(l)}$
§ Then, aggregate across the edge types with $\mathrm{AGG}_{all}^{(l)}$
§ Example: $\mathbf{h}_v^{(l)} = \mathrm{Concat}\left(\mathrm{Sum}\left(\left\{\mathbf{m}_u^{(l)}, u \in N_r(v)\right\}\right)\right)$
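The 2-stage Sum-then-Concat example can be sketched as (the relation names and message values are illustrative):

```python
import numpy as np

def hetero_aggregate(messages_by_rel, rel_order):
    """Two-stage aggregation: Sum within each relation, Concat across relations.

    messages_by_rel : {relation: list of message vectors sent to node v}
    rel_order       : fixed relation ordering, so the Concat layout is stable
    """
    per_rel = [np.sum(messages_by_rel[r], axis=0) for r in rel_order]  # AGG_r
    return np.concatenate(per_rel)                                     # AGG_all

msgs = {
    "writes": [np.array([1., 0.]), np.array([0., 1.])],  # two author neighbors
    "cites": [np.array([2., 2.])],                       # one paper neighbor
}
h_v = hetero_aggregate(msgs, ["writes", "cites"])
# Sum per relation ([1, 1] and [2, 2]), then Concat -> [1, 1, 2, 2]
```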
¡ (3) Layer connectivity
§ Add skip connections, pre/post-process layers
¡ Heterogeneous pre/post-process layers:
§ MLP layers with respect to each node type
§ Since the output of a GNN consists of node embeddings:
$\mathbf{h}_v^{(l)} = \mathrm{MLP}_{T(v)}\left(\mathbf{h}_v^{(l)}\right)$
§ $T(v)$ is the type of node $v$
¡ Other successful GNN designs are also encouraged for heterogeneous GNNs: skip connections, batch/layer normalization, …
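A sketch of per-node-type post-processing, assuming a single linear layer + ReLU stands in for each MLP (type names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4

# One post-process layer per node type T(v).
mlp = {t: rng.normal(size=(d, d)) for t in ("author", "paper")}

def postprocess(h_v, node_type):
    # h_v^(l) = MLP_{T(v)}(h_v^(l)): weights are selected by the node's type.
    return np.maximum(mlp[node_type] @ h_v, 0.0)

h = rng.normal(size=d)
out_author = postprocess(h, "author")
out_paper = postprocess(h, "paper")  # same input, type-specific transform
```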
¡ Graph Feature manipulation
§ The input graph lacks features → feature augmentation
¡ Graph Structure manipulation
§ The graph is too sparse → Add virtual nodes / edges
§ The graph is too dense → Sample neighbors when doing message passing
§ The graph is too large → Sample subgraphs to compute embeddings
§ Will cover later in lecture: Scaling up GNNs
¡ Graph Feature manipulation
§ 2 common options: compute graph statistics (e.g., node degree) within each relation type, or across the full graph (ignoring the relation types)
¡ Graph Structure manipulation
§ Neighbor and subgraph sampling are also common for heterogeneous graphs
§ 2 common options: sample within each relation type (to ensure neighbors of each type are covered), or sample across the full graph
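The per-relation-type sampling option can be sketched as (`k` and the neighbor lists are illustrative):

```python
import random

def sample_per_relation(neighbors_by_rel, k):
    """Sample up to k neighbors *within each relation type*, so every
    relation stays represented (vs. sampling k from the full neighbor set)."""
    sampled = {}
    for rel, nbrs in neighbors_by_rel.items():
        sampled[rel] = random.sample(nbrs, min(k, len(nbrs)))
    return sampled

nbrs = {"writes": ["a1", "a2", "a3"], "cites": ["p1"]}
s = sample_per_relation(nbrs, 2)
# "writes" contributes 2 neighbors; "cites" keeps its single neighbor
```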
¡ Node-level prediction:
$\widehat{\mathbf{y}}_v = \mathrm{Head}_{node}\left(\mathbf{h}_v^{(L)}\right) = \mathbf{W}^{(H)} \mathbf{h}_v^{(L)}$
¡ Edge-level prediction:
$\widehat{\mathbf{y}}_{uv} = \mathrm{Head}_{edge}\left(\mathbf{h}_u^{(L)}, \mathbf{h}_v^{(L)}\right) = \mathrm{Linear}\left(\mathrm{Concat}\left(\mathbf{h}_u^{(L)}, \mathbf{h}_v^{(L)}\right)\right)$
¡ Graph-level prediction:
$\widehat{\mathbf{y}}_G = \mathrm{Head}_{graph}\left(\left\{\mathbf{h}_v^{(L)} \in \mathbb{R}^d, \forall v \in G\right\}\right)$
¡ Node-level prediction:
$\widehat{\mathbf{y}}_v = \mathrm{Head}_{node, T(v)}\left(\mathbf{h}_v^{(L)}\right) = \mathbf{W}_{T(v)}^{(H)} \mathbf{h}_v^{(L)}$
¡ Edge-level prediction:
$\widehat{\mathbf{y}}_{uv} = \mathrm{Head}_{edge, r}\left(\mathbf{h}_u^{(L)}, \mathbf{h}_v^{(L)}\right) = \mathrm{Linear}_r\left(\mathrm{Concat}\left(\mathbf{h}_u^{(L)}, \mathbf{h}_v^{(L)}\right)\right)$
¡ Graph-level prediction:
$\widehat{\mathbf{y}}_G = \mathrm{AGG}\left(\mathrm{Head}_{graph, i}\left(\left\{\mathbf{h}_v^{(L)} \in \mathbb{R}^d, \forall T(v) = i\right\}\right)\right)$
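A sketch of type- and relation-specific prediction heads (type names, relation names, and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
d, n_classes = 4, 3

# Node-level head: one weight matrix W_{T(v)} per node type.
W_head = {t: rng.normal(size=(n_classes, d)) for t in ("author", "paper")}

def node_head(h_v, node_type):
    return W_head[node_type] @ h_v           # logits chosen by node type

# Edge-level head: one linear layer per relation type r on Concat(h_u, h_v).
W_edge = {r: rng.normal(size=(1, 2 * d)) for r in ("cites",)}

def edge_head(h_u, h_v, rel):
    return W_edge[rel] @ np.concatenate([h_u, h_v])

h_u, h_v = rng.normal(size=d), rng.normal(size=d)
y_node = node_head(h_u, "author")            # (n_classes,) logits
y_edge = edge_head(h_u, h_v, "cites")        # (1,) edge score
```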
J. You, R. Ying, J. Leskovec. Design Space of Graph Neural Networks, NeurIPS 2020