
Note to other teachers and users of these slides: We would be delighted if you found our material useful for giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. If you make use of a significant portion of these slides in your own lecture, please include this message, or a link to our web site: http://cs224w.stanford.edu

CS224W: Machine Learning with Graphs


Charilaos Kanatsoulis and Jure Leskovec, Stanford
University
http://cs224w.stanford.edu
¡ Homework 1 due Thursday, 10/17
§ Late submissions accepted until end of day
Monday, 10/21
¡ Project Proposal due Tuesday, 10/22
¡ Colab 2 due Thursday, 10/24

¡ Recitation time change
We will host our recitations in the evenings from now on to accommodate remote students. Recordings are also available via Ed posts.
¡ Clarification on project feedback
After the project proposal, you will be assigned a TA to mentor your project and provide detailed feedback.
¡ Lecture pace
We will slow down the pace.
¡ Individual questions about lecture content
Please come to office hours for in-depth Q&A.


Recap: the GNN pipeline — Input Graph → Graph Neural Network → Node embeddings → Prediction head → Predictions, compared against Labels via a Loss function, together with a Dataset split and Evaluation metrics.

Today’s lecture: Can we make GNN representations more expressive?
CS224W: Machine Learning with Graphs
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
¡ A thought experiment: What should a perfect GNN do?
§ A k-layer GNN embeds a node based on its k-hop neighborhood structure
§ A perfect GNN should build an injective function between neighborhood structure (regardless of hops) and node embeddings
¡ For a perfect GNN (ignore node attributes for now):
§ Observation 1: If two nodes have the same neighborhood structure, they must have the same embedding:
   h_{v₁} = h_{v₂}
§ Observation 2: If two nodes have different neighborhood structure, they must have different embeddings:
   h_{v₁} ≠ h_{v₃}
(assuming the attributes of all nodes are the same)
J. You, J. Gomes-Selman, R. Ying, J. Leskovec. Identity-aware Graph Neural Networks, AAAI 2021

¡ Observation 2 often cannot be satisfied:
§ The GNNs we have introduced so far are not perfect
§ In the previous lecture, we discussed that their expressive power is upper bounded by the WL test
§ For example, message passing GNNs cannot count the cycle length:
   v₁ resides in a cycle of length 3 and v₂ resides in a cycle of length 4, yet the computational graphs for nodes v₁ and v₂ are always the same (ignoring node attributes)


J. You, R. Ying, J. Leskovec. Position-aware Graph Neural Networks, ICML 2019

¡ Observation 1 could also have issues:
§ Even though two nodes may have the same neighborhood structure, we may want to assign different embeddings to them
§ Because these nodes appear in different positions in the graph
§ We call these tasks Position-aware tasks
§ Even a perfect GNN will fail for these tasks:
   Example: nodes v₁ and v₂ in a grid graph, or in the NYC road network, have the same neighborhood structure but sit in different positions


We will resolve both issues by building more
expressive GNNs
¡ Fix issues in Observation 2:
§ Build message passing GNNs that are more
expressive than WL test
§ Example method: Structurally-aware GNNs
¡ Fix issues in Observation 1:
§ Create node embeddings based on their positions
in the graph
§ Example method: Position-aware GNNs
CS224W: Machine Learning with Graphs
Charilaos Kanatsoulis and Jure Leskovec, Stanford
University
http://cs224w.stanford.edu
J. You, J. Gomes-Selman, R. Ying, J. Leskovec. Identity-aware Graph Neural Networks, AAAI 2021

¡ GNNs exhibit three levels of failure cases in


structure-aware tasks:
§ Node level
§ Edge level
§ Graph level



Different inputs but the same computational graph → GNN fails

Node-level example: in the input graphs, node v₁ is labeled A and node v₂ is labeled B, but existing GNNs’ computational graphs for v₁ and v₂ are identical.
Different inputs but the same computational graph → GNN fails

Edge-level example: edges A and B share node v₀; we look at the embeddings of the other endpoints v₁ and v₂, but existing GNNs’ computational graphs for v₁ and v₂ are identical.
Different inputs but the same computational graph → GNN fails

Graph-level example: input graphs A and B are different, but for each node the existing GNNs’ computational graphs are identical.


¡ The WL kernel colors inherit the graph symmetries.
¡ Symmetric colors are associated with limitations involving the spectral decomposition of the graph.
¡ Recall the GIN update:
   c^(k)(v) = MLP^(k)( (1 + ε) · c^(k−1)(v) + Σ_{u ∈ N(v)} c^(k−1)(u) )
§ We can unroll the first MLP layer (linear weights W^(k) followed by a nonlinearity σ):
   c^(k)(v) = MLP_rest^(k)( σ( W^(k) [ (1 + ε) · c^(k−1)(v) + Σ_{u ∈ N(v)} c^(k−1)(u) ] ) )
§ MLP_rest^(k) denotes all the MLP layers except the first.


¡ Unrolling the first MLP layer also lets us write the color update in matrix form:
   C^(k) = MLP_rest^(k)( σ( ( (1 + ε) I + A ) C^(k−1) (W^(k))ᵀ ) )
§ where A ∈ {0,1}^{n×n} is the adjacency matrix of the graph, i.e., A(u, v) = 1 if (u, v) is an edge and A(u, v) = 0 if (u, v) is not an edge, and C^(k) stacks the node colors as rows.
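A minimal PyTorch sketch of this matrix-form update (our own illustration, not the official course code; the class name GINMatrixLayer, eps, and the two-layer MLP are assumptions):

import torch
import torch.nn as nn

class GINMatrixLayer(nn.Module):
    """One GIN layer in matrix form: C_out = MLP(((1 + eps) * I + A) @ C_in)."""
    def __init__(self, in_dim, hidden_dim, out_dim, eps=0.0):
        super().__init__()
        self.eps = eps
        # First linear layer (the one we "unrolled"), then the rest of the MLP.
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),  # first layer W + nonlinearity
            nn.Linear(hidden_dim, out_dim),            # "MLP_rest"
        )

    def forward(self, A, C):
        # A: (n, n) dense adjacency, C: (n, in_dim) node colors/features
        agg = (1.0 + self.eps) * C + A @ C   # ((1 + eps) I + A) C
        return self.mlp(agg)

# Tiny usage example on a 3-cycle with constant initial colors
A = torch.tensor([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])
C0 = torch.ones(3, 1)
layer = GINMatrixLayer(in_dim=1, hidden_dim=8, out_dim=4)
print(layer(A, C0).shape)  # torch.Size([3, 4])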
¡ Let's compute the eigenvalue decomposition of the graph adjacency matrix A:
   A = V Λ Vᵀ
¡ V is the orthonormal matrix of eigenvectors
¡ Λ is the diagonal matrix of eigenvalues
§ The eigenvalue (spectral) decomposition of the adjacency matrix is a universal characterization of the graph.
§ Different graphs have different spectral decompositions.
§ The number of cycles in a graph can be viewed as a function of the eigenvalues and eigenvectors; e.g., the number of closed walks of length k is trace(A^k) = Σᵢ λᵢ^k.
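As a quick illustration of the last point, here is a small NumPy sketch (our own example, not from the slides) checking that closed-walk and triangle counts are functions of the eigenvalues:

import numpy as np

# Adjacency of a small undirected graph: a 4-cycle plus one chord
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 1, 1, 0]], dtype=float)

# Spectral decomposition A = V diag(lam) V^T
lam, V = np.linalg.eigh(A)

# Closed walks of length k: trace(A^k) = sum_i lam_i^k
k = 3
walks_from_power = np.trace(np.linalg.matrix_power(A, k))
walks_from_eigs = np.sum(lam ** k)
print(walks_from_power, walks_from_eigs)       # identical up to rounding

# Number of triangles = trace(A^3) / 6
print(int(round(walks_from_power / 6)))        # 2 triangles in this graph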


¡ We can interpret GIN layers as MLPs operating on the eigenvectors:
¡ If we replace A with its spectral decomposition V Λ Vᵀ, the aggregation in the matrix-form update becomes ( (1 + ε) I + V Λ Vᵀ ) C^(k−1).
¡ The weights of the first MLP layer then depend on the eigenvalues and on the dot products between the eigenvectors and the colors at the previous level.


¡ We can interpret GIN layers as MLPs operating on the
eigenvectors:

¡ If we zoom in

¡ The new node colors only depend on the eigenvectors that are not
orthogonal to 1.
¡ Graphs with symmetries admit eigenvectors orthogonal to 1.

10/15/24 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs 23


¡ The WL kernel cannot distinguish between some basic graph structures.
¡ The WL kernel cannot count basic graph structures, e.g., cycles.
¡ Summary: The limitations of the WL kernel are limitations of the initial node color.
§ These limitations are well understood in the spectral domain.
§ Constant node colorings are orthogonal to adjacency eigenvectors, so critical spectral components (eigenvalues and eigenvectors) are omitted.
§ At a high level, colors generated by the WL kernel obey the same symmetries as the graph structure.
§ These joint symmetries lock the message-passing operations into limited representations.
CS224W: Machine Learning with Graphs
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
¡ We use the following thinking:
§ Two different inputs (nodes, edges, graphs) are labeled differently
§ A “failed” model will always assign the same embedding to them
§ A “successful” model will assign different embeddings to them
§ Embeddings are determined by GNN computational graphs:

   Example: two inputs, nodes v₁ and v₂, with different labels A and B. Goal: assign different embeddings to v₁ and v₂.
¡ A naïve solution: One-hot encoding
§ Encode each node with a different ID; then we can always differentiate different nodes/edges/graphs
   Example: nodes in the input graphs get one-hot IDs such as 1000, 0100, 0010, 0001. The computational graphs are clearly different if each node has a different ID.
¡ A naïve solution: One-hot encoding
§ Encode each node with a different ID; then we can always differentiate different nodes/edges/graphs
§ Issues:
§ Not scalable: Need O(N) feature dimensions (N is the number of nodes)
§ Not inductive: Cannot generalize to new nodes/graphs
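For concreteness, a minimal sketch of one-hot node IDs as input features (our own illustration; it exhibits exactly the two issues above):

import torch

num_nodes = 4
# One-hot node IDs: an identity matrix, one row per node.
X = torch.eye(num_nodes)   # shape (N, N): feature dimension grows with N (not scalable)
print(X)

# A new, unseen node has no row here -- the encoding cannot generalize (not inductive).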
¡ Feature augmentation: constant vs. one-hot node features

Expressive power:
§ Constant node feature: Medium. All the nodes are identical, but the GNN can still learn from the graph structure.
§ One-hot node feature: High. Each node has a unique ID, so node-specific information can be stored.

Inductive learning (generalize to unseen nodes):
§ Constant: High. Simple to generalize to new nodes: we assign the constant feature to them, then apply our GNN.
§ One-hot: Low. Cannot generalize to new nodes: new nodes introduce new IDs, and the GNN doesn’t know how to embed unseen IDs.

Computational cost:
§ Constant: Low. Only a 1-dimensional feature.
§ One-hot: High. High-dimensional features; cannot be applied to large graphs.

Use cases:
§ Constant: Any graph, inductive settings (generalize to new nodes).
§ One-hot: Small graphs, transductive settings (no new nodes).
J. You, J. Gomes-Selman, R. Ying, J. Leskovec. Identity-aware Graph Neural Networks, AAAI 2021

Why do we need feature augmentation?
¡ (2) Certain structures are hard to learn by GNNs
¡ Solution:
§ We can use the cycle count as an augmented node feature
   Example (we start counting from cycles of length 0):
   Augmented node feature for v₁ when v₁ resides in a cycle of length 3: [0, 0, 0, 1, 0, 0]
   Augmented node feature for v₁ when v₁ resides in a cycle of length 4: [0, 0, 0, 0, 1, 0]
J. You, J. Gomes-Selman, R. Ying, J. Leskovec. Identity-aware Graph Neural Networks, AAAI 2021

B !" A !! B !"

𝑣Goal:
! classify !! and !" 𝑣"
computational graph ID-GNN rooted subtrees
1 1
B !" !! A B !"
= ≠ 0
Cycle count 0
at each level 2 2
2 0
… … …
length-3 cycles = 2 length-3 cycles = 0

¡ Idea: Count cycles originating from a given node, use it as


initial feature.
§ Include identity information as an augmented node feature
§ Use cycle counts in each layer as an augmented node
feature. Also can be used together with any GNN
10/15/24 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs 37
C. Kanatsoulis, A. Ribeiro. Graph Neural Networks Are More Powerful Than We Think, ICASSP 2024

¡ We can also use the diagonals of the adjacency powers as augmented node features.
¡ They correspond to the closed loops each node is involved in.
   Example:
   Augmented node feature for v₁ when v₁ resides in a cycle of length 3: [1, 0, 2, 2, 6, 8]
   Augmented node feature for v₁ when v₁ resides in a cycle of length 4: [1, 0, 2, 0, 8, 0]
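A minimal NumPy sketch of these closed-loop features (our own illustration): stack diag(A^0), diag(A^1), ..., diag(A^(K−1)) as per-node structural features.

import numpy as np

def closed_loop_features(A: np.ndarray, K: int = 6) -> np.ndarray:
    """Per-node features: [diag(A^0), diag(A^1), ..., diag(A^(K-1))].

    Entry k counts the closed walks of length k that start and end at the node.
    """
    n = A.shape[0]
    feats = np.zeros((n, K))
    P = np.eye(n)                     # A^0
    for k in range(K):
        feats[:, k] = np.diag(P)
        P = P @ A                     # advance to the next power
    return feats

# Triangle graph: every node lies on a length-3 cycle.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
print(closed_loop_features(A))
# Each row starts with [1, 0, 2, 2, 6, ...]: one walk of length 0, no self-loops,
# 2 closed walks of length 2 (the degree), 2 closed walks of length 3 (the triangle), ...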
C. Kanatsoulis, A. Ribeiro. Graph Neural Networks Are More Powerful Than We Think, ICASSP 2024

¡ Theorem: If two graphs have adjacency matrices with different eigenvalues, there exists a GNN with closed-loop initial node features that can always tell them apart.

¡ GNNs with structural initial node features can produce different representations for almost all real-world graphs.

¡ GIN with structural initial node features is strictly more powerful than the WL kernel.
Why do we need feature augmentation?
¡ (2) Certain structures are hard to learn by GNNs
¡ Other commonly used augmented features (see the sketch below):
§ Clustering coefficient
§ PageRank
§ Centrality
§ …
¡ Any feature we have introduced in Lecture 1 can be used!
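A minimal NetworkX sketch (our own illustration) of computing a few of these classical features and stacking them as augmented node features:

import networkx as nx
import numpy as np

G = nx.karate_club_graph()

clustering = nx.clustering(G)             # clustering coefficient per node
pagerank = nx.pagerank(G)                 # PageRank per node
centrality = nx.degree_centrality(G)      # one of many centrality measures

# Stack into an (N, 3) matrix, ordered by node id, to concatenate with existing node features.
X_aug = np.array([[clustering[v], pagerank[v], centrality[v]] for v in G.nodes()])
print(X_aug.shape)  # (34, 3)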
¡ Feature augmentation: constant vs. structure-aware node features

Expressive power:
§ Constant node feature: Medium. All the nodes are identical, but the GNN can still learn from the graph structure.
§ Structure-aware node feature: High. Each node has a structure-aware ID, so node-specific information can be stored.

Inductive learning (generalize to unseen nodes):
§ Constant: High. Simple to generalize to new nodes: we assign the constant feature to them, then apply our GNN.
§ Structure-aware: High. Simple to generalize to new nodes: we can count triangles or closed loops for any graph.

Computational cost:
§ Constant: Low. Only a 1-dimensional feature.
§ Structure-aware: Low/High, depending on the structures we are counting.

Use cases:
§ Constant: Any graph, inductive settings (generalize to new nodes).
§ Structure-aware: Any graph, inductive settings (generalize to new nodes).
CS224W: Machine Learning with Graphs
Charilaos Kanatsoulis and Jure Leskovec, Stanford
University
http://cs224w.stanford.edu
C. Kanatsoulis, A. Ribeiro. Counting Graph Substructures with Graph Neural Networks, ICLR 2024

Can we count graph substructures with GNNs only?
¡ Assign unique IDs to nodes
§ These IDs are represented by random samples
§ Each node will be represented by a different set of random variables
   Example: in a 6-node graph, node 3 gets the random samples [0.2, 1.5, -2.3, -10.1]; the total number of random samples per node is 4.
C. Kanatsoulis, A. Ribeiro. Counting Graph Substructures with Graph Neural Networks, ICLR 2024

¡ We design a simple GNN
§ with SUM aggregation and linear message functions,
§ and we add a square pointwise nonlinearity in the last layer.
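A minimal sketch of such a network (our own reading of the recipe above, not the paper's code; learnable linear weights are omitted for clarity): linear messages with SUM aggregation amount to repeated multiplication by the adjacency matrix, followed by an elementwise square at the end.

import numpy as np

def simple_counting_gnn(A: np.ndarray, x: np.ndarray, num_layers: int = 2) -> np.ndarray:
    """Linear message passing with SUM aggregation, then a square pointwise nonlinearity.

    A: (n, n) adjacency matrix; x: (n,) one random input sample per node.
    """
    h = x
    for _ in range(num_layers):
        h = A @ h            # linear message + SUM aggregation over neighbors
    return h ** 2            # square nonlinearity in the last layer

Each column of random node samples is fed through this network separately; the next slides average the outputs over the samples.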
Example: a 6-node graph where each node is assigned 4 random samples:

Node 1: [3.3, -1.7, -1.2, -0.1]
Node 2: [-0.1, -5.4, 3.0, -9.8]
Node 3: [0.2, 1.5, -2.3, -10.1]
Node 4: [0.5, 1.9, -12.7, 11.1]
Node 5: [5.1, -0.7, -2.9, -13.5]
Node 6: [-1.2, 7.5, -0.3, -7.9]

Each of the 4 sample dimensions is fed through the GNN as a separate (per-node, scalar) input signal: e.g., the first samples [3.3, -0.1, 0.2, 0.5, 5.1, -1.2] form one input, the last samples [-0.1, -9.8, -10.1, 11.1, -13.5, -7.9] another, and so on.
C. Kanatsoulis, A. Ribeiro. Counting Graph Substructures with Graph Neural Networks, ICLR 2024

¡ To maintain inductive capability, the final output averages the squared GNN outputs over the random samples,
§ which in practice is computed as an empirical mean over the sampled input signals.
§ We can show that this procedure computes the closed loops of a graph (the diagonals of the adjacency powers),
§ and therefore a GNN can break the limits of the WL kernel and count important substructures in the graph.
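A hedged numerical sketch of why this works (our own illustration, assuming the random samples are i.i.d. with zero mean and unit variance): averaging the squared outputs of the linear sum-aggregation network over many samples approaches diag(A^(2k)), i.e., the closed-loop counts.

import numpy as np

rng = np.random.default_rng(0)

# Triangle (nodes 0,1,2) plus a pendant node 3 attached to node 2
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
n, k, num_samples = A.shape[0], 2, 200_000

X = rng.standard_normal((n, num_samples))     # i.i.d. random node IDs, one column per sample
H = np.linalg.matrix_power(A, k) @ X          # k rounds of linear message passing with SUM aggregation
estimate = (H ** 2).mean(axis=1)              # square nonlinearity, then average over samples

exact = np.diag(np.linalg.matrix_power(A, 2 * k))  # closed walks of length 2k per node
print(np.round(estimate, 2))                  # approaches the exact counts below
print(exact)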
CS224W: Machine Learning with Graphs
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
J. You, R. Ying, J. Leskovec. Position-aware Graph Neural Networks, ICML 2019

¡ There are two types of tasks on graphs:

Structure-aware task
¡ Nodes are labeled by their structural roles in the graph

Position-aware task
¡ Nodes are labeled by their positions in the graph


¡ We showed how to design GNNs that work well for structure-aware tasks
Structure-aware task
¡ GNNs work ☺
¡ They can differentiate v₁ and v₂ by using different computational graphs


¡ GNNs will always fail for position-aware tasks
Position-aware task
¡ GNNs fail ☹
¡ v₁ and v₂ will always have the same computational graph, due to structure symmetry
¡ Can we define deep learning methods that are position-aware?


¡ Randomly pick a node s₁ as an anchor node
¡ Represent v₁ and v₂ via their relative distances w.r.t. the anchor s₁, which are different
¡ An anchor node serves as a coordinate axis,
§ which can be used to locate nodes in the graph
   Relative distances to s₁: v₁ → 1, v₂ → 2


¡ Pick more nodes s₁, s₂ as anchor nodes
¡ Observation: More anchors can better characterize node position in different regions of the graph
¡ Many anchors → many coordinate axes
   Relative distances w.r.t. (s₁, s₂): v₁ → (1, 2), v₂ → (2, 1)


¡ Generalize an anchor from a single node to a set of nodes
§ We define the distance to an anchor-set as the minimum distance to all the nodes in the anchor-set
¡ Observation: Large anchor-sets can sometimes provide a more precise position estimate,
§ so we can save on the total number of anchors
   Relative distances w.r.t. (s₁, s₂, s₃), where s₃ is a size-2 anchor-set: v₁ → (1, 2, 1), v₃ → (1, 2, 0)
   Anchors s₁, s₂ cannot differentiate nodes v₁ and v₃, but the anchor-set s₃ can.
¡ Goal: Embed the metric space (V, d) into a Euclidean space ℝ^k such that the original distance metric is preserved.
§ For every node pair u, v ∈ V, the Euclidean embedding distance ‖z_u − z_v‖₂ is close to the original distance metric d(u, v).


¡ Bourgain Theorem [Informal] [Bourgain 1985]
§ Consider the following embedding function of a node v ∈ V:
   f(v) = ( d_min(v, S_{1,1}), d_min(v, S_{1,2}), …, d_min(v, S_{log n, c log n}) ) ∈ ℝ^{c log² n}
§ where
§ c is a constant,
§ S_{i,j} ⊂ V is chosen by including each node in V independently with probability 1/2^i,
§ d_min(v, S_{i,j}) ≡ min_{u ∈ S_{i,j}} d(v, u).
§ The embedding distance produced by f is provably close to the original distance metric (V, d).


P-GNN follows the theory of the Bourgain theorem:
§ First sample O(log² n) anchor-sets S_{i,j}.
§ Embed each node v via
   ( d_min(v, S_{1,1}), d_min(v, S_{1,2}), …, d_min(v, S_{log n, c log n}) ) ∈ ℝ^{c log² n}.
P-GNN maintains the inductive capability:
§ During training, new anchor-sets are re-sampled every time.
§ P-GNN is trained to operate over the new anchor-sets.
§ At test time, given a new unseen graph, new anchor-sets are sampled.
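A minimal NetworkX sketch of this anchor-set position encoding (our own illustration of the recipe above, not the official P-GNN code; the helper names are our own): sample S_{i,j} by including each node with probability 1/2^i, then encode each node by its minimum shortest-path distance to every anchor-set.

import math
import numpy as np
import networkx as nx

def sample_anchor_sets(G, c=1, rng=None):
    """Sample roughly c * log^2(n) anchor-sets S_{i,j}: each node kept with probability 1/2^i."""
    rng = rng or np.random.default_rng()
    n = G.number_of_nodes()
    log_n = max(1, math.ceil(math.log2(n)))
    anchor_sets = []
    for i in range(1, log_n + 1):
        for _ in range(c * log_n):
            s = [v for v in G.nodes() if rng.random() < 1.0 / 2 ** i]
            anchor_sets.append(s or [rng.choice(list(G.nodes()))])  # avoid empty sets
    return anchor_sets

def position_encoding(G, anchor_sets):
    """Node v -> vector of minimum shortest-path distances to each anchor-set."""
    dist = dict(nx.all_pairs_shortest_path_length(G))
    enc = np.zeros((G.number_of_nodes(), len(anchor_sets)))
    for j, S in enumerate(anchor_sets):
        for v in G.nodes():
            enc[v, j] = min(dist[v].get(u, np.inf) for u in S)
    return enc

G = nx.path_graph(8)                       # tiny example graph with nodes 0..7
sets_ = sample_anchor_sets(G, rng=np.random.default_rng(0))
print(position_encoding(G, sets_).shape)   # (8, number_of_anchor_sets)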
¡ Position encoding for graphs: Represent a node’s position by its distances to randomly selected anchor-sets
§ Each dimension of the position encoding is tied to one anchor-set
   Example: with anchors s₁, s₂ and size-2 anchor-set s₃, v₁’s position encoding is (1, 2, 1) and v₃’s position encoding is (1, 2, 0).


¡ The simple way: Use the position encoding as an augmented node feature (works well in practice)
§ Issue: Since each dimension of the position encoding is tied to a random anchor-set, the dimensions of the positional encoding can be randomly permuted without changing its meaning
§ In contrast, if you permute the input dimensions of a normal NN, the output will surely change


¡ The rigorous solution: Requires a special NN that maintains the permutation-invariance property of the position encoding
§ Permuting the input feature dimensions will only result in a permutation of the output dimensions; the value in each dimension won’t change
§ The Position-aware GNN paper has more details
