Graph Neural Networks:
PGNNs, Pretraining, and OGB
Jure Leskovec
Networks vs. Images and Text/Speech
§ No fixed node ordering or reference point
§ Often dynamic and have multimodal features
Jure Leskovec, Stanford University 6
Graph Neural Networks
[Figure: an input graph with nodes A–F; for each target node (e.g., A or B), the neural network unrolls a compute graph that aggregates information from that node's neighborhood.]
GraphSAGE: Training
§ Aggregation parameters are shared for all nodes
§ Number of model parameters is independent of |V|
§ Can use different loss functions:
§ Classification/Regression: $\mathcal{L}(h_v) = \| y_v - f(h_v) \|^2$
§ Pairwise loss: $\mathcal{L}(h_u, h_v) = \max(0,\, 1 - \cos(h_u, h_v))$
[Figure: compute graphs for nodes A and B in the input graph; the aggregation parameters $W^{(k)}, B^{(k)}$ are shared across all nodes.]
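The two loss functions above can be sketched numerically. A minimal sketch, assuming a linear prediction head f(h) = w·h and cosine similarity as the pairwise score (both illustrative choices; none of the variable names come from the slides):

```python
import numpy as np

# Illustrative node embeddings h_u, h_v (as would be produced by a
# trained GraphSAGE encoder) and a linear head f(h) = w @ h.
rng = np.random.default_rng(0)
h_u = rng.normal(size=8)
h_v = rng.normal(size=8)
y_u = 1.0                      # regression target for node u
w = rng.normal(size=8)         # parameters of the prediction head

def regression_loss(h, y, w):
    # L(h_v) = || y_v - f(h_v) ||^2
    return float((y - w @ h) ** 2)

def pairwise_loss(h1, h2):
    # L(h_u, h_v) = max(0, 1 - cos(h_u, h_v)): a hinge on cosine
    # similarity that pulls embeddings of related nodes together.
    cos = h1 @ h2 / (np.linalg.norm(h1) * np.linalg.norm(h2))
    return max(0.0, 1.0 - float(cos))

print(regression_loss(h_u, y_u, w), pairwise_loss(h_u, h_v))
```

In practice both losses are applied over minibatches of (node, label) or (node, node) pairs and backpropagated through the shared aggregation parameters.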
Inductive Capability
§ Train with a snapshot of the graph; when a new node arrives, generate its embedding $z_u$ on the fly.
How Powerful are Graph Neural Networks? K. Xu, et al. ICLR 2019.
Key Insight: Rooted Subtrees
Assume no node features; then single nodes cannot be distinguished, but rooted trees can:
[Figure: an example graph and the rooted-subtree multisets the GNN distinguishes.]
Idea: If the GNN aggregation is injective, then different multisets are distinguished and the GNN can capture subtree structure.
Theorem: Any injective multiset function $f$ can be written as $f(X) = \phi\left( \sum_{x \in X} g(x) \right)$.
The functions $\phi$ and $g$ exist by the universal approximation theorem.
Consequence: The sum aggregator is the most expressive!
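The theorem's construction can be illustrated with a toy sum-pooling encoder. A minimal sketch assuming a finite node-feature vocabulary; the one-hot choice of $g$ below is one possible injective encoding for illustration, not the one used in the paper:

```python
import numpy as np

VOCAB = ["A", "B", "C"]   # assumed finite vocabulary of node features

def g(x):
    # one injective choice of g: a one-hot vector per feature value
    onehot = np.zeros(len(VOCAB))
    onehot[VOCAB.index(x)] = 1.0
    return onehot

def encode(multiset):
    # f(X) = phi(sum_{x in X} g(x)); here phi is the identity, and the
    # resulting count vector uniquely identifies the multiset.
    return tuple(sum(g(x) for x in multiset))

# Different multisets get different encodings (injectivity)...
assert encode(["A", "A", "B"]) != encode(["A", "B", "B"])
# ...including ones that differ only in multiplicity, which a mean
# aggregator would collapse:
assert encode(["A"]) != encode(["A", "A"])
```

Sum pooling over one-hot features is exactly a count vector, which is why it can separate any two distinct multisets over a finite vocabulary.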
Power of Aggregators
Ranking by expressive power: sum (multiset) > mean (distribution) > max (set).
[Figure 2: Ranking by expressive power for sum, mean and max-pooling aggregators over a multiset. The left panel shows the input multiset; the other panels illustrate which aspects of the multiset a given aggregator is able to capture: sum captures the full multiset, mean captures the proportion/distribution of elements of a given type, and max ignores multiplicities (reduces the multiset to a simple set).]
[Figure 3: Examples of simple graph structures on which mean and max-pooling aggregators fail: (a) mean and max both fail, (b) max fails, (c) mean and max both fail.]
Many existing GNNs instead use a 1-layer perceptron (Duvenaud et al., 2015; Kipf & Welling, 2017; Zhang et al., 2018): a linear mapping followed by a non-linear activation function such as ReLU. Such 1-layer mappings are examples of Generalized Linear Models (Nelder & Wedderburn, 1972). We are therefore interested in whether 1-layer perceptrons are enough for graph learning. Lemma 7 shows that there are indeed network neighborhoods (multisets) that models with 1-layer perceptrons can never distinguish.
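The failure cases can be reproduced in a few lines. A hedged sketch: treating each neighbor's feature as a scalar (values made up for illustration), mean- and max-pooling collapse the multisets {A} and {A, A}, while sum-pooling keeps them apart:

```python
import numpy as np

def aggregate(neighbors, how):
    # neighbors: a multiset of scalar node features
    xs = np.array(neighbors, dtype=float)
    return {"sum": xs.sum(), "mean": xs.mean(), "max": xs.max()}[how]

blue = 1.0                      # illustrative feature of a "blue" node
a = [blue, blue]                # neighborhood (a): two identical neighbors
b = [blue]                      # neighborhood (b): one such neighbor

# Mean and max cannot tell the two neighborhoods apart...
assert aggregate(a, "mean") == aggregate(b, "mean")
assert aggregate(a, "max") == aggregate(b, "max")
# ...but sum captures multiplicities and distinguishes them.
assert aggregate(a, "sum") != aggregate(b, "sum")
```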
Three Consequences of GNNs
1) The GNN does two things:
§ Learns how to “borrow” feature information from nearby nodes to enrich the target node
§ Each node can have a different computation graph, and the network is also able to capture/learn its structure
Three Consequences of GNNs
2) Computation graphs can be chosen:
§ Aggregation does not need to happen across all neighbors
§ Neighbors can be strategically chosen/sampled
§ Leads to big gains in practice!
Three Consequences of GNNs
3) We understand GNN failure cases:
§ GNNs fail to distinguish isomorphic nodes
§ Nodes with identical rooted subtrees will be classified in the same class (in the absence of differentiating node features)
[Figure: two graphs with node labels A/B whose nodes $v_1, \dots, v_5$ have identical rooted subtrees and therefore receive identical embeddings.]
Our Solution
PGNN: Position-Aware Graph Neural Networks
[Figure: randomly chosen anchor nodes/anchor sets in the graph; each node $v_i$ is described by one distance score per anchor set (e.g., rows such as 0.5, 0.3, 1, 0.5, 0.3 per node), which serve as position-aware features.]
Position-aware GNN Framework
Step (d): From messages to node embeddings
[Figure: the messages $M_v^i$ from each of the anchor-sets (three in the figure) are multiplied by a learnable weight $w$ to produce the output node embedding $z_v \in \mathbb{R}^k$, with one dimension per anchor set.]
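The anchor-set distances that underlie these position features can be sketched with plain BFS. Everything below (the toy path graph, the two anchor sets, the 1/(d+1) transform) is an illustrative sketch, not P-GNN's actual implementation:

```python
from collections import deque

def bfs_dist(adj, sources):
    # shortest-path distance from every node to the nearest source node
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Toy 5-node path graph 0-1-2-3-4 with two made-up anchor sets.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
anchor_sets = [{0}, {2, 4}]

# One position feature per anchor set: 1 / (d + 1), so nearer
# anchor sets contribute larger values.
dists = [bfs_dist(adj, s) for s in anchor_sets]
pos = {v: [1.0 / (d[v] + 1) for d in dists] for v in adj}
print(pos[3])  # node 3: distance 3 to {0} -> 0.25, distance 1 to {2, 4} -> 0.5
```

Each node's feature vector has one entry per anchor set, which is what makes otherwise-isomorphic nodes distinguishable by their positions.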
f( ) = biological activity?
2. Out-of-distribution prediction
§ Test examples tend to be very different from training examples
→ GNNs extrapolate poorly
[Figure: molecule graphs with binary toxicity labels (Toxicity B? = 0, Toxicity C? = 0, Toxicity D? = 1).]
[Chart: ROC-AUC improvement of the naïve pre-training strategy over no pre-training across downstream datasets (y-axis 0.0–14.0) — limited improvement.]
What are the Effective Strategies?
Key insight: Pre-train both node and graph embeddings
§ Node-level only: nodes are separated, but the embeddings are not composable (graphs are not separated)!
§ Graph-level only: graphs are separated, but embeddings of individual nodes are not!
§ Both: separation both in node and in graph space.
Possible Pre-training Methods
[Figure: pipeline — (1) a chemistry database of molecules with labels (Toxicity A? = 1, Toxicity B? = 0, Toxicity C? = 0, Toxicity D? = 1) is used to pre-train a GNN; (2) graph-level pre-training; (3) the pre-trained GNN is fine-tuned on downstream task N.]
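The pipeline's key property, that each stage continues training the same weights, can be caricatured with a toy parameter vector. Everything below (the stand-in "GNN" as a plain vector, the toy quadratic objectives) is a made-up illustration of "pre-train, then fine-tune the same weights", not code from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=4)        # stand-in for shared GNN parameters

def sgd(theta, grad_fn, steps=100, lr=0.1):
    # plain gradient descent; stands in for a full training loop
    for _ in range(steps):
        theta = theta - lr * grad_fn(theta)
    return theta

# 1) node-level self-supervised pre-training (toy quadratic objective)
theta = sgd(theta, lambda t: t - 0.5)
# 2) graph-level supervised pre-training on a large labeled corpus
theta = sgd(theta, lambda t: t - 1.0)
# 3) fine-tune the *same* parameters on the small downstream task
theta = sgd(theta, lambda t: t - 1.2)

# Each stage starts from the weights the previous stage produced;
# that warm start is the whole point of the recipe.
print(theta)
```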
Results of Our Strategy
§ Avoids negative transfer.
§ Significantly improves performance.
[Chart: molecule classification performance — ROC-AUC improvement over no pre-training (y-axis 0.0–16.0) across downstream datasets for our strategy vs. the naïve strategy; our strategy avoids negative transfer!]
Comparison of GNNs

Q: What is the effect of pre-training on different GNN architectures?

Supervised Infomax      68.0 ±1.8  77.8 ±0.3  64.9 ±0.7  60.9 ±0.6  71.2 ±2.8  81.3 ±1.4  77.8 ±0.9  80.1 ±0.9  72.8
Supervised EdgePred     66.6 ±2.2  78.3 ±0.3  66.5 ±0.3  63.3 ±0.9  70.9 ±4.6  78.5 ±2.4  77.5 ±0.8  79.1 ±3.7  72.6
Supervised AttrMasking  66.5 ±2.5  77.9 ±0.4  65.1 ±0.3  63.9 ±0.9  73.7 ±2.8  81.2 ±1.9  77.1 ±1.2  80.3 ±0.9  73.2
Supervised ContextPred  68.7 ±1.3  78.1 ±0.6  65.7 ±0.6  62.7 ±0.8  72.6 ±1.5  81.3 ±2.1  79.9 ±0.7  84.5 ±0.7  74.2

Table 1: Test ROC-AUC (%) performance on molecular prediction benchmarks using different pre-training strategies with GIN. The rightmost column averages the mean of test performance across the 8 datasets. The best result for each dataset and comparable results (i.e., results within one standard deviation of the best result) are bolded. The shaded cells indicate negative transfer, i.e., the ROC-AUC of a pre-trained model is worse than that of a non-pre-trained model. Notice that node- as well as graph-level pre-training is essential for good performance.

            Chemistry                              Biology
            Non-pre-trained  Pre-trained  Gain     Non-pre-trained  Pre-trained  Gain
GIN         67.0             74.2         +7.2     64.8 ± 1.0       74.2 ± 1.5   +9.4
GCN         68.9             72.2         +3.4     63.2 ± 1.0       70.9 ± 1.7   +7.7
GraphSAGE   68.3             70.3         +2.0     65.7 ± 1.2       68.5 ± 1.5   +2.8
GAT         66.8             60.3         -6.5     68.2 ± 1.1       67.8 ± 3.6   -0.4

Table 2: Test ROC-AUC (%) performance of different GNN architectures with and without pre-training. Without pre-training, the less expressive GNNs give slightly better performance than the most expressive GIN because of their smaller model complexity in a low-data regime. With pre-training, however, the most expressive GIN is properly regularized and dominates the other architectures. For results split by chemistry datasets, see Table 4 in Appendix H. Pre-training strategy for chemistry data: Context Prediction + graph-level supervised pre-training; pre-training strategy for biology data: Attribute Masking + graph-level supervised pre-training.

A: The more expressive models (GIN) benefit the most from pre-training.
Webpage: https://ogb.stanford.edu/
Github: https://github.com/snap-stanford/ogb
Open Graph Benchmark
https://ogb.stanford.edu
ogb@cs.stanford.edu
Steering committee
Regina Barzilay, Peter Battaglia, Yoshua Bengio, Michael Bronstein, Stephan
Günnemann, Will Hamilton, Tommi Jaakkola, Stefanie Jegelka, Maximilian
Nickel, Chris Re, Le Song, Jian Tang, Max Welling, Rich Zemel
Postdoc positions in:
(1) ML on graphs; (2) NLP and knowledge graphs; (3) ML for biomedicine
Apply at http://snap.stanford.edu
Industry Partnerships · Funding

PhD Students: Jiaxuan You, Bowen Liu, Hongyu Ren, Rex Ying

Post-Doctoral Fellows: Baharan Mirzasoleiman, Marinka Zitnik, Michele Catasta, Pan Li, Shantao Li

Research Staff: Maria Brbic, Adrijan Bradaschia, Rok Sosic

Collaborators:
Dan Jurafsky, Linguistics, Stanford University
David Grusky, Sociology, Stanford University
Stephen Boyd, Electrical Engineering, Stanford University
David Gleich, Computer Science, Purdue University
VS Subrahmanian, Computer Science, University of Maryland
Marinka Zitnik, Medicine, Harvard University
Russ Altman, Medicine, Stanford University
Jochen Profit, Medicine, Stanford University
Eric Horvitz, Microsoft Research
Jon Kleinberg, Computer Science, Cornell University
Sendhil Mullainathan, Economics, Harvard University
Scott Delp, Bioengineering, Stanford University
James Zou, Medicine, Stanford University