
Advancements in Graph Neural Networks: PGNNs, Pretraining, and OGB

Jure Leskovec

Includes joint work with W. Hu, J. You, M. Fey, Y. Dong, B. Liu, M. Catasta, K. Xu, S. Jegelka, M. Zitnik, P. Liang, V. Pande
Modern ML Toolbox

[Figure: examples of images and text/speech data]

The modern deep learning toolbox is designed for simple sequences & grids.
But not everything can be represented as a sequence or a grid.

How can we develop neural networks that are much more broadly applicable?

New frontiers beyond classic neural networks that learn on images and sequences.
Representation Learning in Graphs

Input: Network → node embeddings $z$ → Predictions: node labels, new links, generated graphs and subgraphs.
Networks of Interactions

Social networks, knowledge graphs, biological networks, complex systems, code, molecules.
Why is it Hard?

Networks are complex!
§ Arbitrary size and complex topological structure (i.e., no spatial locality like grids), unlike text and images
§ No fixed node ordering or reference point
§ Often dynamic, with multimodal features
Graph Neural Networks

[Figure: a target node A in the input graph, and the computation graph unrolled from its neighborhood]

Each node defines a computation graph.
§ Each edge in this graph is a transformation/aggregation function.

Scarselli et al. 2005. The Graph Neural Network Model. IEEE Transactions on Neural Networks.
Graph Neural Networks

[Figure: the same computation graph, with neural networks placed on the aggregation steps]

Intuition: Nodes aggregate information from their neighbors using neural networks.

Inductive Representation Learning on Large Graphs. W. Hamilton, R. Ying, J. Leskovec. NIPS, 2017.
Idea: Aggregate Neighbors

Intuition: The network neighborhood defines a computation graph. Every node defines a computation graph based on its neighborhood!

Can be viewed as learning a generic linear combination of graph low-pass and high-pass operators [Bronstein et al., 2017].
Our Approach: GraphSAGE [NIPS '17]

Update for node $v$ (combine the node's own embedding with an aggregation over its neighbors):

$$h_v^{(l+1)} = \sigma\Big( W^{(l)} h_v^{(l)} \,\oplus\, \mathrm{AGG}\big(\{\, \sigma(Q^{(l)} h_u^{(l)}) : u \in N(v) \,\}\big) \Big)$$

§ $h_v^{(l+1)}$: the $(l{+}1)$-st level embedding of node $v$
§ $W^{(l)} h_v^{(l)}$: transform $v$'s own embedding from level $l$
§ $\mathrm{AGG}(\cdot)$: transform and aggregate the level-$l$ embeddings of neighbors $u \in N(v)$; $\oplus$ combines the two terms
§ $h_v^{(0)} =$ features $x_v$ of node $v$; $\sigma(\cdot)$ is a sigmoid activation function
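To make the update concrete, here is a minimal sketch of one GraphSAGE layer in PyTorch (the framework is our assumption; the slides name none). It uses a dense adjacency matrix and mean aggregation, and combines the self and neighbor terms by summation; the paper also considers concatenation and other aggregators.

```python
import torch
import torch.nn as nn

class GraphSAGELayer(nn.Module):
    """One GraphSAGE-style layer: combine a node's own transformed embedding
    with a mean aggregation over its transformed neighbor embeddings."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim)  # transforms v's own embedding
        self.Q = nn.Linear(in_dim, out_dim)  # transforms neighbor embeddings

    def forward(self, h, adj):
        # h: [n, in_dim] node embeddings; adj: [n, n] binary adjacency matrix.
        neigh = torch.sigmoid(self.Q(h))                 # sigma(Q h_u) for every node
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)  # node degrees, avoid div by 0
        agg = (adj @ neigh) / deg                        # mean over u in N(v)
        # Combine by sum here for brevity; the paper concatenates instead.
        return torch.sigmoid(self.W(h) + agg)
```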
GraphSAGE: Training [NIPS '17]

§ Aggregation parameters are shared across all nodes
§ The number of model parameters is independent of $|V|$
§ Can use different loss functions:
  § Classification/regression: $\mathcal{L}(h_v) = \lVert y_v - f(h_v) \rVert^2$
  § Pairwise loss: $\mathcal{L}(h_u, h_v) = \max(0,\, 1 - \mathrm{dist}(h_u, h_v))$

[Figure: computation graphs for nodes A and B over the input graph, with parameters $W^{(l)}, Q^{(l)}$ shared between them]
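A short sketch of the two losses. The slide leaves dist unspecified, so Euclidean distance is an assumption here, as is the caller-supplied readout f.

```python
import torch
import torch.nn.functional as F

def regression_loss(h_v, y_v, f):
    """L(h_v) = || y_v - f(h_v) ||^2 for supervised regression on node v."""
    return ((y_v - f(h_v)) ** 2).sum()

def pairwise_loss(h_u, h_v, margin=1.0):
    """L(h_u, h_v) = max(0, margin - dist(h_u, h_v)): zero once the two
    embeddings are at least `margin` apart (Euclidean distance assumed)."""
    dist = torch.norm(h_u - h_v, p=2, dim=-1)
    return F.relu(margin - dist).mean()
```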
Inductive Capability

Train on a snapshot of the graph → a new node arrives → generate an embedding $z_u$ for the new node.

Works even for nodes we never trained on!
DIFFPOOL: Pooling for GNNs [NeurIPS '18]

Don't just embed individual nodes. Embed the entire graph.

Problem: Learn how to hierarchically pool the nodes to embed the entire graph.

Our solution: DIFFPOOL
§ Learns a hierarchical pooling strategy
§ Sets of nodes are pooled hierarchically
§ Soft assignment of nodes to next-level nodes

Hierarchical Graph Representation Learning with Differentiable Pooling. R. Ying, et al. NeurIPS 2018.
How Expressive are GNNs?

Theoretical framework: Characterize a GNN's discriminative power:
§ Characterize an upper bound on the discriminative power of GNNs
§ Propose a maximally powerful GNN
§ Characterize the discriminative power of popular GNNs

Key objects: the GNN computation tree rooted at each node, and the aggregation over the multiset of neighbor embeddings.

How Powerful are Graph Neural Networks? K. Xu, et al. ICLR 2019.
Key Insight: Rooted Subtrees

Assume no node features; then single nodes cannot be distinguished, but rooted trees can be:

[Figure: a graph and the rooted subtrees the GNN distinguishes]

The most powerful GNN is able to distinguish rooted subtrees of different structure. In the figure, the GNN can distinguish the blue and violet nodes, but not the violet and pink nodes, whose rooted subtrees are identical.
Discriminative Power of GNNs

Idea: If the GNN aggregation over the multiset $X$ of neighbor embeddings is injective, then different multisets are distinguished, and the GNN can capture subtree structure.

Theorem: Any injective multiset function $f$ can be written as $f(X) = \varphi\big( \sum_{x \in X} g(x) \big)$. The functions $\varphi, g$ can be modeled by neural networks, by the universal approximation theorem.

Consequence: The sum aggregator is the most expressive!
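This sum-then-MLP recipe is the design of GIN (Xu et al., ICLR 2019). A minimal sketch, assuming a dense adjacency matrix; layer sizes and names are ours.

```python
import torch
import torch.nn as nn

class GINLayer(nn.Module):
    """Maximally expressive aggregation: sum over the neighbor multiset,
    then an MLP playing the role of phi in f(X) = phi(sum_x g(x))."""
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, h, adj):
        # Sum (not mean or max) preserves multiplicities in the multiset.
        agg = h + adj @ h     # self embedding + sum over neighbors
        return self.mlp(agg)  # the MLP is justified by universal approximation
```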
Power of Aggregators

Ranking by discriminative power: sum > mean > max.

[Figure 2 of Xu et al.: sum captures the full multiset; mean captures the proportion/distribution of elements of a given type; max ignores multiplicities and reduces the multiset to a simple set.]

Failure cases for mean and max aggregators:

[Figure 3 of Xu et al.: simple graph structures that mean and max-pooling aggregators fail to distinguish: (a) mean and max both fail, (b) max fails, (c) mean and max both fail.]
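A tiny numerical check of the ranking: two multisets with the same distinct elements but different multiplicities are separated by sum, while mean and max collapse them.

```python
import torch

a = torch.tensor([1.0, 0.0])   # one node "type" as a one-hot feature
X1 = torch.stack([a])          # multiset {a}
X2 = torch.stack([a, a])       # multiset {a, a}

print(X1.sum(0), X2.sum(0))                  # [1,0] vs [2,0]: sum distinguishes
print(X1.mean(0), X2.mean(0))                # [1,0] vs [1,0]: mean fails
print(X1.max(0).values, X2.max(0).values)    # [1,0] vs [1,0]: max fails
```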
Three Consequences of GNNs

1) The GNN does two things:
§ Learns how to "borrow" feature information from nearby nodes to enrich the target node
§ Each node can have a different computation graph, and the network is also able to capture/learn its structure
Three Consequences of GNNs

2) Computation graphs can be chosen:
§ Aggregation does not need to happen across all neighbors
§ Neighbors can be strategically chosen/sampled
§ Leads to big gains in practice!
Three Consequences of GNNs

3) We understand GNN failure cases:
§ GNNs fail to distinguish isomorphic nodes
§ Nodes with identical rooted subtrees will be classified into the same class (in the absence of differentiating node features)


Structure-Aware Task vs. Position-Aware Task

[Figure: two labeled graphs; in the structure-aware task, node labels follow local structure; in the position-aware task, structurally symmetric nodes carry different labels A and B]

GNNs can predict the structure-aware task ☺ but cannot predict the position-aware task ☹.
Core Issues with GNNs

Structure-aware task: GNNs work ☺

The GNN can distinguish nodes $v_1$ and $v_2$, because they have different computation graphs.

[Figure: the computation trees rooted at $v_1$ and $v_2$ differ, so their embeddings differ]
Core Issues with GNNs

Position-aware task: GNNs fail ☹

The GNN cannot distinguish nodes $v_1$ and $v_2$, because they have identical computation graphs.

[Figure: the computation trees rooted at $v_1$ and $v_2$ are the same]

How do we make GNNs more expressive?
Our Solution

P-GNN: Position-Aware Graph Neural Networks

Position-aware Graph Neural Networks. J. You, R. Ying, J. Leskovec. ICML, 2019.
Data, code: https://snap.stanford.edu/pgnn
Key Insight: Anchors

Idea: Nodes need to know "where" in the network they are.

Solution:
§ Anchor: a randomly selected node
§ Anchor-set: a randomly selected node subset; an anchor is a size-1 anchor-set
§ (1) Randomly choose many anchor-sets
§ (2) A given node can then use its distances to these anchor-sets to understand its location/position in the network
Power of "Anchor"

[Figure: symmetric nodes $v_1$ (labeled A) and $v_2$ (labeled B), with a single anchor $s_1$]

Relative distances to $s_1$: $v_1 \to 1$; $v_2 \to 2$.

Observation: Represent $v_1$ and $v_2$ via their relative distances w.r.t. an anchor $s_1$, which are different.
Power of "Anchors"

[Figure: the same graph with two anchors $s_1$ and $s_2$]

Relative distances to $(s_1, s_2)$: $v_1 \to (1, 2)$; $v_2 \to (2, 1)$.

Observation: More anchors can better characterize node position.
Power of "Anchor-sets"

[Figure: size-1 anchor-sets $s_1$, $s_2$ and a size-2 anchor-set $s_3$]

Relative distances to $(s_1, s_2, s_3)$: $v_1 \to (1, 2, 1)$; $v_3 \to (2, 1, 2)$; $v_2 \to (1, 2, 0)$.

Observation: Large anchor-sets are more likely to differentiate symmetric nodes ($v_1$ and $v_2$), thus reducing the number of anchors we need to use.
Position-Aware GNN Framework

§ P-GNN is a family of models
§ We demonstrate one possible design, but other models are possible (future work!)

[Figure: anchor-set selection on the input graph, and the embedding computation for a single node $v_1$: distance-weighted messages from each anchor-set are transformed and combined into a position-aware output embedding $z_{v_1}$ and a message $h_{v_1}$ for the next layer]


Overview of Position-Aware GNN

§ (a) Randomly select anchor-sets
§ (b) Compute pairwise node distances
§ (c) Compute anchor-set messages
§ (d) Transform messages to node embeddings

[Figure: the same architecture diagram as above]


Position-Aware GNN Framework

Step (a): Select anchor-sets [Bourgain, 1985]
§ Randomly choose anchor-sets with sizes 1, 2, 4, …, n/2 ($\log(n)$ distinct sizes)
§ For each anchor-set size, sample $c \cdot \log(n)$ anchor-sets
§ In total, $k = c \cdot \log^2(n)$ anchor-sets

[Figure: example graph with three sampled anchor-sets]
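A sketch of step (a) in Python, following the sizes and counts above; the constant c and the node-list representation are our assumptions.

```python
import math
import random

def select_anchor_sets(nodes, c=1.0):
    """Bourgain-style anchor-set selection (a sketch).
    Sizes 1, 2, 4, ..., n/2 give log2(n) distinct sizes; each size is
    sampled c*log2(n) times, so roughly c*log2(n)^2 anchor-sets in total."""
    n = len(nodes)
    reps = max(1, round(c * math.log2(n)))
    anchor_sets = []
    size = 1
    while size <= n // 2:
        for _ in range(reps):
            anchor_sets.append(random.sample(nodes, size))
        size *= 2
    return anchor_sets

# Example: a 400-node graph yields roughly log2(400)^2 ≈ 75 anchor-sets.
# anchor_sets = select_anchor_sets(list(range(400)))
```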


Position-Aware GNN Framework

Step (b): Compute pairwise node distances
§ Position-based similarities: for example, shortest path, personalized PageRank, …
§ We use the $q$-hop shortest path distance $d_{sp}^{q}(v_i, v_j)$ for fast computation:

$$s(v_i, v_j) = \frac{1}{d_{sp}^{q}(v_i, v_j) + 1}$$

Pairwise node distances $s(v_i, v_j)$ for the example graph:

      v1    v2    v3    v4    v5
v1    1     0.5   0.5   0.3   0.3
v2    0.5   1     0.3   0.5   0.5
v3    0.5   0.3   1     0.5   0.3
v4    0.3   0.5   0.5   1     0.5
v5    0.3   0.5   0.3   0.5   1
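Step (b) as code, one sketch: a BFS truncated at depth q, so nodes farther than q hops get score 0 (their q-hop distance is treated as infinite), matching the 1/(d+1) formula above.

```python
from collections import deque

def qhop_scores(adj_list, q=2):
    """s(v, u) = 1 / (d(v, u) + 1) if d(v, u) <= q, else 0,
    computed by a depth-q truncated BFS from every node."""
    n = len(adj_list)
    scores = [[0.0] * n for _ in range(n)]
    for v in range(n):
        dist = {v: 0}
        frontier = deque([v])
        while frontier:
            u = frontier.popleft()
            if dist[u] == q:          # do not expand beyond q hops
                continue
            for w in adj_list[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    frontier.append(w)
        for u, d in dist.items():
            scores[v][u] = 1.0 / (d + 1)
    return scores
```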


Position-Aware GNN Framework

Step (c): Compute anchor-set messages
§ Position info + feature info → anchor-set messages

[Figure: message computation for node $v_1$: each anchor node $u$ contributes a message $s(v_1, u) \times (h_{v_1}, h_u)$, using the pairwise distances $s(v_i, v_j)$ from step (b); messages are aggregated into one message per anchor-set]
Position-Aware GNN Framework

Step (d): From messages to node embeddings

[Figure: messages from 3 anchor-sets are projected by a shared weight vector $w$ into the output $z_v \in \mathbb{R}^3$]

(1) Position-aware node embedding $z_v$:
- Output of P-GNN
- 3 anchor-sets → 3 dimensions
- Each dimension is tied to an anchor-set, and is thus position-aware
Position-Aware GNN Framework

Step (d): From messages to node embeddings

(2) Node message $h_v \in \mathbb{R}^r$:
- Fed into the next layer of P-GNN
- Aggregates over the 3 anchor-sets, keeping the feature information
- Each dimension is independent from the anchor-set selection
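Putting steps (b)-(d) together, here is one possible instantiation of a P-GNN layer, a sketch under our own assumptions rather than the paper's exact implementation: each message weights the concatenated features [h_v ; h_u] by s(v, u), messages are averaged within each anchor-set, then projected to one output dimension per anchor-set (position-aware z_v) and averaged across anchor-sets (the next-layer message).

```python
import torch
import torch.nn as nn

class PGNNLayer(nn.Module):
    """One P-GNN-style layer (a sketch)."""
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.F = nn.Linear(2 * in_dim, hidden_dim)  # transforms [h_v ; h_u]
        self.w = nn.Linear(hidden_dim, 1)           # shared projection per anchor-set

    def forward(self, h, scores, anchor_sets):
        # h: [n, d] features; scores: [n, n] with s(v, u);
        # anchor_sets: list of lists of node indices.
        n, d = h.size()
        msgs = []
        for S in anchor_sets:
            h_self = h.unsqueeze(1).expand(n, len(S), d)     # h_v, repeated per anchor
            h_anch = h[S].unsqueeze(0).expand(n, len(S), d)  # h_u for each u in S
            pair = torch.cat([h_self, h_anch], dim=-1)       # [n, |S|, 2d]
            weighted = scores[:, S].unsqueeze(-1) * self.F(pair)  # s(v,u) * F([h_v;h_u])
            msgs.append(weighted.mean(dim=1))                # aggregate within S
        M = torch.stack(msgs, dim=1)   # [n, k, hidden], one message per anchor-set
        z = self.w(M).squeeze(-1)      # [n, k] position-aware output, dim i <-> set i
        h_next = M.mean(dim=1)         # [n, hidden] order-invariant next-layer message
        return z, h_next
```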
Tasks and Datasets

§ Link prediction:
  § Grid: a 20 × 20 grid
  § Communities: 20 communities of 20 nodes
  § PPI: 24 graphs of ~3,000 nodes; nodes have features
§ Pairwise node classification:
  § Node label: whether two nodes belong to the same community, or the same class
  § Communities: 20 communities of 20 nodes
  § Emails: 7 graphs with 6 communities; nodes have features
  § Protein: 113 protein graphs; nodes have features
Results: Link Prediction

§ Link prediction: P-GNN gives up to 66% improvement in ROC AUC.
Results: Node Classification

§ Pairwise node classification: P-GNN gives up to 61% improvement in ROC AUC.

PGNN: Visualizing Embeddings

[Figure: input graph, GNN embedding, and P-GNN embedding side by side]


Strategies for Pretraining Graph Neural Networks

Strategies for Pre-training Graph Neural Networks. W. Hu, B. Liu, J. Gomes, M. Zitnik, P. Liang, V. Pande, J. Leskovec. ICLR, 2020.
Graph ML in Scientific Domains

§ Chemistry: molecular graphs
  § Molecular property prediction (our running example): f(molecule) = toxic?

§ Biology: protein-protein interaction networks
  § Protein function prediction: f(protein) = biological activity?


Challenges for ML in Scientific Domains

1. Scarcity of labeled data
§ Obtaining labels requires expensive lab experiments
→ GNNs overfit to small training datasets

2. Out-of-distribution prediction
§ Test examples tend to be very different from training examples
→ GNNs extrapolate poorly


Our Solution: Pre-training GNNs

We design GNN pre-training strategies to systematically investigate:
Q1: How effective is pre-training GNNs?
Q2: What are the most effective strategies?


How Effective is Pre-training GNNs?

Naïve strategy: supervised pre-training on relevant labels (e.g., Toxicity A? 1, Toxicity B? 0, Toxicity C? 0, Toxicity D? 1).

Pre-training labels from a chemical database:
• ~450K molecules from ChEMBL
• 1,310 diverse binary tasks

Downstream task:
• 8 diverse molecular classification datasets
How Effective is Pre-training GNNs?

Naïve strategy: supervised pre-training on relevant labels
→ Limited improvement. Often leads to negative transfer!

[Chart: naïve-strategy ROC-AUC improvement over a non-pre-trained GNN across the 8 downstream datasets; the gains are limited, and on some datasets the improvement is negative (negative transfer).]
What are the Effective Strategies?

Key insight: Pre-train both node and graph embeddings.

[Figure: with node-level pre-training alone, nodes are separated but embeddings are not composable (graphs are not separated); with graph-level pre-training alone, graphs are separated but embeddings of individual nodes are not; with both, there is separation in node space as well as in graph space.]
Possible Pre-training Methods

[Figure: overview of node-level and graph-level pre-training methods]

Attribute Masking

1. Mask node attributes: atom types
2. Let the GNN predict them
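A sketch of one attribute-masking step, assuming a molecule is given as atom-type indices plus an adjacency matrix; gnn, atom_embed, classifier, and the reserved MASK_ID token are our assumptions, not the paper's API.

```python
import random
import torch.nn.functional as F

def attribute_masking_step(gnn, atom_embed, classifier, x_atoms, adj, mask_rate=0.15):
    """Hide some atom types and let the GNN predict them from context."""
    MASK_ID = 0  # assumed reserved index for the mask token
    n = x_atoms.size(0)
    masked = random.sample(range(n), max(1, int(mask_rate * n)))
    x_in = x_atoms.clone()
    x_in[masked] = MASK_ID                 # replace true atom types with the mask token
    h = gnn(atom_embed(x_in), adj)         # node embeddings of the masked molecule
    logits = classifier(h[masked])         # predict the original atom types
    return F.cross_entropy(logits, x_atoms[masked])
```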


Context Prediction

1. Sample a center node for each molecule
2. Sample its neighborhood, and a surrounding context graph
3. Let the GNN distinguish true (neighborhood, context) pairs from false ones
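A heavily simplified sketch of this objective: score true (neighborhood, context) pairs above randomly re-paired negatives. The two encoders gnn_main and gnn_context and the data layout are assumptions; the paper's actual sampling of neighborhoods and context rings is more involved.

```python
import torch
import torch.nn.functional as F

def context_prediction_loss(gnn_main, gnn_context, pairs, negatives):
    """pairs: iterable of (neigh_args, ctx_args); negatives: ctx_args from
    other molecules. Each *_args is a tuple the corresponding GNN accepts."""
    loss = 0.0
    for (neigh, ctx), ctx_neg in zip(pairs, negatives):
        h = gnn_main(*neigh)            # embedding of the center node's neighborhood
        c_pos = gnn_context(*ctx)       # context from the same molecule
        c_neg = gnn_context(*ctx_neg)   # context from a different molecule
        pos = torch.sigmoid((h * c_pos).sum(-1))
        neg = torch.sigmoid((h * c_neg).sum(-1))
        loss = loss + F.binary_cross_entropy(pos, torch.ones_like(pos)) \
                    + F.binary_cross_entropy(neg, torch.zeros_like(neg))
    return loss
```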


Supervised Attribute Prediction

1. Multi-task prediction of many relevant labels: GNN(molecule) → Toxicity A? 1, Toxicity B? 0, Toxicity C? 0, Toxicity D? 1, …


Overall Strategy

1. Node-level pre-training on unlabeled data (2M molecules from ZINC)
2. Graph-level pre-training on labeled data (chemistry database)
3. Fine-tune on downstream data (downstream tasks 1 … N)
Results of Our Strategy

§ Avoids negative transfer
§ Significantly improves performance

[Chart: ROC-AUC improvement over no pre-training across the 8 downstream datasets, our strategy vs. the naïve strategy; our strategy improves on every dataset and avoids negative transfer.]


Comparison of GNNs

Q: What is the effect of pre-training on different GNN architectures?

Table 1: Test ROC-AUC (%) on molecular prediction benchmarks using different pre-training strategies with GIN (graph-level strategy / node-level strategy). The eight columns are the downstream datasets; the rightmost column averages the mean test performance across the 8 datasets. In the original table, the best and comparable results are bolded and shaded cells indicate negative transfer.

Graph-level  Node-level    D1        D2        D3        D4        D5        D6        D7        D8        Avg
–            AttrMasking   64.3 ±2.8 76.7 ±0.4 64.2 ±0.5 61.0 ±0.7 71.8 ±4.1 74.7 ±1.4 77.2 ±1.1 79.3 ±1.6 71.1
–            ContextPred   68.0 ±2.0 75.7 ±0.7 63.9 ±0.6 60.9 ±0.6 65.9 ±3.8 75.8 ±1.7 77.3 ±1.0 79.6 ±1.2 70.9
Supervised   –             68.3 ±0.7 77.0 ±0.3 64.4 ±0.4 62.1 ±0.5 57.2 ±2.5 79.4 ±1.3 74.4 ±1.2 76.9 ±1.0 70.0
Supervised   Infomax       68.0 ±1.8 77.8 ±0.3 64.9 ±0.7 60.9 ±0.6 71.2 ±2.8 81.3 ±1.4 77.8 ±0.9 80.1 ±0.9 72.8
Supervised   EdgePred      66.6 ±2.2 78.3 ±0.3 66.5 ±0.3 63.3 ±0.9 70.9 ±4.6 78.5 ±2.4 77.5 ±0.8 79.1 ±3.7 72.6
Supervised   AttrMasking   66.5 ±2.5 77.9 ±0.4 65.1 ±0.3 63.9 ±0.9 73.7 ±2.8 81.2 ±1.9 77.1 ±1.2 80.3 ±0.9 73.2
Supervised   ContextPred   68.7 ±1.3 78.1 ±0.6 65.7 ±0.6 62.7 ±0.8 72.6 ±1.5 81.3 ±2.1 79.9 ±0.7 84.5 ±0.7 74.2

Notice that node-level as well as graph-level pre-training is essential for good performance.

Table 2: Test ROC-AUC (%) of different GNN architectures with and without pre-training. Pre-training strategy for chemistry data: Context Prediction + graph-level supervised pre-training; for biology data: Attribute Masking + graph-level supervised pre-training.

             Chemistry                            Biology
             Non-pre-trained  Pre-trained  Gain   Non-pre-trained  Pre-trained  Gain
GIN          67.0             74.2         +7.2   64.8 ± 1.0       74.2 ± 1.5   +9.4
GCN          68.9             72.2         +3.4   63.2 ± 1.0       70.9 ± 1.7   +7.7
GraphSAGE    68.3             70.3         +2.0   65.7 ± 1.2       68.5 ± 1.5   +2.8
GAT          66.8             60.3         -6.5   68.2 ± 1.1       67.8 ± 3.6   -0.4

A: The most expressive models (GIN) benefit the most from pre-training. Without pre-training, the less expressive GNNs give slightly better performance than GIN because of their smaller model complexity in a low-data regime; with pre-training, the most expressive GIN is properly regularized and dominates the other architectures.


COVID-19 Challenge

§ Ongoing initiative by MIT
§ Open task: predict antibacterial properties of molecules to find antibiotics for COVID-19
§ Dataset: 2,335 training molecules, binary classification
§ Evaluation: prediction on a hidden test set
https://www.aicures.mit.edu


COVID-19 Challenge

§ Our pre-trained GNN ranked 1st place. The 3rd-place entry also used pre-training.
§ The model will eventually be used for virtual screening.


Open Graph Benchmark

§ An ongoing effort to build large-scale, realistic benchmark datasets for graph ML.

Webpage: https://ogb.stanford.edu/
GitHub: https://github.com/snap-stanford/ogb

Major release and paper coming in two weeks!
ML with Graphs Today

Datasets commonly used today:
§ Node classification
  § CORA: 2,708 nodes, 5,429 edges
  § Citeseer: 3,327 nodes, 4,732 edges
§ Graph classification
  § MUTAG: 188 molecules
§ Knowledge graphs
  § FB15k: 15,000 nodes, ~600K edges
ML with Graphs

To properly track progress and identify issues with current approaches, it is critical for our community to develop diverse, challenging, and realistic benchmark datasets for machine learning on graphs.


Why a New Benchmark?

1) The current focus is on small graphs, or small sets of graphs, from just a handful of domains:
§ Datasets are too small
§ Datasets do not contain rich node or edge features
§ Hard to reliably and rigorously evaluate algorithms

2) Lack of common benchmark datasets for comparing different methods:
§ Every paper designs its own custom train/test split
§ Performance across papers is not comparable

3) Dataset splits follow conventional random splits:
§ Unrealistic for real-world applications
§ Accuracies are over-optimistic under conventional splits
The Open Graph Benchmark

OGB is a set of benchmarks for graph ML:
1. Ready-to-use datasets for key tasks on graphs:
§ Node classification, link prediction, graph classification
2. Common codebase to load, construct & represent graphs:
§ Popular deep frameworks, e.g., DGL, PyTorch Geometric
3. Common codebase with performance metrics for fast model evaluation and comparison:
§ Meaningful train/validation/test splits


OGB Datasets are Diverse

[Figure: gallery of OGB dataset domains]

Overview of Current OGB Datasets

§ Covers diverse ML tasks, domains, and scales.
§ More datasets to come to increase the coverage.

[Table: current OGB datasets by task, domain, and scale]


Meaningful Data Splits

Currently, prediction accuracies on graph benchmarks are saturating: the generalization gap is quite small once we have enough labeled data.

Meaningful data splits focus on generalization:
§ Scaffold split: for molecular graph datasets, OGB:
  § Clusters molecules by scaffold (molecular graph substructure)
  § Gives validation/test sets with structurally different molecules (see the sketch after this list)
§ Species split: for protein interaction datasets, OGB:
  § Uses protein graphs from model species (weed, worm, E. coli, fly, mouse, yeast, zebrafish) as train/validation sets
  § Uses protein graphs from humans as the test set
§ Time split: for the KG completion dataset, OGB:
  § Uses triplets until a certain timestamp as training/validation sets
  § Uses newly added triplets as the test set
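As referenced above, a sketch of a scaffold split using RDKit's Murcko scaffolds. The split fractions and largest-groups-first ordering are our choices, not necessarily OGB's exact procedure.

```python
from collections import defaultdict
from rdkit.Chem.Scaffolds import MurckoScaffold

def scaffold_split(smiles_list, frac_train=0.8, frac_valid=0.1):
    """Group molecules by Murcko scaffold, then assign whole scaffold groups
    to train/valid/test, so test molecules are structurally different."""
    groups = defaultdict(list)
    for i, smi in enumerate(smiles_list):
        scaf = MurckoScaffold.MurckoScaffoldSmiles(smiles=smi, includeChirality=False)
        groups[scaf].append(i)
    # Fill train with the largest scaffold groups first, so valid/test
    # end up dominated by small, rare scaffolds.
    ordered = sorted(groups.values(), key=len, reverse=True)
    n = len(smiles_list)
    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= frac_train * n:
            train.extend(group)
        elif len(valid) + len(group) <= frac_valid * n:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test
```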
Case Study: Products Dataset

Out-of-distribution prediction in the Amazon product graph.
§ Nodes are split according to product sales ranking.

Performance on the OGB split vs. a random split:

Method                   Train Acc (%)   Test Acc (%)
GraphSAGE (OGB split)    92.98 ± 0.16    78.03 ± 0.22
GraphSAGE (rand split)   90.47 ± 0.28    87.74 ± 0.06

§ The non-random split is much more challenging than the random split.

Q: How do we close the generalization gap in out-of-distribution prediction over graphs?
Case Study: Products Dataset

Scalability challenge in the Amazon product graph (2.5M nodes).
§ Full-batch GraphSAGE is not scalable (requires 40GB of GPU memory).
§ Recent scalable GNNs are worse than the full-batch GNN:

Method                          Accuracy (%)
GraphSAGE (full-batch)          78.03 ± 0.22
Cluster-GraphSAGE (KDD 2018)    75.18 ± 0.41
GraphSAINT (ICLR 2020)          77.29 ± 0.19

Q: How do we design scalable GNNs that are as accurate as the full-batch GNN?
Open Graph Benchmark

A resource for graph ML problems:
§ We provide a pip-installable Python OGB package with data loaders and evaluators (see the sketch below).

We envision OGB to be:
§ A common, community-driven platform for graph ML research
§ A teaching resource

https://ogb.stanford.edu/  https://github.com/snap-stanford/ogb
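A minimal usage sketch of the package with its PyTorch Geometric loader; since the major release was still upcoming at the time of this talk, exact dataset names (here "ogbg-molhiv") and API details may differ across OGB versions.

```python
from ogb.graphproppred import PygGraphPropPredDataset, Evaluator

# Load a molecular property prediction dataset together with its
# meaningful (scaffold) train/validation/test split.
dataset = PygGraphPropPredDataset(name="ogbg-molhiv")
split_idx = dataset.get_idx_split()      # {"train": ..., "valid": ..., "test": ...}
train_data = dataset[split_idx["train"]]

# Standardized evaluation: every paper reports the same metric on the same split.
evaluator = Evaluator(name="ogbg-molhiv")
# result = evaluator.eval({"y_true": y_true, "y_pred": y_pred})
```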
Open Graph Benchmark

https://ogb.stanford.edu
ogb@cs.stanford.edu

Core development team: Weihua Hu, B. Liu, J. Dong, M. Fey, M. Zitnik, J. Leskovec

Steering committee: Regina Barzilay, Peter Battaglia, Yoshua Bengio, Michael Bronstein, Stephan Günnemann, Will Hamilton, Tommi Jaakkola, Stefanie Jegelka, Maximilian Nickel, Chris Re, Le Song, Jian Tang, Max Welling, Rich Zemel
Postdoc positions in:
(1) ML on graphs; (2) NLP and knowledge graphs; (3) ML for biomedicine
Apply at http://snap.stanford.edu
Industry Partnerships & Funding

PhD Students: Alexandra Porter, Camilo Ruiz, Claire Donnat, Emma Pierson, Weihua Hu, Jiaxuan You, Bowen Liu, Hongyu Ren, Rex Ying

Post-Doctoral Fellows: Baharan Mirzasoleiman, Marinka Zitnik, Michele Catasta, Pan Li, Shantao Li

Research Staff: Maria Brbic, Adrijan Bradaschia, Rok Sosic

Collaborators:
Dan Jurafsky, Linguistics, Stanford University
David Grusky, Sociology, Stanford University
Stephen Boyd, Electrical Engineering, Stanford University
David Gleich, Computer Science, Purdue University
VS Subrahmanian, Computer Science, University of Maryland
Marinka Zitnik, Medicine, Harvard University
Russ Altman, Medicine, Stanford University
Jochen Profit, Medicine, Stanford University
Eric Horvitz, Microsoft Research
Jon Kleinberg, Computer Science, Cornell University
Sendhil Mullainathan, Economics, Harvard University
Scott Delp, Bioengineering, Stanford University
James Zou, Medicine, Stanford University
