
The Anton 3 ASIC: A Fire-Breathing Monster for Molecular Dynamics Simulations

Hot Chips 33
24 August 2021
The Anton 3 hardware team

Peter J. Adams†, Brannon Batson, Alistair Bell, Jhanvi Bhatt†, J. Adam Butts,
Timothy Correia, Bruce Edwards, Peter Feldmann, Christopher H. Fenton,
Anthony Forte, Joseph Gagliardo, Gennette Gill, Maria Gorlatova†, Brian Greskamp,
J.P. Grossman†, Jeremy Hunt, Bryan L. Jackson, Mollie M. Kirk†, Jeffrey S. Kuskin†,
Roy J. Mader, Richard McGowen†, Adam McLaughlin, Mark A. Moraes,
Mohamed Nasr†, Lawrence J. Nociolo, Lief O'Donnell, Andrew Parker, Jon L. Peticolas,
Terry Quan, T. Carl Schwink†, Keun Sup Shim, Naseer Siddique†, Jochen Spengler†,
Michael Theobald, Brian Towles†, William Vick†, Stanley C. Wang, Michael Wazlowski,
Madeleine J. Weingarten, John M. Williams†, David E. Shaw

† Work conducted while at D. E. Shaw Research; author’s affiliation has subsequently changed.

2
Molecular dynamics (MD) simulation
• Understand biomolecular systems through their motions
• Numerical integration of Newton's laws of motion (see the sketch below)
  – Model atoms as point masses
  – Compute forces on every atom based on current positions
  – Update atom velocities and positions in discrete time steps of a few femtoseconds
• Force computation described by a model: the force field
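As a point of reference for the integration loop above, here is a minimal velocity Verlet sketch in Python/NumPy; the `forces()` callable, the 2 fs default time step, and all names are illustrative assumptions, not Anton's actual integrator.

```python
import numpy as np

def velocity_verlet(positions, velocities, masses, forces, dt=2e-15, steps=1000):
    """Minimal MD loop: advance positions and velocities in discrete time steps.

    positions, velocities: (N, 3) arrays; masses: (N,) array.
    forces: callable returning an (N, 3) force array from the current positions.
    """
    acc = forces(positions) / masses[:, None]
    for _ in range(steps):
        # Move atoms using current velocities and accelerations.
        positions = positions + velocities * dt + 0.5 * acc * dt**2
        # Recompute forces at the new positions (the expensive step Anton accelerates).
        new_acc = forces(positions) / masses[:, None]
        # Update velocities with the average of old and new accelerations.
        velocities = velocities + 0.5 * (acc + new_acc) * dt
        acc = new_acc
    return positions, velocities
```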

3
Biomolecular force fields

F = F_bonded + F_near + F_distant

– Bonded forces
– Near forces: electrostatic and van der Waals interactions within the cutoff radius Rcut
– Distant forces: electrostatic interactions beyond Rcut
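To make the split concrete, below is a toy sketch of the near term only (Coulomb plus Lennard-Jones for pairs inside Rcut); the parameter values, units, and the O(N²) double loop are illustrative assumptions — production codes (and Anton's hardware) use neighbor lists and spatial decomposition, and the distant electrostatic term is computed by a separate long-range method.

```python
import numpy as np

def near_forces(pos, charges, rcut=8.0, eps=0.2, sigma=3.2, coulomb_k=332.06):
    """Toy near-range term: Coulomb + Lennard-Jones forces for pairs within rcut.

    pos: (N, 3) positions in angstroms; charges: (N,) partial charges.
    Constants are illustrative placeholders, not a validated force field.
    """
    forces = np.zeros_like(pos)
    n = len(pos)
    for i in range(n):
        for j in range(i + 1, n):
            dr = pos[i] - pos[j]
            r = np.linalg.norm(dr)
            if r >= rcut:
                continue  # beyond Rcut: left to the distant electrostatic method
            sr6 = (sigma / r) ** 6
            # Magnitude of the pair force (positive = repulsive) along dr.
            fmag = coulomb_k * charges[i] * charges[j] / r**2 \
                 + 24.0 * eps * (2.0 * sr6**2 - sr6) / r
            forces[i] += fmag * dr / r
            forces[j] -= fmag * dr / r
    return forces
```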

4
Meet Anton 3

5
Anton 2: The defending champion
[Anton 2 block diagrams:
– Flex tile: dispatch unit, 256 KB SRAM, network interface, and four geometry cores, each with a 16 KB D$ and a 64 KB I$
– High-Throughput Interaction Subsystem (HTIS): interaction control block (ICB), mini-flex tile (dispatch unit, geometry core, 64 KB SRAM, network interface, 128 output queues, control/commands), 640 KB local memory, and a PPIM array of pairwise point interaction modules (PPIMs 0–37)]
6
How do we make it better?
• Increase computational throughput
  – Pairwise force computation
  – Programmable cores
• Address exposed bottlenecks
  – Bond computation
  – Communication bandwidth
• Improve utility
  – Maximum simulation capacity
  – Programmability
  – Flexibility
• Manage design complexity

[Inset: PPIM and Geometry Core (GC) blocks]
7
Concentrated compute: The Core Tile

[Core tile block diagram: two PPIM / geometry core (GC) / 128 KiB SRAM groups, a router (RTR), and a bond calculator (BOND); callouts ❶–❺ mark the blocks referenced below.]

• Evolutionary changes
  ① Support additional functional forms
  ② Increase memory capacity
  ③ Tune instruction set for MD application
  ③ Increase code density
• Revolutionary changes
  ④ Co-locate compute resources
  ⑤ Specialize bonded force computation
  ① Double effective density of pairwise interaction calculation
  ②④ Implement fine-grained synchronization within memory and network (see the sketch below)
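The fine-grained synchronization item above refers to a hardware mechanism; purely as a loose software analogy (an assumption for illustration, not the actual implementation), the sketch below marks an atom's force accumulation complete once an expected number of partial contributions has arrived, so downstream work can start without a global barrier.

```python
import numpy as np

class SyncAccumulator:
    """Software analogy of fine-grained synchronization on memory locations:
    each atom's force slot is flagged complete once the expected number of
    partial contributions has been accumulated into it."""

    def __init__(self, n_atoms, expected_counts):
        self.force = np.zeros((n_atoms, 3))
        self.remaining = np.array(expected_counts, dtype=int)

    def add_contribution(self, atom, partial_force):
        """Accumulate one partial force; return True when this atom is complete."""
        self.force[atom] += partial_force
        self.remaining[atom] -= 1
        return self.remaining[atom] == 0

# Usage: as soon as add_contribution() reports completion for an atom, its
# integration step can proceed, rather than waiting for all atoms to finish.
```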

8
Bond calculator
Term             Stretch           Angle           Dihedral / Torsion
Atoms            2                 3               4
Coordinate (q)   r (bond length)   θ               φ
Potential (V)    k (r − r0)²       k (θ − θ0)²     Σn≤6 kn cos(nφ − φ0)

[Dataflow through the bond calculator: positions r1 … rN and parameters (k, q0, …) → coordinate q → dV/dq → forces F1 … FN]
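A minimal sketch of the stretch term from the table above, following the same pattern the bond calculator pipelines in hardware (form the coordinate q, evaluate dV/dq, then distribute forces to the participating atoms); the function and its units are illustrative.

```python
import numpy as np

def stretch_term(r1, r2, k, r0):
    """Harmonic bond-stretch term: V = k * (q - r0)^2 with q = |r1 - r2|.

    Returns (V, F1, F2): the potential energy and the forces on the two atoms.
    """
    dr = r1 - r2
    q = np.linalg.norm(dr)        # coordinate for the stretch term (bond length)
    dVdq = 2.0 * k * (q - r0)     # derivative of the potential w.r.t. q
    # Chain rule: F1 = -dV/dr1 = -dVdq * (dr / q), and F2 is equal and opposite.
    f1 = -dVdq * dr / q
    f2 = -f1
    return k * (q - r0) ** 2, f1, f2
```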

9
Near versus far
Volume within radius r: N ∝ r³ (the number of atoms within r grows BIG)
Electrostatic force at distance r: F ∝ 1/r² (the force from each distant atom becomes small)

[Plots: Vol(r) growing rapidly with r; F(r) falling off with r]

Interactions inside Rcut are therefore computed pairwise, while the many weak distant electrostatic interactions are handled by a separate long-range method.
10
Efficient communication: The Edge Tile

[Edge tile block diagram: SERDES, router (RTR), multicast unit (MCAST), PCACHE, and interaction control blocks (ICBs); callouts ❶–❺ mark the blocks referenced below.]

• Evolutionary changes
  ① Increase SERDES data rate
  ② Reduce hop latency
• Revolutionary changes
  ③ Separate edge network
  ④ MD-specific compression (see the sketch below)
  ⑤ Novel interaction method
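The deck does not spell out the compression format; purely as an illustration of the kind of MD-specific compression that helps here (an assumed scheme, not Anton 3's actual wire format), the sketch below packs atom positions as fixed-point offsets from a home-box origin instead of full floating-point coordinates.

```python
import numpy as np

def pack_positions(pos, box_origin, resolution=1e-3, bits=24):
    """Quantize positions (angstroms) to fixed-point offsets from a box origin.

    resolution: angstroms per least-significant bit (assumed value).
    Returns integers that fit in `bits` bits instead of three 32-bit floats.
    """
    offsets = np.round((pos - box_origin) / resolution).astype(np.int64)
    assert np.all((offsets >= 0) & (offsets < (1 << bits))), "position out of range"
    return offsets

def unpack_positions(packed, box_origin, resolution=1e-3):
    """Recover approximate positions from the fixed-point offsets."""
    return packed * resolution + box_origin
```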

11
Laying tiles
[Die layout: edge tiles and core tiles]
12
Physical design
• Channel-less, abutted layout
• Few unique blocks
• Global, low-skew clock mesh
• Engineered global routing
• Column-level redundancy
• Robust power delivery

13
The evolution of Anton

                    Anton 1            Anton 2     Anton 3
Tape-out            2007               2012        2020
CPU cores           8+4+1              66          528*
PPIMs               32                 76          528*
Flex SRAM           0.125 MiB          4 MiB       66 MiB*
Atoms / node        460                8,000       110,000*
Clock frequency     0.485/0.970 GHz    1.65 GHz    2.8+ GHz
Channel bandwidth   0.607 Tbps         2.7 Tbps    5.6+ Tbps
Process node        90 nm              40 nm       7 nm
Transistors         0.2 G              2.0 G       31.8 G
Die size            299 mm²            410 mm²     451 mm²
Power               30 W               190 W       360 W
* 22/24 columns
14
Baby pictures

29 September 2020: chips arrive


MD running (water) < 9 h later

30 September 2020: 1st protein run
31 October 2020: Multi-node


Faster @ 250 MHz than Anton 2
15
Node board
48 VDC, torus links, Ethernet, USB

500+ W custom voltage regulator

Node control complex

ASIC interface FPGA


Data processor (64-bit Linux)

Control processor (no OS)


16
Scale up

8×8 nodes
2×64 nodes
512 nodes

17
Network

Complete 512-node, 3D torus
– X-dimension connected entirely in backplane
– Y-dimension split across cables (green) and backplane
– Z-dimension cabled (blue and yellow)
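For reference, neighbor addressing in a 3D torus wraps around in each dimension; the 8×8×8 arrangement of the 512 nodes below is an assumption for illustration, and the function itself is just a sketch.

```python
def torus_neighbors(x, y, z, dims=(8, 8, 8)):
    """Return the six neighbors of node (x, y, z) in a 3D torus with wraparound."""
    dx, dy, dz = dims
    return [
        ((x + 1) % dx, y, z), ((x - 1) % dx, y, z),
        (x, (y + 1) % dy, z), (x, (y - 1) % dy, z),
        (x, y, (z + 1) % dz), (x, y, (z - 1) % dz),
    ]

# Example: in an 8x8x8 (512-node) torus, node (0, 0, 0) wraps to (7, 0, 0), etc.
print(torus_neighbors(0, 0, 0))
```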

18
Taming (cooling) the beast

TJ < 65 °C @ 500 W
19
MD performance

[Plot: performance (simulated μs/day, log scale) vs. simulation size (atoms); Anton 3 is >100× faster than the compared systems]

20
Acknowledgements
• System software group for machine bring-up
• Embedded software group for creating and tuning the application
• Ken Mackenzie for performance results and figures
• Systems group for support and infrastructure
• And lots of photos!
• Chemistry team for putting Anton to good use
• Kevin Yuh for MD simulation videos

21
Performance references
• GPU performance results
– M. Bergdorf et al., “Desmond/GPU Performance as of April 2021,” DESRES/TR--2021-01, [Online: April 2021].
https://deshawresearch.com/publications.html.
– "A100, V100 and Multi-GPU Benchmarks", [Online: January 2020]. https://github.com/openmm/openmm/issues/2971.
– "NVIDIA HPC Application Performance", [Online: July 2021]. https://developer.nvidia.com/hpc-application-performance.
– "Gromacs/NAMD Multi-GPU Scaling", unpublished internal benchmarking.
• Supercomputer performance results
– “RIKEN Team Use Supercomputer to Explore the Molecular Dynamics of the New Coronavirus,” HPCwire announcement, [Online: March 2020]. https://www.hpcwire.com/off-the-wire/riken-team-use-supercomputer-to-explore-the-molecular-dynamics-of-the-new-coronavirus/
– S. Páll et al., “Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS,” in: S. Markidis, E. Laure (eds) Solving
Software Challenges for Exascale. EASC 2014. Lecture Notes in Computer Science, vol. 8759, 2015.
– NAMD scaling on Summit, [Online: May 2018]. http://www.ks.uiuc.edu/Research/namd/benchmarks/
– L. Casalino et al., “AI-Driven Multiscale Simulations Illuminate Mechanisms of SARS-CoV-2 Spike Dynamics,” International Journal of High
Performance Computing Applications, 2021.
– J. R. Perilla and K. Schulten, “Physical Properties of the HIV-1 Capsid from All-Atom Molecular Dynamics Simulations,” Nature Communications,
vol. 8 (15959), 2017.
• Anton performance results (original publications; improved performance used in comparisons)
– D. E. Shaw et al., “Millisecond-Scale Molecular Dynamics Simulations on Anton,” in SC’09: Proceedings of the Conference on High Performance
Computing Networking, Storage, and Analysis, 2009, 1–11.
– D. E. Shaw et al., “Anton 2: Raising the Bar for Performance and Programmability in a Special-Purpose Molecular Dynamics Supercomputer,” in
SC’14: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, 2014, 41–53.
– D. E. Shaw et al., “Anton 3: Twenty Microseconds of Molecular Dynamics Simulation Before Lunch,” to appear in SC’21: Proceedings of the
International Conference for High Performance Computing, Networking, Storage, and Analysis, 2021.

22
