UALink-1.0-Specification-Webinar_FINAL


UALink 200G 1.0 Specification Overview

4/22/2025 Ultra Accelerator Link 2025


Presenters
• Peter Onufryk, UALink Consortium President, Intel
• Nathan Kalyanasundharam, UALink Consortium Technical Task Force Co-Chair, AMD
• Chris Petersen, UALink Consortium Director, Astera Labs



Introduction

Peter Onufryk, Intel
UALink Consortium President



Advancing AI Across Data Centers
AI models continue to grow, requiring more compute and memory to efficiently execute training and inference on large models

The industry needs an open solution that enables efficient distribution of models across many accelerators within a pod

Large inference models will require scale-up of tens to hundreds of accelerators in pods

Large training models will require scale-up and scale-out across hundreds to tens of thousands of accelerators by connecting multiple pods

Board of Directors

Contributor Members

90+ Members
Ultra Accelerator Link Timeline
• May 2024: Promoter Group press release
• October 2024: UALink membership forms posted to website
• April 2025: UALink 200G 1.0 Specification



UALink Creates the Scale-up Pod
• High performance
  • Up to 800 Gbps per port, scalable ports per accelerator, up to 1,024 accelerators
• Low latency
  • Optimized protocol, transaction, link & physical layers
• Low power
  • The simplified UALink stack leads to lower power solutions
• Low die area
  • Optimized data layer and transaction layer save significant die area

[Figure: scale-up pods joined by Ethernet scale-out. 1 rack: UALink; 2 racks: UALink; 3-4 racks: UALink or UEC/Ethernet; more than 4 racks: UEC/Ethernet]

The UALink 1.0 focus is to deliver optimized scale-up solutions with single-tier switching.



UALink 200G 1.0 Specification
• The UALink interconnect enables accelerator-to-accelerator communication
• The initial focus is sharing memory among accelerators
  • Direct load, store, and atomic operations between accelerators (e.g., GPUs)
  • Low-latency, high-bandwidth fabric for hundreds of accelerators in a pod (up to 1K)
  • Simple load/store/atomic semantics with software coherency
• The initial UALink specification taps into the Promoters' experience developing and deploying a broad range of accelerators and is seeded with the proven Infinity Fabric protocol



UALink 200G 1.0 Benefits
• Performance, Power & Efficiency
  • Low-latency, high-bandwidth interconnect for hundreds of accelerators in a pod
  • Features the same raw speed as Ethernet with the latency of PCIe® switches
  • Enables a highly efficient switch design that reduces power and complexity with small packets, fixed FLIT sizes, ID-based routing, and overall simplicity
  • Significantly smaller die area for the link stack, lowering power and acquisition costs
  • Increased bandwidth efficiency further enables lower TCO
• Open and Standardized
  • UALink harnesses the innovation of member companies to drive leading-edge features into the specification and interoperable products to the market
• Leverages ubiquitous Ethernet infrastructure
  • Cables, connectors, retimers, management software, and more



Technical Overview
Nathan Kalyanasundharam, AMD
UALink Technical Task Force Co-Chair



UALink Stack Features & Goals
• Standard Ethernet Physical
• UALink DL
• UALink TL
• UALink Protocol



UALink Protocol Interface (UPLI)
• Simple symmetric interface protocol
  • Request
  • Request Data
  • Read Response + Data
  • Write Response
• Originator interface sends requests to other accelerators and receives responses
• Completer interface receives requests from other accelerators and returns responses
• Src/Dst Identifier (ID) based routing
• Provisioned to enable multiple address spaces
• Same-address ordering for Requests; Completions unordered

[Figure label: 1x4b, 2 x 2b OR 4x1b]
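The ID-based routing bullet can be illustrated with a minimal sketch. The `Message` fields, the `route` and `make_response` helpers, and the routing-table contents are hypothetical names chosen for illustration; they are not taken from the UALink specification. The sketch only shows the general idea that a forwarding decision can be made from the destination ID alone, and that a response swaps source and destination.

```python
from dataclasses import dataclass

# The four UPLI channels named on the slide.
CHANNELS = ("Request", "Request Data", "Read Response + Data", "Write Response")


@dataclass(frozen=True)
class Message:
    """Hypothetical message record; field names are assumptions."""
    channel: str
    src_id: int  # ID of the sending accelerator
    dst_id: int  # ID of the receiving accelerator


def route(msg: Message, port_for_id: dict[int, int]) -> int:
    """Pick an egress port using only the destination ID."""
    if msg.channel not in CHANNELS:
        raise ValueError(f"unknown channel: {msg.channel}")
    return port_for_id[msg.dst_id]


def make_response(request: Message, channel: str) -> Message:
    """Build a response that travels back by swapping source and destination."""
    return Message(channel=channel, src_id=request.dst_id, dst_id=request.src_id)


# Hypothetical table: accelerator ID -> switch port.
table = {0: 0, 1: 1, 2: 2}
req = Message("Request", src_id=0, dst_id=2)
resp = make_response(req, "Read Response + Data")
print(route(req, table), route(resp, table))
```

A lookup keyed only on the destination ID needs no address decoding in the switch, which fits the simplicity goal stated elsewhere in the deck.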



Transaction Layer (TL)
• TL Flit organized as sixteen 4-byte Sectors
• TL Flit is also divided into Upper and Lower 32-byte Half Flits
• Control half-flit is used for
  • Requests, read responses, write responses, flow control and NOP indication
• Data uses half & full Flits
  • Read response data, Write data and byte mask, Atomic operand data and byte mask
• Requests & responses may be compressed
  • Uncompressed Requests = 16B
  • Compressed Requests = 8B
  • Uncompressed Responses = 8B
  • Compressed Responses = 4B

[Figure: TL Flit packing examples labeled Eff. 95.2% and Eff. 92.3%. Note: For illustration]
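The sizes on this slide can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated above (sixteen 4-byte sectors, 32-byte half flits, and the four message sizes). The function names are illustrative, and the per-half-flit counts are an upper bound that assumes the whole control half-flit is available for messages, which the slide does not state.

```python
SECTOR_BYTES = 4
SECTORS_PER_FLIT = 16
FLIT_BYTES = SECTOR_BYTES * SECTORS_PER_FLIT  # 64 bytes
HALF_FLIT_BYTES = FLIT_BYTES // 2             # 32 bytes

# Message sizes in bytes, as listed on the slide.
MESSAGE_BYTES = {
    "uncompressed_request": 16,
    "compressed_request": 8,
    "uncompressed_response": 8,
    "compressed_response": 4,
}


def sectors_needed(kind: str) -> int:
    """Number of 4-byte sectors one message of this kind occupies."""
    return MESSAGE_BYTES[kind] // SECTOR_BYTES


def max_per_half_flit(kind: str) -> int:
    """Upper bound on messages per 32-byte half flit (assumes no other fields)."""
    return HALF_FLIT_BYTES // MESSAGE_BYTES[kind]


for kind in MESSAGE_BYTES:
    print(kind, sectors_needed(kind), max_per_half_flit(kind))
```

Under that assumption, compression doubles the number of requests or responses that fit in the same half flit.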



Data Link Layer (DL) – 640B
• 640-byte DL FLIT
  • Flit Header = 3 Bytes
  • Segment Hdr = 5 Bytes
  • CRC = 4 Bytes
  • Efficiency = (640 - 12)/640 = 628/640 = 98.125%
• FEC Code Word = 680 Bytes
  • Higher signaling rate (212.5 Gb/s) to cover the FEC overhead

Simplified view for illustration.
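The two figures on this slide follow from the byte counts. The sketch below recomputes them with exact fractions. It assumes that the nominal data rate is 200 Gb/s (taken from the "200G" in the specification name) and that the signaling rate scales by the ratio of FEC code word size to DL FLIT size; both are readings of the slide, not statements from the specification.

```python
from fractions import Fraction

DL_FLIT_BYTES = 640
FLIT_HEADER_BYTES = 3
SEGMENT_HEADER_BYTES = 5
CRC_BYTES = 4
FEC_CODE_WORD_BYTES = 680
NOMINAL_RATE_GBPS = 200  # assumption: the "200G" in the specification name


def dl_efficiency() -> Fraction:
    """Fraction of the DL FLIT left after header, segment header, and CRC."""
    overhead = FLIT_HEADER_BYTES + SEGMENT_HEADER_BYTES + CRC_BYTES
    return Fraction(DL_FLIT_BYTES - overhead, DL_FLIT_BYTES)


def signaling_rate_gbps() -> Fraction:
    """Rate needed so 640 payload bytes still move at the nominal rate inside 680."""
    return Fraction(FEC_CODE_WORD_BYTES, DL_FLIT_BYTES) * NOMINAL_RATE_GBPS


print(float(dl_efficiency()) * 100)  # percent
print(float(signaling_rate_gbps()))  # Gb/s
```

The 680/640 ratio is 1.0625, so the FEC adds 6.25% on the wire, which accounts for the 212.5 figure.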



Scale-up POD
• Single-tier switches
  • Number of switch planes scales with bandwidth per accelerator
  • Number of accelerators per POD is limited by lanes per switch
• POD may be configured as many Virtual PODs
  • Reconfiguring one Virtual POD does not impact the others
  • An error in one Virtual POD does not impact another
  • Error recovery is expected to be contained to a Virtual POD through Port or Station Reset
• Internal switch errors may impact the entire POD and require an application restart
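One plausible reading of the two sizing bullets is sketched below. It assumes every accelerator connects one link to each switch plane, so the lane count of a switch caps the number of accelerators and the lane count of an accelerator sets the number of planes. The function name and all numeric values are hypothetical and are not taken from the specification.

```python
def pod_dimensions(switch_lanes: int, accelerator_lanes: int,
                   lanes_per_link: int) -> tuple[int, int]:
    """Return (max accelerators per pod, number of switch planes).

    Assumes a single switching tier in which each accelerator has exactly
    one link to every switch plane.
    """
    if min(switch_lanes, accelerator_lanes, lanes_per_link) <= 0:
        raise ValueError("lane counts must be positive")
    max_accelerators = switch_lanes // lanes_per_link
    switch_planes = accelerator_lanes // lanes_per_link
    return max_accelerators, switch_planes


# Hypothetical example: 512-lane switches, 16 lanes per accelerator, x4 links.
print(pod_dimensions(switch_lanes=512, accelerator_lanes=16, lanes_per_link=4))
```

In this model, adding lanes to an accelerator adds switch planes and bandwidth but does not raise the accelerator count, which only a larger switch can do.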



Data Flow
• Accelerators finely interleave (256B) memory channels
  • Maximizes bandwidth to local and peer GPU memory
• Load/store/atomic memory accesses use small packets
• Application may communicate with multiple peers simultaneously
• TL packs requests and responses into the same FLIT
  • Requests and responses to many destinations may be packed together
  • Reduces latency and area
• TL is a lightweight implementation consuming ~0.3 mm² in N3 technology
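Fine-grained interleaving at 256B can be sketched as a simple modulo mapping from address to memory channel. This is a minimal illustration under stated assumptions: the channel count is hypothetical, and real accelerators may use a hashed rather than a plain modulo mapping.

```python
INTERLEAVE_BYTES = 256  # granularity stated on the slide


def channel_for_address(address: int, num_channels: int) -> int:
    """Map a byte address to a memory channel with plain modulo interleaving."""
    if address < 0 or num_channels <= 0:
        raise ValueError("address must be >= 0 and num_channels > 0")
    return (address // INTERLEAVE_BYTES) % num_channels


def channels_touched(start: int, length: int, num_channels: int) -> set[int]:
    """Channels covered by a contiguous access of `length` bytes."""
    first_block = start // INTERLEAVE_BYTES
    last_block = (start + length - 1) // INTERLEAVE_BYTES
    return {block % num_channels for block in range(first_block, last_block + 1)}


# Hypothetical 8-channel memory: a 1 KiB access spreads over four channels.
print(sorted(channels_touched(start=0, length=1024, num_channels=8)))
```

Because consecutive 256B blocks land on different channels, even a modest contiguous transfer is spread across several channels, which is consistent with the small-packet traffic described above.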



Systems Specifications
Conclusion
Chris Petersen, Astera Labs
UALink Board Director



Switch & Cluster Management
• Flexible management models for switches
  • Ethernet-like appliance model
  • Lightweight PCIe-like switch model
• Common work-flows/APIs
• Leverage industry specifications
  • OCP, CPER, etc.
  • For Telemetry, Accelerator management, RAS, etc.

[Figure: UALink Pod management. A UALink Pod Controller (REST API, GUI; Initialization & Setup, Fault Mgmt & RAS, Workload Mgmt, vPod Mgmt, Telemetry & Reports) connects through the Switch Management Interface to UALink Switch Platforms 1-3 (each with Switch Mgmt Agent, Processor, NIC, BMC, TPM, UALink Switch) and through the System Node Management Interface and Out-of-Band Management Interface to UALink System Nodes 1-3 (each with Accelerators, CPUs, Pod Mgmt Agent, NIC, BMC)]



Management Layer

Example for illustration


In Progress
• 128G DL/PL Specification. Expected release: July 2025
• In-Network Collectives (INC) Specification. Expected release: Dec 2025
• 128G & 200G UCIe PHY Chiplet Specification. Under investigation

[Figure: UCIe PHY chiplet block diagram. Labels: UCIe PHY with mainband (64b @ 16G) and sideband (1b @ 0.8G), 2.0 Tb; Die-to-Die Adapter; UPLI packing (UCIe); PHYLET with UAL TL, U128/U200 DL, PL (UAL_PCS; Layer 1: PCIe G7, Ethernet), x4 PHY (212.5G; 212/106/128); 1.6 Tb (8 lanes; 2 × 800 Gb)]



Summary
• UALink addresses industry demand for a scale-up fabric empowering efficient, scalable AI applications
  • Facilitates direct load/store for AI accelerators
  • Open industry standard enables advanced models across multiple AI accelerators
  • Advances large AI model training & inference
• UALink enables an efficient, low-latency, high-bandwidth interconnect across hundreds of accelerators within a few racks
• The UALink 200G 1.0 Specification is available for download at: www.ualinkconsortium.org

Thank you!!



Q&A



THANK YOU
