PULP PLATFORM

Open Source Hardware, the way it should be!

Working with RISC-V


Part 4 of 4 : PULP based chips

Frank K. Gürkaynak <kgf@ee.ethz.ch>


Luca Benini <lbenini@iis.ee.ethz.ch>

http://pulp-platform.org @pulp_platform https://www.youtube.com/pulp_platform



Summary
Part 1 Introduction to RISC-V ISA
Part 2 Advanced RISC-V Architectures
Part 3 PULP concepts
Part 4 PULP based chips
  From concept to reality
  Single core microcontrollers: PULPino to PULPissimo
  Many core systems: OpenPULP
  Advanced systems with accelerators
  Lessons learned, the good, the bad and the ugly.
ACACES 2020 - July 2020

We will discuss chips we have made with PULP


Why make chips at all?
  MPW: Only limited samples
  Use cases
Single core PULP chips
  PULPino (Imperio)
  PULPissimo (Arnold)
Many core PULP chips
  Cluster only (Honey Bunny)
  PULPopen (Mr. Wolf)
Advanced PULP chips
  Kosmodrom: 2x 64b Ariane cores + ML accelerators
  Making use of technology: Body biasing
Lessons learned
  There are many pitfalls
  We had great success, but..
  .. sometimes you have embarrassing failures. Part of the process


Multi Project Wafer, chips for prototyping


Cost sharing method for ICs
Multiple ICs are manufactured
together. They share the mask costs
1.5M cost / 10 projects = 150k per project
But you only get 1 / 10 of the area
Dedicated MPW services available
Europractice-IC for SMEs and academia

You only get a few chips
Usually 50 to 200
Per chip costs very high (few kUSD)

Image taken from https://europractice-ic.com/mpw-prototyping/general/mpw-minisic/

All our chips through MPWs
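
The cost-sharing arithmetic on this slide can be sketched in a few lines. This is only an illustration: the function names are made up here, the even split across projects is a simplification, and the per-chip figure ignores packaging and testing.

```python
def mpw_share(mask_cost, num_projects):
    """Split the mask cost evenly across the projects sharing one run."""
    cost_per_project = mask_cost / num_projects
    area_fraction = 1 / num_projects  # each project also gets only this share of the area
    return cost_per_project, area_fraction


def cost_per_chip(project_cost, num_chips):
    """Cost of one delivered sample, ignoring packaging and test."""
    return project_cost / num_chips


# Numbers from the slide: 1.5M mask cost, 10 projects, 50 to 200 samples.
cost, area = mpw_share(1_500_000, 10)
print(f"per project: {cost:.0f}, area share: {area:.0%}")
print(f"per chip with 50 samples:  {cost_per_chip(cost, 50):.0f}")
print(f"per chip with 200 samples: {cost_per_chip(cost, 200):.0f}")
```

With only 50 samples, each chip carries a few thousand units of the project cost, which matches the "few kUSD" remark.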



Our ASICs have different use cases


Chips characterized on an IC tester (Poseidon 22nm)
Research demonstrators (Nano drone with Mr. Wolf/GAP8)
Industrial uses of our cores/peripherals (open-isa.org Vega board)


Most of what we show is openly available


All our development is on GitHub
HDL source code, testbenches, software development kit, virtual platform
https://github.com/pulp-platform
PULP is released under the permissive Solderpad license
Allows anyone to use, change, and make products without restrictions.


PULP has released a large number of IPs

RISC-V Cores: RI5CY (32b), Ibex (32b), Snitch (32b), Ariane + Ara (64b)
Peripherals: JTAG, SPI, UART, I2S, DMA, GPIO
Interconnect: Logarithmic interconnect, APB Peripheral Bus, AXI4 Interconnect
Platforms (IoT to HPC):
  Single Core: PULPino, PULPissimo
  Multi-core: Fulmine, Mr. Wolf
  Multi-cluster: Hero, Open Piton
Accelerators: HWCE (convolution), Neurostream (ML), HWCrypt (crypto), PULPO (1st ord. opt)

PULPino: Our first open source release


Simple design
Meant as a quick release

Separate data and inst. mem
Makes it easy in HW
Not meant as a Harvard arch.

Can use all our 32bit cores
RI5CY (CV32E40P), Zero/Micro-Riscy (Ibex)

Peripherals from other projects
Any AXI and APB peripherals could be used

[Block diagram: PULPino. RISC-V core with I$, Data Mem, Inst Mem, Boot ROM, AXI interconnect, Bus Adapt, APB interconnect, and peripherals SPI S, SPI M, UART, I2C, GPIO]

Imperio 65nm RISC-V core


Chip implemented in 65nm
Using RI5CY (RV32IMC) core
64 kBytes of memory
Basic peripherals (GPIO, SPI, I2C)
Working debug interface

Functional up to 500 MHz


Main challenge was to find fast memory cuts to work at that speed.
Memory made of multiple smaller cuts to maximize the operating speed.

Working chip on an Arduino compatible board


#5 - Arnold (2018) Fastest collaboration


GF22nm
RISC-V microcontroller with eFPGA
Based around PULPissimo

Collaboration with Quicklogic


Met at GTC 2017 by coincidence
In one year chip was taped out
Only possible because of open source nature

Quicklogic is going open source
In June 2020 they announced Quicklogic Open Reconfigurable Computing (QORC)
https://www.quicklogic.com/QORC/

Davide Schiavone, Davide Rossi, Alfio Di Mauro, Frank Gurkaynak, Timothy Saxe, Mao Wang, Ket Chong Yap, Luca Benini, "Arnold: an eFPGA-Augmented RISC-V SoC for Flexible and Low-Power IoT End-Nodes", arXiv: 2006.14256

PULPissimo: very good platform for extensions


eFPGA added as accel.
Easy plug and play
Configuration over APB
Additional ALU and memory
Uses the same memory intfs

Multiple operation modes
Configurable peripheral
Accelerator for core
Accelerator for independent I/O

[Block diagram: PULPissimo. RI5CY core (instr, data, MAC, Ibuf / I$), memory banks on the Tightly Coupled Data Memory Interconnect, TCDM adapter, uDMA with I/O (JTAG, UART, SPI, I2S, I2C, CPI, GPIO), eFPGA, Event Unit, APB / Peripheral Interconnect, Clock / Reset Generator, FLLs, Timer, Power Controller, Debug Unit, GPIO Config Control, Always-On]

Experimental platform with many configurations


I/O subsystem accel
6.0mW, 2.5x

Custom I/O interface
BNN interface 12.5mW, 2.2x

CPU accelerator
CRC 7.5mW, 42x

Many more ideas


Dynamic reconfiguration

Arnold test board with D. Schiavone


Many cores running at low VDD is more efficient


Instead of using a single fast core

[Diagram: a single RISC-V core with its I$ and Mem]

Let us have a cluster of cores

[Diagram: a CLUSTER of four RISC-V cores, each with its own I$ and Mem]

Many cores connected to many memory banks


[Diagram: a CLUSTER of four RISC-V cores (each with I$) connected through an interconnect to eight Mem banks forming the Tightly Coupled Data Memory]
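
One way to see why several banks help is a toy model of bank selection. The sketch below assumes word-level interleaving of addresses across banks, which the slide does not state; `WORD_BYTES`, `NUM_BANKS`, `bank_of`, and `conflicts` are illustrative names, not part of any PULP code base.

```python
WORD_BYTES = 4  # assumption: 32-bit words
NUM_BANKS = 8   # assumption: eight banks, as drawn in the diagram


def bank_of(addr):
    """Bank holding this byte address under word-level interleaving."""
    return (addr // WORD_BYTES) % NUM_BANKS


def conflicts(addresses):
    """Count requests in one cycle that hit a bank already claimed by another request."""
    claimed = set()
    stalled = 0
    for addr in addresses:
        bank = bank_of(addr)
        if bank in claimed:
            stalled += 1
        else:
            claimed.add(bank)
    return stalled


# Four cores reading consecutive words land in four different banks.
consecutive = [0x00, 0x04, 0x08, 0x0C]
# Four cores reading with a stride of NUM_BANKS words all land in bank 0.
strided = [i * WORD_BYTES * NUM_BANKS for i in range(4)]

print(conflicts(consecutive))
print(conflicts(strided))
```

In this model, neighbouring words are served in parallel, while accesses that are a multiple of the bank count apart serialize on a single bank.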

DMA copies data from an external memory


[Diagram: the CLUSTER now also contains a DMA and an Event Unit next to the four RISC-V cores (each with I$); eight Tightly Coupled Data Memory banks sit behind one interconnect, and a second interconnect links the cluster to an external L2 Mem]

Add a SoC part that includes memory and I/O


[Diagram: the same CLUSTER (four RISC-V cores with I$, Event Unit, DMA, Tightly Coupled Data Memory banks) next to a SoC part that holds the L2 Mem and I/O]

Honey Bunny GF28 SLP


Our first RISC-V many-core chip
Four RI5CY cores (RV32IMC) in one cluster
64 kBytes of TCDM memory inside cluster
256 kBytes of L2 memory
Runs at 400MHz+

New technology for us
Needed to port the clock generator (FLL)
Design has analog parts
Can not be made open source directly

Major effort needed for every new technology

[Block diagram: CLUSTER with four RISC-V cores, I$, Mem, interconnect and DMA; outside the cluster a second interconnect, four L2 Mem blocks and the FLL]
Size and number of blocks in the drawing are indicative and not to scale

Visiting card with 4x RISC-V cores in 28nm

See a video of how the board is assembled at:
https://www.youtube.com/watch?v=OEgPXQMRyyc

Mr. Wolf (TSMC 40): 8+1 core IoT Processor


One cluster with
8 RISC-V cores
2x shared FPU units
64 kByte of TCDM

One controller with
512 kByte L2 RAM
Peripherals

On chip voltage regulators
By Dolphin Integration

[Block diagram: Controller (R5 core, memory banks, Power Control, peripheral interconnect with UART, SPI, I2S, I2C, SDIO, CPI) and Cluster (eight R5 cores, two FPUs, memory banks, cluster interconnect); on chip DC/DC converters, IP by Dolphin]

Antonio Pullini, Davide Rossi, Igor Loi, Alfio Di Mauro, Luca Benini, "Mr.Wolf: A 1 GFLOP/s Energy-Proportional Parallel Ultra Low Power SoC for IoT Edge Processing", In Proc. European Solid State Circuits Conference (ESSCIRC) 2018, 3-6 Sep 2018, Dresden, DOI: 10.1109/ESSCIRC.2018.8494247

On-chip regulators allow different power modes


Power Mode                  VDD          Frequency Range      Power
Deep Sleep                  0.8 V        n.A.                 72 µW

It is possible to keep memory state intact


Power Mode                  VDD          Frequency Range      Power
Deep Sleep                  0.8 V        n.A.                 72 µW
State Retentive Deep Sleep  0.8 V        n.A.                 77 - 108 µW

SoC is awake but is clock gated


Power Mode                  VDD          Frequency Range      Power
Deep Sleep                  0.8 V        n.A.                 72 µW
State Retentive Deep Sleep  0.8 V        n.A.                 77 - 108 µW
SoC Idle                    0.8 - 1.1 V  SoC clock gated      0.55 - 1.96 mW

Only SoC with a single RISC-V core running


Power Mode                  VDD          Frequency Range      Power
Deep Sleep                  0.8 V        n.A.                 72 µW
State Retentive Deep Sleep  0.8 V        n.A.                 77 - 108 µW
SoC Idle                    0.8 - 1.1 V  SoC clock gated      0.55 - 1.96 mW
SoC Active                  0.8 - 1.1 V  32 kHz - 450 MHz     0.97 - 38 mW

Cluster is active, but clock gated


Power Mode                  VDD          Frequency Range      Power
Deep Sleep                  0.8 V        n.A.                 72 µW
State Retentive Deep Sleep  0.8 V        n.A.                 77 - 108 µW
SoC Idle                    0.8 - 1.1 V  SoC clock gated      0.55 - 1.96 mW
SoC Active                  0.8 - 1.1 V  32 kHz - 450 MHz     0.97 - 38 mW
Cluster Idle                0.8 - 1.1 V  Cluster clock gated  1.2 - 4.6 mW

Cluster with 8 RISC-V cores is active


Power Mode                  VDD          Frequency Range      Power
Deep Sleep                  0.8 V        n.A.                 72 µW
State Retentive Deep Sleep  0.8 V        n.A.                 77 - 108 µW
SoC Idle                    0.8 - 1.1 V  SoC clock gated      0.55 - 1.96 mW
SoC Active                  0.8 - 1.1 V  32 kHz - 450 MHz     0.97 - 38 mW
Cluster Idle                0.8 - 1.1 V  Cluster clock gated  1.2 - 4.6 mW
Cluster Active              0.8 - 1.1 V  32 kHz - 350 MHz     1.6 - 153 mW
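
A rough feel for what these modes buy can be had by duty cycling between two of them. The sketch below takes the upper-bound figures for Cluster Active and SoC Idle from the table and assumes, as a simplification, that each figure is the total chip power in that mode and that mode switching is free. The function name and the 10% duty cycle are illustrative choices.

```python
# Upper-bound power figures in mW, read from the table above.
SOC_IDLE_MW = 1.96
CLUSTER_ACTIVE_MW = 153.0


def average_power_mw(active_mw, idle_mw, duty_cycle):
    """Time-weighted average power for a given fraction of time spent active."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * active_mw + (1.0 - duty_cycle) * idle_mw


avg = average_power_mw(CLUSTER_ACTIVE_MW, SOC_IDLE_MW, 0.10)
print(f"average power at 10% duty cycle: {avg:.2f} mW")
```

The lower the idle-mode power, the closer the average tracks the useful work actually done, which is the point of having several power modes.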

Our OpenPULP release is essentially Mr. Wolf


OpenPULP contains most of what we have as open source
This is a complex IoT processor, not like the much simpler PULPino
8 + 1 cores, FPUs, shared accelerators, multiple power down modes.

Many parts still cannot be open source


Technology specific information, P&R scripts
Memory macros, selected cuts, their performance
I/O cells
FLL, analog macros, I/O cells, memory cuts (affects performance), P&R scripts

OpenPULP facilitated interesting industry collaboration


Greenwaves, BitCraze, Dolphin

Mr. Wolf has been used in multiple systems


Designed as an application processor
We still build boards with it
Despite only 200 manufactured

Widespread industrial use:

Dolphin IP was validated on this chip
Greenwaves GAP8 is based on the open source release OpenPULP
BitCraze AI Deck is related
GAP9 (Vega) is a follow up project

Complete Application: DroNET on NanoDrone


Pluggable PCB: PULP-Shield
~5g, 30×28mm
GAP8 SoC
8 MB HDRAM
16 MB HFlash
QVGA ULP HiMax camera

Crazyflie 2.0 nano-drone (27g)

Only onboard computation for autonomous flight + obstacle avoidance
no human operator, no ad-hoc external signals, and no remote base-station!

Moving to more advanced nodes: Kosmodrom


Globalfoundries 22FDX
In 2018, most advanced node for us
Minimum size 3mm x 3mm
That fits about 100 million transistors
Allows body biasing

Designs in 22FDX are more involved


More blocks, more functionality
More things that can go wrong
Challenging design
Collaboration with Globalfoundries

Kosmodrom: Main components


2x Ariane 64b RISC-V cores
AHP optimized for high speed
ALP optimized for low power

Automatic Body Bias Gen.


IP by INVECAS
Allows body bias to be tuned

NTX: Neural Training Accelerator


260 Gflops/Watt efficiency

Common infrastructure
SRAM, Debug, I/Os

Fine-Grained Shared-Memory Accelerators

Similar concept as OpenPULP, but fewer RISC-V cores and more accelerators

NTX uses 1 RISC-V core to control 8 units


NTX runs at up to 1.25 GHz
Compute of 20 Gflop/s
Bandwidth of 5 GB/s
At 9.3 pJ/flop and using only 0.51 mm²
Scale up by replicating cluster
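
The figures on this slide fit together with simple arithmetic. The sketch below assumes each of the 8 units completes one fused multiply-add per cycle, counted as 2 flops; that assumption is not stated on the slide, but it reproduces the 20 Gflop/s figure. All variable names are illustrative.

```python
NUM_UNITS = 8                 # from the slide title
CLOCK_HZ = 1.25e9             # from the slide
FLOPS_PER_UNIT_PER_CYCLE = 2  # assumption: one multiply-add per cycle, counted as 2 flops
ENERGY_PER_FLOP_J = 9.3e-12   # from the slide
BANDWIDTH_BYTES_PER_S = 5e9   # from the slide

peak_flops = NUM_UNITS * CLOCK_HZ * FLOPS_PER_UNIT_PER_CYCLE
power_w = peak_flops * ENERGY_PER_FLOP_J
flops_per_byte = peak_flops / BANDWIDTH_BYTES_PER_S

print(f"peak compute: {peak_flops / 1e9:.0f} Gflop/s")
print(f"power at peak: {power_w * 1e3:.0f} mW")
print(f"flops needed per byte to stay compute bound: {flops_per_byte:.0f}")
```

Under these assumptions a kernel needs about 4 flops per byte moved before the compute units, rather than the memory bandwidth, become the limit.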

F. Schuiki, M. Schaffner, F. K. Gürkaynak and L. Benini, "A Scalable Near-Memory Architecture for Training Deep Neural Networks on Large In-Memory Datasets," in IEEE Transactions on Computers, vol. 68, no. 4, pp. 484-497, 1 April 2019, doi: 10.1109/TC.2018.2876312.

Kosmodrom ABB Demonstration Board


Test socket for Kosmodrom chip
STM microcontroller for control
USB connection to computer
Analog to Digital Converter module
Body bias voltage generation
Supply voltage generation
Measurement points for all supplies

Boosting performance with Body Biasing


We set the performance target (730MHz, @0.65V, ~40mW)
Actual chip performance is measured
Forward VBB is applied (positive VBP and negative VBN) until we reach the performance goals
By individually applying VBB to chips we can improve yield
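
The measure-and-adjust procedure above can be sketched as a loop. Everything except the 730 MHz target is hypothetical: the linear chip model, the gain of 100 MHz per volt, the 50 mV step and the 1.5 V limit are placeholders chosen only to make the example run, not measured Kosmodrom data.

```python
TARGET_MHZ = 730.0  # performance target from the slide


def make_chip(fmax_zero_bias_mhz, gain_mhz_per_v=100.0):
    """Hypothetical chip model: fmax rises linearly with forward body bias."""
    def measure_fmax(vbb_mv):
        return fmax_zero_bias_mhz + gain_mhz_per_v * vbb_mv / 1000
    return measure_fmax


def tune_forward_body_bias(measure_fmax, target_mhz=TARGET_MHZ,
                           step_mv=50, max_mv=1500):
    """Raise the forward body bias step by step until the chip meets the target.

    Returns the bias in mV, or None if the target is out of reach.
    """
    for vbb_mv in range(0, max_mv + 1, step_mv):
        if measure_fmax(vbb_mv) >= target_mhz:
            return vbb_mv
    return None


# Three hypothetical dies from the same run.
for name, fmax0 in [("fast", 750.0), ("typical", 700.0), ("very slow", 500.0)]:
    print(name, tune_forward_body_bias(make_chip(fmax0)))
```

In this model the fast die needs no bias, the typical die is rescued with a moderate bias, and only the very slow die is lost, which is the yield argument made on the slide.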


Gaining Energy Efficiency with Body Biasing


We set the desired operating frequency (800MHz)
We decrease the voltage to the minimum level at which the chip will work (0.8V)
At this point we start reducing voltage further (0.65V)
Maximum operating frequency will also drop (~500MHz)
We compensate for the lost performance with forward VBB (positive VBP and negative VBN) until we reach the desired operating frequency.
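
The gain from running at 0.65 V instead of 0.8 V can be estimated with the usual first-order CMOS relation, dynamic power proportional to C * V^2 * f. The sketch below assumes constant switched capacitance and ignores leakage, which forward body biasing increases, so it is an optimistic bound and not a measured result. The function name is illustrative.

```python
def dynamic_power_ratio(v_new, v_old, f_new=1.0, f_old=1.0):
    """Dynamic power after/before a change, using P ~ C * V^2 * f with C constant."""
    return (v_new / v_old) ** 2 * (f_new / f_old)


# Same 800 MHz operating frequency, supply lowered from 0.8 V to 0.65 V.
ratio = dynamic_power_ratio(0.65, 0.80)
print(f"dynamic power at 0.65 V relative to 0.80 V: {ratio:.2f}")
print(f"estimated dynamic power saving: {(1 - ratio) * 100:.0f}%")
```

Because the frequency is restored by body biasing rather than by raising the supply, the quadratic voltage term is what remains as the saving.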

The good the bad and the ugly

We designed and tested 37 chips as part of PULP project (as of now)
Three more planned until end of year

Most worked great
But there were also mistakes made
Here is a look at some highs and some lows

Good: Fulmine the award winning one


UMC65
Earlier chip (2015)
4x OpenRISC cores (not yet RISC-V)
192 kBytes L2 + 64 kBytes TCDM
2x HW accelerators
HW Crypt (together with TU-Graz)
HW Convolution Engine

Publication from this chip


Francesco Conti, Robert Schilling, Davide Schiavone, Antonio Pullini, Davide Rossi, Frank K. Gurkaynak, Michael Muehlberghuber, Michael Gautschi, Igor Loi, Germain Haugou, Stefan Mangard, Luca Benini, "An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics", IEEE Transactions on Circuits and Systems I: Regular Papers, Vol: 64, Issue: 9, Sept. 2017, pp 2481 - 2494, DOI: 10.1109/TCSI.2017.2698019

Bad: Bonding issues on Poseidon


First GF22nm chip
Used Europractice IC service
Cost 150k CHF for 50 samples

Has three parts (trident..)


PULPissimo system
Ariane core
Independent ML accelerator

30 of 50 chips were packaged


We provide a bonding diagram
Mostly simple manual work

Bad: Bonding issues on Poseidon


Look closer on the right side
There is a pad that is not bonded

We skipped one pad


All connections are shifted by one

VDD and GND are one after the other


Bonding causes shorts between VDD and GND
Pretty much catastrophic

Fortunately: unpackaged dies


There were 20 unpackaged dies
We could bond those correctly
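
The shift-by-one failure can be reproduced with a toy model. The pad names and their order below are hypothetical and much shorter than a real pad ring; they are not Poseidon's actual bonding diagram.

```python
# Hypothetical pad order on the die and the package pins each pad should reach.
DIE_PADS = ["IO0", "IO1", "VDD", "GND", "IO2", "IO3"]
PACKAGE_PINS = ["IO0", "IO1", "VDD", "GND", "IO2", "IO3"]


def bond(pads, pins, skipped_index=None):
    """Pair pads with pins in order; skipping a pad shifts every later wire by one."""
    bonded = [pad for i, pad in enumerate(pads) if i != skipped_index]
    return list(zip(bonded, pins))


def supply_shorts(connections):
    """Wires that tie a supply pad to the opposite supply pin."""
    return [(pad, pin) for pad, pin in connections if {pad, pin} == {"VDD", "GND"}]


print(supply_shorts(bond(DIE_PADS, PACKAGE_PINS)))                   # correct diagram
print(supply_shorts(bond(DIE_PADS, PACKAGE_PINS, skipped_index=1)))  # one pad skipped
```

Because the supply pads sit next to each other, a single skipped pad is enough to land GND on the VDD pin in this model.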

Downright Ugly, reset problem of Urania


2 PULP clusters, each with
4x RV32 RI5CY cores
4x transprecision FPUs
1x PULPO accelerator
64 kB TCDM in 8 banks

Ariane RV64 host processor
128 KiB Shared LLC
software-managed IOMMU
DDR3 DRAM Controller + PHY by TU-Kaiserslautern

[Chip photo: Urania, 65nm]

The reset can not be released for clusters


Chip has many modules
1x Ariane core
1x DDR interface
2x Clusters

Reset to clusters is stuck at 0
Design flow mistake
Some other control signals are stuck as well, affecting Ariane performance

DDR interface is functional
Not everything is lost

IC Design is tricky and demands attention


Even the simplest things can derail a complex chip
A copy paste error in a bonding diagram, a mistake in reset

Academic research chips are not industrial products


Designed to test and verify ideas, not mass production
Much more effort needed in DfT and verification to make a successful product

Experience is key in IC Design


All the mistakes we make add to our future success
Some lessons you learn the hard way
But these stay with you and help you for your future designs


We hope this was helpful/fun for you


Covered the basics of RISC-V
Explained the ISA
Examples of Implementations
Advanced cores and Concepts

Talked about building open source systems around RISC-V


Showed the main concepts and talked about our ICs

You can find PULP related information


GitHub: https://github.com/pulp-platform
PULP Webpage: http://pulp-platform.org
Follow us on Twitter: @pulp_platform
Luca Benini, Davide Rossi, Andrea Borghesi, Michele Magno, Simone
Benatti, Francesco Conti, Francesco Beneventi, Daniele Palossi, Giuseppe
Tagliavini, Antonio Pullini, Germain Haugou, Manuele Rusci, Florian Glaser,
Fabio Montagna, Bjoern Forsberg, Pasquale Davide Schiavone, Alfio Di
Mauro, Victor Javier Kartsch Morinigo, Tommaso Polonelli, Fabian Schuiki,
Stefan Mach, Andreas Kurth, Florian Zaruba, Manuel Eggimann, Philipp
Mayer, Marco Guermandi, Xiaying Wang, Michael Hersche, Robert Balas,
Antonio Mastrandrea, Matheus Cavalcante, Angelo Garofalo, Alessio
Burrello, Gianna Paulin, Georg Rutishauser, Andrea Cossettini, Luca
Bertaccini, Maxim Mattheeuws, Samuel Riedel, Sergei Vostrikov, Vlad
Niculescu, Hanna Mueller, Matteo Perotti, Nils Wistoff, Thorir
Ingulfsson, Thomas Benz, Paul Scheffler, Moritz Scherer,
Matteo Spallanzani, Andrea Bartolini, Frank K. Gurkaynak,
and many more that we forgot to mention
http://pulp-platform.org @pulp_platform
