Extended Abstract
Abstract—Central Processing Unit (CPU) based systems are complex systems which take several years to develop and need extraordinary amounts of capital investment. For a long time, only large companies could afford to create these Systems On Chip (SoCs), where many of the components are licensed from other companies who serve many other customers. Recently, thanks to a large open source community, it has also become possible for smaller companies to build their own SoCs using free and high-quality hardware and software components. These are typically available in repositories hosted on web-based platforms such as GitHub, GitLab or Bitbucket. The largest initiative so far to develop an open source processor and its ecosystem is arguably the RISC-V Instruction Set Architecture (ISA), whose ambition is to become the standard ISA for all computing devices, from microcontrollers to supercomputers. This dissertation presents a development environment to create open source SoCs that use the RISC-V processor architecture. A base SoC called IObSoC is created, which can be easily edited to create more complex SoCs. The IObSoC hardware is written in Verilog and the software is written in C. It uses a Makefile tree and scripts written in Python, Tcl and Bash to enable simulation, synthesis and place and route with various tools, free and commercial, for both Field-Programmable Gate Array (FPGA) and Application-Specific Integrated Circuit (ASIC) development flows.

Index Terms—RISC-V Instruction Set Architecture, PicoRV32 processor, Open Source, System On Chip, Internet of Things.

I. INTRODUCTION

With the advent of the Internet of Things (IoT) and portable devices, there has been a high demand for low-power CPU-based systems, which need to meet stringent area, battery life, power and performance specifications. Having just one or two providers of CPU cores, such as ARM, cannot satisfy the enormous appetite of the many smaller businesses and consultancies who may, and effectively can, provide services in this domain. Therefore, open source processor cores are the way to go. The CPU is one of the most complex parts of the system, but it is also a commodity which, by itself, adds little value to the final product. In this context, after previous attempts such as the OpenRISC project, the RISC-V ISA has finally emerged and promises to become for hardware what the Linux operating system is for software.

Having access to a complete enough set of hardware and software building blocks to build useful systems is either too expensive or too time-consuming, and constitutes a huge barrier for start-ups and smaller businesses. The IoT market is competitive and forces companies to produce high-quality hardware and software systems in a short window of time and with a low budget.

To solve this problem, this work proposes a base SoC equipped with a CPU, a memory system and serial communication, which can be easily customized by users to implement more complex and specific SoCs. The IObSoC hardware is written in Verilog and the software is written in C. It uses a Makefile tree and scripts written in Python, Tcl and Bash to enable simulation, synthesis and place and route with various tools, free and commercial, for both FPGA and ASIC development flows.

This project was entirely developed at the company IObundle Lda and is the continuation of the work in [1] and [2], which were previous attempts to build a SoC with open source components.

II. THE RISC-V ISA

There are two types of ISA: Reduced Instruction Set Computer (RISC) and Complex Instruction Set Computer (CISC). The performance debate between the two types of ISA is irrelevant nowadays [3][4], but the only CISC ISA still in widespread use today is Intel’s proprietary x86, mainly for compatibility with legacy systems. There are, however, several free and proprietary RISC ISAs.

The main free (to use) ISA nowadays is RISC-V, a reduced instruction set architecture that allows standard and custom extensions of its base ISA. Open source hardware designs and software toolchains are now widely available for download by any interested individual or organization. Large companies such as NVIDIA and Western Digital Corp. have already decided to use RISC-V in their designs [5].

1) Unprivileged ISA: At the time of writing of this document, the latest version of the RISC-V unprivileged architecture is described in [6], and its status is summarized on page i of its preface. It supports several base ISAs and several standard extensions, but it is also possible to build custom extensions to create, for example, Graphics Processing Unit (GPU) based instruction sets.

The base modules of the RISC-V ISA are:
• RV32I, RV64I and RV128I, the base integer ISAs for 32-bit, 64-bit and 128-bit processor implementations, respectively;
• RV32E, a reduced version of RV32I (with 16 instead of 32 integer registers) intended for embedded systems;
• RISC-V Weak Memory Ordering (RVWMO), the RISC-V default memory consistency model.

The most important RISC-V extensions for this project are:
• M, for integer multiplication and division instructions;
• F, for single-precision floating-point instructions;
• C, for compressed instructions.

The base RISC-V ISA (RV32I) has 32-bit-wide instructions. Because RISC architectures can lead to large program sizes, RISC-V provides the C extension, which offers 16-bit compressed versions of the most common regular instructions. RISC-V’s compressed ISA (or simply RVC) is an important feature in IoT because it reduces code size by around 25% [7] while allowing regular and compressed instructions to be mixed in a

• Memory units, such as Random-Access Memory (RAM), Read-Only Memory (ROM), Double Data Rate (DDR) and Flash;
• Memory controllers, such as Dynamic Random-Access Memory (DRAM) controllers and Direct Memory Access (DMA) modules;
• On-chip module interconnection and communication interfaces, such as an Advanced eXtensible Interface (AXI) Interconnect [9] [10] or other custom native interface controllers;
• Serial protocol interfaces/controllers, such as Universal Asynchronous Receiver-Transmitter (UART), Serial Peripheral Interface (SPI), Ethernet and Universal Serial Bus (USB);
• Co-processors, such as floating-point arithmetic modules;
• Coarse-Grain Reconfigurable Arrays (CGRAs), such

which consists of a more complete and complex system than PULPino. It can use a RI5CY or a zero-riscy CPU as main core;
• PicoRV32 [19], a size-optimized RISC-V processor core that implements the RV32IMC and RV32E ISAs with a high maximum clock frequency.

Rocket Chip has a large number of features, but it has been found too hard to manipulate in the past at IObundle, so it was put aside. Taiga was also put aside, as it is optimized solely for FPGAs and is thus not suited for ASICs. PULPino proved hard to detach from its environment in the past at IObundle, and its features go beyond the scope of this project. PULP is not yet implemented on FPGA.

PicoRV32 is easy to deal with and well documented, with several validated examples available, such as Raven [20], an ASIC SoC. PicoRV32 has been chosen for this project given its simplicity and its complete repository, which provides an example SoC that runs code directly from an SPI flash memory chip and can be used as a simple controller in FPGA or ASIC designs.

C. Tools

In order to successfully design an SoC for an ASIC or FPGA, several software tools must be used to build the software and hardware components. Most of these tools are commercial, although effective open source tools are becoming more and more popular.

1) Software toolchain: To produce software for a RISC-V CPU, both the GNU [21] and LLVM [22] toolchains have added support for the RISC-V architectures. These toolchains offer compilers, profilers, debuggers, assemblers, linkers, etc.

2) RTL simulators: RTL simulation is an essential step for verifying the correctness of the design. NCSim [23] from Cadence and ModelSim [24] from Mentor are well-known commercial RTL simulators which are supported in this work. Free and open source RTL simulators such as Icarus Verilog [25] and Verilator [26] also exist.

3) FPGA implementation tools: After successfully simulating a system’s RTL code, the FPGA emulation phase can begin. The largest FPGA manufacturers are Xilinx and Intel. Both provide proprietary software suites for RTL synthesis and FPGA implementation targeting their own FPGAs: Xilinx’s Vivado [27] and Intel’s Quartus [28]. Their full versions require paid licenses, which are provided under different commercial options (such as when acquiring the vendors’ FPGA development boards).

4) Toolchain integration: To integrate all the necessary tools to design SoCs in a single environment, a build automation tool such as Make or FuseSoC can be used. Make is a well-known program for Unix-based OSs that can call different tools to build an SoC according to user-defined rules, which set dependencies to avoid rerunning time-consuming programs if they have already been run.

FuseSoC [29] works as a package manager for reusable hardware blocks and as a toolchain integration mechanism. FuseSoC’s operation revolves around core description files, which configure the Hardware Description Language (HDL) sources and the tools necessary to compile, simulate and/or synthesize a core. The PicoRV32 repository supports both Makefile and FuseSoC flows, which are executed in a terminal.

IV. IOBSOC ARCHITECTURE

IObSoC [30] is a new 32-bit SoC using the PicoRV32 [19] RISC-V CPU architecture, and it is presented for the first time in this document. At the time of writing, it is implemented on three different FPGA board models (Xilinx Kintex UltraScale KU040 Development Board [31], Spartan-6 FPGA SP605 Evaluation Kit [32] and Cyclone V GT FPGA Development Kit [33]) and is currently undergoing ASIC implementation in the UMC 130 nm process at IObundle.

This document details IObSoC’s implementation on a Xilinx XCKU040-1FBVA676 FPGA with -1 speed grade, hosted by an AES-KU040-DB-G Xilinx Kintex UltraScale Development Board manufactured by Avnet [31]. This is the most complete implementation at the time of writing because, unlike the other two FPGA boards mentioned, it supports data access and code execution from an external DDR memory chip. Throughout the rest of this document, the AES-KU040-DB-G board will simply be referred to as the KU040 board.

IObSoC has several operating modes, each with its own set of hardware and software components. These components are included in or excluded from the SoC by commenting/uncommenting the USE_BOOT and USE_DDR Verilog macros, which enable the use of the bootloader and of the DDR Memory, respectively. Table I shows how the SoC operating modes relate to these macros.

TABLE I
IOBSOC OPERATING MODES.

Boot | DDR | Operating mode
No | No | App. pre-loaded on SRAM
No | Yes | App. pre-loaded on DDR (simulation only)
Yes | No | App. loaded to SRAM from UART
Yes | Yes | App. loaded to DDR from UART

IObSoC’s schematic is presented in Figure 1. Its board-dependent wrapper for the KU040 board is shown in Figure 2, where blue, red and yellow blocks/lines represent components/connections that are used in all SoC operating modes, only when the DDR Memory is used, and only when it is not used, respectively. The SoC’s several operating modes widen the range of contexts and applications it supports, which makes it easier to meet different customers’ needs.

A. SoC Input/Output Ports

IObSoC has two input ports for the clock and reset signals and an output port for the CPU’s trap signal. It also has four additional ports: the two serial lines of its UART (RX and TX) and two flow control signals, Request-To-Send (RTS) and Clear-To-Send (CTS). The SoC may also have ports for an AXI4 interface [9] [10], which is used to communicate with the DDR when it is used.

B. Clock Scheme

The Differential Clock Oscillator feeds either the DDR Controller (when the DDR is used) or the Clock Controller (otherwise) with a differential pair of 250 MHz clock signals. The receiving controller then produces IObSoC’s clock signal.
[Figure: the PicoRV32 CPU connected via the memory bus to the Memory Subsystem (Boot, Internal Memory, External Memory and AXI4 Master Interface) and, through the IOB Interconnect on the peripheral bus, to the UART and the Soft Reset & Boot Controller.]
Fig. 1. IObSoC’s top level schematic.
[Figure: the board-dependent wrapper (IObSoC, Clock Controller, Reset Controller, AXI Interconnect and DDR Controller, with USE_DDR selecting the clock and reset sources) and the external UART-to-USB Bridge, Global Reset Push-Button, Differential Clock Oscillator and DDR Memory.]
Fig. 2. IObSoC’s board-dependent wrapper and external components.

The DDR Controller generates the internal clocks of the AXI Interconnect and of the DDR, as well as the reference clocks for the AXI Interconnect’s master side and slave side (IObSoC).

C. Reset Scheme

1) External reset: IObSoC’s reset signal is provided either by the DDR Controller (when the DDR is used) or by the Reset Controller (otherwise). It is triggered on SoC start-up (after loading the bitstream to the FPGA) or by pressing the Global Reset Push-Button (hard reset for emergency situations).

When using the DDR, the DDR Controller first calibrates the DDR and then resets the AXI Interconnect. The AXI Interconnect in turn generates a negated reset pulse, which is passed through an inverter and fed into IObSoC’s reset port, resetting it. The result is equivalent to a SoC start-up: IObSoC restarts the bootloader and the board-dependent wrapper components are reset.

2) Soft reset: IObSoC can also be soft reset internally via a software write to an address of the Soft Reset & Boot Controller. The soft reset is used to reset the SoC after the bootloader finishes loading the application to the Main Memory.

3) IObSoC’s internal reset: Inside IObSoC, the external reset input and the internal soft reset signal are fed into an OR gate to produce the internal reset signal, which resets the CPU and its peripherals. The CPU is also held in reset while the Boot ROM is copying its contents to the Static Random-Access Memory (SRAM).

D. Address Scheme

IObSoC is a memory-mapped system, i.e., each register and memory word in the SoC has a unique address, which the CPU uses to write or read it. IObSoC’s address scheme is illustrated in Figure 3.

Bit positions | 31 | 30 to 31-WP | 30-WP to 2 | 1 to 0
Field | Memory select | Peripheral select | Internal addresses | Byte select
Width | 1 bit | WP = ⌈log2(NS)⌉ bits | 29-WP bits | 2 bits
Fig. 3. IObSoC address scheme.

Each address is 32 bits wide and corresponds to a memory byte, which gives a total of 4 GB of addressable memory space. If NS is the number of slaves/peripherals in the SoC, we define WP as

$$\mathrm{WP} = \lceil \log_2(\mathrm{NS}) \rceil \qquad (1)$$
The most significant bit of the address indicates whether the CPU is selecting the Main Memory (value 0) or a peripheral (value 1). When this bit is asserted, the next WP most significant bits of the address bus indicate which peripheral is selected by the CPU. Each peripheral corresponds to a single combination of these bits, called its prefix. The remaining bits of the address bus (except the two least significant bits) are the internal addresses of the memories/peripherals. The two least significant bits of the address bus would specify a byte within a 32-bit word, but the SoC instead uses the CPU’s write strobe for 1-byte or 2-byte writes.

Table II contains IObSoC’s peripheral prefixes, while Table III shows its memory map. The SRAM and the Cache can each be accessed either as Main Memory or as a peripheral (necessary for the bootloader).

TABLE II
IOBSOC’S PERIPHERALS’ PREFIXES ASSUMING WP = 2.

Peripheral’s name | Prefix
UART | 00
Soft Reset & Boot Controller | 01
Cache | 10
SRAM | 11

TABLE III
IOBSOC’S MEMORY MAP ASSUMING WP = 2 AND THE PERIPHERALS’ PREFIXES IN TABLE II.

Peripheral or memory | Address space
Main Memory (memory bus) | 0x00000000 – 0x7FFFFFFF
UART | 0x80000000 – 0x9FFFFFFF
Soft Reset & Boot Controller | 0xA0000000 – 0xBFFFFFFF
Cache (peripheral bus) | 0xC0000000 – 0xDFFFFFFF
SRAM (peripheral bus) | 0xE0000000 – 0xFFFFFFFF

E. Native Interface

The set of signals used to connect the CPU to the memory subsystem and peripherals is called the SoC’s native interface. These signals are the PicoRV32 CPU’s input/output ports:
• wdata and rdata: the CPU’s 32-bit write and read data buses;
• instr: indicates whether the CPU’s read data bus holds an instruction or program data;
• addr: the CPU’s 32-bit address bus;
• wstrb: the CPU’s 4-bit write strobe, which controls the data size of write operations (1, 2 or 4 bytes);
• valid and ready: implement a valid-ready handshake protocol between the CPU and its slaves.

The CPU connects to the memory subsystem via the memory bus and to the peripherals via the peripheral bus. wdata and wstrb are CPU outputs broadcast to both the memory and peripheral buses. valid, ready and rdata, by contrast, are selected from one of the two buses using the most significant address bit. instr is only used in the memory bus, where the cache in the memory subsystem needs it.

F. IObSoC Components

IObSoC has mandatory and optional components. The former are used in all SoC operating modes, while the latter are not. The mandatory components are the CPU, the Boot ROM, the SRAM, the UART, the Soft Reset & Boot Controller and the IOB Interconnect. The optional components are the Cache [34] and the Native-to-AXI adapters (which are not depicted in Figure 1).

1) CPU: The single master in the SoC, which controls the peripherals and the memory subsystem (slaves) via software programs stored in the Main Memory. It is a PicoRV32 [19] core, a size-optimized open source processor with a high maximum frequency that implements the RV32IMC and RV32E ISAs. IObSoC’s CPU code is stored in IObundle’s IOb-RV32 repository [35], which is a fork of PicoRV32’s repository.

2) Memory Subsystem: It contains a Boot ROM that stores the bootloader, an internal SRAM to run programs from, and an IOb-Cache [34] module to access an external DDR. The bootloader is copied to the final part of the SRAM on SoC start-up. Depending on the USE_BOOT configuration macro, the CPU then executes either the application pre-loaded in the SRAM or the bootloader. The bootloader loads the application to the Main Memory via UART, soft resets the SoC and then runs the loaded application from the Main Memory.

The SRAM and the Cache can be accessed from both the memory and peripheral buses (necessary for the bootloader). When the application is loaded to the SRAM, the copy of the bootloader in its final part is overwritten by the heap and stack sections of the application. The Cache connects to the CPU via the native interface and to the AXI Interconnect via an AXI4 interface, which in turn connects to the DDR.

3) UART: A peripheral that implements the RS-232 serial communication protocol and serves as an interface between IObSoC and the outside world. It is an IObUART [36] module. It is mainly used to receive a software application from a computer during the bootloader sequence and to transmit print messages from IObSoC to a computer terminal. It features RTS/CTS data flow control, and dedicated C drivers were developed to operate it.

4) Soft Reset & Boot Controller: A peripheral used to soft reset the SoC components and to swap the source of the CPU’s program instructions after booting is complete. When the CPU writes to this peripheral, the peripheral updates its internal boot register and uses a counter to generate a reset pulse that resets the CPU and its peripherals. After a soft reset, the CPU executes the program from the memory indicated by the boot register.

5) IOB Interconnect: A parametrized bus switch that manages the valid-ready handshake protocol between the peripherals and the CPU. It multiplexes the NS peripherals’ rdata and ready signals and forwards only the selected one’s
to the CPU. Likewise, it demultiplexes the CPU’s valid signal and forwards it only to the selected peripheral, while de-asserting the others.

6) Native-to-AXI adapters: An optional interface module that can be used to connect peripherals with AXI4-Lite ports [9] [10] to IObSoC. One side has native interface ports, which connect to the SoC’s peripheral bus. The other side has AXI4-Lite ports, which connect to the AXI peripheral. The adapter converts each set of signals into the other.

G. Board-Dependent Wrapper Components

The board-dependent wrapper is the system emulated inside the FPGA. It contains IObSoC and other components, which depend on the SoC’s operating mode. These components feed IObSoC its clock and reset signals and connect it to the DDR Memory.

1) Clock Controller: A component used to generate IObSoC’s clock signal when the DDR is not used. It uses the Differential Clock Oscillator’s 250 MHz differential clock to generate IObSoC’s 100 MHz clock with a 50% duty cycle via an internal Phase-Locked Loop (PLL) [37]. It also provides input and output buffering for a cleaner clock signal. It was generated with Vivado’s Clocking Wizard.

2) Reset Controller: A hardware counter with additional logic that generates IObSoC’s reset signal when the DDR is not used.

3) AXI Interconnect: A Xilinx component used to connect IObSoC to the DDR Controller via an AXI4 interface when the DDR is used. It also provides IObSoC’s reset signal, although this signal originates from the DDR Controller.

4) DDR Controller: A Xilinx Memory Interface Generator (MIG) core that generates the clock and reset signals of IObSoC (and of other components) and allows IObSoC to access the DDR. It connects to the AXI Interconnect via an AXI4 interface and to the DDR4 memory chip via a DDR4 Synchronous Dynamic Random-Access Memory (SDRAM) interface [31]. It calibrates the DDR on SoC start-up.

H. External Components

Physical components on the KU040 board are necessary for operating the SoC. All of the following are mandatory except the DDR.

1) DDR Memory: A DDR SDRAM chip controlled by the DDR Controller. The DDR4 interface is implemented with two 512 MB Micron EDY4016AABG-DR-F devices [31] [38]. It can be used by IObSoC as Main Memory.

2) UART-to-USB Bridge: A chip that converts RS-232 to USB and vice versa, thus allowing IObSoC’s UART and a computer to communicate in both directions.

3) Differential Clock Oscillator: A Silicon Labs Si510/Si511 or compatible device [31] that generates the board’s differential clocks and feeds them to the board-dependent wrapper via the FPGA’s pins.

4) Global Reset Push-Button: A physical push-button connected to the board-dependent wrapper via an FPGA pin, used to manually hard reset the SoC in emergency situations.

I. SoC Operating Modes

1) Application pre-loaded on SRAM: This operating mode corresponds to the first line of Table I and uses the blue and yellow components in Figure 2.

A 100 MHz clock signal is fed to IObSoC by the Clock Controller, while the reset signal is provided by the Reset Controller. IObSoC’s AXI4 interface and Cache are not used. The application is pre-loaded on the SRAM (Main Memory) and is incorporated in the bitstream.

After the bitstream is loaded onto the FPGA, IObSoC copies the bootloader to the SRAM. It does not execute the bootloader, because the boot register is initialized with 0. It then executes the application stored in the SRAM.

2) Application pre-loaded on DDR (simulation only): This operating mode corresponds to the second line of Table I. It does not use the board-dependent wrapper, but instead uses a Verilog testbench for RTL simulation. In the testbench, the external DDR is replaced by an AXI SRAM [39] directly connected to the AXI4 master interface of IObSoC, the Unit Under Test (UUT).

IObSoC’s clock and reset inputs are fed by the testbench, and its Cache and AXI4 interface are used to access the AXI SRAM (the Main Memory), which is pre-loaded with the application. The boot register is initialized with 0, so the bootloader is not used. When the simulation starts, IObSoC is reset, the bootloader is copied to the SRAM and then the application stored in the AXI memory is executed.

3) Application loaded to SRAM from UART: This operating mode corresponds to the third line of Table I and uses the blue and yellow components in Figure 2.

A 100 MHz clock signal is fed to IObSoC by the Clock Controller, while the reset signal is provided by the Reset Controller. IObSoC’s AXI4 interface and Cache are not used. After the bitstream is loaded onto the FPGA, the bootloader is copied to the SRAM and executed, because the boot register is initialized with 1. The bootloader loads the application to the initial part of the SRAM (Main Memory) via UART and soft resets the SoC, while updating the boot register to 0. After the soft reset, IObSoC executes the application from the SRAM.

4) Application loaded to DDR from UART: This operating mode corresponds to the fourth line of Table I and uses the blue and red components in Figure 2.

A 100 MHz clock signal is fed to IObSoC by the DDR Controller, while the reset signal is provided by the AXI Interconnect. IObSoC’s AXI4 interface and Cache are used. After the bitstream is loaded onto the FPGA, the DDR Controller calibrates the DDR and resets the AXI Interconnect, which then starts up the SoC. The bootloader is copied to the SRAM and executed, because the boot register is initialized with 1. The bootloader loads the application to the DDR via UART and soft resets the SoC, while updating the boot register to 0. After the soft reset, IObSoC executes the application from the DDR.

V. IOBSOC DEVELOPMENT ENVIRONMENT

IObSoC is a base SoC which can be further developed by adding new peripherals, which the application firmware then
operates via software drivers to accomplish a given goal.

A. Adding Peripherals

1) Creating the peripheral: The peripheral must be designed in the Verilog HDL, and its ports must be compatible with IObSoC’s native interface or with the AXI4-Lite interface [9].

2) Editing IObSoC’s configuration file: Next, IObSoC’s configuration file, located in rtl/include/system.vh, must be edited according to the following steps:
• Comment/uncomment the USE_BOOT and USE_DDR macros, depending on the desired SoC operating mode;
• Configure MAINRAM_ADDR_W (the SRAM’s address width when used as Main Memory) depending on the size of the application (not needed for DDR applications);
• Increment N_SLAVES (i.e., the number of slaves/peripherals NS);
• Update N_SLAVES_W (i.e., WP given by Equation 1), which only needs to increase when N_SLAVES exceeds a power of two;
• Add a base address (prefix) for the new peripheral;
• Uncomment/comment USE_LA_IF to use or not use PicoRV32’s Look-Ahead (LA) memory interface, respectively (simulation only).

The following code exemplifies how to configure IObundle’s timer Intellectual Property (IP), IObTimer [40], in IObSoC’s configuration file:

//Comment/uncomment to choose IObSoC's operating mode
//`define USE_BOOT
//`define USE_DDR

//main memory address space (log2 of byte size)
`define MAINRAM_ADDR_W 15

//SLAVES
`define N_SLAVES 5 // EDITED: was 4 before

//bits reserved to identify slave
`define N_SLAVES_W 3 // ceil(log2(N_SLAVES)) // EDITED: was 2 before

//peripheral address prefixes
`define UART_BASE 0
`define SOFT_RESET_BASE 1
`define DDR_BASE 2
`define SRAM_BASE 3
`define TIMER_BASE 4 // EDITED: added this line

//use CPU lookahead interface
//`define USE_LA_IF
...

3) Instantiating the peripheral in IObSoC: Next, IObSoC’s top source file, located in rtl/src/system.v, must be edited. The following code exemplifies how to instantiate IObTimer in IObSoC’s top source file:

// IObTimer instance, connected to slave slot `TIMER_BASE of the IOB Interconnect
time_counter #(.COUNTER_WIDTH(32))
timer (
   .rst      (reset_int),              // internal reset (external OR soft reset)
   .clk      (clk),
   .addr     (m_addr[2]),              // selects one of the timer's 32-bit words
   .data_in  (m_wdata),                // CPU write data (broadcast)
   .data_out (s_rdata[`TIMER_BASE]),   // read data returned to the IOB Interconnect
   .valid    (s_valid[`TIMER_BASE]),   // request forwarded by the IOB Interconnect
   .ready    (s_ready[`TIMER_BASE])    // handshake acknowledgement
);

If the peripheral has AXI4-Lite ports, a native-to-AXI adapter must be instantiated between the peripheral and the SoC’s peripheral bus.

4) Firmware: Next, the peripheral’s software drivers and the SoC’s firmware must be written:
• Write one or more C source and header files with the software drivers to operate the peripheral;
• Include the header file(s) in the firmware source file, located in software/firmware/firmware.c, and develop the software application;
• If not using the DDR, remember to edit the MAINRAM_ADDR_W macro in IObSoC’s configuration file so that the SRAM is large enough to store the firmware.

5) Editing the Makefiles and the FPGA tools’ scripts: Next, the Makefiles and FPGA tools’ scripts in IObSoC’s repository need to be updated with the new C and Verilog source and header files (or the respective include directories). The Makefiles and FPGA tools’ scripts to edit are:
• software/firmware/Makefile;
• $SIM_DIR/Makefile;
• $FPGA_DIR/Makefile;
• $FPGA_DIR/<script>;
where $SIM_DIR and $FPGA_DIR must be defined in the top Makefile, located in the root directory of IObSoC’s repository, depending on the desired tools.

B. Software

The software executed in IObSoC is written in the C programming language and compiled with the GNU RISC-V GCC cross-compiler [21]. The software used inside the SoC consists of the bootloader (in software/bootloader) and the firmware (in software/firmware). The firmware can easily be tested while running on an FPGA, because a new firmware binary file can be loaded to the SoC via UART without recompiling the bitstream.

The console program used by the computer to interact with IObSoC inside an FPGA is also written in C, but it is compiled with the regular GNU Compiler Collection (GCC) on a Linux computer. It is stored in software/ld-sw.

C. RTL Simulation

At the time of writing, the available RTL simulators for IObSoC are Icarus Verilog [25], NCSim [23] and ModelSim [24]. To run an RTL simulation, first edit SIM_DIR in the top Makefile (for example, to use Icarus, define SIM_DIR = simulation/icarus). Then cd to the repository’s root and run make sim. This also compiles the bootloader and the firmware before the simulation starts.

D. FPGA Emulation

The available RTL synthesis and FPGA implementation tools for IObSoC are Intel’s Quartus [28] and Xilinx’s Vivado [27] and ISE [41]. To generate a bitstream, first edit FPGA_DIR in the top Makefile (for
example, to use the KU040 board with Vivado, de- TABLE IV
fine FPGA_DIR = fpga/xilinx/AES-KU040-DB-G ). D HRYSTONE BENCHMARK RESULTS ON RTL SIMULATION WITH 100 RUNS
AND A CLOCK FREQUENCY OF 100 MH Z .
Then cd to the repository’s root, source the FPGA tools’
settings (if necessary) and run make fpga . If the FPGA
IObSoC mem. config. CPI DPS/MHz DMIPS/MHz
board is hosted by a remote machine, the fpga target also
sends the bitstream and the firmware binary file to it via Secure SRAM 5.496 373 0.212
Copy Protocol (SCP). DDR (8 KB Cache) 5.507 372 0.211
Then, on the FPGA board’s host machine – which can be DDR (256 B Cache) 7.956 257 0.146
accessed via Secure Shell (SSH) –, cd to the root directory of SRAM + LA 4.063 505 0.287
IObSoC’s repository (which must also be cloned in it as well) DDR (8 KB Cache) + LA 4.074 503 0.286
and run make ld-sw to setup the console to interact with DDR (256 B Cache) + LA 6.587 311 0.177
IObSoC. In a new terminal, source the FPGA tools’ settings
if needed and run make ld-hw to load the bitstream onto
the FPGA. TABLE V
D HRYSTONE BENCHMARK RESULTS ON RTL SIMULATION WITH 100
RUNS , A CLOCK FREQUENCY OF 100 MH Z AND THE -O3 FLAG ON GCC.
IObSoC mem. config.       CPI     DPS/MHz   DMIPS/MHz
SRAM                      5.552   442       0.251
DDR (8 KB Cache)          5.564   441       0.250
DDR (256 B Cache)         7.842   313       0.178
SRAM + LA                 4.095   599       0.340
DDR (8 KB Cache) + LA     4.108   597       0.339
DDR (256 B Cache) + LA    6.415   382       0.217
VI. RESULTS
A. Dhrystone Benchmark Results for RTL Simulation
To validate IObSoC and measure its performance, several RTL simulations of different IObSoC configurations running the Dhrystone benchmark [42] [43] were performed with the Icarus Verilog simulator. These simulations correspond to different combinations of the following parameters:
• IObSoC's operating mode;
• The use of PicoRV32's look-ahead (LA) memory interface;
• IObSoC's cache configuration (its size);
• Dhrystone's number of runs, i.e., the number of iterations of the main code loop;
• Whether or not the GCC -O3 optimization flag is used.
All simulations use a 100 MHz clock and run without the bootloader (USE_BOOT commented out). Nevertheless, all of IObSoC's operating modes (including those with the bootloader feature) were successfully tested and produced correct results, validating the SoC.
The obtained results are shown in Tables IV and V. Results of simulations with 500 Dhrystone runs are not presented in this document because they deviate from the 100-run results by less than % on the relevant performance indicators, which are:
• Cycles per Instruction (CPI) estimate of the CPU;
• Dhrystones per Second (DPS) per MHz: the number of iterations of the main code loop per second, divided by the CPU's clock frequency in MHz and rounded down;
• Dhrystone Mega Instructions per Second (DMIPS) per MHz: the DMIPS value (Equation 2) divided by the CPU's clock frequency in MHz.
The constant 1757 is the number of Dhrystones per second achieved by the VAX 11/780, a machine with a nominal performance of 1 MIPS [44]. Dividing the results by this value and by the CPU's frequency normalizes these indicators so that results from different machines can be compared.
$$\mathrm{DMIPS} = \frac{\mathrm{DPS}}{1757} \qquad (2)$$
1) CPU: With the internal SRAM and the CPU's LA memory interface, the obtained CPI is 4.1, which coincides with PicoRV32's reference CPI. The CPI is well above 1 because PicoRV32 is a size-optimized core that executes each instruction over several clock cycles.
2) Cache: Performance is heavily influenced by the cache configuration. With a large (8 KB) cache, the DDR results are very similar to the SRAM ones because the cache produces few read misses and, in simulation, the external memory is an AXI SRAM. Using a smaller (256 B) cache, however, increases the CPI by 41%–62% and decreases the DPS/MHz and DMIPS/MHz by 29%–38%.
3) -O3 GCC optimization flag: This flag makes GCC optimize the code for execution speed, so fewer instructions and cycles are needed. The CPI changes only slightly compared to builds without -O3, while the DPS/MHz and DMIPS/MHz increase by 18.5%–22.8%.
4) LA memory interface: PicoRV32's LA memory interface outputs the address and valid signals of every memory access one cycle ahead of the regular memory interface, which accelerates memory accesses. The LA interface decreases the CPI by 18%–26% and increases the DPS/MHz and DMIPS/MHz by 22%–35%, the effect being more pronounced in the SRAM and DDR + large (8 KB) cache configurations.
B. FPGA Implementation Results
All three IObSoC operating modes were successfully implemented on FPGA, running Dhrystone with 100 runs compiled without the GCC -O3 flag. They produced the results in Table VII, thus validating the SoC.
The SoC's clock frequency is 100 MHz and the LA memory interface is disabled. The DDR operating mode uses the same 8 KB cache configuration as the RTL simulation.
FPGA resource utilization results are shown in Table VI. As expected, the SoC operating modes that do not use the DDR require similar amounts of FPGA resources, while the DDR operating mode requires considerably more because it adds larger components such as the cache, the AXI Interconnect and the MIG core.
TABLE VI
FPGA resource utilization for each IObSoC operating mode.
DDR   Boot   LUTs    36 Kb BRAMs   DSPs
No    No     1787    8             4
No    Yes    1776    9             4
Yes   Yes    13049   29.5          7
TABLE VII
Dhrystone FPGA results with 100 runs and a 100 MHz clock.
DDR   Boot   CPI      DPS/MHz   DMIPS/MHz
No    No     5.496    373       0.212
No    Yes    5.496    373       0.212
Yes   Yes    21.004   97        0.055
The SoC operating modes that run code from the SRAM produce exactly the same Dhrystone results on FPGA as in RTL simulation. The DDR operating mode, however, performs worse on FPGA than in simulation because on the FPGA the external memory is a DDR, whereas in simulation it is an AXI SRAM, which is faster. In addition, the cache used on the FPGA for the DDR operating mode is an earlier, less performant version of the cache used in the RTL simulation tests, since the latter does not yet work on FPGA with IObSoC.
VII. CONCLUSIONS
This document introduces IObSoC, a RISC-V SoC developed at IObundle that uses the PicoRV32 CPU and other open-source components and can be configured to work in various operating modes. IObSoC is presented as the base SoC of a development environment for RISC-V SoCs that supports adding new CPU peripherals, compiling software and hardware, RTL simulation and FPGA implementation.
After considering several RISC-V CPU options, the PicoRV32 core was chosen to build IObSoC, given its simplicity and validated examples. The SoC was then built alongside its verification environment, so that it could be tested as its development advanced.
To validate IObSoC's operation, the SoC was tested with the Dhrystone benchmark [43], a widely used program for measuring general-purpose processor performance. The tests were carried out with RTL simulations of the SoC's operating modes using the Icarus Verilog simulator. IObSoC was also implemented on FPGA and successfully executed the Dhrystone benchmark in all operating modes.
IObSoC's development continued at IObundle after the conclusion of this dissertation project and includes an ASIC development flow (still underway), a new CPU-agnostic SoC architecture and the possibility of using the DDR as auxiliary data storage.
A. Achievements
The first achievement of this project is a new SoC and development environment based on RISC-V, the new open and free standard ISA, which has already been adopted by several large companies to design some of their systems and has great importance in the open-source community.
The second achievement is the highly configurable SoC architecture, whose several operating modes allow it to be used in different contexts and applications, thus increasing the range of potential clients for companies that adopt it.
The third achievement is the reduction of the time and effort needed to develop new SoCs. IObSoC can be further developed systematically (as described in Section V), and several verification tools are already configured and ready to use via simple make commands in a terminal, which accelerates development. This allows small start-up companies to meet tighter deadlines and thus compete in the digital circuit design market.
The fourth and final achievement is the reduction of the resources spent on developing SoCs. By using RISC-V and open-source components and tools, acquiring expensive licenses for CPU IPs and software tools from large companies such as ARM is no longer necessary. Instead, free open-source CPUs such as PicoRV32 can be used, allowing small start-up companies to develop systems on a lower budget and thus increase their competitiveness in the digital circuit design market.
B. Future Work
IObSoC can be developed further in several directions. The first is to implement the SoC on an ASIC, which some customers may want if the number of SoCs needed is high enough to make the unit cost lower than that of implementing it on FPGAs.
The second is to add support for new CPU architectures to IObSoC. This is an advantage because PicoRV32, although small, has a relatively high CPI, which makes it inadequate for applications that demand high CPU performance.
The third is to create a new SoC operating mode that runs code from a flash memory unit, which many customers may be interested in because flash is a cheap, reprogrammable and widely used type of non-volatile solid-state memory.
The fourth and final direction is to add operating system (OS) support to IObSoC. This is an important feature because it widens the range of applications IObSoC supports, such as software programs that require a file system. The RISC-V privileged ISA provides the privileged instructions required by OSs.
REFERENCES
[1] José Sousa, Carlos Rodrigues, Nuno Barreiro, and João Fernandes. Building Reconfigurable Systems Using Open Source Components. April 2014.
[2] Luís Fiolhais and José Sousa. Warpbird: an Untethered System on Chip Using RISC-V Cores and the Rocket Chip Infrastructure. January 2018.
[3] E. Blem, J. Menon, and K. Sankaralingam. Power struggles: Revisiting the RISC vs. CISC debate on contemporary ARM and x86 architectures. In 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA), pages 1–12, Feb 2013.
[4] Prof. Pravin R. Lakhe. A Technology In Most Recent Processor Is Complex Reduced Instruction Set Computers (CRISC): A Survey. International Journal of Innovative Research & Studies, Volume 2(Issue 6), 2013. ISSN 2319-9725.
[5] The rise of RISC - [Opinion]. IEEE Spectrum, 55(8):18–18, Aug 2018.
[6] Editors Andrew Waterman and Krste Asanović. The RISC-V Instruction Set Manual, Volume I: User-Level ISA, Document Version 20191213. RISC-V Foundation, December 2019.
[7] Andrew Waterman. Improving Energy Efficiency and Reducing Code Size with RISC-V Compressed. Master's thesis, EECS Department, University of California, Berkeley, May 2011.
[8] Editors Andrew Waterman and Krste Asanović. The RISC-V Instruction Set Manual, Volume II: Privileged Architecture, Document Version 20190608-Priv-MSU-Ratified. RISC-V Foundation, June 2019.
[9] ARM. AMBA® AXI™ and ACE™ Protocol Specification, 2011. [Online] Available at: http://www.gstitt.ece.ufl.edu/courses/fall15/eel4720_5721/labs/refs/AXI4_specification.pdf. Accessed on March 2020.
[10] Xilinx. AXI Reference Guide, March 2011. [Online] Available at: https://www.xilinx.com/support/documentation/ip_documentation/ug761_axi_reference_guide.pdf. Accessed on March 2020.
[11] João D. Lopes and José T. de Sousa. Versat, a Minimal Coarse-Grain Reconfigurable Array. In High Performance Computing for Computational Science - VECPAR 2016, Lecture Notes in Computer Science, pages 174–187, Porto, Portugal, July 2017. Springer International Publishing.
[12] Rajeev Kamal and Neeraj Yadav. NOC AND BUS ARCHITECTURE: A COMPARISON. International Journal of Engineering Science and Technology, 4, April 2012.
[13] Krste Asanović, Rimas Avizienis, Jonathan Bachrach, Scott Beamer, David Biancolin, Christopher Celio, Henry Cook, Daniel Dabbelt, John Hauser, Adam Izraelevitz, Sagar Karandikar, Ben Keller, Donggyu Kim, John Koenig, Yunsup Lee, Eric Love, Martin Maas, Albert Magyar, Howard Mao, Miquel Moreto, Albert Ou, David A. Patterson, Brian Richards, Colin Schmidt, Stephen Twigg, Huy Vo, and Andrew Waterman. The Rocket Chip Generator. Technical Report UCB/EECS-2016-17, EECS Department, University of California, Berkeley, Apr 2016.
[14] Eric Matthews and Lesley Shannon. Taiga: a Configurable RISC-V Soft-processor Framework for Heterogeneous Computing Systems Research. 2017.
[15] PULP-Platform. PULPino, May 2019. [Online] Available at: https://github.com/pulp-platform/pulpino. Accessed on March 2020. GIT code repository.
[16] PULP-Platform. RI5CY, 2020. [Online] Available at: https://github.com/pulp-platform/riscv. Accessed on March 2020. GIT code repository.
[17] P. Davide Schiavone, F. Conti, D. Rossi, M. Gautschi, A. Pullini, E. Flamand, and L. Benini. Slow and steady wins the race? A comparison of ultra-low-power RISC-V cores for Internet-of-Things applications. In 2017 27th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS), pages 1–8, Sep. 2017.
[18] PULP-Platform. PULP, January 2020. [Online] Available at: https://github.com/pulp-platform/pulp. Accessed on March 2020. GIT code repository.
[19] C. Wolf et al. PicoRV32 - A Size-Optimized RISC-V CPU, November 2019. [Online] Available at: https://github.com/cliffordwolf/picorv32. Accessed on March 2020. GIT code repository.
[20] Tim Edwards. Raven: An ASIC implementation of the PicoSoC PicoRV32, October 2019. [Online] Available at: https://github.com/efabless/raven-picorv32. Accessed on March 2020. GIT code repository.
[21] RISC-V Foundation. RISC-V GNU Toolchain, 2018. [Online] Available at: https://github.com/riscv/riscv-gnu-toolchain. Accessed on March 2020. GIT code repository.
[22] The LLVM Compiler Infrastructure, 2020. [Online] Available at: https://llvm.org/. Accessed on March 2020.
[23] Incisive Enterprise Simulator, 2020. [Online] Available at: https://www.cadence.com/content/cadence-www/global/en_US/home/tools/system-design-and-verification/simulation-and-testbench-verification/incisive-enterprise-simulator.html. Accessed on March 2020.
[24] Mentor® ModelSim®, 2020. [Online] Available at: https://www.mentor.com/products/fv/modelsim/. Accessed on March 2020.
[25] Icarus Verilog, 2020. [Online] Available at: http://iverilog.icarus.com. Accessed on March 2020.
[26] Introduction to Verilator, 2020. [Online] Available at: https://www.veripool.org/wiki/verilator. Accessed on March 2020.
[27] Xilinx Vivado Design Suite - HLx Editions, 2020. [Online] Available at: https://www.xilinx.com/products/design-tools/vivado.html. Accessed on March 2020.
[28] Intel® Quartus® Prime Software Suite, 2020. [Online] Available at: https://www.intel.com/content/www/us/en/software/programmable/quartus-prime/overview.html. Accessed on March 2020.
[29] Olof Kindgren et al. FuseSoC, 2020. [Online] Available at: https://github.com/olofk/fusesoc. Accessed on March 2020. GIT code repository.
[30] José T. de Sousa et al. IOb-SoC, March 2020. [Online] Available at: https://bitbucket.org/jjts/iob-soc/src/master. Accessed on March 2020. GIT code repository.
[31] Avnet, Inc. Kintex UltraScale KU040 Development Board, 2015. [Online] Available at: https://www.avnet.com/opasdata/d120001/medias/docus/13/aes-AES-KU040-DB-G-User-Guide.pdf. Accessed on March 2020.
[32] Spartan-6 FPGA SP605 Evaluation Kit, 2020. [Online] Available at: https://www.xilinx.com/products/boards-and-kits/ek-s6-sp605-g.html. Accessed on March 2020.
[33] Cyclone V GT FPGA Development Kit, 2020. [Online] Available at: https://www.intel.com/content/www/us/en/programmable/products/boards_and_kits/dev-kits/altera/kit-cyclone-v-gt.html. Accessed on March 2020.
[34] José T. de Sousa et al. IOb-Cache, March 2020. [Online] Available at: https://bitbucket.org/jjts/iob-cache/src/master. Accessed on March 2020. GIT code repository.
[35] José T. de Sousa et al. IOb-RV32, March 2020. [Online] Available at: https://bitbucket.org/jjts/iob-rv32/src/master. Accessed on March 2020. GIT code repository.
[36] José T. de Sousa et al. IOb-UART, March 2020. [Online] Available at: https://bitbucket.org/jjts/iob-uart/src/master. Accessed on March 2020. GIT code repository.
[37] Xilinx. UltraScale Architecture Clocking Resources, October 2019. [Online] Available at: https://www.xilinx.com/support/documentation/user_guides/ug572-ultrascale-clocking.pdf. Accessed on March 2020.
[38] Micron Technology. DDR4 SDRAM EDY4016A - 256Mb x 16 Datasheet, July 2017. [Online] Available at: https://www.micron.com/products/dram/ddr4-sdram/part-catalog/edy4016aabg-dr-f. Accessed on March 2020.
[39] José T. de Sousa et al. Verilog AXI Components, March 2020. [Online] Available at: https://github.com/jjts/verilog-axi/tree/bce3d5e93398e3ee628f60a755f46fd6d92ad8db. Accessed on March 2020. GIT code repository.
[40] José T. de Sousa et al. IOb-Timer, March 2020. [Online] Available at: https://bitbucket.org/jjts/iob-timer/src/master. Accessed on March 2020. GIT code repository.
[41] Xilinx ISE Design Suite, 2020. [Online] Available at: https://www.xilinx.com/products/design-tools/ise-design-suite.html. Accessed on March 2020.
[42] José T. de Sousa et al. IOb-SoC-Dhrystone, March 2020. [Online] Available at: https://bitbucket.org/jjts/iob-soc-dhrystone/src/master/. Accessed on March 2020. GIT code repository.
[43] Reinhold P. Weicker. Dhrystone: A Synthetic Systems Programming Benchmark. Commun. ACM, 27(10):1013–1030, October 1984.
[44] Roy Longbottom. Dhrystone Benchmark Results On PCs and Later Devices, August 2017.