Integration
Zhenya Zang, Yao Liu, Ray C.C. Cheung
City University of Hong Kong, Kowloon Tong, Hong Kong
Email: zzang2-c@my.cityu.edu.hk, liu.yao@my.cityu.edu.hk, r.cheung@cityu.edu.hk
Abstract: In IoT (Internet of Things) applications, security issues are attracting increasing attention. However, current embedded processors lack cryptographic protection mechanisms. In this paper, a minimal RISC-V core processor implementing the RV32I instruction subset serves as the master device that cooperates with an AES cryptographic engine in an SoC; RISC-V is chosen for its openness and flexibility. The core has separate instruction and data buses connected to a Wishbone crossbar. A Spartan-6 XC6SLX9 board is used as the platform for architecture and protocol verification, on which the peak operating frequencies of the RISC-V core and the encryption SoC are 105 MHz and 111.5 MHz, respectively. The hardware resource utilization is reduced compared with a MIPS core with identical characteristics.

Index Terms: RISC-V, SoC, Security Processor, Wishbone

I. INTRODUCTION

Nowadays, IoT technology connects sensors, mobile devices and vehicles, through which important traffic signals, human health information and environmental indices can be collected and transferred. Some studies [22] show that the number of IoT devices will increase to 200 billion in 2020. Meanwhile, security is critical to IoT applications. In 2016, many servers of the American provider Dyn suffered an attack that resulted in a large-scale network crash, and services in many fields, such as payment, financial media and social media, could not work properly. Therefore, crucial information should be protected from attackers, and security mechanisms should be provided when sharing data.

Besides, owing to the large amount of data, the computational capability of devices in IoT systems should be enhanced significantly. One project [1] shows that software-based methods cannot process big data at high speed and throughput, whereas hardware-based infrastructures can implement sophisticated security computations effectively and efficiently. An FPGA (Field Programmable Gate Array) offers ample performance through parallel computing and reconfigurability. Therefore, in this paper, an FPGA is adopted as the platform to implement and verify a cryptographic SoC.

As for the master microcontroller, difficulties arise when licensed ISAs or micro-architectures are expensive and, if the IP (Intellectual Property) is used directly, cannot be modified and adapted to different application scenarios. It is difficult to make hardware-level modifications to x86 and ARM processors because of the IP protection. However, RISC-V is an open-source ISA (Instruction Set Architecture), so the micro-architecture can be designed according to our preferences and specific requirements. It is currently regarded as an architecture standard for educational and industrial applications, and it comprises a variety of modularized ISAs such as RV32I, RV32E, RV64I, and non-standard ISAs. The RV32I subset indicates a 32-bit address space, 32-bit integer instructions and 32 GPRs (General Purpose Registers), which is exactly what this work implements. The RISC-V Foundation provides compatible toolchains [2], with which both C and assembly code can be compiled to machine code. However, because this design involves many register configurations, it is more efficient to program in assembly code. RISC-V instructions are modularized and extensible, so the ISA reserves encoding space for designers to add specific accelerators that can be applied to the DSP domain, parallel computing and some machine learning algorithms [3].

In this paper, an efficient trade-off between resource overhead and throughput is achieved. The first part of the work implements an RV32I core with a 5-stage pipeline based on the MIPS core [4]. The peak operating frequency of the RV32I core is 105.1 MHz, and Harvard memory access is used. The simulation results are verified with ModelSim, and the bitstream is programmed onto a Spartan-6 XC6SLX9 board. The second part designs a cryptographic SoC with a peak operating frequency of 111.5 MHz based on the RV32I core. An AES IP core and a UART are used as the encryption engine and the communication peripheral, respectively. The simulated cipher is verified against a professional cryptographic calculator, and the hardware resource utilization and synthesis information are reported by the Xilinx Synthesis Technology (XST) tool. The next section reviews related work. The design details of the core and the SoC are described in Section III. Section IV contains the performance evaluation. Finally, Section V concludes the work.

The main contributions of this paper are listed as follows:
• We design an RV32I softcore with a 5-stage in-order multi-cycle pipeline and loosely coupled, separate instruction and data Wishbone buses.
• We implement an extensible, programmable and integrated SoC with an AES-128 encryption engine and other
Authorized licensed use limited to: KIIT University. Downloaded on March 30,2025 at 16:15:01 UTC from IEEE Xplore. Restrictions apply.
Fig. 1. RISC-V Core Micro Architecture
peripherals.
• The performance evaluation shows that the proposed work has small overhead and high speed.

II. RELATED WORK

In academia, there are various soft-processor prototypes that provide security mechanisms on FPGAs. For example, an FPGA-based IoT device [1] prevents the FPGA bitstream from being cracked, and its system image is encrypted to strengthen system security. Additionally, a single-chip processor [6] proposed a secure processor model that implements an AES engine, a TRNG (True Random Number Generator) and a memory integrity tree on the OpenSPARC FPGA platform to improve the security of the soft processor. However, it is a tightly coupled architecture that relies on a specific platform, so it might not be compatible with other platforms.

As for soft processors, there are several existing lightweight RISC-V cores, such as Z-scale [5] from Berkeley and ORCA [7] from VectorBlox. Z-scale is a three-stage in-order pipelined soft processor written in Chisel; it resembles the ARM Cortex series in application scenarios and targets embedded systems design. Therefore, it is not the appropriate processor for an FPGA framework. ORCA is written in VHDL, an HDL (Hardware Description Language), and its initial aim is to cooperate with commercial processors from the same vendor, so it is not customizable for a secure processor. To sum up, a compatible and lightweight processor should be designed for different purposes. In this paper, a general-purpose lightweight RV32I core is implemented. Moreover, an AES IP core is used as the encryption engine to encrypt the data transferred through the SoC.

III. DESIGN DETAILS

A. RISC-V (RV32I) core

The processor core consists of three top-level modules: the RV32I core, the IROM (Instruction ROM) and the DRAM (Data RAM). RV32I comprises 47 instructions, including several SCALL/SBREAK/CSR instructions that make the hardware implementation more complete. However, these instructions increase the implementation complexity and are not essential to this project. Hence, this RV32I core includes 38 instructions in total. The register file contains 32 registers, indexed from 0 to 31. All of them are general-purpose except register 0, which is hardwired to the constant 0, similar to the MIPS register convention. The RV32I base instruction set mainly includes four instruction formats, namely R, I, S and U, as shown in TABLE I. Every instruction is 32 bits wide, and memory is aligned on four-byte boundaries. The GPR indices are placed at fixed positions in every format, so the instruction decoder can conveniently decode the index of the corresponding register and then access the register file. Fig. 1 shows the data and control signal paths in the architecture. The decode unit has three types of input signals: the opcode at bits [6:0], the funct3 code at bits [14:12] and the funct7 code at bits [31:25].

TABLE I
RV32I ENCODING FORMAT [2]
R-Type   funct7       rs2   rs1   funct3   rd         opcode
I-Type   imm[11:0]          rs1   funct3   rd         opcode
S-Type   imm[11:5]    rs2   rs1   funct3   imm[4:0]   opcode
U-Type   imm[31:12]                        rd         opcode
Fig. 2. Elements allocation in MIPS IROM [4]
Fig. 3. Elements allocation in RISC-V IROM
Fig. 4. Configuring IROM experiment flow
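The fixed field positions that the decode unit relies on in Section III-A can be illustrated in software. The following is a minimal Python sketch, not the HDL of the proposed core: the function name `decode_fields` is hypothetical, and the bit ranges are those of the RV32I base encoding (opcode [6:0], rd [11:7], funct3 [14:12], rs1 [19:15], rs2 [24:20], funct7 [31:25]).

```python
def decode_fields(word: int) -> dict:
    """Split a 32-bit RV32I instruction word into its fixed-position fields.

    Hypothetical helper for illustration; immediates are not reassembled here.
    """
    word &= 0xFFFFFFFF  # keep only the low 32 bits
    return {
        "opcode": word & 0x7F,          # bits [6:0]
        "rd": (word >> 7) & 0x1F,       # bits [11:7]
        "funct3": (word >> 12) & 0x7,   # bits [14:12]
        "rs1": (word >> 15) & 0x1F,     # bits [19:15]
        "rs2": (word >> 20) & 0x1F,     # bits [24:20]
        "funct7": (word >> 25) & 0x7F,  # bits [31:25]
    }


# R-type example: "add x3, x2, x1", hex code 0x001101b3
fields = decode_fields(0x001101B3)
print(fields)
```

Because rs1, rs2 and rd sit at the same bit positions in every format that uses them, the register file can be indexed before the instruction type is fully resolved, which is the convenience the text refers to.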
Fig. 6. Operation forward handles data hazard

TABLE II
ASSEMBLY CODE TESTCASE AND CORRESPONDING HEX CODE, EXPECTED RESULT
Assembly Code          Hexadecimal Code   Expected Result
lui x1, 0x20000        200000b7           x1=0x20000000
ori x1,x1,0x100        1000e093           x1=0x20000100
lui x2, 0x30000        30000137           x2=0x30000000
add x3, x2, x1         001101b3           x3=0x50000100
addi x3,x3, 0xf        00f18193           x3=0x5000010f
sub x3, x3, x2         402181b3           x3=0x2000010f
lui x1, 0xf0000        ffff00b7           x1=0xfff00000
slt x2, x1, x0         0000a133           x2=0x00000001
sltu x2, x1, x0        0000b133           x2=0x00000000
lui x1, 0x00001        000010b7           x1=0x00001000
slti x3,x1, -0x500     b000a193           x3=0x00000000
sltiu x3,x1,-0x500     b000b193           x3=0x00000001
ori x4, x0, 0x123      12306213           x4=0x00000123
sw x4, 0x0(x0)         00402023           x4=0x00000123
ori x5, x0, 0x123      12306293           x5=0x00000123
lui x4, 0x0            00000237           x4=0x00000000
lw x4, 0x0(x0)         00002203           x4=0x00000123
beq x4, x5, label      00520263
label:
ori x5, x0, 0x0        00006293           x5=0x00000000
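The hexadecimal codes in TABLE II can be cross-checked against the formats in TABLE I. The sketch below is a minimal Python illustration, assuming the standard RV32I major opcodes (LUI 0x37, OP-IMM 0x13, OP 0x33, STORE 0x23, LOAD 0x03); the helper names `enc_r`, `enc_i`, `enc_s` and `enc_u` are hypothetical and are not part of the toolchain used in the paper.

```python
def enc_r(funct7, rs2, rs1, funct3, rd, opcode):
    """R-type: funct7 | rs2 | rs1 | funct3 | rd | opcode."""
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def enc_i(imm, rs1, funct3, rd, opcode):
    """I-type: imm[11:0] | rs1 | funct3 | rd | opcode (imm may be negative)."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def enc_s(imm, rs2, rs1, funct3, opcode):
    """S-type: imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode."""
    imm &= 0xFFF
    return ((imm >> 5) << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | ((imm & 0x1F) << 7) | opcode


def enc_u(imm20, rd, opcode):
    """U-type: imm[31:12] | rd | opcode."""
    return ((imm20 & 0xFFFFF) << 12) | (rd << 7) | opcode


# (assembly, encoded word, hexadecimal code listed in TABLE II)
program = [
    ("lui x1, 0x20000", enc_u(0x20000, 1, 0x37), 0x200000B7),
    ("ori x1, x1, 0x100", enc_i(0x100, 1, 0b110, 1, 0x13), 0x1000E093),
    ("add x3, x2, x1", enc_r(0x00, 1, 2, 0b000, 3, 0x33), 0x001101B3),
    ("sub x3, x3, x2", enc_r(0x20, 2, 3, 0b000, 3, 0x33), 0x402181B3),
    ("slti x3, x1, -0x500", enc_i(-0x500, 1, 0b010, 3, 0x13), 0xB000A193),
    ("sw x4, 0x0(x0)", enc_s(0x0, 4, 0, 0b010, 0x23), 0x00402023),
    ("lw x4, 0x0(x0)", enc_i(0x0, 0, 0b010, 4, 0x03), 0x00002203),
]

mismatches = [asm for asm, got, expected in program if got != expected]
for asm, got, expected in program:
    print(f"{asm:22s} {got:08x}")
```

Only a subset of the rows is encoded here; branch instructions use a different immediate layout and are left out of this sketch.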
is presented in TABLE II. After being converted into hexadecimal code, the program can be fetched using the program counter. TABLE II lists the hexadecimal code corresponding to each assembly instruction and the calculated results written back through the w_data and w_addr ports of the register file. Fig. 8 presents a portion of the simulated register indices and results in the ModelSim environment, which are exactly the same as the expected results shown in TABLE II.

B. Encryption SoC

In the preceding sections, the IROM and DRAM are embedded in the processor core. However, in practical applications the boot program is large, so the IROM cannot simply be embedded in the FPGA; external resources such as flash and SDRAM (Synchronous Dynamic Random Access Memory) are used instead. In this design, we implement an SoC prototype that contains a single-port block ROM and a single-port block RAM serving as the IROM and DRAM, respectively. Designers can change the memory depth by revising the macros. External memory can be used according to specific requirements as long as the module is compatible with the Wishbone bus interface specification. Because the storage structure is a Harvard architecture, the RISC-V core has separate instruction and data interfaces that occupy two master ports on the Wishbone bus. Fig. 9 shows the overall structure of the SoC. The UART controller, the GPIO controller, the AES controller, the BROM interface and the BRAM interface are connected to their respective slave ports. The address space of every slave device is described in TABLE III. In the preceding RISC-V core design, the address of the first fetched instruction is 0x00000000, whereas the first instruction address in the SoC should be 0x30000000, because the BROM works as a slave device whose address space starts at 0x30000000. Thus, a slight modification of the program counter module is required. Another issue is that the input and output ports of the AES IP core are not compatible with the Wishbone crossbar bus, so an interface between the Wishbone bus and the AES IP core is needed [15]. The registers for the plaintext, ciphertext, encryption key and control unit are exposed in this interface so that they can be configured conveniently by assembly code. The functions and addresses of the registers in the AES IP core interface are defined in TABLE IV. All data transferred through the interface on the SoC is configured by executing assembly code on the RISC-V soft core.

The AES IP core is defined as a slave device with base address 0x30000000. Hence, all registers in this module are addressed relative to 0x30000000. The data input register is 32 bits wide and stores the plaintext or the cipher when the IP core executes encryption or decryption. Due to this data width, a plaintext or cipher block must be transferred in four transactions through the Wishbone crossbar. Therefore, its addressing space is from base+0x0 to base+0x30 with a reset value of 0x0. Similarly, the key register is 32 bits wide; it stores the encryption key and provides it to the bus when the AES core executes encryption and decryption operations. The most significant register in this module is the command control register. The 1st bit of the control register is the enable signal that tells the AES module whether the core is ready to receive the anticipated plaintext or cipher and load it into the data input register. The 3rd bit decides whether the AES module executes encryption or decryption. TABLE V describes the functions of the control register.

IV. PERFORMANCE EVALUATION

A. RISC-V (RV32I) Core Properties

The timing performance and resource usage of the RISC-V core are obtained from the Xilinx synthesis report after translation, mapping and layout. The peak operating frequency of this core is 105.108 MHz on the Spartan-6 XC6SLX9 FPGA board, and it uses fewer slices and LUTs than the MIPS core with identical characteristics, such as supporting 32-bit integer computation and containing 32 GPRs. TABLE VI compares the hardware utilization and peak operating frequency of the MIPS and RISC-V cores: the RV32I core uses about 13% fewer slices (498 versus 574) and about 10% fewer LUTs (1796 versus 1998), and the DFFs mainly contribute to the area overhead. Therefore, the RV32I core achieves satisfactory performance with low overhead.

Fig. 9. SoC overview diagram

TABLE III
ADDRESSING SPACE OF EVERY SLAVE MODULE

TABLE IV
ADDRESSING SPACE AND DEFINITIONS OF REGISTERS IN AES IP CORE INTERFACE
Name of register   Addressing space          Bit width   Description
Input              Base+0x24 to Base+0x30    32          Store plain text
Key                Base+0x14 to Base+0x20    32          Store key
Output             Base+0x4 to Base+0x10     32          Store cipher
Control            Base+0x0                  32          Enable en/decryption
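The register map in TABLE IV and the control bits in TABLE V can be summarized as a sequence of 32-bit bus writes. The following Python sketch only models the addressing; it is not the assembly program used in the paper. The names `split_words` and `bus_writes`, the most-significant-word-first ordering, the order of the writes, and the reading of each address range as four consecutive 32-bit words are all assumptions made for illustration.

```python
AES_BASE = 0x30000000  # base address of the AES slave given in Section III-B

# Register addresses from TABLE IV, assuming four consecutive 32-bit words per range
CONTROL = AES_BASE + 0x00
OUTPUT = [AES_BASE + off for off in (0x04, 0x08, 0x0C, 0x10)]
KEY = [AES_BASE + off for off in (0x14, 0x18, 0x1C, 0x20)]
INPUT = [AES_BASE + off for off in (0x24, 0x28, 0x2C, 0x30)]

# Control bits from TABLE V (constant names are hypothetical)
CTRL_START = 1 << 0    # 1: begin to encrypt/decrypt
CTRL_READY = 1 << 1    # 1: ready to receive data and load it
CTRL_DECRYPT = 1 << 2  # 0: encryption task, 1: decryption task


def split_words(block: int) -> list:
    """Split a 128-bit value into four 32-bit words (word order is an assumption)."""
    return [(block >> shift) & 0xFFFFFFFF for shift in (96, 64, 32, 0)]


def bus_writes(key: int, data: int, decrypt: bool) -> list:
    """Return (address, value) pairs that a store sequence would issue."""
    writes = list(zip(KEY, split_words(key)))
    writes += list(zip(INPUT, split_words(data)))
    ctrl = CTRL_START | CTRL_READY | (CTRL_DECRYPT if decrypt else 0)
    writes.append((CONTROL, ctrl))
    return writes


# Test case of Section IV-B: key and ciphertext are both 0x0, decryption selected
for addr, value in bus_writes(key=0x0, data=0x0, decrypt=True):
    print(f"write 0x{value:08x} to 0x{addr:08x}")
```

Each pair corresponds to one store instruction executed by the RV32I core, so a 128-bit key and a 128-bit data block take four stores each before the control register is written.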
TABLE V
REGISTERS IN CONTROL UNIT
Bit Index   Description                                                      Reset Value
31:3        Reserved                                                         0x0
2           0: Execute encryption task                                       0x0
            1: Execute decryption task
1           0: Not ready to receive data                                     0x0
            1: Ready to receive data and load it into data input register
0           0: Not begin to encrypt/decrypt                                  0x0
            1: Begin to encrypt/decrypt

TABLE VI
HARDWARE UTILIZATION AND PEAK OPERATING FREQUENCY
Core          Slice   LUT    DFF    RAM   Peak Freq
MIPS32I [4]   574     1998   1749   48    107.479 MHz
RV32I         498     1796   1844   48    105.108 MHz

Fig. 10. The decrypted plain text data_o and its cipher text data_i with the key 0x0

B. Verification of Encryption SoC

The SoC is verified by configuring the corresponding registers through assembly code. In this test case, the configured input ciphertext and the output plaintext are shown in Fig. 10; the encryption key and the ciphertext are both set to 0x0. The 1st bit of the control register is set high, and the 2nd and 3rd bits are also set high, because this is a decryption process. After calculating the decrypted plaintext with a dedicated AES calculator, we observe that the result is identical to the value shown in Fig. 10. Additionally, the hardware resource utilization and operating frequency of the SoC are estimated by XST. After place and route, the peak operating frequency of the SoC reaches 111.5 MHz, whereas the hardware resource utilization does not increase dramatically, because several dedicated IP cores such as BROM and BRAM are used in this system and save substantial hardware resources in practice.

V. CONCLUSION

The work in this paper is divided into two parts. The first is the design of a lightweight RISC-V (RV32I) processor core with a five-stage in-order pipeline and Harvard memory access, which performs well in terms of operating speed and area overhead. The second is an encryption SoC built around a Wishbone crossbar. When the corresponding registers are configured, it can encrypt and decrypt the input data. According to the simulation, the SoC works properly and has a high operating frequency and low area overhead. Moreover, the architecture is programmable and extensible: all registers can be configured by initializing the IROM.

The entire SoC is loosely coupled, so other cryptographic engines such as ECC and SHA can also be added to this SoC for different encryption strengths and situations, as long as their interfaces are compatible with the Wishbone crossbar. In addition, most IoT sensors communicate over I2C and SPI, so these peripherals can be implemented on the SoC, and the data they transfer can be encrypted as well for diverse applications.

REFERENCES

[1] Yuan Liu, Jed Briones, Ruolin Zhou, and Neeraj Magotra, “Study of Secure Boot with a FPGA-based IoT Device”, 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), 2017.
[2] RISC-V Community, “The RISC-V Instruction Set Architecture”. [Online]. Available: http://risc-v.org/
[3] Michael Gautschi, Pasquale Davide Schiavone, Andreas Traber, Igor Loi, Antonio Pullini, Davide Rossi, Eric Flamand, Frank K, and Luca Benini, “Near-threshold RISC-V Core With DSP Extensions for Scalable IoT Endpoint Devices”, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, no. 10, October 2017.
[4] Silei Lei, “Design a CPU by ourselves”, Electronic Industry Press, 2014. (In Chinese)
[5] UC Berkeley Architecture Research, “Verilog Version of Z-scale”. [Online]. Available: http://github.com/ucb-bar/vscale
[6] Jakub M. Szefer, Wei Zhang, Yu-Yuan Chen, David Champagne, King Chan, Will X.Y. Li, Ray C.C. Cheung, and Ruby B. Lee, “Rapid Single-Chip Secure Processor Prototyping on the OpenSPARC FPGA Platform”, 22nd IEEE International Symposium on Rapid System Prototyping, 2011.
[7] Vectorblox Computing Inc., VectorBlox/risc-v. [Online]. Available: http://github.com/VectorBlox/risc-v
[8] A. Waterman, Y. Lee, D. Patterson, and K. Asanovic, “Volume I: User-level ISA Version 2.0, The Instruction Set Manual”. [Online]. Available: http://risc-v.org/spec/risc-v-spec-v2.0.pdf
[9] RISC-V Community, “GNU toolchain for RISC-V, including GCC”. [Online]. Available: https://github.com/riscv/riscv-gnu-toolchain
[10] Yao Liu, Ray C.C. Cheung and Hei Wong, “Lightweight Secure Processor Prototype on FPGA”, The International Conference on Field Programmable Logic and Applications (FPL), 2018.
[11] Ho-Cheung Ng, Liu Cheng and Kwok-Hay So, “A Soft Processor Overlay with Tightly-coupled FPGA Accelerator”, in 2nd International Workshop On Overlay Architecture For FPGAs (LOAF), 2016.
[12] Suseela Buli, Pradeep Gupta, Kuruvilla Varghese and Amrutur Bharadwaj, “A RISC-V ISA compatible processor IP for SoC”, in International Symposium on Devices, Circuits and Systems (ISDCS), 2018.
[13] Don Kurian Dennis, Ayushi Priyam, Sukhpreet Singh Virk and Sajal Agrawal, “Single Cycle RISC-V Micro Architecture Processor and its FPGA Prototype”, in 7th International Symposium on Embedded Computing and System Design (ISED), 2017.
[14] Wishbone System-on-Chip (SoC) Interconnection Architecture for Portable IP Cores, Revision B.3, 2002.9. [Online]. Available: http://opencores.org/ocsvn/wb_conmax/wb_conmax
[15] AES Encryption core. [Online]. Available: http://opencores.org/
[16] Imagination Technologies Limited, “MIPS Architecture.” [Online]. Available: http://imgtec.com./mips/architecture/
[17] Zhanghao Wu, RISC-V CPU in verilog. [Online]. Available: https://github.com/Michaelvll/RISCV_CPU
[18] Aneesh Raveendran, Vinayak Baramu Patil, David Selvakumar and Vivian Desalphine, “An RISC-V instruction set processor-micro-architecture design and analysis”, in International Conference on VLSI Systems, Architectures, Technology and Applications (VLSI-SATA), 2016.
[19] D. Oliveira, T. Gomes, and S. Pinto, “Towards a Green and Secure Architecture for Reconfigurable IoT End-Devices”, 9th ACM/IEEE International Conference on Cyber-Physical Systems, 2018.
[20] Karyofyllis Patsidis, Dimitris Konstantinou, Chrysostomos Nicopoulos, and Giorgos Dimitrakopoulos, “A low-cost synthesizable RISC-V dual-issue processor core leveraging the compressed Instruction Set Extension”, Microprocessors and Microsystems 61 (2018) 1-10.
[21] Samuel Steffl and Sherief Reda, “LAcore: A Supercomputing-Like Linear Algebra Accelerator for SoC-Based Design”, IEEE 35th International Conference on Computer Design, 2017.
[22] Intel, “A Guide To The Internet of Things”. [Online]. Available: http://www.intel.com/content/www/us/en/internet-of-things