Risc V1
1. Introduction
1.1 Objective
1.2 Scope of the Project
2.1 Overview of RISC-V Architecture
2.2 FPGA Technology and Applications
2.3 Related Work
4. Implementation
4.4 Memory and I/O Interfaces
5. Simulation and Testing
6. Results and Discussion
7.1 Advantages
7.2 Application
7.3 Disadvantages
Conclusion
Future Work
References
CHAPTER 1
INTRODUCTION
Advances in processor design and large-scale integrated circuit technology have made the processor the
central component of modern information-processing systems, working alongside memories and other
integrated circuits. The CPU is the core of the processor, so designing and implementing an effective CPU
has become a key technology.
At the same time, demand for microelectronics in the embedded field has grown year by year, which has
promoted the development of the RISC-V instruction set. RISC-V is free and open source, minimal, modular,
and extensible with custom instructions, which makes it a rare opportunity for the processor industry. The
integrated circuit industry is a national strategic industry, and it is the source and driving force behind
the development of the information industry.
Processor design has to consider the pipeline. The classic pipeline has five stages. More pipeline stages
give better throughput, but the overhead also grows, so this design adopts the three-stage pipeline depth
that is common in ARM cores.
Based on the RISC-V architecture, this work develops a processor core that supports a subset of the RV32IM
instruction set, including 47 base integer instructions and 8 integer multiplication and division
instructions. Using a three-stage pipeline, it realizes an in-order, single-issue, single-core 32-bit
RISC-V processor, which was simulated and verified to meet the predetermined goals.
1.1 Objective:
The primary objective of this project is to design, implement, and verify a RISC-V-based processor core on
an FPGA platform. The project aims to combine the strengths of the RISC-V architecture and the flexibility
of FPGAs to deliver a working prototype of a soft-core processor that can execute RISC-V instructions.
• To study and understand the RISC-V instruction set and its architecture.
• To develop a functional RISC-V processor core using hardware description languages (HDLs) such as
Verilog or VHDL.
• To simulate and verify the behavior of the processor core using industry-standard tools.
• To implement the verified design on an FPGA and test its real-time performance.
• To evaluate the processor's performance in terms of resource utilization, timing, and functionality.
This project will serve as a hands-on exercise in computer architecture, digital logic design, and hardware
prototyping.
1.2 Scope of the Project:
The scope of this project is defined by its focus on the design and implementation of a basic RISC-V
processor on an FPGA. The project will address the following key areas:
• Instruction Set Support: The project will implement the base RV32I instruction set (32-bit integer
instruction set), which includes arithmetic, logic, control flow, and memory operations. More advanced
extensions like floating point, vector operations, and compressed instructions are not within the scope
of this project.
• Processor Design: A simple, single-cycle or multi-cycle processor design will be developed. More
advanced pipelining, branch prediction, or out-of-order execution features are considered out of scope
for the initial implementation.
• Hardware Description: The processor will be described in Verilog or VHDL and simulated using tools
such as ModelSim or Vivado Simulator.
• FPGA Implementation: The final design will be synthesized and implemented on an FPGA development
board, such as the Xilinx Spartan-6 or Artix-7, depending on availability.
• Testing and Validation: Simple test programs will be written in RISC-V assembly, compiled using the
RISC-V GNU toolchain, and executed on the FPGA implementation.
While the project lays the foundation for a RISC-V-based processor design, more advanced features such as
operating system support, multi-core implementation, or cache subsystems may be considered for future
work beyond the scope of the current implementation.
CHAPTER 2
LITERATURE REVIEW
2.1 Overview of RISC-V Architecture:
RISC-V is a modern open-source instruction set architecture (ISA) developed at the University of
California, Berkeley. It is designed to be simple, modular, and extensible, allowing designers to build
customized processor architectures. The base instruction set (RV32I) includes 32-bit integer instructions
and is designed to be small yet sufficient for most general-purpose applications.
The modular nature of RISC-V allows developers to choose only the necessary components, enabling
efficient hardware implementation and reducing area and power consumption.
The figure above represents the basic components of a RISC-V core including the instruction fetch unit,
decode unit, register file, ALU, and memory interface. The open nature of RISC-V fosters innovation and
collaboration in both academic and industrial research.
A. RISC-V instruction set
RISC-V is an open-source instruction set architecture (ISA) based on the reduced instruction set computer
(RISC) principle. The fixed 47-instruction RV32I base is its core and the only module that every RISC-V
processor is required to support; RV32I alone is enough to run a complete software stack. All other
instruction subsets are optional modules, of which the representative ones are M, A, F, D, and C [3].
Different RISC-V configurations are obtained by combining these modules, as shown in Table 1.
B. RISC-V instruction format
Every RISC-V instruction used in this design is 32 bits long. An instruction is divided into several bit
fields, each with a specific function. These fields occupy the same bit positions across the different
instruction types, and the encodings of related instructions are similar, which greatly reduces the
complexity of processor design. The instruction formats used in this design are shown in Figure 1.
C. Pipeline technology
The execution of a RISC-V instruction generally passes through five stages: instruction fetch, decode,
execute, memory access, and write-back. The five stages are shown in Figure 2. Assuming that each stage
takes time T, executing three instructions with pipelining takes 7T, whereas executing them one after
another without pipelining takes 15T. Pipelining therefore greatly increases the number of instructions
completed per unit time. It is a common way to raise the clock frequency and performance of a digital
system, and it is a core technique of processor design.
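The 7T and 15T figures can be checked with a short calculation. The sketch below is illustrative only: it assumes an ideal pipeline with no stalls in which every stage takes the same time T, and the function names are hypothetical rather than part of the design.

```python
def sequential_time(n_instructions: int, n_stages: int, stage_time: float = 1.0) -> float:
    """Time without pipelining: each instruction runs through all stages alone."""
    return n_instructions * n_stages * stage_time


def pipelined_time(n_instructions: int, n_stages: int, stage_time: float = 1.0) -> float:
    """Time with an ideal pipeline: the first instruction fills the pipeline,
    then one further instruction completes per stage time."""
    if n_instructions == 0:
        return 0.0
    return (n_stages + n_instructions - 1) * stage_time


if __name__ == "__main__":
    # Three instructions on a five-stage pipeline, in units of T.
    print(sequential_time(3, 5), pipelined_time(3, 5))
```

For three instructions and five stages this gives 15T and 7T, matching the figures in the text. The gap widens as the instruction count grows, because the pipeline fill cost is paid only once.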
2.3 Related Work:
A significant amount of research has been conducted in the design and implementation of RISC-V
processors. Some notable works include:
• Rocket Chip: A RISC-V SoC generator developed at UC Berkeley. It includes support for multi-core
processors and advanced memory subsystems.
• PicoRV32: A size-optimized RISC-V core for low-resource FPGAs, configurable as an RV32E, RV32I,
RV32IC, RV32IM, or RV32IMC core.
• VexRiscv: A highly configurable RISC-V CPU core written in SpinalHDL, suitable for lightweight
FPGA implementations.
Academic projects and open-source communities have contributed immensely to the evolution of RISC-V.
These implementations serve as excellent references for designing a RISC-V processor in terms of
architecture, pipelining strategies, and system integration.
The motivation behind this project aligns with the current trend of adopting open ISAs for education,
research, and product development. By implementing a RISC-V processor on an FPGA, this project
contributes to the growing ecosystem of accessible and open processor technologies.
These references provide insights into design trade-offs, performance benchmarks, and hardware utilization
patterns that can guide the current implementation.
The RISC-V ecosystem is supported by a large open-source community and a variety of tools.
CHAPTER 3
DESIGN METHODOLOGY
3.1 System Overview:
The system is designed around a simple RISC-V core that can execute a subset of the RV32I instruction set.
The processor will interact with instruction and data memory, perform arithmetic and logic operations, and
support basic control flow instructions.
The processor operates in a fetch-decode-execute cycle. Instructions are fetched from instruction memory,
decoded by the control unit, executed by the ALU, and the results are written back to the register file or
memory. A program counter (PC) manages sequential instruction flow, while branching and jumping
instructions enable control flow changes.
The design of the RISC-V processor includes several essential hardware modules that work together to
execute instructions efficiently. The Arithmetic Logic Unit (ALU) is responsible for carrying out all
arithmetic and logical operations defined in the RV32I instruction set. The Register File comprises 32
general-purpose 32-bit registers, providing two read ports and one write port, enabling simultaneous data
access. The Control Unit plays a critical role in decoding instructions and generating appropriate control
signals to guide data flow and operation execution. Instruction Memory holds the program code, while Data
Memory facilitates the storage and retrieval of data during load and store operations. The Immediate
Generator extracts constant values embedded in instruction formats to support immediate-type operations.
Finally, the Program Counter (PC) maintains the address of the currently executing instruction and ensures
sequential flow or branching based on the instruction logic.
• Arithmetic Logic Unit (ALU): Executes arithmetic and logic operations defined in the RV32I ISA.
• Register File: Contains 32 general-purpose 32-bit registers. Supports two read ports and one write port.
• Control Unit: Decodes instructions and generates control signals.
• Instruction Memory: Stores the RISC-V program.
• Data Memory: Accessed for load and store operations.
• Immediate Generator: Extracts immediate values from instruction fields.
• Program Counter (PC): Holds the address of the current instruction.
This modular structure allows for simplicity in development and clarity during testing and debugging.
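As an illustration of the register file behavior described above, the following is a minimal behavioral model in Python. It is a sketch, not the project's Verilog: the class and method names are hypothetical, and it only assumes what the text states (32 registers of 32 bits, two read ports, one write port) together with the RV32I rule that x0 always reads as zero.

```python
class RegisterFile:
    """Behavioral model: 32 x 32-bit registers, two read ports, one write port."""

    def __init__(self) -> None:
        self._regs = [0] * 32

    def read(self, rs1: int, rs2: int) -> tuple:
        """Both read ports are served in the same call (same cycle)."""
        return self._regs[rs1], self._regs[rs2]

    def write(self, rd: int, value: int, write_enable: bool = True) -> None:
        """Single write port; writes to x0 are ignored so it stays zero."""
        if write_enable and rd != 0:
            self._regs[rd] = value & 0xFFFFFFFF  # keep values 32 bits wide


if __name__ == "__main__":
    rf = RegisterFile()
    rf.write(1, 10)
    rf.write(0, 99)       # ignored: x0 is hardwired to zero
    print(rf.read(0, 1))
```

A model like this can also serve as a reference when checking simulation results from the HDL register file.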
These tools facilitate each stage of the development lifecycle, from RTL design to testing on hardware.
The project is implemented in multiple structured and iterative stages, beginning with an in-depth analysis
of the RV32I instruction set architecture (ISA). This involves identifying the core operations required for
implementation. Following this, individual hardware modules like the ALU, register file, and control unit
are developed using Verilog HDL. These modules are then integrated into a cohesive top-level RISC-V
processor design. The complete system is verified through extensive simulation using testbenches in
ModelSim to ensure functional correctness. After successful simulation, the design is synthesized and
implemented on a Xilinx FPGA using Vivado, where the design is translated into a hardware bitstream. This
bitstream is subsequently used to program the FPGA board. The final validation phase involves executing
compiled RISC-V programs on the hardware and examining outputs via hardware interfacing or debugging
tools. Each of these phases feeds back into the process in an iterative manner, enabling continuous
refinement. This methodology ensures the processor design is reliable, efficient, and scalable for future
enhancements.
1. Instruction Set Analysis: Understanding the RV32I ISA and identifying the necessary operations to
implement.
2. Module Design: Designing individual components such as the ALU, register file, and control unit in
Verilog.
The development is iterative. After initial implementation, simulations help identify bugs or inefficiencies,
which are then corrected before final deployment on hardware.
This structured methodology ensures correctness and reliability of the processor design and allows for
scalable improvements in future iterations.
CHAPTER 4
IMPLEMENTATION
4.1 Processor Design:
The processor design centers around a modular, single-cycle architecture that executes instructions from the
RV32I base integer instruction set. Each instruction is processed in one clock cycle, simplifying timing
analysis and reducing control complexity. The design leverages clearly defined data paths and control signals,
making it highly suitable for educational and FPGA-based projects. Major modules such as the ALU, register
file, control unit, and memory blocks are designed independently and then integrated to form the processor
core. The processor operates by fetching instructions from memory, decoding them using the control unit,
executing them through the ALU, and writing results to registers or memory, as required.
A top-level Verilog module integrates these subsystems, handling instruction sequencing, branching,
memory interaction, and result forwarding. The architecture ensures deterministic execution and is a strong
foundation for implementing pipelining or advanced features in future iterations.
The implemented processor adheres to the RV32I ISA, the 32-bit base integer instruction set of RISC-V. It
includes support for arithmetic, logic, load/store, and control flow instructions. Key instructions include:
• Arithmetic and Logic: ADD, SUB, AND, OR, XOR, SLL, SRL, SRA
• Immediate Arithmetic: ADDI, ANDI, ORI, XORI
• Memory Operations: LW, SW
• Branch Instructions: BEQ, BNE, BLT, BGE
• Jump Instructions: JAL, JALR
The ISA supports 32 general-purpose registers (x0–x31), where x0 is hardwired to zero. Instruction encoding
follows fixed 32-bit formats (R, I, S, B, U, J), with dedicated fields for opcode, register indices, function
codes, and immediate values. The Immediate Generator module handles decoding of immediate values from
instruction formats, simplifying control logic.
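To make the field layout and the role of the Immediate Generator concrete, here is a small Python sketch that splits a 32-bit instruction word into its fields and rebuilds the immediate for each format. The bit positions follow the RV32I base encoding; the function names are hypothetical, and the code is a software model for illustration, not the project's Verilog module.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def decode_fields(instr: int) -> dict:
    """Extract the fixed-position fields shared by the instruction formats."""
    return {
        "opcode": instr & 0x7F,
        "rd": (instr >> 7) & 0x1F,
        "funct3": (instr >> 12) & 0x7,
        "rs1": (instr >> 15) & 0x1F,
        "rs2": (instr >> 20) & 0x1F,
        "funct7": (instr >> 25) & 0x7F,
    }


def immediate(instr: int, fmt: str) -> int:
    """Rebuild the sign-extended immediate for format I, S, B, U, or J."""
    instr &= 0xFFFFFFFF
    if fmt == "I":
        return sign_extend(instr >> 20, 12)
    if fmt == "S":
        imm = ((instr >> 25) << 5) | ((instr >> 7) & 0x1F)
        return sign_extend(imm, 12)
    if fmt == "B":
        imm = (((instr >> 31) & 0x1) << 12) | (((instr >> 7) & 0x1) << 11) \
            | (((instr >> 25) & 0x3F) << 5) | (((instr >> 8) & 0xF) << 1)
        return sign_extend(imm, 13)
    if fmt == "U":
        return sign_extend(instr & 0xFFFFF000, 32)
    if fmt == "J":
        imm = (((instr >> 31) & 0x1) << 20) | (((instr >> 12) & 0xFF) << 12) \
            | (((instr >> 20) & 0x1) << 11) | (((instr >> 21) & 0x3FF) << 1)
        return sign_extend(imm, 21)
    raise ValueError("R-type instructions carry no immediate")


if __name__ == "__main__":
    addi = 0x00A00093  # addi x1, x0, 10
    print(decode_fields(addi)["rd"], immediate(addi, "I"))
```

Because the register index fields sit in the same bit positions in every format, the decoder can read the register file before it knows the instruction type; only the immediate needs format-specific reassembly.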
Although the current implementation follows a single-cycle design, the data path is structured in a way that
supports future pipelining. The processor operates in a standard fetch-decode-execute-memory-writeback
cycle. The primary data path includes:
• Instruction Fetch: The Program Counter (PC) supplies the instruction address to the instruction memory.
• Instruction Decode: The fetched instruction is decoded, and operands are read from the register file.
• Execution: The ALU performs computations based on control signals.
• Memory Access: For load/store instructions, data memory is accessed.
• Write Back: The result is written back to the register file if applicable.
Control signals are generated to route data between modules correctly, and the ALU performs result
computation based on the decoded instruction. Branching is handled via PC updates based on branch
outcomes.
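The execute step and the branch-driven PC update can be sketched in software as follows. This is a hedged behavioral model covering only the register-register ALU operations and the four branch types listed earlier; the string operation names and the helper functions are assumptions made for illustration and do not reflect the control-signal encoding of the actual design.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value: int) -> int:
    """Reinterpret a 32-bit pattern as a signed two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def alu(op: str, a: int, b: int) -> int:
    """Compute a 32-bit result for one of the listed ALU operations."""
    a &= MASK32
    b &= MASK32
    shamt = b & 0x1F  # shifts use only the low five bits of operand b
    if op == "ADD":
        result = a + b
    elif op == "SUB":
        result = a - b
    elif op == "AND":
        result = a & b
    elif op == "OR":
        result = a | b
    elif op == "XOR":
        result = a ^ b
    elif op == "SLL":
        result = a << shamt
    elif op == "SRL":
        result = a >> shamt
    elif op == "SRA":
        result = to_signed(a) >> shamt  # arithmetic shift keeps the sign
    else:
        raise ValueError("unsupported ALU operation: " + op)
    return result & MASK32


def branch_taken(op: str, a: int, b: int) -> bool:
    """Evaluate the branch condition; BLT and BGE compare signed values."""
    if op == "BEQ":
        return (a & MASK32) == (b & MASK32)
    if op == "BNE":
        return (a & MASK32) != (b & MASK32)
    if op == "BLT":
        return to_signed(a) < to_signed(b)
    if op == "BGE":
        return to_signed(a) >= to_signed(b)
    raise ValueError("unsupported branch: " + op)


def next_pc(pc: int, branch_op, a: int, b: int, imm: int) -> int:
    """PC + imm when a branch is taken, otherwise the sequential PC + 4."""
    if branch_op is not None and branch_taken(branch_op, a, b):
        return (pc + imm) & MASK32
    return (pc + 4) & MASK32
```

For example, `next_pc(8, "BEQ", 1, 1, -4)` returns 4 (branch taken), while `next_pc(8, None, 0, 0, 0)` returns 12.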
4.4 Memory and I/O Interfaces:
The processor uses dual-port memories for instruction and data to allow simultaneous access during
execution. The memory modules are Verilog-based RAM blocks, with configurable sizes suitable for on-
chip block RAM in FPGAs. Load and store instructions support byte or word addressing.
The I/O interface is designed to be extensible. In the current implementation, I/O is simulated using
ModelSim testbenches. For hardware interfacing, GPIOs can be mapped to LEDs, switches, or UART for
serial communication on the FPGA board. Address decoding logic is added to distinguish between memory
and I/O-mapped accesses.
This modular approach to memory and I/O makes the processor adaptable for future peripheral integration
like timers, displays, or serial interfaces.
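The address decoding idea can be illustrated with a short Python sketch. The address map below is purely hypothetical (the report does not state base addresses or region sizes); it only shows how a load or store address is routed either to data memory or to a memory-mapped peripheral.

```python
# Hypothetical address map: (name, base address, size in bytes).
REGIONS = [
    ("ram", 0x0000_0000, 0x0001_0000),   # assumed 64 KiB of data memory
    ("gpio", 0x1000_0000, 0x10),         # assumed GPIO register block
    ("uart", 0x1000_0010, 0x10),         # assumed UART register block
]


def decode_address(addr: int) -> tuple:
    """Return (region name, offset within region) for a load/store address."""
    for name, base, size in REGIONS:
        if base <= addr < base + size:
            return name, addr - base
    raise ValueError("address 0x%08X is not mapped" % addr)


if __name__ == "__main__":
    print(decode_address(0x0000_0040))
    print(decode_address(0x1000_0014))
```

Adding a new peripheral such as a timer then only requires a new entry in the map, which mirrors the extensibility described above.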
This section provides a complete overview of the internal working and interconnection of modules within
the RISC-V processor, emphasizing simplicity and modularity while maintaining fidelity to the ISA
specification.
The designed RISC-V processor operates using a 3-stage pipeline architecture, which enhances
instruction throughput and performance compared to a single-cycle design. The pipeline stages are:
1. Instruction Fetch (IF): Retrieves the instruction from instruction memory using the current value of
the program counter.
2. Instruction Decode (ID): Decodes the instruction, reads operands from the register file, and generates
control signals.
3. Execute/Write-Back (EX/WB): Performs ALU operations, memory access (if required), and writes
the result back to the register file.
This segmentation allows multiple instructions to be processed simultaneously at different stages of
execution, resulting in improved utilization of hardware resources and better overall instruction
throughput.
Hazard Management
1. Data Hazards:
• Occur when instructions depend on the results of previous instructions. Basic forwarding logic is
implemented to mitigate these, allowing immediate reuse of results without stalling.
2. Control Hazards:
• Caused by branch or jump instructions that change the PC. In the current design, branch instructions
introduce a 1-cycle stall, handled by flushing or invalidating the fetched instruction when a branch is
taken.
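The two hazard mechanisms can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the project's forwarding logic: it assumes the only in-flight producer is the instruction currently in the EX/WB stage, and that each taken branch costs exactly one extra cycle as described above. All names are hypothetical.

```python
def forward_operand(rs: int, regfile_value: int,
                    ex_reg_write: bool, ex_rd: int, ex_result: int) -> int:
    """Select the operand for the instruction in ID.

    If the instruction in EX/WB is about to write the register being read
    (and that register is not x0), use its result directly instead of the
    stale register-file value.
    """
    if ex_reg_write and ex_rd != 0 and ex_rd == rs:
        return ex_result
    return regfile_value


def total_cycles(n_instructions: int, taken_branches: int,
                 n_stages: int = 3, branch_penalty: int = 1) -> int:
    """Cycle count: pipeline fill, one cycle per instruction, plus flushes."""
    if n_instructions == 0:
        return 0
    return (n_stages - 1) + n_instructions + taken_branches * branch_penalty


if __name__ == "__main__":
    # ADDI x1, x0, 10 followed immediately by an instruction reading x1.
    print(forward_operand(rs=1, regfile_value=0,
                          ex_reg_write=True, ex_rd=1, ex_result=10))
    print(total_cycles(n_instructions=10, taken_branches=2))
```

With forwarding, the dependent instruction sees the new value without a stall, so only taken branches add cycles beyond the ideal count.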
Pipeline Registers:
Pipeline registers are added between stages to hold intermediate values.
ID/EX Register: Holds decoded operands, control signals, and instruction fields.
These registers ensure data continuity between stages and preserve instruction flow during simultaneous
execution.
Modular and scalable design for future addition of stages (e.g., MEM or WB).
Limitations
Does not implement complex hazard resolution (e.g., full branch prediction or speculative execution).
CHAPTER 5
SIMULATION AND TESTING
Verification plays a critical role in ensuring the correctness and robustness of a processor design before it
is implemented on actual hardware. For this RISC-V processor, a top-down and modular verification
strategy was used, ensuring both individual module correctness and the overall processor behavior.
1. Unit-Level Verification
Each hardware block—such as the ALU, register file, control unit, instruction memory, data memory, and
immediate generator—was tested in isolation using dedicated Verilog testbenches.
• ALU: Tested for all arithmetic and logical operations (ADD, SUB, AND, OR, etc.) with both signed
and unsigned inputs. Overflow and carry handling were verified.
• Register File: Checked for dual-read, single-write behavior. x0 was verified to always output zero
regardless of writes.
• Immediate Generator: Verified that immediate values were correctly extracted from I, S, B, U, and
J-type instructions (R-type instructions carry no immediate).
• Control Unit: Ensured correct generation of control signals (ALUOp, MemRead, MemWrite, Branch,
etc.) based on the instruction opcode.
2. Integration-Level Verification
After verifying the modules individually, the full processor system was assembled and tested for end-to-end
correctness.
• Full programs (e.g., sorting arrays, arithmetic loops, and control-flow-based tasks) were compiled using
the RISC-V GNU Toolchain.
• The generated binary or hex files were loaded into simulated instruction memory.
• Simulations confirmed that the processor fetched, decoded, and executed instructions correctly, with
expected register and memory updates.
3. Test Coverage
To ensure completeness:
Instruction | Operation | Expected | Observed | Result
ADDI x1, x0, 10 | x1 ← 0 + 10 | x1 = 10 | x1 = 10 | Pass
LW x3, 0(x1) | Load from address 10 | x3 = mem[10] | Correct value loaded | Pass
• $display() and $monitor(): Used to output internal signals like ALU results, register writes, memory
addresses, and control signals.
• Assertions: Applied to check invariants (e.g., x0 == 0, write enable == 0 when not writing).
• Step-by-step simulation: Instructions were executed one at a time with breakpoints to check all
intermediate values.
• Instruction Tracing: Testbenches logged the PC, instruction type, and key operations for analysis.
• Incorrect Control Signals: For some branch instructions, ALUOp and PCSrc were incorrectly
generated. This was fixed by refining opcode decoding logic.
• Write Conflicts in Register File: The register file occasionally allowed simultaneous writes, fixed by
ensuring mutual exclusivity and timing correctness.
• Data Memory Misalignment: Byte addressing and word-alignment mismatches were discovered during
LW/SW. These were resolved by enforcing alignment in memory address calculation.
• Immediate Generator Bugs: Wrong sign extension for negative values in I-type instructions initially
led to incorrect execution.
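The alignment fix for LW/SW can be illustrated with a small behavioral model. The sketch below is an assumption about how such a check might look, not the project's implementation: memory is stored as 32-bit words, addresses are byte addresses, and a word access is rejected unless the address is a multiple of four. The class and method names are hypothetical.

```python
class DataMemory:
    """Word-organized memory addressed with byte addresses."""

    def __init__(self, n_words: int) -> None:
        self._words = [0] * n_words

    def _word_index(self, addr: int) -> int:
        """Convert a byte address to a word index, enforcing alignment."""
        if addr % 4 != 0:
            raise ValueError("misaligned word access at address %d" % addr)
        return addr >> 2

    def store_word(self, addr: int, value: int) -> None:
        self._words[self._word_index(addr)] = value & 0xFFFFFFFF

    def load_word(self, addr: int) -> int:
        return self._words[self._word_index(addr)]


if __name__ == "__main__":
    mem = DataMemory(16)
    mem.store_word(8, 0xDEADBEEF)
    print(hex(mem.load_word(8)))
```

In hardware, the same idea corresponds to dropping the two low address bits when indexing a word-wide RAM and flagging or masking accesses where those bits are nonzero.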
3. Performance Metrics
Although the processor is single-cycle, which limits frequency scalability, several performance
factors were evaluated:
• Critical Path: Instruction decode → control unit → ALU input → ALU result → register write.
• Instruction Latency: One clock cycle per instruction.
• Instruction Throughput: 1 instruction per clock cycle (ideal for simple designs).
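These metrics combine into the usual execution-time relation: time = instruction count × cycles per instruction ÷ clock frequency. The short Python sketch below works this out; the function name and the example numbers are illustrative assumptions, not measured results.

```python
def execution_time_s(n_instructions: int, cpi: float, clock_hz: float) -> float:
    """Execution time in seconds = instructions x cycles/instruction / frequency."""
    return n_instructions * cpi / clock_hz


if __name__ == "__main__":
    # Illustrative only: 1000 instructions, CPI of 1, assumed 50 MHz clock.
    print(execution_time_s(1000, 1.0, 50e6))
```

With a CPI of 1 the instruction count and the clock period alone set the run time, which is why the critical path length matters so much for a single-cycle design.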
CHAPTER 6
RESULTS AND DISCUSSION
ROM and RAM are responsible for storing programs and data [14]. To verify the ROM and RAM, test
programs are compiled into executable .bin files and then converted into an inst_rom.data file that can be
read by the Verilog system task $readmemh. Figure 19 shows part of a test program that prints “Hello RISC-V”
through the UART serial port, converted into instructions in the inst_rom.data file. The simulation waveform
of the ROM and RAM memories and the bus interface is shown in Figure 20. The signal data_o is the output of
the ROM and RAM memories, respectively. The ROM and RAM memories are connected to the slave device
interfaces s1 and s2 of the bus, respectively. The simulation waveform shows that the value of data_o equals
the value at the corresponding slave device interface, that is, the ROM and RAM memories work correctly.
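The conversion from a raw .bin image to a text file readable by $readmemh can be sketched in a few lines of Python. This is a hedged example, not the script used in the project: it assumes a little-endian image, one 32-bit word per output line, and zero padding of a trailing partial word; the function name and the file names in the usage comment are placeholders.

```python
import struct


def bin_to_readmemh(data: bytes) -> str:
    """Convert a little-endian binary image to one 8-digit hex word per line."""
    padded = data + b"\x00" * (-len(data) % 4)       # pad to a whole word
    words = struct.unpack("<%dI" % (len(padded) // 4), padded)
    return "\n".join("%08x" % w for w in words) + "\n"


if __name__ == "__main__":
    # Bytes of "addi x1, x0, 10" (0x00A00093) as stored little-endian.
    print(bin_to_readmemh(bytes.fromhex("9300a000")), end="")
    # Typical use (file names are placeholders):
    #   with open("test.bin", "rb") as src, open("inst_rom.data", "w") as dst:
    #       dst.write(bin_to_readmemh(src.read()))
```

Each output line then corresponds to one memory word, which is the layout $readmemh expects when the memory array is declared 32 bits wide.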
These results show that the design is highly resource-efficient, with plenty of headroom for extensions such
as pipelining or peripheral integration.
The timing analysis indicates that the design meets all setup and hold constraints, ensuring reliable
performance under operational conditions.
Hardware Testing: Simple programs written in RISC-V assembly (e.g., arithmetic operations, memory
manipulation, branching) were compiled using the RISC-V GNU toolchain, converted to binary memory
format, and loaded into the FPGA. The results were verified via GPIO LEDs, UART output, and logic
analysis tools.
Figure 6.2: FPGA Setup with Debug Probes and UART Output
The performance of the implemented processor was compared with similar academic and open-source
RISC-V cores on the same FPGA platform. Key metrics included clock frequency, resource utilization, and
instruction execution time.
While pipelined cores offer better performance in terms of throughput, our single-cycle
design provides predictability and simplicity, making it ideal for educational use and custom SoC
integration.
The bar chart in the image compares the performance of several RISC-V and ARM-based processors in
terms of Dhrystone MIPS per MHz (DMIPS/MHz), a common benchmark for evaluating processor
efficiency. Notably, the chart shows that PicoRV32 AXI implementations (including rv32i, mul, and fast
mul variants) exhibit relatively low performance, each achieving below 0.5 DMIPS/MHz. In contrast, the
ARM Cortex-A9 achieves a significantly higher score, around 2.2 DMIPS/MHz, showcasing the superior
performance of a high-end commercial processor.
Most significantly, “Our Processor” demonstrates even better performance than the Cortex-A9, achieving
approximately 2.4 DMIPS/MHz, making it the best-performing core in this comparison.
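For reference, DMIPS/MHz is derived from a raw Dhrystone score by dividing the Dhrystones per second by 1757 (the score of the VAX 11/780 baseline machine) and then by the clock frequency in MHz. The Python sketch below shows the arithmetic; the function name and the input values are hypothetical and are not measurements from this project.

```python
VAX_DHRYSTONES_PER_SECOND = 1757  # baseline that defines 1 DMIPS


def dmips_per_mhz(dhrystones_per_second: float, clock_mhz: float) -> float:
    """Normalize a Dhrystone score to DMIPS, then to DMIPS per MHz."""
    dmips = dhrystones_per_second / VAX_DHRYSTONES_PER_SECOND
    return dmips / clock_mhz


if __name__ == "__main__":
    # Illustrative input only: 175,700 Dhrystones/s at an assumed 50 MHz clock.
    print(dmips_per_mhz(175_700, 50.0))
```

Because the result is normalized by clock frequency, it reflects work done per cycle rather than absolute speed, so cores running at different frequencies can be compared on the same chart.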
This result highlights the efficiency and optimization of the custom-designed RISC-V processor, especially
when compared to lightweight or open-source alternatives like PicoRV32. It validates the design choices
made in the project and confirms that the processor can compete with and even surpass existing architectures
in specific benchmarks.
Dept. of ECE, SKIT 21 2024-25
Design and Verification of Three-stage Pipeline CPU Based on RISC-V Architecture.
Challenge: Achieving timing closure on a single-cycle design is challenging due to long critical paths,
especially in ALU and branching logic.
Solution: Optimization techniques such as flattening hierarchy, retiming, and careful placement of logic
blocks in Vivado were employed. The ALU design was reviewed and redundant operations were minimized.
Challenge: Feeding compiled RISC-V programs into the instruction memory of the FPGA posed integration
difficulties.
Solution: A script was developed to convert .hex files into memory initialization format (MIF) compatible
with the Verilog memory module. Automated loading through Vivado’s memory editing tools streamlined
the testing process.
3. Verification Complexity.
Challenge: Manually verifying every instruction path and execution result can be error-prone.
Solution: Comprehensive testbenches were created in ModelSim, and waveform analysis was conducted
using GTKWave. Unit tests for individual modules ensured correctness before system-level integration.
Challenge: Debugging on FPGA is constrained by limited I/O and lack of visibility into internal states.
Solution: UART interfaces were implemented for serial output of register values. Additionally, ILA
(Integrated Logic Analyzer) cores from Xilinx were used for on-chip signal monitoring.
Challenge: Incorrect decoding of some R-type and B-type instructions due to improper control signal
generation.
Solution: Extensive verification of the control unit was carried out using instruction-by-instruction
comparison against the RISC-V specification.
The FPGA implementation validates the functional correctness and hardware efficiency of the designed
RISC-V processor. While not aiming for maximum performance, the project succeeded in demonstrating a
practical and modular approach to custom CPU design, paving the way for future enhancements such as
pipelining, interrupt handling, and peripheral expansion.
The FPGA implementation of the RISC-V processor yielded several significant outcomes. The RV32I-
based processor was successfully synthesized and deployed on a Xilinx FPGA using Vivado, with all core
components—including the ALU, control unit, register file, and memory—functioning correctly.
Simulation in ModelSim confirmed the accurate execution of arithmetic, logic, load/store, and control flow
instructions. The single-cycle architecture allowed each instruction to execute within one clock cycle,
simplifying the control logic and enabling deterministic behavior. Dual-port RAM enabled concurrent
access to instruction and data memory, optimizing memory handling.
The modular design facilitated independent testing and streamlined integration of subsystems. Post-
synthesis performance analysis indicated stable operation at frequencies ranging between 50–100 MHz,
depending on the FPGA board used, with low resource utilization, providing headroom for further
enhancements. Simulation and waveform analysis tools like GTKWave supported thorough debugging,
helping resolve timing issues and refine instruction decoding logic.
The architecture is also scalable, allowing future extensions such as pipelining, peripheral interfacing, or
interrupt handling with minimal structural changes. Overall, the implementation validated the design’s
correctness, efficiency, and extensibility.
CHAPTER 7
APPLICATION, ADVANTAGES AND DISADVANTAGES
7.1 Advantages:
1. Open-Source and ISA Compliance
The project implements the RV32I base instruction set, which is part of the open-source RISC-V standard.
This brings immense flexibility and ensures compatibility with various compilers, development tools, and
educational resources. Unlike proprietary ISAs, RISC-V allows unrestricted customization and distribution,
making it an excellent choice for research and academia.
With just 7.2% of LUTs and 3.8% of flip-flops used on the Artix-7 FPGA, the processor design proves to
be lightweight and highly resource-efficient. This minimal resource footprint leaves ample room for future
expansion, such as adding pipelining stages, memory hierarchies, or integrated peripherals—without
exceeding FPGA capacity.
The single-cycle architecture ensures one instruction is executed per clock cycle, making the design easy
to understand and debug. Furthermore, the design is modular, with separate Verilog files for the ALU,
control unit, register file, and memory interface. This promotes reusability and simplifies unit-level
verification.
The project successfully bridged simulation and real hardware implementation. Programs were compiled
using the RISC-V GNU toolchain and loaded onto the FPGA for execution. Output verification through
UART and GPIOs confirmed hardware accuracy, validating the end-to-end toolchain and design flow.
The design acts as a hands-on learning platform for understanding how a CPU works—from fetch-decode-
execute cycles to ALU design and instruction control logic. It provides engineering students or hobbyists a
tangible and functional model of a CPU that they can interact with, modify, and extend.
7.2 Applications:
1. Academic Teaching Tool
Universities and institutions can adopt this project to teach computer architecture, digital design, and
embedded systems. Since it covers ISA-level instruction decoding, ALU operations, and memory
interfacing, it serves as a practical complement to theoretical coursework.
Given its low resource utilization and predictable timing, this processor can be deployed in embedded
control systems for simple automation tasks, such as home appliance control, basic robotics, or industrial
sensors.
This core can serve as the CPU within a prototype system-on-chip (SoC). Designers can integrate
peripherals like UART, GPIO, SPI, or timers around the processor to form a customized computing system
on FPGA.
In low-to-medium complexity IoT applications where processing requirements are modest, this RISC-V
core can function as a control processor. With future additions like low-power modes or wireless modules,
it could serve well in smart agriculture, home automation, or wearable tech.
This project lays a foundation for exploring advanced CPU features, including pipeline optimization, branch
prediction, cache systems, or out-of-order execution. It is ideal for M.Tech/Ph.D. research into processor
7.3 Disadvantages:
Although simple and predictable, single-cycle architectures suffer from long critical paths—especially as
instruction complexity increases. The entire instruction must complete in one cycle, leading to lower clock
frequencies compared to pipelined designs.
Currently, the processor only supports the base RV32I ISA. Essential extensions like RV32M (for
multiplication/division), RV32F (floating point), or RV32C (compressed instructions) are absent. This
limits the range of software that can run on the processor.
The processor cannot respond to asynchronous events or handle faults, making it unsuitable for real-time
systems or applications that require system calls, multitasking, or OS-level features.
Without a dedicated debug interface like JTAG or a memory-mapped debug unit, developers must rely on
UART prints or signal probing using ILA. This makes complex debugging time-consuming and hardware-
dependent.
Unlike commercial processors, this design lacks a full-featured SDK or IDE integration. Binary image
generation and memory loading must be done manually or through scripts, which could hinder user adoption
or large-scale testing.
CONCLUSION
This project successfully demonstrates the complete design, implementation, and validation of a custom
single-cycle RISC-V RV32I processor on an FPGA platform. Beginning from ISA-level understanding and
RTL coding in Verilog, to simulation, synthesis, and physical deployment on the Xilinx Artix-7 FPGA,
every stage was carefully executed with a strong emphasis on correctness, modularity, and resource
efficiency.
The processor proved to be functionally correct and capable of executing real RISC-V programs, with
outputs verified via UART and logic analysis tools. The simplicity of its single-cycle architecture keeps
the design easy to follow, making it well suited for educational purposes and foundational research. The
project showcased the flexibility and advantages of the open-source RISC-V ecosystem, supported by the
GNU toolchain and FPGA development tools such as Vivado.
While performance trade-offs were acknowledged—particularly in terms of clock speed, lack of pipelining,
and absence of interrupts—the processor design remains an excellent proof-of-concept. Its low resource
utilization and clean modular structure make it well-suited for future enhancements, including pipelining,
peripheral integration, interrupt handling, and ISA extension.
Ultimately, this project serves not only as a technical achievement in processor design but also as a robust
platform for learning, teaching, and extending into more complex embedded systems and custom SoC
architectures.
The implementation of a RISC-V RV32I single-cycle processor on a Xilinx FPGA has proven to be both
functional and resource-efficient. All core components—including the ALU, control logic, register file,
and memory interface—were designed in Verilog and successfully synthesized using Vivado. Functional
correctness was verified using simulation tools like ModelSim, and hardware validation was achieved by
executing RISC-V programs on the FPGA. The project achieved a good balance between simplicity and
performance, with sufficient room for future scalability. Key design goals such as modularity, ISA
compliance, and testability were met effectively.
FUTURE ENHANCEMENTS
Although the single-cycle RISC-V processor successfully met its core objectives—functionality, ISA
compliance, and FPGA implementation—there are numerous opportunities to improve and expand the
design for more advanced applications.
One of the most significant enhancements would be introducing a pipelined architecture. A 5-stage pipeline
(Fetch, Decode, Execute, Memory, Write-back) would improve performance by increasing instruction
throughput and enabling higher clock frequencies. However, this would require hazard detection
mechanisms and more complex control logic.
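As an illustration of the hazard detection mentioned above, the following Python model sketches the classic
load-use check for a 5-stage pipeline. The class and field names are hypothetical, and the sketch assumes
that all other read-after-write hazards are resolved by forwarding. It is not part of the current
single-cycle design.

```python
from dataclasses import dataclass


@dataclass
class IdExReg:
    """Subset of the ID/EX pipeline register (hypothetical field names)."""
    mem_read: bool  # the instruction in EX is a load
    rd: int         # destination register of that instruction


@dataclass
class IfIdReg:
    """Source registers of the instruction currently in decode."""
    rs1: int
    rs2: int


def load_use_stall(id_ex: IdExReg, if_id: IfIdReg) -> bool:
    """Return True when decode must stall one cycle behind a load in EX."""
    if not id_ex.mem_read or id_ex.rd == 0:  # x0 is hard-wired to zero
        return False
    return id_ex.rd in (if_id.rs1, if_id.rs2)
```

When the function returns True, the control logic would hold the PC and the IF/ID register for one cycle and
insert a bubble into EX, so the loaded value can be forwarded from the memory stage in the following cycle.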
Expanding the instruction set is another valuable upgrade. Adding support for RV32M (multiplication and
division), RV32C (compressed instructions), and RV32F (floating-point operations) would broaden the
processor’s usability, particularly for computation-heavy or memory-constrained applications.
Real-time capabilities could be added by implementing interrupt and exception handling. This would allow
the processor to respond to asynchronous events and improve its utility in embedded systems and IoT
devices.
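A minimal sketch of what trap entry could look like is given below as a Python model. It assumes the design
would adopt the machine-mode CSRs `mtvec`, `mepc`, and `mcause` from the RISC-V privileged architecture. The
`Csrs` class and the `take_trap` function are hypothetical names, and the `mstatus` updates are omitted for
brevity.

```python
from dataclasses import dataclass

INTERRUPT_BIT = 1 << 31  # the MSB of mcause marks an interrupt on RV32


@dataclass
class Csrs:
    """Minimal machine-mode CSR state (mstatus handling is left out)."""
    mtvec: int = 0
    mepc: int = 0
    mcause: int = 0


def take_trap(pc: int, csrs: Csrs, cause: int, is_interrupt: bool) -> int:
    """Record the trap in the CSRs and return the next PC."""
    csrs.mepc = pc                                   # where to resume after the handler
    csrs.mcause = (INTERRUPT_BIT | cause) if is_interrupt else cause
    base = csrs.mtvec & ~0x3                         # low two bits of mtvec hold the mode
    vectored = (csrs.mtvec & 0x3) == 1
    if is_interrupt and vectored:
        return (base + 4 * cause) & 0xFFFFFFFF       # one vector slot per interrupt cause
    return base                                      # direct mode, and all exceptions
```

In hardware, the same steps would be performed by the control logic in a single cycle, with an `mret`
instruction restoring the PC from `mepc` at the end of the handler.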
Peripheral integration is also essential for practical deployment. Adding UART, SPI, GPIO, and timer
modules would transform the processor into a full-fledged SoC (System-on-Chip), capable of interacting
with external components and sensors.
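Peripherals of this kind are usually attached through memory-mapped I/O, where an address decoder selects
the target of each load or store. The sketch below models such a decoder in Python. The address map is
entirely hypothetical and would need to match the actual bus design.

```python
# Hypothetical address map: (name, base address, size in bytes).
ADDRESS_MAP = [
    ("ram",   0x0000_0000, 0x0001_0000),
    ("uart",  0x1000_0000, 0x0000_0100),
    ("gpio",  0x1000_0100, 0x0000_0100),
    ("spi",   0x1000_0200, 0x0000_0100),
    ("timer", 0x1000_0300, 0x0000_0100),
]


def decode_address(addr):
    """Return (peripheral name, offset) for a bus address, or None if unmapped."""
    for name, base, size in ADDRESS_MAP:
        if base <= addr < base + size:
            return name, addr - base
    return None
```

For example, `decode_address(0x1000_0004)` selects the UART at offset 4. In RTL, the same comparison would
drive the chip-select signal of each peripheral.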
For a better development experience, integrating a debugging interface such as JTAG and using on-chip
FPGA tools such as the Integrated Logic Analyzer (ILA) would significantly ease testing and error tracing.
Automated toolchains, including scripts to convert and load programs, would further streamline
development.
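As one possible shape for such a conversion script, the Python sketch below turns a raw binary image (for
example, one produced with `objcopy -O binary` from the GNU toolchain) into a text file with one 32-bit hex
word per line. It assumes a 32-bit-wide, little-endian instruction memory initialized with the Verilog
`$readmemh` system task. The function names and file paths are hypothetical.

```python
import struct


def bin_to_hex_lines(image: bytes) -> list:
    """Pack a raw binary image into 32-bit little-endian words as hex text."""
    padded = image + b"\x00" * (-len(image) % 4)    # pad up to a whole word
    words = struct.unpack(f"<{len(padded) // 4}I", padded)
    return [f"{word:08x}" for word in words]


def convert(bin_path: str, hex_path: str) -> int:
    """Write one hex word per line and return the number of words written."""
    with open(bin_path, "rb") as f:
        lines = bin_to_hex_lines(f.read())
    with open(hex_path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return len(lines)
```

A call such as `convert("program.bin", "program.hex")` could then be added to the build flow, so the memory
image is regenerated automatically whenever the program is recompiled.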
Lastly, power optimization techniques like clock gating and reduced switching activity can make the design
suitable for battery-powered and low-power applications. These enhancements would not only improve
performance but also extend the processor’s range of applications from academic projects to real-world
embedded systems.