G1 Report
Abstract—This paper presents the design and implementation of a 4-bit Arithmetic Logic Unit (ALU) on an FPGA. The ALU supports basic and extended functions such as addition, subtraction, multiplication, division, AND, OR, XOR, shift, and rotate. Some simple applications based on this ALU were developed, including a convolution layer hardware design and ultrasonic distance measurement. The entire design was written in the Verilog hardware description language for modular development, with simulation and verification performed in Vivado to ensure functional correctness. The ALU was ultimately deployed on an FPGA platform, and real-world tests were conducted on the implemented convolution layer and ultrasonic distance measurement applications. Results show that the ALU not only meets the required functionality but also performs convolution operations and distance measurement tasks successfully, demonstrating good scalability. However, the data precision of our applications is limited by the 4-bit data width.

Index Terms—Arithmetic Logic Unit, Convolution Layer Hardware Design, Field-Programmable Gate Array, Restoring Division, Ultrasonic Distance Measurement, Wallace Tree Multiplier.

I. INTRODUCTION
In computing, the Arithmetic Logic Unit (ALU) is a fundamental combinational logic circuit designed to execute a wide range of arithmetic and logic operations. As a core component of the Central Processing Unit (CPU), the ALU performs essential operations for data processing, including addition, subtraction, multiplication, division, and logic functions such as AND, OR, and XOR.
The ALU plays an important role in computing architecture [1]. Even the most basic microprocessors incorporate an ALU to handle core computational tasks. The ALU typically interfaces with the processor's control unit, memory, and I/O devices through the bus protocol. With the advancement of Field-Programmable Gate Array (FPGA) technology, the design of customized ALUs tailored to specific application requirements has become a practical solution.
This paper presents the design and implementation of a 4-bit ALU capable of supporting basic and extended operations. The ALU is applied in some simple cases, including the convolution layer in deep neural networks (DNNs) and ultrasonic ranging. The design process encompasses coding the ALU in Verilog and validating its functionality through simulation in Vivado. In addition to ensuring computational accuracy, we also deploy and verify the ALU and its applications on FPGA hardware. The design of each functional module is described in detail, and the reliability of the ALU is also tested and validated through a comprehensive evaluation.

II. METHODOLOGY
This section presents the implementation of the arithmetic and logical operations and modules, including a 4-bit adder, a Wallace Tree multiplier, a multi-cycle divider using restoring division, the logical and shift operations within the ALU, and the output display using 7-segment displays.

A. Addition and Subtraction
Addition and subtraction in the 4-bit ALU share a single adder circuit whose behavior is selected by the control signal K. Four full adders are connected in series, and each one processes a single bit position of the two 4-bit inputs. For addition (K = 0), the two binary numbers are added directly. For subtraction (K = 1), XOR gates invert the bits of the subtrahend and 1 is added, which forms its two's complement and converts the subtraction into an addition [2]. In both cases, only the lower 4 bits of the result are retained. Since this ALU uses 4-bit inputs and outputs, the calculation range is limited to 0 to 15.

B. Multiplication
In our project, we implement a multiplier based on the Wallace Tree method.

Fig. 1. Wallace Tree Multiplier. [3]

As shown in Fig. 1, the method first multiplies the 4-bit operands bit by bit to obtain 16 partial products. It then uses half adders (HAs) in columns 3 and 4 to compress the four rows of partial products into three rows. After that, an HA in column 2 and full adders (FAs) in columns 3, 4, and 5 compress the three rows into two rows. Finally, a 6-bit adder produces the result.
This method reduces the number of adders in the multiplier, thereby reducing hardware cost and improving computation speed. It is suitable for high-speed computing where high precision is not required.
Wallace tree multipliers have irregular circuit structures, which can lead to clock skew. However, the 4-2 compressor
TABLE I
MULTI-CYCLE DIVIDER INTERFACE SIGNALS
Port     Signal      Width (bits)   Description
Input    clk         1              Clock for synchronous control
Input    reset       1              Resets the divider's state
Input    start       1              Starts the division
Input    dividend    4              Input dividend
Input    divisor     4              Input divisor
Output   ready       1              Indicates completion
Output   quotient    4              Output quotient
Output   remainder   4              Output remainder

2) Finite State Machine Control
The core control logic of the divider is implemented using an FSM controller. Fig. 2 defines the operational flow of the divider. Upon reset, the divider enters the IDLE state. Once the start signal is activated, the FSM transitions to the INIT state and deactivates the ready signal. If the divisor is non-zero, the FSM moves to the CALC state to perform the restoring division. After the calculation is complete, the FSM transitions to the DONE state. Finally, it returns to the IDLE state and activates the ready signal, allowing communication with other modules.

D. Logical Operations
The designed ALU supports bitwise logical operations. In Verilog, bitwise operators apply Boolean algebra to the corresponding bits of their operands. These operations are fundamental in designing digital circuits and systems. Table II lists the logical operations supported by the designed ALU.

TABLE II
LOGICAL OPERATIONS SUPPORTED BY THE ALU
Function      Symbol   Description
Bitwise AND   &        Each bit of the output value is the result of ANDing the corresponding bits of the two input values.
Bitwise OR    |        Each bit of the output value is the result of ORing the corresponding bits of the two input values.
Bitwise XOR   ^        Each bit of the output value is the result of XORing the corresponding bits of the two input values.
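The multi-cycle divider of Table I and its FSM control can be illustrated with a cycle-level model. The Python sketch below is a behavioral approximation under stated assumptions, not the project's Verilog. The class and method names are hypothetical, one shift-subtract step is assumed per clock in CALC, and the handling of a zero divisor (skipping CALC and returning 0 for both outputs) is an assumption, since the text only describes the non-zero case.

```python
class RestoringDivider:
    """Cycle-level model of a 4-bit multi-cycle restoring divider."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Mirrors the reset input: the FSM enters IDLE.
        self.state = "IDLE"
        self.ready = 0
        self.quotient = 0
        self.remainder = 0
        self._a = self._q = self._m = self._count = 0

    def tick(self, start=0, dividend=0, divisor=0):
        """Advance the model by one clock edge."""
        if self.state == "IDLE":
            if start:
                self.ready = 0  # ready is deactivated on entering INIT
                self._q, self._m = dividend & 0xF, divisor & 0xF
                self.state = "INIT"
        elif self.state == "INIT":
            self._a, self._count = 0, 4
            if self._m != 0:
                self.state = "CALC"
            else:
                self._q = 0  # assumption: divide-by-zero yields 0, 0
                self.state = "DONE"
        elif self.state == "CALC":
            # Shift the {A, Q} register pair left by one bit.
            self._a = (self._a << 1) | (self._q >> 3)
            self._q = (self._q << 1) & 0xF
            # Trial subtraction. Hardware would subtract and then add the
            # divisor back on a negative result; here A is simply left
            # unchanged in that case, which is equivalent.
            if self._a >= self._m:
                self._a -= self._m
                self._q |= 1
            self._count -= 1
            if self._count == 0:
                self.state = "DONE"
        else:  # DONE: latch outputs, raise ready, return to IDLE
            self.quotient, self.remainder = self._q, self._a
            self.ready = 1
            self.state = "IDLE"


def divide(dividend, divisor):
    """Drive the model until ready is asserted."""
    d = RestoringDivider()
    d.tick(start=1, dividend=dividend, divisor=divisor)
    while not d.ready:
        d.tick()
    return d.quotient, d.remainder
```

For example, `divide(0b0011, 0b0010)` returns `(1, 1)`. In this model a division with a non-zero divisor takes seven clock edges from the start pulse until ready is asserted; the cycle count of the actual design may differ.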
system transitions to the LOAD state, deactivating the ready signal and setting the MAC unit's reset signal to 1. In the LOAD state, data from the convolution kernel and the input feature map is loaded into the MAC unit, and the reset signal is cleared. The system then moves to the CALC state. In the CALC state, the system iteratively performs MAC operations until the element-wise multiplication of the convolution kernel and the current window is completed. After that, the system transitions to the STORE state, writing the partial result into the output feature map register and clearing the MAC unit. As the sliding window moves over the input feature map, the system returns to the LOAD state and repeats the process until all sliding windows have been calculated. Finally, the system returns to the IDLE state and sets the ready signal to 1, indicating that the computation is complete.

B. Ultrasonic Ranging
Ultrasonic ranging is a technique that determines the distance between an object and a sensor using ultrasonic waves. The ultrasonic ranging module is shown in Fig. 6.
1) Every 100 ms, the FPGA transmits a trigger pulse to the Trig pin. The pulse width is 10 µs.
2) When a rising edge is captured on the Echo pin, the Timer is started and the design waits for the falling edge on the Echo pin.
3) As soon as the falling edge is captured on the Echo pin, the count of the Timer is read. The period of the Timer is 10 µs.
The speed of sound is 343 m/s, so the total distance is calculated with the following equation.

$$\text{Total Distance} = \frac{343 \times \text{Time of High (Echo) Pulse}}{2} \tag{4}$$

IV. RESEARCH RESULT

A. Operations Verification
In this section, the ALU functions and applications are verified on the FPGA board. The FPGA device is a Xilinx Artix-7 xc7a35tftg256-1, and verification is carried out in Vivado. The two input values of the ALU are set by the switches on the board, and the buttons select the function of the ALU. The result of the ALU module is shown on the 7-segment displays and in the ILA.
To test the ALU functions, one of the input values is set to 1000 and the other to 0011 in binary. The results of the 4-bit adder and subtractor are shown in Fig. 8.

Fig. 8. The results of the 4-bit adder and subtractor.

The results of the multiplier and divider are shown in Fig. 9. The input values are 0011 and 0010 in binary.

Fig. 12 shows the results of ASL and ASR. The input value is 1011.

Fig. 12. The results of ASL and ASR.

Fig. 13 shows the results of ROL and ROR. The input value is 1011.

Fig. 13. The results of ROL and ROR.
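The shift and rotate operations exercised with the input 1011 can be modelled in a few lines. This Python sketch assumes the usual definitions, which are not spelled out in this part of the report: ASL shifts left and fills with 0, ASR shifts right and keeps the sign bit, and ROL and ROR rotate the 4 bits circularly. The function names are hypothetical, and the values it prints are computed by the model, not read from the figures.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1  # 0b1111


def asl(x):
    """Arithmetic shift left: drop the MSB, shift in 0."""
    return (x << 1) & MASK


def asr(x):
    """Arithmetic shift right: shift right, replicate the MSB."""
    x &= MASK
    return (x >> 1) | (x & (1 << (WIDTH - 1)))


def rol(x):
    """Rotate left: the MSB wraps around to bit 0."""
    x &= MASK
    return ((x << 1) & MASK) | (x >> (WIDTH - 1))


def ror(x):
    """Rotate right: bit 0 wraps around to the MSB."""
    x &= MASK
    return (x >> 1) | ((x & 1) << (WIDTH - 1))


for name, op in (("ASL", asl), ("ASR", asr), ("ROL", rol), ("ROR", ror)):
    print(f"{name}(1011) = {op(0b1011):04b}")
# ASL(1011) = 0110
# ASR(1011) = 1101
# ROL(1011) = 0111
# ROR(1011) = 1101
```

Under these definitions ASR and ROR give the same result for 1011 because its least significant bit equals its sign bit; an input such as 1010 separates the two.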
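The distance computation of Eq. (4) in the ultrasonic ranging procedure can be written out as a short calculation. The Python sketch below assumes that the Timer count read at the falling edge of Echo is the number of 10 µs periods during which Echo was high. The function names are hypothetical, and the integer version is only one possible way to avoid fractional arithmetic in hardware, not necessarily the one used in this project.

```python
SPEED_OF_SOUND_M_PER_S = 343  # value used in Eq. (4)
TIMER_PERIOD_S = 10e-6        # Timer period of 10 us


def echo_distance_m(timer_count):
    """Distance in meters from the Echo high time, following Eq. (4)."""
    high_time_s = timer_count * TIMER_PERIOD_S
    return SPEED_OF_SOUND_M_PER_S * high_time_s / 2


def echo_distance_mm(timer_count):
    """Integer-only variant in millimeters.

    One Timer tick corresponds to 343 m/s * 10 us / 2 = 1.715 mm,
    which equals 343/200 mm, so the result is count * 343 // 200.
    """
    return timer_count * 343 // 200
```

For a count of 1000 (an Echo pulse of 10 ms), `echo_distance_m` gives about 1.715 m and `echo_distance_mm` gives 1715. With a 10 µs Timer period the resolution is about 1.7 mm per count.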
B. Applications Verification
1) Convolution Layer
To verify the convolution layer built on the ALU, we ran simulation
tests. As shown in Fig. 14, the convolution, which slides a kernel
window over the input feature map, was calculated step by step.
The first result was 8, followed by results of 8, 9, and a
(hexadecimal), in sequence. This stepwise computation confirmed
the final output feature map a988 and verified the correct
operation of the convolution layer hardware implementation.
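The sliding-window computation verified here can be sketched as a software model of a single MAC unit. The Python code below is illustrative only. The input feature map and kernel are hypothetical values, not those simulated in Fig. 14, and the accumulator is assumed to wrap at 4 bits, which the report does not state explicitly. The comments map the loop structure to the LOAD, CALC, and STORE steps of the control FSM.

```python
MASK = 0xF  # assumed 4-bit data path: results wrap modulo 16


def mac(acc, a, b):
    """One multiply-accumulate step, truncated to 4 bits."""
    return (acc + a * b) & MASK


def conv2d_valid(fmap, kernel):
    """Slide the kernel over fmap (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(fmap) - kh + 1):
        row = []
        for c in range(len(fmap[0]) - kw + 1):
            acc = 0                      # LOAD: MAC cleared, window loaded
            for i in range(kh):          # CALC: one MAC per kernel element
                for j in range(kw):
                    acc = mac(acc, fmap[r + i][c + j], kernel[i][j])
            row.append(acc)              # STORE: write the window result
        out.append(row)
    return out


# Hypothetical 3x3 input and 2x2 kernel, giving a 2x2 output feature map.
feature_map = [[1, 2, 0],
               [0, 1, 3],
               [2, 1, 1]]
kernel = [[1, 0],
          [2, 1]]
print(conv2d_valid(feature_map, kernel))  # [[2, 7], [5, 4]]
```

Because only one MAC unit is modelled, the four output values are produced strictly one after another, which is the parallelism limit of a single-MAC design.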
The design efficiently reuses ALU resources, provides clear control through an FSM, and enables modular integration for future upgrades. However, it is limited by the single 4-bit MAC unit design, which restricts parallelism and data precision.

Fig. 14. Simulation of the convolution layer.

2) Ultrasonic Ranging
For ultrasonic ranging, the ALU design is used with the HC-SR04 ultrasonic sensor module [6]. The distance is shown on four 7-segment displays. Fig. 15 shows that the distance between the object and the sensor is 211 mm.
The ultrasonic ranging design is straightforward to implement and highly adaptable. In addition, ultrasonic sensors can be used in various environments, making them versatile for measuring liquids, solids, or transparent objects. They are relatively accurate in short-range scenarios, especially in the range of 1 to 10 meters. However, it is difficult to measure small targets because ultrasonic waves have a wide beam and the sensor has poor directionality. The response of ultrasonic distance measurement is also usually slower than that of laser measurement, making it less suitable for applications requiring high-frequency measurements.

Fig. 15. The measuring result of the ultrasonic sensor.

V. CONCLUSION
In this project, we successfully designed and implemented an ALU on an FPGA that supports both basic and extended operations. The ALU was used in some simple applications, including convolution operations and ultrasonic ranging.
By deploying the ALU on an FPGA, we were able to verify the functionality of our design and implementation, demonstrating accurate results for operations such as multiplication using the Wallace tree model and the restoring division process. Additionally, the convolution layer application showed how the ALU could be adapted for use in neural networks, albeit with limitations due to the 4-bit data width.
While the project met its objectives, several challenges were encountered, including overflow in the 4-bit operations and clock skew due to the irregular circuit structures. Despite these challenges, the project provided valuable insights into ALU design and the use of FPGAs in practical applications. Future improvements could focus on increasing data precision, optimizing circuit structures, and addressing signal delay issues.
Overall, this project highlights the importance of hardware design in digital systems and demonstrates the potential of ALU design in complex computation for various applications.

REFERENCES
[1] D. A. Patterson and J. L. Hennessy, Computer Organization and Design, Fifth Edition: The Hardware/Software Interface. San Francisco, CA, USA: Morgan Kaufmann, 2013.
[2] GeeksforGeeks, "4-bit Binary Adder-Subtractor." [Online]. Available: https://www.geeksforgeeks.org/4-bit-binary-adder-subtractor/. Accessed: Sep. 12, 2024.
[3] C. S. Wallace, "A Suggestion for a Fast Multiplier," IEEE Transactions on Electronic Computers, vol. EC-13, no. 1, pp. 14-17, Feb. 1964, doi: 10.1109/PGEC.1964.263830.
[4] Electronics Tutorials, "7-Segment Display Tutorial." [Online]. Available: https://www.electronics-tutorials.ws/blog/7-segment-display-tutorial.html. Accessed: Sep. 12, 2024.
[5] Y. Chen, Y. Xie, L. Song, et al., "A Survey of Accelerator Architectures for Deep Neural Networks," Engineering, vol. 6, no. 3, pp. 264-274, 2020.
[6] ElectronicWings, "Ultrasonic Module HC-SR04." [Online]. Available: https://www.electronicwings.com/sensors-modules/ultrasonic-module-hc-sr04. Accessed: Sep. 12, 2024.