
G1 Report

This paper details the design and implementation of a 4-bit Arithmetic Logic Unit (ALU) using FPGA technology, capable of performing various arithmetic and logical operations, including addition, subtraction, multiplication, and division. The ALU was tested through applications such as a convolution layer in deep neural networks and ultrasonic distance measurement, demonstrating its functionality and scalability. The design process involved using Verilog for coding and Vivado for simulation and verification, ensuring accurate performance on the FPGA platform.

Arithmetic Logic Unit Design and Applications based on FPGA
Feiyang Xu, Boyang Li, Haoran Yu, Xiang Fei


Abstract—This paper presents the design and implementation of a 4-bit Arithmetic Logic Unit (ALU) based on an FPGA. The ALU supports basic and extended functions such as addition, subtraction, multiplication, division, AND, OR, XOR, shift, and rotate. Two simple applications were built on this ALU: a convolution layer hardware design and ultrasonic distance measurement. The design was developed modularly in the Verilog hardware description language, and Vivado was used for simulation and verification to ensure functional accuracy. The ALU was then deployed on an FPGA platform, and the convolution layer and ultrasonic distance measurement applications were tested on real hardware. The results show that the ALU meets the required functionality and successfully performs convolution and distance measurement tasks, demonstrating good scalability. However, the data precision of these applications is limited by the 4-bit data width.

Index Terms—Arithmetic Logic Unit, Convolution Layer Hardware Design, Field-Programmable Gate Array, Restoring Division, Ultrasonic Distance Measurement, Wallace Tree Multiplier.

I. INTRODUCTION
In computing, the Arithmetic Logic Unit (ALU) is a fundamental combinational logic circuit that executes a wide range of arithmetic and logic operations. As a core component of the Central Processing Unit (CPU), the ALU performs essential data-processing operations, including addition, subtraction, multiplication, division, and logic functions such as AND, OR, and XOR.
The ALU plays an important role in computer architecture [1]. Even the most basic microprocessors incorporate an ALU to handle core computational tasks. The ALU typically interfaces with the processor's control unit, memory, and I/O devices over a bus. With the advancement of Field-Programmable Gate Array (FPGA) technology, designing customized ALUs tailored to specific application requirements has become a practical option.
This paper presents the design and implementation of a 4-bit ALU that supports basic and extended operations. The ALU is applied to two simple cases: the convolution layer of deep neural networks (DNNs) and ultrasonic ranging. The design process comprises coding the ALU in Verilog and validating its functionality through simulation in Vivado. Beyond checking computational accuracy in simulation, we also deploy and verify the ALU and its applications on FPGA hardware. The design of each functional module is described in detail, and the reliability of the ALU is tested and validated through a comprehensive evaluation.

II. METHODOLOGY
This section presents the implementation of the arithmetic and logical operations and modules: a 4-bit adder-subtractor, a Wallace tree multiplier, a multi-cycle divider using restoring division, the logical and shift operations within the ALU, and output display on 7-segment displays.
A. Addition and Subtraction
For addition and subtraction, the 4-bit ALU uses a single adder circuit whose input operands are conditioned by a control signal K. Four full adders are connected in series as a ripple-carry chain, each processing one bit of the two 4-bit inputs. For addition (K=0), the two binary numbers are added directly. For subtraction (K=1), XOR gates invert the bits of the subtrahend and 1 is added, which forms its two's complement and converts the subtraction into an addition [2]. In both cases, only the lower 4 bits are retained. Because the ALU uses 4-bit inputs and outputs, results are limited to the range 0-15.
B. Multiplication
In our project, the multiplier is based on the Wallace tree method.

Fig. 1. Wallace Tree Multiplier [3].

As shown in Fig. 1, the method first ANDs the bits of the two 4-bit operands pairwise to obtain 16 partial products. Half adders (HAs) in columns 3 and 4 then compress the four rows of partial products into three. Next, an HA in column 2 and full adders (FAs) in columns 3, 4, and 5 compress the three rows into two. Finally, a 6-bit adder produces the result.
This method reduces the number of adder stages in the multiplier, which lowers hardware cost and improves computation speed. It is well suited to high-speed computing, and because every partial product is summed exactly, the speedup comes without any loss of accuracy.
Wallace tree multipliers have irregular circuit structures, which can lead to clock skew. However, 4-2 compressors can be used to make the wiring between modules much more regular, and each 4-2 compressor can be implemented with two FAs.
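The adder-subtractor of Section II-A and the reduction schedule of Section II-B can be cross-checked against a small bit-level Python model before writing a Vivado testbench. This is a behavioral sketch, not the authors' Verilog: the helper names are hypothetical, and one detail is an assumption, namely that K also drives the carry-in of the first full adder to supply the "+1" of the two's complement, as in the textbook adder-subtractor. The multiplier follows the HA/FA placement described for Fig. 1 column by column, and every result is compared with ordinary integer arithmetic.

```python
"""Bit-level reference model of the 4-bit adder-subtractor (Sec. II-A) and the
Wallace-tree multiplier (Sec. II-B). A behavioral sketch, not the authors' RTL."""


def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry) for two input bits."""
    return a ^ b, a & b


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, carry) for three input bits."""
    return a ^ b ^ cin, (a & b) | (cin & (a ^ b))


def add_sub_4bit(a: int, b: int, k: int) -> int:
    """k=0: a + b, k=1: a - b, both truncated to 4 bits.

    Assumption: K also drives the carry-in, supplying the "+1" of the two's
    complement, as in the textbook adder-subtractor.
    """
    carry, result = k, 0
    for i in range(4):  # four full adders in a ripple-carry chain
        b_i = ((b >> i) & 1) ^ k  # XOR gate inverts the subtrahend bit when k=1
        s, carry = full_adder((a >> i) & 1, b_i, carry)
        result |= s << i
    return result  # the carry-out is dropped: only the lower 4 bits are kept


def reduce_stage(cols, ha_cols, fa_cols):
    """One compression stage: an HA in each column of ha_cols, an FA in each
    column of fa_cols. Sums stay in their column; carries move one column up."""
    out = [[] for _ in cols]
    for w, column in enumerate(cols):
        bits = list(column)
        if w in fa_cols:
            s, c = full_adder(bits.pop(), bits.pop(), bits.pop())
        elif w in ha_cols:
            s, c = half_adder(bits.pop(), bits.pop())
        else:
            out[w].extend(bits)  # column passes through unchanged
            continue
        out[w].extend(bits + [s])
        out[w + 1].append(c)
    return out


def wallace_multiply_4bit(a: int, b: int) -> int:
    """Unsigned 4x4 multiply following the reduction schedule described for Fig. 1."""
    cols = [[] for _ in range(8)]  # cols[w] holds the bits of weight 2**w
    for i in range(4):
        for j in range(4):  # 16 AND-gate partial products
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))
    cols = reduce_stage(cols, ha_cols={3, 4}, fa_cols=set())   # 4 rows -> 3 rows
    cols = reduce_stage(cols, ha_cols={2}, fa_cols={3, 4, 5})  # 3 rows -> 2 rows
    # Final carry-propagate addition: columns 1-6 act as the 6-bit adder,
    # column 0 passes straight through and the last carry becomes bit 7.
    product, carry = 0, 0
    for w, column in enumerate(cols):
        bits = column + [0] * (2 - len(column))
        s, carry = full_adder(bits[0], bits[1], carry)
        product |= s << w
    return product


if __name__ == "__main__":
    print(add_sub_4bit(0b1000, 0b0011, 0), add_sub_4bit(0b1000, 0b0011, 1))  # 11 5
    print(wallace_multiply_4bit(0b0011, 0b0010))  # 6
```

With the Section IV-A inputs 1000 and 0011, the model gives 1011 for the sum and 0101 for the difference, and 0011 × 0010 gives 0110. Modelling each stage as a fresh set of columns, rather than updating columns in place, keeps the carries of one stage out of the adders of that same stage, which mirrors how the hardware is wired.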
C. Division
In our project, we implement a multi-cycle divider using restoring division. The divider is built around a finite state machine (FSM), which sequences the multi-cycle calculation and keeps it stable.
1) Module interface
The interface signals are listed in TABLE I.

TABLE I
MULTI-CYCLE DIVIDER INTERFACE SIGNALS
Port    Signal     Width (bits)  Description
Input   clk        1             Clock for synchronous control
Input   reset      1             Resets the divider's state
Input   start      1             Starts a division
Input   dividend   4             Dividend
Input   divisor    4             Divisor
Output  ready      1             Indicates completion
Output  quotient   4             Quotient
Output  remainder  4             Remainder

2) Finite State Machine Control
The core control logic of the divider is an FSM controller whose operational flow is defined in Fig. 2. Upon reset, the divider enters the IDLE state. When the start signal is asserted, the FSM moves to the INIT state and deasserts the ready signal. If the divisor is nonzero, the FSM moves to the CALC state to perform the restoring division; when the calculation is complete, it transitions to the DONE state. Finally, it returns to the IDLE state and asserts the ready signal, allowing communication with other modules.

Fig. 2. State transition diagram of the multi-cycle divider.

A zero divisor is a special case that must be handled explicitly. In this design, when the divisor is zero, the FSM immediately sets the quotient to all 1s and the remainder to the dividend, then transitions to the DONE state. The divider therefore always finishes with a defined result instead of attempting an invalid division.
3) Restoring Division
In the CALC state, the divider performs the division using the restoring division algorithm. In each iteration, the partial remainder is shifted left by one bit, the next dividend bit (starting from the MSB) is shifted in, and the divisor is subtracted. If the difference is negative, the previous partial remainder is restored and the quotient bit is 0; otherwise the difference is kept and the quotient bit is 1. After four iterations, the 4-bit quotient and remainder are complete.
D. Logical Operations
The designed ALU supports bitwise logical operations. In Verilog, the bitwise operators apply a Boolean operation to each pair of corresponding bits of their operands. These operations are fundamental to the design of digital circuits and systems. TABLE II lists the logical operations the designed ALU can compute.

TABLE II
LOGICAL OPERATIONS SUPPORTED BY THE ALU
Function     Symbol  Description
Bitwise AND  &       Each output bit is the AND of the corresponding bits of the two inputs.
Bitwise OR   |       Each output bit is the OR of the corresponding bits of the two inputs.
Bitwise XOR  ^       Each output bit is the XOR of the corresponding bits of the two inputs.

These operations are essential in Verilog for manipulating binary data and are used extensively in designing and simulating digital hardware.
E. Shift/Rotate
The designed ALU supports shift and rotate operations. TABLE III lists the shift operations available in the designed ALU. For a 4-bit signal A, rotation can be realized with the following concatenations:
For ROL: A <= {A[2:0], A[3]}.   (1)
For ROR: A <= {A[0], A[3:1]}.   (2)
F. 7-Segment Display
In the design, 7-segment displays show the ALU result. As shown in Fig. 3, a 7-segment display consists of seven individual LED segments arranged in a rectangular form.
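As a sketch of how the FPGA can drive the segments, the patterns of TABLE IV can be stored as a lookup table, and a multi-digit value can be split into one pattern per display, lit one display after another. The function names and the active_low option are hypothetical: active_low is there only because common-anode displays light a segment when its pin is driven to 0, and whether the board's displays are wired that way is an assumption to check.

```python
"""Sketch of 7-segment decoding (Sec. II-F) using the patterns of TABLE IV."""

SEGMENT_NAMES = "abcdefg"

# TABLE IV rows, segments a..g; "1" means the segment is lit.
TABLE_IV = {
    0: "1111110", 1: "0110000", 2: "1101101", 3: "1111001", 4: "0110011",
    5: "1011011", 6: "1011111", 7: "1110000", 8: "1111111", 9: "1111011",
}


def decode_digit(digit: int, active_low: bool = False) -> str:
    """Return the a..g drive levels for one decimal digit.

    active_low is a hypothetical option for common-anode displays, whose
    segments light when their pins are driven to 0.
    """
    pattern = TABLE_IV[digit]
    if active_low:
        pattern = "".join("1" if bit == "0" else "0" for bit in pattern)
    return pattern


def lit_segments(digit: int) -> str:
    """Names of the lit segments, e.g. 'bc' for the digit 1."""
    return "".join(n for n, bit in zip(SEGMENT_NAMES, TABLE_IV[digit]) if bit == "1")


def scan_sequence(value: int, num_displays: int = 4) -> list[tuple[int, str]]:
    """Split a decimal value across the displays and return (display, pattern)
    pairs in the order the displays are lit one after another.
    Display 0 is the rightmost digit; leading zeros are not blanked."""
    return [(pos, decode_digit((value // 10 ** pos) % 10)) for pos in range(num_displays)]


if __name__ == "__main__":
    for display, pattern in scan_sequence(211):
        print(display, pattern)
```

For a reading of 211 on four displays, the sequence is 0110000, 0110000, 1101101 for the digits 1, 1, and 2, followed by 1111110 for the leading 0 on the leftmost display. In hardware, each pair would be held for a fixed time slot before moving to the next display.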
TABLE III
SHIFT OPERATIONS SUPPORTED BY THE ALU
Function  Symbol  Description
LSL       <<      Shifts the bits left by the specified number of positions; zeros fill the lower bits and the higher bits are discarded. It treats the operand as unsigned, and for left shifts the resulting bits are the same as for ASL.
LSR       >>      Shifts the bits right by the specified number of positions; zeros fill the higher bits and the lower bits are discarded.
ASL       <<<     Shifts the bits left by the specified number of positions; zeros fill the lower bits and the higher bits are discarded.
ASR       >>>     Shifts the bits right by the specified number of positions; the sign bit (most significant bit) is replicated into the vacated positions on the left. In Verilog, >>> sign-extends only when the operand is signed.

Each segment is labeled from "a" to "g", and the combination of illuminated segments forms the desired numeral or letter [4]. This design is simple and easy to read, which makes it well suited to applications such as calculators and measurement devices. The FPGA drives the segments according to TABLE IV.

Fig. 3. The structure of a 7-segment display.

To display several digits at once, the FPGA lights the segments of each 7-segment display in turn, switching quickly enough that all digits appear to be lit continuously.

TABLE IV
TRUTH TABLE FOR THE 7-SEGMENT DISPLAY (1 = SEGMENT ON)
Digit  a  b  c  d  e  f  g
0      1  1  1  1  1  1  0
1      0  1  1  0  0  0  0
2      1  1  0  1  1  0  1
3      1  1  1  1  0  0  1
4      0  1  1  0  0  1  1
5      1  0  1  1  0  1  1
6      1  0  1  1  1  1  1
7      1  1  1  0  0  0  0
8      1  1  1  1  1  1  1
9      1  1  1  1  0  1  1

III. ALU APPLICATIONS
This section describes two applications of the ALU: a hardware implementation of a convolution layer and an ultrasonic ranging function.
A. Convolution Layer
In DNNs, the convolution layer is a crucial component for feature extraction [5]. As shown in Fig. 4, we design an ALU-based multiply-accumulate (MAC) unit and integrate it with an FSM controller in the hardware implementation of the convolution layer of a convolutional neural network (CNN).

Fig. 4. Convolution Layer Module.

1) Convolution Operation
Convolution multiplies the convolution kernel element-wise with a local region of the input feature map and sums the products. Mathematically, for a kernel K and an input feature map I, the output at position (x, y) is:
$$O(x, y) = \sum_{i}\sum_{j} K(i, j) \cdot I(x + i,\, y + j) \tag{3}$$
2) MAC Unit
The MAC unit has a simple design consisting of a multiplier, an adder, and a register. This architecture reuses the existing resources of the ALU and tightly integrates the multiplication and addition operations.
3) Finite State Machine Control
The convolution layer is controlled by an FSM. Based on the inputs and the current state, the FSM control module sequences the operations of the MAC unit and iteratively accumulates the computation results.

Fig. 5. State transition diagram of the convolution layer.

As shown in Fig. 5, the system starts in the IDLE state and waits for the start signal. When the start signal is asserted, the
system transitions to the LOAD state, deasserting the ready signal and setting the MAC unit's reset signal to 1. In the LOAD state, data from the convolution kernel and the input feature map are loaded into the MAC unit, and the reset signal is cleared. The system then moves to the CALC state, where it iteratively performs MAC operations until the element-wise multiplication of the kernel with the current window is complete. It then transitions to the STORE state, writes the partial result into the output feature map register, and clears the MAC unit. As the sliding window moves across the input feature map, the system returns to the LOAD state and repeats the process until all windows have been computed. Finally, the system returns to the IDLE state and sets the ready signal to 1, indicating that the computation is complete.
B. Ultrasonic Ranging
Ultrasonic ranging determines the distance between an object and a sensor by means of ultrasonic waves. The ultrasonic ranging module is shown in Fig. 6.

Fig. 6. Ultrasonic Ranging module.

The basic principle is to emit ultrasonic pulses from a transmitter and measure the time the sound waves take to bounce off an object and return to the sensor. The timing diagram is shown in Fig. 7.

Fig. 7. Timing diagram of the ultrasonic ranging module.

1) Every 100 ms, the FPGA sends a trigger pulse with a width of 10 µs to the Trig pin.
2) When a rising edge is captured on the Echo pin, the FPGA starts the timer and waits for the falling edge on the Echo pin.
3) As soon as the falling edge is captured on the Echo pin, the FPGA reads the timer count. The timer period is 10 µs.
The speed of sound is 343 m/s, and the high Echo pulse lasts for the round trip to the object and back, so the distance to the object is half the product of the speed and the pulse duration:
$$\text{Distance} = \frac{343\ \text{m/s} \times t_{\text{Echo}}}{2} \tag{4}$$
where t_Echo is the duration of the high Echo pulse. With a 10 µs timer period, a count of N gives t_Echo = N × 10 µs, so each count corresponds to 343 m/s × 10 µs / 2 = 1.715 mm.

IV. RESEARCH RESULTS
A. Operations Verification
In this section, the ALU functions and applications are verified on the FPGA board. The FPGA is a Xilinx Artix-7 device (xc7a35tftg256-1), and verification was carried out in Vivado. The two ALU input values are set with the switches on the board, and the buttons select the ALU function. The ALU output is shown on the 7-segment displays and in the Integrated Logic Analyzer (ILA).
To test the ALU, one input is set to 1000 and the other to 0011 (binary). The results of the 4-bit adder and subtractor are shown in Fig. 8.

Fig. 8. The results of 4-bit adder and subtractor.

The results of the multiplier and divider, with inputs 0011 and 0010 (binary), are shown in Fig. 9.

Fig. 9. The results of 4-bit multiplier and divider.

The ALU supports AND, OR, and XOR operations. With inputs 1011 and 0010 (binary), the results of some of these logical operations are shown in Fig. 10.

Fig. 10. The results of 4-bit AND and XOR operations.

Fig. 11 shows the results of LSL and LSR for the input value 1011.

Fig. 11. The results of LSL and LSR.

Fig. 12 shows the results of ASL and ASR for the input value 1011.

Fig. 12. The results of ASL and ASR.

Fig. 13 shows the results of ROL and ROR for the input value 1011.

Fig. 13. The result of ROR and ROL.
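The test vectors above can be paired with a small reference model that predicts what the displays and ILA should show. The sketch below models the restoring divider of Section II-C (one quotient bit per iteration, with the zero-divisor rule), the bitwise operations of TABLE II, the shifts of TABLE III, and the rotations given by the concatenations {A[2:0], A[3]} (each bit moves one position toward the MSB, i.e. a rotate left) and {A[0], A[3:1]} (rotate right). The shift amount used for Figs. 11 and 12 is not stated, so a shift of 1 is an assumption, and all function names are hypothetical.

```python
"""Reference model for the Sec. IV-A checks: restoring divider (Sec. II-C),
bitwise logic (TABLE II), shifts (TABLE III), and rotates (Sec. II-E).
Operands are 4-bit unsigned values."""

MASK = 0xF  # 4-bit data path


def restoring_divide(dividend: int, divisor: int) -> tuple[int, int]:
    """Return (quotient, remainder), producing one quotient bit per iteration."""
    if divisor == 0:
        # Zero-divisor rule of Sec. II-C: quotient all 1s, remainder = dividend.
        return MASK, dividend
    quotient, remainder = 0, 0
    for i in range(3, -1, -1):  # dividend bits, MSB first
        remainder = (remainder << 1) | ((dividend >> i) & 1)  # shift in next bit
        remainder -= divisor  # trial subtraction
        if remainder < 0:
            remainder += divisor  # restore: quotient bit is 0
            quotient <<= 1
        else:
            quotient = (quotient << 1) | 1  # keep: quotient bit is 1
    return quotient, remainder


def lsl(a: int, n: int) -> int:
    return (a << n) & MASK  # zeros enter at the bottom, high bits are dropped


def lsr(a: int, n: int) -> int:
    return (a & MASK) >> n  # zeros enter at the top


def asl(a: int, n: int) -> int:
    return lsl(a, n)  # left shifts give the same bits as LSL


def asr(a: int, n: int) -> int:
    signed = a - 16 if a & 0x8 else a  # interpret bit 3 as the sign bit
    return (signed >> n) & MASK  # Python's >> on ints replicates the sign


def rol(a: int) -> int:
    return ((a << 1) | (a >> 3)) & MASK  # {A[2:0], A[3]}


def ror(a: int) -> int:
    return ((a >> 1) | (a << 3)) & MASK  # {A[0], A[3:1]}


if __name__ == "__main__":
    print(restoring_divide(0b0011, 0b0010))  # (1, 1): 3 / 2
    a, b = 0b1011, 0b0010  # logic inputs; shifts and rotates use a only
    results = {
        "AND": a & b, "OR": a | b, "XOR": a ^ b,
        "LSL": lsl(a, 1), "LSR": lsr(a, 1),  # shift amount 1 is an assumption
        "ASL": asl(a, 1), "ASR": asr(a, 1),
        "ROL": rol(a), "ROR": ror(a),
    }
    for name, value in results.items():
        print(name, format(value, "04b"))
```

For these inputs the model predicts a quotient of 1 and remainder of 1 for 0011 / 0010, and AND 0010, OR 1011, XOR 1001, LSL 0110, LSR 0101, ASL 0110, ASR 1101, ROL 0111, and ROR 1101. These are the model's predictions for comparison, not values read from Figs. 8 to 13.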

B. Applications Verification
1) Convolution Layer
We verified the functionality of the convolution layer through simulation. As shown in Fig. 14, the convolution with a kernel and a sliding window was computed step by step. The first result was 8, followed by 8, 9, and a (hexadecimal) in sequence. This stepwise computation demonstrates the accuracy of the design, and the final output feature map, a988, confirms the correct operation of the convolution layer hardware implementation.
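The LOAD, CALC, and STORE sequence of Fig. 5 can be sketched behaviorally to check a convolution result against Eq. (3). The kernel and feature map used for Fig. 14 are not given, so the example below uses hypothetical data and does not reproduce the a988 output. Stride 1, no padding, and an accumulator that wraps at acc_bits bits (4 by default, to mimic the 4-bit datapath) are all assumptions of this sketch.

```python
"""Behavioral sketch of the convolution layer of Sec. III-A: a single MAC unit
sequenced by the LOAD -> CALC -> STORE states of Fig. 5.
Assumptions: stride 1, no padding, and a MAC accumulator of acc_bits bits."""

from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    LOAD = auto()
    CALC = auto()
    STORE = auto()


def conv2d_mac(image, kernel, acc_bits=4):
    """O(x, y) = sum_i sum_j K(i, j) * I(x + i, y + j), one MAC step per cycle."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    mask = (1 << acc_bits) - 1  # results wrap like a fixed-width register
    output = [[0] * out_w for _ in range(out_h)]
    windows = [(x, y) for x in range(out_h) for y in range(out_w)]  # sliding order
    state, w_idx, acc, k, taps = State.LOAD, 0, 0, 0, []  # start already asserted
    while state is not State.IDLE:
        if state is State.LOAD:
            x, y = windows[w_idx]
            taps = [(kernel[i][j], image[x + i][y + j])
                    for i in range(kh) for j in range(kw)]
            acc, k = 0, 0  # MAC reset asserted, then released
            state = State.CALC
        elif state is State.CALC:
            weight, pixel = taps[k]
            acc = (acc + weight * pixel) & mask  # one multiply-accumulate
            k += 1
            if k == len(taps):
                state = State.STORE
        elif state is State.STORE:
            x, y = windows[w_idx]
            output[x][y] = acc  # write to the output feature map
            w_idx += 1
            state = State.LOAD if w_idx < len(windows) else State.IDLE  # ready = 1
    return output


if __name__ == "__main__":
    image = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]  # hypothetical 3x3 input
    kernel = [[1, 1], [0, 1]]                  # hypothetical 2x2 kernel
    print(conv2d_mac(image, kernel))           # [[2, 1], [2, 1]]
```

Masking the accumulator after every step reproduces the wrap-around a 4-bit register would show, which makes the precision limit discussed below easy to observe: a 2×2 kernel of 3s over a 2×2 patch of 3s sums to 36 but yields 4 with acc_bits=4.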
The design reuses ALU resources efficiently, provides clear control through an FSM, and allows modular integration for future upgrades. However, the single 4-bit MAC unit limits both parallelism and data precision.

Fig. 14. Simulation of the convolution layer.

2) Ultrasonic Ranging
For the ultrasonic ranging application, the HC-SR04 ultrasonic sensor module [6] is used, and the distance is shown on four 7-segment displays. Fig. 15 shows a measured distance of 211 mm between the object and the sensor.

Fig. 15. The measuring result of the ultrasonic sensor.

The ultrasonic ranging design is straightforward to implement and highly adaptable. In addition, ultrasonic sensors can be used in various environments, which makes them versatile for measuring liquids, solids, or transparent objects. They are relatively accurate at short range, especially from 1 to 10 meters. However, small targets are difficult to measure because ultrasonic waves have a wide beam and the sensor has poor directionality. The response of ultrasonic distance measurement is also usually slower than that of laser measurement, making it less suitable for applications that require high-frequency measurements.

V. CONCLUSION
In this project, we designed and implemented an ALU on an FPGA that supports both basic and extended operations. The ALU was used in simple applications, including convolution and ultrasonic ranging.
By deploying the ALU on an FPGA, we verified the functionality of our design and implementation, obtaining accurate results for operations such as Wallace tree multiplication and restoring division. In addition, the convolution layer application showed how the ALU can be adapted for use in neural networks, albeit with limitations due to the 4-bit data width.
Although the project met its objectives, several challenges were encountered, including overflow in the 4-bit operations and clock skew caused by irregular circuit structures. Despite these challenges, the project provided valuable insights into ALU design and the use of FPGAs in practical applications. Future work could focus on increasing data precision, optimizing circuit structures, and addressing signal delay issues.
Overall, this project highlights the importance of hardware design in digital systems and demonstrates the potential of ALU design for complex computation in various applications.

REFERENCES
[1] D. A. Patterson and J. L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 5th ed. San Francisco, CA, USA: Morgan Kaufmann, 2013.
[2] GeeksforGeeks, "4-bit Binary Adder-Subtractor." [Online]. Available: https://www.geeksforgeeks.org/4-bit-binary-adder-subtractor/. Accessed: Sep. 12, 2024.
[3] C. S. Wallace, "A Suggestion for a Fast Multiplier," IEEE Transactions on Electronic Computers, vol. EC-13, no. 1, pp. 14-17, Feb. 1964, doi: 10.1109/PGEC.1964.263830.
[4] Electronics Tutorials, "7-Segment Display Tutorial." [Online]. Available: https://www.electronics-tutorials.ws/blog/7-segment-display-tutorial.html. Accessed: Sep. 12, 2024.
[5] Y. Chen, Y. Xie, L. Song, et al., "A Survey of Accelerator Architectures for Deep Neural Networks," Engineering, vol. 6, no. 3, pp. 264-274, 2020.
[6] ElectronicWings, "Ultrasonic Module HC-SR04." [Online]. Available: https://www.electronicwings.com/sensors-modules/ultrasonic-module-hc-sr04. Accessed: Sep. 12, 2024.
