Abstract—In this paper, the design of a systolic array architecture multiplier is presented. The evolution of computers and the Internet has created a demand for powerful, high-speed data processing, but in such a complex environment few methods can provide adequate solutions. To address this issue, parallel computing is proposed as a solution. This paper demonstrates an effective design for matrix multiplication using a systolic architecture on Reconfigurable Systems (RS) such as Field Programmable Gate Arrays (FPGAs). The relevant topics and literature regarding the elements of a systolic array architecture system have been studied and reviewed. A system with less path delay and higher data processing speed is designed in Xilinx.

Index Terms—Reconfigurable Systems, Systolic array architecture, Parallel computing, Matrix multiplication, Path delay, Xilinx
I. INTRODUCTION
The demand for high-speed and powerful data processing has become increasingly important in modern technologies due to the growing amounts of data that need to be processed. Traditional data processing techniques involved serial processing, a time-consuming and inefficient method that introduced various delays into the execution process.
To address this issue, new techniques have been developed, such as pipelining and parallel processing, which have led to faster execution times. Pipelining involves breaking down tasks into smaller components and processing them concurrently to reduce the overall processing time. Parallel processing, on the other hand, involves using multiple processors to perform tasks simultaneously, thereby increasing the speed and efficiency of data processing.
Researchers have also explored other techniques to achieve high-speed data processing, such as optimizing hardware complexity parameters and improving system architectures. By reducing the length of the data path and improving the system architecture, data processing efficiency and speed can be improved.
Multipliers play a crucial role in digital signal processing and other applications, such as data encryption and compression. Researchers are continuously working on developing multipliers that offer high speed, low power consumption, regularity of layout, and compact VLSI implementation. These improvements make multipliers suitable for a range of high-speed, low-power, and compact applications.

In modern technology, the demand for high-speed and powerful data processing continues to grow as more data is generated and the need for processing this data in real time increases.

Parallel computing technology is becoming increasingly popular due to its ability to overcome the complexities of traditional serial processing. This technology uses pipelining concepts to perform tasks concurrently and reduce processing time.

Overall, advancements in technology have led to various techniques that can be utilized to achieve high-speed data processing. These techniques include pipelining, parallel processing, and optimized hardware complexity parameters, which have improved the efficiency and speed of data processing. With ongoing research and development, the future of data processing looks promising, and we can expect even more powerful and efficient techniques to emerge in the coming years.

II. SYSTOLIC ARRAY ARCHITECTURE

A systolic architecture is a pipelined network arrangement of Processing Elements (PEs), known as cells in computer architecture. This architecture is a subset of parallel computing in which cells independently compute and store the data that is fed to them. The systolic architecture is a cell array composed of matrix-like rows, with Processing Elements analogous to central processing units (CPUs) except for the absence of a program counter, instruction register, control unit, and so on. The systolic architecture performs parallel matrix multiplication using a PMMSA approach, which immediately multiplies pairs of matrix elements. This approach is characterized by pipelined processing of the input data and consists of regularly arrayed PEs, where neighboring PEs are connected to each other by the shortest paths.
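The paper itself does not give code, but as a minimal sketch of such a cell, the Verilog fragment below shows one possible output-stationary processing element in which operands arrive from the left and top neighbours, a local register accumulates their product, and the operands are forwarded after one clock cycle; the module name, port names, and bit widths are illustrative assumptions.

```verilog
// Hypothetical output-stationary processing element (PE) for a systolic
// matrix multiplier.  Operands stream in from the left and top neighbours,
// the product is accumulated locally, and the operands are forwarded on
// after one clock cycle.  Names and widths are illustrative only.
module systolic_pe #(
    parameter DATA_W = 8,    // width of the streamed matrix elements
    parameter ACC_W  = 20    // accumulator width (must cover product growth)
) (
    input  wire              clk,
    input  wire              rst,
    input  wire [DATA_W-1:0] a_in,   // element of A arriving from the left
    input  wire [DATA_W-1:0] b_in,   // element of B arriving from the top
    output reg  [DATA_W-1:0] a_out,  // element of A forwarded to the right
    output reg  [DATA_W-1:0] b_out,  // element of B forwarded downwards
    output reg  [ACC_W-1:0]  acc     // locally stored partial result
);
    always @(posedge clk) begin
        if (rst) begin
            a_out <= {DATA_W{1'b0}};
            b_out <= {DATA_W{1'b0}};
            acc   <= {ACC_W{1'b0}};
        end else begin
            acc   <= acc + a_in * b_in;  // one multiply-accumulate per cycle
            a_out <= a_in;               // pass operands to neighbouring PEs
            b_out <= b_in;
        end
    end
endmodule
```

Since every cell communicates only with its immediate neighbours and performs a single multiply-accumulate per cycle, the combinational path stays short, which is what allows a systolic arrangement to keep the path delay low.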
Parallel processing and pipelining are introduced into the proposed systolic architecture to enhance the speed and reduce the complexity of the matrix multiplier. The proposed design is simulated, synthesized, and implemented on the FPGA device xc3s500e-5-ft256, achieving a core speed of 210.2 MHz.
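As a hedged sketch of how such cells might be tiled into the proposed matrix multiplier, the following fragment instantiates an N x N grid of the hypothetical systolic_pe module shown earlier; the generate structure, the flat port packing, and the edge feeding are assumptions for illustration and not the implemented Xilinx design.

```verilog
// Hypothetical N x N tiling of the systolic_pe cell sketched in Section II.
// Rows of A enter at the left edge and columns of B at the top edge, each
// skewed by one cycle per row/column; every cell accumulates one element of
// the product C.  Port packing and generate structure are assumptions.
module systolic_array #(
    parameter N      = 3,
    parameter DATA_W = 8,
    parameter ACC_W  = 20
) (
    input  wire                 clk,
    input  wire                 rst,
    input  wire [N*DATA_W-1:0]  a_left,  // skewed rows of A, one element per row port
    input  wire [N*DATA_W-1:0]  b_top,   // skewed columns of B, one element per column port
    output wire [N*N*ACC_W-1:0] c_flat   // accumulated products, row-major
);
    // Meshes of wires carrying operands between neighbouring cells.
    wire [DATA_W-1:0] a_mesh [0:N-1][0:N];
    wire [DATA_W-1:0] b_mesh [0:N][0:N-1];

    genvar i, j;
    generate
        for (i = 0; i < N; i = i + 1) begin : g_row
            // Feed the left edge of row i from the packed A input.
            assign a_mesh[i][0] = a_left[i*DATA_W +: DATA_W];
            for (j = 0; j < N; j = j + 1) begin : g_col
                if (i == 0) begin : g_top
                    // Feed the top edge of column j from the packed B input.
                    assign b_mesh[0][j] = b_top[j*DATA_W +: DATA_W];
                end
                systolic_pe #(.DATA_W(DATA_W), .ACC_W(ACC_W)) pe (
                    .clk  (clk),
                    .rst  (rst),
                    .a_in (a_mesh[i][j]),
                    .b_in (b_mesh[i][j]),
                    .a_out(a_mesh[i][j+1]),
                    .b_out(b_mesh[i+1][j]),
                    .acc  (c_flat[(i*N+j)*ACC_W +: ACC_W])
                );
            end
        end
    endgenerate
endmodule
```

In an arrangement of this kind, the rows of A enter at the left edge and the columns of B at the top edge, each stream delayed by one extra cycle per row or column so that matching operands meet in the correct cell; after roughly 3N clock cycles every accumulator holds one element of the product.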
Hughey [1988] presents a detailed review of the B-SYS systolic array. The B-SYS systolic array uses a linear interconnection scheme and a simple arithmetic logic unit to efficiently implement combinatorial algorithms. It uses an outside memory for program sequencing, resulting in a simpler architecture with the advantages of instruction broadcasting within a systolic framework. Programming involves mapping the data flow of the algorithm onto the systolic array and managing potential issues such as pipeline hazards. A sequence comparison algorithm has been programmed and simulated, but its performance is limited by the need for bit-serial communication at chip boundaries. In conclusion, Hughey [1988] states that the B-SYS architecture demonstrates the potential for fully systolic programmable processor arrays that do not require local program memory or global instruction broadcasting. The use of the processing-phase concept allows the hazards introduced by the systolic instruction stream to be avoided. The basic cell of this architecture is simple and flexible, making it possible to build highly parallel, programmable systolic arrays. Further research on B-SYS will focus on exploring the architecture's limitations, examining techniques for mapping algorithms onto the array, and implementing a B-SYS prototype in CMOS technology.
Stojanovic et al. [2007] present an approach for the implementation of matrix-vector multiplication on a unidirectional linear systolic array (ULSA) and a bidirectional linear systolic array (BLSA) for efficient processing. The authors propose a systematic procedure for designing an optimal ULSA with minimal execution time for a given problem size. They also propose a transformation method to partition a dense matrix into band matrices for adequate matching to the size of the BLSA, with a new index transformation to avoid the insertion of zero elements between successive iterations. The obtained processing time is approximately two times shorter than that of previous methods, and the transformation is simpler. The paper provides a model for matrix-vector multiplication on a fixed-size ULSA and discusses the application of the proposed methods to improve processing efficiency. In conclusion, Stojanovic et al. [2007] report that, to accommodate the matrix size to the ULSA, the matrix A is partitioned into quasidiagonal blocks. During computation of the resulting vector cr, the elements of the block matrices and of the vector cr are reordered to decrease computation time. The resulting processing time is about two times shorter than one previous method and equivalent to another. The article proposes a global structure for the memory interface subsystem that facilitates data transfer to and from the ULSA. The article also estimates the performance of the fixed-size ULSA and compares it to a bidirectional linear systolic array (BLSA).
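As a generic illustration of the unidirectional linear arrays discussed above (and not of the authors' partitioning or index-transformation scheme), the fragment below sketches a ULSA-style cell for matrix-vector multiplication in which each cell holds one element of the vector x while matrix elements and partial sums stream through; all names and widths are assumptions.

```verilog
// Generic cell of a unidirectional linear systolic array for y = A*x.
// Each cell keeps one element of the vector x resident; elements of the
// matrix column assigned to the cell are streamed in, and the partial sum
// of the result flows left to right through the chain.  Illustrative only.
module ulsa_pe #(
    parameter DATA_W = 8,
    parameter ACC_W  = 20
) (
    input  wire              clk,
    input  wire              rst,
    input  wire              load_x,  // pulse to latch this cell's element of x
    input  wire [DATA_W-1:0] x_in,
    input  wire [DATA_W-1:0] a_in,    // streamed element of the matrix column
    input  wire [ACC_W-1:0]  y_in,    // partial sum from the left neighbour
    output reg  [ACC_W-1:0]  y_out    // updated partial sum to the right neighbour
);
    reg [DATA_W-1:0] x;               // resident vector element

    always @(posedge clk) begin
        if (rst) begin
            x     <= {DATA_W{1'b0}};
            y_out <= {ACC_W{1'b0}};
        end else begin
            if (load_x)
                x <= x_in;
            y_out <= y_in + a_in * x;  // one multiply-accumulate step per cycle
        end
    end
endmodule
```

Chaining n such cells and feeding column j of the matrix into cell j, skewed by one cycle per cell, yields one completed element of y = Ax at the right end of the chain every clock cycle after an initial latency of n cycles.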
Mishra et al. [2011] propose a low-power and fast architecture for Viterbi decoding of convolutional codes, which integrates the benefits of a systolic array architecture and the register exchange strategy. The design and implementation of the systolic-array-based memoryless Viterbi decoder, as well as a modified register exchange decoding scheme, are presented. The architecture achieves low power consumption and high throughput and is suitable for FPGA implementation. Performance analysis is based on maximum frequency of operation, FPGA resource utilization, and power consumption. The proposed architecture uses linear systolic structures, arithmetic pipelined processors, and reconfigurable composite branch metrics. In conclusion, Mishra et al. [2011] note that the decoder uses a modified register exchange strategy to achieve a memoryless design. The integration of a pointer-based register exchange method with the systolic architecture results in a faster, higher-throughput system with a lower critical path compared to the trace-back strategy. Design reuse and time-multiplexing approaches are used to establish a fully reconfigurable system, contributing to area savings at the cost of latency. The architecture is modeled in Verilog and exported to a Xilinx Virtex-II Pro xc2vp30-7fg676 using the System Generator tool. Results show that the proposed architecture outperforms its trace-back versions in terms of performance.

IV. CONCLUSION

Systolic arrays are a type of parallel computing architecture that can be used to perform a wide range of computational tasks, including digital signal processing and linear algebra operations such as matrix multiplication. The systolic array architecture is particularly well suited for these applications because it is highly parallel and can process large amounts of data simultaneously. The design and implementation of an efficient systolic array architecture for multiplier operations requires careful consideration of several design parameters. One of the most important design considerations is the number of processing elements in the systolic array. Increasing the number of processing elements can improve the throughput of the system but can also increase the power consumption and complexity of the design. Another important consideration is the data flow within the systolic array. The data flow determines how data is propagated through the array and can greatly impact the performance of the system. For example, a diagonal data flow can reduce the number of data transfers and increase the system efficiency. Interconnect topology is also an important design parameter to consider. The topology determines how processing elements are connected to each other and can impact the system's scalability and flexibility. To further optimize systolic array architectures for multiplier operations, techniques such as pipelining, loop unrolling, and parallel processing can be used. Pipelining can reduce the latency of the system by dividing the computation into stages. Parallel processing can also be used to increase the throughput of the system by processing multiple data streams simultaneously. In conclusion, the design and implementation of efficient systolic array architectures for multiplier operations require a thorough