
www.ijcrt.org © 2023 IJCRT | Volume 11, Issue 4 April 2023 | ISSN: 2320-2882

A Review on Design and Implement Efficient Systolic Array Architecture Multiplier
1st Aryan Ajit Tiwary
Electronics and Telecommunication
Pune Institute of Computer Technology
Pune, India

2nd Arya Raghuveer Mane
Electronics and Telecommunication
Pune Institute of Computer Technology
Pune, India

3rd Akshay Dharmesh Lende
Electronics and Telecommunication
Pune Institute of Computer Technology
Pune, India

4th Mr. N. G. Nirmal
Electronics and Telecommunication
Pune Institute of Computer Technology
Pune, India

Abstract—This paper presents the design of a systolic array architecture multiplier. The evolution of computers and the Internet has created demand for powerful, high-speed data processing, but in such a complex environment few methods can provide complete solutions. Parallel computing is proposed as a solution to this problem. This paper demonstrates an effective design for matrix multiplication using a systolic architecture on Reconfigurable Systems (RS) such as Field Programmable Gate Arrays (FPGAs). The relevant topics and literature regarding the elements of a systolic array architecture system have been studied and reviewed. A system with lower path delay and higher data processing speed is being designed in Xilinx.

Index Terms—Reconfigurable Systems, Systolic array architecture, Parallel computing, Matrix multiplication, Path delay, Xilinx

I. INTRODUCTION

The demand for high-speed and powerful data processing has become increasingly important in modern technologies because of the growing amounts of data that need to be processed. Traditional data processing techniques relied on serial processing, a time-consuming and inefficient method that introduced various delays into execution.

To address this issue, new techniques such as pipelining and parallel processing have been developed, which have led to faster execution times. Pipelining breaks a task into smaller components and processes them concurrently to reduce the overall processing time. Parallel processing, on the other hand, uses multiple processors to perform tasks simultaneously, thereby increasing the speed and efficiency of data processing.

Researchers have also explored other techniques to achieve high-speed data processing, such as optimizing hardware complexity parameters and working on system architectures. Reducing the length of the data path and improving the system architecture can improve data processing efficiency and speed.

Multipliers play a crucial role in digital signal processing and other applications, such as data encryption and compression. Researchers are continuously working on developing multipliers that offer high speed, low power consumption, regularity of layout, and compact VLSI implementation. These improvements make multipliers suitable for a range of high-speed, low-power, and compact applications.

In modern technology, the demand for high-speed and powerful data processing continues to grow as more data is generated and the need to process this data in real time increases. Parallel computing technology is becoming increasingly popular due to its ability to overcome the limitations of traditional serial processing. This technology uses pipelining concepts to perform tasks concurrently and reduce processing time.

Overall, advancements in technology have led to various techniques that can be used to achieve high-speed data processing. These techniques include pipelining, parallel processing, and optimized hardware complexity parameters, which have improved the efficiency and speed of data processing. With ongoing research and development, the future of data processing looks promising, and even more powerful and efficient techniques can be expected to emerge in the coming years.

II. SYSTOLIC ARRAY ARCHITECTURE

A systolic architecture is a pipelined network arrangement of Processing Elements (PEs), known as cells in computer architecture. It is a form of parallel computing in which cells independently compute and store the data that is fed to them. The systolic architecture is a cell array arranged in matrix-like rows, with Processing Elements analogous to central processing units (CPUs) except for the absence of a programme counter, instruction register, control unit, and so on. The systolic architecture performs parallel matrix multiplication using a PMMSA approach, which immediately multiplies a pair of matrix elements. This approach is characterized by processing input data in a pipeline and consists of regularly arrayed PEs, where neighboring PEs are connected with each other by the shortest

IJCRT2304397 International Journal of Creative Research Thoughts (IJCRT) d331


line, so bulk data need not be stored before processing. This minimizes the distance between PEs in the array, greatly reduces internal communication delay, and improves processing unit utilization. To enhance the speed and reduce the complexity of the systolic computation of matrix multiplication, the PE is replaced with a Multiplication and Accumulation (MAC) unit, which will be implemented in the systolic array. The proposed systolic array architecture multiplier multiplies n x n matrices in a smaller number of clock cycles. The systolic array typically sends and receives multiple data streams, and multiple data counters are required to generate these streams, which allows for data parallelism.

A systolic array is a network of processors that compute and pass data through the system in a rhythmic manner. Every PE is built around a full adder, which adds binary numbers. The full adder has three inputs, X, Y, and Carry-IN, and two outputs, SUM and Carry-OUT. SUM is the exclusive OR of X, Y, and Carry-IN. Carry-OUT is the OR of three terms: the AND of X and Y, the AND of X and Carry-IN, and the AND of Y and Carry-IN.

The multiplier can be implemented in two ways: the conventional method (without pipelining and parallel processing) and the systolic method (with pipelining and parallel processing). The PE is the most basic element for implementing the systolic array architecture; initially, a single PE consisting of an AND gate and a full adder is designed. For an 8/16-bit system, eight full adders are needed, arranged as in the systolic array architecture multiplier of Fig. 2.

In summary, the systolic architecture is a pipelined network arrangement of Processing Elements used for parallel computing. Parallel matrix multiplication using the PMMSA approach is characterized by processing input data in a pipeline and consists of regularly arrayed PEs.

Fig. 1. Basic Structure of Processing Element

The following describes how the PE works and how it is used to implement the systolic array architecture. The proposed systolic array architecture multiplier multiplies n x n matrices in a smaller number of clock cycles. Systolic arrays typically send and receive numerous streams of data, and multiple data counters are needed to produce these streams; data parallelism is thus supported.

Fig. 2. Systolic Array Architecture Multiplier

Every PE is made up of a full adder, whose working principle is as follows. The full adder has three inputs and two outputs. The inputs are X, Y, and Carry-IN. The carry output is Carry-OUT, and the sum output is SUM.

Fig. 3. Full Adder
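
The full-adder equations given in this section can be checked with a minimal Python sketch. The function names `full_adder` and `ripple_add` are hypothetical, and chaining eight full adders into a ripple-carry adder is an assumption made here only to illustrate the 8-bit case; it is not the multiplier layout of Fig. 2.

```python
from typing import Tuple


def full_adder(x: int, y: int, carry_in: int) -> Tuple[int, int]:
    """Return (SUM, Carry-OUT) for three 1-bit inputs."""
    # SUM is the exclusive OR of the three inputs.
    total = x ^ y ^ carry_in
    # Carry-OUT is the OR of the three pairwise ANDs.
    carry_out = (x & y) | (x & carry_in) | (y & carry_in)
    return total, carry_out


def ripple_add(a: int, b: int, width: int = 8) -> int:
    """Add two unsigned integers by chaining `width` full adders (illustrative assumption)."""
    carry = 0
    result = 0
    for i in range(width):
        bit, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= bit << i
    # The final Carry-OUT becomes the most significant bit of the result.
    return result | (carry << width)


if __name__ == "__main__":
    for x in (0, 1):
        for y in (0, 1):
            for c in (0, 1):
                print(x, y, c, full_adder(x, y, c))
    print(ripple_add(200, 100))
```

With the default width of eight, `ripple_add(200, 100)` returns 300; the ninth bit comes from the last Carry-OUT.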


Fig. 4. Full Adder

III. LITERATURE SURVEY

Subathradevi et al. [2018] presented a design for a systolic array multiplier that minimises the delay product for different input bit sizes. Systolic architecture converts high-level computation into hardware structures. Systolic systems are simple to implement because they are regular and simple to compute. Systems using this architecture are more efficient, provide better performance, and include specialised functions that can address a wide range of problems. To save energy, the paper developed a VLSI design based on systolic multipliers that can be used on network links. The study focuses on creating a unique architecture with cells used for the decomposition of the systolic multiplier. Two distinct designs are presented. Architecture-I is built using registers and decomposition cells to minimize latency in the event of a path delay. Architecture-II consists of tristate buffers and a full-adder-based multiplexer along with the processing elements. Together these lead to a lower power-delay product. The experiment is conducted on a systolic array binary multiplier with 1-level pipelining, with the PEs organized in a row-based as well as a column-based symmetric structure, and the results are listed in Table[citation]. Based on this observation, it was deduced that the architecture provided a reduced delay compared to the architecture[citation] by using the register in the other feed-forward cut-sets of the middle cut-set.

In conclusion, Subathradevi et al. [2018] provide the design of a systolic array multiplier that can handle different input bit sizes while minimizing the delay product, a crucial task in many real-time applications. The design uses a variable word-size processing element (PE) array combined with a pipelined architecture to achieve high efficiency and flexibility. By combining fixed and variable word-size PEs, the design can handle different input data sizes while minimizing the overall delay product. The pipelined architecture enables parallel processing of multiple operands, further reducing the delay and increasing efficiency. Overall, the design provides an effective solution for fast and efficient multiplication in various applications, including digital signal processing, scientific computing, and many others. Future research can focus on further optimizing the design parameters to achieve even higher performance while reducing power consumption.

Vucha et al. [2011] reviewed the design of a systolic array multiplier that can handle different input bit sizes while minimizing the delay product, a crucial task in many real-time applications. Their design uses a variable word-size processing element (PE) array combined with a pipelined architecture to achieve high efficiency and flexibility. By combining fixed and variable word-size PEs, the design can handle different input data sizes while minimizing the overall delay product. The pipelined architecture enables parallel processing of multiple operands, further reducing the delay and increasing efficiency. Overall, the design provides an effective solution for fast and efficient multiplication in various applications, including digital signal processing, scientific computing, and many others. Future research can focus on further optimizing the design parameters to achieve even higher performance while reducing power consumption. The article describes several architectures used for block matching algorithms, including AB1, AS2, AB2, and AS1. These architectures use systolic arrays to process input data and compute the motion vector. The article proposes a new architecture that uses a two-dimensional systolic array to perform matrix multiplication of order N x N. The architecture uses Multiplication and Accumulation (MAC) to enhance speed and reduce complexity. Each processing element (PE) of the systolic array multiplies a pair of elements and accumulates the product into the corresponding result element. The architecture implements the algorithm with pipelining and parallel processing to improve speed. The proposed architecture is shown below.

Fig. 5. PE of Systolic Architecture

In conclusion, Vucha et al. [2011] obtained a systolic array architecture that is designed for matrix multiplication and is targeted to the Field Programmable Gate Array device
xc3s500e-5-ft256. Parallel processing and pipelining are introduced into the proposed systolic architecture to enhance the speed and reduce the complexity of the matrix multiplier. The proposed design was simulated, synthesized, and implemented on the FPGA device xc3s500e-5-ft256, where it achieved a core speed of 210.2 MHz.

Hughey et al. [1988] describe the B-SYS systolic array. The B-SYS systolic array uses a linear interconnection scheme and a simple arithmetic logic unit to efficiently implement combinatorial algorithms. It uses an outside memory for program sequencing, resulting in a simpler architecture with the advantages of instruction broadcasting within a systolic framework. Programming involves mapping the data flow of the algorithm onto the systolic array and managing potential issues such as pipeline hazards. A sequence comparison algorithm has been programmed and simulated, but its performance is limited by the need for bit-serial communication at chip boundaries. In conclusion, Hughey et al. [1988] show that the B-SYS architecture demonstrates the potential for fully systolic programmable processor arrays that do not require local program memory or global instruction broadcasting. The processing phase concept avoids the hazards introduced by the systolic instruction stream. The basic cell of this architecture is simple and flexible, making it possible to build highly parallel, programmable systolic arrays. Further research on B-SYS will focus on exploring the architecture's limitations, examining techniques for mapping algorithms onto the array, and implementing a B-SYS prototype in CMOS technology.

Stojanovic et al. [2007] present an approach for implementing matrix-vector multiplication on a unidirectional linear systolic array (ULSA) and a bidirectional linear systolic array (BLSA) for efficient processing. The authors propose a systematic procedure for designing an optimal ULSA with the minimal execution time for a given problem size. They also propose a transformation method that partitions a dense matrix into band matrices to match the size of the BLSA, with a new index transformation that avoids inserting zero elements between successive iterations. The resulting processing time is approximately two times shorter than that of previous methods, and the transformation is simpler. The paper provides a model for matrix-vector multiplication on a fixed-size ULSA and discusses the application of the proposed methods to improve processing efficiency. In conclusion, to match the matrix size to the ULSA, Stojanovic et al. [2007] partition the matrix A into quasidiagonal blocks. During computation of the resulting vector cr, the elements of the block matrices and of vector cr are reordered to decrease computation time. The resulting processing time is about two times shorter than that of one previous method and equivalent to that of another. The article proposes a global structure for the memory interface subsystem that facilitates data transfer to and from the ULSA. The article also estimates the performance of the fixed-size ULSA and compares it to a bidirectional systolic array (BLSA).

Mishra et al. [2011] propose a low-power and fast architecture for Viterbi decoding of convolutional codes, which combines the benefits of the systolic array architecture and the register exchange strategy. The design and implementation of the systolic-array-based memoryless Viterbi decoder, as well as a modified register exchange decoding scheme, are presented. The architecture achieves low power consumption and high throughput and is suitable for FPGA implementation. Performance analysis is based on maximum frequency of operation, FPGA resource utilization, and power consumption. The proposed architecture uses linear systolic structures, arithmetic pipelined processors, and reconfigurable composite branch metrics. In conclusion, Mishra et al. [2011] report that the decoder uses a modified register exchange strategy to achieve a memoryless design. Integrating a pointer-based register exchange method with the systolic architecture results in a faster, higher-throughput system with a shorter critical path than the trace-back strategy. Design reuse and time-multiplexing approaches are used to establish a fully reconfigurable system, saving area at the cost of latency. The architecture is modeled using Verilog and exported to Xilinx Virtex II pro xc2vp307fg676 using the system generator tool. Results show that the proposed architecture outperforms its trace-back versions in terms of performance.

IV. CONCLUSION

Systolic arrays are a type of parallel computing architecture that can be used to perform a wide range of computational tasks, including digital signal processing and linear algebra operations such as matrix multiplication. The systolic array architecture is particularly well-suited for these applications because it is highly parallel and can process large amounts of data simultaneously. The design and implementation of an efficient systolic array architecture for multiplier operations requires careful consideration of several design parameters. One of the most important is the number of processing elements in the systolic array. Increasing the number of processing elements can improve the throughput of the system but can also increase the power consumption and complexity of the design. Another important consideration is the data flow within the systolic array. The data flow determines how data is propagated through the array and can greatly affect the performance of the system. For example, a diagonal data flow can reduce the number of data transfers and increase system efficiency. Interconnect topology is also an important design parameter. The topology determines how processing elements are connected to each other and can affect the system's scalability and flexibility. To further optimize systolic array architectures for multiplier operations, techniques such as pipelining, loop unrolling, and parallel processing can be used. Pipelining divides the computation into stages, which shortens the critical path and raises throughput. Parallel processing can also be used to increase the throughput of the system by processing multiple data streams simultaneously. In conclusion, the design and implementation of efficient systolic array architectures for multiplier operations require a thorough understanding of the application requirements, system constraints, and design trade-offs. By carefully considering these factors, designers can create systolic array architectures that provide high throughput, low power consumption, and scalability for a wide range of applications.
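
To make the data-flow discussion concrete, the following Python sketch simulates an n x n array of MAC cells cycle by cycle. It assumes an output-stationary data flow: each cell keeps its own accumulator while operands of A move one cell to the right and operands of B move one cell down per clock, with row i of A and column j of B fed in with a skew of i and j cycles. This is one common arrangement chosen for illustration, not necessarily the design of any paper reviewed above, and the name `systolic_matmul` is hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def systolic_matmul(a: Matrix, b: Matrix) -> Tuple[Matrix, int]:
    """Simulate an n x n output-stationary systolic array (illustrative assumption).

    Returns the product matrix and the number of clock cycles simulated.
    """
    n = len(a)
    acc = [[0] * n for _ in range(n)]    # accumulator held in each MAC cell
    a_reg = [[0] * n for _ in range(n)]  # A operand currently in each cell
    b_reg = [[0] * n for _ in range(n)]  # B operand currently in each cell
    cycles = 3 * n - 2                   # last product reaches cell (n-1, n-1) at t = 3n - 3

    for t in range(cycles):
        # Shift A operands one cell to the right (update from the far end first).
        for i in range(n):
            for j in range(n - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
        # Shift B operands one cell down.
        for j in range(n):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
        # Feed the array edges; row i and column j are delayed by i and j cycles.
        for i in range(n):
            k = t - i
            a_reg[i][0] = a[i][k] if 0 <= k < n else 0
        for j in range(n):
            k = t - j
            b_reg[0][j] = b[k][j] if 0 <= k < n else 0
        # Every cell performs one multiply-accumulate per clock.
        for i in range(n):
            for j in range(n):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]

    return acc, cycles


if __name__ == "__main__":
    product, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(product, cycles)
```

For the 2 x 2 example in the `__main__` block, the simulated array produces [[19, 22], [43, 50]] after 3n - 2 = 4 clock cycles. Under these assumptions the cycle count grows linearly with n, whereas a single sequential MAC unit would need n^3 multiply-accumulate steps.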
REFERENCES
[1] S. Subathradevi and C. Vennila, "Systolic array multiplier for augmenting data center networks communication link."
[2] Mahendra Vucha and Arvind Rajawat, "Design and FPGA Implementation of Systolic Array Architecture for Matrix Multiplication," International Journal of Computer Applications (0975 – 8887).
[3] R. Hughey and Daniel P. Lopresti, "Architecture of a programmable systolic array."
[4] A. K. Mishra and P. P. Jiju, "Low power, dynamically reconfigurable memoryless systolic array based architecture for Viterbi decoder."
[5] N. M. Stojanovic, I. Z. Milovanovic, M. K. Stojcev, and E. I. Milovanovic, "Matrix-vector multiplication on a fixed size unidirectional systolic array."
