FPGA ARM Processor Based Supercomputing
Abstract—Heterogeneous architecture platforms such as the Xilinx Zynq SoC combine an ARM multi-core processor with an FPGA accelerator to speed up high performance computing applications. In this paper, we propose an FPGA and ARM processor based supercomputer system composed of five Zynq SoC compute-nodes. The system uses Message Passing Interface (MPI) libraries for communication between compute-nodes and AXI4-Stream interfaces between the ARM processor and the FPGA inside each compute-node. An FIR filter application is used to test the performance of the system with and without FPGA accelerators. The results show that the performance of the ARM based supercomputer with FPGA accelerators is 8.56 times higher than that of a similar system without FPGA accelerators.

Keywords—Heterogeneous, Zynq SoC, Supercomputer

I. INTRODUCTION

With improvements in multi-core processor technology, the demand for application performance has also increased. Delivering high performance to an application requires more processing speed from multi-core processors, which increases power consumption. As power has become the main metric for modern high performance computing, researchers and system architects are proposing heterogeneous multi-core processing systems that combine a multi-core processor with hardware accelerators or co-processors. These accelerators improve the performance of a compute-intensive application by executing specific tasks. Over the past few decades, FPGA-based accelerators have delivered considerable improvements in performance and power efficiency, which makes them attractive to the high performance computing world. Their ability to achieve higher performance per watt shows that FPGAs can compete with both superscalar processors and GPU accelerators, especially for high performance computing applications [1] [2].

ARM-based servers [3] built on embedded processors have gained popularity in academia and industry due to their low cost and low power consumption compared to conventional processors. The computational capability of ARM embedded processors does not match that of x86 processors in server environments, but according to a recent study [4], their market share is expected to rise 25% in 2020. Low-cost, low-power ARM-based servers are well suited to applications that need high throughput rather than raw computing power.

The Zynq SoC device [5] is a heterogeneous computing platform built by Xilinx. It combines a multi-core ARM CPU with an FPGA accelerator, and the purpose of integrating the FPGA accelerator into the SoC is low-power acceleration. The ARM CPU and the FPGA accelerator are connected through a high performance, high bandwidth set of AXI/ACP interfaces, which allow the accelerator to interface with main memory [6] [7].

In this work, we designed an FPGA and ARM processor based supercomputer system. The system is composed of low-cost, low-power Zybo boards [8] with Zynq SoCs. In this design, the FPGA handles the compute-intensive portion of a high performance application and thereby increases the computation power of the ARM CPU. A Finite Impulse Response (FIR) filter is used as a test application to evaluate the computational capability of the ARM processor with and without the FPGA accelerator.

This paper is organized as follows: Section II describes related work in heterogeneous architecture computing that deploys FPGAs as hardware accelerators alongside conventional multi-core processors and embedded processors. Section III discusses the system architecture and design, covering both hardware and software. Section IV presents and discusses the results. Sections V and VI present the conclusion and future work, followed by the references.

II. RELATED WORK

Heterogeneous architecture platforms provide a foundation for integrating FPGAs with conventional processors to accelerate high performance applications [9] [10]. The following research works highlight the opportunities for combining FPGA-based accelerators with other computing units. The Cray XD1 supercomputer [11], the Berkeley Emulation Engine 2 (BEE2) [12], and the Maxwell project [13] used FPGAs as the only computing elements in their supercomputing clusters for application acceleration.
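The FIR filter used as the test application computes each output sample as a weighted sum of the most recent N input samples, y[n] = h[0]x[n] + h[1]x[n-1] + ... + h[N-1]x[n-N+1], i.e. N multiply-accumulate (MAC) operations per output. The Python sketch below is only a plain software reference of that computation, not the authors' implementation. The helper `fir_partitioned` and its `nodes` parameter are hypothetical: they illustrate one way a data set could be split across compute-nodes, with each chunk carrying N-1 samples of overlap so the stitched result matches the unsplit filter. The excerpt does not describe the system's actual partitioning or MPI code.

```python
from typing import List, Sequence


def fir_filter(samples: Sequence[float], coeffs: Sequence[float]) -> List[float]:
    """Direct-form FIR filter: y[n] = sum_k coeffs[k] * samples[n - k].

    Samples before the start of the input are treated as zero, so the
    output has the same length as the input.
    """
    if not coeffs:
        raise ValueError("coeffs must contain at least one tap")
    taps = len(coeffs)
    out: List[float] = []
    for n in range(len(samples)):
        acc = 0.0
        # One multiply-accumulate (MAC) per available tap; a hardware
        # design can evaluate these MACs in parallel instead of in a loop.
        for k in range(min(taps, n + 1)):
            acc += coeffs[k] * samples[n - k]
        out.append(acc)
    return out


def fir_partitioned(samples: Sequence[float], coeffs: Sequence[float],
                    nodes: int) -> List[float]:
    """Hypothetical split of one FIR job into `nodes` chunks.

    Each chunk is extended backwards by (taps - 1) samples of history so
    that its outputs match the unsplit filter exactly. In a cluster each
    chunk could go to a different compute-node; here the chunks are simply
    processed one after another.
    """
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    if not coeffs:
        raise ValueError("coeffs must contain at least one tap")
    if not samples:
        return []
    taps = len(coeffs)
    chunk = -(-len(samples) // nodes)  # ceiling division
    out: List[float] = []
    for start in range(0, len(samples), chunk):
        end = min(start + chunk, len(samples))
        halo_start = max(0, start - (taps - 1))
        local = fir_filter(samples[halo_start:end], coeffs)
        out.extend(local[start - halo_start:])  # drop outputs of the halo
    return out
```

For example, `fir_filter([1, 0, 0, 0, 0], [3, 2, 1])` returns the filter's impulse response `[3.0, 2.0, 1.0, 0.0, 0.0]`. The overlap of N-1 samples is the price of splitting: without it, the first outputs of every chunk except the first would be computed from missing history.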
Data Set | Clock Cycles (ARM) | Clock Cycles (ARM + FPGA) | Speedup with FPGA
1 GB     | 461,334,321        | 61,223,331                | 7.55x

ACKNOWLEDGMENT

The research leading to these results has received funding from the Unal Color of Education Research and Development (UCERD) Private Limited, Islamabad.
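As a quick check of the clock-cycle results table, the speedup column can be read as the ratio of ARM-only clock cycles to ARM + FPGA clock cycles. This reading is an assumption, since the table's caption and the surrounding results text are not part of this excerpt. The small Python sketch below computes that ratio for the 1 GB row; it comes to about 7.54, close to the 7.55x listed.

```python
def speedup(baseline_cycles: int, accelerated_cycles: int) -> float:
    """Ratio of baseline (ARM-only) cycles to accelerated (ARM + FPGA) cycles."""
    if baseline_cycles <= 0 or accelerated_cycles <= 0:
        raise ValueError("cycle counts must be positive")
    return baseline_cycles / accelerated_cycles


# Clock-cycle counts for the 1 GB data set, copied from the results table.
ARM_ONLY_CYCLES = 461_334_321
ARM_FPGA_CYCLES = 61_223_331

ratio = speedup(ARM_ONLY_CYCLES, ARM_FPGA_CYCLES)
print(f"Speedup with FPGA: {ratio:.2f}x")
```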
REFERENCES

[1] S. Amin, T. Hussain, and U. Zabit, “FPGA Based Processing of Speckle Affected Self-Mixing Interferometric Signals,” 2016 Int. Conf. Front. Inf. Technol., pp. 292–296, 2016.
[2] T. Hussain, M. Pericas, N. Navarro, and E. Ayguade, “Implementation of a reverse time migration kernel using the HCE high level synthesis tool,” 2011 Int. Conf. Field-Programmable Technol. FPT 2011, pp. 2–9, 2011.
[3] “MACOM Announces Sampling of X-Gene® 3 Server-on-a-Chip® Solution | AppliedMicro.” [Online]. Available: https://www.apm.com/news/macom-announces-sampling-of-x-gene-3-server-on-a-chip-solution/. [Accessed: 27-Nov-2017].
[4] “Worldwide x86 and ARM Server-Class Microprocessor Forecast, 2016–2020.”
[5] “Zynq-7000 All Programmable SoC.” [Online]. Available: https://www.xilinx.com/products/silicon-devices/soc/zynq-7000.html. [Accessed: 21-Nov-2017].
[6] T. Hussain, M. Shafiq, M. Pericàs, N. Navarro, and E. Ayguadé, “PPMC: A programmable pattern based memory controller,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 7199 LNCS, pp. 89–101, 2012.
[7] T. Hussain, O. Palomar, O. Unsal, A. Cristal, E. Ayguade, and M. Valero, “Advanced Pattern based Memory Controller for FPGA based HPC applications,” Proc. 2014 Int. Conf. High Perform. Comput. Simulation, HPCS 2014, pp. 287–294, 2014.
[8] “Zybo Zynq-7000 ARM/FPGA SoC Trainer Board (LIMITED TIME)>> see Zybo Z7-10 for replacement - Digilent.” [Online]. Available: http://store.digilentinc.com/zybo-zynq-7000-arm-fpga-soc-trainer-board/. [Accessed: 23-Nov-2017].
[9] T. Hussain, “Memory resources aware run-time automated scheduling policy for multi-core systems,” Microprocess. Microsyst., vol. 57, pp. 1–24, 2018.
[10] T. Hussain, “A novel hardware support for heterogeneous multi-core memory system,” J. Parallel Distrib. Comput., vol. 106, pp. 31–49, 2017.
[11] “Cray XD1 Supercomputer.”
[12] C. Chang, J. Wawrzynek, and R. W. Brodersen, “BEE2: A high-end reconfigurable computing system,” IEEE Des. Test Comput., vol. 22, no. 2, pp. 114–125, 2005.
[13] R. Baxter et al., “Maxwell - A 64 FPGA supercomputer,” Proc. - 2007 NASA/ESA Conf. Adapt. Hardw. Syst. AHS-2007, no. August, pp. 287–294, 2007.
[14] B. Klauer, “The Convey Hybrid-Core Architecture,” in High-Performance Computing Using FPGAs, vol. 375, 2010, pp. 431–451.
[15] K. H. Tsoi and W. Luk, “Axel: A Heterogeneous Cluster with FPGAs and GPUs,” 18th Annu. ACM/SIGDA Int. Symp. F. Program. Gate Arrays, pp. 115–124, 2010.
[16] A. D. George and G. Stitt, “Novo-G: A View at the HPC Crossroads for Scientific Computing,” no. January, 2010.
[17] Z. Lin and P. Chow, “ZCluster: A Zynq-based Hadoop cluster,” FPT 2013 - Proc. 2013 Int. Conf. F. Program. Technol., pp. 450–453, 2013.
[18] P. Moorthy and N. Kapre, “Zedwulf: Power-performance tradeoffs of a 32-node Zynq SoC cluster,” Proc. - 2015 IEEE 23rd Annu. Int. Symp. Field-Programmable Cust. Comput. Mach. FCCM 2015, no. 3, pp. 68–75, 2015.
[19] X. Bai, L. Jiang, Q. Dai, J. Yang, and J. Tan, “Acceleration of RSA Processes based on Hybrid ARM-FPGA Cluster,” 2017.
[20] “Xillinux: A Linux distribution for Zedboard, ZyBo, MicroZed and SocKit | xillybus.com.” [Online]. Available: http://xillybus.com/xillinux. [Accessed: 21-Nov-2017].
[21] “SSH Server | SSH.COM.” [Online]. Available: https://www.ssh.com/ssh/server. [Accessed: 21-Nov-2017].
[22] “Open MPI: Open Source High Performance Computing.” [Online]. Available: https://www.open-mpi.org/. [Accessed: 21-Nov-2017].
[23] “MPICH | High-Performance Portable MPI.” [Online]. Available: https://www.mpich.org/. [Accessed: 21-Nov-2017].
[24] “OpenCL Overview - The Khronos Group Inc.” [Online]. Available: https://www.khronos.org/opencl/. [Accessed: 23-Nov-2017].
[25] “FIR Filter Design, Software and Examples.” [Online]. Available: http://www.iowahills.com/5FIRFiltersPage.html. [Accessed: 27-Nov-2017].