Hardware Implementation of Neural Networks
Abstract—Neural networks have become very useful in different fields of research, such as image recognition, optimization, data analysis, classification and prediction tasks. However, their increasing complexity leads to longer calculation times and higher power consumption, which constrains the use of neural networks in many types of applications. This motivates the development of dedicated hardware circuits for neural networks. This paper briefly reviews present possibilities and approaches for the hardware implementation of several types of neural networks.
I. INTRODUCTION
Neural network algorithms can be applied in various fields of research. These algorithms require a lot of computational resources, which raises the problem of efficiency. Some algorithms can be accelerated and optimized using an FPGA (for example, the FFNN [1, 2] and the CNN [3]). However, when it comes to "bigger" neural networks or neural networks with specific operations (for example, the oscillator in a CONN [4]), mixed-signal or analog ASICs show promising performance in terms of power consumption and calculation time. Low power consumption is required in battery-supplied applications such as medical wearable equipment, devices for sport and fitness activities, and IoT devices for smart homes and vehicles. Neural networks can also be used inside electronic devices, for example, in an ADC [5] for calibration or correction of the output signal.
Fig. 1. FFNN with one hidden layer.
The paper is organized as follows. Section II describes the main principles of the considered neural networks. Section III identifies possibilities for increasing efficiency through hardware implementation, based on the reviewed papers. Section IV presents the results of these observations. Conclusions are outlined in Section V.
II. BASIC PRINCIPLES OF NEURAL NETWORKS
A. Feed-Forward Neural Network
A feed-forward neural network (FFNN) basically requires a weighted sum and an activation function for a single neuron. A network with a single hidden layer is shown in Fig. 1. When the number of hidden layers increases to three or more, the network becomes a Deep Neural Network (DNN). An example of a DNN structure is shown in Fig. 2.

Fig. 2. FFNN with three hidden layers (DNN).
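To make the weighted-sum-plus-activation principle concrete, the following minimal Python/NumPy sketch computes the forward pass of an FFNN with one hidden layer. The layer sizes and the sigmoid activation are illustrative assumptions, not details taken from the reviewed papers.

import numpy as np

def sigmoid(x):
    # Activation function applied element-wise.
    return 1.0 / (1.0 + np.exp(-x))

def ffnn_forward(x, weights, biases):
    # Each layer computes a weighted sum of its inputs plus a bias and then
    # applies the activation function: the single-neuron operation described
    # above, repeated for every neuron of every layer.
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

# Example with one hidden layer as in Fig. 1: 4 inputs, 5 hidden neurons, 2 outputs.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((5, 4)), rng.standard_normal((2, 5))]
biases = [np.zeros(5), np.zeros(2)]
print(ffnn_forward(rng.standard_normal(4), weights, biases))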
B. Convolutional Neural Network
A Convolutional Neural Network (CNN) is a feed-forward neural network with convolutional layers. A CNN usually includes a convolutional layer, an activation function layer, a pooling layer and a fully connected layer [6]. The convolution, activation function and pooling operations can be repeated several times. The structure of a CNN is shown in Fig. 3. It can be seen that a CNN is a more complex algorithm than a "standard" FFNN. CNNs are used for image recognition, computer vision and classification tasks.

Fig. 3. CNN with two convolutional and pooling layers [3].
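The short Python/NumPy sketch below illustrates the three basic CNN operations named above (convolution, activation and pooling) on a single-channel image. The image size, kernel size and ReLU activation are arbitrary examples rather than parameters from the cited works.

import numpy as np

def conv2d(image, kernel):
    # Valid 2-D convolution (correlation) of a single-channel image with a kernel.
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def relu(x):
    # Activation function layer.
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    # Non-overlapping max pooling.
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

# Example: 8x8 image and 3x3 kernel -> convolution, activation, pooling.
rng = np.random.default_rng(0)
feature_map = max_pool(relu(conv2d(rng.standard_normal((8, 8)), rng.standard_normal((3, 3)))))
print(feature_map.shape)  # (3, 3)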
C. Chaotic Oscillatory Neural Network
A Chaotic Oscillatory Neural Network (CONN) is a more specific type of neural network than the FFNN. Instead of dealing with the input and weight coefficients directly, in a CONN the weight coefficients are calculated based on the input [4]. Then the oscillators' iterative part starts. At the first stage, the input of the oscillators is set to a random value in the range from zero to one. The output of the oscillators forms an output vector, which is multiplied by the weight coefficients and passed back to the input of the oscillators. This process is repeated until synchronization stability is achieved. The structure of a CONN is shown in Fig. 4. A CONN can be used for clustering analysis.
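The iterative procedure described above can be sketched in Python as follows. The logistic map used here as a stand-in for the chaotic oscillator and the stopping criterion (all oscillator outputs coinciding) are assumptions made for illustration only; the actual oscillator circuit and synchronization criterion are those of [4].

import numpy as np

def chaotic_oscillator(x, r=3.9):
    # Stand-in chaotic oscillator (a logistic map); the actual CONN in [4]
    # uses an analog chaotic oscillator circuit.
    return r * x * (1.0 - x)

def conn_iterate(weights, tol=1e-6, max_steps=1000, seed=0):
    # Iterative part of a CONN as described above: oscillator inputs start at
    # random values in [0, 1]; the oscillator outputs are multiplied by the
    # weight coefficients and fed back to the oscillator inputs; the loop stops
    # once all oscillator outputs coincide (a crude stand-in for
    # synchronization stability).
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=weights.shape[0])
    y = chaotic_oscillator(x)
    for _ in range(max_steps):
        y = chaotic_oscillator(x)
        if np.ptp(y) < tol:                       # all oscillators synchronized
            break
        x = np.clip(weights @ y, 0.0, 1.0)        # feed weighted outputs back
    return y

# Toy run with uniform coupling weights for 4 oscillators.
print(conn_iterate(np.full((4, 4), 0.25)))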
These circuits have been used for a convolution kernel and have been tested in a CNN with four convolutional layers. The total power consumed by a single neuron is 62.81 µW. The analog neuron proposed in [13] consumes 25 µW and has been used in a CNN with two convolutional layers. So, despite existing problems such as the need for conversion and the presence of noise, analog circuits can be very efficient in terms of power consumption.

C. Implementation of MAC Operation
The multiply-accumulate (MAC) operation is the basic operation of convolution. While a MAC implementation in the digital domain requires a lot of resources, in the analog domain the resource consumption can be reduced. Based on this idea, several implementations of MAC units in mixed-signal circuits have been proposed [14, 15]. A block diagram of an analog convolution kernel is shown in Fig. 8 [15]. The eight parallel convolution kernels of the CNN proposed for a CCD camera consume 36 mW [15].

Fig. 8. Analog convolution network schematic diagram [15].
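To show where the MAC count of a convolution comes from, here is a tiny Python sketch of a convolution kernel written as a chain of MAC operations. The kernel size is an arbitrary example, and the remark about analog kernels summing currents or charges is a general characterization of such circuits rather than a description of the specific design in [15].

import numpy as np

def mac_convolution_kernel(window, weights):
    # Convolution kernel expressed as a chain of multiply-accumulate (MAC)
    # operations: acc += x * w for every tap. In a digital implementation each
    # MAC costs a multiplier and an adder; analog kernels typically perform the
    # accumulation by summing currents or charges instead.
    acc = 0.0
    for x, w in zip(window.ravel(), weights.ravel()):
        acc += x * w                   # one MAC operation per kernel tap
    return acc

# A 3x3 kernel therefore needs 9 MAC operations per output pixel.
rng = np.random.default_rng(0)
print(mac_convolution_kernel(rng.standard_normal((3, 3)), rng.standard_normal((3, 3))))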
D. CMOS Oscillator for CONN
In [4], a CMOS oscillator for a CONN has been proposed. Its general circuit is shown in Fig. 9. The CMOS circuits for the comparator (it can be implemented, for example, as in [16]) and for the sample-and-hold unit are well known, and there is no difficulty in implementing these devices in the analog domain.

Fig. 9. General circuit of chaotic oscillator [4].

IV. ANALYSIS OF NEURAL NETWORK IMPLEMENTATIONS
This section compares neural network implementations on general-purpose devices (CPU/GPU) with those on dedicated devices (FPGA/ASIC).

A. Power Consumption
Lower power consumption is an advantage of neural networks implemented on an FPGA or ASIC. It has previously been shown that FPGA-based neural network implementations consume less power than implementations based on a CPU or GPU [10]. Neural networks partially implemented in the analog domain on an ASIC allow power consumption to be reduced further in comparison with neural networks implemented on an FPGA [11-14].

B. Accuracy of Hardware Implemented Neural Networks
According to [11], the accuracy of CNNs implemented on FPGAs varies from 88% to 99%. Simulations of a CNN with analog kernels have shown accuracy from 82% to 99% [12]. For comparison, an image classification CNN on a GPU has an accuracy of 83% [19].
The study of accuracy needs further detailed observation, because accuracy may correlate with the tasks that a particular neural network is designed to solve. Also, the performance of analog circuits usually suffers from noise and element mismatch, which can lead to lower accuracy in neural networks based on analog kernels.
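As a purely illustrative sketch of this effect (the toy data, the linear classifier and the multiplicative error model are all assumptions, not taken from [11], [12] or [19]), the following Python snippet perturbs ideal weights with random errors, as a crude model of noise and mismatch, and reports the resulting drop in classification accuracy.

import numpy as np

def accuracy_with_mismatch(weights, inputs, labels, sigma, trials=100, seed=0):
    # Monte-Carlo estimate of how random multiplicative weight errors (a simple
    # model of analog noise and element mismatch) lower the accuracy of a
    # linear classifier.
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(trials):
        noisy = weights * (1.0 + sigma * rng.standard_normal(weights.shape))
        preds = np.argmax(inputs @ noisy.T, axis=1)
        accs.append(np.mean(preds == labels))
    return float(np.mean(accs))

# Toy data: 200 samples, 8 features, 3 classes, labels produced by the ideal weights.
rng = np.random.default_rng(1)
W = rng.standard_normal((3, 8))
X = rng.standard_normal((200, 8))
y = np.argmax(X @ W.T, axis=1)
for sigma in (0.0, 0.1, 0.3):
    print(sigma, accuracy_with_mismatch(W, X, y, sigma))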
C. Specific Neural Networks
Specific neural networks such as the CONN considered above require more computational resources. Their hardware implementation can make it possible to use these neural networks in low-power applications, such as IoT devices, smart wearable sensors, etc.
REFERENCES
[3] …, 2023 International Conference on Electrical Engineering and Photonics (EExPolytech), St. Petersburg, Russian Federation, 2023, pp. 72-75, doi: 10.1109/EExPolytech58658.2023.10318777.
[4] K. P. Kuznecov and D. O. Budanov, "Chaotic oscillator for a chaotic oscillatory neural network hardware implementation," 2022 International Conference on Electrical Engineering and Photonics (EExPolytech), St. Petersburg, Russian Federation, 2022, pp. 54-57, doi: 10.1109/EExPolytech56308.2022.9950739.
[5] A. S. Kozlov and M. M. Pilipko, "A second-order sigma-delta modulator with a hybrid topology in 180nm CMOS," 2020 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus), St. Petersburg and Moscow, Russia, 2020, pp. 144-146, doi: 10.1109/EIConRus49466.2020.9039246.
[6] G. Kumar, P. Kumar and D. Kumar, "Brain tumor detection using convolutional neural network," 2021 IEEE International Conference on Mobile Networks and Wireless Communications (ICMNWC), Tumkur, Karnataka, India, 2021, pp. 1-6, doi: 10.1109/ICMNWC52512.2021.9688460.
[7] N. G. Markov, I. V. Zoev and E. A. Mytsko, "FPGA hardware implementation of the Yolo subclass convolutional neural network model in computer vision systems," 2022 International Siberian Conference on Control and Communications (SIBCON), Tomsk, Russian Federation, 2022, pp. 1-4, doi: 10.1109/SIBCON56144.2022.10003015.
[8] A. Stempkovskiy, R. Solovyev, D. Telpukhov and A. Kustov, "Hardware implementation of convolutional neural network based on systolic matrix multiplier," 2023 Intelligent Technologies and Electronic Devices in Vehicle and Road Transport Complex (TIRVED), Moscow, Russian Federation, 2023, pp. 1-5, doi: 10.1109/TIRVED58506.2023.10332631.
[9] M. E. Elbtity, H.-W. Son, D.-Y. Lee and H. Kim, "High speed, approximate arithmetic based convolutional neural network accelerator," 2020 International SoC Design Conference (ISOCC), Yeosu, Korea (South), 2020, pp. 71-72, doi: 10.1109/ISOCC50952.2020.9333013.
[10] Y. Li, S. Lu, J. Luo, W. Pang and H. Liu, "High-performance convolutional neural network accelerator based on systolic arrays and quantization," 2019 IEEE 4th International Conference on Signal and Image Processing (ICSIP), Wuxi, China, 2019, pp. 335-339, doi: 10.1109/SIPROCESS.2019.8868327.
[11] C.-C. Chung, Y.-P. Liang, Y.-C. Chang and C.-M. Chang, "A binary weight convolutional neural network hardware accelerator for analysis faults of the CNC machinery on FPGA," 2023 International VLSI Symposium on Technology, Systems and Applications (VLSI-TSA/VLSI-DAT), HsinChu, Taiwan, 2023, pp. 1-4, doi: 10.1109/VLSI-TSA/VLSI-DAT57221.2023.10134316.
[12] S. Wang, K. M. Al-Tamimi, I. Hammad and K. El-Sankary, "Towards current-mode analog implementation of deep neural network functions," 2022 20th IEEE Interregional NEWCAS Conference (NEWCAS), Quebec City, QC, Canada, 2022, pp. 322-326, doi: 10.1109/NEWCAS52662.2022.9842017.
[13] M. S. Asghar, S. Arslan and H. Kim, "Low power spiking neural network circuit with compact synapse and neuron cells," 2020 International SoC Design Conference (ISOCC), Yeosu, Korea (South), 2020, pp. 157-158, doi: 10.1109/ISOCC50952.2020.9333105.
[14] M. S. Asghar, M. Junaid, H. W. Kim, S. Arslan and S. A. Ali Shah, "A digitally controlled analog kernel for convolutional neural networks," 2021 18th International SoC Design Conference (ISOCC), Jeju Island, Korea, Republic of, 2021, pp. 242-243, doi: 10.1109/ISOCC53507.2021.9613851.
[15] P. Jungwirth, D. Richie and B. Secrest, "Analog convolutional neural network," 2020 SoutheastCon, Raleigh, NC, USA, 2020, pp. 1-6, doi: 10.1109/SoutheastCon44009.2020.9368273.
[16] A. S. Korotkov, D. V. Morozov, M. M. Pilipko and A. Sinha, "Delta-Sigma ADC for ternary code system (part I: modulator realization)," 2007 International Symposium on Signals, Circuits and Systems, Iasi, Romania, 2007, pp. 1-4, doi: 10.1109/ISSCS.2007.4292653.
[17] B. Reagen et al., "Minerva: enabling low-power, highly-accurate deep neural network accelerators," 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), Seoul, Korea (South), 2016, pp. 267-278, doi: 10.1109/ISCA.2016.32.
[18] G. D. Guglielmo et al., "A reconfigurable neural network ASIC for detector front-end data compression at the HL-LHC," in IEEE Transactions on Nuclear Science, vol. 68, no. 8, pp. 2179-2186, Aug. 2021, doi: 10.1109/TNS.2021.3087100.
[19] E. Cengil, A. Çinar and Z. Güler, "A GPU-based convolutional neural network approach for image classification," 2017 International Artificial Intelligence and Data Processing Symposium (IDAP), Malatya, Turkey, 2017, pp. 1-6, doi: 10.1109/IDAP.2017.8090194.