
Hardware Implementation of Neural Networks

The paper discusses the hardware implementation of various types of neural networks, including Feed-Forward Neural Networks (FFNN), Convolutional Neural Networks (CNN), and Chaotic Oscillatory Neural Networks (CONN). It highlights the efficiency benefits of using FPGA and ASIC technologies for reducing power consumption and improving performance in applications such as medical devices and IoT. The findings suggest that hardware implementations can significantly lower power usage compared to traditional CPU or GPU methods, making them suitable for low-power applications.


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/385988727

Hardware Implementation of Neural Networks

Conference Paper · October 2024


DOI: 10.1109/EExPolytech62224.2024.10755531

2 authors: Danil Skrebenkov and Dmitry Budanov, Peter the Great St. Petersburg Polytechnic University

All content following this page was uploaded by Danil Skrebenkov on 29 January 2025.



Hardware Implementation of Neural Networks

Danil I. Skrebenkov
Higher School of Electronics and Micro-Electro-Mechanical Systems
Peter the Great St. Petersburg Polytechnic University
Saint Petersburg, Russia
skrebenkov.di@edu.spbstu.ru

Dmitry O. Budanov
Higher School of Electronics and Micro-Electro-Mechanical Systems
Peter the Great St. Petersburg Polytechnic University
Saint Petersburg, Russia
budanov_do@spbstu.ru

2024 International Conference on Electrical Engineering and Photonics (EExPolytech) | 979-8-3503-8887-9/24/$31.00 ©2024 IEEE | DOI: 10.1109/EExPolytech62224.2024.10755531

Abstract—Neural networks have become very useful in many fields of research, such as image recognition, optimization, data analysis, classification, and prediction tasks. However, their increasing complexity leads to longer computation times and higher power consumption, which constrains the use of neural networks in many types of applications. This motivates the development of dedicated hardware circuits for neural networks. This paper briefly reviews current possibilities and approaches for the hardware implementation of several types of neural networks.

Keywords—Feed-Forward Neural Network, Convolutional Neural Network, Deep Neural Network, Chaotic Oscillatory Neural Network, ASIC, FPGA

I. INTRODUCTION
Neural network algorithms can be applied in many fields of research. These algorithms require substantial computational resources, which raises the problem of efficiency. Some algorithms can be accelerated and optimized using FPGAs (for example, FFNNs [1, 2] and CNNs [3]). However, for larger neural networks or networks with specific operations (for example, the oscillator in a CONN [4]), mixed-signal or analog ASICs show promising performance in terms of power consumption and computation time. Low power consumption is required in battery-powered applications such as medical wearable equipment, devices for sport and fitness activities, and IoT devices for smart homes and vehicles. Neural networks can also be used in electronic devices, for example in an ADC [5] for calibration or correction of the output signal.
The paper is organized as follows. Section II describes the main principles of the considered neural networks. Section III examines possibilities for increasing efficiency through hardware implementation, based on the reviewed papers. Section IV presents the results of this review. Conclusions are outlined in Section V.

II. BASIC PRINCIPLES OF NEURAL NETWORKS

A. Feed-Forward Neural Network
A feed-forward neural network (FFNN) basically requires a weighted sum and an activation function for each neuron. A single-layer network is shown in Fig. 1. When the number of hidden layers increases to three or more, the network becomes a Deep Neural Network (DNN). An example of a DNN structure is shown in Fig. 2.

Fig. 1. FFNN with one hidden layer.
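The weighted sum and activation that each neuron computes can be made concrete with a small numerical sketch. This is an illustration only: the layer sizes and the sigmoid activation are assumptions, since the paper does not fix particular choices.

```python
import numpy as np

def sigmoid(z):
    # Smooth nonlinearity; illustrative choice of activation function.
    return 1.0 / (1.0 + np.exp(-z))

def ffnn_forward(x, W_hidden, b_hidden, W_out, b_out):
    """Forward pass of an FFNN with one hidden layer (as in Fig. 1).

    Each neuron computes a weighted sum of its inputs plus a bias,
    then applies the activation function.
    """
    h = sigmoid(W_hidden @ x + b_hidden)   # hidden layer
    y = sigmoid(W_out @ h + b_out)         # output layer
    return y

rng = np.random.default_rng(0)
x = rng.normal(size=3)                          # 3 inputs (illustrative)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
y = ffnn_forward(x, W1, b1, W2, b2)
print(y.shape)  # (2,)
```

Stacking three or more such hidden layers, each a weighted sum followed by an activation, gives the DNN structure of Fig. 2.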
B. Convolutional Neural Network
A Convolutional Neural Network (CNN) is a feed-forward neural network with convolutional layers. A CNN usually includes a convolutional layer, an activation-function layer, a pooling layer, and a fully connected layer [6]. The convolution, activation, and pooling operations can be repeated several times. The structure of a CNN is shown in Fig. 3. It can be seen that a CNN is a more complex algorithm than a “standard” FFNN. CNNs are used for image recognition, computer vision, and classification tasks.

Fig. 2. FFNN with three hidden layers (DNN).

C. Chaotic Oscillatory Neural Network
A Chaotic Oscillatory Neural Network (CONN) is a more specialized neural network than an FFNN. Instead of dealing with separate inputs and weight coefficients, in a CONN the weight coefficients are calculated from the input [4].
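The weights-from-input idea can be sketched with a rule common in chaotic-map clustering networks: oscillators are coupled more strongly when their input samples are close to each other. The Gaussian form and the scale parameter `a` below are illustrative assumptions, not the exact expression used in [4].

```python
import numpy as np

def conn_weights(x, a=1.0):
    """Illustrative CONN-style weight matrix computed from the input.

    Each pairwise weight decays with the distance between input
    samples (Gaussian kernel); 'a' sets the interaction scale.
    This is a hypothetical rule for illustration only.
    """
    d = np.abs(x[:, None] - x[None, :])    # pairwise input distances
    w = np.exp(-d**2 / (2 * a**2))         # closer inputs couple strongly
    np.fill_diagonal(w, 0.0)               # no self-coupling
    return w

x = np.array([0.1, 0.15, 0.9])             # two close samples, one far away
w = conn_weights(x, a=0.2)
print(w[0, 1] > w[0, 2])  # True: nearby inputs are coupled more strongly
```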

Fig. 3. CNN with two convolutional and pooling layers [3].

Then the iterative part of the oscillators starts. At the first stage, the input of each oscillator is set to a random value in the range from zero to one. The outputs of the oscillators form an output vector, which is multiplied by the weight coefficients and passed back to the inputs of the oscillators. This process is repeated until synchronization stability is achieved. The structure of a CONN is shown in Fig. 4. A CONN can be used for clustering analysis.

Fig. 4. CONN structure [4].

III. HARDWARE IMPLEMENTATIONS

This section describes implementation approaches that are useful for neural networks. It starts with activation functions and ends with chaotic oscillators. Although accuracy, training time, and other characteristics of neural networks remain important, power consumption and occupied area become more significant in hardware implementations.

A. Neural Networks Implemented in FPGA
Using FPGAs to accelerate neural networks is quite a useful option if the network structure is not complex. Papers [1-3, 7-9] demonstrate the use of FPGAs for FFNNs and, in particular, CNNs. However, FPGAs contain a limited number of resources such as DSP blocks; as a consequence, these resources may be insufficient for more complex neural networks. For example, in [10] DSP-block utilization reaches 84.2%, which limits any increase in the number of layers. The power consumption of an FPGA-based neural network implementation is lower than that of, for example, a GPU-based one: the FPGA-based CNN implementation proposed in [11] consumes 523 mW, while a GPU typically consumes around 300-400 W.

B. Implementation of Activation Functions
Analog implementations of activation functions can, in principle, be used for various types of neural networks. Paper [12] introduces a ReLU function (Fig. 5), a SoftMax function (Fig. 6), and a weight multiplier (Fig. 7), implemented in a 180 nm CMOS process with a 1.5 V supply voltage. These circuits have been used in a convolution kernel and tested in a CNN with four convolutional layers; the total power consumed by a single neuron is 62.81 uW. The analog neuron proposed in [13] consumes 25 uW and has been used in a CNN with two convolutional layers. So, despite existing problems such as the need for signal conversion and the presence of noise, analog circuits can be very efficient in terms of power consumption.

Fig. 5. ReLU implementation [12].

Fig. 6. SoftMax implementation [12].

Fig. 7. Weight multiplier implementation [12].

C. Implementation of MAC Operation
The multiply-accumulate (MAC) operation is the basic operation of convolution. While a MAC implementation in the digital domain requires a lot of resources, in the analog domain the resource consumption can be reduced. Based on this idea, several implementations of MAC units in mixed-signal circuits have been proposed [14, 15]. A block diagram of an analog convolution kernel is shown in Fig. 8 [15]. Eight parallel convolution kernels for the CNN proposed for a CCD camera consume 36 mW [15].
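A behavioral model makes the MAC structure explicit: each output value of a convolution is one accumulate loop over products, the same sum-of-products that the analog kernels of [14, 15] evaluate in the current or charge domain. The toy 4x4 image and 2x2 integer kernel below are illustrative assumptions.

```python
import numpy as np

def conv2d_mac(image, kernel):
    """2-D convolution written as explicit multiply-accumulate steps.

    Each output pixel is one MAC sequence: acc += pixel * weight
    over the kernel window.
    """
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=np.int64)
    for i in range(oh):
        for j in range(ow):
            acc = 0                                 # accumulator register
            for u in range(kh):
                for v in range(kw):
                    acc += int(image[i + u, j + v]) * int(kernel[u, v])
            out[i, j] = acc
    return out

img = np.arange(16, dtype=np.int64).reshape(4, 4)   # toy 4x4 "image"
k = np.array([[1, 0], [0, -1]], dtype=np.int64)     # toy 2x2 kernel
print(conv2d_mac(img, k))
```

The four nested loops show why a fully digital MAC array is resource-hungry: every output pixel needs kh*kw multiplies, which is what motivates the analog reformulations above.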

Fig. 8. Analog convolution network schematic diagram [15].

D. CMOS Oscillator for CONN
A CMOS oscillator for a CONN has been proposed in [4]; its general circuit is shown in Fig. 9. CMOS circuits for the comparator (which can be implemented, for example, as in [16]) and the sample-and-hold unit are well known, so there is no problem implementing these blocks in the analog domain.

Fig. 9. General circuit of the chaotic oscillator [4].

E. Neural Network ASICs
Paper [17] introduces Minerva, an algorithm for optimizing DNN hardware accelerators. A configurable DNN generated by Minerva consumes 24 mW and achieves a prediction error of nearly 1%. In [18], a CNN was implemented in a 65 nm technology in an autoencoder for a high-granularity calorimeter. The architecture of the neural network used is shown in Fig. 10. As a result, the ASIC-based implementation consumes 95 mW with a 50 ns latency, while the FPGA-based implementation consumes 2.5-5 W with a 300 ns latency.

IV. ANALYSIS OF NEURAL NETWORK IMPLEMENTATIONS
This section compares neural network implementations in general-purpose devices (CPU/GPU) and in application-specific devices (FPGA/ASIC).

A. Power Consumption
Lower power consumption is an advantage of neural networks implemented on an FPGA or ASIC. As shown above, FPGA-based neural network implementations consume less power than CPU- or GPU-based ones [10]. Neural networks partially implemented in the analog domain on an ASIC reduce power consumption further in comparison with FPGA implementations [11-14].

B. Accuracy of Hardware-Implemented Neural Networks
According to [11], the accuracy of CNNs implemented on FPGAs varies from 88% to 99%. Simulations of CNNs with analog kernels have shown accuracies from 82% to 99% [12]. For comparison, an image-classification CNN on a GPU achieved an accuracy of 83% [19]. Accuracy requires further detailed study, because it may correlate with the tasks a particular neural network is designed to solve. Moreover, the performance of analog circuits usually suffers from noise and element mismatch, which can lower the accuracy of neural networks based on analog kernels.

C. Specific Neural Networks
Specific neural networks such as the CONN considered above require more computational resources. Hardware implementation can make it possible to use these networks in low-power applications such as IoT devices, smart wearable sensors, etc.

V. CONCLUSIONS
In this paper, the most common types of neural networks were surveyed. Hardware implementation of neural networks on FPGAs or ASICs can reduce power consumption and allows neural networks to be used in the low-power applications required for medical wearable equipment, sport and fitness applications, IoT devices for smart homes and vehicles, Advanced Driver Assistance Systems (ADAS), and smart devices for monitoring environmental conditions in agriculture. Several analog approaches to implementing parts of neural networks were reviewed. Hardware-implemented neural networks demonstrate promising power consumption and are well suited to embedded devices where low power is required. Using hardware implementations of neural networks reduces power consumption from hundreds of watts to tens of milliwatts compared with common devices such as CPUs or GPUs.
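The scale of this reduction can be sanity-checked from the figures quoted in Section III-E for [18], since energy per inference is power multiplied by latency (the upper FPGA power figure is assumed here):

```python
# Energy per inference = power * latency, using the figures quoted from [18].
asic_energy = 95e-3 * 50e-9   # 95 mW for 50 ns  -> 4.75 nJ per inference
fpga_energy = 5.0 * 300e-9    # 5 W for 300 ns   -> 1.5 uJ per inference
print(asic_energy, fpga_energy, fpga_energy / asic_energy)
```

Even against the FPGA, the ASIC figures of [18] imply roughly two orders of magnitude less energy per inference, consistent with the watts-to-milliwatts claim above.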
Fig. 10. Neural network architecture for the encoder model [18].

REFERENCES
[1] V. A. Sumayyabeevi, J. J. Poovely, N. Aswathy and S. Chinnu, "A New Hardware Architecture for FPGA Implementation of Feed Forward Neural Networks," 2021 2nd International Conference on Advances in Computing, Communication, Embedded and Secure Systems (ACCESS), Ernakulam, India, 2021, pp. 107-111, doi: 10.1109/ACCESS51619.2021.9563342.
[2] S. Hariprasath and T. N. Prabakar, "FPGA implementation of multilayer feed forward neural network architecture using VHDL," 2012 International Conference on Computing, Communication and Applications, Dindigul, India, 2012, pp. 1-6, doi: 10.1109/ICCCA.2012.6179225.
[3] J. E. Akimova and D. O. Budanov, "Hardware Implementation of a Convolutional Neural Network," 2023 International Conference on Electrical Engineering and Photonics (EExPolytech), St. Petersburg, Russian Federation, 2023, pp. 72-75, doi: 10.1109/EExPolytech58658.2023.10318777.
[4] K. P. Kuznecov and D. O. Budanov, "Chaotic oscillator for a chaotic oscillatory neural network hardware implementation," 2022 International Conference on Electrical Engineering and Photonics (EExPolytech), St. Petersburg, Russian Federation, 2022, pp. 54-57, doi: 10.1109/EExPolytech56308.2022.9950739.
[5] A. S. Kozlov and M. M. Pilipko, "A second-order sigma-delta modulator with a hybrid topology in 180nm CMOS," 2020 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus), St. Petersburg and Moscow, Russia, 2020, pp. 144-146, doi: 10.1109/EIConRus49466.2020.9039246.
[6] G. Kumar, P. Kumar and D. Kumar, "Brain tumor detection using convolutional neural network," 2021 IEEE International Conference on Mobile Networks and Wireless Communications (ICMNWC), Tumkur, Karnataka, India, 2021, pp. 1-6, doi: 10.1109/ICMNWC52512.2021.9688460.
[7] N. G. Markov, I. V. Zoev and E. A. Mytsko, "FPGA hardware implementation of the Yolo subclass convolutional neural network model in computer vision systems," 2022 International Siberian Conference on Control and Communications (SIBCON), Tomsk, Russian Federation, 2022, pp. 1-4, doi: 10.1109/SIBCON56144.2022.10003015.
[8] A. Stempkovskiy, R. Solovyev, D. Telpukhov and A. Kustov, "Hardware implementation of convolutional neural network based on systolic matrix multiplier," 2023 Intelligent Technologies and Electronic Devices in Vehicle and Road Transport Complex (TIRVED), Moscow, Russian Federation, 2023, pp. 1-5, doi: 10.1109/TIRVED58506.2023.10332631.
[9] M. E. Elbtity, H.-W. Son, D.-Y. Lee and H. Kim, "High speed, approximate arithmetic based convolutional neural network accelerator," 2020 International SoC Design Conference (ISOCC), Yeosu, Korea (South), 2020, pp. 71-72, doi: 10.1109/ISOCC50952.2020.9333013.
[10] Y. Li, S. Lu, J. Luo, W. Pang and H. Liu, "High-performance convolutional neural network accelerator based on systolic arrays and quantization," 2019 IEEE 4th International Conference on Signal and Image Processing (ICSIP), Wuxi, China, 2019, pp. 335-339, doi: 10.1109/SIPROCESS.2019.8868327.
[11] C.-C. Chung, Y.-P. Liang, Y.-C. Chang and C.-M. Chang, "A binary weight convolutional neural network hardware accelerator for analysis faults of the CNC machinery on FPGA," 2023 International VLSI Symposium on Technology, Systems and Applications (VLSI-TSA/VLSI-DAT), HsinChu, Taiwan, 2023, pp. 1-4, doi: 10.1109/VLSI-TSA/VLSI-DAT57221.2023.10134316.
[12] S. Wang, K. M. Al-Tamimi, I. Hammad and K. El-Sankary, "Towards current-mode analog implementation of deep neural network functions," 2022 20th IEEE Interregional NEWCAS Conference (NEWCAS), Quebec City, QC, Canada, 2022, pp. 322-326, doi: 10.1109/NEWCAS52662.2022.9842017.
[13] M. S. Asghar, S. Arslan and H. Kim, "Low power spiking neural network circuit with compact synapse and neuron cells," 2020 International SoC Design Conference (ISOCC), Yeosu, Korea (South), 2020, pp. 157-158, doi: 10.1109/ISOCC50952.2020.9333105.
[14] M. S. Asghar, M. Junaid, H. W. Kim, S. Arslan and S. A. Ali Shah, "A digitally controlled analog kernel for convolutional neural networks," 2021 18th International SoC Design Conference (ISOCC), Jeju Island, Korea, Republic of, 2021, pp. 242-243, doi: 10.1109/ISOCC53507.2021.9613851.
[15] P. Jungwirth, D. Richie and B. Secrest, "Analog convolutional neural network," 2020 SoutheastCon, Raleigh, NC, USA, 2020, pp. 1-6, doi: 10.1109/SoutheastCon44009.2020.9368273.
[16] A. S. Korotkov, D. V. Morozov, M. M. Pilipko and A. Sinha, "Delta-Sigma ADC for ternary code system (part I: modulator realization)," 2007 International Symposium on Signals, Circuits and Systems, Iasi, Romania, 2007, pp. 1-4, doi: 10.1109/ISSCS.2007.4292653.
[17] B. Reagen et al., "Minerva: enabling low-power, highly-accurate deep neural network accelerators," 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), Seoul, Korea (South), 2016, pp. 267-278, doi: 10.1109/ISCA.2016.32.
[18] G. D. Guglielmo et al., "A reconfigurable neural network ASIC for detector front-end data compression at the HL-LHC," IEEE Transactions on Nuclear Science, vol. 68, no. 8, pp. 2179-2186, Aug. 2021, doi: 10.1109/TNS.2021.3087100.
[19] E. Cengil, A. Çinar and Z. Güler, "A GPU-based convolutional neural network approach for image classification," 2017 International Artificial Intelligence and Data Processing Symposium (IDAP), Malatya, Turkey, 2017, pp. 1-6, doi: 10.1109/IDAP.2017.8090194.
