International Journal of Computing and Digital Systems

ISSN (2210-142X)
Int. J. Com. Dig. Sys. 16, No.1 (Aug-2024)
http://dx.doi.org/10.12785/ijcds/160167

AES 32: An FPGA implementation of Lightweight-AES for IoT Devices

Sumit Singh Dhanda¹, Brahmjit Singh¹ and Poonam Jindal¹
¹Department of Electronics and Communication, National Institute of Technology, Kurukshetra, India

Received 27 May 2023, Revised 06 May 2024, Accepted 11 May 2024, Published 10 Aug. 2024

Abstract: The IoT is characterized by resource-constrained devices. Information security is the main challenge that arises from the wireless transmission of data by ubiquitous sensors. The rapid expansion of IoT setups with resource-constrained devices has spurred research into low-cost information security solutions. This study presents an efficient version of AES for high throughput. The AES data path is compressed to 32 bits. Implementation has been carried out on different FPGA families. Data path compression and the use of BRAMs have led to improved throughput with savings in resource consumption. A loop-unrolled AES consumes 2669 slices, which is 12 times as large as this design, while 32-bit AES with a 128-bit data path consumes 4 times more resources than the proposed design, which uses 223 slices and 5 BRAMs on an Artix-7 FPGA. The proposed design delivers throughput in the range of 2.2 to 3.5 Gbps and achieves an efficiency of 1.75 to 7.8 Mbps per slice on different FPGAs. It outperforms different lightweight ciphers and constrained AES implementations in the existing literature.

Keywords: Advanced Encryption Standard (AES), Internet of Things (IoT), Field Programmable Gate Arrays (FPGA), Data path, Information security

1. INTRODUCTION

Cisco estimates that the number of connected devices will rise from 50 billion by 2020 to reach 500 billion by 2025 [1]. To ensure information security in the Internet of Things (IoT), small sensors and devices need to be safeguarded against sniffing attacks. Smart grids are also an application of IoT. Enormous data exchange and openness of resource sharing among smart meters in the smart grid have also generated data security challenges [2]. Data privacy and data leakage are important concerns at the cloud level as well [3]. Confidentiality of information can be enhanced with the help of block ciphers, which are used for ensuring information security [4], [5] in various standards. Blockchain technology is another important area where cryptographic algorithms serve as the base of security [6]. For this kind of application, the most secure block cipher is thought to be the Advanced Encryption Standard (AES). The National Institute of Standards and Technology (NIST) [7] released FIPS-197, in which AES was adopted as a standard symmetric cipher. It ensures confidentiality at two levels:

i) For high throughput applications such as e-commerce or in case of trunk communication.
ii) For lower data rates it can be used for resource constrained devices.

Software and hardware implementations of AES are utilized for these purposes. Hardware implementation of AES is preferred over software implementation for high throughput applications. These implementations are carried out either on field programmable gate arrays (FPGA) or on application specific integrated circuits (ASIC). Major research areas of AES implementation are highlighted in Fig 1. To minimize the delay, highly pipelined architectures are implemented. Area reduction is achieved by iterative architecture. Several optimizations in the basic operations such as SubBytes or MixColumns, arithmetic operations etc. are also used for the same. Further, resource sharing [8] has also been used to minimize the area and increase the speed of the architecture while maintaining the integrity of the cipher. Data-path reduction [9] is one of the resource sharing techniques used to achieve a smaller-area implementation of AES. Due to the ever-increasing demand for security solutions for resource constrained devices, researchers are still working on developing new architectures of AES. The optimization attempts reported in the literature broadly fall into two categories:

i) Pipelined (fully or partially) architecture for implementation of high speed.
ii) Compact and low-power architecture for the low resources or low-cost devices and feedback mode of operations.
Software and hardware implementations of AES are
E-mail address: dhandasumit@gmail.com, brahmjit.s@gmail.com, poonamjindal81@nitkkr.ac.in https:// journal.uob.edu.bh
926 Sumit Singh Dhanda, et al.: An FPGA Implementation of Lightweight-AES for IoT Devices

Figure 1. Research directions in AES

The major contribution of this work is to explore the adaptation of AES-128 to low-cost devices in IoT through effective resource utilization. A two-step approach is applied to minimize the latency and resource consumption. The first step compresses the data path to 32 bits. The use of the BRAMs available in the FPGA then maximizes the utilization of available resources. Although this design utilizes a 32-bit data path like others [10-12], its optimum utilization of resources enables it to stand out among these designs. Efficient use of the available block RAMs has enabled it to minimize the resources. Use of BRAMs for the implementation of the S-boxes yielded a heavy reduction in resource consumption. The standard AES design is utilized for the work and no changes have been made to the original structure and operations of AES. Hence, the security of the cipher remains the same as the original AES.

The contributions of this work are as follows:

i) In this paper, a new high performance constrained architecture has been presented for resource constrained devices.
ii) The architecture makes use of BRAMs to minimize the resource consumption on FPGAs. It achieves the optimum utilization of FPGA resources.
iii) It also presents the state of the art in the field of research.
iv) The result is compared with existing designs and lightweight implementations for IoT.

The rest of the paper is organized as follows: Various contemporary implementations are presented alongside older ones in Section 2. Section 3 provides the implementation details. Implementation results of the proposed design are presented, compared and discussed while identifying its applications in Section 4. In the end, Section 5 draws conclusions and provides the future work directions.

2. RELATED WORK

In [8], authors used a 32-bit data path, resource sharing between the encryption and decryption units, and subfield arithmetic to minimize the hardware requirement. In [10], a low power AES architecture with an optimized S-box has been implemented on an FPGA with 128-, 192- and 256-bit keys. An ASIC implementation of an AES processor has been carried out in [11], which is capable of delivering a throughput of 2.29 Gbps. In [12], authors carried out a 32-bit implementation on FPGA with the help of pre-computed key expansion. The S-box is implemented as LUTs. The design used the dedicated memory blocks that were available on the FPGA. ShiftRows is performed with addressing logic. It is made possible by arranging the state bytes in such a manner that they are efficiently stored in shift registers. The same method has been used in [13] to reduce storage requirements and implement data paths of various sizes. In [14], authors have improved the FPGA resource consumption using the T-box method. In [15], a theoretical design for the AES architecture was presented to optimize the resource consumption. In [9], authors carried out a fully parallel and loop unrolled implementation of AES using composite field arithmetic and LUT based T-boxes. It was carried out for two different architectures: one was 8-bit S-box based while the other used a 32-bit data path. The architectures were optimized for high speed and low latency. The theoretical architecture presented in [15] was utilized in [16] with a core extended with decryption functionality and an 8-bit data path. The data path contains an S-box implementation in combinatorial logic. A study focused on IoT devices and their design was presented in [17], but it left out small devices. The power consumption for AES has been reported as 42 mW in this study, which is not appropriate for constrained IoT devices. Hence, the 128-bit architectures mentioned in [17], [18], [19] are not suitable for implementation in constrained devices due to power requirements. Similarly, [20] utilizes a 32-bit data path and has power consumption at the micro-watt level, but the area requirements make it unsuitable for small sensors. In [21], an asynchronous design has been presented for 128-bit data-path AES that consumes less power, but the area requirements are high for small devices and power consumption is still a concern for resource constrained devices. In [22], a simplified version of the AES algorithm is realised on FPGA. To obtain the least amount of latency in an ad hoc voice link, the MixColumns step is deleted.

In [23], a new AES crypto-hardware accelerator was presented for devices such as Bluetooth controllers. It uses power efficient designs for the S-box, MixColumns, ShiftRows and their inverses. The area occupied is 3120 GE for the 130 nm CMOS technology. In [24], a new design named nano-AES was presented utilizing an 8-bit data path.

Int. J. Com. Dig. Sys. 16, No.1, 925-934 (Aug-2024) 927

It was an ASIC implementation which achieved 35-2.4% improvements over previous works. In [25], 8-bit architectures were presented for the SILC, CLOC, AES-JAMBU, and COLM authenticated ciphers. All of these are designed by modifying the AES core. AES-JAMBU used the least resources among all of these. A crypto-engine for AES-GCM was proposed in [26], which generates a throughput of 100 Gbps. It can be utilized in optical transport networks. It is designed using a 40 nm library. AES has been adapted to design a chaos-based algorithm for encryption in [27]. It provides security for images and data. The authors have subjected the scheme to different tests and attacks, and high resistance has been reported against such attacks. Security issues of AES based designs are highlighted in the next few works. True random number generators (TRNGs) have a statistical weakness due to physical randomness. A post-processing method can be used to solve this issue. An S-box based solution has been proposed in [28]. In [29], a correlation scan attack against XOR compaction is proposed. In [30], LC-FARES was presented. It has the capability to identify injected faults. Sixteen 8-bit registers are used, in a 32-bit architecture, for implementing ShiftRows. A flexible AES design that can choose from different defense mechanisms, key sizes, modes of operation etc. is presented in [31] using an agile approach. It uses the Chisel framework to achieve reduced code size. An advanced crypto-hardware design for AES that supports variable key sizes in multiple modes is presented in [32]. The designs are synthesized using 7 nm CMOS technology. In [33], authors have presented a lightweight cipher using the Lorentz-chaotic system (LCS). It occupies only 27 slices and uses a Feistel structure. LCS has been used to generate the random numbers which are used in the key. A number of works have been reported on AES optimizations. There is a need for reduction in the resource consumption of AES. Data path compression is one of the popular strategies for area minimization. But only a few works have been reported on lightweight AES for IoT applications. This clearly highlights a gap in the literature and motivated us to adopt the following methodology:

i) AES-128 is adapted to AES-32 by data path compression. BRAMs are used to further minimize the resource consumption.
ii) Verilog is used for coding the design which is synthesized on PlanAhead software.
iii) Thereafter, it was implemented on different FPGAs.
iv) Based on these FPGA implementations, the design is compared with existing designs.

The proposed design outperforms existing works in throughput and area.

3. 32-bit Data Path Implementation

This 32-bit iterative architecture was designed for high throughput with minimized resource consumption. It is shown in Fig 2. MixColumns is 32 bits wide, just like the main data path. Key generation is one of the most important operations in AES. By generating keys on-the-fly one can save the time required, which results in enhanced performance. For this purpose, an independent S-box is used in this design. The initial input to the proposed design has a size of 128 bits. For the SubBytes operation, it is converted into four blocks of 32 bits each.

Figure 2. BRAM based 32-bit Iterative architecture

ShiftRows operates on the input bytes and arranges them into a fixed sequence. Every time input is fed, this sequence is repeated. This fact allows the output bytes to be permuted in a fixed pattern, which is similar to the standard design of Chodowiec and Gaj [12] (2003) and N. Pramstaller, et al. [13] (2004). The next operation is the MixColumns operation, which has four levels of logic and is 32 bits wide. The last step is adding a round key, which is generated using a separate SubBytes unit. In this case, the computation is done instantly. The SubBytes operation can be implemented either with Galois field (GF) arithmetic or with the S-box stored as a look-up table (LUT). Here, the S-box is implemented as a LUT, which uses a few more resources than a design based on GF. However, we have stored the S-box entries in the block RAM (BRAM) that is included in the FPGA. The same data path and resources are used by both encryption and decryption in this system. Multiplexers have been used to make this possible. The Key Expansion unit uses a separate S-box as well, but it uses fewer resources now that the design has been modified to use the on-board block RAM. In all, 5 BRAMs have been used in the Artix-7 FPGA. This has enabled the design to reduce the resource requirement heavily. A total of 256 entries have been made for the byte substitution table. In this process, the 'case' statement has been utilized for the byte substitutions. It is an area consuming process, but it helps in faster execution of the cipher. Due to the 32-bit data path and sharing of the S-box, the number of cycles required to implement one round now becomes 5 (4 cycles for the main data and one cycle for key expansion). Multiplexers help in sharing of resources. The SubBytes architecture is shown in Fig 3. It is a 32-bit wide operation which is divided into four 8-bit wide operations. Hence, these are calculated individually and combined in the end. The Key expansion module utilizes a separate SubBytes module in this design and hence it is able to calculate the output in minimum cycles.

Figure 5. Top level Schematic for AES-128

Figure 6. Schematic for AES-32 with BRAMs

Figure 3. SubBytes structure for AES-32
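The text describes SubBytes as a 32-bit wide operation split into four 8-bit look-ups whose results are combined at the end. The following is a minimal software sketch of that idea in Python, not the authors' Verilog. The function names (`build_sbox`, `sub_word`) are hypothetical, and the 256-entry table is derived here from the standard AES definition (inverse in GF(2^8) followed by the affine transform) rather than taken from the paper's BRAM contents.

```python
AES_POLY = 0x1B  # low byte of the AES polynomial x^8 + x^4 + x^3 + x + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= AES_POLY
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8); 0 maps to 0 by AES convention."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):  # a^254 equals a^-1 in GF(2^8)
        result = gf_mul(result, a)
    return result


def rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_sbox() -> list:
    """Build the 256-entry AES S-box (one byte per entry, 2048 bits in total)."""
    table = []
    for x in range(256):
        b = gf_inv(x)
        table.append(b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63)
    return table


SBOX = build_sbox()


def sub_word(word: int) -> int:
    """Apply SubBytes to a 32-bit word as four independent byte look-ups."""
    out = 0
    for shift in (24, 16, 8, 0):
        byte = (word >> shift) & 0xFF
        out |= SBOX[byte] << shift  # each look-up is independent, then recombined
    return out


if __name__ == "__main__":
    print(hex(sub_word(0x00010203)))  # prints 0x637c777b
```

In hardware the four look-ups would run in parallel against the stored table; the loop here only models the data flow.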


The cost of this separate SubBytes module is paid in terms of BRAMs and additional control circuitry. Inverse SubBytes is similar to the SubBytes operation and uses the same number of resources.

Four levels of logic constitute the MixColumns operation. Fig 4 shows the different levels of logic used in the MixColumns design. A total of 91 XOR operations are needed: two XOR gates on the first level, fourteen on level 2, thirty-seven on level 3 and thirty-eight on level 4. The inverse MixColumns operation is quite similar in design, but it has five levels of logic.

Figure 4. MixColumns structure for AES-32

4. Result and Discussions

The initial implementation of the design is carried out with Xilinx Vivado software version 2014 and an Artix-7 FPGA. For the comparison with existing designs, on the other hand, the synthesis is carried out using Xilinx PlanAhead software. Different FPGAs have been used for the implementations. While mentioning the FPGAs, we have used 'V' for Virtex, 'K' for Kintex and 'Sp' for the Spartan family, while numeric values 5, 6, 7 or the letter 'E' etc. represent the generation of the particular family.

The 7-series FPGAs have two types of slices: slice-M and slice-L. The advantage of slice-M is that its LUTs can be configured as distributed RAM (DRAM), which helps in better utilization of resources. Another strategy that we have adopted is to utilize the BRAM for the S-box. BRAM on 7-series FPGAs has a storage capacity of up to 36 Kbits, which makes it ideally suited for S-box storage. It can be used for other storage as well. Since the S-box as a LUT has 256 entries and each entry is a byte long, using slice resources or DRAM for it would be a waste of resources. The top-level schematics of AES-128 and AES-32 are displayed in Figs. 5 and 6, respectively. As previously mentioned, AES-32 is implemented as an iterative architecture, whereas AES-128 is constructed as a loop unrolled design.

Table I represents the resource consumption of AES-32 on the Artix-7 FPGA and its comparison with AES-128. It shows that a total of 568 LUTs have been used, while the number of slices stands at 223. The design also utilizes 5 BRAMs available on the FPGA. These BRAMs are used for the implementation of the S-boxes, which are implemented as LUTs. This helps in better resource utilization. Because slice-M LUTs can be configured as DRAM, they can be utilized for storage while the software applies optimization techniques. Of the 223 slices that have been employed in our design, 40 percent are slice-M and 60 percent are slice-L. However, the resource consumption is reduced primarily by using BRAMs. We have compared our implementation to AES-128, which was implemented on the same Artix-7 FPGA, in order to highlight the savings that our implementation has accomplished. It has a loop unrolled architecture. The comparison is shown in Table I and depicted in Fig 7. It shows that AES-128 consumes 2669 slices and a total of 9571 LUTs.

The outcomes for the use of the Artix-7 FPGA's resources are as follows. Compared to AES-128, AES-32 achieves an overall improvement in resource utilization of 91.64 percent. We have also implemented AES-32 using a 128-bit data path in a similar manner. It makes use of 1231 LUTs and 424 slices. These results show that an area saving of 47.40 percent can be realized simply by compressing the data path to 32 bits. The bar chart in Fig. 8 illustrates the same. The utilization of BRAMs in the


S-box implementation results in a significant reduction in area when compared to the current implementations, even for 32-bit implementations. This reduction helps in making the design compact and better suited for small devices.

TABLE I. AES-32's comparison with the two implementations

| Design | Slices | LUTs | Improvement (%) |
|---|---|---|---|
| AES-128 (loop unrolled architecture) | 2669 | 9571 | 91.64 |
| AES-32-bit with 128-bit data path | 424 | 1231 | 47.40 |
| AES-32-bit | 223 | 568 | |

Figure 7. Comparison of three basic implementations of AES among each other

The proposed design is compared with the existing ones based on three factors: the number of slices consumed for the implementation; the maximum frequency of operation (Fmax) that the design has clocked on the FPGA; and the throughput delivered by the proposed design together with its efficiency, which is calculated as throughput per slice (TPS). Megahertz (MHz) is the unit for the maximum frequency of operation. It is the maximum value recorded when the design is implemented on a specific FPGA. Throughput, T, is recorded in megabits per second (Mbps) and is given by equation (1):

$$T = \frac{B \times F}{N} \qquad (1)$$

where F is the frequency at which the FPGA operates, N is the number of clock cycles used to encrypt or decrypt the entire block of data, and B is the number of bits processed at a time that makes up the block size. Equation (2) is used to determine the efficiency, that is, the throughput per slice (TPS):

$$TPS = \frac{T}{R} \qquad (2)$$

Here, R represents the total number of resources (slices or LUTs) that the design uses when it is implemented on a certain FPGA. It stands for slices here. Efficiency offers a clearer picture for the precise investigation of how the design uses resources. We have synthesized the design in Xilinx PlanAhead and implemented it on various Xilinx FPGAs in order to compare it with the existing designs. It gave us the information we needed to compare the results in depth with previous designs. In order to do the design comparison, we have used the approach described by [34], which involves using the notion of "equivalent slices" and a "normalization method."

The idea of "equivalent slices" is applied in the process of comparing the older and newer devices, since a lot of designs rely on lookup tables that are kept in the FPGA's BRAMs. Thus, slices and BRAMs make up the two components of resource use. For the implemented design, BRAMs have been converted to slices and this value has been added to the total. After that, the "normalization method" and the idea of "equivalent slices" were used to conduct the design comparison.

A thorough analysis of the literature indicates that various FPGA types are employed in implementation. FPGAs across the Virtex series have been used for the same. For a fair comparison, a "normalized TPS" calculation is therefore given. The following assumptions are used in the calculations for implementations on FPGA families previous to Virtex-5:

i) These FPGAs have two LUTs per slice, whereas Virtex-5 and later FPGAs have four LUTs per slice. For these FPGAs, the occupied area is therefore divided by two.
ii) 64 slices have been attributed to a single BRAM of size 18 Kb in these FPGAs, but starting with Virtex-5, a single BRAM of size 36 Kb has been attributed 128 slices.
iii) Lastly, a normalization factor of 1.22 (550/450) has been used to multiply the operating frequencies attained by these FPGAs in order to equalize the frequency of operation. This is because the highest frequency of operation for these FPGAs is 450 MHz, but the maximum frequency of operation for Virtex-5 and later models is 550 MHz.

The normalized frequency, throughput, and TPS are computed for FPGAs of the older generation (that is, those manufactured prior to Virtex-5) using this normalization criterion. We can then observe the contrast between the proposed design and the designs found in published works.

Fig 8 presents the resource consumption of all the designs. The design by [14] is the most resource constrained, while [12] is the second most constrained implementation. Our design is fourth among these designs in terms of resource consumption. The results depict the equivalent slice calculations. Hence, the BRAM occupancy increases the number of slices. But being part of the FPGA architecture, their use increases the utilization of available resources which would be wasted otherwise. This use of BRAMs helps the design to achieve high operating frequencies.


TABLE II. COMPARISON WITH DIFFERENT DESIGNS WITH NORMALIZATION

| Work | FPGA | Slice + BRAM | Equivalent Slices | Fmax (MHz) | T (Mbps) |
|---|---|---|---|---|---|
| [35] ZY | V-5 | 885 | 885 | 103.3 | 300.4 |
| [35] ZY | V-5 | 4992 | 4992 | 116 | 1350 |
| [36] NB | V-5 | 556 | 556 | 256 | 712.3 |
| [14] R | Sp-3 | 163+3 | 355 | 71 | 208 |
| [12] CG | Sp-2 | 222+3 | 414 | 60 | 166 |
| [37] UL | Sp-3 | 287+3 | 479 | 123.464 (101.2) | 294.4 |
| [38] NK | V-4 | 2281 | 1141 | 167.14 (137) | - |
| [39] NM | V800-4 | 4452 | 2226 | 28.06 (23) | 29 |
| [40] LO | V-E | 3580 | 1790 | - | 157.07 |
| [41] RBH | V-5 | 69+3 | 453 | 257 | 747 |
| This work | V-5 | 459+9 | 1611 | 220 | 2821 |
| This work | Sp-3 | 619(/2) + 10 | 950 | 253.15 (207.5) | 3240.32 |

Figure 8. Comparison of Resource Consumption among Designs
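The "Equivalent Slices" and normalized Fmax columns follow from the three assumptions stated for FPGA families older than Virtex-5 (area halved, 64 or 128 slices per BRAM, and a frequency factor of 1.22). The sketch below is one possible reading of those rules in Python; the function and constant names are hypothetical, and it has only been checked against the two "This work" rows and the [41] row, so other rows may have been derived with different conventions or rounding.

```python
BRAM_SLICES_PRE_V5 = 64    # slices attributed to one 18 Kb BRAM (before Virtex-5)
BRAM_SLICES_V5_ON = 128    # slices attributed to one 36 Kb BRAM (Virtex-5 onward)
FREQ_FACTOR = 1.22         # 550 MHz / 450 MHz, rounded as in the text


def equivalent_slices(slices: float, brams: int, pre_virtex5: bool) -> float:
    """Convert slices plus BRAMs into a single equivalent-slice figure."""
    if pre_virtex5:
        # Older slices hold two LUTs instead of four, so the area is halved.
        return slices / 2 + brams * BRAM_SLICES_PRE_V5
    return slices + brams * BRAM_SLICES_V5_ON


def normalized_fmax(fmax_mhz: float, pre_virtex5: bool) -> float:
    """Scale the clock of an older family up to the Virtex-5 reference."""
    return fmax_mhz * FREQ_FACTOR if pre_virtex5 else fmax_mhz


if __name__ == "__main__":
    # "This work" on Virtex-5: 459 slices + 9 BRAMs
    print(equivalent_slices(459, 9, pre_virtex5=False))
    # "This work" on Spartan-3: 619 slices + 10 BRAMs, 207.5 MHz before scaling
    print(equivalent_slices(619, 10, pre_virtex5=True))
    print(round(normalized_fmax(207.5, pre_virtex5=True), 2))
```

For the Spartan-3 row the function returns 949.5, which the table reports rounded up as 950.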

TABLE III. COMPARISON OF NORMALIZED THROUGHPUT AND TPS AMONG DIFFERENT DESIGNS

| Work | T (Mbps) | TPS (Mbps/Slice) |
|---|---|---|
| [35] ZY | 300.4 | 0.339 |
| [35] ZY | 1350 | 0.270 |
| [36] NB | 712.3 | 1.28 |
| [14] R | 208 | 0.70 |
| [12] CG | 166 | 0.32 |
| [37] UL | 294.4 | 0.61 |
| [39] NM | 29 | 0.013 |
| [40] LO | 157.07 | 0.08774 |
| [41] RBH | 747 | 1.65 |
| This work (V-5) | 2821 | 1.75 |
| This work (Sp-3) | 3240.32 | 3.414 |

Figure 9. Throughput comparison among designs

Fig 9 depicts the comparison of the designs on the basis of throughput, which is measured in megabits per second (Mbps). Both implementations of the proposed design provide the best throughput among the presented designs, with 2821 and 3240 Mbps. This shows that the proposed design provides superior throughput.

Figure 10. Frequency comparison among designs

Fig 10 shows the comparison on the basis of maximum frequency of operation. These are all normalized results, and the designs by [41] and [36] occupy the first and second place in terms of maximum frequency. The proposed design occupies third place on the Spartan-3 FPGA, while fourth place goes to the Virtex-5 FPGA implementation of this design. Hence, the design performs satisfactorily on this parameter.

In addition to the comparison with the current designs, the suggested design and the most recent lightweight ciphers suggested for IoT needs are contrasted in Table IV, which presents a comparison among different lightweight ciphers. These primitives are chosen for comparison on the basis of implementation of slices, frequency of operation, throughput and TPS.


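TABLE IV. A COMPARISON WITH OTHER LIGHTWEIGHT PRIMITIVES

| Design | FPGA | Slices | Fmax | Throughput (Mbps) | TPS (Mbps/slice) |
|---|---|---|---|---|---|
| PRESENT [42] | V-6 | 157 | 186.3 | 372.6 | 2.37 |
| LED [43] | V-7 | 217 | 169.09 | 338.18 | 1.55 |
| HIGHT [43] | V-7 | 252 | 372.3 | 744.6 | 2.950 |
| SIMON [44] | V-7 | 95 | 219 | 292 | 3.07 |
| XTEA [45] | K-7 | 228 | 345.9 | 26.42 | 0.115 |
| LC-FARESS [30] | V-6 | 283 | 166 | 241.4 | 0.85 |
| LC-FARESS [30] | V-7 | 277 | 310 | 450.9 | 1.62 |
| LC-FARESS [30] | K-7 | 271 | 330 | 480 | 1.77 |
| [46] | V-6 | 6577 | 170 | 2188.7872 | 3.33 |
| [46] | V-7 | 666 | 213 | 2734.5792 | 4.11 |
| [46] | K-7 | 536 | 213 | 2734.5792 | 5.10 |
| THIS WORK | V-7 | 489+5 | 240.269 | 3075 | 6.28924 |
| THIS WORK | K-7 | 460+5 | 274.907 | 3518 | 7.64958 |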
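Equations (1) and (2) can be cross-checked against the "THIS WORK" rows of Table IV. The sketch below assumes a block size B of 128 bits, N = 10 clock cycles per block (the cycle count mentioned in the discussion), and TPS taken over slices only, without the BRAMs. These assumptions are inferred from the reported numbers rather than stated as a procedure in the paper, and the function names are hypothetical.

```python
def throughput_mbps(block_bits: int, fmax_mhz: float, cycles: int) -> float:
    """Equation (1): T = B * F / N, with F in MHz so that T is in Mbps."""
    return block_bits * fmax_mhz / cycles


def throughput_per_slice(t_mbps: float, slices: int) -> float:
    """Equation (2): TPS = T / R, with R counted in slices."""
    return t_mbps / slices


if __name__ == "__main__":
    BLOCK_BITS = 128  # AES block size
    CYCLES = 10       # assumed cycles per block

    # (FPGA, Fmax in MHz, slices) taken from the THIS WORK rows of Table IV
    for fpga, fmax, slices in (("V-7", 240.269, 489), ("K-7", 274.907, 460)):
        t = throughput_mbps(BLOCK_BITS, fmax, CYCLES)
        tps = throughput_per_slice(t, slices)
        print(f"{fpga}: T = {t:.4f} Mbps, TPS = {tps:.5f} Mbps/slice")
```

Under these assumptions the computed values agree with the table to within the last reported digit: about 3075.44 Mbps and 6.2892 Mbps/slice for V-7, and about 3518.81 Mbps and 7.6496 Mbps/slice for K-7.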

As can be observed from Table IV, the proposed design is not as constrained in implementation as the other ciphers. It consumes the most resources among all the ciphers [47]. The main reason behind this is the block size processed in these ciphers. The proposed lightweight-AES algorithm processes a 128-bit block in comparison to the 64-bit block size of LED, XTEA, SIMON and HIGHT [48]. A small block size means that the size of the cipher and its processing time will be small. But the iterative architecture of AES-32 and the data path compression reduce the area required, and the use of BRAMs further minimizes the resource consumption. AES-32 has been compared with the design in [48], which does not utilize BRAMs and implements the S-box in Galois field arithmetic. The comparison clearly highlights that the proposed design achieves the maximum utilization of available FPGA resources. Comparison with [48] further highlights that implementation of the S-box in BRAMs results in a 50 to 90% improvement in throughput. The number of cycles remained 10 due to a separate S-box for the key schedule. Hence, AES-32 is able to achieve higher throughput with small area.

AES-32 can be operated at nearly the same frequencies. The throughput of the design underlines its performance; it is the best among all the designs in the table. TPS (efficiency) data shows that the design is able to achieve the optimum utilization of FPGA resources. It presents the maximum performance per unit resource among all the designs. A thorough comparison reflects that slightly more consumption of resources by the proposed design enables it to deliver the best performance among all. It helps in delivering the highest throughput and maximum TPS. The TPS results obtained by AES-32 reflect the optimum utilization of FPGA resources along with the best per-slice performance. The use of BRAMs in the implementation lets the suggested design process data more quickly and with less resource consumption. The employment of a distinct SubBytes unit for the key schedule has contributed to delivering high throughput and exhibiting low latency, which is another factor in the optimal performance. Its low resource consumption makes it a good choice for a variety of Internet of Things use cases, as does its low latency. Normally, IoT applications are associated with small devices. These devices are either sensors or actuators which transmit information at a very low rate. However, many IoT applications, such as surveillance, transmit data at a higher rate. Hence, to ensure the encryption of such high-rate information, the encryption scheme must offer high throughput, like the proposed design.

Figure 11. TPS comparison among designs

Table IV presents the comparative performance evaluation among the different lightweight ciphers, which is depicted in Fig 11. These primitives are chosen for comparison based on implementation of slices, frequency of operation, throughput and TPS. The throughput of the design is the best among all the designs in Table IV; it provides 4 to 133 times more throughput in comparison to these ciphers. The improvement in efficiency ranges from 2 to 66 times. A thorough comparison reflects that slightly more consumption of resources by the proposed design enables it to deliver the best performance among all. It helps in delivering highest throughput and
cases, including surveillance, smart lighting, smart build- maximum TPS. The TPS results obtained by the AES-32
ings, and AC control. It is desirable for rapid response reflects the optimum utilization of resources of the FPGA
applications, including smart grid applications, because of along with best per slice performance. Use of BRAMS
in the implementation has enabled faster processing and lower resource consumption for the proposed
design. The employment of a distinct “SubBytes” for the key schedule has contributed to high
throughput and low latency, which is another factor in the optimal performance. Its low resource
consumption makes it a good choice for a variety of Internet of Things use cases, including
surveillance, smart lighting, smart buildings, and AC control. It is desirable for rapid-response
applications, including smart grid applications, because of its low latency. Normally, IoT
applications are associated with small devices. These devices are either sensors or actuators,
which transmit information at very low rates. However, many IoT applications, such as surveillance,
transmit data at higher rates. Hence, to encrypt such high-rate data, the encryption scheme must
offer high throughput, as the proposed design does.

5. Conclusion and Future Work
In this work, we have adapted AES-128 to AES-32 by employing a data path compression strategy. By
sharing resources between the encryption and decryption paths, the LUT requirement is minimized.
Effective utilization of FPGA resources has led to a further reduction in the number of slices and
an improvement in throughput over existing designs. AES-32, which is nearly 6.9 times smaller than
loop-unrolled AES-128, is more suitable for small IoT devices. By utilizing five on-board block
RAMs, the overall consumption of LUTs is remarkably reduced. As a result, the number of slices
needed for the AES implementation is reduced to 223. On the Spartan-3 chip, the proposed design
achieves a throughput of 3.2 Gbps, while on other Xilinx FPGAs it stays between 2.2 and 3.5 Gbps.
The design's efficiency also confirms that it makes the best use of the available resources: the
throughput per slice ranges from 1.75 Mbps to 7.8 Mbps. Its high throughput and low resource
requirements make it a good choice for various Internet of Things use cases, including
surveillance, smart lighting, smart buildings, and AC control. For applications that require faster
response times, such as smart grid applications, its low latency could be advantageous. BRAMs are a
good way to reduce resource usage on FPGAs, but their implementation in gates occupies a large
area. In future, effort will be directed at further reducing slice consumption and developing more
lightweight ciphers for IoT applications.

References

[1] S. Alnefaie, S. Alshehri, and A. Cherif, “A survey on access control in IoT: models, architectures and research opportunities,” International Journal of Security and Networks, vol. 16, no. 1, pp. 60–76, 2021.

[2] E. Y. Dari, A. Bendahmane, and M. Essaaidi, “Verification-based data integrity mechanism in smart grid network,” International Journal of Security and Networks, vol. 16, no. 1, pp. 1–11, 2021.

[3] A. Ghorbel, M. Ghorbel, and M. Jmaiel, “A model-based approach for multi-level privacy policies derivation for cloud services,” International Journal of Security and Networks, vol. 16, no. 1, pp. 12–27, 2021.

[4] P. Jindal and B. Singh, “Quantitative analysis of the security performance in wireless LANs,” Journal of King Saud University-Computer and Information Sciences, vol. 29, no. 3, pp. 246–268, 2017.

[5] P. Jindal and B. Singh, “Analyzing the security-performance tradeoff in block ciphers,” in International conference on computing, communication & automation. IEEE, 2015, pp. 326–331.

[6] A. Echchabi, M. M. S. Omar, and A. M. Ayedh, “Factors influencing Bitcoin investment intention: the case of Oman,” International Journal of Internet Technology and Secured Transactions, vol. 11, no. 1, pp. 1–15, 2021.

[7] N. Fips, “Advanced encryption standard (AES) FIPS pub 197,” Technology Laboratory, National Institute of Standards. . . , vol. 2009, pp. 8–12, 2001.

[8] A. Satoh, S. Morioka, K. Takano, and S. Munetoh, “A compact Rijndael hardware architecture with s-box optimization,” in International Conference on the Theory and Application of Cryptology and Information Security. Springer, 2001, pp. 239–254.

[9] T. Good and M. Benaissa, “Very small FPGA application-specific instruction processor for AES,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 53, no. 7, pp. 1477–1486, 2006.

[10] K. U. Järvinen, M. T. Tommiska, and J. O. Skyttä, “A fully pipelined memoryless 17.8 Gbps AES-128 encryptor,” in Proceedings of the 2003 ACM/SIGDA eleventh international symposium on Field programmable gate arrays, 2003, pp. 207–215.

[11] I. Verbauwhede, P. Schaumont, and H. Kuo, “Design and performance testing of a 2.29-Gb/s Rijndael processor,” IEEE Journal of Solid-State Circuits, vol. 38, no. 3, pp. 569–572, 2003.

[12] P. Chodowiec and K. Gaj, “Very compact FPGA implementation of the AES algorithm,” in International workshop on cryptographic hardware and embedded systems. Springer, 2003, pp. 319–333.

[13] N. Pramstaller, S. Mangard, S. Dominikus, and J. Wolkerstorfer, “Efficient AES implementations on ASICs and FPGAs,” in International Conference on Advanced Encryption Standard. Springer, 2004, pp. 98–112.

[14] G. Rouvroy, F.-X. Standaert, J.-J. Quisquater, and J.-D. Legat, “Compact and efficient encryption/decryption module for FPGA implementation of the AES Rijndael very well suited for small embedded applications,” in International Conference on Information Technology: Coding and Computing, 2004. Proceedings. ITCC 2004., vol. 2. IEEE, 2004, pp. 583–587.

[15] T. Järvinen, P. Salmela, P. Hämäläinen, and J. Takala, “Efficient byte permutation realizations for compact AES implementations,” in 2005 13th European Signal Processing Conference. IEEE, 2005, pp. 1–4.

[16] P. Hamalainen, T. Alho, M. Hannikainen, and T. D. Hamalainen, “Design and implementation of low-area and low-power AES encryption hardware core,” in 9th EUROMICRO conference on digital system design (DSD'06). IEEE, 2006, pp. 577–583.

[17] S. Agwa, E. Yahya, and Y. Ismail, “Power efficient AES core for IoT constrained devices implemented in 130nm CMOS,” in 2017 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 2017, pp. 1–4.


[18] Y. Wang and Y. Ha, “FPGA-based 40.9-Gbits/s masked AES with area optimization for storage area network,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 60, no. 1, pp. 36–40, 2013.

[19] M. M. Wong, M. D. Wong, A. K. Nandi, and I. Hijazin, “Construction of optimum composite field architecture for compact high-throughput AES s-boxes,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 20, no. 6, pp. 1151–1155, 2011.

[20] D.-H. Bui, D. Puschini, S. Bacles-Min, E. Beigné, and X.-T. Tran, “AES datapath optimization strategies for low-power low-energy multisecurity-level internet-of-things applications,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, no. 12, pp. 3281–3290, 2017.

[21] N. El-meligy, M. Amin, E. Yahya, and Y. Ismail, “130nm low power asynchronous AES core,” in 2017 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 2017, pp. 1–4.

[22] K. Kumar, K. Ramkumar, and A. Kaur, “A lightweight AES algorithm implementation for encrypting voice messages using field programmable gate arrays,” Journal of King Saud University-Computer and Information Sciences, 2020.

[23] N. Ahmad and S. R. Hasan, “A new ASIC implementation of an advanced encryption standard (AES) crypto-hardware accelerator,” Microelectronics Journal, vol. 117, p. 105255, 2021.

[24] K. Shahbazi and S.-B. Ko, “Area-efficient nano-AES implementation for internet-of-things devices,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 29, no. 1, pp. 136–148, 2020.

[25] M. Jahanbani, N. Bagheri, and Z. Norozi, “Lightweight implementation of SILC, CLOC, AES-JAMBU and COLM authenticated ciphers,” Microprocessors and Microsystems, vol. 72, p. 102925, 2020.

[26] E. Mobilon and D. S. Arantes, “100 Gbit/s AES-GCM cryptography engine for optical transport network systems: architecture, design and 40 nm silicon prototyping,” Microelectronics Journal, vol. 116, p. 105229, 2021.

[27] M. Maazouz, A. Toubal, B. Bengherbia, O. Houhou, and N. Batel, “FPGA implementation of a chaos-based image encryption algorithm,” Journal of King Saud University-Computer and Information Sciences, 2022.

[28] A. M. Garipcan and E. Erdem, Design, FPGA implementation and statistical analysis of a high-speed and low-area TRNG based on an AES s-box post-processing technique. ISA Transactions, Elsevier, 2021.

[29] Y. Sao, S. S. Ali, D. Ray, S. Singh, and S. Biswas, “Co-relation scan attack analysis (COSAA) on AES: A comprehensive approach,” Microelectronics Reliability, vol. 123, p. 114216, 2021.

[30] S. Sheikhpour, S.-B. Ko, and A. Mahani, “A low cost fault-attack resilient AES for IoT applications,” Microelectronics Reliability, vol. 123, p. 114202, 2021.

[31] X. Guo, M. El-Hadedy, S. Mosanu, X. Wei, K. Skadron, and M. R. Stan, “Agile-AES: Implementation of configurable AES primitive with agile design approach,” Integration, vol. 85, pp. 87–96, 2022.

[32] P. Nannipieri, S. Di Matteo, L. Baldanzi, L. Crocetti, L. Zulberti, S. Saponara, and L. Fanucci, “VLSI design of advanced-features AES cryptoprocessor in the framework of the European processor initiative,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 30, no. 2, pp. 177–186, 2021.

[33] A. Shailaja and K. G. Ningappa, “A low area VLSI implementation of extended tiny encryption algorithm using Lorenz chaotic system,” International Journal of Information and Computer Security, vol. 14, no. 1, pp. 3–19, 2021.

[34] D.-S. Kundi, A. Aziz, and N. Ikram, “A high performance ST-box based unified AES encryption/decryption architecture on FPGA,” Microprocessors and Microsystems, vol. 41, pp. 37–46, 2016.

[35] Z. Yuan, Y. Wang, J. Li, R. Li, and W. Zhao, “FPGA based optimization for masked AES implementation,” in 2011 IEEE 54th International Midwest Symposium on Circuits and Systems (MWSCAS). IEEE, 2011, pp. 1–4.

[36] N. Benhadjyoussef, M. Karmani, M. Machhout, and B. Hamdi, “A hybrid countermeasure-based fault-resistant AES implementation,” Journal of Circuits, Systems and Computers, vol. 29, no. 03, p. 2050044, 2020.

[37] U. Legat, A. Biasizzo, and F. Novak, “A compact AES core with on-line error-detection for FPGA applications with modest hardware resources,” Microprocessors and Microsystems, vol. 35, no. 4, pp. 405–416, 2011.

[38] N. Kamoun, L. Bossuet, and A. Ghazel, “SRAM-FPGA implementation of masked s-box based DPA countermeasure for AES,” in 2008 3rd International Design and Test Workshop. IEEE, 2008, pp. 74–77.

[39] N. Mentens, L. Batina, B. Preneel, and I. Verbauwhede, “E09: An FPGA implementation of Rijndael: Trade-offs for side-channel security,” IFAC Proceedings Volumes, vol. 37, no. 20, pp. 493–498, 2004.

[40] L. Ordu and B. Ors, “Power analysis resistant hardware implementations of AES,” in 2007 14th IEEE International Conference on Electronics, Circuits and Systems. IEEE, 2007, pp. 1408–1411.

[41] R. Bani-Hani, K. Mhaidat, and S. Harb, “Very compact and efficient 32-bit AES core design using FPGAs for small-footprint low-power embedded applications,” Journal of Circuits, Systems and Computers, vol. 25, no. 07, p. 1650080, 2016.

[42] W. Zhao, Y. Wang, and R. Li, “A unified architecture for DPA-resistant PRESENT,” in 2012 International Conference on Innovations in Information Technology (IIT). IEEE, 2012, pp. 244–248.

[43] S. Subramanian, M. Mozaffari-Kermani, R. Azarderakhsh, and M. Nojoumian, “Reliable hardware architectures for cryptographic block ciphers LED and HIGHT,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 36, no. 10, pp. 1750–1758, 2017.

[44] P. Ahir, M. Mozaffari-Kermani, and R. Azarderakhsh, “Lightweight architectures for reliable and fault detection SIMON and SPECK cryptographic algorithms on FPGA,” ACM Transactions on Embedded Computing Systems (TECS), vol. 16, no. 4, pp. 1–17, 2017.

[45] K. Tian, Fault-Resilient Lightweight Cryptographic Block Ciphers for Secure Embedded Systems. Rochester Institute of Technology, 2014.

[46] S. S. Dhanda, B. Singh, P. Jindal, and D. Panwar, “A highly efficient FPGA implementation of AES for high throughput IoT applications,” Journal of Discrete Mathematical Sciences and Cryptography, vol. 25, no. 7, pp. 2029–2038, 2022.

[47] H. Zodpe and A. Sapkal, “An efficient AES implementation using FPGA with enhanced security features,” Journal of King Saud University-Engineering Sciences, vol. 32, no. 2, pp. 115–122, 2020.

[48] S. S. Dhanda, B. Singh, and P. Jindal, “Lightweight cryptography: a solution to secure IoT,” Wireless Personal Communications, vol. 112, no. 3, pp. 1947–1980, 2020.

Sumit Singh Dhanda received B.Tech and M.Tech degrees in Electronics and Communication Engineering
from Kurukshetra University, Kurukshetra in 2005 and 2011 respectively. He has teaching experience
of 10 years and is currently pursuing his Doctoral Degree with the ECE Department at National
Institute of Technology, Kurukshetra, India. He has published 10 research papers in
International/National conferences. His research interests include security algorithms for Internet
of Things and wireless and mobile communication.

Brahmjit Singh has completed Bachelor of Engineering in Electronics Engineering from Malaviya
National Institute of Technology, Jaipur, Master of Engineering with specialization in Microwave
and Radar from Indian Institute of Technology, Roorkee and Ph.D. degree from GGS Indraprastha
University, Delhi. He is with the Department of Electronics and Communication Engineering, National
Institute of Technology, Kurukshetra, working as Professor, with 24 years of teaching and research
experience. He is currently serving as Dean PD and Regional Coordinator, Regional Academic Centre
for Space at NIT Kurukshetra. He has held several administrative and academic positions in NIT
Kurukshetra, which include Chairman ECE Department, Professor in-Charge Centre of Computing and
Networking, and Member Planning and Development Board. He was also in charge of the Siemens Centre
of Excellence at NIT Kurukshetra. He has published 100 research papers in International/National
Journals and conferences, and organized several conferences and short term courses. His current
research interests include 6G, Cognitive Radio, and Security Algorithms for Wireless Networks and
Mobility Management in wireless networks. He has been awarded The Best Research Paper Award on
behalf of ‘The Institution of Engineers (India)’. He is a member of IEEE, Life member of IETE, and
Life Member of ISTE.

Poonam Jindal received the B.E degree in Electronics and Communication Engineering from Punjab
Technical University, Punjab (India) in 2003 and the M.E degree in Electronics and Communication
Engineering from Thapar University, Patiala (India) in 2005. She is working as Assistant Professor
with the Electronics and Communication Engineering Department, National Institute of Technology,
Kurukshetra, India, where she also completed her Doctoral Degree. She has published 50 research
papers in International/National journals and conferences. Her research interests include security
algorithms for wireless networks and mobile communication. She is a member of IEEE.
