
International Journal of Innovative Technology and Exploring Engineering (IJITEE)

ISSN: 2278-3075, Volume-8 Issue-12, October, 2019

Hardware Acceleration of SVM classifier using Zynq SoC FPGA

Vidhyapathi CM, Maheshwar Reddy M, Nikhil Reddy T, Alex Noel Joseph Raj, Kathirvelan J

Abstract: Support Vector Machines (SVM) are among the most commonly used state-of-the-art supervised machine learning algorithms for various classification problems, and they provide a high accuracy rate compared to other classification algorithms. However, when SVM is modelled only in software it is time consuming due to its high computational complexity, which makes the algorithm unsuitable for embedded real-time applications. We propose a new hardware-software co-design approach that achieves real-time performance by accelerating the computationally intensive classifier part of the algorithm as a custom hardware Intellectual Property (IP) core. In this paper, a novel Support Vector Machine (SVM) linear classifier is modelled as a custom hardware IP core using High Level Synthesis (HLS). The developed IP core is optimized for latency and hardware resource utilization by applying various directives of the HLS tool. The synthesis results of the IP core for the Skin Segmentation dataset are reported. The proposed hardware-software co-design approach is implemented in real time on a Zynq-7000 XC7Z020 System on Chip (SoC) field programmable gate array (FPGA). Detailed comparative results of the proposed hardware-software co-design approach and the complete software approach are reported for the Iris and Breast Cancer datasets. A promising result of 18x speedup is achieved using the SVM classifier hardware IP compared to its software counterpart.

Keywords: SVM, Hardware-software co-simulation, HLS, IP, SoC, FPGA.

Revised Manuscript Received on October 10, 2019.
Vidhyapathi CM, SENSE Dept., Vellore Institute of Technology, Vellore, India. Email: vidhyapathi.cm@vit.ac.in
Maheshwar Reddy, SENSE Dept., Vellore Institute of Technology, Vellore, India. Email: maheswar.mannur@gmail.com
Nikhil Reddy, SENSE Dept., Vellore Institute of Technology, Vellore, India. Email: snthota1997@gmail.com
Alex Noel Joseph Raj, Department of Electronics Engineering, Shantou University, China. Email: jalexnoel@stu.edu.cn
Kathirvelan J, SENSE Dept., Vellore Institute of Technology, Vellore, India. Email: j.kathirvelan@vit.ac.in

I. INTRODUCTION

Support Vector Machines are used for classification applications such as image classification, object detection, speech recognition, medical diagnosis and others [1, 2, 3] with good accuracy. The SVM implements the two phases of machine learning: learning the data, and classifying test data based on the learned data. During the training phase the SVM classifier develops a model, which is used to classify future datasets by making use of the Support Vectors (SVs) calculated during training. The Support Vectors are then used to predict the class of the given test data instances. Compared to other classifiers, the SVM classifier has shown better accuracy rates for many applications [4-7]. However, the computational complexity of the SVM is directly proportional to the number of SVs and the size of the data sets. This complexity becomes the major bottleneck in real-time embedded applications such as object classification, face recognition, automotive and embedded vision. On an embedded platform with tight constraints on resources, area and power consumption, software modelling of SVM classification becomes difficult as the number of SVs grows; likewise, modelling a time-consuming algorithm such as SVM for large-scale problems only in software increases the time complexity when datasets are large. Hence there is a need to identify the computationally intensive part of the SVM: letting it run on dedicated hardware speeds up the algorithm at the cost of additional hardware. This idea motivated researchers to accelerate the SVM algorithm using graphics processing units (GPUs) [3] and FPGAs [8]-[10]. However, GPUs suffer from higher power consumption [11] and are not a preferable choice for embedded applications. Presently, FPGA-based heterogeneous computing platforms such as the Zynq-7020 with customized hardware accelerators are the preferred choice [12], since they consume less power and can be built into a small system. FPGAs also provide a high level of parallelism, which cuts down the computing time of a time-complex algorithm, and in many applications they have shown performance gains over general-purpose processors [13-17].

In this paper we propose a novel hardware-software co-design approach in which the computationally intensive vector dot-product part of the SVM runs on custom hardware and the rest runs in software. The proposed co-design is implemented in real time on the Zynq-7020 SoC FPGA.

II. RELATED WORK

Research works to date have largely aimed at developing the classifier completely in HDL on an FPGA. Pipelining [17, 18, 19] was used in previous designs to exploit the parallel processing of the FPGA and increase the throughput of the classification process. Complete software-based modelling of SVM in embedded applications such as pedestrian detection [20], [21], vehicle view detection [22] and face detection [23], [24] has shown good accuracy rates at the cost of limited performance. Hence many researchers were motivated to accelerate the SVM algorithm on general-purpose processors and digital signal processors (DSPs) to achieve better performance on these hardware platforms.

Retrieval Number: L25621081219/2019©BEIESP | DOI: 10.35940/ijitee.L2562.1081219
Published By: Blue Eyes Intelligence Engineering & Sciences Publication

In the work [25], the critical parts of the algorithms were modelled in hardware and the rest in software. In [26], an attempt was made to accelerate the SVM on a low-power, low-cost 8-bit PIC microcontroller, considering its limited memory and hardware resources. Graphics processing units (GPUs) were used in [27], exploiting their parallel execution to improve the performance of SVM. However, the fixed hardware structure, the need for efficient programming and the high power consumption of GPUs are major drawbacks when using them in embedded systems.

Considerable work has been carried out in recent years on accelerating the training and classification of SVM using custom reconfigurable computing hardware, mostly on FPGAs. An analog-digital mixed SVM processor was proposed in [28]. In [29], a modified SVM training algorithm and the corresponding architecture were proposed. Small-scale implementations of SVM with a reduced number of SVs were proposed in [30], [31] and [32]; these algorithms were strictly application-specific and cannot be extended to other problems. In [33] and [34], the hardware acceleration of the vector operations of SVM was analysed and better performance was reported. These works showed the importance of hardware acceleration using arithmetic and logic units (ALUs), multipliers and vector processing elements to achieve real-time performance. All of the related work, however, was carried out with a specific application in focus: the proposed hardware was not generic. In our work we propose a generic hardware-software co-design approach that, at the same time, provides flexibility in optimising the custom hardware part of the SVM for the specific application dataset.

To the best of our knowledge, no existing technique makes use of hardware-software co-simulation exploiting the architecture of the Zynq-7020. In addition, previous works were implemented in a hardware description language (HDL) alone, coded in Verilog or VHDL, which increases the design time for an application. We have made use of Xilinx tools such as Vivado HLS, which synthesizes C or C++ code into Verilog code and drastically reduces the design time.

III. OVERVIEW OF THE PROPOSED METHODOLOGY

A. SVM Background

Support Vector Machine (SVM) was first introduced by Cortes and Vapnik in 1995 [35], [36], and is based on the concept of a decision boundary that separates two different classes of data in order to discriminate the classes with high accuracy. SVM is a supervised machine learning (ML) classifier that provides good performance for regression and classification tasks. A separating hyperplane is constructed in the training phase from an input training data set containing data samples. The hyperplane that best separates the samples belonging to the two classes is called the maximum-margin hyperplane and forms the decision boundary. The class samples that lie on the boundary are called Support Vectors (SVs), as depicted in Fig. 1. These SVs, obtained from the training phase, are then used in the classification phase to classify new data.

Fig. 1. Support Vector classes and hyperplane

A training dataset is initially used to model the decision function. This training dataset consists of input data vectors and their corresponding output classes. The training process produces SVs which represent the entire dataset and can be used to classify any new data. Training (learning) is mostly an offline task, while classification is mostly online, i.e. performed on real-time data. The classification phase is computationally expensive: it is linearly dependent on the test data size, the number of features and the number of Support Vectors. Hence there arises a need for acceleration of the classification process.

B. Hardware acceleration of SVM classifier using Zynq-7020 FPGA

Our aim is to make the time-consuming part of the algorithm run on the FPGA while the rest of the algorithm runs on the Processing System of the Zynq SoC. To achieve this we make use of the Xilinx Vivado HLS tool, the Vivado Design Suite and the Xilinx SDK. Using Vivado HLS we generate Verilog or VHDL code by synthesizing C or C++ code. To investigate the time-consuming part, the complete SVM was modelled in C, taking the model file as input and generating the predictions as the output file. Software profiling was performed to identify the time-consuming part of the SVM, which is the function that computes the vector multiplication of the input data vectors; we measured the time taken by this function in order to compare it later with the time taken on hardware alone.

The software-modelled SVM has been verified by modifying the existing SVM-light library. We successfully classified different datasets such as the Iris and Breast Cancer datasets with high accuracy rates. To simplify the classification process and reduce the load of the computationally expensive part, we propose a scalable, efficient hardware-software co-design approach which loads the SVs, test data and weights into the FPGA fabric to exploit its parallelization abilities. The burden on the CPU and the CPU processing time are reduced by the parallel computation on the FPGA, compared to the sequential execution of a traditional CPU. Although SVM can use a variety of kernel tricks for linearly non-separable data, the basic core of the SVM classification phase is the vector multiplication of the decision function. Therefore we chose to implement the linear kernel of SVM for hardware acceleration.


We focus specifically on the acceleration of the classification phase by implementing the decision function. It makes use of factors found during the training phase, namely the alpha values, the bias and the feature values; the number of Support Vectors also determines the decision function used for classifying a given test instance. The decision function is given by Eq. (1):

    f(x) = sign( Σ_{i=1..Nsv} α_i (s_i · x) − b )    (1)

where Nsv is the number of Support Vectors, α_i the alpha value of Support Vector s_i, x the test data vector and b the bias. This equation is implemented on the FPGA using the High Level Synthesis method, which creates the IP for the top-level function that implements it. The decision function is divided into three equations:

    AC[j] = Σ_{i=1..Nsv} α_i · s_i[j]    (2)

Eq. (2) multiplies the alpha value of each Support Vector with the respective feature values of that Support Vector and accumulates them in the variable AC.

    D = Σ_{j=1..n} AC[j] · x[j]    (3)

By making use of the AC found in Eq. (2), the variable D is the distance, found by multiplying the AC value of each feature with the test data feature values as given in Eq. (3), where n is the number of features.

    class(x) = sign(D − b)    (4)

The bias value b, defined earlier during the training phase, is then subtracted from the calculated distance, and the sign of the difference is used to assign the test instance to its class, as given in Eq. (4).

The method we implement involves hardware-software co-design, making use of the hybrid Zynq architecture. The Zynq System on Chip (SoC) [37] consists of a Processing System (PS) with an ARM Cortex processor and Programmable Logic (PL), which is the FPGA fabric.

C. Training SVM models on datasets

Before creating the SVM models in the Windows application, the required datasets need to be obtained. The datasets used in machine learning are mostly available either through Python libraries or in CSV format, which is difficult to parse in C. Therefore a Python script is written to download each dataset and export it as CSV. The dataset is then normalized if required and separated into training and testing data, which are written into files in the format read by the SVM-light code. An example of the SVM-light format of a dataset is shown in Fig. 2.

Fig. 2. SVM-light dataset format

The first column represents the class of the instance. Since the classification type is binary, the classes are represented as 1 and -1. The subsequent columns hold the value of each feature with its label. For example, in the first line the first column, 1, means that the instance belongs to the first class; in the case of the Skin Segmentation dataset, it means that the instance represents skin. The second column, 1:0.51, means that the first feature, i.e. the colour Blue, has a value of 0.51. The third column represents Green and the last column Red.

The models are trained using the Windows SVM-light application for later classification on hardware. Cross-validation is used during the training phase to achieve a higher accuracy. Due to the limited runtime memory, the datasets were reduced in dimension using scaling and normalization techniques. SVM-light was initially compiled and tested using the GCC compiler and later modified using Visual Studio. Fig. 3 shows the training process for Fischer's Iris dataset using the GCC compiler.

Fig. 3. Training Process for Iris Dataset using GCC compiler

After training, a model file is created which contains information regarding the training dataset, such as the size of the data used for training, the number of Support Vectors and their values, the number of features, the bias value, and the type of kernel used for training. The first few lines of the model file are shown in Fig. 4.

Fig. 4. Model File

D. Vivado HLS and SVM classifier Custom Hardware Intellectual Property (IP) core

Vivado High Level Synthesis (HLS) [38] uses C-level source code to create an RTL implementation. The HLS tool extracts the control and dataflow from the source code, and the design is implemented based on user-defined directives. This methodology allows for a smaller, faster and more optimal design. High Level Synthesis is done mainly through scheduling and binding: scheduling maps the sequential operations in the C code onto clock cycles, while binding maps the operations onto hardware cores. Fig. 5 gives an outline of the steps involved in C code to RTL IP integration.


Fig. 5. HLS Design Flow for Algorithmic C to RTL IP integration [38]

The designer has complete control over the optimizations to be applied, through directives such as pipelining, loop unrolling, array partitioning and interfaces. Vivado HLS supports higher-level languages such as C, C++ and SystemC, provided the design is statically defined at compile time; constructs that cannot be resolved until runtime cannot be synthesized. In addition, the use of float and double types in the code is supported.

The HLS-based custom hardware IP block is shown in Fig. 6. The top-level function contains the time-consuming vector multiplication code of the SVM algorithm. The simulation and synthesis of the generated IP core are performed using the input and model data as test bench files. To achieve parallel processing, different directives were carefully examined and applied to obtain better performance with reduced hardware resources; pipelining and unrolling are the specific HLS directives used to optimise the "for" loops of the top-level function. After successful C simulation and synthesis, we run RTL co-simulation and export the RTL as our custom IP block, which creates a solution containing the equivalent Verilog code of the top-level function.

Fig. 6. Custom Hardware SVM classifier IP core

The C-based SVM classifier SVM-light was used to implement the IP on the Zynq-7020. Its implementation in C allowed us to use the HLS methodology to speed up the IP creation process. The training part of the classification was done in the Windows application using the basic parameters and the linear kernel. The resulting model, which contains the support vectors and the information regarding the kernel, is later used by the IP to classify the test data.

E. Block design in Xilinx Vivado Design Suite

The Xilinx Vivado 2018.2 Design Suite was used to design and implement our hardware design on the ZedBoard, i.e. the Zynq-7020 evaluation board. The exported RTL design from Vivado HLS is added as a repository in Vivado, whereupon Vivado discovers the IP core. This IP core is added to the block design along with the other blocks: first the Zynq7 Processing System, then the IP core, the DMA and the AXI Timer. All the necessary connections among the PS, IP core, DMA, Timer and AXI interconnects are made. The S2MM and MM2S ports of the DMA and the out_stream and in_stream of the IP are interconnected to ensure the streaming of input and output data between the IP core and the PS.

The IP core in the Zynq-7020 PL is connected to the ARM PS through an ACP (Accelerator Coherency Port). This is preferable over the HP (High Performance) port as it reduces the burden of caching: the ACP is a 64-bit slave interface on the SCU (Snoop Control Unit), which ensures cache-coherent transmissions between the PL and PS and provides a direct low-latency path. All transmissions at the ACP happen through the DMA core. Finally, the clock frequencies are set, connection automation is run to connect the unconnected clocks and resets, and the block design is evaluated to make sure all the necessary ports are connected. The block design connecting the Classify IP core to the PS and Timer can be seen in Fig. 7. The in_stream and out_stream between the IP core and the Zynq PS move through the AXI DMA: the IP core receives the in_stream as MM2S from the DMA and sends the out_stream to the DMA as S2MM.

To measure the time taken to run the classifier IP in the PL, an AXI Timer IP core was used, while the XTimer module was exploited to measure the clock cycles taken to run the same application on the PS part of the Zynq SoC. It was ensured that the models chosen for evaluation in the Xilinx SDK tool were not too large: although the application could be run for large datasets on the PL, the DDR3 memory limitation meant it could not be run on the ARM processor. Therefore smaller datasets were chosen for the time measurements.


Fig.7. Hardware (Custom SVM classifier IP) and Software (ARM Processing System) co-block design

The final step is to generate the bit-stream from the block design and use the Xilinx SDK to load the bit-stream onto the Zynq FPGA and verify the algorithm.

IV. RESULTS AND DISCUSSIONS

The SVM algorithm was first run on the modified version of the SVM-light C library using the GCC compiler from the Windows command prompt for different datasets: Iris, handwritten digits, Skin Segmentation, Breast Cancer and Adult Income. Fig. 8 shows the testing process for Fischer's Iris dataset using the GCC compiler; the accuracy of classification on the testing dataset is displayed.

Fig. 8. Testing Process for Iris Dataset using GCC compiler

After the testing process, the code creates a predictions file which contains all the values before applying the sign function of the decision function. These values are used for assigning the class to which each test instance belongs. The prediction file for the Iris dataset is shown in Fig. 9: the first five instances belong to the first class and the rest belong to the second class.

Fig. 9. Predictions File

Table I depicts the classification accuracy achieved for each dataset.

A. Hardware resource utilization

The next phase was the implementation of the C algorithm in Vivado HLS, designing the top-level function, the test bench file and the data files. This generates the synthesis report and the equivalent Verilog code for the user-defined top-level function. By making use of directives such as loop unrolling and pipelining we were able to speed up the custom SVM classifier IP by reducing the latency through parallelism. Table II depicts the directives used and the respective latency obtained for the Skin Segmentation dataset.

The first column of Table II gives the HLS directive applied to the IP; the remaining columns are the synthesis results generated by Vivado HLS. The first row is obtained after applying only the basic interface directives (AXI4-Stream and AXI4-Lite); the subsequent rows add the other directives on top of the interfaces. It can be seen that the directives bring a significant improvement in latency and throughput, at the cost of additional hardware resources: the latency improves from 1,200,803 to 122,562 clock cycles, a decrease of more than 9x, while more resources are allocated, an apparent trade-off of improving latency. The large number of resources on the Zynq-7000 SoC is one reason it is used in this work, since latency can be improved while the additional resources can still be accommodated. However, resource usage cannot grow without bound, because power consumption has to be kept at an optimal value. Therefore a compromise has to be made among latency, hardware resource utilization and power consumption.


Table I. Classification Accuracy of different datasets

Dataset                  Number of  Training   Testing    Number of        Classification
                         features   data size  data size  Support Vectors  Accuracy (%)
Iris                     4          90         10         30               100
MNIST (Digits)           154        69,000     1,000      19,530           89.4
Skin                     3          22,051     24,506     5,211            92.68
Breast Cancer            30         512        57         137              89.47
Adult Income Prediction  123        30,956     1,605      11,066           82.74

Table II. Synthesis report for Skin Segmentation dataset (1 cycle = 10 ns)

Directives applied                                Latency    Throughput  BRAM  DSP blocks  Flip-Flops  LUT
Basic interfaces (AXI4-Stream and AXI4-Lite)      1,200,803  1,200,804   256   5           1,030       1,647
Array partitioning (cyclic, factor 3)             784,197    784,198     192   11          1,373       2,174
Partial pipelining of outer loops                 147,067    147,068     192   10          1,713       2,314
Unrolling inner loops and pipelining outer loops  122,562    122,563     192   10          1,701       2,348

Table III. Hardware resource utilization (Zynq-7020) of the SVM model on different datasets

Hardware Component  Skin Segmentation  Iris        Breast Cancer
                    data set           data set    dataset
FF                  1,701 (1%)         893 (2%)    20,572 (19%)
BRAM                192 (68%)          0 (0%)      32 (11%)
LUT                 2,348 (4%)         2,802 (5%)  16,756 (31%)
DSP48               10 (4%)            12 (5%)     12 (5%)


The hardware resource utilization of the SVM model on the Skin Segmentation, Iris and Breast Cancer datasets is reported in Table III. The hardware resource is given in the first column, and the subsequent columns show the number of on-chip components used and the percentage of the available resources. All these results are obtained after applying the best optimization directives for latency and resource utilization. As seen from Table III, the percentage of BRAMs used is high for the Skin Segmentation model. This is because arrays are implemented as BRAMs: since the support vectors and the test data instances are stored and passed as arrays, they require a large number of BRAMs for their storage. The latency here is the total number of clock cycles used to complete the process. There is an improvement in latency when loop unrolling and pipelining are applied as directives; however, the hardware resources are more heavily utilized when applying loop unrolling, as it allocates more DSPs for the multiplications.

B. Processing Speed and Time

After synthesis in Vivado HLS, the proposed hardware-software co-design is validated using the Vivado Design Suite and the final bit-stream is loaded into the FPGA. Using the Xilinx SDK, we developed an application to measure the processing speed and the time taken by the algorithm for each of the different datasets when run in software, and compared those with the hardware results. To measure the time taken to run the classifier IP in the PL, an AXI Timer IP core was used, while the XTimer module was exploited to measure the clock cycles taken to run the same application on the PS part of the Zynq SoC. Initially a default clock frequency of 250 MHz was applied to both the PS and the PL of the ZedBoard. The clock cycles and processing times measured under this configuration are provided in Table IV.

Table IV. Clock cycles and Processing Time comparison (PS and PL at 250 MHz; 1 cycle = 4 ns)

Model                    FPGA                     ARM                         Speedup Factor
Model 1 (Iris)           191 cycles / 0.76 µs     3,455 cycles / 13.82 µs     18.09
Model 2 (Breast Cancer)  6,759 cycles / 27.04 µs  131,057 cycles / 524.23 µs  19.39

Table V. Maximum frequency comparison (PL at 250 MHz, PS at 666.67 MHz)

Model                    FPGA                     ARM                      Speedup factor (cycles / time)
Model 1 (Iris)           194 cycles / 0.77 µs     1,269 cycles / 1.90 µs   6.54 / 1.46
Model 2 (Breast Cancer)  6,752 cycles / 27.08 µs  45,779 cycles / 68.66 µs 6.78 / 2.53

Table VI. Speedup Factor comparison at different frequencies

Frequency configuration          Clock-cycle speedup  Processing-time speedup
PS and PL at 250 MHz             19.39                19.39
PS at 666.67 MHz, PL at 250 MHz  6.78                 2.53

Here Model 1 is trained on the Iris data and Model 2 on the Breast Cancer data. The Iris classification application took 191 clock cycles to run on the PL, which includes the streaming of the data to the IP through the DMA, whereas the same application on the ARM processor took 3,455 clock cycles. The hardware IP therefore achieved more than 18x acceleration compared to the same C-coded function on the ARM CPU. In addition, results were obtained by operating the PS and PL at their maximum frequencies, i.e. 666.67 MHz for the PS and 250 MHz for the PL; the results for this configuration are provided in Table V.

Finally, a comparison was made between the maximum operating frequencies and the default frequencies to determine which is the better alternative for processing time and speed; Table VI summarizes this. It is clear from Table VI that when both the processor and the FPGA operate at the same frequency, a better speedup factor is obtained than when operating at the maximum frequencies.

V. CONCLUSION

This work set out to implement the ML classifier SVM on an FPGA, in order to develop an efficient, faster and generic embedded classifier. Existing literature on the topic was studied and the gaps in the research were identified. We therefore focused on implementing an embedded classification system with high accuracy, while keeping in mind the constraints of embedded systems, and exploited various design and development tools to build such a system.


An accelerator IP for the SVM was designed, implementing an online, efficient and scalable model. The proposed model is less complex and more hardware-friendly than a software implementation of the same algorithm. Five models were initially trained and implemented with the SVM-light application: Fisher's Iris, MNIST handwritten digit recognition (modelled as even-odd classification), breast cancer prediction, skin segmentation, and adult income prediction, all modelled in Windows. Three of these five models were also simulated and synthesized in HLS; however, larger datasets such as MNIST (handwritten digits) could not be implemented on the FPGA, as the available resources were insufficient and additional hardware would be required. Regarding acceleration, a maximum factor of 20x was achieved on the Zynq FPGA compared to the same system running in software, i.e. on the ARM PS. At 250 MHz, a processing time of 0.77 µs was achieved on the FPGA, compared to 1.90 µs on the ARM processor.
Classification accuracy had to be kept the same while achieving a hardware-friendly system. Most existing implementations in the literature suffer an accuracy loss while trying to achieve performance improvements. However, we achieved consistent classification accuracies throughout our work, from the Windows application through HLS synthesis to the hardware implementation. Our work is therefore a scalable, generic embedded SVM classifier with no loss in accuracy, meeting the critical embedded-system constraints.

REFERENCES

1. J. Nayak, B. Naik, and H. Behera, "A Comprehensive Survey on Support Vector Machine in Data Mining Tasks: Applications & Challenges," International Journal of Database Theory and Application, vol. 8, pp. 169-186, 2015.
2. C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining Knowl. Discovery, vol. 2, no. 2, pp. 121-167, 1998.
3. B. Catanzaro, N. Sundaram, and K. Keutzer, "Fast support vector machine training and classification on graphics processors," in Proc. 25th Int. Conf. Mach. Learn., 2008, pp. 104-111.
4. P. Sabouri, H. GholamHosseini, T. Larsson, and J. Collins, "A Cascade Classifier for Diagnosis of Melanoma in Clinical Images," in 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2014, pp. 6748-6751.
5. G. M. Foody and A. Mathur, "A Relative Evaluation of Multiclass Image Classification by Support Vector Machines," IEEE Transactions on Geoscience and Remote Sensing, vol. 42, pp. 1335-1343, 2004.
6. R. Entezari-Maleki, A. Rezaei, and B. Minaei-Bidgoli, "Comparison of Classification Methods Based on the Type of Attributes and Sample Size," Journal of Convergence Information Technology, vol. 4, pp. 94-102, 2009.
7. J. Kim, B.-S. Kim, and S. Savarese, "Comparing Image Classification Methods: K-Nearest-Neighbor and Support-Vector-Machines," Ann Arbor, vol. 1001, pp. 48109-2122, 2012.
8. S. Cadambi et al., "A massively parallel FPGA-based coprocessor for support vector machines," in Proc. 17th IEEE Int. Symp. Field Program. Custom Comput. Mach. (FCCM), Apr. 2009, pp. 115-122.
9. O. Piña-Ramírez, R. Valdés-Cristerna, and O. Yáñez-Suárez, "An FPGA implementation of linear kernel support vector machines," in Proc. IEEE Int. Conf. Reconfigurable Comput. FPGA's, Sep. 2006, pp. 1-6.
10. M. Ruiz-Llata, G. Guarnizo, and M. Yébenes-Calvino, "FPGA implementation of a support vector machine for classification and regression," in Proc. Int. Joint Conf. Neural Netw., 2010, pp. 1-5.
11. J. Fowers, G. Brown, P. Cooke, and G. Stitt, "A performance and energy comparison of FPGAs, GPUs, and multicores for sliding-window applications," in Proc. ACM/SIGDA Int. Symp. Field Program. Gate Arrays (FPGA), 2012, pp. 47-56.
12. M. P. Véstias, "High-Performance Reconfigurable Computing Granularity," Encyclopedia of Information Science and Technology, pp. 3558-3567, 2015.
13. M. Wielgosz, E. Jamro, D. Zurek, and K. Wiatr, "FPGA Implementation of The Selected Parts of The Fast Image Segmentation," in Studies in Computational Intelligence, vol. 390, 2012, pp. 203-216.
14. T. Saegusa, T. Maruyama, and Y. Yamaguchi, "How Fast is an FPGA in Image Processing?," in International Conference on Field Programmable Logic and Applications (FPL 2008), 2008, pp. 77-82.
15. H. M. Hussain, K. Benkrid, and H. Seker, "The Role of FPGAs as High Performance Computing Solution to Bioinformatics and Computational Biology Data," AIHLS2013, p. 102, 2013.
16. K. Nagarajan, B. Holland, A. D. George, K. C. Slatton, and H. Lam, "Accelerating Machine-Learning Algorithms on FPGAs using Pattern-Based Decomposition," Journal of Signal Processing Systems, vol. 62, pp. 43-63, 2011.
17. L. Woods, J. Teubner, and G. Alonso, "Real-Time Pattern Matching with FPGAs," in 2011 IEEE 27th International Conference on Data Engineering (ICDE), 2011, pp. 1292-1295.
18. A. Eklund, P. Dufort, D. Forsberg, and S. M. LaConte, "Medical Image Processing on The GPU – Past, Present and Future," Medical Image Analysis, vol. 17, pp. 1073-1094, 2013.
19. S. Asano, T. Maruyama, and Y. Yamaguchi, "Performance Comparison of FPGA, GPU and CPU in Image Processing," in International Conference on Field Programmable Logic and Applications (FPL 2009), 2009, pp. 126-131.
20. C. Papageorgiou and T. Poggio, "Trainable Pedestrian Detection System," Int'l J. Computer Vision, vol. 38, pp. 15-33, 2000.
21. M. Oren, C. Papageorgiou, P. Sinha, E. Osuna, and T. Poggio, "Pedestrian Detection Using Wavelet Templates," Proc. IEEE CS Conf. Computer Vision and Pattern Recognition, pp. 193-199, 1997.
22. S. Agarwal and D. Roth, "Learning a Sparse Representation for Object Detection," ECCV '02: Proc. Seventh European Conf. Computer Vision, pp. 113-130, 2002.
23. E. Osuna, R. Freund, and F. Girosi, "Training Support Vector Machines: An Application to Face Detection," Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 130-136, 1997.
24. H. Sahbi, D. Geman, and N. Boujemaa, "Face Detection Using Coarse-to-Fine Support Vector Classifiers," Proc. Int'l Conf. Image Processing, pp. 925-928, 2002.
25. R. Pedersen and M. Schoeberl, "An Embedded Support Vector Machine," Proc. Fourth Workshop Intelligent Solutions in Embedded Systems, pp. 1-11, 2006.
26. A. Boni, F. Pianegiani, and D. Petri, "Low-Power and Low-Cost Implementation of SVMs for Smart Sensors," IEEE Trans. Instrumentation and Measurement, vol. 56, no. 1, pp. 39-44, Feb. 2007.
27. B. Catanzaro, N. Sundaram, and K. Keutzer, "Fast Support Vector Machine Training and Classification on Graphics Processors," Proc. 25th Int'l Conf. Machine Learning, pp. 104-111, 2008.
28. R. Genov and G. Cauwenberghs, "Kerneltron: Support Vector 'Machine' in Silicon," IEEE Trans. Neural Networks, vol. 14, no. 5, pp. 1426-1434, Sept. 2003.
29. D. Anguita, A. Boni, and S. Ridella, "A Digital Architecture for Support Vector Machines: Theory, Algorithm, and FPGA Implementation," IEEE Trans. Neural Networks, vol. 14, no. 5, pp. 993-1009, Sept. 2003.
30. I. Biasi, A. Boni, and A. Zorat, "A Reconfigurable Parallel Architecture for SVM Classification," Proc. IEEE Int'l Joint Conf. Neural Networks, vol. 5, pp. 2867-2872, 2005.
31. O. Pina-Ramirez, R. Valdes-Cristerna, and O. Yanez-Suarez, "An FPGA Implementation of Linear Kernel Support Vector Machines," Proc. IEEE Int'l Conf. Reconfigurable Computing and FPGA's, pp. 1-6, 2006.
32. R. A. Reyna, D. Esteve, D. Houzet, and M.-F. Albenge, "Implementation of the SVM Neural Network Generalization Function for Image Processing," Proc. IEEE Fifth Int'l Workshop Computer Architectures for Machine Perception, pp. 147-151, 2000.
33. R. Roberto, H. Dominique, D. Daniela, C. Florent, and O. Salim, "Object Recognition System-on-Chip Using the Support Vector Machines," EURASIP J. Advances in Signal Processing, vol. 2005, pp. 993-1004, 2005.

International Journal of Innovative Technology and Exploring Engineering (IJITEE)
ISSN: 2278-3075, Volume-8 Issue-12, October, 2019

34. H. Peter, G. Srihari, C. Durdanovic, V. Jakkula, M. Sankardadass, E. Cosatto, and S. Chakradhar, "A Massively Parallel Digital Learning Processor," Proc. 22nd Ann. Conf. Neural Information Processing Systems (NIPS), pp. 529-536, 2008.
35. C. Cortes and V. Vapnik, "Support-Vector Networks," Machine Learning, vol. 20, no. 3, pp. 273-297, 1995.
36. V. Vapnik, The Nature of Statistical Learning Theory. Springer-Verlag, 1995.
37. "ZedBoard Hardware User's Guide," [Online]. Available: www.zedboard.org/sites/default/files/documentations/ZedBoard_HW_UG_v2_2.pdf
38. Vivado Design Suite User Guide: High-Level Synthesis (UG902, v2018.2), June 2018.

AUTHORS PROFILE

Vidhyapathi CM received the B.E. degree in Electronics and Communication Engineering from Anna University, India, in 2006 and the M.E. degree in VLSI Design from Anna University, Chennai, India, in 2008. Since 2010, he has been a member of the faculty in the Department of Embedded Technology, School of Electronics Engineering, Vellore Institute of Technology, Vellore, India, where he is currently an Assistant Professor (Senior). His research interests are hardware acceleration of algorithms using FPGAs, algorithm design, system-level optimization of embedded systems, and computer vision.

Mannur Maheshwar Reddy received the B.E. degree in Electronics and Communication Engineering from Vellore Institute of Technology (VIT), India, in 2019. His research interests include computer architecture, algorithm optimization, and computer vision.

Sai Nikhil Reddy Thota received the B.Tech. degree in Electronics and Communication Engineering from Vellore Institute of Technology in 2019. His research interests are machine-learning algorithm optimization using FPGA hardware.

Alex Noel Joseph Raj received the B.E. degree in Electrical Engineering from Madras University, India, in 2001, the M.E. degree in Applied Electronics from Anna University in 2005, and the Ph.D. degree in Engineering from the University of Warwick in 2009. From October 2009 to September 2011, he was a Design Engineer with Valeport Ltd., Totnes, UK. From March 2013 to March 2017, he was a professor with the Department of Embedded Technology, School of Electronics Engineering, VIT University, Vellore, India. Since March 2017, he has been with the Department of Electronic Engineering, College of Engineering, Shantou University, China. His research interests include machine learning, signal and image processing, and FPGA implementations.

Kathirvelan J received the Ph.D. degree in Engineering from VIT University in 2016. He is currently an Associate Professor in the Department of Sensor and Biomedical Technology, School of Electronics Engineering, VIT University, Vellore, India. His research interests include collision avoidance, control engineering computing, field-programmable gate arrays, handicapped aids, infrared detectors, speech processing, and virtual instrumentation.

