Published By: Blue Eyes Intelligence Engineering & Sciences Publication
Retrieval Number: L25621081219/2019©BEIESP
DOI: 10.35940/ijitee.L2562.1081219
Hardware Acceleration of SVM classifier using Zynq SoC FPGA
In [25], the critical parts of the algorithm were modelled in hardware and the rest in software. In [26], an attempt was made to accelerate the SVM on a low-power, low-cost 8-bit PIC microcontroller, considering its limited memory and hardware resources. Graphics processing units (GPUs) were used in [27] to improve the performance of SVM by exploiting their parallel execution. However, the fixed hardware structure, the need for efficient programming, and the high power consumption of GPUs are major drawbacks when using them in embedded systems.

Considerable work has been carried out by researchers in recent years on accelerating the training and classification phases of SVM using custom reconfigurable computing hardware, mostly on FPGAs. An analog-digital mixed SVM processor was proposed in [28]. In [29], a modified SVM training algorithm and the corresponding architecture were proposed. Small-scale implementations of SVM with a reduced number of SVs were proposed in [30], [31] and [32]. These algorithms, however, were strictly application-specific and could not be extended to other problems. In [33] and [34], hardware acceleration of the vector operations of SVM was analysed and better performance was reported. Furthermore, these works showed the importance of hardware acceleration using arithmetic and logic units (ALUs), multipliers and vector processing elements to achieve real-time performance.

All the related work was carried out with a specific application in focus; the proposed hardware for accelerating SVM was not generic but strictly application-specific. In our work, we propose a generic hardware-software co-design approach that, at the same time, provides the flexibility to optimise the custom hardware part of the SVM for the specific application dataset.

To the best of our knowledge, no existing technique makes use of hardware-software co-simulation exploiting the architecture of the Zynq-7020. In addition, previous works were implemented in a hardware description language (HDL) alone, coded in Verilog or VHDL, which increases the design time for an application. We have made use of Xilinx tools such as Vivado HLS, which synthesizes C or C++ code to Verilog and drastically reduces the design time for the application.

III. OVERVIEW OF THE PROPOSED METHODOLOGY

A. SVM Background

The Support Vector Machine (SVM) was first introduced by Cortes and Vapnik in 1995 [35], [36]. It is based on the concept of a decision boundary that separates two different classes of data in order to discriminate between them with high accuracy. SVM is a supervised machine learning (ML) classifier that provides good performance for both regression and classification tasks. In the training phase, a separating hyperplane is constructed from an input training dataset containing data samples. The hyperplane that best separates the samples belonging to the two classes is called the maximum-margin hyperplane, and it forms the decision boundary. The class samples that lie on the boundary are called Support Vectors (SVs), as depicted in Fig. 1. The SVs obtained from the training phase are then used in the classification phase to classify new data.

Fig. 1. Support Vector classes and hyperplane

A training dataset is initially used to model the decision function. This dataset consists of input data vectors and their corresponding output classes. The training process produces the SVs, which represent the entire dataset and can be used to classify any new data. Training is mostly an offline task, while classification is mostly online, i.e. performed on real-time data. The classification phase is computationally expensive: its cost grows linearly with the test data size, the number of features and the number of Support Vectors. Hence there arises a need to accelerate the classification process.

B. Hardware acceleration of SVM classifier using Zynq-7020 FPGA

Our aim is to make the time-consuming part of the algorithm run on the FPGA, while the rest of the algorithm runs on the Processing System (PS) of the Zynq SoC. To achieve this, we make use of the Xilinx Vivado HLS tool, the Vivado Design Suite and the Xilinx SDK. Using Vivado HLS, we generate Verilog or VHDL code by synthesizing C or C++ code. To identify the time-consuming part, the complete SVM was modelled in C, taking the model file as input and producing the predicted classes as the output file. Software profiling showed that the vector multiplication of the input data vectors is the time-consuming part of the SVM algorithm; we measured the time taken by this function so that it could later be compared with the time taken on hardware alone.

The software-modelled SVM was verified by modifying the existing SVM-light library. We successfully classified different datasets, such as the Iris and Breast Cancer datasets, with high accuracy. To simplify the classification process and reduce the computationally expensive load, we propose a scalable, efficient hardware-software co-design approach, which loads the SVs, the test data and the weights into the FPGA fabric to exploit its parallelization abilities. The burden on the CPU and the CPU processing time are reduced by the parallel computation on the FPGA, compared to the sequential execution of a traditional CPU. Although SVM can employ a variety of kernel tricks for linearly non-separable data, the basic core of the SVM classification phase is the vector multiplication in the decision function. Therefore, we chose to implement the linear kernel of SVM to achieve hardware acceleration.
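The hot spot identified by profiling, the multiplication of each support vector with the test data vector accumulated under the alpha weights, can be sketched in C. The names and array layout below are illustrative, not the exact code of this work:

```c
#include <stddef.h>

/* Hot spot identified by profiling: accumulate alpha_i * y_i * (s_i . x)
 * over all support vectors. alpha_y[i] holds the trained product
 * alpha_i * y_i; sv stores the SVs row-major as [n_sv * n_feat]. */
float svm_distance(size_t n_sv, size_t n_feat,
                   const float *sv, const float *alpha_y, const float *x)
{
    float dist = 0.0f;
    for (size_t i = 0; i < n_sv; i++) {
        float dot = 0.0f;
        for (size_t j = 0; j < n_feat; j++)
            dot += sv[i * n_feat + j] * x[j];   /* vector multiplication */
        dist += alpha_y[i] * dot;
    }
    return dist;
}
```

On the PS this double loop executes sequentially, one multiply-accumulate per cycle at best; on the PL the inner products can be unrolled and pipelined, which is what the HLS directives discussed later exploit.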
International Journal of Innovative Technology and Exploring Engineering (IJITEE)
ISSN: 2278-3075, Volume-8 Issue-12, October, 2019
We focus specifically on accelerating the classification phase by implementing the decision function. It makes use of the factors obtained during the training phase, namely the alpha values, the bias and the feature values; the number of defined Support Vectors also determines the decision function used for classifying given test data. The decision function is given by Eq. (1):

f(x) = sign( Σᵢ αᵢ yᵢ (sᵢ · x) − b )        (1)

where x is the test data vector, sᵢ is the i-th support vector, αᵢ its alpha value, yᵢ its class label (±1) and b the bias. This equation is implemented on the FPGA using high-level synthesis, which creates an IP for the top-level function that implements it. The decision function is divided into three equations:

ACⱼ = Σᵢ αᵢ yᵢ sᵢ,ⱼ        (2)

Eq. (2) multiplies the alpha value of each support vector with the respective feature values of that support vector and accumulates the results, per feature, in the variable AC.

D = Σⱼ ACⱼ xⱼ        (3)

Using the AC values found in Eq. (2), the distance D is obtained by multiplying the AC value of each feature with the corresponding test data feature value, as given in Eq. (3).

f(x) = sign(D − b)        (4)

The bias value b, defined during the training phase, is then subtracted from the calculated distance, and the sign of the difference is used by the decision function to assign the test data to its respective class, as given in Eq. (4).

The method we implement involves hardware-software co-design, making use of the hybrid Zynq architecture. The Zynq System on Chip (SoC) [37] consists of a Processing System (PS) with an ARM Cortex processor and Programmable Logic (PL), which is the FPGA.

The models are trained using the Windows SVM-light application for later classification on hardware. A cross-validation technique is used during the training phase to achieve higher accuracy. Due to the limited runtime memory, the dimensionality of the datasets was reduced using scaling and normalization techniques.

SVM-light was initially compiled and tested using the GCC compiler, and later modified using Visual Studio. Fig. 3 shows the training process for Fischer's Iris dataset using the GCC compiler.

Fig. 3. Training Process for Iris Dataset using GCC compiler

After training, a model file is created which contains information about the training dataset, such as the size of the data used for training, the number of Support Vectors and their values, the number of features, the bias value and the type of kernel used for training. The first few lines of the model file are shown in Fig. 4.
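The three-step decomposition described by Eqs. (2)-(4) maps naturally onto a small synthesizable C function per step. The fixed sizes and names below are illustrative placeholders for the dataset-dependent values, not the exact top-level function of this work:

```c
#include <stddef.h>

#define N_SV   3   /* number of support vectors (illustrative) */
#define N_FEAT 4   /* number of features (illustrative)        */

/* Eq. (2): collapse the support vectors into one weight vector,
 * AC[j] = sum_i alpha_i * y_i * sv[i][j]; alpha_y[i] holds alpha_i * y_i. */
void compute_ac(const float alpha_y[N_SV],
                const float sv[N_SV][N_FEAT],
                float ac[N_FEAT])
{
    for (size_t j = 0; j < N_FEAT; j++) {
        ac[j] = 0.0f;
        for (size_t i = 0; i < N_SV; i++)
            ac[j] += alpha_y[i] * sv[i][j];
    }
}

/* Eq. (3): distance D = sum_j AC[j] * x[j] for one test vector x. */
float compute_distance(const float ac[N_FEAT], const float x[N_FEAT])
{
    float d = 0.0f;
    for (size_t j = 0; j < N_FEAT; j++)
        d += ac[j] * x[j];
    return d;
}

/* Eq. (4): classify by the sign of D - b. */
int decide(float d, float b)
{
    return (d - b) >= 0.0f ? 1 : -1;
}
```

One design point worth noting: for the linear kernel, Eq. (2) depends only on the trained model, so AC can be computed once and reused, leaving a single dot product (Eq. (3)) and a comparison (Eq. (4)) per test vector.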
Fig. 7. Hardware (Custom SVM classifier IP) and Software (ARM Processing System) co-block design
Dataset Used             Number of  Training   Testing    Number of        Classification
                         features   data size  data size  Support Vectors  Accuracy (%)
Iris                     4          90         10         30               100
MNIST (Digits)           154        69,000     1,000      19,530           89.4
Skin                     3          22,051     24,506     5,211            92.68
Breast Cancer            30         512        57         137              89.47
Adult Income Prediction  123        30,956     1,605      11,066           82.74
Table III: Hardware resource utilization (Zynq-7010) of SVM model on different data sets

Resource  Model 1    Model 2  Model 3
BRAM      192 (68%)  0 (0%)   32 (11%)
DSP48     10 (4%)    12 (5%)  12 (5%)
Hardware resource utilization of the SVM model on the Skin Segmentation, Iris and Breast Cancer datasets is reported in Table III. The hardware resource is listed in the first column, and the subsequent columns show the number of on-chip components used and the percentage of the available resources. All these models are the results after applying the best optimization directives to achieve improvements in latency and resource utilization. As seen from Table III, the percentage of BRAMs used is high in the Skin Segmentation and Iris dataset models. This is because arrays are implemented as BRAMs; since the support vectors and the test data instances are stored and passed as arrays, they require a high number of BRAMs for their storage. Here, latency is the total number of clock cycles used to complete the process. There is an improvement in latency when loop unrolling and pipelining are applied as directives. However, the hardware resources are more heavily utilized when applying loop unrolling, as it allocates more DSPs for the multiplications.

B. Processing Speed and Time

After synthesis in Vivado HLS, the proposed hardware-software co-design is validated using the Vivado Design Suite, and the final bitstream is loaded into the FPGA. Using the Xilinx SDK, we developed an application to measure the processing speed and the time taken by the algorithm for each of the different datasets when run in software, and compared those with the hardware results. To measure the time taken to run the classifier IP in the PL, an AXI timer IP core was used, while the XTime module was exploited to measure the clock cycles taken to run the same application on the PS part of the Zynq SoC. Initially, a default clock frequency of 250 MHz was applied to both the PS and PL of the ZedBoard. The clock cycles and the processing time taken by the respective datasets at this clock configuration are provided in Table IV.
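The raw cycle counts reported by these timers convert to processing times and speed-up factors in the usual way; the small helpers below are an illustrative sketch, not part of the measurement application itself:

```c
/* Convert a measured cycle count at a given clock frequency (Hz)
 * to microseconds. */
double cycles_to_us(unsigned long cycles, double freq_hz)
{
    return (double)cycles / freq_hz * 1.0e6;
}

/* Speed-up of hardware over software from two cycle counts
 * measured at the same clock frequency. */
double speedup(unsigned long sw_cycles, unsigned long hw_cycles)
{
    return (double)sw_cycles / (double)hw_cycles;
}
```

For example, 191 PL cycles at 250 MHz correspond to roughly 0.76 µs, and 3455 software cycles against 191 hardware cycles give a speed-up of about 18x.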
Here, model 1 is trained for the Iris data and model 2 for the Breast Cancer data. The Iris dataset classification application for the Zynq SoC, built in the SDK tool, took 191 clock cycles to run on the PL; this also includes streaming the data to the IP through the DMA. The same application on the ARM processor took 3455 clock cycles. Therefore the hardware IP achieved more than 18x acceleration compared to the similar C-coded function on the ARM CPU. In addition, results were obtained by operating the PS and PL at their maximum frequencies, i.e. 666.67 MHz for the PS and 250 MHz for the PL. The results for this configuration are provided in Table V.

Finally, a comparison was made between the PS and PL at their maximum operating frequencies and at the default frequencies, to conclude which is the better alternative in terms of processing time and speed; Table VI summarises this comparison. It is clear from Table VI that when both the processor and the FPGA operate at the same frequency, a better speed-up factor is obtained than when operating at the maximum frequencies.

V. CONCLUSION

This work set out to implement the ML classifier SVM on an FPGA, with the aim of developing an efficient and faster generic embedded classifier. Existing literature on this topic was studied and the gaps in the research were identified. We therefore focused on implementing an embedded classification system with high accuracy, while keeping in mind the constraints of embedded systems. Various design and development tools were exploited to develop such a system.
An accelerator IP for SVM was designed which implements an online, efficient and scalable model. The proposed model is less complex and more hardware-friendly than a software implementation of the same algorithm. Five models were initially trained and implemented with the SVM-light application: the datasets for Fischer's Iris, MNIST handwritten digit recognition (modelled as even-odd classification), Breast Cancer prediction, Skin Segmentation and Adult Income prediction were modelled in Windows. Three of the above-mentioned five models were also simulated and synthesized in HLS; however, there is a limitation in implementing larger datasets such as MNIST (handwritten digits) on the FPGA, as the available resources were not sufficient for the large dataset and would require additional hardware. Regarding acceleration, a maximum factor of 20x was achieved on the Zynq FPGA compared to the similar system running in software, i.e. on the ARM PS. At 250 MHz, a processing time of 0.77 µs was achieved on the FPGA, compared to 1.90 µs on the ARM processor.

Classification accuracy was to be kept the same while achieving a hardware-friendly system. Most of the existing implementations in the literature suffered from an accuracy loss while trying to achieve performance improvements. However, we achieved consistent classification accuracies throughout our work, from the Windows application to HLS synthesis to the hardware implementation. Therefore, our work is a scalable, generic embedded SVM classifier with no loss in accuracy, meeting the critical embedded system constraints.

REFERENCES

1. J. Nayak, B. Naik, and H. Behera, "A Comprehensive Survey on Support Vector Machine in Data Mining Tasks: Applications & Challenges," International Journal of Database Theory and Application, vol. 8, pp. 169-186, 2015.
2. C. J. C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121-167, 1998.
3. B. Catanzaro, N. Sundaram, and K. Keutzer, "Fast Support Vector Machine Training and Classification on Graphics Processors," in Proc. 25th Int. Conf. Machine Learning, 2008, pp. 104-111.
4. P. Sabouri, H. GholamHosseini, T. Larsson, and J. Collins, "A Cascade Classifier for Diagnosis of Melanoma in Clinical Images," in 36th Annual Int. Conf. IEEE Engineering in Medicine and Biology Society (EMBC), 2014, pp. 6748-6751.
5. G. M. Foody and A. Mathur, "A Relative Evaluation of Multiclass Image Classification by Support Vector Machines," IEEE Transactions on Geoscience and Remote Sensing, vol. 42, pp. 1335-1343, 2004.
6. R. Entezari-Maleki, A. Rezaei, and B. Minaei-Bidgoli, "Comparison of Classification Methods Based on the Type of Attributes and Sample Size," Journal of Convergence Information Technology, vol. 4, pp. 94-102, 2009.
7. J. Kim, B.-S. Kim, and S. Savarese, "Comparing Image Classification Methods: K-Nearest-Neighbor and Support-Vector-Machines," Ann Arbor, vol. 1001, pp. 48109-2122, 2012.
8. S. Cadambi et al., "A Massively Parallel FPGA-Based Coprocessor for Support Vector Machines," in Proc. 17th IEEE Int. Symp. Field-Programmable Custom Computing Machines (FCCM), Apr. 2009, pp. 115-122.
9. O. Piña-Ramírez, R. Valdés-Cristerna, and O. Yáñez-Suárez, "An FPGA Implementation of Linear Kernel Support Vector Machines," in Proc. IEEE Int. Conf. Reconfigurable Computing and FPGAs, Sep. 2006, pp. 1-6.
10. M. Ruiz-Llata, G. Guarnizo, and M. Yébenes-Calvino, "FPGA Implementation of a Support Vector Machine for Classification and Regression," in Proc. Int. Joint Conf. Neural Networks, 2010, pp. 1-5.
11. J. Fowers, G. Brown, P. Cooke, and G. Stitt, "A Performance and Energy Comparison of FPGAs, GPUs, and Multicores for Sliding-Window Applications," in Proc. ACM/SIGDA Int. Symp. Field Programmable Gate Arrays (FPGA), 2012, pp. 47-56.
12. M. P. Véstias, "High-Performance Reconfigurable Computing Granularity," Encyclopedia of Information Science and Technology, pp. 3558-3567, 2015.
13. M. Wielgosz, E. Jamro, D. Zurek, and K. Wiatr, "FPGA Implementation of the Selected Parts of the Fast Image Segmentation," in Studies in Computational Intelligence, vol. 390, 2012, pp. 203-216.
14. T. Saegusa, T. Maruyama, and Y. Yamaguchi, "How Fast is an FPGA in Image Processing?," in Int. Conf. Field Programmable Logic and Applications (FPL), 2008, pp. 77-82.
15. H. M. Hussain, K. Benkrid, and H. Seker, "The Role of FPGAs as High Performance Computing Solution to Bioinformatics and Computational Biology Data," AIHLS2013, p. 102, 2013.
16. K. Nagarajan, B. Holland, A. D. George, K. C. Slatton, and H. Lam, "Accelerating Machine-Learning Algorithms on FPGAs using Pattern-Based Decomposition," Journal of Signal Processing Systems, vol. 62, pp. 43-63, 2011.
17. L. Woods, J. Teubner, and G. Alonso, "Real-Time Pattern Matching with FPGAs," in IEEE 27th Int. Conf. Data Engineering (ICDE), 2011, pp. 1292-1295.
18. A. Eklund, P. Dufort, D. Forsberg, and S. M. LaConte, "Medical Image Processing on the GPU - Past, Present and Future," Medical Image Analysis, vol. 17, pp. 1073-1094, 2013.
19. S. Asano, T. Maruyama, and Y. Yamaguchi, "Performance Comparison of FPGA, GPU and CPU in Image Processing," in Int. Conf. Field Programmable Logic and Applications (FPL), 2009, pp. 126-131.
20. C. Papageorgiou and T. Poggio, "A Trainable Pedestrian Detection System," Int. J. Computer Vision, vol. 38, pp. 15-33, 2000.
21. M. Oren, C. Papageorgiou, P. Sinha, E. Osuna, and T. Poggio, "Pedestrian Detection Using Wavelet Templates," in Proc. IEEE CS Conf. Computer Vision and Pattern Recognition, 1997, pp. 193-199.
22. S. Agarwal and D. Roth, "Learning a Sparse Representation for Object Detection," in Proc. Seventh European Conf. Computer Vision (ECCV '02), 2002, pp. 113-130.
23. E. Osuna, R. Freund, and F. Girosi, "Training Support Vector Machines: An Application to Face Detection," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1997, pp. 130-136.
24. H. Sahbi, D. Geman, and N. Boujemaa, "Face Detection Using Coarse-to-Fine Support Vector Classifiers," in Proc. Int. Conf. Image Processing, 2002, pp. 925-928.
25. R. Pedersen and M. Schoeberl, "An Embedded Support Vector Machine," in Proc. Fourth Workshop Intelligent Solutions in Embedded Systems, 2006, pp. 1-11.
26. A. Boni, F. Pianegiani, and D. Petri, "Low-Power and Low-Cost Implementation of SVMs for Smart Sensors," IEEE Trans. Instrumentation and Measurement, vol. 56, no. 1, pp. 39-44, Feb. 2007.
27. B. Catanzaro, N. Sundaram, and K. Keutzer, "Fast Support Vector Machine Training and Classification on Graphics Processors," in Proc. 25th Int. Conf. Machine Learning, 2008, pp. 104-111.
28. R. Genov and G. Cauwenberghs, "Kerneltron: Support Vector 'Machine' in Silicon," IEEE Trans. Neural Networks, vol. 14, no. 5, pp. 1426-1434, Sept. 2003.
29. D. Anguita, A. Boni, and S. Ridella, "A Digital Architecture for Support Vector Machines: Theory, Algorithm, and FPGA Implementation," IEEE Trans. Neural Networks, vol. 14, no. 5, pp. 993-1009, Sept. 2003.
30. I. Biasi, A. Boni, and A. Zorat, "A Reconfigurable Parallel Architecture for SVM Classification," in Proc. IEEE Int. Joint Conf. Neural Networks, vol. 5, 2005, pp. 2867-2872.
31. O. Pina-Ramirez, R. Valdes-Cristerna, and O. Yanez-Suarez, "An FPGA Implementation of Linear Kernel Support Vector Machines," in Proc. IEEE Int. Conf. Reconfigurable Computing and FPGAs, 2006, pp. 1-6.
32. R. A. Reyna, D. Esteve, D. Houzet, and M.-F. Albenge, "Implementation of the SVM Neural Network Generalization Function for Image Processing," in Proc. IEEE Fifth Int. Workshop Computer Architectures for Machine Perception, 2000, pp. 147-151.
33. R. Roberto, H. Dominique, D. Daniela, C. Florent, and O. Salim, "Object Recognition System-on-Chip Using the Support Vector Machines," EURASIP J. Advances in Signal Processing, vol. 2005, pp. 993-1004, 2005.
AUTHORS PROFILE