
1st IEEE International Conference on Knowledge Innovation and Invention 2018

Real-time face detection based on YOLO


Wang Yang1, Zheng Jiachun2
(1.Navigation Institute, Jimei University, Xiamen, Fujian, China
2.Information Engineering Institute, Jimei University, Xiamen, Fujian, China
No. 185 Yinjiang Road, Jimei District, Xiamen, Fujian, China
1.86-18860031081 and 1510109901@qq.com
2.86-13806073677 and 415203705@qq.com)

Abstract

As a target detection system, YOLO has a fast detection speed and is suitable for target detection in real-time environments. Compared with other similar target detection systems, it offers better detection accuracy and shorter detection time. In this paper, the YOLO target detection system is applied to face detection. Experimental results show that the face detection method based on YOLO has stronger robustness and faster detection speed; it maintains high detection accuracy even in complex environments, and its detection speed meets real-time detection requirements.

Key words: YOLO, target detection, face detection, confidence, dimensional clustering

Introduction

A common target detection algorithm selects a region of interest (ROI) on a given image as a candidate region, extracts features from the candidate region such as the local binary pattern [1] (LBP) or the histogram of oriented gradients [2] (HOG), and classifies the region with a trained classifier. In the context of big data, however, machine learning methods that rely on preset models cannot accurately and fully describe the data characteristics of the application scenario.

In recent years, with the vigorous development of deep learning, target detection based on deep learning has made great progress. There are two kinds of object detection based on deep learning: the R-CNN series based on region proposals, such as Fast R-CNN [3] and Faster R-CNN [4], and the regression-based SSD [5] and YOLO [6]. Region-based target detection methods contain region-proposal generation and multiple feature layers, so their real-time performance cannot be guaranteed.

In network design, the differences between YOLO and the R-CNN series are as follows: (1) in YOLO, training and detection, that is, feature extraction, classification and box regression, are all done in a single end-to-end network; (2) YOLO treats object detection as a regression problem: once an image is input into the network, the positions of all objects in the image, their categories and the corresponding confidence probabilities are obtained. The detection of the R-CNN series is divided into two parts: object category (a classification problem) and object location and bounding box (a regression problem). In 2018, Redmon et al. [7] proposed the YOLO V3 target detection method. On 320x320 images, YOLO V3 runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster.

Related Work

A. Detection principle of YOLO
The input image is divided into an S x S uniform grid. Each cell predicts a bounding box (x, y, w, h) and a confidence C(Object). The coordinates (x, y) represent the position of the center of the predicted bounding box relative to the grid cell, and (w, h) are the width and height of the box. Each grid cell also predicts the probabilities of C categories. The confidence score reflects both the probability that the box contains a target object and the accuracy of the predicted box. Pr(Object) indicates whether a target object falls into the cell; if one does, the confidence is defined as

C(Object) = Pr(Object) * IOU(pred, truth)    (1)

If the cell contains no target object, the confidence score is zero: C(Object) = 0. IOU is the overlap rate between the generated candidate box and the ground-truth box, that is, the ratio of their intersection to their union:

IOU(pred, truth) = area(box_truth ∩ box_pred) / area(box_truth ∪ box_pred)    (2)

After the confidence of each prediction box is obtained, low-score prediction boxes are removed by setting a threshold, and non-maximum suppression is then performed on the remaining bounding boxes.

B. Detection network
The input image size is 416*416. After a series of convolution and batch normalization operations, the input image is down-sampled at three scales, 32 times, 16 times and 8 times, to obtain multi-scale feature maps. After 32x down-sampling the feature map is too small, so YOLO V3 up-samples it with a stride of 2 to double its size, which yields the 16x map; similarly, the 16x feature map is up-sampled with a stride of 2 to obtain the 8x map, so that deep features can be used for detection.

Detecting targets at different scales is challenging, especially small targets. The feature pyramid network (FPN) [8] is a feature extractor designed to improve accuracy and speed; it replaces the feature extractor in detectors such as Faster R-CNN and generates higher-quality feature-map pyramids.
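The confidence of Eq. (1), the IOU of Eq. (2), and the thresholding plus non-maximum-suppression step described above can be sketched as follows. This is a minimal illustration, not the paper's darknet code; for simplicity the boxes here use corner coordinates (x1, y1, x2, y2) rather than the paper's center format (x, y, w, h).

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, score_thresh=0.5, iou_thresh=0.45):
    """Drop low-confidence boxes, then greedily suppress overlapping ones."""
    order = [i for i in np.argsort(scores)[::-1] if scores[i] >= score_thresh]
    keep = []
    while order:
        best = order.pop(0)          # highest-scoring remaining box
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```

For two identical boxes the IOU is 1.0, for disjoint boxes it is 0.0, and of two heavily overlapping detections only the higher-scoring one survives suppression.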

ISBN: 978-1-5386-5267-1

Fig. 1 A feature fusion module

The network connection structure consists of three routes: a bottom-up line, a top-down line, and horizontal connections. The bottom-up process is the forward pass of the network: the output features of the last layer of each stage are extracted to form the feature pyramid. The top-down process is carried out by up-sampling, while the horizontal connections merge the results of the up-sampling with the feature maps of the same size generated bottom-up.

The main function of the 1*1 convolution kernel is to reduce the number of convolution kernels without changing the size of the feature map. This combines the more valuable feature information from the up-sampling layer with the fine-grained features from the earlier feature map. The top features are merged with the up-sampled lower features, and each layer makes its predictions independently.

C. Loss function
The loss function of YOLO V3 adopts the mean square error, which consists of three parts: coordinate error, IOU error and classification error.

loss = λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj [(x_i − x̂_i)² + (y_i − ŷ_i)²]
     + λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj [(√w_i − √ŵ_i)² + (√h_i − √ĥ_i)²]
     + Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj (C_i − Ĉ_i)²
     + λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^noobj (C_i − Ĉ_i)²
     + Σ_{i=0}^{S²} 1_i^obj Σ_{c ∈ classes} (p_i(c) − p̂_i(c))²    (3)

In the back-propagation of the training process, suppose the bounding boxes of a large and a small object have the same width error ratio (w_p − w_t)/w_p, where w_p is the predicted width and w_t is the real width. The absolute width error w_p − w_t of the small bounding box is then significantly smaller than that of the large one, so it contributes little to a squared loss on raw widths, and the accuracy of width prediction for small objects is low. To improve this, √w and √h are used instead of w and h. As shown in Fig. 2, for the same width error w_p − w_t, the difference √w_p − √w_t of a small bounding box is larger than that of a large one, thus enhancing the influence of the width error of small objects to some extent. The same holds for the height error.

Fig. 2 Object width squared error diagram

D. Dimension clustering
Although YOLO V3 achieves good detection results, it is not completely suited to this image positioning task. Therefore, corresponding improvements to YOLO V3 are made for the specific problem.

When training the network, the initial sizes and number of candidate boxes are required. As the number of network iterations increases, the network learns face features and the parameters of the prediction boxes are adjusted, finally approaching the real boxes. The anchors of YOLO V3 are determined by clustering the classes of the VOC2007 and VOC2012 data sets. These two data sets are rich in categories, so the resulting anchor parameters are universal but not suitable for a specific detection task. Therefore, the clustering operation needs to be carried out again on the detection data set. In order to speed up convergence and improve the accuracy of face detection, the k-means method is used to cluster the face bounding boxes in the images and obtain the initial candidate-box parameters closest to them.

In general, k-means clustering uses the Euclidean distance to measure the distance between two points. In this paper, IOU replaces the Euclidean distance in k-means. The distance function of the clustering is:

d(box, centroid) = 1 − IOU(box, centroid)    (4)

Experiment

Experimental environment: the processor is an Intel Core i7-7770 CPU at 3.60 GHz ×8 with 7.7 GB memory, and the GPU is a GTX 1080. The operating system is Ubuntu 16.04, 64-bit. The deep learning framework is darknet.

A. Data Set
This experiment uses the WIDER FACE data set. During training, the SGD algorithm is adopted. There are 30,200 iterations in total during training. The learning rate is 0.001 and is reduced at 5,000 and 20,000 iterations.
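The dimension clustering of Section D, k-means with the distance of Eq. (4), can be sketched as below. This is an illustrative reimplementation, not the paper's code; as is usual for anchor clustering, boxes are compared by width and height only, as if they shared a common corner.

```python
import numpy as np

def wh_iou(wh, centroids):
    """IOU between boxes and centroids compared by (w, h) only,
    i.e. as if all boxes shared the same top-left corner."""
    inter = (np.minimum(wh[:, None, 0], centroids[None, :, 0])
             * np.minimum(wh[:, None, 1], centroids[None, :, 1]))
    area_box = wh[:, 0] * wh[:, 1]
    area_c = centroids[:, 0] * centroids[:, 1]
    return inter / (area_box[:, None] + area_c[None, :] - inter)

def kmeans_anchors(wh, k, iters=100, seed=0):
    """k-means on box sizes with d = 1 - IOU as the distance (Eq. 4)."""
    rng = np.random.default_rng(seed)
    centroids = wh[rng.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        # assign each box to the centroid with the smallest 1 - IOU
        assign = np.argmin(1.0 - wh_iou(wh, centroids), axis=1)
        new = np.array([wh[assign == j].mean(axis=0) if np.any(assign == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids
```

On WIDER FACE one would run this over the widths and heights of all ground-truth face boxes, with k equal to the number of anchors (9 in YOLO V3), and use the resulting centroids as the initial candidate-box sizes.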
B. Result
Pictures are chosen from three data sets to verify the test results: (a) and (b) are from the Celeb Faces data set [9], (c) and (d) are from the FDDB data set [10], and (e) and (f) are measured on WIDER FACE [8].

Fig. 3 Test results ((a)-(f))

20 images are selected from the three data sets and their sizes are adjusted to obtain the average detection time, as shown in Table I.

TABLE I
AVERAGE DETECTION TIME

Image size          178*218     450*320     2014*680
Detection time (s)  0.027199    0.027246    0.029455
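As a quick sanity check, the times in Table I can be converted to frame rates to confirm they meet the commonly used real-time threshold of 30 frames per second:

```python
# Average detection times from Table I (seconds per image).
times = {"178*218": 0.027199, "450*320": 0.027246, "2014*680": 0.029455}

for size, t in times.items():
    fps = 1.0 / t
    print(f"{size}: {fps:.1f} FPS")  # all three sizes exceed 30 FPS
```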

Conclusion

Compared with traditional algorithms, the face detection method based on YOLO V3 has shorter detection time and stronger robustness, and can reduce the miss rate and the error rate. It still guarantees a high detection rate in complex environments, its detection speed meets the real-time requirement, and it achieves good results.

Acknowledgment

This work is financially supported by the science and technology key projects of Fujian province (2017H0028) and the natural science foundation of Fujian province (2013J01203). The authors would like to express their thanks once again.

References
[1] Li L, Feng X, Xia Z, et al., Face spoofing detection with local binary pattern network, Journal of Visual Communication and Image Representation, vol. 54, pp. 182-192, 2018.
[2] Kim S, Cho K, Trade-off between accuracy and speed for pedestrian detection using HOG feature, IEEE, Berlin, Germany, pp. 207-209, September 2013.
[3] Girshick R, Fast R-CNN, IEEE ICCV. pp. 1440-1448, 2015.
[4] Ren S, He K, Girshick R, et al., Faster R-CNN: Towards real-time
object detection with region proposal networks, Advances in
neural information processing systems. pp. 91-99, 2015.
[5] Liu W, Anguelov D, Erhan D, et al., SSD: Single shot multibox detector, vol. 9905, IEEE ECCV, 2016, pp. 21-37.
[6] Redmon J, Divvala S, Girshick R, et al., You only look once:
Unified, real-time object detection, IEEE CVPR, 2016, pp.
779-788.
[7] Redmon J, Farhadi A, YOLOv3: An incremental improvement, unpublished.
[8] Yang S, Luo P, Loy C C, et al., Wider face: A face detection
benchmark, IEEE CVPR, 2016, pp. 5525-5533.
[9] Yang S, Luo P, Loy C C, et al., From facial parts responses to
face detection: A deep learning approach, IEEE ICCV, 2015, pp.
3676-3684.
[10] Jain V, Learned-Miller E, Fddb: A benchmark for face detection
in unconstrained settings. Technical Report UM-CS-2010-009,
University of Massachusetts, Amherst, 2010.

