A Video-Based Face Detection and Recognition System Using Cascade Face Verification Modules
Abstract-Face detection and recognition in a video is a challenging research topic, as the overall process must be completed both timely and efficiently. In this paper, a novel face detection and recognition system using three fast cascade face verification modules and an ensemble classifier is presented. Firstly, the head of a tester is serially verified by our proposed three verification modules: the face skin verification module, the face symmetry verification module, and the eye template verification module. The three verification modules can eliminate tilted faces, backs of the head, and any other non-face moving objects in the video. Only the frontal face images are sent to the face recognition engine. The frontal face detection reliability can be adjusted by simply setting the verification thresholds in the verification modules. Secondly, three hybrid feature sets are applied to face recognition. An ensemble classifier scheme is proposed to congregate three individual Artificial Neural Network (ANN) classifiers trained by the three hybrid feature sets. Experiments demonstrated that the frontal face detection rate can reach as high as 95% in low quality video images. The overall face recognition rate and reliability are increased at the same time using the proposed ensemble classifier in the system.

Index Terms: Face Verification Module, Ensemble Classifier, Video Image Processing, Feature Extraction, Pattern Recognition

I. INTRODUCTION

Face detection and recognition in a video sequence has become an interesting research topic due to its enormous commercial and law enforcement applications. Research on video-based face detection and recognition can be considered the continuation and extension of still-image face recognition, which has been extensively researched for years, and some good results have been reported in the literature. Examples include well-known methods such as Principal Component Analysis (PCA) [1], Linear Discriminant Analysis (LDA) [1], the eigenfaces and Fisherfaces methods [2], Elastic Graph Matching (EGM) [3], the robust Hausdorff distance measure for face recognition [4], eigenspace-based face recognition [5], and a novel hybrid neural and dual eigenspaces method for face recognition [6].

Another important task for face recognition in a video clip is face detection. In order to capture the frontal face images accurately and timely, many face detection methods have been proposed, such as discriminating feature analysis with a Support Vector Machine (SVM) classifier for face detection [7], neural network-based face detection [8], and face detection in color images based on fuzzy theory [9]. Face color information is an important feature in face detection. In reference [10], the authors used quantized skin color regions for face detection; statistical color models with application to skin detection were reported in reference [11]; a recent survey
of skin-color modeling and detection methods can be found in reference [12]. The eye is another important feature for face detection and recognition. For example, a robust method for eye feature extraction in color images was reported in reference [13], and eye detection using optimal wavelet packets and radial basis functions was introduced in reference [14]. Some face detection applications in videos can be found in references [15, 16].

The development of cheap, high quality video cameras has generated new interest in extending still image-based recognition methodologies to video sequences. In recent years, research in this area has attracted great interest from scientists and researchers worldwide.

In this paper, we propose a simple and efficient cascade face detection scheme: the tester's head is automatically detected from the body motion information between adjacent frames. The possible face area is then serially verified by our proposed three verification modules: the face skin verification module, the face symmetry verification module, and the eye template verification module. Only the frontal face image, which has passed all three verification modules, is sent to the face recognition engine for recognition. This brings two advantages: 1) the computer saves a great deal of video processing time; 2) face recognition performance is increased, as only frontal face images are sent for recognition. Three hybrid feature sets are applied to face recognition. A novel ensemble classifier scheme is proposed to congregate three individual Artificial Neural Network (ANN) classifiers trained by the three hybrid feature sets. A computationally efficient fitness function for genetic algorithms is presented and successfully used to evolve the best weights for the proposed ensemble classifier.

This paper is organized as follows. In Section Two, the schematic flowchart of face detection and recognition is presented. In Section Three, three novel and fast face verification modules are proposed. Three face recognition algorithms are then summarized in Section Four. In order to pursue a higher recognition rate, a novel ensemble classifier scheme is proposed in Section Five. The face detection and recognition experiments conducted on five video clips are reported in Section Six. A conclusion ends the paper.

II. FACE DETECTION AND RECOGNITION SCHEME

The flowchart of the proposed video-based cascade face detection and face recognition system is shown in Fig. 1.

Fig. 1 Face detection and recognition schematic flowchart

III. FACE VERIFICATION MODULES

In this section, three face verification modules are proposed. Firstly, face skin and non-face spectra are analyzed, and the skin spectra are given. Secondly, a fast face symmetry verification algorithm is developed. Finally, three eye templates are chosen to verify the frontal face.

A. Face Skin Verification

It has been reported that human skin only occupies certain spectra in the color space, regardless of ethnic group. In order to obtain an accurate face color spectrum, 1024 face images were used for color spectrum analysis. The following color spaces have been analyzed: (r, g, b), (H, S, V), (S, T, L), (Y, Cb, Cr), (L, U, X), and (I, Rg, By). We used the k-NN cluster algorithm to classify the skin color space into four
categories and construct four multivariate normal distribution models for face color analysis.

Based on the (R, G, B) color space, the formulae used in the analysis are listed as follows:

r = R / (R + G + B), g = G / (R + G + B), b = 1.0 − r − g

H = arccos(0.5 (2R − G − B) / √((R − G)² + (R − B)²)) × 180 / PI
S = (Max(R, G, B) − Min(R, G, B)) / Max(R, G, B)
V = Max(R, G, B) / 255.0

Here, PI = 3.1415. After analysis, we decided to use only the two color spaces (r, g, b) and (H, S, V) in our face verification module, due to the fact that these two color spaces are comparatively stable for different face skins and are less sensitive to illumination, intensity, and the partial occlusion of the lights. Fig. 2 shows the distributions of the four categories of skin spectra in the (r, g, b) and (H, S, V) color spaces.

B. Face Symmetry Verification

It is common sense that the human face is symmetric. In this section, the Y grayscale image in the (Y, U, V) color space is used for face symmetry verification. The following steps are proposed for face symmetry verification:

1) Segment the face area from the shoulder and other parts of the human body using face color information and the width information of the human body;
2) Divide the face area vertically and evenly into two parts: a left sub-block and a right sub-block;
3) Calculate the histogram of the left sub-block, L(i), and the histogram of the right sub-block, R(i), i = 0, 1, 2, ..., N−1, where N is the length of the face area;
4) Compute the Symmetric Similarity Coefficient (SSC) using the following formula:

SSC = 2 L(i) R(i) / (L(i)² + R(i)²)   (1)

The threshold for SSC can be adjusted according to the application. In our experiments, if it is set to 0.75~0.85, most face images in the video can be verified. The higher the threshold is set, the fewer frontal faces are captured and the more symmetric the extracted frontal faces are.
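The symmetry check above can be sketched in a few lines. This is a minimal sketch, not the paper's implementation: the bin count and the averaging of the per-bin similarity over non-empty bins are assumptions (the SSC formula in (1) leaves the bin index free).

```python
import numpy as np

def symmetric_similarity(face_y, bins=32):
    """Symmetric Similarity Coefficient (SSC) for a grayscale (Y) face crop.

    Steps 2)-4): split the face vertically into left/right sub-blocks,
    histogram each half, then compare the histograms bin by bin.
    Averaging over non-empty bins is an assumption of this sketch.
    """
    h, w = face_y.shape
    left = face_y[:, : w // 2]          # left sub-block
    right = face_y[:, (w + 1) // 2:]    # right sub-block (same width)
    L, _ = np.histogram(left, bins=bins, range=(0, 256))
    R, _ = np.histogram(right, bins=bins, range=(0, 256))
    num = 2.0 * L * R
    den = L.astype(np.float64) ** 2 + R.astype(np.float64) ** 2
    mask = den > 0
    return float((num[mask] / den[mask]).mean())

# A mirror-symmetric image yields SSC = 1.0; threshold at e.g. 0.75~0.85.
half = np.tile(np.arange(64, dtype=np.uint8), (64, 1))
face = np.concatenate([half, half[:, ::-1]], axis=1)
print(symmetric_similarity(face))  # -> 1.0
```

A perfectly mirrored face gives identical left/right histograms, so every non-empty bin contributes 2LR/(L²+R²) = 1; less symmetric faces score lower.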
C. Eye Template Verification

3) Morphological erosion on I_m:

I_n = I_m ⊖ Str2   (3)
Here Str2 is a 2×2 structuring element whose elements are all set to 1.

4) Calculate I_o = I_m / I_n in a pixel-wise operation, and scale the image I_o into a grayscale image I_g with the scale [0, 255].

Fig. 3 shows four original face images and the processed images using our proposed algorithm. The experiments demonstrated that the eye areas can be extracted regardless of ethnic group, the wearing of glasses, and illumination.

In our experiments, if the predefined threshold is set to 0.90, the eye templates can detect frontal faces with a maximum tilt of 15 degrees.

IV. FEATURE EXTRACTION FOR FACE RECOGNITION

The Y component in the (Y, U, V) color space of a frontal face image is used to extract three sets of features for face recognition, as follows:
I_x = I_z ∗ S_x   (5)

and the Y-gradients of the frontal face image are calculated by:

I_y = I_z ∗ S_y   (6)

The gradient magnitude and phase are then obtained by:

r(i, j) = √(I_x²(i, j) + I_y²(i, j)),  θ(i, j) = tan⁻¹(I_y(i, j) / I_x(i, j))   (7)

Then, we can count the gradient direction of each pixel of the convolved image with nonzero gradient magnitude as a directional feature. In order to generate a fixed number of features, each gradient direction is quantized into one of eight directions at π/4 intervals. Each normalized gradient image is then divided into 16 sub-images. The count in each direction of each sub-image is taken as a feature. In total, the number of directional features is 8 × 16 = 128.

The extracted three sets of hybrid features are used to train Artificial Neural Networks (ANNs) with the Back Propagation (BP) algorithm as three classifiers.

V. ENSEMBLE CLASSIFIER

A novel classifier combination scheme is proposed in order to achieve the lowest error rate while pursuing the highest recognition rate for video-based face recognition. The schematic diagram is shown in Fig. 6. The output confidence values of the three ANNs are weighted by w0,0~w0,R−1 for ANN1, w1,0~w1,R−1 for ANN2, and w2,0~w2,R−1 for ANN3 (note: R is the number of testers to be recognized in a video; w0,0~w0,R−1 refers to the weights of the confidence values c0,0~c0,R−1 of ANN1, and so on). A gating network is used to congregate the weighted confidence values.
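A minimal sketch of this weighted aggregation follows. The gating rule used here, an element-wise weighted sum of the three ANN confidence vectors, is an assumption for illustration, and the equal placeholder weights stand in for the GA-evolved weights the paper actually uses.

```python
import numpy as np

R = 4  # number of testers in the video (example value)

# Confidence vectors C_i of the three ANNs for one probe face: c_{i,r}.
C = np.array([[0.7, 0.1, 0.1, 0.1],
              [0.6, 0.2, 0.1, 0.1],
              [0.8, 0.1, 0.0, 0.1]])

# Weights W_i = [w_{i,0}, ..., w_{i,R-1}]; equal placeholders here,
# whereas the paper evolves them with a genetic algorithm.
W = np.full((3, R), 1.0 / 3.0)

# Assumed gating rule: g_r = sum_i w_{i,r} * c_{i,r}
G = (W * C).sum(axis=0)
print(G)
```

With equal weights, the gating output reduces to the per-tester mean confidence; the GA shifts the weights away from this baseline to favor the more reliable classifier per tester.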
where Wi = [wi,0, wi,1, ..., wi,R−1] and Ci = [ci,0, ci,1, ..., ci,R−1], i = 0, 1, 2, for the three ANNs, and

G = [g0, g1, ..., gR−1]^T

is the output of the gating network.

Our goal is to pursue the lowest misrecognition rate and, at the same time, seek the highest recognition performance. We can create a vector Otarget with R elements, where R is the number of persons in the test video. In the vector, the value at the corresponding label is set to 1.0, while the others are set to 0.0. A fitness function f is chosen to minimize the difference between the output G and the corresponding training sample vector Otarget, as follows:

f = |G − Otarget|²   (11)

By minimizing (11) through genetic evolution, the weights tend to be optimal. The recognition criteria are then set as follows. A recognition result is accepted if:

1) the three ANN classifiers vote for the same person at the same time, and the sum of the confidence values is equal to or larger than 1.8; or
2) the gating network votes for a person, and the confidence value of the gating network is larger than 0.65; or
3) the sum of the confidence values of any two ANNs is larger than 1.30, they both vote for the same person, and the gating network votes for the same person;

otherwise, the person is rejected.

For the training procedure, the frontal faces of each tester are shot 100 times in front of the camera with different illuminations, different distances between the tester and the camera, and different face complexions, in order to obtain enough face images to form feature vectors to train the ANN classifiers off-line.

During testing, the testers are moving in front of the camera. The frontal faces of the testers are detected using our proposed method. If the frontal face images of the same tester are shot multiple times in a video clip, then if and only if the overall verification confidence values of a newly detected frontal face image are higher than those of the previous ones is the detected frontal face image recognized by the classifiers again. Furthermore, if the recognition confidence value is higher than that of the previous recognition result for that tester, the recognition result is updated.

In our experiments, five surveillance video clips taken indoors are used to test the face detection and recognition performance. Some of the detected face images from the videos are shown in Fig. 7.

Fig. 7 Detected frontal face images from videos
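The acceptance criteria above can be sketched as follows. This is a sketch under one assumption: "votes for a person" is interpreted as the argmax label of each confidence vector.

```python
import numpy as np

def accept(c, g):
    """Apply the acceptance criteria 1)-3) to one probe.

    c: (3, R) confidence values of the three ANNs.
    g: (R,) output of the gating network.
    Returns the accepted label, or None for rejection.
    """
    votes = c.argmax(axis=1)      # per-ANN winning label
    gate_vote = int(g.argmax())   # gating network's winning label
    top = c.max(axis=1)           # per-ANN top confidence

    # 1) all three ANNs agree and their confidences sum to >= 1.8
    if len(set(votes)) == 1 and top.sum() >= 1.8:
        return int(votes[0])
    # 2) the gating network is confident enough on its own
    if g[gate_vote] > 0.65:
        return gate_vote
    # 3) two ANNs agree with summed confidence > 1.30,
    #    and the gating network votes the same way
    for i in range(3):
        for j in range(i + 1, 3):
            if (votes[i] == votes[j] == gate_vote
                    and top[i] + top[j] > 1.30):
                return gate_vote
    return None

c = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.7, 0.3]])
g = np.array([0.6, 0.4])
print(accept(c, g))  # all three ANNs agree and 0.9+0.8+0.7 >= 1.8 -> 0
```

The three rules are tried in order, so a unanimous high-confidence vote short-circuits the weaker pairwise rule.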
In Fig. 8, some skin images are detected; however, those images cannot pass the eye template module and/or the symmetry module.

In security applications, reliability is a crucial issue. The reliability is defined as:

RE = (Total number of testers − Number of misrecognized testers) / Total number of testers

The proposed ensemble scheme achieves, at the same time, the best tradeoff between the misrecognition rate and the rejection rate.
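The definition of RE translates directly into code; a minimal sketch:

```python
def reliability(total_testers, misrecognized):
    """RE = (total testers - misrecognized testers) / total testers."""
    if total_testers <= 0:
        raise ValueError("total_testers must be positive")
    return (total_testers - misrecognized) / total_testers

# E.g. 4 misrecognitions out of 100 testers gives RE = 0.96.
print(reliability(100, 4))  # -> 0.96
```

Note that rejected testers do not lower RE: only misrecognitions do, which is why the acceptance criteria trade rejections for reliability.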
The ensemble classifier improves the recognition performance in terms of recognition rate and reliability.

VII. CONCLUSIONS

In this paper, a novel face detection and recognition system is presented. Three fast and efficient face detection verification modules are proposed to detect and verify the frontal faces in video clips. Only the frontal faces, which have serially passed the three verification modules, are sent to the recognition engine for face recognition. Furthermore, the frontal face detection reliability can be adjusted through the setting of the verification thresholds. Therefore, this adaptable mechanism can be used in different applications. For example, we can set lower verification thresholds for face tracking purposes only. Experiments conducted on five video clips have demonstrated that the frontal face detection rate can reach as high as 95% in low quality videos.

Three hybrid feature sets are successfully applied to the face recognition system. A novel ensemble classification scheme is proposed to congregate the outputs of three ANN classifiers, which are trained by the three hybrid feature sets. The experiments show that the overall face recognition rate and reliability are increased when the ensemble classifier is used.

ACKNOWLEDGEMENTS

Part of the gating network research was conducted at the Centre for Pattern Recognition and Machine Intelligence (CENPARMI), Concordia University, Canada. The author wishes to thank the professors and colleagues of CENPARMI for their help.

REFERENCES

[1] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld, "Face recognition: a literature survey," ACM Computing Surveys, vol. 35, no. 4, 2003, pp. 399-458.
[2] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: recognition using class specific linear projection," IEEE Trans. Pattern Anal. and Machine Intell., vol. 19, no. 7, 1997, pp. 711-720.
[3] L. Wiskott, J. M. Fellous, N. Kruger, and C. von der Malsburg, "Face recognition by elastic bunch graph matching," IEEE Trans. Pattern Anal. and Machine Intell., vol. 19, no. 7, 1997, pp. 775-779.
[4] E. P. Vivek and N. Sudha, "Robust Hausdorff distance measure for face recognition," Pattern Recognit., vol. 40, no. 2, Feb. 2007, pp. 431-442.
[5] J. Ruiz-del-Solar and P. Navarrete, "Eigenspace-based face recognition: a comparative study of different approaches," IEEE Trans. Systems, Man, and Cybernetics, Part C, vol. 35, no. 3, Aug. 2005, pp. 315-325.
[6] D. Zhang, H. Peng, J. Zhou, and S. K. Pal, "A novel face recognition system using hybrid neural and dual eigenspaces methods," IEEE Trans. Systems, Man, and Cybernetics, Part A, vol. 32, no. 6, Nov. 2002, pp. 787-793.
[7] P. Shih and C. Liu, "Face detection using discriminating feature analysis and support vector machine," Pattern Recognit., vol. 39, no. 2, Feb. 2006, pp. 260-276.
[8] H. A. Rowley, S. Baluja, and T. Kanade, "Neural network-based face detection," IEEE Trans. Pattern Anal. and Machine Intell., vol. 20, no. 1, Jan. 1998, pp. 23-28.
[9] H. Wu, Q. Chen, and M. Yachida, "Face detection from color images using a fuzzy pattern matching method," IEEE Trans. Pattern Anal. and Machine Intell., vol. 21, no. 6, June 1999, pp. 557-563.
[10] C. Garcia and G. Tziritas, "Face detection using quantized skin color regions merging and wavelet packet analysis," IEEE Trans. Multimedia, vol. 1, no. 3, 1999, pp. 264-277.
[11] M. J. Jones and J. M. Rehg, "Statistical color models with application to skin detection," http://www.cc.gatech.edu/~rehg/Papers/SkinDetect-IJCV.pdf
[12] P. Kakumanu, S. Makrogiannis, and N. Bourbakis, "A survey of skin-color modeling and detection methods," Pattern Recognit., vol. 40, no. 3, March 2007, pp. 1106-1122.
[13] Z. Zheng, J. Yang, and L. Yang, "A robust method for eye features extraction on color image," Pattern Recognit. Lett., vol. 26, no. 14, 2005, pp. 2252-2261.
[14] J. Huang and H. Wechsler, "Eye detection using optimal wavelet packets and radial basis functions," International Journal of Pattern Recognit. and Artificial Intell., vol. 13, no. 7, Nov. 1999, pp. 1009-1025.
[15] M. Lievin and F. Luthon, "Nonlinear color space and spatiotemporal MRF for hierarchical segmentation of face features in video," IEEE Trans. Image Processing, vol. 13, no. 1, Jan. 2004, pp. 63-71.
[16] H. Wang and S. F. Chang, "A highly efficient system for automatic face region detection in MPEG video," IEEE Trans. Circuits and Systems for Video Technology, vol. 7, no. 4, 1997, pp. 615-628.
[17] W. K. Pratt, Digital Image Processing, New York: Wiley, 1991.
[18] P. Zhang, T. D. Bui, and C. Y. Suen, "A cascade ensemble classifier system for reliable recognition of handwritten digits," Pattern Recognit., vol. 40, no. 12, Dec. 2007, pp. 3415-3429.
[19] P. Zhang, T. D. Bui, and C. Y. Suen, "Nonlinear feature dimensionality reduction for handwritten numeral verification," Pattern Analysis and Applications, vol. 7, no. 3, 2004, pp. 296-307.