
Digital Signal Processing 106 (2020) 102809


Face recognition: Past, present and future (a review) ✩


Murat Taskiran a,*, Nihan Kahraman a, Cigdem Eroglu Erdem b

a Yildiz Technical University, 34220, Istanbul, Turkey
b Marmara University, 34722, Istanbul, Turkey

✩ This research was supported by The Scientific and Technological Research Council of Turkey (TUBITAK) under project 116E088.
* Corresponding author. E-mail addresses: mrttskrn@yildiz.edu.tr, mrttskrn1071@gmail.com (M. Taskiran), nicoskun@yildiz.edu.tr (N. Kahraman), cigdem.erdem@marmara.edu.tr (C.E. Erdem).
https://doi.org/10.1016/j.dsp.2020.102809

ARTICLE INFO

Article history: Available online 16 July 2020

Keywords: Face recognition; Face identification; Facial dynamics; Image-based face recognition; Video-based face recognition

ABSTRACT

Biometric systems have the goal of measuring and analyzing the unique physical or behavioral characteristics of an individual. The main feature of biometric systems is the use of bodily structures with distinctive characteristics. In the literature, there are biometric systems that use physiological features (fingerprint, iris, palm print, face, etc.) as well as systems that use behavioral characteristics (signature, walking, speech patterns, facial dynamics, etc.). Recently, facial biometrics has been one of the most preferred biometric modalities, since it generally does not require the cooperation of the user and can be obtained without violating the personal private space. In this paper, the methods used to obtain and classify facial biometric data in the literature are summarized. We give a taxonomy of image-based and video-based face recognition methods, and outline the major historical developments and the main processing steps. Popular data sets that have been used for face recognition by researchers are also reviewed. We also cover the recent deep-learning based methods for face recognition and point out possible directions for future research.

© 2020 Elsevier Inc. All rights reserved.

1. Introduction

The increase of human factors in new-generation technologies gives rise to the need for biometric systems for person identification and verification. There are biometric systems that use static physiological features, such as the fingerprint, iris, and palmprint, as well as systems that use behavioral characteristics, such as the signature, gait, speech patterns and facial dynamics, some of which are also known as soft biometrics [68].

The face has been one of the main biometric traits and has many application areas, including security and law enforcement, health, education, marketing, finance, entertainment, and human-computer interaction. In Table 1, the main application areas and specific applications related to these areas are listed.

The human face carries information about identity, age [80], gender [67], race, and facial expressions reflecting emotions and mental states [92,395,227,394]. The analysis of the human face and facial behavior is an interdisciplinary research area involving psychology, neuroscience, and engineering.

Face recognition, in contrast to several other biometric traits, does not necessarily require the cooperation of the person and can be performed in an unobtrusive way, making it particularly suitable for surveillance applications. Moreover, face recognition can be based on both physical (static) features and dynamic features of the face, making it suitable for behavioral biometrics.

Face recognition in unconstrained environments is a challenging problem due to variations related to head pose, illumination, age, and facial expression. There may also be changes in appearance due to make-up, facial hair or accessories (e.g., glasses, scarves). Another difficulty in face recognition is the similarity among individuals (e.g., relatives, twins) [158].

Perception of faces is a task performed successfully and almost effortlessly by humans, but it is not an easy task for computers. The human visual system accommodates complex neural paths for processing the static and dynamic features of faces and recognizes faces in relation to contextual knowledge [334]. There are many studies in psychology and neuroscience that address different issues in face perception. For example, it has been shown that both holistic and feature-based representations are used, although characteristic features may be dominant [35]. The hypothesis suggested by Bruce and Young is that there are several independent sub-processes working together for face perception [36]. According to this hypothesis, various properties such as age, gender, and basic facial expressions are obtained from the simple physical features of the face as a result of independent processes, and the use of these properties enables the creation of a personal face model structure. By using this personal face model structure, face perception is achieved by the brain even under different conditions. There are also studies which try to understand which features (eyes, mouth, nose, etc.) are more important in the recognition of faces [304]. In [140], it was shown that lighting from the top is important for the recognition of faces. There are also studies which demonstrate that familiar faces are recognized more easily if they are shown in motion, even under challenging conditions such as negation, inversion or thresholding [182,35].

Table 1
Application areas of face recognition methods.

Application area: Security, Health
Specific applications: Information security, user authentication, login to electronic devices; human-robot interaction, access to buildings; recognizing faces from surveillance videos; health-related applications (smart homes, smart cars); robotic assistants.

Application area: Law Enforcement
Specific applications: Border monitoring, illegal event analysis, tracking suspects; passports, national ID cards, driver's licences; immigration.

Application area: Entertainment, Education, Marketing
Specific applications: Gaming, virtual reality; photo management; video analytics, video retrieval; online learning, student follow-up, user engagement; advertising campaigns, social network moderation.
1.1. Brief history and previous surveys

The history of face recognition goes back to the 1950s and 1960s, but research on automatic face recognition is considered to have been initiated in the 1970s [409]. In the early works, features based on distances between important regions of the face were used [164]. Research on face recognition has flourished since the beginning of the 1990s, following the developments in hardware and the increasing importance of security-related applications.

The progress of image-based face recognition techniques since the beginning of the 1990s has been roughly divided into four major conceptual development phases by Wang and Deng [346]; this is not a full taxonomy, but it reflects the historical development of the major methods. i) Holistic or appearance-based approaches use the face region as a whole and use linear or non-linear methods to map the face to a lower dimensional subspace [27,363]. One of the first successful methods was developed by Turk and Pentland [333,332] and is known as Eigenfaces. There have been other approaches that use linear subspaces [77], manifold learning [378,139] and sparse representations [73,75]. ii) Local-feature based face recognition methods became popular after the 2000s; they use hand-crafted features to describe the face, such as Gabor features [203,373] and local binary patterns (LBP) and their variants [237,6,76]. iii) Methods which use learning-based local descriptors [41,190] emerged after the 2010s; they learn discriminant image filters using shallow techniques. iv) Deep learning based methods gained popularity after the great success of AlexNet in the ImageNet competition in 2012 [184] and brought a new perspective to the face recognition problem. An unprecedented level of performance has been achieved, so that face recognition systems now perform similarly to humans on large-scale datasets collected under unconstrained settings [320,262]. A full taxonomy of image and video-based face recognition methods in the literature is given in the following sections of the paper.

There are also a number of survey papers summarizing the work done on face recognition. The first survey papers were published in the 1990s [293,49]. Later, other survey papers were published [409,3,157,135,223], some of which focus on a specific aspect or method, including:

• pose [404,83] or illumination [418] invariant face recognition methods,
• dynamic face recognition from video [326,342,234,124,26],
• multimodal face recognition using 3D and infrared modalities [297,33,414],
• presentation attack detection (face anti-spoofing) methods [276,298],
• sparsity-based face recognition methods [372],
• deep learning based face recognition methods [21,231,279,346,119].

We can see that recent survey papers mainly focus on image-based deep learning methods. Although video is a very rich source of facial texture and dynamics, and it has become easier to record and share videos, there is no recent survey paper focusing on video-based face recognition.
1.2. Contributions and outline

The major contributions of this review can be summarized as follows:

• We give an up-to-date, comprehensive and compact overview of the vast amount of work on image and video based face recognition in the literature, including the image and video databases and evaluation methods. Approximately 300 papers, published between the 1990s and the beginning of 2020, have been reviewed. Our goal is to inform interested new researchers about the main developments in the past and to point out relevant references for further details.
• We provide a taxonomy of image and video-based methods, which also contains recent methods such as sparsity and deep learning based methods. The purpose of creating a taxonomy is to provide an overview of the methods in the literature for face recognition.
• We give an up-to-date review of the image and video-based data sets used for face recognition. We not only tabulate these data sets, but also give a timeline to show how the collected data sets evolved in time in terms of the number of subjects and the number of samples per subject.
• We review the recent deep-learning based methods, which have shown remarkable results on large-scale and unconstrained challenging data sets. In this way, readers are provided with detailed information about the deep-learning based methods that have brought a new perspective to face recognition since the beginning of the 2010s.
• We provide information on both image and video-based methods, with an emphasis on the video-based methods. We believe video-based face recognition has not yet reached its full potential in terms of utilizing facial dynamics information.

The organization of the paper is as follows. In Section 2, we give an overview of the main concepts related to face recognition, including the taxonomy, main steps, databases, evaluation metrics, and face spoofing. In Section 3 and Section 4, we summarize image and video-based methods, respectively. Finally, in Section 5, we provide the main conclusions and directions for future research.

2. Overview of face recognition

Face recognition can be approached as an identification problem or a verification problem. Face identification is also referred to as the 1:N matching problem: the unknown face is compared with all the faces in the database of known identities and a decision is made as a result of all the comparisons. If the person is known to be in the database, the task is called closed-set; otherwise, it is called open-set. Face verification is known as the 1:1 matching problem: the identity of the query face is either confirmed or rejected by comparing it with the face data of the claimed identity in the database.
Below, we provide an overview of the face recognition systems
in the literature focusing on the general taxonomy, main steps,
image and video databases, and evaluation metrics used for face
recognition.

2.1. Taxonomy of face recognition

Face recognition systems in the literature can be divided into two main groups: image-based and video-based methods. Image-based systems try to recognize a person by using the physical appearance. Video-based systems, on the other hand, use physical appearance as well as changes in appearance over time, i.e., the dynamics of the face. The general taxonomy of the literature on face recognition is shown in Fig. 1.
Image-based face recognition (FR) methods can be classified into
three main groups: i) appearance-based (or holistic) methods, ii)
model-based methods and iii) texture (local appearance) based
methods [26,158].
Video-based face recognition methods can be classified into two
main groups: i) Set-based methods and ii) Sequence-based meth-
ods. Set-based methods treat the frames of a video sequence as a
collection of images, without paying attention to the temporal or-
der of the frames. On the other hand, sequence-based methods use
the frames by keeping their temporal order. Hence, the dynamics of the face over time also play a role in the recognition of the person.
It is very difficult to give a clear-cut taxonomy of all the work
on face recognition in the literature. Hence, the proposed taxon-
omy in Fig. 1 is a coarse grouping of the methods in the literature
and the algorithms in some groups may have overlapping proper-
ties.

2.2. Main steps of face recognition

Face recognition systems traditionally consist of six main stages (see Fig. 2):

i) The input image or video of the face is acquired.
ii) A face anti-spoofing module ensures the security of the system by employing presentation or adversarial attack detection (e.g., via liveness tests).
iii) The face and/or facial landmarks are detected in the image or in each video frame.
iv) Pre-processing is performed on the image or video, which may consist of alignment, video frame selection, noise reduction, contrast enhancement or similar operations.
v) Facial features are extracted from the image or video. Image-based methods use holistic, model-based or texture-based feature extraction approaches, whereas video-based methods use set-based or sequence-based approaches.
vi) Face identification or verification is performed.

Fig. 1. The taxonomy of image-based and video-based face recognition systems in the literature.
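As a rough orchestration of these six stages, consider the following minimal sketch. Every helper function in it (is_live, detect_face_and_landmarks, align_face, extract_features, similarity) is a hypothetical placeholder for the corresponding stage, not a real library API; the image argument stands for stage i).

```python
# Hypothetical pipeline sketch of stages i)-vi); none of these helpers
# refer to a real library API.

def identify(image, gallery, threshold=0.5):
    if not is_live(image):                              # ii) anti-spoofing check
        return None
    box, landmarks = detect_face_and_landmarks(image)   # iii) detection
    face = align_face(image, box, landmarks)            # iv) pre-processing
    feature = extract_features(face)                    # v) feature extraction
    # vi) closed-set identification: best-matching gallery identity
    scores = {pid: similarity(feature, feat) for pid, feat in gallery.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None  # open-set rejection
```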
Below, we give a brief review of the face detection and facial landmarking methods in the literature. Accurate and effective face detection and facial landmarking algorithms increase the accuracy of face recognition systems.

2.2.1. Face detection

Face detection is the estimation of the bounding box of the face in a given image or in the frames of a video. If there are multiple faces in the image, all of them are detected. Face detection should be robust to pose, illumination and scale differences and should eliminate the background as much as possible [279].

The Viola-Jones face detector [338] is a widely used face detector, which works well for frontal faces; it is based on Haar-like features and works in real time. Other approaches have also been proposed, which use color information as well [56,322,93]. Recently, deep learning based face detectors have provided successful results [279,196,146]. In a recent method, Faster R-CNN, which uses the region-proposal approach and was initially proposed for object detection, has been used for face detection [162,285]. There are also other deep learning based face detection methods, which use the sliding-window idea [97,382,193]. The single shot detector (SSD), which was first proposed for object detection [204], has also been successfully used for face detection [383,401].
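As an illustration, the Viola-Jones detector is available in OpenCV as a Haar-cascade classifier. The sketch below uses the cascade file that ships with OpenCV; the image path and the detectMultiScale parameter values are illustrative assumptions.

```python
import cv2

# Viola-Jones (Haar-cascade) face detection with OpenCV.
img = cv2.imread("face.jpg")                       # hypothetical input path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                 minSize=(30, 30))  # illustrative parameters
for (x, y, w, h) in faces:                          # one box per detected face
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```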
2.2.2. Facial landmarking

After the face is detected, facial landmarks (the corners of the eyes, eyebrows and mouth, the tip of the nose, etc.) can be estimated and used for face alignment. Aligning the face to a canonical position has been shown to be beneficial for face recognition [22]. Examples of facial landmarks are shown with yellow points in Fig. 2; they have been estimated using the ensemble of regression trees approach [169].

In the beginning of the 2010s, different methods were proposed to perform face alignment, and these methods were shown to achieve high performance [371,397,284]. There are survey papers that summarize the studies on facial landmarking [42,283,163,295,60,348,366,31]. Wu and Ji have classified facial landmark detection methods into three categories: holistic methods, Constrained Local Model (CLM) methods, and regression-based methods [366]. Another possible categorization is to group them as generative methods and discriminative methods [163].

In order to evaluate landmark localization performance, two different metrics can be used: ground-truth-based localization error and task-oriented performance [283]. Due to the most recent advances in deep learning, the performance of facial landmark extraction methods has been substantially improved, even on in-the-wild datasets [375,365,31]. There are methods developed for multi-task learning, which combine face detection and landmark localization with other tasks such as pose estimation and gender recognition [398,280,278]. Recently, single face tracking on mobile devices using deep learning has also been investigated [199].
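As a concrete example, the ensemble-of-regression-trees estimator of [169] is implemented in the dlib library. The sketch below is a minimal illustration; the image path is hypothetical, and the 68-point model file must be downloaded separately.

```python
import dlib

# Ensemble-of-regression-trees facial landmarking [169] via dlib.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

img = dlib.load_rgb_image("face.jpg")              # hypothetical input path
for box in detector(img, 1):                       # upsample once for small faces
    shape = predictor(img, box)
    points = [(shape.part(i).x, shape.part(i).y)
              for i in range(shape.num_parts)]
    # 'points' now holds 68 (x, y) landmarks usable for face alignment
```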
2.3. Databases

Early works on face recognition were based on rather small-scale databases, recorded in laboratories under controlled conditions. One of the first image-based databases, ORL [294], contained 400 images from 40 subjects. Similarly, one of the first video-based face databases, released in 1997 [30], consisted of 70 videos from 40 subjects. In recent years, face recognition databases have become large-scale, with millions of images recorded under uncontrolled conditions, or tens of thousands of videos.

The face databases used for face recognition can be grouped as image-based and video-based face databases. We summarize the main image-based and video-based face recognition databases in the literature in Tables 2, 3, 4 and 5, respectively. A graphical temporal representation of image-based, video and 3D face recognition databases is also provided in Fig. 3.

Fig. 2. The main steps of face recognition systems. Images are taken from the UvA-NEMO database [81,82]. (For interpretation of the colors in the figure(s), the reader is referred to the web version of this article.)

2.4. Evaluation metrics

With the rapid increase in the usage of face recognition systems in our daily life, the performance of these systems has become a critical issue. In order to measure the performance of biometric systems, a number of evaluation metrics have been suggested by researchers.

Face recognition can be performed using an identification or a verification (authentication) approach. The evaluation metrics and charts commonly used for face verification are [88,271]:

• False Match Rate (FMR), also known as the False Accept Rate (FAR): the percentage of impostor (intruder) samples that are incorrectly recognized as the claimed identity.
• False Non-Match Rate (FNMR), also known as the False Reject Rate (FRR): the percentage of genuine samples that are incorrectly rejected.
• Accuracy: the percentage of samples that are correctly classified.
• Genuine Accept Rate (GAR), also known as the True Acceptance Rate (TAR): the percentage of genuine samples that are correctly accepted (i.e., TAR = 1 - FNMR).
• Detection Rate: the percentage of intruders (not the samples) that are correctly detected.
• Equal Error Rate (EER): the error rate at which the FMR and FNMR are equal.
• Receiver Operating Characteristic (ROC) curve: the plot of FRR versus FAR obtained at different decision-threshold values. The ROC curve can also be obtained by plotting TAR versus FAR. The Area Under the ROC Curve (AUROC) is a metric that summarizes the system's performance; it takes values between 0.5 (random selection) and 1 (perfect classification).

Fig. 3. The databases used for face recognition. 2D databases are shown with blue color, video databases are shown with yellow color, 3D databases are shown with green
color and hyper-spectral/infra-red databases are shown with red color.

Table 2
Image-based databases for face recognition published between 1994-2002. Abbreviations should be interpreted as follows: V (various), N (No), Y (Yes), FE (number of
different facial expressions), IL (illuminations), PO (head poses), OC (occlusions, e.g. hand, hair, eyeglasses, beard...), TI (recording times), AC (accessories), BG (backgrounds),
ET (ethnicities). Depth means the number of images for each subject is high, and breadth means the number of subjects is high with as many images as possible for each
subject.

Database | Year | Img. size, color | # subjects, # images | Properties
ORL [294] 1994 92 × 110, N 40, 400 All pictures are frontal.
HRL [130] 1995 193 × 254, N 10, ∼ 800 IL: 77-84.
Color FERET [268] 1996 256 × 384, Y 1199, 14.051 FE: 2, IL: 2, PO: 9-20, TI: 2.
Yale [107] 1997 320 × 243, N 15, 165 With and without glasses, IL, FE
JAFFE [219] 1998 256 × 256, N 10, 213 FE: 7
UMIST [114] 1998 220 × 220, N 20, 564 Various angles from left profile to right profile.
AR [226] 1998 576 × 768, Y 116, 3.288 FE: 4, IL: 4, OC: 2, TI: 2.
Georgia Tech [250] 1999 640 × 480, Y 50, ∼ 1.500 Frontal and/or tilted faces with FE, IL.
Univ. of Oulu [312] 1999 428 × 569, Y 125, 2.000 All are frontal images with 16 IL.
CMU PIE [306] 2000 640 × 486, Y 68, 41.368 PO: 13, IL: 43, FE: 3.
Human Scan [161] 2001 384 × 286, N 23, ∼ 1.500 Mainly frontal views.
Equinox Inf. [309] 2001 240 × 320, N 91, 25.000 Camera spectral range: 8-12 micrometer visible, IL: 3, FE: 3.
Yale B [108] 2001 640 × 480, N 10, 5.760 PO: 9, IL: 64.
Notre Dame HumanID [265] 2002 1600 × 1200, Y ≥ 300, ≥ 15.000 IL: 3, FE: 2, TI: 10-13.
Indian Face [159] 2002 640 × 480, N 40, 440 Images taken with a bright homogeneous background and
subjects in an upright, frontal position.

Table 3
Image-based databases for face recognition published between 2003-2009. Abbreviations should be interpreted as follows: V (various), N (No), Y (Yes), FE (number of
different facial expressions), IL (illuminations), PO (head poses), OC (occlusions, e.g. hand, hair, eyeglasses, beard...), TI (recording times), AC (accessories), BG (backgrounds),
ET (ethnicities). Depth means the number of images for each subject is high and breadth means the number of subjects is high with as many images as possible for each
subject.

Database | Year | Img. size, color | # subjects, # images | Properties
Korean Face (KFDB) [32] 2003 640 × 480, Y 1000, 52.000 FE: 5, IL: 16, PO: 7.
CVL Face [310] 2003 640 × 480, Y 114, ∼ 800 Profile left/right, 45 degrees left/right, frontal, frontal smile,
frontal smile with teeth.
CAS-PEAL [104] 2003 360 × 480, Y 2.747, 30.900 FE: 6, IL: 9-15, PO: 21, AC: 6, BG: 2-4, TI:2.
FRGC(3D) [266] 2004 1704 × 2272 or 466, 50.000 Images taken under controlled and uncontrolled conditions.
1200 × 1600, Y
FEI Face [325] 2006 640 × 480, Y 200, 2.800 Images are taken with a white homogeneous background in
an upright frontal position or profile rotation up to 180
degrees.
BU-3DFE (Static) [389] 2006 V, Y 100, 2.500 FE:7, Ages: 18-70 years, ET: 6.
MORPH [286] 2006 V, Y 13.618, 55.134 Gender: 81% Male, 19% Female; Ages: 16-77 years; ET: 4.
LFW [150] 2007 150 × 150, Y 5.749, 13.233 Un-posed photos, mainly frontal views.
CASIA 3D [413] 2007 V, Y 123, 4.624 FE: 6
MUCT [243] 2008 480 × 640, Y 276, ∼ 3.500 Frontal and three-quarter views. IL, manual landmarks.
Bosphorus [296] 2008 V, Y 105, 4.652 FE: 34, PO: 13, OC: 4.
CMU Muti-PIE [116] 2008 V, Y 337, ≥ 750.000 PO:15, IL:19, Some high resolution frontal images.
CUFS [353] 2009 V, Y 606, ∼ 1.200 One frontal image and one sketch for each subject.
CASIA HFB [194] 2009 V, Y 100, 800 4 VIS and 4 NIR face images per subject.
CUFSF [354] 2009 V, N 1.194, ∼ 2.400 LI, Sketch with exaggeration drawn by an artist.

Table 4
Image-based databases for face recognition published between 2010-2018. Abbreviations should be interpreted as follows: V (various), N (No), Y (Yes), FE (number of
different facial expressions), IL (illuminations), PO (head poses), OC (occlusions, e.g. hand, hair, eyeglasses, beard...), TI (recording times), AC (accessories), BG (backgrounds),
ET (ethnicities). Depth means the number of images for each subject is high, and breadth means the number of subjects is high with as many images as possible for each
subject.

Database | Year | Img. size, color | # subjects, # images | Properties
CASIA NIR-VIR 2.0 [195] 2010 640 × 480, Y 725, ∼ 36.000 1-22 VIS and 5-50 NIR face images per subject.
FG-NET [52] 2011 V, Y 82, 1.002 Ages: 0-69 years; 58% Male, 42% Female.
10K US Adult Face [20] 2013 72 × 256, Y 2.222, 10.168 Memorability scores, computer vision and psychology
attributes, and landmark point annotations.
CASIA WebFace [87] 2014 V, Y 10.575, 494.414 Celebrities.
Cross-Age Celebrity [50] 2014 V, Y 2.000, ≥ 160.000 Celebrities with ages 16-62.
CelebFaces+ [315] 2014 V, Y 10.177, 202.599 Celebrities, Private.
Facebook [320] 2014 V, Y 4.400, 4.4M Private database.
Google [299] 2015 V, Y ≥ 10M , ≥ 500M Private database.
VGG Face [263] 2015 V, Y 2.622, 2.6M Celebrities, face annotations with bounding boxes and pose.
MS-Celeb-1M(Ch1) [121] 2016 V, Y 100.000, 10M Breadth; celebrities; knowledge base.
MS-Celeb-1M(Ch2) [121] 2016 V, Y 20.000, 1.5M Breadth; celebrities; knowledge base.
Mega Face [249] 2016 V, Y 672.052, 4.7M Breadth; the whole long tail; commonality.
CFPW [300] 2016 V, Y 500, 7.000 Frontal-profile images of celebrities.
VGG Face2 [40] 2017 V, Y 9.131, 3.31M Depth; PO, age, ET; celebrities.
MS-Celeb-1M(Ch3) [121] 2018 V, Y 180.000, 6.8M Breadth; celebrity.
CPLFW [410] 2018 250 × 250, Y 3.968,11.652 PO, Celebrities.

Table 5
Video-based databases for face recognition published between 1997-2018. Abbreviations: V (various), N (No), Y (Yes).
Database | Year | Frame size, color | # subjects, # videos | Properties
Univ. of Maryland [30] 1997 560 × 240, N 40, 70 6 different facial expressions
XM2VTS Video [240] 1999 576 × 720, Y 295, 1.180 Frontal, profile, speech.
CK [165] 2000 640 × 480, N 97, 486 Each sequence begins with a neutral expression and ends
with a peak expression, FE: 6.
CMU Motion of Body (MoBo) [117] 2001 V, Y 25, 100 All subjects are captured using six high resolution color
cameras distributed evenly around the treadmill.
Honda/UCSD [189] 2002 640 × 480, Y 20, 1.500 Indoor environment, 15 fps, video length > 15 seconds.
Texas Univ. [256] 2003 720 × 480, N 284, ∼ 2.500 Four different categories: still facial mug shots, dynamic
facial mug shots, dynamic facial speech, and dynamic facial
expression.
Max Planck [34] 2003 786 × 576, Y 246 Facial Action Units recorded from six different viewpoints.
Banca Multi-Modal [19] 2004 720 × 576, Y 208, ∼ 2.500 Four languages; each subject was recorded during 12
different sessions over a period of 3 months.
YouTube Celeb. [174] 2008 V, Y 47, 1910 All videos are encoded in MPEG4 at 25 fps rate.
BU-4DFE [388] 2008 1040 × 1329, Y 101, 606 3D data at 25 fps; 6 different facial expressions.
MOBIO [236] 2009 V, Y 152, 1.824 Collected between 2008-2010 at six different sites from five
countries; 12 sessions for each subject.
YouTube Face [361] 2011 V, Y 1.595, 3.425 Videos downloaded from YouTube. An average of 2.15
videos for each subject.
PaSC Video [29] 2013 V, Y 265, 2.802 Balanced with respect to distance to the camera, alternative
sensors, frontal versus not-frontal views, and different
locations.
IJB-A [180] 2015 V, Y 500, 2.085 Manually localized face images.
IJB-B [359] 2017 V, Y 1.845, 7.011 Manually localized face images
UMDFaces [22] 2017 V, Y 3.107, 22.075 Breadth; video.
IJB-C [235] 2018 V, Y 3.531, 11.779 This dataset has 10.000 non-face images.

The evaluation metrics and charts commonly used for face identification are: i) the rank-1 accuracy and ii) the Cumulative Match Characteristic (CMC) curve, which is the plot of the identification rate at rank k (the correct identity is among the top-k results). The aforementioned evaluation metrics and charts are used for benchmarking and comparison purposes in face recognition challenges [267,239,266] and protocols [287,235].
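To make these definitions concrete, the sketch below computes the verification metrics (FMR, FNMR and an approximate EER) from genuine and impostor similarity scores, and the identification rates for the CMC curve from a probe-by-gallery score matrix. It is a minimal illustration assuming that higher scores mean more similar faces and one gallery entry per identity; the function names are ours, not from a specific library.

```python
import numpy as np

def verification_metrics(genuine, impostor, thresholds):
    """FMR and FNMR at each threshold, plus an approximate EER."""
    genuine, impostor = np.asarray(genuine), np.asarray(impostor)
    fmr = np.array([(impostor >= t).mean() for t in thresholds])  # impostors accepted
    fnmr = np.array([(genuine < t).mean() for t in thresholds])   # genuine rejected
    i = np.argmin(np.abs(fmr - fnmr))                             # closest crossing
    return fmr, fnmr, (fmr[i] + fnmr[i]) / 2.0                    # EER estimate

def cmc_curve(scores, probe_ids, gallery_ids):
    """Identification rate at each rank k; scores is (n_probes, n_gallery)."""
    gallery_ids = np.asarray(gallery_ids)
    ranks = []
    for row, true_id in zip(np.asarray(scores), probe_ids):
        order = np.argsort(row)[::-1]                     # best gallery match first
        ranks.append(int(np.where(gallery_ids[order] == true_id)[0][0]) + 1)
    ranks = np.asarray(ranks)
    return np.array([(ranks <= k).mean() for k in range(1, len(gallery_ids) + 1)])
```

Here the rank-1 accuracy is simply cmc_curve(...)[0], and the TAR at a given threshold is 1 - FNMR.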
2.5. Face anti-spoofing

Although the face is an easy-to-use biometric trait, a major problem is its vulnerability to spoofing attacks performed with photos, videos or 3D masks [94,276,301]. Spoofing attacks are most common during the recording of biometric data, feature extraction, or the decision phase. There are also other types of attacks on the network or the database where biometric data is stored [17].

Anti-spoofing in face recognition usually means liveness detection, or presentation attack detection, which can be done by sensing physiological movements such as eye blinking [362,9,308,188,62,191], facial expression changes, mouth movements [183,59,136,166], or head movements [187]. Detecting the heart rate from a face video is another method for liveness detection [196,254]. This technique is called remote (non-contact) photoplethysmography, and it utilizes the subtle color changes of the skin that occur each time the heart beats and pumps blood to the body [70,57].
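As a deliberately simplified sketch of this idea (practical rPPG pipelines add detrending, band-pass filtering and chrominance-based signal combinations, which we omit here), a pulse rate can be estimated from the mean green-channel intensity of the face region over time. The function below is our own illustration, assuming RGB face crops and a known frame rate.

```python
import numpy as np

def estimate_pulse_bpm(face_frames, fps):
    """Very simplified rPPG: mean green-channel intensity per frame,
    then the dominant frequency in a plausible pulse band (0.7-4 Hz)."""
    trace = np.array([frame[..., 1].mean() for frame in face_frames])
    trace = trace - trace.mean()                       # remove the DC component
    freqs = np.fft.rfftfreq(trace.size, d=1.0 / fps)
    power = np.abs(np.fft.rfft(trace)) ** 2
    band = (freqs >= 0.7) & (freqs <= 4.0)             # 42-240 beats per minute
    return 60.0 * freqs[band][np.argmax(power[band])]  # peak frequency in bpm
```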
Other countermeasures can include different biometric modalities such as gait and speech. Indeed, multi-modal systems [308] are intrinsically more difficult to spoof than uni-modal systems [172]. More information about overcoming 2D photo spoofing attacks can be found in [33,260,61,253,123,99].

Deep CNN based methods have recently become popular for face anti-spoofing [192,301]. In [248], the performances of different CNN architectures for face anti-spoofing were investigated. In [212], a deep tree learning (DTL) method was proposed for zero-shot face anti-spoofing (ZSFA), i.e., the detection of spoof attacks that do not exist in the training data, such as partial paper or transparent mask attacks. A spatio-temporal anti-spoofing network (STASN) was proposed in [386], which can focus on subtle cues such as borders and moiré patterns to detect spoof faces; the authors also presented data collection and synthesis solutions. A multi-modal face anti-spoofing challenge was also organized recently [400] using the CASIA-SURF multi-modal dataset. This study summarized the results of the most successful teams and gave important information about future research directions [201].
3. Image-based face recognition

Image-based face recognition approaches mainly involve recognition using features from a single frame. Image-based FR methods can be grouped as appearance-based (or holistic), model-based, and texture (or local feature) based methods. More details about these approaches are given below.

3.1. Appearance-based (holistic) face recognition

The expression 'appearance-based' was introduced by Murase and Nayar [247]. Appearance-based methods use the detected face region as a whole and try to represent it in a lower-dimensional subspace. The lower-dimensional subspace is learned from the training set using linear or non-linear methods.

3.1.1. Linear methods

One of the most well-known appearance-based face recognition methods is Eigenfaces [333], which uses principal component analysis (PCA) to linearly project the images onto a lower dimensional space learned from the images in the training set. The test image to be recognized is first projected onto this lower dimensional space, and the identity is then determined by comparing it with the projections of the gallery images in the training set. In the study conducted by Yang et al. [377], a two-dimensional principal component analysis (2DPCA) method was proposed. The difference between this method and standard principal component analysis is that the image matrix is not transformed into a one-dimensional vector prior to feature extraction; instead, the image covariance matrix is generated directly from the original image matrix. Experimental studies conducted on three datasets showed that the recognition rates obtained with 2DPCA were better than those of PCA.
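To make the Eigenfaces idea concrete, the sketch below learns a PCA subspace from flattened gray-level face images and identifies test faces with a nearest-neighbor rule in that subspace. It is a minimal illustration using scikit-learn and its bundled LFW loader, not the original implementation of [333]; the number of components and other settings are arbitrary.

```python
from sklearn.datasets import fetch_lfw_people
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Eigenfaces sketch on the LFW subset bundled with scikit-learn
# (downloaded on first use); illustrative hyperparameters.
lfw = fetch_lfw_people(min_faces_per_person=20)
X_tr, X_te, y_tr, y_te = train_test_split(lfw.data, lfw.target, random_state=0)

pca = PCA(n_components=100, whiten=True).fit(X_tr)   # learn the eigenface subspace
clf = KNeighborsClassifier(n_neighbors=1)            # nearest gallery projection
clf.fit(pca.transform(X_tr), y_tr)
print("rank-1 accuracy:", clf.score(pca.transform(X_te), y_te))
```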
In another linear projection method [27], Fisher's linear discriminant was used with the goal of generating well-separated classes in the lower dimensional space. The method is known as Fisherfaces and was shown to be better than the Eigenfaces method on the Harvard and Yale face databases. The most common problem in traditional linear discriminant analysis (LDA) arises from the small sample size (SSS) of datasets. In order to overcome this problem, Wang and Tang proposed a dual-space linear discriminant analysis approach [352]. It was observed that the method yielded more successful results with a smaller number of features. Another approach, which uses a regularized Fisher's discriminant criterion, was proposed by Lu et al. [215]. Experimental studies using the FERET database showed that the proposed method yields more successful results in face recognition than the Eigenfaces method and the LDA-based variations proposed to solve the SSS problem.

Zhao et al. [407] proposed the method of singular value decomposition updating based on incremental principal component analysis (SVDU-IPCA) for face recognition. The IPCA algorithm had been proposed for face recognition due to the computational cost and memory-requirement problems encountered in the PCA algorithm; since the IPCA algorithm used in the literature does not give a guarantee about the error rate, the authors used SVDU-based IPCA. Zhang et al. [396] proposed a new method of obtaining optimal projective vectors from diagonal face images without an image-to-vector transformation, and called the method Diagonal PCA (DiaPCA). In experimental studies, it was observed that DiaPCA yielded more successful results than 2DPCA and PCA for face recognition.

Independent Component Analysis (ICA) performs a linear transformation such that the statistical independence of the components is guaranteed. In the study by Liu and Wechsler [202], the usefulness of the ICA algorithm for face recognition was investigated. An evaluation was made of the sensitivity of ICA to the dimensionality of the space it is applied in, and its discriminant performance was compared to other criteria such as Bayes or Fisher. Discriminant analysis showed that when the ICA criteria were applied in a properly compacted and whitened space, ICA performed better than the Eigenfaces and Fisherfaces methods for face recognition. In 2003, Liu and Wechsler introduced the independent Gabor features (IGF) method and its applications in face recognition [203]. The innovations that this study brought to the literature are the derivation of Gabor features in the feature extraction stage and the formation of an IGF-based probabilistic reasoning model (PRM) classification method. As a result, 180 features were obtained using the IGF method for the Face Recognition Technology (FERET) face database [268] and 88 features for the ORL (Olivetti Research Laboratory) database [294]. Experimental studies on these two datasets gave face recognition accuracies of 98.5% and 100%, respectively. Deniz et al. [79] proposed a method which uses ICA as a feature extractor and an SVM as a classifier for face recognition. In this study, ICA/SVM and PCA/SVM were applied on two different face databases and it was observed that the accuracies were similar; it was concluded that PCA/SVM is preferable for face recognition, since training with ICA takes much longer than with PCA. In a study conducted by Zhi and Liu [412], a method was proposed in which PCA was used to extract features from gray-scale face images, a genetic algorithm was used to optimize the network weights, and an SVM was used for classification. In the experimental studies conducted with the CAS-PEAL database collected in 2003, 99% face recognition accuracy was achieved.

Recently, a multi-fold cross convolution method has been proposed for condensing the Gabor, PCA and ICA filters, which results in superior image descriptors. In another work [102], a linear mapping is learned using Bayesian sample steered discriminative regression (BSDR) for each class to extract image class label features, which are then classified using a nearest neighbor classifier.
3.1.2. Non-linear methods

A group of non-linear appearance-based methods uses kernel based approaches. Kernel PCA was proposed as a non-linear extension of PCA [173]. The aim is to apply a non-linear mapping to the data and then compute the principal components of the features after this mapping. Kim et al. [173] observed that a system using Kernel PCA with an SVM classifier had a smaller error rate compared to other methods on the ORL database. A method based on kernel discriminant analysis was proposed by Lu et al. [214] to reduce the complexity caused by different emotions and other difficult conditions. It was observed that the proposed method is more successful than kernel principal component analysis (KPCA) and generalized discriminant analysis (GDA).

The Locally Linear Discriminant Analysis (LLDA) method was proposed by Kim and Kittler [178] in order to align local structures linearly within globally non-linear data structures. They reported that LLDA has low computational cost in face recognition compared to Kernel Linear Discriminant Analysis (KLDA) and GDA. ISOMAP [323,381] and Locally Linear Embedding (LLE) [288] are methods in which non-linear low dimensional manifolds are learned from the input space. These methods gave promising results when compared to other methods.
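A corresponding non-linear sketch, loosely in the spirit of the Kernel PCA + SVM system of [173], is given below. This is our own illustrative pipeline, not the authors' code; the RBF kernel choice and hyperparameter values are assumptions.

```python
from sklearn.datasets import fetch_lfw_people
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Kernel PCA + SVM sketch; hyperparameters are illustrative only.
lfw = fetch_lfw_people(min_faces_per_person=20)
X_tr, X_te, y_tr, y_te = train_test_split(lfw.data, lfw.target, random_state=0)

model = make_pipeline(
    KernelPCA(n_components=100, kernel="rbf", gamma=1e-7),  # non-linear mapping
    SVC(kernel="linear"))                                   # classify in feature space
model.fit(X_tr, y_tr)
print("accuracy:", model.score(X_te, y_te))
```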
3.1.3. Deep learning based methods

Deep neural networks (DNN) have been successfully used in face recognition systems since 2014, due to the increase in processing power and the compilation of large databases containing (multiply) labeled samples [320,315,317,319,299]. The DNN-based DeepFace method [320] yielded a face recognition rate of about 97.35% on the Labeled Faces in the Wild (LFW) database, which contains thousands of face images taken under unrestricted conditions [150]; this is very close to the human level (97.53%). Since then, the accuracy on the LFW dataset has reached 99.80% [346]. Although deep-learning based methods are non-linear appearance-based methods, they are reviewed in a separate subsection of this survey, because deep-learning based methods have been used in the majority of recent face recognition studies in the literature and their recognition accuracies are much higher than those of the other methods.

Although DNN methods give very high accuracy rates for both face identification and verification using images, their robustness under adverse conditions is still being investigated [115]. They provide high recognition accuracy using large numbers of high-quality images recorded in uncontrolled environments. However, lower recognition accuracy has been reported when there are severe illumination variations or noise, or when the images have low resolution [115]. Therefore, under adverse conditions, video-based approaches may provide helpful facial dynamics information.

Deep learning based face recognition methods consist of three main stages [346]: face pre-processing, deep feature extraction, and face matching. We provide brief information about each of these steps below.

Face pre-processing: Deep learning based face recognition methods provide a certain robustness in recognizing facial images with different illumination, pose and facial expressions in uncontrolled environments. However, a recent study [109] showed that different illumination, exposure and facial expressions have negative effects on the performance of the network, and demonstrated the need for face pre-processing to increase the performance.

In the one-to-many augmentation pre-processing approach, the aim is to make the deep CNN pose-invariant by generating images in different poses from a single image during training. This is done because it is expensive and time-consuming to collect large numbers of images for creating a training database. The first approach for one-to-many augmentation uses data augmentation methods such as photometric transformations [184] and geometric transformations [368,376]. The second approach is based on first reconstructing a 3D face model to create 2D images in different poses [229,230]. In the third approach, CNN models are used to generate 2D images in different poses directly, instead of creating 3D models from 2D images and projecting them back into 2D [273,387]. Bao et al. [24] synthesized a new face from the features obtained from an input image and any other input image, and showed that this synthesis is very successful in producing realistic face images even for identities not in the training data set. Generative Adversarial Networks (GAN) are also used for this purpose [23,46,303,408].

In the many-to-one normalization pre-processing approach, the goal is to generate the canonical view of the face by using face images obtained from different angles in uncontrolled environments. Stacked Autoencoders (SAE) [376], CNN [417] and GAN [328] structures have been used to obtain a frontal face image using patches from multiple images with different angles. In a recent study [405], a face frontalization method based on an appearance-flow-based CNN is proposed. In [13], an adaptive pose alignment method is proposed, which adaptively learns alignment templates using facial poses, and then aligns the test and training images using these templates. Recently, an illumination-robust pre-processing method has been proposed [402], which removes soft and hard shadows while retaining identity-related information.

Deep feature extraction: The most important decisions when designing deep CNNs for feature extraction are the choice of network architecture and loss function.

The architectures of deep CNNs can be grouped into backbone networks and multiple networks. After their high performance in the ImageNet [290] competitions, deep learning networks such as AlexNet [184], VGGNet [307], GoogleNet [318], ResNet [137] and SENet [145], which are known as typical CNN architectures, have attracted the attention of researchers. These networks and their variations have also been used for face recognition. In addition to the mainstream deep neural networks, networks with multiple structures have been proposed for multi-task learning, including face recognition [280,131].

The choice of the loss function is also important for training the deep CNN for face recognition. It was observed that, for face recognition, the softmax loss function is not sufficient for separating the features, since within-class variations are larger than between-class variations. Therefore, to make the features more discriminative, other loss functions have been proposed, such as Euclidean-distance-based losses [370,356], the triplet loss [299], angular/cosine-margin-based losses [206], and variations of the softmax loss [205,207].
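The triplet loss can be written compactly. Below is a minimal PyTorch sketch (our own illustrative implementation, not the code of [299]): on L2-normalized embeddings, it pulls an anchor-positive pair together and pushes the anchor-negative pair apart by at least a margin.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """FaceNet-style triplet loss on batches of embeddings (n, d):
    minimize the matching distance, maximize the non-matching one."""
    anchor, positive, negative = (F.normalize(t, dim=1)
                                  for t in (anchor, positive, negative))
    d_ap = (anchor - positive).pow(2).sum(dim=1)   # squared matching distance
    d_an = (anchor - negative).pow(2).sum(dim=1)   # squared non-matching distance
    return F.relu(d_ap - d_an + margin).mean()     # hinge on the margin
```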
Feature extraction is especially challenging for the one-shot face recognition problem. One-shot (or low-shot) face recognition refers to the case when there is only a single (or a few) images of some subject in the gallery [156,120,367,345,85]. It is still an open research problem, since it is difficult to represent the variance of the data with a few samples. There are various approaches to tackle the one-shot face recognition problem. In [156], intermediate deep attribute representations were used, obtained by fine-tuning a DCNN for specific attributes such as gender and face shape; these were shown to perform better than purely face-based feature representations. In [120], a regularization function is used with the cross-entropy loss function, and a new underrepresented-classes promotional loss is introduced; this method achieved the best results in the MS-Celeb-1M Low-shot Face Recognition Challenge at ICCV 2017. Another method that uses a new regularization term is [345]. A different approach for one-shot face recognition was proposed in [85], based on using a generative adversarial network (GAN) to synthesize meaningful data for the one-shot classes, which enables the classifier to learn these classes better; the new samples were generated by utilizing the data variances of other classes with more samples.

Face matching: After the selected architecture is trained using the chosen loss function and the training data, deep features can be extracted for the test data, and face identification and verification can be performed. Face matching can be performed using the cosine distance or the L2 distance. There are also other methods, such as metric learning and the sparse representation-based classifier (SRC) [346].

Main results: One of the first studies [320] showed that deep-learning networks, which are successful for object recognition, would also be very successful in face recognition. In this work, known as DeepFace, AlexNet was used as the network architecture with the softmax loss function. The test performance on the Facebook dataset was 97.35%, which was close to human performance. Another study from 2014 [315], named DeepID2, also used AlexNet as the network architecture, together with a contrastive loss; the accuracy on the CelebFaces+ dataset was 99.15%. One year later, DeepID3 was proposed [316], which used VGGNet-10 instead of AlexNet as the network architecture, with a test performance of 99.53%.

Schroff et al. [299] introduced FaceNet, which used GoogleNet-24 as the network architecture with the triplet loss function and the Google database for training; the Google database was the largest dataset used for face recognition until then. The triplet loss function involves an anchor image, a positive example of the same class and a negative example of a different class; the goal is to update the CNN such that the distance between the matching pair is minimized and the distance between the non-matching pair is maximized. The verification performance of FaceNet on the LFW database was 99.63%.

Parkhi et al. [262] proposed VGGFace, which used the VGGNet-16 network architecture, the triplet loss function and the VGGFace dataset for training. The test performed on the LFW database yielded 98.95% verification performance. In 2016, other methods were presented [358,206], one of which [358] used the center loss as the loss function and achieved higher test performance on the LFW database. In 2017, most of the proposed methods preferred the ResNet architecture and its variations [403,277,341,210,134,72]. The method proposed by Liu et al. [210] used the MS-Celeb-1M dataset [121] for training and the CoCo (congenerous cosine) loss function, and achieved the highest verification performance of 99.86% on the LFW dataset.

Among the several methods proposed in 2018 [272,340,343], the one proposed by Wang et al. [343] provided the highest verification performance on the LFW dataset (99.33%) using variations of the ResNet architecture, the CosFace loss function and the CASIA-WebFace dataset [87] as the training data. The method proposed by Deng et al. [71] achieved 99.83% verification performance on LFW using the ResNet-100 network architecture, the ArcFace loss function and the MS-Celeb-1M dataset as training data [71,411].

When the test performance on the LFW database is examined, it can be observed that there has been an extraordinary increase in the performance of face recognition systems in the last five years (from 97.35% to 99.86%). In the study conducted by Wu et al. [364], an intra-spectrum discrimination and inter-spectrum correlation analysis deep network (IDICN) approach was proposed in order to increase the recognition accuracy for multi-spectral face images; in experimental studies conducted with three different multi-spectral face image databases, recognition accuracies ranging from 99.70% to 100% were achieved. Recently, a method for handling extreme out-of-plane pose variations has been proposed, which uses pose-specific deep CNNs [228]. Huang et al. [148] conducted extensive experiments to investigate the effect of re-sampling and cost-sensitive methods on learning success with imbalanced-class data. They created more balanced boundaries by using a deep learning network, especially in order to sustain the margins between classes; the accuracies achieved by combining this method with a simple kNN clustering algorithm are 99.62% and 96.5% on the LFW and YTF databases, respectively. In [106], a two-stream CNN is proposed to recognize low-resolution faces: the teacher stream consists of a complex CNN for high-accuracy recognition, and the student stream of a simpler CNN for low-cost recognition. A new coupled mapping method using a two-branch deep learning network to perform face recognition from low-resolution images was proposed by Zangeneh et al. [393]; the proposed structure is based on mapping low and high-resolution face images into a common space with a non-linear transformation learned by the two-branch DCNN.

3.1.4. Sparsity based face recognition methods

After the advancements in sparse signal coding, sparsity based applications in computer vision became popular. Learning a dictionary from training data is an important part of sparse representation based approaches and has been reviewed in several papers [327,289,90,58,100,406].

The sparse representation based classifier (SRC) [363] was one of the first successful applications of sparsity to face recognition. The main approach in SRC is to use an overcomplete dictionary whose elements are chosen from the training samples, and to represent a test sample as a linear combination of the training samples from the same class.
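The following sketch illustrates the SRC decision rule: code the test face over a dictionary of training faces and pick the class whose atoms reconstruct it best. Note that [363] solves an l1-minimization problem, whereas here the greedy orthogonal matching pursuit from scikit-learn is used as a convenient stand-in for computing the sparse code; the function name and parameters are ours.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def src_identify(D, labels, y, n_nonzero=30):
    """D: (n_pixels, n_train) matrix whose columns are training face vectors;
    labels: identity of each column; y: (n_pixels,) test face vector."""
    D = D / np.linalg.norm(D, axis=0)                   # unit-norm dictionary atoms
    x = orthogonal_mp(D, y, n_nonzero_coefs=n_nonzero)  # sparse code for y
    best_id, best_err = None, np.inf
    for c in np.unique(labels):
        x_c = np.where(labels == c, x, 0.0)             # keep only class-c coefficients
        err = np.linalg.norm(y - D @ x_c)               # class-wise reconstruction error
        if err < best_err:
            best_id, best_err = c, err
    return best_id                                      # class with the smallest residual
```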

graph matching (EBGM) is another important model-based method and facial expressions. In the proposed method, while histograms
for face recognition [186,360]. The aim is to recognize the identity of oriented gradient and Gabor wavelets are used for feature ex-
in a large data set even a with single image. In this method, the traction, fuzzy ARTMAP classifier is used for recognition. In exper-
fiducial points of the face are described using wavelets and con- imental studies, the correct classification rate for AR database was
verted into an image graph in order to reduce the variations in achieved as 94.22%.
the image. In order to improve the performance of the EBGM algo-
rithm, “retinex and color constancy” preprocessing algorithm was 3.3.2. SIFT, local binary patterns and variants
proposed to solve the illumination problem [170]. In order to in- Another method for face recognition using the orientation de-
crease the performance of the EBGM algorithm, features based on scriptor is the Scale-invariant Feature Transform (SIFT) [213]. Al-
Gabor wavelets, which are called Gabor Jets [373] were used with though the method is generally resistant to scale and rotation
EBGM, and beta filters were associated with the EBGM [89]. changes, the computation time increases proportionally with the
3.2.2. 3D methods
People can recognize faces even when all the details are no longer resolved. The remaining information in the sensation of the face is basically geometrical and represents what is remembered at a coarse resolution [37]. In [335,336], landmark measures and geometrical features including curvature and shape are analyzed for 3D human face description, which are also used for face recognition. Three-dimensional face registration and recognition can be performed using 3D facial shape indexes based on facial curvature characteristics and other features such as surface normals [311,112]. In [111], new features based on the discrete Fourier transform, discrete cosine transform, nonnegative matrix factorization, and principal curvature directions to represent shape are investigated.

In the literature, 3D morphable models have been widely used for expression-invariant face recognition systems [12,10]. In [18], multiple images of a face were obtained from a single image using a 3D morphable model in order to be classified using the Fisherface method. Experimental results using the ORL face database [294] and the UMIST face database [114] showed that the proposed method was more successful than the conventional Eigenface method. In FRVT 2006 (Face Recognition Vendor Test), the performance of identification systems using front-facing images, 3D information and iris biometric information was compared [269]. Dual camera systems were also used together with active appearance models (AAM) to reconstruct 3D models of the face [47], which show that 3D models are robust to facial expressions and photo spoofing attacks.
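The curvature-based shape indexes mentioned above can be made concrete with a short sketch. The following NumPy code estimates the mean and Gaussian curvature of a depth map z = f(x, y) by finite differences (unit grid spacing assumed) and converts the principal curvatures into Koenderink's shape index s = (2/π) arctan((k1 + k2)/(k1 - k2)), which takes values in [-1, 1]; the synthetic surface is a placeholder for registered range data.

```python
import numpy as np

def shape_index(depth):
    """Koenderink shape index of a depth map from finite-difference curvatures."""
    fy, fx = np.gradient(depth)            # first derivatives (unit grid spacing)
    fxy, fxx = np.gradient(fx)
    fyy, _ = np.gradient(fy)
    g = 1.0 + fx**2 + fy**2
    K = (fxx * fyy - fxy**2) / g**2        # Gaussian curvature
    H = ((1 + fx**2) * fyy - 2 * fx * fy * fxy + (1 + fy**2) * fxx) / (2 * g**1.5)
    disc = np.sqrt(np.maximum(H**2 - K, 0.0))
    k1, k2 = H + disc, H - disc            # principal curvatures, k1 >= k2
    return (2.0 / np.pi) * np.arctan2(k1 + k2, k1 - k2)   # values in [-1, 1]

# Synthetic smooth surface standing in for a registered facial range image.
y, x = np.mgrid[-1:1:64j, -1:1:64j]
depth = np.exp(-4.0 * (x**2 + y**2))
S = shape_index(depth)
print("shape index range:", round(float(S.min()), 2), round(float(S.max()), 2))
```

Histograms of such per-pixel shape indexes are one way curvature information is turned into a 3D face descriptor.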
3.3. Texture (local feature) based face recognition

Texture based face recognition methods in the literature use local feature descriptors. Below we summarize several widely used approaches for describing local features and their applications to face recognition.

3.3.1. Gradient orientation based methods
The Histogram of Oriented Gradients (HOG) has been used for face recognition as well as other recognition and detection tasks in computer vision [66]. In a study conducted by Albiol et al. [1], an elastic bunch graph was used to find the facial landmarks. Then, the HOG descriptor was calculated for each facial landmark in the graph and the nearest neighbor algorithm was used as the classifier. In another work, HOG descriptors were calculated from regular grids and used for face recognition [78]. An extension of the HOG algorithm was presented for face recognition problems in [86]. Each sub-region of the face image was given a specific weight, and a method called Co-occurrence of Oriented Gradients (CoHOG) was tested using the Yale Face database [107] and the ORL Face database. It was concluded that the CoHOG algorithm is more successful than the HOG algorithm and that the use of gradient magnitude increases the CoHOG face recognition rate. A new open-set face recognition system has been proposed by Al-Obaydy and Suandi [8] to perform facial recognition under different lighting and facial expressions. In the proposed method, histograms of oriented gradients and Gabor wavelets are used for feature extraction, while a fuzzy ARTMAP classifier is used for recognition. In the experimental studies, a correct classification rate of 94.22% was achieved on the AR database.

3.3.2. SIFT, local binary patterns and variants
Another method for face recognition using an orientation descriptor is the Scale-Invariant Feature Transform (SIFT) [213]. Although the method is generally resistant to scale and rotation changes, the computation time increases proportionally with the number of feature points. Local binary patterns (LBP) were initially proposed for texture analysis [255], and have been adopted for many applications in computer vision [149].

The method proposed by Shen and Chiu [302] used the orientation information of SIFT and the texture information of local binary patterns (LBP) together in order to increase the recognition performance and reduce the computation time. The experimental results on the FERET database showed that the proposed LBP orientation descriptors decreased computation time by 30% compared to the original SIFT descriptors. There are also other studies in the literature that use orientation information and texture information together [344,221,330].

A study by Liao and Chung [197] proposed a new descriptor named Local Gradient Orientation Binary Pattern (LGOBP) and a new saliency measure function based on Generalized Survival Exponential Entropy (GSEE) to identify the most prominent regions in face images. In the experimental studies, it was observed that the GSEE + LGOBP method gave better results than the Bee Baseline and LBP methods.

Meena and Suruliandi [237] performed a comparative test for face recognition using Local Binary Patterns and its variants. In this study, which compared the LBP, Multivariate LBP (MLBP), Center Symmetric LBP (CS-LBP) and LBP Variance (LBPV) algorithms, it was observed that CS-LBP was more successful than the other LBP variants in tests performed on three different face databases. There are also other papers which improve LBP features [197,251,385,344,221,330] or apply them to infrared face images [369]. Xie et al. [369] proposed a method which divides the infrared face image into non-overlapping local regions and applies pattern selection to the LBP features obtained from these regions. In [251], the steps of face detection, face localization and face recognition were performed using a single LBP transformation. This work also presents an innovative approach using the LBP transformation for eye-pupil detection.

In a study by Wang et al. [344], a fusion method was proposed which used Local Difference Binary (LDB) descriptors [385] to extract local features from a face image, and HOG descriptors to extract edge features. In the experiments using the ORL and Yale databases [107], it was observed that the proposed method was more successful than the LBP/HOG fusion method and the computation time was shorter. In a study conducted by Mady and Hilles [221], a method which used the Viola-Jones algorithm for face detection, the HOG and LBP algorithms for feature extraction, and Random Forest as a classifier was proposed. In the experimental studies on the MEDIU staff database [222], the average test accuracy of the proposed method was 97.6%. Compressive binary patterns (CBP) have also been proposed, which improve LBP using random-field eigenfilters [76].

Recently, the Triangular Coil Pattern of Local Radius of Gyration Face (TCPLRGF) method, which is a variation of the Local Radius of Gyration Face (LRGF), has been proposed by Kar and Neogi [167]. In experimental studies using the AR, CMU-PIE and Extended Yale B databases, 100%, 98.27% and 96.35% face recognition success rates were observed, respectively.
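As a reference point for the many LBP variants discussed above, here is a minimal NumPy sketch of the original 8-neighbor LBP code with a region-wise histogram descriptor; practical systems typically use circular neighborhoods, uniform patterns and finer grids.

```python
import numpy as np

def lbp_8(image):
    """Basic 3x3 LBP: threshold the 8 neighbors against the center pixel."""
    c = image[1:-1, 1:-1]
    code = np.zeros_like(c, dtype=np.int32)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]   # fixed clockwise bit order
    h, w = image.shape
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = image[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        code |= (neighbor >= c).astype(np.int32) << bit
    return code

def lbp_histogram(image, grid=4):
    """Concatenated 256-bin LBP histograms over a grid of non-overlapping regions."""
    codes = lbp_8(image)
    h, w = codes.shape
    hists = []
    for i in range(grid):
        for j in range(grid):
            block = codes[i * h // grid:(i + 1) * h // grid,
                          j * w // grid:(j + 1) * w // grid]
            hist, _ = np.histogram(block, bins=256, range=(0, 256))
            hists.append(hist / max(block.size, 1))
    return np.concatenate(hists)

rng = np.random.default_rng(2)
face = rng.integers(0, 256, size=(64, 64))   # placeholder grayscale face crop
print(lbp_histogram(face).shape)             # (4 * 4 * 256,) = (4096,)
```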
Table 6
A summary and comparison of main set-based video face recognition methods in the literature published between 2000-2007.

Author and year | Used methods for facial features | Classification algorithm | Database and face recognition rate | Main results
de Campos et al. (2000) [39] | Gabor Wavelet Network, Karhunen-Loeve | k-Nearest Neighbor, voting scheme | 174 images from 29 people, 97.7% recognition rate | Proposed method is better than traditional PCA for face recognition.
Liu et al. (2003) [211] | Facial asymmetry information and conventional eigenface and fisherface | Linear Discriminant Analysis | 110 subjects from FERET Database (classification error rate reduced by 38%), 55 subjects from Cohn-Kanade Database (classification error rate reduced by 45%) | The information obtained from face asymmetry can be used for face recognition.
Gunturk et al. (2003) [118] | Super-resolution reconstruction and Eigenface | Euclidean distance | 68 video sequences from CMU database, 79% recognition rate | Super-resolution reconstruction provides higher face recognition performance compared to low-resolution images.
Park and Jain (2007) [261] | 3D modeling | Match score | CMU's Face In Action (FIA) video database with 221 subjects, 70% recognition rate | The 3D model increased the match performance by 40%.
Liu and Chen (2007) [208] | 3D ellipsoid images (face mosaic model) | Distance measurement | 29 subjects from FIA database [110], 4.14% error rate | Mosaic model works better than the PCA method for face recognition.
Stallkamp et al. (2007) [313] | Distance-to-model (DTM), distance-to-second-closest (DT2ND) and their combination | GMM and kNN | 2,292 video sequences of 41 subjects recorded during 6 months (own database), EER 18% with GMM-l, EER 21% with kNN | The combination of the DTM and DT2ND methods gave the most successful result for face recognition.
Arandjelovic and Cipolla (2007) [15] | Extended version of the Genetic Shape-Illumination Manifold (gSIM) | Matching with artefact model | Cambridge Face Database (CamFace) and Toshiba Face Database (ToshFace), 1300 video sequences from 160 individuals, average error rate 3.4% | In experimental studies, it was stated that the error rate in face recognition decreased by 50%.
4. Video-based face recognition

Humans use both rigid facial features and dynamic facial features to recognize other people around them [125]. The results of psychological and neurological studies can be summarized as follows [182,257,181,124]:

• The rigid features of the face give more reliable results than the dynamic features.
• Dynamic features contribute more to the success rate of the system under stringent conditions (such as low lighting, low resolution, recognition from a distance).
• Facial dynamics are less sensitive to illumination changes and other appearance changes (beard, glasses, makeup, etc.).
• Learning of facial dynamic features is slower than learning of rigid features.
• Facial dynamics are helpful for recognizing the face since they help to capture the 3D features.
• Facial dynamics make it easy for people to recognize faces they are familiar with. For unfamiliar faces, the face video is perceived only as a sequence of rigid multiple images.
• Face dynamics are useful for gender estimation.
• Dynamics of emotional expressions are independent of age, and they are consistent over the years.

In some studies, facial dynamics refer to both non-rigid movements of the face and rigid movements of the head. This review focuses on facial expressions and facial dynamics resulting from speech movements and different emotions. Video-based face recognition systems in the literature can be broadly grouped as: i) set-based methods, and ii) sequence-based methods. Below, studies in the literature under these two headings are summarized. We also tabulate the main set-based methods in Table 6, Table 7, Table 8 and Table 9. Sequence-based methods are tabulated in Table 10, Table 11, Table 12, Table 13 and Table 14, which give a summary of the facial features used, classification method, database and recognition rate, and main results.

4.1. Set-based methods

In the image set-based approach for face recognition, the frames of a video are treated as a set of image samples and the temporal order is not considered. Set-based approaches can be classified as methods that use fusion before matching and fusion after matching. Fusion before matching involves combining the features obtained from each face image before the recognition process. Fusion after matching combines the recognition results obtained from each image. This combination can be done using score, rank or decision level fusion [26], as illustrated in the sketch below. de Campos et al. [39] used the Gabor Wavelet Network [185] to locate the face and segment it into mouth, eyes and nose regions. Then, feature extraction was carried out using the Karhunen-Loeve transform and the features obtained from the regions of each frame were classified by kNN. Finally, the voting scheme classifier [2] was used to obtain a final score from the scores obtained for every frame. In the proposed method, the recognition rate was 97.7%, which was 3% higher than the traditional PCA method.

In a survey paper [26], image set-based face recognition methods have been grouped under four major groups: i) super-resolution based methods [118,15,91], ii) 3D modeling based methods [261,208,324,347], iii) manifold modeling, and iv) frame selection based methods [168,313].
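The fusion-after-matching idea above can be stated in a few lines. The sketch below is illustrative NumPy only: given a per-frame score matrix of a probe video against a gallery of identities, it fuses frames at the score level (averaging) and at the decision level (majority voting); the scores are random placeholders rather than real matcher outputs.

```python
import numpy as np

rng = np.random.default_rng(3)
n_frames, n_gallery = 12, 5
# scores[f, g]: similarity of probe frame f to gallery identity g (placeholders).
scores = rng.random((n_frames, n_gallery))
scores[:, 2] += 0.3                  # make identity 2 the (simulated) true match

# Score-level fusion: average the per-frame scores, then decide once.
score_fused_id = int(np.argmax(scores.mean(axis=0)))

# Decision-level fusion: decide per frame, then take a majority vote.
per_frame_ids = scores.argmax(axis=1)
vote_id = int(np.bincount(per_frame_ids, minlength=n_gallery).argmax())

print("score-level decision:", score_fused_id)
print("majority-vote decision:", vote_id)
```

Rank-level fusion is analogous: per-frame rankings of the gallery are combined (e.g., by summing ranks) before the final decision.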
Table 7
A summary and comparison of main set-based video face recognition methods in the literature published between 2008-2013.

Author and year | Used methods for facial features | Classification algorithm | Database and face recognition rate | Main results
Thomas et al. (2008) [324] | Background subtraction, gestalt clusters, temporal continuity | Weighted match scores | Own database with 12,981 frames from 57 subjects | The combination of the proposed methods increased the face recognition performance.
Mian (2008) [241] | SIFT features | Similarity measure | Honda/UCSD Database (Acc: 99.5%) | Frame selection is a method that can provide high performance for face recognition.
Kim et al. (2010) [179] | CCA (principal angles) | Nearest Neighbor | 700 face image sets | It has been observed that online learning provides lower calculation costs and the locally orthogonal method improves performance.
Chen et al. (2011) [55] | Multi-Region Histogram (MRH) and LBP | Feature averaging (Avg-Feature), Mutual Subspace Method (MSM), Manifold to Manifold Distance (MMS) and Affine Hull Method (AHM) | LFW (recognition rate 92.59%) and MOBIO (half total error rate 23.13%) | Avg-Feature method yields more successful and faster results than the remaining methods when it is necessary to identify people with few images.
Hu et al. (2012) [147] | Sparse Approximated Nearest Point (SANP) | RBFNN | Honda/UCSD DB (Acc: 94.02%), CMU MoBo DB (Acc: 97.91%), YouTube Celeb. DB (Acc: 65.46%) | The experiments conducted with three datasets gave the best results compared to other studies in that period.
Kashyap et al. (2012) [168] | Frequencies of action units (AU) or combinations of emotional movements on the face; AUs of the speech on the lip were not taken into account | Sum of absolute differences between histograms | Own database, interview videos of 20 people, recognition rate: 50-55% | Face movements carry biometric data.
Huang et al. (2013) [155] | Coupling Alignments with Recognition (CAR) | Min, Voting, C-Voting | YouTube-S2V (recognition rates 24.57%, 30.17%, 36.21%) and COX-S2V (recognition rates 52.57%, 54.24%, 55%) | CAR method outperforms the state-of-the-art methods.
Table 8
A summary and comparison of main set-based video face recognition methods in the literature published between 2014-2017.

Author and year | Used methods for facial features | Classification algorithm | Database and face recognition rate | Main results
Huang et al. (2015) [154] | Projection Metric Learning (PML) | Grassmannian Graph-embedding Discriminant Analysis (GGDA) [133] | YTF DB (Acc: 70.04%) and PaSC DB (Acc: 43.63%) | The proposed method has led to a reduction in computational costs.
Huang and Chen (2015) [151] | Local Vector Pattern (LVP) with a weighting mechanism | Feature-point Bilateral Recognition (BR) | FERET Database, 99.68% recognition performance for the Fb set, 95.23% for the Dup1 set | WLVP method was more successful than the LVP method.
Miaoli (2015) [242] | RGB Histogram (FH), DCTmod2 features, DCTb [270] | Bayes, FLD, MLP, Mean, SVM, PSVM | XM2VTS database, PSVM half total error rate 1.083% (for Lp1 with different feature) | PSVM has reduced the effects of low-resolution images.
Cevikalp and Serhan Yavuz (2017) [44] | Extended Polyhedral Conic Functions (EPCF) | Extended Polyhedral Conic Classifier (EPCC) | COX DB (Acc: 64.00%), YouTube Celeb. DB (Acc: 72.1%) | The test performance increased by 18%.
ElSayed et al. (2017) [91] | SRCNN and LBP variations | Distance measurement (chi-square metric) | Labeled Faces in the Wild Database (LFW), average rank-1 recognition rate of 26.51% with High-Dim LBP | Improving the resolution of the pictures increased the performance of face recognition.
Sun et al. (2017) [314] | HOG features, CNN features | Deep Match Kernels (DMK) | Honda/UCSD DB (Acc: 98.00%), YouTube Celeb. DB (Acc: 80.3%) | The DMK achieved the highest performance compared to the methods used for image-set classification.
Lu et al. (2017) [217] | CNN features | Simultaneous Feature and Dictionary Learning (SFDL), Deep-SFDL (D-SFDL) | Honda/UCSD DB (SFDL Acc: 100%, D-SFDL Acc: 100%), MoBo DB (SFDL Acc: 96.7%, D-SFDL Acc: 98.5%), YouTube Celeb. DB (SFDL Acc: 76.7%, D-SFDL Acc: 79.5%), IJB-A DB (SFDL Acc: 26.6%, D-SFDL Acc: 28.2%) | It has been observed that the D-SFDL method is more successful than the SFDL method.
Rao et al. (2017) [281] | Discriminative aggregation network (DAN) features | Fully connected layer | YTF DB (Acc: 94.28%), PaSC DB (Acc: 92.06%), YTC DB (Acc: 97.32%) | Proposed method can integrate information from video frames successfully.
4.1.1. Super-resolution and cross-resolution methods
In a super-resolution based method [118], the low-resolution images obtained from surveillance cameras are used to recover the information contained in a high-resolution image. Instead of increasing the resolution and extracting features from a high-resolution image, the super-resolution process was applied to the feature vectors obtained from many low-resolution pictures, and only the facial information was recovered. Thus, a significant reduction in computational complexity was observed. Arandjelovic and Cipolla [15] proposed an extended version of the Genetic Shape-Illumination Manifold (gSIM) method [14] in order to match the high-resolution images in the gallery with the low-resolution probe images obtained from low-quality video. Another method, proposed by ElSayed et al. [91], used a super-resolution CNN (SRCNN), which first improves the image resolution and then extracts features by a variant of LBP. The chi-square metric was used to determine the similarity between the gallery and probe images.
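The chi-square comparison of LBP-style histograms used in [91] is a one-liner; a hedged sketch of nearest-gallery matching with it follows, with random histograms standing in for features extracted from super-resolved images.

```python
import numpy as np

def chi_square(h1, h2, eps=1e-10):
    """Chi-square distance between two (normalized) histograms."""
    return 0.5 * float(np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))

rng = np.random.default_rng(4)
gallery = rng.random((5, 256))
gallery /= gallery.sum(axis=1, keepdims=True)   # normalize each gallery histogram
probe = gallery[3] + 0.01 * rng.random(256)     # noisy copy of identity 3
probe /= probe.sum()

distances = np.array([chi_square(probe, g) for g in gallery])
print("best match:", int(distances.argmin()))   # expected: 3
```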
Table 9
A summary and comparison of main set-based video face recognition methods in the literature published between 2017-2020.

Author and year | Used methods for facial features | Classification algorithm | Database and face recognition rate | Main results
Rao et al. (2017) [282] | CNN features | Fully connected layer | YTF DB (Acc: 96.52%), PaSC DB (Acc: 95.67%), YTC DB (Acc: 97.82%) | An attention-aware deep reinforcement learning is used to eliminate the frames which are not useful.
Wang (2018) [347] | 3D dynamic features | Distance measurement | ORL Database (300 images from 30 subjects), precision probability 100% | Proposed method can improve the accuracy of face recognition.
Wang et al. (2018) [351] | Discriminant Analysis on Riemannian Manifold of Gaussian Distributions (DARG) | Set classification | COX DB (Acc: 90.13% (max), COX13), YouTube Celeb. DB (Acc: 77.09%), YTF DB (Acc: 73.01%), PaSC DB (Acc: 49.37%) | The highest performance on four different face databases has been achieved.
Ding and Tao (2018) [84] | Trunk-Branch Ensemble CNN model (TBE-CNN) | Trunk-Branch Ensemble CNN model (TBE-CNN) | PaSC Database (verification rate 96.12%), COX Face Database (identification rate 98.96%), Youtube Database (verification rate 94.96%) | The proposed CNN network structure is more successful than the recently proposed CNN networks.
Mokhayeri et al. (2019) [246] | Domain-specific face synthesis (DSFS) | SRC classifier | COX-S2V DB (pAUC: 0.916, AUPR: 0.775) | The proposed method has provided a significant reduction in computational complexity.
Mokhayeri and Granger (2020) [245] | Synthetic plus variational model | SRC classifier | COX-S2V DB (pAUC: 0.905, AUPR: 0.776) | A face recognition system resistant to images obtained at different angles has been proposed.
Table 10
Comparison of main sequence-based video face recognition studies in the literature published between 2000-2007.

Author and year | Used methods for facial features | Classification algorithm | Database and face recognition rate | Main results
Chen et al. (2001) [54] | Optical flow vectors | PCA, LDA | Own database with 9 people, video from 28 people, 85% recognition rate | Face recognition using facial motion is less sensitive to light changes.
Cohn et al. (2002) [63] | Frequency of face action units (AU) | k-Nearest Neighbor | Own database, videos containing natural disgust, humiliation and smile recorded from 85 people, 50% recognition rate | Personalized facial expressions are stable over time and can be used as biometric data.
Liu and Cheng (2003) [209] | Temporal dynamics | Hidden Markov Model | Own database, 4 video sequences from 21 subjects, 1.2% error rate | The proposed method has been more successful than image-based methods.
Aggarwal et al. (2004) [4] | Spatio-temporal method | ARMA | Honda/UCSD dataset, recognition rate more than 79% | The ARMA algorithm has achieved 90% performance in video based face recognition.
Hadid and Pietikainen (2005) [125] | Spatio-temporal method | HMM and ARMA | MoBo, 93.4% (ARMA); Honda/UCSD, 91.2% (HMM); note: more head movement in these databases | Combining facial dynamics with appearance features does not systematically give better results.
Saeed et al. (2006) [292] | Geometric features of head and mouth movements | GMM | Own database, 130 videos of 9 TV speakers, 97% recognition rate | Head and mouth movements can be used for person recognition.
Saeed et al. (2006) [292] | Head, mouth and eye movements | GMM | Own database with 144 videos of 9 TV speakers, recognition rate 97.75% (with PCA and lip movements) | Facial expressions (especially lip movements) are useful for person recognition.
The Coupling Alignments with Recognition (CAR) method focused on the problem of matching a high-resolution image in the gallery with low-resolution images captured in unconstrained environments for still-to-video (S2V) face recognition [155]. In the experimental studies conducted on the Youtube-S2V [361] and COX-S2V [153] databases, the accuracy was approximately 70%. Recently, in [103] a different approach has been used for cross-resolution face recognition. First, discriminative features, which are robust to pose variations, are learned in low-resolution and high-resolution spaces using multilayer locality-constrained structural orthogonal Procrustes regression. Then, recognition is performed using these resolution-robust features.

4.1.2. 3D modeling methods
Methods that use 3D face modeling have been proposed to overcome the difficulties of face surveillance images/videos, which are generally low resolution, have poor contrast, and are non-frontal. In a study by Park and Jain [261], a 3D face model was created from multiple non-frontal frames in a video, and person recognition was performed using a commercial 2D face recognition system. In the experimental studies, it was observed that the use of 3D models increased the match ratio by 40%. Liu and Chen [208] proposed an approach for face recognition which uses the facial appearance and facial geometry to create a face mosaic. A 3D ellipsoid model was created by using local regions from different images, and these 3D ellipsoid images were used for classification.

4.1.3. Manifold methods
Different from super-resolution and 3D face modeling methods, manifold methods try to model the face subspaces directly, without paying attention to the underlying physical image formation process [16,127,200]. In [177], an approach based on canonical correlations of linear subspaces was proposed for comparing image sets.
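Comparing two image sets through the canonical correlations (principal angles) of their linear subspaces, as in [177], reduces to an SVD. A minimal NumPy sketch with random features standing in for face images follows.

```python
import numpy as np

def orthobasis(X, dim):
    """Orthonormal basis of the dominant dim-dimensional subspace of X's columns."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :dim]

def principal_angles(A, B, dim=5):
    """Principal angles (radians) between the subspaces spanned by sets A and B."""
    Qa, Qb = orthobasis(A, dim), orthobasis(B, dim)
    sigma = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(sigma, -1.0, 1.0))  # sigma = cosines of the angles

rng = np.random.default_rng(5)
set1 = rng.normal(size=(100, 30))          # 30 frames, 100-dim features (placeholders)
set2 = set1 @ rng.normal(size=(30, 25))    # a related set spanning a similar subspace
set3 = rng.normal(size=(100, 25))          # an unrelated set

print("related:  ", principal_angles(set1, set2).round(2))
print("unrelated:", principal_angles(set1, set3).round(2))
```

The smallest angle (largest canonical correlation) is what methods such as MSM use as the set-to-set similarity.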
Table 11
Comparison of main sequence-based video face recognition studies in the literature published between 2007-2008.

Author and year | Used methods for facial features | Classification algorithm | Database and face recognition rate | Main results
Matta and Dugelay (2007) [233] | Head movements | GMM | Own database, 192 videos from 12 people (TV speakers), recognition rate 92.71% with temporal features | Combining physiological and behavioral information increases the recognition rate.
Tulyakov et al. (2007) [331] | Motion vectors of the characteristic points between the peak of the emotion and the neutral expression + PCA | 1-NN (Euclidean distance) | CK database videos (including 9 happiness and 13 sadness videos) and Big Brother 3 (own database, 46 sadness and 8 happiness videos), 0.4% equal error rate (EER) | The movements of the characteristic points carry information for face recognition.
Faraj and Bigun (2007) [96] | Quantification of optical flow vectors around the lips originated by speech | GMM | XM2VTS database (recognition rate 98%) | The changes in lip movements while the person is talking carry characteristic features about the person.
Al-Jawhar et al. (2008) [7] | Wavelet subbands, optical flow, PCA and ICA | Similarity measurement | FERET Database with 157 people, recognition rate 73.24% with PCA and 90.45% with ICA | The structure created with ICA was more successful than the structure created with PCA.
Ning and Sim (2008) [252] | Dense optical flow fields | kNN | Own smile video database (341 videos from 10 subjects) | Laughing has characteristic information about identity.
Table 12
Comparison of main sequence-based face recognition studies in the literature published between 2009-2011.

Author and year | Used methods for facial features | Classification algorithm | Database and face recognition rate | Main results
Hadid and Pietikainen (2009) [126] | Extended volume LBP (EVLBP), feature selection with AdaBoost | 1-NN; compared methods: PCA, LDA, LBP, HMM, ARMA | MoBo database (recognition rate 97.9%), [65] (recognition rate 98.5%), Honda/UCSD (recognition rate 96%) | Combining face and motion information on the face is beneficial for face and gender recognition.
Tistarelli et al. (2009) [326] | Pixels in the face area | Pseudo Hierarchical HMM | Own database, videos from 21 subjects (recognition rate 100%) | Fusion modeling gives better results than static modeling.
Paleari et al. (2009) [259] | 14 distances from 12 characteristic points on the face | GMM | eNTERFACE [225] (1300 videos reflecting 6 basic emotions of 44 people), recognition rate 40% for all emotions | Facial expressions carry biometric information.
Hsieh et al. (2009) [142] | Combination of constrained optical flow and synthesized image | Similarity measurement | BU-3DFE Database [390] (2400 images from 100 subjects), average recognition rate 94.44% | The proposed method is suitable for face recognition independent of facial expressions.
Tsai et al. (2009) [329] | 17 Euclidean distances from the characteristic points on the face for the image at the peak of the emotion | PCA and LDA | JAFFE Database [219] (recognition rate 80%), CMU-AMP Database (975 images from 13 people), JAFFE + CMU-AMP (recognition rate 65%) | Facial expressions carry biometric data. The recognition rate increases when combined with the features of the appearance.
Hadid et al. (2011) [124] | VLBP + AdaBoost | SVM | CRIM database, recognition rate 98.1% | Face movements are useful for face recognition as well as gender, age and ethnicity estimation.
Zafeiriou and Pantic (2011) [392] | Change/deformation occurring during spontaneous smile/laugh + PCA/LDA | Distance measurement | Own DB (563 spontaneous smiles/laughter episodes, 849 speech utterances, 51 posed laughs, 67 speech-laughs and 167 other human noises) | During smile/laugh, distinctive biometric information is revealed.
In [179], a method based on maximizing the orthogonality of subspaces representing different classes was proposed. A continuous improvement in recognition performance is provided by using online learning.

There are also set-based face recognition methods that focus on solving the face recognition problem from low-resolution images obtained from surveillance cameras. In a study by Chen et al. [55], a comparison of face recognition systems using low-resolution images taken from Closed Circuit TeleVision (CCTV) systems was performed. This benchmarking study was carried out by comparing four different methods on an image-based face database (LFW) and a video-based face database (MOBIO) [224]: i) feature averaging (Avg-Feature) [160], ii) mutual subspace method (MSM) [374], iii) manifold to manifold distance (MMS) [349], and iv) affine hull method (AHM) [45]. In the experimental results, it was observed that video-based face recognition methods are more successful than image-based ones under harsh conditions. In addition, the Avg-Feature method yields more successful and faster results than the other methods when it is necessary to identify faces with few images.

In [216], multiple order statistics of image sets were used as features and an adaptive weight multiple kernel learning algorithm was proposed. Wang et al. [350,351] proposed the Discriminant Analysis on Riemannian Manifold of Gaussian Distributions (DARG) method to perform facial recognition using image sets. The aim of this study is to provide better classification by determining the basic data distribution in each class. For this purpose, discriminative Gaussian components in different classes were determined in the image sets as Gaussian Mixture Models (GMM).
Table 13
Comparison of main sequence-based face recognition studies in the literature published between 2012-2016.

Author and year | Used methods for facial features | Classification algorithm | Database and face recognition rate | Main results
Mendez-Vazquez et al. (2013) [238] | Volume Structured Ordinal Features | SVM, AdaBoost | Youtube Face Database (recognition rate 67.76% with SVM, 79.72% with AdaBoost) | Face recognition using local discriminative information achieves more successful results.
Chen (2014) [53] | LBP variation | Dual Linear Regression Classification (DLRC) | PIE DB (error rate 20.59%), LFW DB (error rate 1.61%), Honda/UCSD DB (recog. rate 92.31%), CMU MoBo DB (recog. rate 91.60%), Youtube DB (recog. rate 66.18%) | DLRC is an appropriate method for classification.
Franco et al. (2014) [98] | Spatio-temporal key points, Hough Transform | Temporal key points matching | MoBo DB (recog. rate 95.92%), Honda/UCSD DB (recog. rate 100%) | Spatio-temporal key points can be used for face recognition.
Gavrilescu (2016) [105] | Action units (AU) and amplitude values on the face, distances between characteristic points | Artificial neural network (for facial movements) + PCA-FR (for appearance) | Own DB (videos recorded from 64 people watching 4 emotions), Honda/UCSD DB (recog. rate 94.5%), YTF DB (recognition rate 93%) | The use of facial movements increases face recognition rates and reduces the success of misleading attempts.
Shreve et al. (2016) [305] | Face action units (AU) and amplitudes | Histogram similarity and DTW (dynamic time warping) | Own database (96 people recorded while interacting with a tablet), 62% rank-1 recognition rate | Person recognition is possible with facial movements.
Kim et al. (2016) [175] | CNN features and human attributes | 3DCNN | UvA-NEMO smile database [81] (error rate 1.7%) | When the spatio-temporal features and human attributes were used together, the system gave more successful results.
Table 14
Comparison of main sequence-based face recognition studies in the literature published between 2017-2018.

Author and year | Used methods for facial features | Classification algorithm | Database and face recognition rate | Main results
Haque et al. (2017) [132] | FACS | Artificial Neural Network | PAINFUL DATA: the UNBC-McMaster Shoulder Pain Expression Archive Database (recog. rate 87.42%) | The frames in which a person suffered were not always visually distinctive from the frames in which they did not suffer.
Hajati et al. (2017) [129] | Spatio-temporal information (dynamic texture) | Derivative Sparse Representation (DSR) | Honda/UCSD DB (recog. rate 96.31%), CMU MoBo DB (recog. rate 83.69%), Youtube DB (recog. rate 90.45%), when the dynamic texture length is 10 frames | DSR was more successful than other methods for short-length videos.
Haamer et al. (2018) [122] | CNN features and geometric features | Support Vector Machine (SVM) | Own database (630 videos from 61 people), recog. rate 96.2% | The experimental results showed that the transition frames outperform the peak emotion frames in face recognition.
In a study by [154], a method which learns the projection metric directly on a Grassmann manifold instead of a Hilbert space was proposed for video-based facial verification. The aim of this study was to avoid the high computational costs encountered in kernel-based methods. Cevikalp and Serhan Yavuz [44] proposed a new method for calculating the distance between gallery images and the convex hull of query images for large-scale face recognition applications using image sets. In addition, a new polyhedral conic classifier (Extended Polyhedral Conic Classifier, EPCC) was proposed. In another work [314], deep match kernels (DMK) were proposed for image-set classification. DMKs were recommended to overcome the challenges of high inter-class ambiguity and high intra-class variability in image sets. Lu et al. [217] proposed simultaneous feature and dictionary learning (SFDL) and deep-SFDL (D-SFDL) methods for image-set based face recognition under challenging conditions (different pose, illumination, expression, etc.). The D-SFDL method was proposed to improve the recognition performance by overcoming the difficulties caused by the non-linearity between image samples. In the experiments conducted on five different face data sets, it was observed that SFDL and D-SFDL achieved high performance for image-set based face recognition.

4.1.4. Frame-selection-based methods
In frame selection methods for face recognition, the goal is to select the most informative and diverse subset of the images/frames in a face video or set of images [241]; a minimal quality-based example is sketched below. A two-stage face recognition method was proposed by Huang and Chen [151]. In the first stage, a new feature extraction approach called Local Vector Pattern (LVP) [95] was used with a weighting mechanism to calculate the distance between the probe images and the enrolled images, and M candidates are determined from the enrolled pictures. In the second stage, the final classification is carried out by using the feature-point Bilateral Recognition (BR) approach. Stallkamp et al. [313] performed a study to create a face recognition system from videos recorded at the entrance of a laboratory over six months. Because the images were obtained in an uncontrolled environment, many challenging situations were encountered. In this study, which used distance-to-model (DTM) and distance-to-second-closest (DT2ND) weight functions with the K-Nearest Neighbor (KNN) and Gaussian Mixture Model (GMM) approaches, it was observed that the combination of the DTM and DT2ND methods gave the most successful result.
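A common, simple instance of frame selection is to rank frames by an image-quality proxy and keep the top K. The sketch below (referenced in the subsection intro above) is illustrative NumPy using a Laplacian-energy sharpness score; it is not the LVP- or DTM-based selection of the cited works.

```python
import numpy as np

def sharpness(frame):
    """Laplacian energy: a cheap focus/quality proxy for a grayscale frame."""
    lap = (-4 * frame[1:-1, 1:-1] + frame[:-2, 1:-1] + frame[2:, 1:-1]
           + frame[1:-1, :-2] + frame[1:-1, 2:])
    return float(np.mean(lap ** 2))

def select_frames(video, k=3):
    """Return the indices of the k sharpest frames."""
    scores = np.array([sharpness(f) for f in video])
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(6)
video = rng.random((10, 48, 48))             # placeholder frame stack
for i in (2, 5, 7):                          # simulate motion blur on a few frames
    video[i] = (video[i] + np.roll(video[i], 1, axis=1)
                + np.roll(video[i], 2, axis=1)) / 3.0
print("selected (sharpest) frames:", select_frames(video))
```

The selected subset then feeds any of the set-based matchers discussed earlier.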
Thomas et al. [324] proposed a method that used background subtraction and gestalt clusters to improve the face detection performance in difficult images. Weighted match scores based on the results of previously seen frames were used. Although the data used in the study are real-world data, it was observed that the performance was increased by combining the methods.

4.1.5. Frame-aggregation-based methods
Recently, deep-learning based frame aggregation methods have been utilized for face recognition from video. In [281], an aggregation network is learned to synthesize discriminative images from video using metric learning and adversarial learning. In [375], the features of video frames are first extracted using CNNs and then aggregated using attention blocks. In [282], attention-aware deep reinforcement learning is used to eliminate frames which are not useful, based on information from the image space and the feature space. The experimental results on the Youtube Face, Point-and-Shoot Challenge and Youtube Celebrities datasets show competitive performance.

Sparse representations have also been employed for image-set based face recognition. In [147], a Sparse Approximated Nearest Point (SANP) method was proposed in order to calculate the between-set distance, which used an accelerated proximal gradient method for optimization. In 2015, a study was conducted by Huang et al. [152], which compared the set-based face recognition methods using the COX face database. In addition, the point-to-set correlation learning (PSCL) method was proposed for video-to-still (V2S)/S2V face recognition. In a recent work [246], a domain-specific face synthesis method has been proposed for face recognition. A small representative subset of face images is selected for 3D model reconstruction, which is then used for designing a discriminative dictionary for an SRC classifier. In another study conducted by Mokhayeri and Granger [245], a model called the synthetic plus variational model was proposed, in which a common probe picture was created by using the variational dictionary and the gallery dictionary, using synthetic facial images augmented with different exposure angles.

Recently, the trunk-branch ensemble deep CNN (TBE-CNN) model has been proposed by Ding and Tao [84] to solve the illumination and low-resolution problems when face recognition is performed using images obtained from surveillance cameras. In order to increase the performance of the network, artificially blurred images were included in the training and an improved triplet loss function was proposed. In experimental studies performed with the Point-and-Shoot Camera (PaSC) [29], COX Face and Youtube Face databases, a 96.12% verification rate, a 98.96% identification rate and a 94.96% verification rate were obtained, respectively.

In summary, three video data sets were commonly used for comparing set-based methods: Youtube Face DB, Youtube Celeb DB and PaSC DB. Although the face identification performance on these datasets was around 80% at the end of 2015, identification performance has exceeded 95% with the use of deep neural networks since 2017 [281,282,84].

4.2. Sequence-based face recognition

Sequence-based methods employ the temporal information that exists in a video, and hence the order of frames is important. Sequence-based methods can be grouped as temporal methods and spatio-temporal methods. Temporal methods use the facial dynamics information separately from the texture information, whereas spatio-temporal methods model the texture and the motion information together.

4.2.1. Temporal methods
Studies on recognizing people from facial movements first appeared at the beginning of the 2000s and have attracted the attention of many researchers [326,124]. Facial movements can be captured using facial landmarks, action units (AU) or optical flow vectors.

Cohn et al. [63] was one of the first studies showing that facial dynamics can be used for person recognition. First, facial expressions during solitary viewing of films were analyzed, and the rate of positive expressions in people's responses to a video they watched was used. Then, facial expressions during two-person interviews were analyzed, which showed that the frequency of occurrence of some facial action units can be used for person recognition. Facial expression was measured using convergent measures, including facial EMG, automatic feature-point tracking, and manual FACS. Chen et al. [54] proposed a method for face recognition using a high-dimensional vector obtained from a sequence of motion flow fields. This feature vector mainly contains temporal information about the face. It has been observed in experimental studies that this method can give successful results under difficult conditions.

Studies in the literature indicate that emotional expressions on the face are independent of age and therefore remain constant over the years, and that they are less sensitive to light changes and other appearance changes (beard, glasses, makeup, etc.) [259,234]. Paleari et al. [259] conducted experiments on the eNTERFACE [225] dataset containing 1300 videos of 44 people. Videos in this data set contain short sentences that reflect the six basic emotions (anger, fear, disgust, happiness, sadness and astonishment). Using the distances between the characteristic points on the face, 14 attributes were obtained; the averages of these attributes over the video were normalized and used for GMM-based classification. Considering the number of instances in the eNTERFACE database, the average 1-best recognition accuracy is 16 times better than random; in the worst case, the system performed 7 times better than random.

Matta and Dugelay used the behavioral approach for face recognition in several studies [232,292,233]. Saeed et al. [292] proposed a new person recognition system based on temporal signals from rigid head displacements and non-rigid mouth movements. The Gaussian Mixture Model (GMM) approach and a Bayes classifier were used as classifiers. In this study, a data set consisting of 130 videos from 9 different people was used and the identification rate was estimated as 97%. One year later, a multi-modal system [233] was proposed by Matta and Dugelay, in which behavioral information and physiological information were used together. In order to obtain the behavioral information, statistical features obtained from the displacement of the head were used. The physiological information was obtained using the probabilistic extension of the Eigenface approach. Classification was done using the Gaussian Mixture Model and the Bayes classifier. This work showed that combining physiological and behavioral information increased the identification rate. The face dynamics recognition method of [347] used adaptive template matching for segmentation and a three-dimensional dynamic scanning method for extracting 3D dynamic features from the pupil and the eyelid.

A method which tries to combine the appearance attributes of the face with emotional facial expressions was proposed by Tsai et al. [329]. The attributes of the appearance were obtained by using PCA, and the facial expressions were obtained from 17 distances between the characteristic points on the face. Experiments on the JAFFE and CMU-AMP data sets have shown that both facial appearance attributes and facial expression attributes can be used for biometric recognition. The results were also confirmed by confidence interval analysis. In another fusion work, Saeed and Dugelay [291] created a new biometric system by combining eye dynamics with other biometric dynamics. Firstly, the global and local features obtained in the system were merged and then classified by using a Bayesian classifier. The identification rate of the system was 97.75% with the help of PCA and mouth dynamics.
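Several of the temporal methods above (e.g., [259,292,233]) share one classifier pattern: fit a GMM per enrolled person on per-frame dynamic feature vectors and identify a probe sequence by maximum log-likelihood. A hedged sketch using scikit-learn's GaussianMixture follows, with synthetic features standing in for landmark distances or head-motion signals.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)
n_people, feat_dim = 3, 14            # e.g., 14 inter-landmark distances per frame

# Enrollment: one GMM per person, fit on that person's per-frame features.
models = []
for p in range(n_people):
    train = rng.normal(loc=p, scale=0.5, size=(200, feat_dim))  # placeholder data
    models.append(GaussianMixture(n_components=2, random_state=0).fit(train))

# Identification: score a probe sequence under every model, pick the best.
probe = rng.normal(loc=1, scale=0.5, size=(50, feat_dim))       # person 1's dynamics
loglik = [m.score(probe) for m in models]    # mean log-likelihood per frame
print("identified as person", int(np.argmax(loglik)))           # expected: 1
```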
Kashyap et al. [168] used facial asymmetry for face recognition. They used the frequency of facial action units (AU) related to facial movements and facial asymmetry as soft biometric data. Experimental results obtained on a data set consisting of interviews indicate that facial movements and facial asymmetry can be used as behavioral biometric data, and that a combination of facial asymmetry and facial action units was more successful than using the two methods separately. In another work, the facial action units (AUs) exhibited by a person were used as distinctive features while performing a task on a tablet computer in a semi-restricted environment [305]. AUs were measured from videos of 96 different participants in a show-like quiz game that included a reward. They proposed a method that took advantage of the activation properties and the temporal dynamics of facial behavior.

The velocity field caused by the relative movement between an object and the camera is called optical flow. Optical flow, which is one of the methods used for motion analysis in video, is also used for face spoofing detection, facial expression detection, and face asymmetry measurement, as well as face recognition. Face spoofing, which is mostly done using photographs, is an important problem for face recognition. A liveness detection method, which used the differences between 2D object movement and 3D object movement in optical flow, was proposed by Bao et al. [25]. Chen et al. [51] proposed a method of face asymmetry measurement based on optical flow, which was used for face recognition and face image beautification.

In the study conducted by Faraj and Bigun [96], it was revealed that lip movements during speech, in addition to voice, can be used for person recognition. In this study, the quantification of the optical flow vectors around the lips originated by speech was used for feature extraction. It was concluded that the lip movements caused by speech carry biometric information. In another study by Al-Jawhar et al. [7], a method using wavelets, optical flow and PCA was proposed. First, the wavelet transform and PCA were applied to the reference image and the test image. Then, an optical flow residue image was obtained and recognition was done by comparing the residue images for the same and different emotions. Ning and Sim conducted a study to investigate whether smile dynamics convey information about identity [252]. Dense optical flow fields were calculated and features were extracted. In experimental studies using videos of 10 subjects, it was observed that smiles have characteristic information about identity. Hsieh et al. [142,143] proposed a method which combined computed intra-person optical flow with synthesized face images in a probabilistic framework in order to create face recognition systems that are more robust against facial expressions.

Recently, neural networks have become a part of fusion methods, since they are used for both feature extraction and classification. In a study by Gavrilescu [105], a method was proposed that uses individual differences in facial expressions in order to strengthen face recognition systems in misleading situations. In this study, the face is analyzed in four regions (eyebrows, eyes, mouth, cheeks) in order to model individual differences of facial expressions for use with the standard PCA-based face recognition algorithm. Then, a facial expression behavior map was created by using these regions and recognition was performed by using artificial neural networks. Finally, the standard PCA-based face recognition and individual facial expression recognition processes were combined. When the combined method was applied to the Honda/UCSD and Youtube Face databases, the test performance was 94.5% and 92.9%, respectively. Another recent work extracts facial dynamics features from smile videos for face recognition [321]. The extracted features utilize statistical geometric characteristics of the face during the onset, apex, and offset phases of the emotional expression.

4.2.2. Spatio-temporal methods
Spatio-temporal methods for face recognition from videos [209,4] utilize the motion and texture information together. A method using adaptive Hidden Markov Models (HMM) [275] was proposed in [209] to learn the temporal dynamics from the video sequences of each subject. The study by Tistarelli et al. [326] presents a dynamic facial model based on Hidden Markov Models to capture both facial appearance and facial dynamics. The classification of the different emotions in facial expressions was performed by an unsupervised clustering method. In the experimental studies, data of 21 subjects were used and the proposed pseudo-hierarchical HMM method achieved 100% accuracy. In the study by Aggarwal et al. [4], the autoregressive moving average (ARMA) model was used, which provided face recognition from video under different poses and expressions. However, these methods use holistic information on the face and do not take local characteristics into account. In order to eliminate these disadvantages, a method was proposed which uses local information in facial videos and selects only the facial dynamics that aid face recognition [128,126].

In the study by Hadid et al. [128], Extended Volume Local Binary Patterns (EVLBP) were proposed and the most discriminating EVLBP attributes were selected with the AdaBoost learning algorithm. The face video was subdivided into local rectangular prisms to extract local characteristics. The characteristic points on the face were not utilized when extracting local attributes. In this study, three different public video face databases (Motion of Body (MoBo), CRIM, and Honda/UCSD) were used and five methods (PCA, LDA, LBP, HMMs, and ARMA) were used for comparison. The proposed method was found to be more successful than the other benchmark methods and achieved 97.9%, 96% and 98.5% test performance on the MoBo, Honda/UCSD, and CRIM data sets, respectively. One of the important results of this study was that some of the (intra-personal) facial movements are not useful for face recognition, but may be useful when the attributes are selected to model inter-personal differences. Mendez-Vazquez et al. [238] have proposed a new spatio-temporal descriptor based on structured ordinal features to capture local discriminative information, which is overlooked in most spatio-temporal methods. In the experiments on the Youtube Face Database, it was observed that the proposed method was more successful than the LBP variants.

A method of face recognition using the spatio-temporal dynamics of the eyes was proposed by Vinette et al. [337]. The Bubbles procedure [113] was used to examine the information in the first part of the video. In a later study [125], spatio-temporal methods (HMM and ARMA) and image-based methods were compared for different video lengths and resolutions on datasets containing facial expressions and head movements. Spatio-temporal methods were found to have worse results for shorter videos and better results at lower resolutions. These results support the conclusion that facial dynamics contribute more to the performance of the recognition system in the aforementioned low-resolution cases. However, it was also concluded that the combination of facial appearance and facial dynamics does not always have a positive effect. It is stated that this result may be due to head movement rather than facial expression in the data sets used. Hadid et al. [124] investigated the spatio-temporal features obtained from the head and facial parts as an adjunct to the LBP-based methods.

In a study performed by Zafeiriou and Pantic [392], it was shown that the change/deformation occurring during a spontaneous smile/laugh is a biometric feature. During spontaneous laughing, the moment when the change/deformation was maximum (the apex) was used for feature extraction. In the experiments, 563 spontaneous laugh videos collected from 22 subjects were used.
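The per-subject HMM recipe described at the start of this subsection (one HMM per subject, identification by likelihood [209]) can be sketched briefly. The snippet below relies on the third-party hmmlearn package, whose current API is assumed here, and on synthetic sequences standing in for spatio-temporal face features.

```python
import numpy as np
from hmmlearn import hmm   # third-party package: pip install hmmlearn

rng = np.random.default_rng(8)
n_subjects, feat_dim = 3, 10

# Train one Gaussian HMM per subject on that subject's frame-feature sequences.
models = []
for s in range(n_subjects):
    seqs = [rng.normal(loc=s, size=(40, feat_dim)) for _ in range(5)]  # placeholders
    X = np.vstack(seqs)
    model = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=20)
    model.fit(X, lengths=[len(q) for q in seqs])
    models.append(model)

# Identify a probe sequence by the model giving the highest log-likelihood.
probe = rng.normal(loc=2, size=(40, feat_dim))
scores = [m.score(probe) for m in models]
print("identified as subject", int(np.argmax(scores)))   # expected: 2
```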
tiple images are present in both the gallery and probe sets for each vided. Image-based and video-based face recognition methods are
subject. In experimental studies on the Youtube Face Database, reviewed in detail, and tables are used to compare different meth-
DLRC algorithm was the fastest algorithm, although the face recog- ods.
nition performance rate was not very different from the compared The first studies on face recognition mainly utilize images taken
methods at that time. in controlled environments. Since the performance for face recog-
Franco et al. [98] proposed a video-based face recognition sys- nition methods using images may be limited, researchers thought
tem using spatio-temporal key points. In the proposed method, key that using temporal or spatio-temporal information obtained from
points were analyzed in a temporal window and key points in the videos would increase the accuracy of face recognition. As a
fixed positions and scales were selected using Hough Transform. result of this, between 2000 and 2010 facial several video data
Then, a template was created using the spatio-temporal descriptor sets were collected. In 2010, the success of deep neural networks
computation and binary representation computation. In the exper- for object recognition attracted the attention of researchers and
imental studies with MoBo and Honda dataset, 100% recognition deep neural networks began to be applied to problems in many
performance was achieved. areas. The use of Deep Neural networks for face recognition can be
Recently, deep CNNs have been used for feature extraction and considered a milestone for face recognition. Facial recognition sys-
classification for sequence-based face recognition. Kim et al. [175] tems using Deep Neural Networks have achieved over 99% accuracy
designed a 3D convolutional neural network (3DCNN) for spatio- even when very large face data sets collected in the wild are used.
temporal representation obtained from facial motion and appear- On the other hand, several recent studies after 2018 [346,115,258]
ance. In order to train the network with a small number of images, showed that the performance of face recognition systems using
some human attributes were used. In the experimental studies, deep neural networks decrease when face images collected under
it was observed that 3DCNN, which was designed using human adverse conditions are used such as images with low resolution,
attributes, was more successful than 3DCNN without human at- severe illumination variations, blur, and noise, which are also re-
tributes. ferred to as semantic adversarial attacks [244]. Hence, research
In the study conducted by Haque et al. [132], a biometric per- efforts towards making deep learning based methods more ro-
son recognition system was proposed using the features obtained bust under adverse conditions is needed. Methods for verifying the
from the pain expression model. The pain database, which was robustness of the deep learning models against semantic perturba-
collected by [218] was used. The database includes face videos of tions are also emerging [244].
participants suffering from shoulder pain and performing a series Video-based face recognition methods are more successful un-
of active and passive motion tests. FACS was used for feature ex- der challenging conditions as compared to image-based face recog-
traction and ANN was used for classification. nition approaches since behavioral features obtained from facial
One of the problems encountered in video-based face recogni- dynamics also can be used as auxiliary features and have a pos-
tion algorithms is that videos can be very short-length. In order itive effect on the recognition rate. Moreover, facial dynamics are
to solve this problem, Hajati et al. [129] have proposed a new less sensitive to illumination and other appearance changes (beard,
derivative sparse representation approach for face recognition in glasses, makeup, aging etc.). However, using only the facial dynam-
short-length videos. In the experiments conducted with four dif- ics can not achieve sufficient performance for person recognition
ferent databases, it was observed that proposed method was more and not every feature obtained from the face/head dynamics has
successful than other methods for short-length videos. been shown to be useful for person recognition. Therefore, de-
In a study by [122], a person recognition system was proposed composition and utilization of the identity-related facial dynamics
using transition frames of emotions for dynamic feature extraction. information is a future direction for research.
A fine-tuned VGG-Face CNN [262] was used and geometric features It is foreseen that studies related to improving facial recognition
were obtained from facial landmark points. The study focuses on systems will include the following concepts in the future:
two different approaches. In the first approach, a Long-Short Term Image enhancement: The goal here is to apply super resolution
Memory (LSTM) network [141] was trained by using the features algorithms or 3-D image generation algorithms to low resolution
obtained from the CNN and the geometric features obtained from face images in order to increase the performance of face recog-
the facial landmarks. In the second approach, an LSTM is indepen- nition systems. Since there is a large volume of data obtained
dently trained by using the features of the CNN and the geometric from security cameras, face recognition using low-resolution im-
features obtained from the facial landmarks and then used with ages is an important and challenging research problem to inves-
an SVM. As a result of the study, it was observed that the system, tigate [223,229,103]. New algorithms for resolution-robust feature
which used SVM was more successful than the others. extraction methods are needed for reducing the gap between low-
In summary, when the results of face recognition using sequ- resolution and high-resolution images.
ence-based methods were examined, it was observed that Franco
et al. [98] achieved a 100% accuracy on Honda UCSD data set and Loss functions: Since the performance of face recognition sys-
95.92% accuracy on MoBo DB using spatio-temporal key points tems using deep neural networks decrease under adverse condi-
and Hough Transforms. Moreover, CNN-based algorithms have ap- tions, another research direction is to increase the performance of
peared among the sequence-based methods since 2016 and it these systems by utilizing new loss functions [346,343,119]. Cur-
has been observed that face recognition accuracy has increased rently, there are approximately 20 different loss functions used
[175,122]. in the literature for deep-learning based face recognition and face
anti-spoofing systems. It is predicted that the number of loss func-
5. Conclusions and future work tions used in face recognition systems will increase in the future.
In this survey, the vast literature on face recognition is reviewed and the main experimental results using different databases are provided. In the first part of the survey, general information about face recognition systems and their development throughout history is given. In the second section, a taxonomy of facial recognition methods and a summary of popular facial data sets used for training and testing facial recognition systems are provided. The reviewed studies indicate that using facial dynamics together with facial appearance has a positive effect on the recognition rate. Moreover, facial dynamics are less sensitive to illumination and other appearance changes (beard, glasses, makeup, aging, etc.). However, using only the facial dynamics cannot achieve sufficient performance for person recognition, and not every feature obtained from the face/head dynamics has been shown to be useful for person recognition. Therefore, decomposition and utilization of the identity-related facial dynamics information is a future direction for research.
It is foreseen that studies related to improving facial recognition systems will include the following concepts in the future:

Image enhancement: The goal here is to apply super-resolution algorithms or 3-D image generation algorithms to low-resolution face images in order to increase the performance of face recognition systems. Since a large volume of data is obtained from security cameras, face recognition using low-resolution images is an important and challenging research problem to investigate [223,229,103]. New resolution-robust feature extraction methods are needed to reduce the gap between low-resolution and high-resolution images.
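A rough sketch of this idea is an SRCNN-style restoration network (cf. the SRCNN entry in the abbreviations below) placed in front of the recognizer; the 9-1-5 filter layout follows the original SRCNN, while the upscaling factor and its use as a pre-processing step for face recognition are assumptions made for illustration:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SRCNN(nn.Module):
    """Three-layer super-resolution CNN applied to a bicubically upscaled face crop."""
    def __init__(self):
        super().__init__()
        self.extract = nn.Conv2d(1, 64, kernel_size=9, padding=4)  # patch extraction
        self.map = nn.Conv2d(64, 32, kernel_size=1)                # non-linear mapping
        self.recon = nn.Conv2d(32, 1, kernel_size=5, padding=2)    # reconstruction

    def forward(self, lr_face):
        x = F.interpolate(lr_face, scale_factor=4, mode="bicubic")  # coarse upscaling
        x = F.relu(self.extract(x))
        x = F.relu(self.map(x))
        return self.recon(x)             # restored face, passed on to the recognizer

lr_face = torch.randn(1, 1, 28, 28)      # e.g. a small surveillance-camera face crop
sr_face = SRCNN()(lr_face)               # 112 x 112 input for the recognition network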
Loss functions: Since the performance of face recognition systems using deep neural networks decreases under adverse conditions, another research direction is to increase the performance of these systems by utilizing new loss functions [346,343,119]. Currently, approximately 20 different loss functions are used in the literature for deep-learning based face recognition and face anti-spoofing systems, and it is predicted that this number will continue to grow.
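As one example from this family, the additive angular margin (ArcFace) loss [71] penalizes the angle between an embedding and its class center; a minimal sketch follows, where the scale s = 64 and margin m = 0.5 are commonly used defaults assumed here for illustration:

import torch
import torch.nn.functional as F

def arcface_logits(features, weights, labels, s=64.0, m=0.5):
    """Cosine logits with an additive angular margin on each sample's true class."""
    # Cosine similarity between L2-normalized embeddings and class centers.
    cos = F.normalize(features) @ F.normalize(weights).t()
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    # Add the margin m (in radians) only to the target-class angle.
    one_hot = F.one_hot(labels, weights.size(0)).bool()
    cos_margin = torch.where(one_hot, torch.cos(theta + m), cos)
    return s * cos_margin  # to be fed to F.cross_entropy(logits, labels)

features = torch.randn(8, 512)            # embeddings from the backbone network
weights = torch.randn(1000, 512)          # one learnable center per identity
labels = torch.randint(0, 1000, (8,))
loss = F.cross_entropy(arcface_logits(features, weights, labels), labels)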
Data set design: In order to improve the robustness of face recognition using deep neural networks, images with different illumination, pose and noise effects could be used during training [228,84]. Since it is very difficult to obtain large annotated databases, another approach could be to decrease the number of training images using active learning and similar approaches [198]. Another interesting research direction would be to apply curriculum learning, which presents the training images from easier to harder ones, for better generalization performance [28,38]; a sketch is given after this paragraph. New multi-modal video-based datasets are also needed, which contain various facial expressions of the same person, so that identity can be recognized from facial dynamics information alone [321]. Multi-modal datasets, which include RGB, depth and infrared data as well as 3D masks, are also important for face anti-spoofing research [201].
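The curriculum learning idea mentioned above can be sketched as follows; using the head-pose angle as the difficulty score is an illustrative assumption (in [38], difficulty is derived from attributes of the face images):

import numpy as np

def curriculum_batches(images, difficulty, num_stages=3, batch_size=32):
    """Yield mini-batches from a training pool that grows from easy to hard samples."""
    order = np.argsort(difficulty)                 # easiest samples first
    for stage in range(1, num_stages + 1):
        pool = order[: len(order) * stage // num_stages].copy()
        np.random.shuffle(pool)                    # shuffle within the current stage
        for i in range(0, len(pool), batch_size):
            yield images[pool[i:i + batch_size]]

images = np.random.rand(1000, 112, 112, 3)         # stand-in for aligned face crops
yaw_degrees = np.random.uniform(0, 90, 1000)       # frontal faces treated as "easier"
for batch in curriculum_batches(images, yaw_degrees):
    pass                                           # one optimizer step per batch here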
Soft biometrics: Soft biometric data can be extracted from facial dynamics using the spatio-temporal information in facial videos. Although research results show that soft biometric features alone are not sufficient for face recognition, they can be used together with appearance-based methods to increase the face recognition accuracy of the system under adverse conditions [122,321]. It has recently been shown that face authentication with high accuracy is possible using the facial dynamics of the smile expression [176]. Investigating whether the facial dynamics of other emotional expressions (e.g. anger, sadness, surprise, disgust, fear) carry identity information is an interesting direction of research. It may also be interesting to investigate the use of facial dynamics for face anti-spoofing, which also requires collecting new datasets, as mentioned above.
Face anti-spoofing: Although face recognition systems which utilize deep neural networks have been shown to exceed human performance in various scenarios, it has been observed that deep learning networks are more easily deceived than humans. The Deepfake detection challenge has recently been organized [69] to encourage research on the development of more robust deep networks. Therefore, making face recognition more robust to spoofing attacks is an interesting research direction. Recently, many deep neural network based methods have been proposed for face anti-spoofing [248,212,386], and they have shown successful performance against various types of spoofing attacks. In the future, zero-shot face anti-spoofing approaches are expected to be needed, since new types of spoofing attacks keep being created. Improving and validating the robustness of deep neural networks against adversarial and semantic attacks is also an active research area.
Multi-modal and cross-modal face recognition: In this survey, we focused on image and video-based face recognition using the visual (RGB) modality, since it is the most widely used and cost-effective way of capturing face information. However, we would like to mention that face recognition using other modalities (3D, near infrared, thermal infrared, sketches) as well as multi-modal face recognition and face anti-spoofing are active and interesting research areas [414]. Heterogeneous face recognition, which tries to match face images acquired using different modalities, has been attracting the attention of researchers [264]. For example, matching face sketches to photos is an important problem in forensic security. Synthesizing a face photo from a sketch and vice versa are related interesting and challenging problems to investigate [416,399,48,391].
The ultimate goal of all these academic studies is to develop an automated face recognition system that can reproduce/surpass the human vision system. This aim can be achieved by mutual and coordinated studies between computer-vision researchers and neuroscientists.
Abbreviations

LBP: Local Binary Patterns, FR: Face Recognition, R-CNN: Region with Convolutional Neural Network, SSD: Single Shot Detector, CLM: Constrained Local Model, FMR: False Match Rate, FAR: False Accept Rate, FNMR: False Non-Match Rate, FRR: False Reject Rate, GAR: Genuine Accept Rate, TAR: True Acceptance Rate, EER: Equal Error Rate, ROC: Receiver Operating Characteristics, AUROC: Area under the Receiver Operating Characteristics, V: Various, N: No, Y: Yes, FE: Facial Expression, IL: Illuminations, PO: Head Poses, OC: Occlusions, TI: Recording Times, AC: Accessories, ET: Ethnicities, CMC: Cumulative Match Characteristic, PCA: Principal Component Analysis, 2DPCA: Two-Dimensional Principal Component Analysis, LDA: Linear Discriminant Analysis, SVDU-IPCA: Singular Value Decomposition Updating-based Incremental Principal Component Analysis, DiaPCA: Diagonal Principal Component Analysis, ICA: Independent Component Analysis, IGFs: Independent Gabor Features, PRM: Probabilistic Reasoning Model, ORL: Olivetti Research Laboratory, SVM: Support Vector Machine, KPCA: Kernel Principal Component Analysis, LLDA: Locally Linear Discriminant Analysis, KLDA: Kernel Linear Discriminant Analysis, LLE: Locally Linear Embedding, DNN: Deep Neural Networks, CNN: Convolutional Neural Network, 2D: 2-Dimensional, 3D: 3-Dimensional, GAN: Generative Adversarial Networks, SAE: Stacked Autoencoders, SRC: Sparse Representation-based Classifier, AAM: Active Appearance Model, NNC: Nearest Neighbor Classifier, EBGM: Elastic Bunch Graph Matching, FRVT: Face Recognition Vendor Test, HOG: Histogram of Oriented Gradients, Co-HOG: Co-occurrence of Oriented Gradients, SIFT: Scale-Invariant Feature Transform, LGOBP: Local Gradient Orientation Binary Pattern, GSEE: Generalized Survival Exponential Entropy, MLBP: Multivariate Local Binary Patterns, CS-LBP: Center Symmetric Local Binary Patterns, LDB: Local Difference Binary, gSIM: Generic Shape-Illumination Manifold, SRCNN: Super-Resolution Convolutional Neural Network, CAR: Coupling Alignments with Recognition, Avg-Feature: Feature Averaging, MSM: Mutual Subspace Method, MMS: Manifold-to-Manifold Distance, AHM: Affine Hull Method, GMM: Gaussian Mixture Model, DARG: Discriminant Analysis on Riemannian Manifold of Gaussian Distributions, EPCC: Extended Polyhedral Conic Classifier, DMK: Deep Match Kernels, SFDL: Simultaneous Feature and Dictionary Learning, D-SFDL: Deep Simultaneous Feature and Dictionary Learning, LVP: Local Vector Pattern, KNN: K-Nearest Neighbor, SANP: Sparse Approximated Nearest Point, V2S: Video-to-Still, S2V: Still-to-Video, PSCL: Point-to-Set Correlation Learning, TBE-CNN: Trunk-Branch Ensemble Convolutional Neural Network, PaSC: Point-and-Shoot Camera, AU: Action Units, ARMA: Auto-Regressive Moving Average, EVLBP: Extended Volume Local Binary Patterns, MoBo: Motion of Body, DLRC: Dual Linear Regression Classification, ANN: Artificial Neural Network, LSTM: Long Short-Term Memory.

CRediT authorship contribution statement

Murat Taskiran: Writing - Original Draft, Writing - Review & Editing. Nihan Kahraman: Writing - Review & Editing, Supervision. Cigdem Eroglu Erdem: Conceptualization, Writing - Review & Editing, Supervision, Project administration, Funding acquisition.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] A. Albiol, D. Monzo, A. Martin, J. Sastre, A. Albiol, Face recognition using HOG-EBGM, Pattern Recognit. Lett. 29 (2008) 1537–1543, https://doi.org/10.1016/j.patrec.2008.03.017.
[2] A.K. Jain, R.P.W. Duin, J. Mao, Statistical pattern recognition: a review, IEEE Trans. Pattern Anal. Mach. Intell. 22 (2000) 4–37, https://doi.org/10.1109/34.824819.
[3] A.F. Abate, M. Nappi, D. Riccio, G. Sabatino, 2d and 3d face recognition: a survey, Pattern Recognit. Lett. 28 (2007) 1885–1906.
[4] G. Aggarwal, A.K.R. Chowdhury, R. Chellappa, A system identification approach for video-based face recognition, in: International Conference on Pattern Recognition, Cambridge, UK, 2004, pp. 175–178.
[5] M. Aharon, M. Elad, A. Bruckstein, K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation, IEEE Trans. Signal Process. 54 (2006) 4311–4322, https://doi.org/10.1109/TSP.2006.881199.
[6] T. Ahonen, A. Hadid, M. Pietikainen, Face description with local binary patterns: application to face recognition, IEEE Trans. Pattern Anal. Mach. Intell. 28 (2006) 2037–2041.
[7] W.A. Al-Jawhar, A.M. Mansour, Z.M. Kuraz, Multi technique face recognition using PCA/ICA with wavelet and optical flow, in: International Multi-Conference on Systems, Signals and Devices, 2008.
[8] W.N.I. Al-Obaydy, S.A. Suandi, Open-set single-sample face recognition in video surveillance using fuzzy artmap, Neural Comput. Appl. (2018) 1–8.
[9] A. Ali, S. Hoque, F. Deravi, Gaze stability for liveness detection, Pattern Anal. Appl. 21 (2018) 437–449.
[10] A.A. Alomari, F. Khalid, R.W.O.K. Rahmat, M.T. Abdallah, Expression invariant face recognition using multi-stage 3d face fitting with 3d morphable face model, in: International Conference on Computer Applications and Industrial Electronics, Kuala Lumpur, Malaysia, 2010, pp. 151–154.
[11] N. Alskeini, K. Nguyen, V. Chandran, W. Boles, Face recognition: sparse representation vs. deep learning, in: International Conference on Graphics and Signal Processing, ICGSP 2018, Sydney, Australia, 2018.
[12] B. Amberg, R. Knothe, T. Vetter, Expression invariant 3d face recognition with a morphable model, in: IEEE International Conference on Automatic Face and Gesture Recognition, 2008, pp. 1–6.
[13] Z. An, W. Deng, J. Hu, Y. Zhong, Y. Zhao, APA: adaptive pose alignment for pose-invariant face recognition, IEEE Access 7 (2019) 14653–14670, https://doi.org/10.1109/ACCESS.2019.2894162.
[14] O. Arandjelovic, R. Cipolla, Face recognition from video using the generic shape-illumination manifold, in: European Conference on Computer Vision, Graz, Austria, 2006.
[15] O. Arandjelovic, R. Cipolla, A manifold approach to face recognition from low quality video across illumination and pose using implicit super-resolution, in: IEEE International Conference on Computer Vision, Piscataway, New Jersey, USA, 2007.
[16] O. Arandjelović, R. Cipolla, A methodology for rapid illumination-invariant face recognition using image processing filters, Comput. Vis. Image Underst. 113 (2009) 159–171, https://doi.org/10.1016/j.cviu.2008.06.008, http://www.sciencedirect.com/science/article/pii/S1077314208000933.
[17] M. Bagga, B. Singh, Spoofing detection in face recognition: a review, in: 2016 3rd International Conference on Computing for Sustainable Global Development, INDIACom, 2016, pp. 2037–2042.
[18] X. Bai, B. Yin, Q. Shi, Y. Sun, Face recognition using extended fisherface with 3d morphable model, in: International Conference on Machine Learning and Cybernetics, 2005, pp. 4481–4486.
[19] E. Bailly-Bailliére, S. Bengio, F. Bimbot, M. Hamouz, J. Kittler, J. Mariéthoz, J. Matas, K. Messer, V. Popovici, F. Porée, B. Ruiz, J.P. Thiran, The BANCA database and evaluation protocol, in: Proceedings of the 4th International Conference on Audio- and Video-Based Biometric Person Authentication, Springer-Verlag, Berlin, Heidelberg, 2003, pp. 625–638, http://dl.acm.org/citation.cfm?id=1762222.1762304.
[20] W.A. Bainbridge, P. Isola, A. Oliva, The intrinsic memorability of face photographs, J. Exp. Psychol. Gen. 142 (4) (2013) 1323–1334, https://www.wilmabainbridge.com/facememorability2.html.
[21] S. Balaban, Deep learning and face recognition: the state of the art, in: Proceedings, Biometric and Surveillance Technology for Human and Activity, 2015.
[22] A. Bansal, C.D. Castillo, R. Ranjan, R. Chellappa, The do's and don'ts for CNN-based face verification, CoRR abs/1705.07426, http://arxiv.org/abs/1705.07426, arXiv:1705.07426, 2017.
[23] J. Bao, D. Chen, F. Wen, H. Li, G. Hua, CVAE-GAN: fine-grained image generation through asymmetric training, in: 2017 IEEE International Conference on Computer Vision, ICCV, 2017, pp. 2764–2773.
[24] J. Bao, D. Chen, F. Wen, H. Li, G. Hua, Towards open-set identity preserving face synthesis, in: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 6713–6722.
[25] W. Bao, H. Li, N. Li, W. Jiang, A liveness detection method for face recognition based on optical flow field, in: International Conference on Image Analysis and Signal Processing, Taizhou, China, 2009, pp. 233–236.
[26] J.R. Barr, K.W. Bowyer, P.J. Flynn, S. Biswas, Face recognition from video: a review, Int. J. Pattern Recognit. Artif. Intell. 26 (2012) 1–53, https://doi.org/10.1142/S0218001412660024.
[27] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. 19 (1997) 711–720.
[28] Y. Bengio, J. Louradour, R. Collobert, J. Weston, Curriculum learning, in: Proceedings of the 26th International Conference on Machine Learning, Montreal, Canada, 2009.
[29] J.R. Beveridge, P.J. Phillips, D.S. Bolme, B.A. Draper, G.H. Givens, Y.M. Lui, M.N. Teli, H. Zhang, W.T. Scruggs, K.W. Bowyer, P.J. Flynn, S. Cheng, The challenge of face recognition from digital point-and-shoot cameras, in: IEEE International Conference on Biometrics: Theory, Applications and Systems, BTAS, Arlington, VA, USA, 2013, pp. 1–8, https://www.cs.colostate.edu/~vision/pasc/index.php.
[30] M.J. Black, Y. Yacoob, Recognizing facial expressions in image sequences using local parameterized models of image motion, Int. J. Comput. Vis. 25 (1997) 23–48.
[31] M. Bodini, A review of facial landmark extraction in 2d images and videos using deep learning, Big Data Cogn. Comput. 3 (2019) 14.
[32] H. Bon-Woo, H. Byun, R. Myoung-Cheol, L. Seong-Whan, Performance evaluation of face recognition algorithms on the Asian face database, KFDB, in: J. Kittler, M.S. Nixon (Eds.), Audio- and Video-Based Biometric Person Authentication, Springer, Berlin, Heidelberg, 2003, pp. 557–565.
[33] K.W. Bowyer, K. Chang, P. Flynn, A survey of approaches and challenges in 3d and multi-modal 3d + 2d face recognition, Comput. Vis. Image Underst. 101 (2006) 1–15.
[34] M. Breidt, D.W. Cunningham, C. Wallraven, Max Planck video database, https://vdb.kyb.tuebingen.mpg.de/.
[35] V. Bruce, P.J.B. Hancock, A.M. Burton, Human face perception and identification, in: H. Wechsler, P.J. Phillips, V. Bruce, F.F. Soulie, T.S. Huang (Eds.), Face Recognition: From Theory to Applications, Springer-Verlag, Berlin, Germany, 1998, pp. 51–72.
[36] V. Bruce, A. Young, Understanding face recognition, Br. J. Psychol. 77 (1986) 305–327.
[37] R. Brunelli, T. Poggio, Face recognition through geometrical features, in: European Conference on Computer Vision, Springer, 1992, pp. 792–800.
[38] B. Buyuktas, C.E. Erdem, A.T. Erdem, Curriculum learning for face recognition, in: European Signal Processing Conference, EUSIPCO, 2020.
[39] T.E. de Campos, R.S. Feris, R.M.C. Jr, A framework for face recognition from video sequences using GWN and eigenfeature selection, in: Workshop on Artificial Intelligence and Computer Vision, 2000.
[40] Q. Cao, L. Shen, W. Xie, O.M. Parkhi, A. Zisserman, Vggface2: a dataset for recognising faces across pose and age, CoRR abs/1710.08092, http://arxiv.org/abs/1710.08092, arXiv:1710.08092, 2017.
[41] Z. Cao, Q. Yin, X. Tang, J. Sun, Face recognition with learning based descriptor, in: IEEE International Conference on Computer Vision and Pattern Recognition, 2010, pp. 2707–2714.
[42] O. Celiktutan, S. Ulukaya, B. Sankur, A comparative study of face landmarking techniques, Int. J. Image Video Process. 2013 (2013) 13, https://doi.org/10.1186/1687-5281-2013-13.
[43] F. Cen, G. Wang, Dictionary representation of deep features for occlusion-robust face recognition, IEEE Access 7 (2019) 26595–26605, https://doi.org/10.1109/ACCESS.2019.2901376.
[44] H. Cevikalp, H. Serhan Yavuz, Fast and accurate face recognition with image sets, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 1564–1572.
[45] H. Cevikalp, B. Triggs, Face recognition based on image sets, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 2010, pp. 2567–2573.
[46] W. Chai, W. Deng, H. Shen, Cross-generating GAN for facial identity preserving, in: 2018 13th IEEE International Conference on Automatic Face Gesture Recognition, FG 2018, 2018, pp. 130–134.
[47] C.Y. Chang, C.S. Huang, Application of active appearance model for dual-camera face recognition, in: International Conference on Information Security and Intelligence Control, ISIC, 2012, pp. 333–336.
[48] W. Chao, L. Chang, X. Wang, J. Cheng, X. Deng, F. Duan, High-fidelity face sketch-to-photo synthesis using generative adversarial network, in: 2019 IEEE International Conference on Image Processing, ICIP, 2019, pp. 4699–4703.
[49] R. Chellappa, C.L. Wilson, S. Sirohey, Human and machine recognition of faces: a survey, Proc. IEEE 83 (1995) 705–740.
[50] B. Chen, C. Chen, W.H. Hsu, Face recognition and retrieval using cross-age reference coding with cross-age celebrity dataset, IEEE Trans. Multimed. 17 (2015) 804–815, https://doi.org/10.1109/TMM.2015.2420374, https://bcsiriuschen.github.io/CARC/.
[51] J. Chen, C. Yang, Y. Deng, G. Zhang, G. Su, Exploring facial asymmetry using optical flow, IEEE Signal Process. Lett. 21 (2014) 792–795, https://doi.org/10.1109/LSP.2014.2316918.
[52] K. Chen, S. Gong, T. Xiang, C.C. Loy, Cumulative attribute space for age and crowd density estimation, in: 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 2467–2474, http://www-prima.inrialpes.fr/FGnet/html/benchmarks.html.
[53] L. Chen, Dual linear regression based classification for face cluster recognition, in: IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2673–2680.
[54] L. Chen, H.M. Liao, J. Lin, Person identification using facial motion, in: IEEE International Conference on Image Processing, ICIP, Thessaloniki, Greece, 2001, pp. 677–680.
[55] S. Chen, S. Mau, M.T. Harandi, C. Sanderson, A. Bigdeli, B.C. Lovell, Face recognition from still images to video sequences: a local-feature-based framework, Int. J. Image Video Process. (2011), https://doi.org/10.1155/2011/790598.
[56] W. Chen, K. Wang, H. Jiang, M. Li, Skin color modeling for face detection and segmentation: a review and a new approach, Multimed. Tools Appl. 75 (2016) 839–862, https://doi.org/10.1007/s11042-014-2328-0.
[57] X. Chen, J. Cheng, R. Song, Y. Liu, R. Ward, Z.J. Wang, Video-based heart rate measurement: recent advances and future prospects, IEEE Trans. Instrum. Meas. 68 (2019) 3600–3615.
[58] H. Cheng, Z. Liu, L. Yang, X. Chen, Sparse representation and learning in visual recognition: theory and applications, Signal Process. 93 (2013) 1408–1425, https://doi.org/10.1016/j.sigpro.2012.09.011.
[59] G. Chetty, Robust audio visual biometric person authentication with liveness verification, in: Intelligent Multimedia Analysis for Security Applications, Springer, 2010, pp. 59–78.
[60] G.G. Chrysos, E. Antonakos, P. Snape, A. Asthana, S. Zafeiriou, A comprehensive performance evaluation of deformable face tracking "in-the-wild", Int. J. Comput. Vis. 126 (2018) 198–232, https://doi.org/10.1007/s11263-017-0999-5.
[61] M. Chrzan, Liveness detection for face recognition, Ph.D. thesis, Masarykova Univerzita, Fakulta Informatiky, 2014.
[62] C.H. Chu, Y.K. Feng, Study of eye blinking to improve face recognition for screen unlock on mobile devices, J. Electr. Eng. Technol. 13 (2018) 953–960.
[63] J.F. Cohn, K. Schmidt, R. Gross, P. Ekman, Individual differences in facial expression: stability over time, relation to self-reported emotion, and ability to inform person identification, in: IEEE International Conference on Multimodal Interfaces, Pittsburgh, PA, USA, 2002, pp. 491–496.
[64] T.F. Cootes, G.J. Edwards, C.J. Taylor, Active appearance models, IEEE Trans. Pattern Anal. Mach. Intell. 6 (2001) 681–685, https://doi.org/10.1109/34.927467.
[65] CRIM, CRIM database, http://www.crim.ca/.
[66] N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR'05, vol. 1, 2005, pp. 886–893.
[67] A. Dantcheva, F. Bremond, Gender estimation based on smile dynamics, IEEE Trans. Inf. Forensics Secur. 12 (2017) 719–729.
[68] A. Dantcheva, P. Elia, A. Ross, What else does your biometric data reveal? A survey on soft biometrics, IEEE Trans. Inf. Forensics Secur. 11 (2016) 441–467.
[69] DeepFake, Deepfake detection challenge, https://www.kaggle.com/c/deepfake-detection-challenge.
[70] H. Demirezen, C.E. Erdem, Remote photoplethysmography using nonlinear mode decomposition, in: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, 2018, pp. 1060–1064.
[71] J. Deng, J. Guo, S. Zafeiriou, Arcface: additive angular margin loss for deep face recognition, CoRR abs/1801.07698, http://arxiv.org/abs/1801.07698, arXiv:1801.07698, 2018.
[72] J. Deng, Y. Zhou, S. Zafeiriou, Marginal loss for deep face recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 60–68.
[73] W. Deng, J. Hu, J. Guo, Extended SRC: undersampled face recognition via intraclass variant dictionary, IEEE Trans. Pattern Anal. Mach. Intell. 34 (2012) 1864–1870, https://doi.org/10.1109/TPAMI.2012.30.
[74] W. Deng, J. Hu, J. Guo, Extended SRC: undersampled face recognition via intraclass variant dictionary, IEEE Trans. Pattern Anal. Mach. Intell. 34 (2012) 1864–1870, https://doi.org/10.1109/TPAMI.2012.30.
[75] W. Deng, J. Hu, J. Guo, Face recognition via collaborative representation: its discriminant nature and superposed representation, IEEE Trans. Pattern Anal. Mach. Intell. 40 (2018) 2513–2521, https://doi.org/10.1109/TPAMI.2017.2757923.
[76] W. Deng, J. Hu, J. Guo, Compressive binary patterns: designing a robust binary face descriptor with random-field eigenfilters, IEEE Trans. Pattern Anal. Mach. Intell. 41 (2019) 758–767, https://doi.org/10.1109/TPAMI.2018.2800008.
[77] W. Deng, J. Hu, J. Lu, J. Guo, Transform-invariant PCA: a unified approach to fully automatic face alignment, representation, and recognition, IEEE Trans. Pattern Anal. Mach. Intell. 36 (2014) 1275–1284, https://doi.org/10.1109/TPAMI.2013.194.
[78] O. Deniz, G. Bueno, J. Salido, F.D. la Torre, Face recognition using histograms of oriented gradients, Pattern Recognit. Lett. 32 (2011) 1598–1603, https://doi.org/10.1016/j.patrec.2011.01.004.
[79] O. Deniz, M. Castrillon, M. Hernandez, Face recognition using independent component analysis and support vector machines, Pattern Recognit. Lett. 24 (2003) 2153–2157, https://doi.org/10.1016/S0167-8655(03)00081-3.
[80] H. Dibeklioglu, F. Alnajar, A.A. Salah, T. Gevers, Combining facial dynamics with appearance for age estimation, IEEE Trans. Image Process. 24 (2015) 1928–1943.
[81] H. Dibeklioglu, A.A. Salah, T. Gevers, Are you really smiling at me? Spontaneous versus posed enjoyment smiles, in: European Conference Computer Vision, Springer, Berlin, Heidelberg, 2012, pp. 525–538.
[82] H. Dibeklioglu, A.A. Salah, T. Gevers, Recognition of genuine smiles, IEEE Trans. Multimed. 17 (2015) 279–294.
[83] C. Ding, D. Tao, A comprehensive survey on pose-invariant face recognition, ACM Trans. Intell. Syst. Technol. 7 (2016).
[84] C. Ding, D. Tao, Trunk-branch ensemble convolutional neural networks for video-based face recognition, IEEE Trans. Pattern Anal. Mach. Intell. 40 (2018) 1002–1014, https://doi.org/10.1109/TPAMI.2017.2700390.
[85] Z. Ding, Y. Guo, L. Zhang, Y. Fu, One-shot face recognition via generative learning, in: 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition, FG 2018, IEEE, 2018, pp. 1–7.
[86] T. Do, E. Kijak, Face recognition using co-occurrence histograms of oriented gradients, in: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, Kyoto, Japan, 2012, pp. 1301–1304.
[87] Y. Dong, L. Zhen, L. Shengcai, Z.L. Stan, Learning face representation from scratch, CoRR abs/1411.7923, http://arxiv.org/abs/1411.7923, 2014.
[88] S. Eberz, K.B. Rasmussen, V. Lenders, I. Martinovic, Evaluating behavioral biometrics for continuous authentication: challenges and metrics, in: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, ACM, 2017, pp. 386–399.
[89] R. Ejbali, M. Zaied, C.B. Amar, Face recognition based on beta 2d elastic bunch graph matching, in: 13th International Conference on Hybrid Intelligent Systems, HIS 2013, 2013, pp. 88–92.
[90] M. Elad, Sparse and redundant representation modeling-what next?, IEEE Signal Process. Lett. 19 (2012) 922–928, https://doi.org/10.1109/LSP.2012.2224655.
[91] A. ElSayed, A. Mahmood, T. Sobh, Effect of super resolution on high dimensional features for unsupervised face recognition in the wild, in: IEEE Applied Imagery Pattern Recognition Workshop, AIPR, Washington, DC, USA, 2017.
[92] C.E. Erdem, C. Turan, Z. Aydin, Baum-2: a multilingual audio-visual affective face database, Multimed. Tools Appl. 74 (2015) 7429–7459.
[93] C.E. Erdem, S. Ulukaya, A. Karaali, A.T. Erdem, Combining Haar feature and skin color based classifiers for face detection, in: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2011, Prague, 2011, pp. 22–57.
[94] N. Erdogmus, S. Marcel, Spoofing face recognition with 3d masks, IEEE Trans. Inf. Forensics Secur. 9 (2014) 1084–1097, https://doi.org/10.1109/TIFS.2014.2322255.
[95] K. Fan, T. Hung, A novel local pattern descriptor-local vector pattern in high-order derivative space for face recognition, IEEE Trans. Image Process. 23 (2014) 2877–2891, https://doi.org/10.1109/TIP.2014.2321495.
[96] M.I. Faraj, J. Bigun, Audio-visual person authentication using lip-motion from orientation maps, Pattern Recognit. Lett. 28 (2007) 1368–1382, https://doi.org/10.1016/j.patrec.2007.02.017.
[97] S.S. Farfade, M.J. Saberian, L.J. Li, Multi-view face detection using deep convolutional neural networks, in: Proc. ACM Int. Conf. Multimedia Retrievals, 2015, pp. 643–650.
[98] A. Franco, D. Maio, F. Turroni, Spatio-temporal keypoints for video-based face recognition, in: International Conference on Pattern Recognition, Stockholm, Sweden, 2014, pp. 489–494.
[99] J. Galbally, S. Marcel, J. Fierrez, Biometric antispoofing methods: a survey in face recognition, IEEE Access 2 (2014) 1530–1552.
[100] M.J. Gangeh, A.K. Farahat, A. Ghodsi, M.S. Kamel, Supervised dictionary learning and sparse representation-a review, CoRR abs/1502.05928, http://arxiv.org/abs/1502.05928, arXiv:1502.05928, 2015.
[101] G. Gao, J. Yang, X.Y. Jing, F. Shen, W. Yang, D. Yue, Learning robust and discriminative low-rank representations for face recognition with occlusion, Pattern Recognit. 66 (2017) 129–143.
[102] G. Gao, J. Yang, S. Wu, X. Jing, D. Yue, Bayesian sample steered discriminative regression for biometric image classification, Appl. Soft Comput. 37 (2015) 48–59.
[103] G. Gao, Y. Yu, M. Yang, H. Chang, P. Huang, D. Yue, Cross-resolution face recognition with pose variations via multilayer locality-constrained structural orthogonal procrustes regression, Inf. Sci. 506 (2020) 19–36.
[104] W. Gao, B. Cao, S. Shan, X. Chen, D. Zhou, X. Zhang, D. Zhao, The cas-peal large-scale Chinese face database and baseline evaluations, IEEE Trans. Syst. Man Cybern., Part A, Syst. Hum. 38 (2008) 149–161, https://doi.org/10.1109/TSMCA.2007.909557, http://www.jdl.ac.cn/peal/.
[105] M. Gavrilescu, Study on using individual differences in facial expressions for a face recognition system immune to spoofing attacks, IET Biometrics 5 (2016) 236–242, https://doi.org/10.1049/iet-bmt.2015.0078.
[106] S. Ge, S. Zhao, C. Li, J. Li, Low-resolution face recognition in the wild via selective knowledge distillation, IEEE Trans. Image Process. 28 (2019) 2051–2062, https://doi.org/10.1109/TIP.2018.2883743.
[107] A. Georghiades, P. Belhumeur, D. Kriegman, Yale face database, [Online] http://cvc.yale.edu/projects/yalefaces/yalefa, 1997.
[108] A.S. Georghiades, P.N. Belhumeur, D.J. Kriegman, From few to many: illumination cone models for face recognition under variable lighting and pose, IEEE Trans. Pattern Anal. Mach. Intell. 23 (2001) 643–660, https://doi.org/10.1109/34.927464, http://vision.ucsd.edu/content/extended-yale-face-database-b-b.
[109] M.M. Ghazi, H.K. Ekenel, A comprehensive analysis of deep learning based representation for face recognition, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW, 2016, pp. 102–109.
[110] R. Goh, L. Liu, X. Liu, T. Chen, The CMU face in action (FIA) database, in: International Workshop on Analysis and Modeling of Faces and Gestures, Berlin, 2005, pp. 255–263.
[111] B. Gokberk, H. Dutağacı, A. Ulaş, L. Akarun, B. Sankur, Representation plurality and fusion for 3-d face recognition, IEEE Trans. Syst. Man Cybern., Part B, Cybern. 38 (2008) 155–173, https://doi.org/10.1109/TSMCB.2007.908865.
[112] B. Gokberk, M.O. Irfanoglu, L. Akarun, 3d shape-based face representation and feature extraction for face recognition, Image Vis. Comput. 24 (2006) 857–869, https://doi.org/10.1016/j.imavis.2006.02.009, http://www.sciencedirect.com/science/article/pii/S0262885606000928.
[113] F. Gosselin, P.G. Schyns, Bubbles: a technique to reveal the use of information in recognition tasks, Vis. Res. 41 (2001) 2261–2271, https://doi.org/10.1016/S0042-6989(01)00097-9.
[114] D.B. Graham, N.M. Allinson, Characterizing virtual eigensignatures for general purpose face recognition, in: H. Wechsler, P.J. Phillips, V. Bruce, F. Fogelman-Soulie, T.S. Huang (Eds.), Face Recognition: From Theory to Applications, in: NATO ASI Series F, Computer and Systems Sciences, vol. 163, 1998, pp. 446–456, http://images.ee.umist.ac.uk/danny/database.html.
[115] K. Grm, V. Struc, A. Artiges, M. Caron, H.K. Ekenel, Strengths and weaknesses of deep learning models for face recognition against image degradations, IET Biometrics 7 (2018) 81–89.
[116] R. Gross, I. Matthews, J. Cohn, T. Kanade, S. Baker, Multi-PIE, Image Vis. Comput. 28 (2010) 807–813, https://doi.org/10.1016/j.imavis.2009.08.002, http://www.cs.cmu.edu/afs/cs/project/PIE/MultiPie/Multi-Pie/Home.html, Special issue: Best of Automatic Face and Gesture Recognition 2008.
[117] R. Gross, J. Shi, The CMU Motion of Body (MoBo) Database, Technical Report CMU-RI-TR-01-18, Carnegie Mellon University, Pittsburgh, PA, 2001, https://www.ri.cmu.edu/publications/the-cmu-motion-of-body-mobo-database/.
[118] B.K. Gunturk, A.U. Batur, Y. Altunbasak, M.H. Hayes, R.M. Mersereau, Eigenface-domain super-resolution for face recognition, IEEE Trans. Image Process. 12 (2003) 597–606, https://doi.org/10.1109/TIP.2003.811513.
[119] G. Guo, N. Zhang, A survey on deep learning based face recognition, Comput. Vis. Image Underst. 189 (2019) 102805.
[120] Y. Guo, L. Zhang, One-shot face recognition by promoting underrepresented classes, arXiv preprint arXiv:1707.05574, 2017.
[121] Y. Guo, L. Zhang, Y. Hu, X. He, J. Gao, MS-Celeb-1M: a dataset and benchmark for large-scale face recognition, CoRR abs/1607.08221, https://www.microsoft.com/en-us/research/publication/ms-celeb-1m-dataset-benchmark-large-scale-face-recognition-2/, 2016.
[122] R.E. Haamer, K. Kulkarni, N. Imanpour, M.A. Haque, E. Avots, M. Breisch, K. Nasrollahi, S. Escalera, C. Ozcinar, X. Baro, A.R. Naghsh-Nilchi, T.B. Moeslund, G. Anbarjafari, Changes in facial expression as biometric: a database and benchmarks of identification, in: IEEE Conf. on Automatic Face and Gesture Recognition Workshop, China, 2018, pp. 621–628.
[123] A. Hadid, Face biometrics under spoofing attacks: vulnerabilities, countermeasures, open issues, and research directions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2014, pp. 113–118.
[124] A. Hadid, J. Dugelay, M. Pietikainen, On the use of dynamic features in face biometrics: recent advances and challenges, Signal Image Video Process. 5 (2011) 495–506, https://doi.org/10.1007/s11760-011-0247-3.
[125] A. Hadid, M. Pietikainen, An experimental investigation about the integration of facial dynamics in video-based face recognition, Electron. Lett. Comput. Vis. Image Anal. 5 (2005) 1–13, https://doi.org/10.5565/rev/elcvia.80.
[126] A. Hadid, M. Pietikainen, Combining appearance and motion for face and gender recognition from videos, Pattern Recognit. 42 (2009) 2818–2827, https://doi.org/10.1016/j.patcog.2009.02.011.
[127] A. Hadid, M. Pietikáinen, Manifold learning for video-to-video face recognition, in: J. Fierrez, J. Ortega-Garcia, A. Esposito, A. Drygajlo, M. Faundez-Zanuy (Eds.), Biometric ID Management and Multimodal Communication, BioID 2009, in: Lecture Notes in Computer Science, vol. 5707, Springer, Berlin, Heidelberg, 2009.
[128] A. Hadid, M. Pietikainen, S. Li, Learning personal specific facial dynamics for face recognition from videos, in: Lecture Notes in Computer Science, 2007, pp. 1–15.
[129] F. Hajati, M. Tavakolian, S. Gheisari, Y. Gao, A.S. Mian, Dynamic texture comparison using derivative sparse representation: application to video-based face recognition, IEEE Trans. Human-Mach. Syst. 47 (2017) 970–982, https://doi.org/10.1109/THMS.2017.2681425.
[130] P.W. Hallinan, A Deformable Model for the Recognition of Human Faces Under Arbitrary Illumination, Ph.D. thesis, Harvard University, Cambridge, MA, USA, 1995, ftp://ftp.hrl.harvard.edu/pub/faces.
[131] E.M. Hand, R. Chellappa, Attributes for improved attributes: a multi-task network utilizing implicit and explicit relationships for facial attribute classification, in: Proc. AAAI Conf. Artificial Intelligence, 2017, pp. 4068–4074.
[132] M.A. Haque, K. Nasrollahi, T.B. Moeslund, Pain expression as a biometric: why patients' self-reported pain doesn't match with the objectively measured pain?, in: IEEE International Conference on Identity, Security and Behavior Analysis, ISBA, New Delhi, India, 2017.
[133] M.T. Harandi, C. Sanderson, S. Shirazi, B.C. Lovell, Graph embedding discriminant analysis on grassmannian manifolds for improved image set matching, in: CVPR 2011, IEEE, 2011, pp. 2705–2712.
[134] M. Hasnat, J. Bohné, J. Milgram, S. Gentric, L. Chen, et al., von Mises-Fisher mixture model-based deep learning: application to face verification, arXiv preprint arXiv:1706.04264, 2017.
[135] M. Hassaballah, S. Aly, Face recognition: challenges, achievements and future directions, IET Comput. Vis. 9 (2015) 614–626.
[136] S.M. Hatture, P. Karchi, Prevention of spoof attack in biometric system using liveness detection, Int. J. Latest Trends Eng. Technol. (2013) 42–49.
[137] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2016, pp. 770–778.
[138] L. He, H. Li, Q. Zhang, Z. Sun, Dynamic feature matching for partial face recognition, IEEE Trans. Image Process. 28 (2019) 791–802, https://doi.org/10.1109/TIP.2018.2870946.
[139] X. He, S. Yan, Y. Hu, P. Niyogi, H. Zhang, Face recognition using Laplacian faces, IEEE Trans. Pattern Anal. Mach. Intell. 27 (2005) 328–340, https://doi.org/10.1109/TPAMI.2005.55.
[140] H. Hill, V. Bruce, Effects of lighting on matching facial surfaces, J. Exp. Psychol. Hum. Percept. Perform. 22 (1996) 986–1004.
[141] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Comput. 9 (1997) 1735–1780, https://doi.org/10.1162/neco.1997.9.8.1735.
[142] C. Hsieh, S. Lai, Y. Chen, 2d expression-invariant face recognition with constrained optical flow, in: IEEE International Conference on Multimedia and Expo, New York, NY, USA, 2009, pp. 1058–1061.
[143] C. Hsieh, S. Lai, Y. Chen, An optical flow-based approach to robust face recognition under expression variations, IEEE Trans. Image Process. 19 (2010) 233–240, https://doi.org/10.1109/TIP.2009.2031233.
[144] C. Hu, X. Lu, P. Liu, X. Jing, D. Yue, Single sample face recognition under varying illumination via qrcp decomposition, IEEE Trans. Image Process. 28 (2019) 2624–2638, https://doi.org/10.1109/TIP.2018.2887346.
[145] J. Hu, L. Shen, G. Sun, Squeeze-and-excitation networks, in: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 7132–7141.
[146] P. Hu, D. Ramanan, Finding tiny faces, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2017, pp. 1522–1530.
[147] Y. Hu, A.S. Mian, R. Owens, Face recognition using sparse approximated nearest points between image sets, IEEE Trans. Pattern Anal. Mach. Intell. 34 (2012) 1992–2004.
[148] C. Huang, Y. Li, C.L. Chen, X. Tang, Deep imbalanced learning for face recognition and attribute prediction, in: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
[149] D. Huang, C. Shan, M. Ardabilian, Y. Wang, L. Chen, Local binary patterns and its application to facial image analysis: a survey, IEEE Trans. Syst. Man Cybern., Part C, Appl. Rev. 41 (2011) 765–781, https://doi.org/10.1109/TSMCC.2011.2118750.
[150] G.B. Huang, M. Mattar, T. Berg, E. Learned-Miller, Labeled faces in the wild: a database for studying face recognition in unconstrained environments, in: Workshop on Faces in 'Real-Life' Images: Detection, Alignment, and Recognition, Erik Learned-Miller and Andras Ferencz and Frédéric Jurie, Marseille, France, 2008, https://hal.inria.fr/inria-00321923.
[151] Y. Huang, S. Chen, A geometrical-model-based face recognition, in: IEEE International Conference on Image Processing, ICIP, Quebec City, QC, Canada, 2015, pp. 3106–3110.
[152] Z. Huang, S. Shan, R. Wang, H. Zhang, S. Lao, A. Kuerban, X. Chen, A benchmark and comparative study of video-based face recognition on Cox face database, IEEE Trans. Image Process. 24 (2015) 5967–5981, https://doi.org/10.1109/TIP.2015.249344.
[153] Z. Huang, S. Shan, H. Zhang, H. Lao, A. Kuerban, X. Chen, Benchmarking still-to-video face recognition via partial and local linear discriminant analysis on COX-S2V dataset, in: Asian Conference on Computer Vision, Daejeon, Korea, 2012, pp. 589–600.
[154] Z. Huang, R. Wang, S. Shan, X. Chen, Projection metric learning on Grassmann manifold with application to video based face recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 140–149.
[155] Z. Huang, X. Zhao, S. Shan, R. Wang, X. Chen, Coupling alignments with recognition for still-to-video face recognition, in: IEEE International Conference on Computer Vision, Sydney, NSW, Australia, 2013, pp. 3296–3303.
[156] A. Jadhav, V.P. Namboodiri, K. Venkatesh, Deep attributes for one-shot face recognition, in: European Conference on Computer Vision, Springer, 2016, pp. 516–523.
[157] R. Jafri, H.R. Arabnia, A survey of face recognition techniques, J. Inf. Process. Syst. 5 (2009) 41–68.
[158] A.K. Jain, A.A. Ross, K. Nandakumar, Introduction to Biometrics, Springer, 2011.
[159] V. Jain, A. Mukherjee, The Indian face database, http://vis-www.cs.umass.edu/~vidit/IndianFaceDatabase/, 2002.
[160] R. Jenkins, A.M. Burton, 100% accuracy in automatic face recognition, Science 319 (2008) 435, https://doi.org/10.1126/science.1149656.
[161] O. Jesorsky, K.J. Kirchberg, R.W. Frischholz, Robust face detection using the Hausdorff distance, in: J. Bigun, F. Smeraldi (Eds.), Audio- and Video-Based Biometric Person Authentication, Springer, Berlin, Heidelberg, 2001, pp. 90–95, http://www.humanscan.de/support/downloads/facedb.php.
[162] H. Jiang, E. Learned-Miller, Face detection with the faster R-CNN, in: 2017 12th IEEE International Conference on Automatic Face Gesture Recognition, FG 2017, 2017, pp. 650–657.
[163] X. Jin, X. Tan, Face alignment in-the-wild: a survey, Comput. Vis. Image Underst. 162 (2017) 1–22.
[164] T. Kanade, Computer Recognition of Human Faces, Birkhauser Verlag, Basel und Stuttgart, 1977.
[165] T. Kanade, J.F. Cohn, Y. Tian, Comprehensive database for facial expression analysis, in: IEEE International Conference on Automatic Face and Gesture Recognition, 2000, pp. 46–53.
[166] C. Kant, N. Sharma, Fake face recognition using fusion of thermal imaging and skin elasticity, Int. J. Comput. Sci. Commun. 4 (2013) 65–72.
[167] A. Kar, P.P.G. Neogi, Triangular coil pattern of local radius of gyration face for heterogeneous face recognition, Appl. Intell. 50 (2020) 698–716.
[168] A.L. Kashyap, S. Tulyakov, V. Govindaraju, Facial behavior as a soft biometric, in: Proceedings of IAPR International Conference on Biometrics, ICB, New Delhi, India, 2012, pp. 147–151.
[169] V. Kazemi, J. Sullivan, One millisecond face alignment with an ensemble of regression trees, in: IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1867–1874.
[170] N. Kela, A. Rattani, P. Gupta, Illumination invariant elastic bunch graph matching for efficient face recognition, in: IEEE International Conference on Computer Vision and Pattern Recognition Workshop, CVPRW, 2006.
[171] M.A. Khan, C. Xydeas, H. Ahmed, Multi-model aam framework for face image modeling, in: 18th International Conference on Digital Signal Processing, DSP, 2013.
[172] M. Killioglu, M. Taskiran, N. Kahraman, Anti-spoofing in face recognition with liveness detection using pupil tracking, in: 2017 IEEE 15th International Symposium on Applied Machine Intelligence and Informatics, SAMI, 2017, pp. 000087–000092.
[173] K.I. Kim, K. Jung, H.J. Kim, Face recognition using kernel principal component analysis, IEEE Signal Process. Lett. 9 (2002) 40–42, https://doi.org/10.1109/97.991133.
[174] M. Kim, S. Kumar, V. Pavlovic, H. Rowley, Face tracking and recognition with visual constraints in real-world videos, in: 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008, pp. 1–8.
[175] S.T. Kim, D.H. Kim, Y.M. Ro, Spatio-temporal representation for face authentication by using multi-task learning with human attributes, in: IEEE International Conference on Image Processing, ICIP, Phoenix, AZ, USA, 2016, pp. 2996–3000.
[176] S.T. Kim, Y.M. Ro, Attended relation feature representation of facial dynamics for facial authentication, IEEE Trans. Inf. Forensics Secur. 14 (2019) 1768–1778.
[177] T. Kim, J. Kittler, R. Cipolla, Discriminative learning and recognition of image set classes using canonical correlations, IEEE Trans. Pattern Anal. Mach. Intell. 29 (2007) 1005–1018, https://doi.org/10.1109/TPAMI.2007.1037.
[178] T.K. Kim, J. Kittler, Locally linear discriminant analysis for multimodally distributed classes for face recognition with a single model image, IEEE Trans. Pattern Anal. Mach. Intell. 27 (2005) 318–327, https://doi.org/10.1109/TPAMI.2005.58.
[179] T.K. Kim, J. Kittler, R. Cipolla, On-line learning of mutually orthogonal subspaces for face recognition by image sets, IEEE Trans. Image Process. 19 (2010) 1067–1074.
[180] B.F. Klare, B. Klein, E. Taborsky, A. Blanton, J. Cheney, K. Allen, P. Grother, A. Mah, M. Burge, A.K. Jain, Pushing the frontiers of unconstrained face detection and recognition: IARPA Janus Benchmark A, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1931–1939, https://www.nist.gov/itl/iad/image-group/ijb-dataset-request-form.
[181] B. Knappmeyer, I.M. Thornton, H.H. Bulthoff, The use of facial motion and facial form during the processing of identity, Vis. Res. 43 (2003) 1921–1936, https://doi.org/10.1016/S0042-6989(03)00236-0.
[182] B. Knight, A. Johnston, The role of movement in face recognition, Vis. Cogn. 4 (1997) 265–273, https://doi.org/10.1080/713756764.
[183] K. Kollreider, H. Fronthaler, M.I. Faraj, J. Bigun, Real-time face detection and motion analysis with application in "liveness" assessment, IEEE Trans. Inf. Forensics Secur. 2 (2007) 548–558.
[184] A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional neural networks, in: Conf. on Neural Information Processing Systems, NIPS, 2012, pp. 1097–1105.
[185] V. Kruger, A. Happe, G. Sommer, Affine real-time face tracking using a wavelet network, in: Proceedings International Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, in Conjunction with ICCV'99, Corfu, Greece, 1999, pp. 141–148.
[186] M. Lades, J. Vorbruggen, J. Buhmann, J. Lange, C. Malsburg, R. Wurtz, W. Konen, Distortion invariant object recognition in the dynamic link architecture, IEEE Trans. Comput. 42 (1993) 300–311.
[187] A. Lagorio, M. Tistarelli, M. Cadoni, C. Fookes, S. Sridharan, Liveness detection based on 3d face shape analysis, in: 2013 International Workshop on Biometrics and Forensics, IWBF, IEEE, 2013, pp. 1–4.
[188] G. Lavrentyeva, O. Kudashev, A. Melnikov, M. De Marsico, Y. Matveev, Interactive photo liveness for presentation attacks detection, in: International Conference Image Analysis and Recognition, Springer, 2018, pp. 252–258.
[189] K. Lee, J. Ho, M. Yang, D. Kriegman, Visual tracking and recognition using probabilistic appearance manifolds, Computer Vision and Image Understanding, http://vision.ucsd.edu/~leekc/HondaUCSDVideoDatabase/HondaUCSD.html, 2005.
[190] Z. Lei, M. Pietikainen, S.Z. Li, Learning discriminant face descriptor, IEEE Trans. Pattern Anal. Mach. Intell. 36 (2014) 289–302.
[191] G. Lenz, S.H. Ieng, R. Benosman, High speed event-based face detection and tracking in the blink of an eye, arXiv preprint arXiv:1803.10106, 2018.
[192] H. Li, P. He, S. Wang, A. Rocha, X. Jiang, A.C. Kot, Learning generalized deep feature representation for face anti-spoofing, IEEE Trans. Inf. Forensics Secur. 13 (2018) 2639–2652.
[193] H. Li, Z. Lin, X. Shen, J. Brandt, G. Hua, A convolutional neural network cascade for face detection, in: Proc. IEEE Conf. Computer Vision Pattern Recognition, 2015, pp. 5325–5334.
[194] S.Z. Li, Z. Lei, M. Ao, The HFB face database for heterogeneous face biometrics research, in: 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2009, pp. 1–8.
[195] S.Z. Li, D. Yi, Z. Lei, S. Liao, The CASIA NIR-VIS 2.0 face database, in: IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2013, pp. 348–353, https://pypi.org/project/bob.db.cbsr-nir-vis-2/.
[196] Y. Li, B. Sun, T. Wu, Y. Wang, Face detection with end-to-end integration of a ConvNet and a 3d model, in: Proc. European Conf. Computer Vision, 2016, pp. 420–436.
[197] S. Liao, A.C.S. Chung, Face recognition with salient local gradient orientation binary patterns, in: 16th IEEE International Conference on Image Processing, ICIP, Cairo, Egypt, 2009, pp. 3317–3320.
[198] L. Lin, K. Wang, D. Meng, W. Zuo, L. Zhang, Active self-paced learning for cost-effective and progressive face identification, IEEE Trans. Pattern Anal. Mach. Intell. 40 (2018) 7–19, https://doi.org/10.1109/TPAMI.2017.2652459.
[199] Y. Lin, S. Cheng, J. Shen, M. Pantic, Mobiface: a novel dataset for mobile face tracking in the wild, in: IEEE Conf. Automatic Face and Gesture Recognition, FG, 2019.
[200] Lina, T. Takahashi, I. Ide, H. Murase, Incremental unsupervised-learning of appearance manifold with view-dependent covariance matrix for face recognition from video sequences, IEICE Trans. Inf. Syst. E92.D (2009) 642–652, https://doi.org/10.1587/transinf.E92.D.642.
[201] A. Liu, J. Wan, S. Escalera, H. Jair Escalante, Z. Tan, Q. Yuan, K. Wang, C. Lin, G. Guo, I. Guyon, et al., Multi-modal face anti-spoofing attack detection challenge at CVPR2019, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019.
[202] C. Liu, H. Wechsler, Comparative assessment of independent component analysis (ICA) for face recognition, in: The Second International Conference on Audio and Video Based Biometric Person Authentication, 1999, pp. 1–6.
[203] C. Liu, H. Wechsler, Independent component analysis of Gabor features for face recognition, IEEE Trans. Neural Netw. 14 (2003) 919–928.
[204] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.Y. Fu, A.C. Berg, SSD: single shot multibox detector, in: Proc. European Conf. Computer Vision, 2016, pp. 21–37.
[205] W. Liu, Y. Wen, Z. Yu, M. Li, B. Raj, L. Song, Sphereface: deep hypersphere embedding for face recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 212–220.
[206] W. Liu, Y. Wen, Z. Yu, M. Yang, Large-margin softmax loss for convolutional neural networks, in: International Conference on Machine Learning, 2016.
[207] W. Liu, Y.M. Zhang, X. Li, Z. Yu, B. Dai, T. Zhao, L. Song, Deep hyperspherical learning, in: Advances in Neural Information Processing Systems, 2017, pp. 3950–3960.
[208] X. Liu, T. Chen, Face mosaicing for pose robust video-based recognition, in: Asian Conference on Computer Vision, Tokyo, Japan, 2007, pp. 662–671.
[209] X. Liu, T. Cheng, Video-based face recognition using adaptive hidden Markov models, in: IEEE Conference on Computer Vision and Pattern Recognition, Madison, WI, USA, 2003.
[210] Y. Liu, H. Li, X. Wang, Rethinking feature discrimination and polymerization for large-scale recognition, arXiv preprint arXiv:1710.00870, 2017.
[211] Y. Liu, K.L. Schmidt, J.F. Mitra, Facial asymmetry quantification for expression invariant human identification, Comput. Vis. Image Underst. 91 (2003) 138–159, https://doi.org/10.1016/S1077-3142(03)00078-X.
[212] Y. Liu, J. Stehouwer, A. Jourabloo, X. Liu, Deep tree learning for zero-shot face anti-spoofing, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 4680–4689.
[213] D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. 60 (2004) 91–110, https://doi.org/10.1023/B:VISI.0000029664.99615.94.
[214] J. Lu, K.N. Plataniotis, A.N. Venetsanopoulos, Face recognition using kernel direct discriminant analysis algorithms, IEEE Trans. Neural Netw. 14 (2003) 117–126, https://doi.org/10.1109/TNN.2002.806629.
[215] J. Lu, K.N. Plataniotis, A.N. Venetsanopoulos, Regularization studies of linear discriminant analysis in small sample size scenarios with application to face recognition, Pattern Recognit. Lett. 26 (2005) 181–191, https://doi.org/10.1016/j.patrec.2004.09.014.
[216] J. Lu, G. Wang, P. Moulin, Image set classification using holistic multiple order statistics features and localized multi-kernel metric learning, in: 2013 IEEE International Conference on Computer Vision, 2013, pp. 329–336.
[217] J. Lu, G. Wang, J. Zhou, Simultaneous feature and dictionary learning for image set based face recognition, IEEE Trans. Image Process. 26 (2017) 4042–4054.
[218] P. Lucey, J.F. Cohn, K.M. Prkachin, P.E. Solomon, I. Matthews, Painful data: the unbc-mcmaster shoulder pain expression archive database, in: IEEE International Conference on Automatic Face and Gesture Recognition Workshops, FG, Santa Barbara, CA, USA, 2011, pp. 57–64.
[219] M.J. Lyons, S. Akamatsu, M. Kamachi, J. Gyoba, J. Budynek, The Japanese female facial expression (JAFFE) database, in: IEEE International Conference on Automatic Face and Gesture Recognition, 1998, pp. 14–16, https://zenodo.org/record/3451524#.Xrp3nWgzaUk.
[220] M.A. Khan, C. Xydeas, H. Ahmed, On the application of AAM-based systems in face recognition, in: European Signal Processing Conference, EUSIPCO, 2014, pp. 2445–2449.
[221] H. Mady, S.M.S. Hilles, Face recognition and detection using random forest and combination of LBP and HOG features, in: International Conference on Smart Computing and Electronic Enterprise, ICSCEE, Shah Alam, Malaysia, 2018.
[222] H.H. Mady, S.M. Hilles, Efficient real time attendance system based on face detection case study "mediu staff", Int. J. Contemp. Comput. Res. 1 (2017) 21–25.
[223] Z. Mahmood, N. Muhammad, N. Bibi, T. Ali, A review on state-of-the-art face recognition approaches, Fractals 25 (2017).
[224] S. Marcel, C. McCool, P. Matéjka, T. Ahonen, J. Ćernocký, S. Chakraborty, et al., On the results of the first mobile biometry (MOBIO) face and speaker verification evaluation, in: Recognizing Patterns in Signals, Speech, Images and Videos, ICPR 2010, 2010, pp. 210–225.
[225] O. Martin, I. Kotsia, B. Macq, I. Pitas, P. Levant, The enterface'05 audio-visual emotion database, in: International Conference on Data Engineering Workshops, Atlanta, GA, USA, 2006, pp. 2–9.
[226] A. Martinez, R. Benavente, The AR Face Database, Technical Report 24, CVC Technical Report, http://www2.ece.ohio-state.edu/~aleix/ARdatabase.html, 1998.
[227] B. Martinez, M.F. Valstar, B. Jiang, Automatic analysis of facial actions: a survey, IEEE Trans. Affect. Comput. 13 (2017).
[228] I. Masi, F. Chang, J. Choi, S. Harel, J. Kim, K. Kim, J. Leksut, S. Rawls, Y. Wu, T. Hassner, W. AbdAlmageed, G. Medioni, L. Morency, P. Natarajan, R. Nevatia, Learning pose-aware models for pose-invariant face recognition in the wild, IEEE Trans. Pattern Anal. Mach. Intell. 41 (2019) 379–393, https://doi.org/10.1109/TPAMI.2018.2792452.
[229] I. Masi, T. Hassner, A.T. Tran, G. Medioni, Rapid synthesis of massive face sets for improved face recognition, in: 2017 12th IEEE International Conference on Automatic Face Gesture Recognition, FG 2017, 2017, pp. 604–611.
[230] I. Masi, A.T. Trán, T. Hassner, J.T. Leksut, G. Medioni, Do we really need to collect millions of faces for effective face recognition?, in: B. Leibe, J. Matas, N. Sebe, M. Welling (Eds.), Computer Vision – ECCV 2016, Springer International Publishing, Cham, 2016, pp. 579–596.
[231] I. Masi, Y. Wu, T. Hassner, P. Natarajan, Deep face recognition: a survey, in: SIBGRAPI – Conference on Graphics, Patterns and Images, 2018.
[232] F. Matta, J. Dugelay, A behavioural approach to person recognition, in: IEEE International Conference on Multimedia and Expo, 2006, pp. 1461–1464.
[233] F. Matta, J. Dugelay, Video face recognition: a physiological and behavioural multimodal approach, in: IEEE International Conference on Image Processing, San Antonio, TX, USA, 2007, pp. 497–500.
[234] F. Matta, J. Dugelay, Person recognition using facial video information: a state of the art, J. Vis. Lang. Comput. 20 (2009) 180–187, https://doi.org/10.1016/j.jvlc.2009.01.002.
[235] B. Maze, J. Adams, J.A. Duncan, N. Kalka, T. Miller, C. Otto, A.K. Jain, W.T. Niggel, J. Anderson, J. Cheney, P. Grother, Iarpa Janus benchmark - C: face dataset and protocol, in: 2018 International Conference on Biometrics (ICB), 2018, pp. 158–165, https://www.nist.gov/itl/iad/ig/ijb-c-dataset-request-form.
[236] C. McCool, S. Marcel, MOBIO Database for the ICPR 2010 Face and Speech Competition, Idiap-Com Idiap-Com-02-2009, Idiap, https://www.idiap.ch/dataset/mobio, 2009.
[237] K. Meena, A. Suruliandi, Local binary patterns and its variants for face recognition, in: International Conference on Recent Trends in Information Technology, ICRTIT, Chennai, Tamil Nadu, India, 2011, pp. 782–786.
[238] H. Mendez-Vazquez, Y. Martinez-Diaz, Z. Chai, Volume structured ordinal features with background similarity measure for video face recognition, in: International Conference on Biometrics, ICB, Madrid, Spain, 2013, pp. 1–6.
[239] K. Messer, J. Kittler, M. Sadeghi, S. Marcel, C. Marcel, S. Bengio, F. Cardinaux, C. Sanderson, J. Czyz, L. Vandendorpe, et al., Face verification competition on
[244] J. Mohapatra, T.W. Weng, P.Y. Chen, S. Liu, L. Daniel, Towards verifying robustness of neural networks against semantic perturbations, in: IEEE Conf. Computer Vision and Pattern Recognition, CVPR, 2020.
[245] F. Mokhayeri, E. Granger, A paired sparse representation model for robust face recognition from a single sample, Pattern Recognit. 100 (2020) 107129.
[246] F. Mokhayeri, E. Granger, G. Bilodeau, Domain-specific face synthesis for video face recognition from a single sample per person, IEEE Trans. Inf. Forensics Secur. 14 (2019) 757–772, https://doi.org/10.1109/TIFS.2018.2866295.
[247] H. Murase, S.K. Nayar, Visual learning and recognition of 3-d objects from appearance, Int. J. Comput. Vis. 14 (1995) 5–24.
[248] C. Nagpal, S.R. Dubey, A performance evaluation of convolutional neural networks for face anti spoofing, in: 2019 International Joint Conference on Neural Networks, IJCNN, IEEE, 2019, pp. 1–8.
[249] A. Nech, I. Kemelmacher-Shlizerman, Level playing field for million scale face recognition, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2017, pp. 3406–3415, http://megaface.cs.washington.edu/.
[250] A. Nefian, M.H. Hayes III, A hidden Markov model-based approach for face detection and recognition, Ph.D. thesis, School of Electrical and Computer Engineering, Georgia Institute of Technology, 1999, http://www.anefian.com/research/face_reco.htm.
[251] O. Nikisins, M. Greitans, Reduced complexity automatic face recognition algorithm based on local binary patterns, in: 19th International Conference on Systems, Signals and Image Processing, IWSSIP, Vienna, Austria, 2012, pp. 433–436.
[252] Y. Ning, T. Sim, Smile, you're on identity camera, in: International Conference on Pattern Recognition, Tampa, FL, USA, 2008.
[253] K.A. Nixon, V. Aimale, R.K. Rowe, Spoof detection schemes, in: Handbook of Biometrics, Springer, 2008, pp. 403–423.
[254] E.M. Nowara, A. Sabharwal, A. Veeraraghavan, Ppgsecure: biometric presentation attack detection using photopletysmograms, in: 2017 12th IEEE International Conference on Automatic Face Gesture Recognition, FG 2017, 2017, pp. 56–62.
[255] T. Ojala, M. Pietikáinen, D. Harwood, A comparative study of texture measures with classification based on featured distributions, Pattern Recognit. 29 (1996) 51–59, https://doi.org/10.1016/0031-3203(95)00067-4, http://www.sciencedirect.com/science/article/pii/0031320395000674.
[256] A.J. O'Toole, J. Harms, S.L. Snow, D.R. Hurst, M.R. Pappas, J.H. Ayyad, H. Abdi, A video database of moving faces and people, IEEE Trans. Pattern Anal. Mach. Intell. 27 (2005) 812–816, https://doi.org/10.1109/TPAMI.2005.90, http://www.utdallas.edu/dept/bbs/FACULTY_PAGES/otoole/database.htm.
[257] A.J. O'Toole, D.A. Roark, H. Abdi, Recognizing moving faces: a psychological and neural synthesis, Trends Cogn. Sci. 6 (2002) 261–266, https://doi.org/10.1016/S1364-6613(02)01908-3.
[258] G. Pala, C.E. Erdem, Performance comparison of deep learning based face identification methods for video under adverse conditions, in: The 15th Int. Conf. on Signal-Image Technology and Internet Based Systems, SITIS, 2019.
[259] M. Paleari, C. Velardo, B. Huet, J. Dugelay, S. Antipolis, Face dynamics for biometric people recognition, in: IEEE International Workshop on Multimedia Signal Processing, Rio de Janeiro, Brazil, 2009.
[260] G. Pan, Z. Wu, L. Sun, Liveness detection for face recognition, in: Recent Advances in Face Recognition, IntechOpen, 2008.
[261] U. Park, A.K. Jain, 3d model-based face recognition in video, in: International Conference on Biometrics, Seoul, Korea, 2007, pp. 1085–1094.
[262] O.M. Parkhi, A. Vedaldi, A. Zisserman, Deep face recognition, in: British Machine Vision Conference, Swansea, UK, 2015.
[263] O.M. Parkhi, A. Vedaldi, A. Zisserman, Deep face recognition, in: BMVC, 2015, https://www.robots.ox.ac.uk/~vgg/data/vgg_face/.
[264] C. Peng, X. Gao, N. Wang, J. Li, Graphical representation for heterogeneous face recognition, IEEE Trans. Pattern Anal. Mach. Intell. 39 (2016) 301–312.
[265] P.J. Phillips, Human identification technical challenges, in: Proceedings. International Conference on Image Processing, 2002, http://www.nd.edu/cvrl/HID-data.html.
[266] P.J. Phillips, P.J. Flynn, T. Scruggs, K.W. Bowyer, K. Hoffman, J. Marques, W. Worek, Overview of the face recognition grand challenge, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR'05, vol. 1,
the xm2vts database, in: International Conference on Audio-and Video-Based 2005, pp. 947–954, https://cvrl.nd.edu/projects/data/#face-recognition-grand-
Biometric Person Authentication, Springer, 2003, pp. 964–974. challenge-frgc-v20-data-collection.
[240] K. Messer, J. Matas, J. Kittler, J. Luettin, G. Maitre, Xm2vtsdb: the extended [267] P.J. Phillips, P. Grother, R. Micheals, D.M. Blackburn, E. Tabassi, M. Bone, Face
m2vts database, in: Second International Conference on Audio and Video- recognition vendor test 2002, in: 2003 IEEE International SOI Conference. Pro-
Based Biometric Person Authentication, 1999, pp. 965–966, http://www.ee. ceedings (Cat. No.03CH37443), 2003, p. 44.
surrey.ac.uk/CVSSP/xm2vtsdb/. [268] P.J. Phillips, H. Wechsler, J. Huang, P. Rauss, The feret database and evalua-
[241] A. Mian, Unsupervised learning from local features for video-based face recog- tion procedure for face-recognition algorithms, Image Vis. Comput. 16 (1998)
nition, in: 2008 8th IEEE International Conference on Automatic Face Gesture 295–306, https://nist.gov/itl/products-and-services/color-feret-database.
Recognition, 2008, pp. 1–6. [269] P.J. Phillips, W.T. Scruggs, A.J. O’Toole, P.J. Flynn, K.W. Bowyer, C.L. Schott, M.
[242] W. Miaoli, Face and speech recognition fusion method based on penalty co- Sharpe, Frvt 2006 and ice 2006 large scale experimental results, IEEE Trans.
efficient and SVM, in: IEEE Advanced Information Technology, Electronic and Pattern Anal. Mach. Intell. 32 (2010) 831–846, https://doi.org/10.1109/TPAMI.
Automation, Control Conference, IAEAC, Chongqing, China, 2015, pp. 6–10. 2009.59.
[243] S. Milborrow, J. Morkel, F. Nicolls, The MUCT landmarked face database, [270] N. Poh, S. Bengio, Database, protocols and tools for evaluating score-level
Pattern recognition association of South Africa, http://www.milbo.org/muct, fusion algorithms in biometric authentication, Pattern Recognit. 39 (2006)
2010. 223–233, https://doi.org/10.1016/j.patcog.2005.06.011.
[271] N. Poh, C. Chan, J. Kittler, J.F. UAM, J.G. UAM, Description of metrics for the evaluation of biometric performance, Technical Report, (BEAT) Biometrics Evaluation and Testing, https://www.beat-eu.org/project/deliverables-public/d3.3-description-of-metrics-for-the-evaluation-of-biometric-performance, 2011.
[272] X. Qi, L. Zhang, Face recognition via centralized coordinate learning, arXiv preprint arXiv:1801.05678, 2018.
[273] Y. Qian, W. Deng, J. Hu, Task specific networks for identity and face variation, in: 2018 13th IEEE International Conference on Automatic Face Gesture Recognition, FG 2018, 2018, pp. 271–277.
[274] Q. Qiu, R. Chellappa, Compositional dictionaries for domain adaptive face recognition, IEEE Trans. Image Process. 24 (2015) 5152–5165, https://doi.org/10.1109/TIP.2015.2479456.
[275] L.R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proc. IEEE 77 (1989) 257–286.
[276] R. Ramachandra, C. Busch, Presentation attack detection methods for face recognition systems, ACM Comput. Surv. 50 (2017) 1–37.
[277] R. Ranjan, C.D. Castillo, R. Chellappa, L2-constrained softmax loss for discriminative face verification, arXiv preprint arXiv:1703.09507, 2017.
[278] R. Ranjan, V.M. Patel, R. Chellappa, Hyperface: a deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition, IEEE Trans. Pattern Anal. Mach. Intell. 41 (2019) 121–135, https://doi.org/10.1109/TPAMI.2017.2781233.
[279] R. Ranjan, S. Sankaranarayanan, A. Bansal, N. Bodla, J.C. Chen, V.M. Patel, C.D. Castillo, R. Chellappa, Deep learning for understanding faces: machines may be just as good, or better, than humans, IEEE Signal Process. Mag. 35 (2018) 66–83.
[280] R. Ranjan, S. Sankaranarayanan, C.D. Castillo, R. Chellappa, An all-in-one convolutional neural network for face analysis, in: 2017 12th IEEE International Conference on Automatic Face Gesture Recognition, FG 2017, 2017, pp. 17–24.
[281] Y. Rao, J. Lin, J. Lu, J. Zhou, Learning discriminative aggregation network for video-based face recognition, in: 2017 IEEE International Conference on Computer Vision, ICCV, 2017, pp. 3801–3810.
[282] Y. Rao, J. Lu, J. Zhou, Attention-aware deep reinforcement learning for video face recognition, in: 2017 IEEE International Conference on Computer Vision, ICCV, 2017, pp. 3951–3960.
[283] D. Rathod, A. Vinay, S. Shylaja, S. Natarajan, Facial landmark localization - a literature survey, Int. J. Curr. Eng. Technol. 4 (2014) 1901–1907.
[284] S. Ren, X. Cao, Y. Wei, J. Sun, Face alignment at 3000 FPS via regressing local binary features, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1685–1692.
[285] S. Ren, K. He, R. Girshick, J. Sun, Faster r-cnn: towards real-time object detection with region proposal networks, in: Proc. Advances Neural Information Processing Systems Conf., 2015, pp. 91–99.
[286] K. Ricanek, T. Tesafaye, Morph: a longitudinal image database of normal adult age-progression, in: 7th International Conference on Automatic Face and Gesture Recognition, FGR06, 2006, pp. 341–345, https://ebill.uncw.edu/C20231_ustores/web/product_detail.jsp?PRODUCTID=8.
[287] S.A. Rizvi, P.J. Phillips, H. Moon, The feret verification testing protocol for face recognition algorithms, in: Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition, IEEE, 1998, pp. 48–53.
[288] S.T. Roweis, L.K. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science 290 (2000) 2323–2326.
[289] R. Rubinstein, A.M. Bruckstein, M. Elad, Dictionaries for sparse representation modeling, Proc. IEEE 98 (2010) 1045–1057, https://doi.org/10.1109/JPROC.2010.2040551.
[290] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A.C. Berg, L. Fei-Fei, Imagenet large scale visual recognition challenge, Int. J. Comput. Vis. 115 (2015) 211–252, https://doi.org/10.1007/s11263-015-0816-y.
[291] U. Saeed, J. Dugelay, Person recognition from video using facial mimics, in: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, Honolulu, HI, USA, 2007, pp. 493–496.
[292] U. Saeed, F. Matta, J. Dugelay, Person recognition based on head and mouth dynamics, in: IEEE Workshop on Multimedia Signal Processing, Victoria, BC, Canada, 2006, pp. 29–32.
[293] A. Samal, P. Iyengar, Automatic recognition and analysis of human faces and facial expressions: a survey, Pattern Recognit. 25 (1992) 65–77.
[294] F.S. Samaria, A.C. Harter, Parameterisation of a stochastic model for human face identification, in: IEEE Workshop on Applications of Computer Vision, 1994, pp. 138–142, http://cam-orl.co.uk/facedatabase.html.
[295] E.N. Sandikci, C.E. Erdem, S. Ulukaya, A comparison of facial landmark detection methods, in: IEEE Signal Processing and Applications Conference, SIU, 2018.
[296] A. Savran, N. Alyüz, H. Dibeklioğlu, O. Çeliktutan, B. Gökberk, B. Sankur, L. Akarun, Bosphorus database for 3d face analysis, in: B. Schouten, N.C. Juul, A. Drygajlo, M. Tistarelli (Eds.), Biometrics and Identity Management, Springer, Berlin, Heidelberg, 2008, pp. 47–56, http://bosphorus.ee.boun.edu.tr/Home.aspx.
[297] A. Scheenstra, A. Ruifrok, R.C. Veltkamp, A survey of 3d face recognition methods, in: International Conference on Audio and Video Based Biometric Person Authentication, 2005, pp. 891–899.
[298] U. Scherhag, C. Rathgeb, J. Merkle, R. Breithaupt, C. Busch, Face recognition systems under morphing attacks: a survey, IEEE Access 7 (2019) 23012–23026, https://doi.org/10.1109/ACCESS.2019.2899367.
[299] F. Schroff, D. Kalenichenko, J. Philbin, Facenet: a unified embedding for face recognition and clustering, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Boston, MA, USA, 2015, pp. 815–823.
[300] S. Sengupta, J. Chen, C. Castillo, V.M. Patel, R. Chellappa, D.W. Jacobs, Frontal to profile face verification in the wild, in: IEEE Winter Conference on Applications of Computer Vision, WACV, 2016, pp. 1–9, http://www.cfpw.io/.
[301] R. Shao, X. Lan, P.C. Yuen, Joint discriminative learning of deep dynamic textures for 3d mask face anti-spoofing, IEEE Trans. Inf. Forensics Secur. 14 (2019) 923–938, https://doi.org/10.1109/TIFS.2018.2868230.
[302] Y. Shen, C. Chiu, Local binary pattern orientation based face recognition, in: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, Brisbane, QLD, Australia, 2015, pp. 1091–1095.
[303] Y. Shen, P. Luo, J. Yan, X. Wang, X. Tang, FaceID-GAN: learning a symmetry three-player GAN for identity-preserving face synthesis, in: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 821–830.
[304] J.W. Shepherd, G.M. Davies, H.D. Ellis, Cue saliency in faces as assessed by the “photofit” technique, in: G.M. Davies, H.D. Ellis, J.W. Shepherd (Eds.), Studies of Cue Saliency, Academic Press, London, U.K., 1981.
[305] M. Shreve, E.A. Bernal, Q. Li, J. Kumar, R. Bala, A study on the discriminability of facs from spontaneous facial expressions, in: IEEE International Conference on Image Processing, ICIP, Phoenix, AZ, USA, 2016, pp. 1674–1678.
[306] T. Sim, S. Baker, M. Bsat, The CMU pose, illumination, and expression database, IEEE Trans. Pattern Anal. Mach. Intell. 25 (2003) 1615–1618, https://doi.org/10.1109/TPAMI.2003.1251154, http://www.cs.cmu.edu/afs/cs/project/PIE/MultiPie/Multi-Pie/Home.html.
[307] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, Comput. Vis. Pattern Recognit. (2015) 1–14, https://arxiv.org/pdf/1409.1556.pdf.
[308] M. Singh, A. Arora, A novel face liveness detection algorithm with multiple liveness indicators, Wirel. Pers. Commun. 100 (2018) 1677–1687.
[309] D.A. Socolinsky, L.B. Wolff, J.D. Neuheisel, C.K. Eveland, Illumination invariant face recognition using thermal infrared imagery, in: IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2001, pp. I–I.
[310] F. Solina, P. Peer, B. Batagelj, S. Juvan, J. Kovač, Color-based face detection in the “15 seconds of fame” art installation, in: Proceedings of Mirage 2003, INRIA Rocquencourt, France, 2003, pp. 37–47, http://www.lrv.fri.uni-lj.si/facedb.html.
[311] H. Song, U. Yang, S. Lee, K. Sohn, 3d face recognition based on facial shape indexes with dynamic programming, in: D. Zhang, A.K. Jain (Eds.), Advances in Biometrics, Springer, Berlin, Heidelberg, 2005, pp. 99–105.
[312] M. Soriano, E. Marszalec, M. Pietikainen, Physics-based face database for color research, J. Electron. Imaging 9 (2000) 32–38, http://www.cse.oulu.fi/CMV/Downloads/Pbfd.
[313] J. Stallkamp, H.K. Ekenel, R. Stiefelhagen, Video-based face recognition on real-world data, in: IEEE International Conference on Computer Vision, Rio de Janeiro, Brazil, 2007.
[314] H. Sun, X. Zhen, Y. Zheng, G. Yang, Y. Yin, S. Li, Learning deep match kernels for image-set classification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 3307–3316.
[315] Y. Sun, Y. Chen, X. Wang, X. Tang, Deep learning face representation by joint identification-verification, in: Proc. Advances in Neural Information Processing Systems, NIPS, 2014, pp. 1988–1996, http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html.
[316] Y. Sun, D. Liang, X. Wang, X. Tang, DeepID3: face recognition with very deep neural networks, arXiv preprint arXiv:1502.00873, 2015.
[317] Y. Sun, X. Wang, X. Tang, Deep learning face representation from predicting 10,000 classes, in: IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 2014, pp. 1891–1898.
[318] C. Szegedy, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2015, pp. 1–9.
[319] Y. Taigman, M. Yang, M. Ranzato, L. Wolf, Web-scale training for face identification, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Boston, MA, USA, 2015, pp. 2746–2754.
[320] Y. Taigman, M. Yang, M. Ranzato, L. Wolf, DeepFace: closing the gap to human-level performance in face verification, in: IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1701–1708.
[321] M. Taskiran, M. Killioglu, N. Kahraman, C.E. Erdem, Face recognition using dynamic features extracted from smile videos, in: IEEE Int. Symp. on Innovations in Intelligent Systems and Applications, INISTA, Sofia, Bulgaria, 2019.
[322] Y. Tayal, R. Lamba, S. Padhee, Automatic face detection using color based segmentation, Int. J. Sci. Res. Publ. 2 (2012) 1–7.
[323] J. Tenenbaum, V. de Silva, J.C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science 290 (2000) 2319–2323.
[324] D. Thomas, K.W. Bowyer, P.J. Flynn, Multi-factor approach to improving recognition performance in surveillance-quality video, in: IEEE Second International Conference on Biometrics: Theory, Applications and Systems, Arlington, VA, USA, 2008.
[325] C.E. Thomaz, G.A. Giraldi, A new ranking method for principal components analysis and its application to face image analysis, https://fei.edu.br/~cet/facedatabase.html, 2010.
[326] M. Tistarelli, M. Bicego, E. Grosso, Dynamic face recognition: from human to machine vision, Image Vis. Comput. 27 (2009) 222–232, https://doi.org/10.1016/j.imavis.2007.05.006.
[327] I. Tosic, P. Frossard, Dictionary learning, IEEE Signal Process. Mag. 28 (2011) 27–38, https://doi.org/10.1109/MSP.2010.939537.
[328] L. Tran, X. Yin, X. Liu, Disentangled representation learning gan for pose-invariant face recognition, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2017, pp. 1283–1292.
[329] P. Tsai, L. Cao, T. Hintz, T. Jan, A bi-modal face recognition framework integrating facial expression with facial appearance, Pattern Recognit. Lett. 30 (2009) 1096–1109, https://doi.org/10.1016/j.patrec.2009.05.008.
[330] M.W. Tsigie, R. Thakare, R. Joshi, Face recognition techniques based on 2d local binary pattern, histogram of oriented gradient and multiclass support vector machines for secure document authentication, in: Second International Conference on Inventive Communication and Computational Technologies, ICICCT, Coimbatore, India, 2018, pp. 1671–1676.
[331] S. Tulyakov, T. Slowe, Z. Zhang, V. Govindaraju, Facial expression biometrics using tracker displacement features, in: IEEE Int. Conf. Computer Vision and Pattern Recognition, CVPR, Minneapolis, MN, USA, 2007.
[332] M. Turk, A. Pentland, Eigenfaces for face recognition, J. Cogn. Neurosci. 3 (1991) 71–86.
[333] M.A. Turk, A.P. Pentland, Face recognition using eigenfaces, in: IEEE Conference on Computer Vision and Pattern Recognition, 1991, pp. 586–591.
[334] L. Vaina, J. Solomon, S. Chowdhury, P. Sinha, J. Belliveau, Functional neuroanatomy of biological motion perception in humans, Proc. Natl. Acad. Sci. USA 98 (2001).
[335] E. Vezzetti, F. Marcolin, 3D human face description: landmarks measures and geometrical features, Image Vis. Comput. 30 (2012) 698–712, https://doi.org/10.1016/j.imavis.2012.02.007, http://www.sciencedirect.com/science/article/pii/S0262885612000224, 3D Facial Behaviour Analysis and Understanding.
[336] E. Vezzetti, F. Marcolin, Geometrical descriptors for human face morphological analysis and recognition, Robot. Auton. Syst. 60 (2012) 928–939.
[337] C. Vinette, F. Gosselin, P.G. Schyns, Spatio-temporal dynamics of face recognition in a flash: it’s in the eyes, Cogn. Sci. 28 (2004) 289–301, https://doi.org/10.1207/s15516709cog2802-8.
[338] P. Viola, M.J. Jones, Robust real-time face detection, Int. J. Comput. Vis. 57 (2004) 137–154.
[339] D. Wang, S. Kong, A classification-oriented dictionary learning model: explicitly learning the particularity and commonality across categories, Pattern Recognit. 47 (2014) 885–898, https://doi.org/10.1016/j.patcog.2013.08.004, http://www.sciencedirect.com/science/article/pii/S0031320313003245.
[340] F. Wang, J. Cheng, W. Liu, H. Liu, Additive margin softmax for face verification, IEEE Signal Process. Lett. 25 (2018) 926–930.
[341] F. Wang, X. Xiang, J. Cheng, A.L. Yuille, Normface: L2 hypersphere embedding for face verification, in: Proceedings of the 25th ACM International Conference on Multimedia, ACM, 2017, pp. 1041–1049.
[342] H. Wang, Y. Wang, Y. Cao, Video-based face recognition: a survey, Int. J. Comput. Inf. Eng. 3 (2009) 293–302.
[343] H. Wang, Y. Wang, Z. Zhou, X. Ji, D. Gong, J. Zhou, Z. Li, W. Liu, Cosface: large margin cosine loss for deep face recognition, in: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 5265–5274.
[344] H. Wang, D. Zhang, Z. Miao, Fusion of ldb and hog for face recognition, in: 37th Chinese Control Conference, CCC, Wuhan, China, 2018, pp. 9192–9196.
[345] L. Wang, Y. Li, S. Wang, Feature learning for one-shot face recognition, in: 2018 25th IEEE International Conference on Image Processing, ICIP, IEEE, 2018, pp. 2386–2390.
[346] M. Wang, W. Deng, Deep face recognition: a survey, arXiv:1804.06655, 2019.
[347] M.J. Wang, Face feature dynamic recognition method based on intelligent image, in: International Conference on Virtual Reality and Intelligent Systems, ICVRIS, Changsha, China, 2018.
[348] N. Wang, X. Gao, D. Tao, H. Yang, X. Li, Facial feature point detection, Neurocomputing 275 (2018) 50–65, https://doi.org/10.1016/j.neucom.2017.05.013.
[349] R. Wang, S. Shan, X. Chen, W. Gao, Manifold-manifold distance with application to face recognition based on image set, in: IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 2008.
[350] W. Wang, R. Wang, Z. Huang, S. Shan, X. Chen, Discriminant analysis on Riemannian manifold of Gaussian distributions for face recognition with image sets, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 2048–2057.
[351] W. Wang, R. Wang, Z. Huang, S. Shan, X. Chen, Discriminant analysis on Riemannian manifold of Gaussian distributions for face recognition with image sets, IEEE Trans. Image Process. 27 (2018) 151–163.
[352] X. Wang, X. Tang, Dual-space linear discriminant analysis for face recognition, in: IEEE Conference on Computer Vision and Pattern Recognition, 2004.
[353] X. Wang, X. Tang, Face photo-sketch synthesis and recognition, IEEE Trans. Pattern Anal. Mach. Intell. 31 (2009) 1955–1967, https://doi.org/10.1109/TPAMI.2008.222, http://mmlab.ie.cuhk.edu.hk/archive/facesketch.html.
[354] X. Wang, X. Tang, Face photo-sketch synthesis and recognition, IEEE Trans. Pattern Anal. Mach. Intell. 31 (2009) 1955–1967, https://doi.org/10.1109/TPAMI.2008.222, http://mmlab.ie.cuhk.edu.hk/archive/cufsf/.
[355] C. Wei, Y.F. Wang, Undersampled face recognition via robust auxiliary dictionary learning, IEEE Trans. Image Process. 24 (2015) 1722–1734.
[356] K.Q. Weinberger, L.K. Saul, Distance metric learning for large margin nearest neighbor classification, J. Mach. Learn. Res. 10 (2009) 207–244, http://dl.acm.org/citation.cfm?id=1577069.1577078.
[357] Y. Weiwei, Face recognition using constrained active appearance model, in: Third International Symposium on Intelligent Information Technology Application Workshops, 2009, pp. 348–351.
[358] Y. Wen, K. Zhang, Z. Li, Y. Qiao, A discriminative feature learning approach for deep face recognition, in: European Conference on Computer Vision, Springer, 2016, pp. 499–515.
[359] C. Whitelam, E. Taborsky, A. Blanton, B. Maze, J.C. Adams, T. Miller, N.D. Kalka, A.K. Jain, J.A. Duncan, K.E. Allen, J. Cheney, P. Grother, Iarpa Janus benchmark-b face dataset, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW, 2017, pp. 592–600, https://www.nist.gov/programs-projects/face-challenges.
[360] L. Wiskott, J.M. Fellous, N. Kruger, C. von der Malsburg, Face recognition by elastic bunch graph matching, IEEE Trans. Pattern Anal. Mach. Intell. 19 (1997) 775–779, https://doi.org/10.1109/ICIP.1997.647401.
[361] L. Wolf, T. Hassner, I. Maoz, Face recognition in unconstrained videos with matched background similarity, in: IEEE Int. Conf. Computer Vision and Pattern Recognition, CVPR, Colorado Springs, CO, USA, 2011, pp. 529–534, https://www.cs.tau.ac.il/~wolf/ytfaces/, https://doi.org/10.1109/CVPR.2011.5995566.
[362] D.K. Wong, R. Janakiraman, Face liveness detection, US Patent App. 15/610,273, 2018.
[363] J. Wright, A.Y. Yang, A. Ganesh, S.S. Sastry, Y. Ma, Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell. 31 (2009) 210–227, https://doi.org/10.1109/TPAMI.2008.79.
[364] F. Wu, X.Y. Jing, X. Dong, R. Hu, D. Yue, L. Wang, Y.M. Ji, R. Wang, G. Chen, Intraspectrum discrimination and interspectrum correlation analysis deep network for multispectral face recognition, IEEE Trans. Cybern. 50 (2020) 1009–1022.
[365] Y. Wu, T. Hassner, K. Kim, G. Medioni, P. Natarajan, Facial landmark detection with tweaked convolutional neural networks, IEEE Trans. Pattern Anal. Mach. Intell. 40 (2018) 3067–3074, https://doi.org/10.1109/TPAMI.2017.2787130.
[366] Y. Wu, Q. Ji, Facial landmark detection: a literature survey, Int. J. Comput. Vis. 127 (2019) 115–142, https://doi.org/10.1007/s11263-018-1097-z.
[367] Y. Wu, H. Liu, Y. Fu, Low-shot face recognition with hybrid classifiers, in: Proceedings of the IEEE International Conference on Computer Vision Workshops, 2017, pp. 1933–1939.
[368] S. Xie, Z. Tu, Holistically-nested edge detection, in: 2015 IEEE International Conference on Computer Vision, ICCV, 2015, pp. 1395–1403.
[369] Z. Xie, J. Zeng, G. Liu, Z. Fang, A novel infrared face recognition based on local binary pattern, in: International Conference on Wavelet Analysis and Pattern Recognition, Guilin, China, 2011, pp. 55–59.
[370] E.P. Xing, M.I. Jordan, S.J. Russell, A.Y. Ng, Distance metric learning with application to clustering with side-information, in: S. Becker, S. Thrun, K. Obermayer (Eds.), Advances in Neural Information Processing Systems, vol. 15, MIT Press, 2003, pp. 521–528, http://papers.nips.cc/paper/2164-distance-metric-learning-with-application-to-clustering-with-side-information.pdf.
[371] X. Xiong, F. De la Torre, Supervised descent method and its applications to face alignment, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 532–539.
[372] Y. Xu, Z. Li, J. Yang, D. Zhang, A survey of dictionary learning algorithms for face recognition, IEEE Access 5 (2017) 8502–8514.
[373] L. Xue-fang, P. Tao, Realization of face recognition system based on Gabor wavelet and elastic bunch graph matching, in: 25th Chinese Control and Decision Conference, CCDC, 2013, pp. 3384–3386.
[374] O. Yamaguchi, K. Fukui, K. Maeda, Face recognition using temporal image sequence, in: IEEE International Conference on Automatic Face and Gesture Recognition, Nara, Japan, 1998, pp. 318–323.
[375] J. Yang, Q. Liu, K. Zhang, Stacked hourglass network for robust facial landmark localisation, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW, 2017, pp. 2025–2033.
[376] J. Yang, S.E. Reed, M.H. Yang, H. Lee, Weakly-supervised disentangling with recurrent transformations for 3d view synthesis, in: C. Cortes, N.D. Lawrence, D.D. Lee, M. Sugiyama, R. Garnett (Eds.), Advances in Neural Information Processing Systems, vol. 28, Curran Associates, Inc., 2015, pp. 1099–1107, http://papers.nips.cc/paper/5639-weakly-supervised-disentangling-with-recurrent-transformations-for-3d-view-synthesis.pdf.
[377] J. Yang, D. Zhang, A.F. Frangi, J.Y. Yang, Two-dimensional PCA: a new approach to appearance-based face representation and recognition, IEEE Trans. Pattern Anal. Mach. Intell. 26 (2004) 131–137.
[378] J. Yang, D. Zhang, J. Yang, B. Niu, Globally maximizing, locally minimizing: unsupervised discriminant projection with applications to face and palm biometrics, IEEE Trans. Pattern Anal. Mach. Intell. 29 (2007) 650–664, https://doi.org/10.1109/TPAMI.2007.1008.
[379] M. Yang, W. Liu, W. Luo, L. Shen, Analysis-synthesis dictionary learning for universality-particularity representation based classification, in: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI Press, 2016, pp. 2251–2257, http://dl.acm.org/citation.cfm?id=3016100.3016213.
[380] M. Yang, L. Zhang, X. Feng, D. Zhang, Sparse representation based Fisher discrimination dictionary learning for image classification, Int. J. Comput. Vis. 109 (2014) 209–232, https://doi.org/10.1007/s11263-014-0722-8.
[381] M.H. Yang, Face recognition using extended isomap, in: IEEE International Conference on Image Processing, 2002, pp. 117–120.
[382] S. Yang, P. Luo, C.C. Loy, X. Tang, From facial parts responses to face detection: a deep learning approach, in: Proc. IEEE Int. Conf. Computer Vision, 2015, pp. 3676–3684.
[383] S. Yang, Y. Xiong, C.C. Loy, X. Tang, Face detection through scale-friendly deep convolutional networks, CoRR abs/1706.02863, http://arxiv.org/abs/1706.02863, 2017.
[384] S. Yang, L. Zhang, L. He, Y. Wen, Sparse low-rank component-based representation for face recognition with low-quality images, IEEE Trans. Inf. Forensics Secur. 14 (2019) 251–261, https://doi.org/10.1109/TIFS.2018.2849883.
[385] X. Yang, K.T. Cheng, Local difference binary for ultrafast and distinctive feature description, IEEE Trans. Pattern Anal. Mach. Intell. 36 (2014) 188–194, https://doi.org/10.1109/TPAMI.2013.150.
[386] X. Yang, W. Luo, L. Bao, Y. Gao, D. Gong, S. Zheng, Z. Li, W. Liu, Face anti-spoofing: model matters, so does data, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 3507–3516.
[387] J. Yim, H. Jung, B. Yoo, C. Choi, D. Park, J. Kim, Rotating your face using multi-task deep neural network, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2015, pp. 676–684.
[388] L. Yin, X. Chen, Y. Sun, T. Worm, M. Reale, A high-resolution 3d dynamic facial expression database, in: 2008 8th IEEE International Conference on Automatic Face Gesture Recognition, 2008, pp. 1–6, http://www.cs.binghamton.edu/~lijun/Research/3DFE/3DFE_Analysis.html.
[389] L. Yin, X. Wei, Y. Sun, J. Wang, M. Rosato, A 3d facial expression database for facial behavior research, in: International Conference on Automatic Face and Gesture Recognition, FGR06, 2006, pp. 211–216, http://www.cs.binghamton.edu/~lijun/Research/3DFE/3DFE_Analysis.html.
[390] L. Yin, X. Wei, Y. Sun, J. Wang, M.J. Rosato, A 3d facial expression database for facial behavior research, in: International Conference on Automatic Face and Gesture Recognition, FGR, Southampton, UK, 2006, pp. 211–216.
[391] J. Yu, X. Xu, F. Gao, S. Shi, M. Wang, D. Tao, Q. Huang, Toward realistic face photo-sketch synthesis via composition-aided gans, IEEE Trans. Cybern. (2020) 1–13.
[392] S. Zafeiriou, M. Pantic, Facial behaviometrics: the case of facial deformation in spontaneous smile/laughter, in: IEEE Conference on Computer Vision and Pattern Recognition Workshops, Colorado Springs, CO, USA, 2011, pp. 13–19.
[393] E. Zangeneh, M. Rahmati, Y. Mohsenzadeh, Low resolution face recognition using a two-branch deep convolutional neural network architecture, Expert Syst. Appl. 139 (2020) 112854.
[394] S. Zhalehpour, O. Onder, Z. Akhtar, C.E. Erdem, BAUM-1: a spontaneous audio-visual face database of affective and mental states, IEEE Trans. Affect. Comput. 8 (2017) 300–313, https://doi.org/10.1109/TAFFC.2016.2553038.
[395] S. Zhalehpour, Z. Akhtar, C.E. Erdem, Multimodal emotion recognition based on peak frame selection from video, Signal Image Video Process. 10 (2016) 827–834, https://doi.org/10.1007/s11760-015-0822-0.
[396] D. Zhang, Z.H. Zhou, S. Chen, Diagonal principal component analysis for face recognition, Pattern Recognit. 39 (2006) 140–142.
[397] J. Zhang, S. Shan, M. Kan, X. Chen, Coarse-to-fine auto-encoder networks (CFAN) for real-time face alignment, in: European Conference on Computer Vision, Springer, 2014, pp. 1–16.
[398] K. Zhang, Z. Zhang, Z. Li, Y. Qiao, Joint face detection and alignment using multitask cascaded convolutional networks, IEEE Signal Process. Lett. 23 (2016) 1499–1503, https://doi.org/10.1109/LSP.2016.2603342.
[399] M. Zhang, N. Wang, Y. Li, X. Gao, Neural probabilistic graphical model for face sketch synthesis, IEEE Trans. Neural Netw. Learn. Syst. (2019).
[400] S. Zhang, X. Wang, A. Liu, C. Zhao, J. Wan, S. Escalera, H. Shi, Z. Wang, S.Z. Li, A dataset and benchmark for large-scale multi-modal face anti-spoofing, arXiv preprint arXiv:1812.00408, 2018.
[401] S. Zhang, X. Zhu, Z. Lei, H. Shi, X. Wang, S.Z. Li, S3FD: single shot scale-invariant face detector, CoRR abs/1708.05237, http://arxiv.org/abs/1708.05237, 2017.
[402] W. Zhang, X. Zhao, J. Morvan, L. Chen, Improving shadow suppression for illumination robust face recognition, IEEE Trans. Pattern Anal. Mach. Intell. 41 (2019) 611–624, https://doi.org/10.1109/TPAMI.2018.2803179.
[403] X. Zhang, Z. Fang, Y. Wen, Z. Li, Y. Qiao, Range loss for deep face recognition with long-tailed training data, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 5409–5418.
[404] X. Zhang, Y. Gao, Face recognition across pose: a review, Pattern Recognit. 42 (2009) 2876–2896.
[405] Z. Zhang, X. Chen, B. Wang, G. Hu, W. Zuo, E.R. Hancock, Face frontalization using an appearance-flow-based convolutional neural network, IEEE Trans. Image Process. 28 (2019) 2187–2199, https://doi.org/10.1109/TIP.2018.2883554.
[406] Z. Zhang, Y. Xu, J. Yang, X. Li, D. Zhang, A survey of sparse representation: algorithms and applications, IEEE Access 3 (2015) 490–530, https://doi.org/10.1109/ACCESS.2015.2430359.
[407] H. Zhao, P.C. Yuen, J.T. Kwok, A novel incremental principal component analysis and its application for face recognition, IEEE Trans. Syst. Man Cybern., Part B, Cybern. 36 (2006) 873–886.
[408] J. Zhao, L. Xiong, P. Karlekar Jayashree, J. Li, F. Zhao, Z. Wang, P. Sugiri Pranata, P. Shengmei Shen, S. Yan, J. Feng, Dual-agent GANs for photorealistic and identity preserving profile face synthesis, in: I. Guyon, U.V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett (Eds.), Advances in Neural Information Processing Systems 30, Curran Associates, Inc., 2017, pp. 66–76.
[409] W. Zhao, R. Chellappa, P.J. Phillips, A. Rosenfeld, Face recognition: a literature survey, ACM Comput. Surv. 35 (2003) 399–458.
[410] T. Zheng, W. Deng, Cross-pose LFW: a database for studying cross-pose face recognition in unconstrained environments, Technical Report 18-01, Beijing University of Posts and Telecommunications, 2018, http://www.whdeng.cn/CPLFW/index.html?reload=true.
[411] Y. Zheng, D.K. Pal, M. Savvides, Ring loss: convex feature normalization for face recognition, in: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 5089–5097.
[412] H. Zhi, S. Liu, Face recognition based on genetic algorithm, J. Vis. Commun. Image Represent. 58 (2019) 495–502.
[413] C. Zhong, Z. Sun, T. Tan, Robust 3d face recognition using learned visual codebook, in: 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–6, http://www.cbsr.ia.ac.cn/english/3DFace%20Databases.asp.
[414] H. Zhou, A. Mian, L. Wei, D. Creighton, M. Hossny, S. Nahavandi, Recent advances on singlemodal and multimodal face recognition: a survey, IEEE Trans. Human-Mach. Syst. 44 (2014) 701–716.
[415] F. Zhu, L. Shao, Weakly-supervised cross-domain dictionary learning for visual recognition, Int. J. Comput. Vis. 109 (2014) 42–59.
[416] M. Zhu, J. Li, N. Wang, X. Gao, A deep collaborative framework for face photo–sketch synthesis, IEEE Trans. Neural Netw. Learn. Syst. 30 (2019) 3096–3108.
[417] Z. Zhu, P. Luo, X. Wang, X. Tang, Deep learning identity-preserving face space, in: 2013 IEEE International Conference on Computer Vision, 2013, pp. 113–120.
[418] X. Zou, J. Kittler, K. Messer, Illumination invariant face recognition: a survey, in: IEEE International Conference on Biometrics: Theory, Applications, and Systems, 2007.

Murat Taskiran received B.Sc. (2013) and M.Sc. (2016) degrees in Electronics and Communication Engineering from Yildiz Technical University (YTU), Istanbul, Turkey. He is currently a Ph.D. student at YTU. Since 2014, he has been working as a research assistant in the Department of Electronics and Communication Engineering at YTU. His research interests are in image processing, neural networks and randomness analysis.

Nihan Kahraman received B.Sc. (2001), M.Sc. (2003) and Ph.D. (2008) degrees in Electronics and Communication Engineering from Yildiz Technical University (YTU), Istanbul, Turkey. Currently, she is working as an assistant professor at YTU. Her research interests include VLSI design, hardware and software implementations of neural networks, and neural network architectures.

Cigdem Eroglu Erdem received the B.S. and M.Sc. degrees in Electrical and Electronics Engineering from Bilkent University, Ankara, Turkey in 1995 and 1997, respectively, with high honors. She received the Ph.D. degree in Electrical and Electronics Engineering from Bogazici University, Istanbul, Turkey, in 2002. From September 2000 to June 2001, she was a visiting researcher in the Department of Electrical and Computer Engineering, University of Rochester, NY, USA. Between 2003-2004, she was a postdoctoral fellow at the Faculty of Electrical Engineering at Delft University of Technology, the Netherlands, where she was also affiliated with the video processing group at Philips Research Laboratories, Eindhoven. Between 2002-2009, she was the director of research at Momentum Digital Media Technologies Inc., a technology SME located in İstanbul. Between 2009-2016, she was a faculty member in the Department of Electrical and Electronics Engineering at Bahcesehir University, Istanbul, Turkey. Since Sep. 2016, she is a faculty member in the Department of Computer Engineering at Marmara University, Istanbul, Turkey.

Dr. Erdem's current research interests are in the areas of digital image and video processing, computer vision and pattern recognition, with applications to affective computing, motion estimation, video segmentation,
object tracking, and human computer interaction. She served as a referee for numerous technical journals and conferences including IEEE Transactions on Image Processing, IEEE Transactions on Circuits and Systems for Video Technology, Pattern Recognition, Signal Processing: Image Communication, Image and Vision Computing. She also serves as an independent expert and vice chair during project evaluations for the European Commission.