
Vulnerability of Face Recognition to Deep Morphing

Pavel Korshunov and Sébastien Marcel


Idiap Research Institute, Martigny, Switzerland
{pavel.korshunov,sebastien.marcel}@idiap.ch
arXiv:1910.01933v1 [cs.CV] 3 Oct 2019

Abstract
It is increasingly easy to automatically swap faces in images and video, or to morph two faces into one, using generative adversarial networks (GANs). The high quality of the resulting deep morphs raises the question of how vulnerable current face recognition systems are to such fake images and videos, and it calls for automated ways to detect these GAN-generated faces. In this paper, we present a publicly available dataset of Deepfake videos with faces morphed by a GAN-based algorithm. To generate these videos, we used open source GAN-based software, and we emphasize that training and blending parameters can significantly impact the quality of the resulting videos. We show that state-of-the-art face recognition systems based on the VGG and Facenet neural networks are vulnerable to deep morph videos, with 85.62% and 95.00% false acceptance rates, respectively, which means that methods for detecting these videos are necessary. We consider several baseline approaches for detecting deep morphs and find that a method based on visual quality metrics (often used in the presentation attack detection domain) leads to the best performance, with an 8.97% equal error rate. Our experiments demonstrate that GAN-generated deep morph videos are challenging for both face recognition systems and existing detection methods, and that further development of deep morphing technologies will make them even more so.

1. Introduction
Recent advances in automated video and audio editing tools, generative adversarial networks (GANs), and social media allow the creation and fast dissemination of high quality tampered video content. Such content has already led to the appearance of deliberate misinformation, coined 'fake news', which is impacting the political landscapes of several countries [2]. A recent surge of videos called Deepfakes¹ (which started as obscene content), in which a neural network is trained to replace faces with the likeness of someone else, is of great public concern². Accessible open source software and apps for such face swapping have led to large amounts of synthetically generated Deepfake videos appearing in social media and news, posing a significant technical challenge for the detection and filtering of such content.
Although the original purpose of GAN-based Deepfakes is to swap the faces of two people in an image or a video, the resulting synthetic face is essentially a morph, i.e., a deep morph, of the two original faces. The main difference from more traditional morphing techniques is that a deep morph can seamlessly mimic the facial expressions of the target person and can therefore also be used to generate convincing fake videos of people talking and moving about. However, to understand how threatening such videos can be in the context of biometric security, we need to find out whether these deep-morphed videos pose a challenge to face recognition systems and whether they can be easily detected.
Traditional face morphing (Figure 1a illustrates the morphing process) has been shown to be challenging for face recognition systems [3, 16], and several detection methods have since been proposed [10, 18, 9]. For GAN-based deep morphing, until recently, most research focused on advancing GAN-based face swapping [6, 8, 12, 14]. However, responding to the public demand to detect these synthetic faces, researchers have started to work on databases and detection methods, including image and video data [15] generated with a previous generation face swapping approach, Face2Face [19], or videos collected using the Snapchat³ application [1]. Several methods for the detection of Deepfakes have also been proposed [7, 21, 5].

International Conference on Biometrics for Borders


1 Open source: https://github.com/deepfakes/faceswap
2 BBC (Feb 3, 2018): http://www.bbc.com/news/technology-42912529
3 https://www.snapchat.com/
Figure 1: Comparing morphing and GAN-based face swapping techniques. [Figure: panel (a) Morphing faces: a source face and a target face are combined, controlled by interpolation level and intensity strength values, into a final morphed face; panel (b) Generating Deepfake faces: a GAN model is trained on the two faces and then used to swap a face, producing the Deepfake face.]

In this paper, we focus on evaluating the vulnerability of face recognition systems to Deepfake videos, in which real faces are replaced by GAN-generated faces trained on the faces of two people. The resulting synthetic face is essentially a deep morph of the two people. The database was created using open source software with a cyclic GAN model⁴ (see Figure 1b for an illustration), which was developed from the original autoencoder-based Deepfake algorithm¹. We manually selected 16 similar looking pairs of people from the publicly available VidTIMIT database⁵. For each of the 32 subjects, we trained two different models (see Figure 2 for examples), referred to in this paper as the low quality (LQ) model, with 64 × 64 input/output size, and the high quality (HQ) model, with 128 × 128 size. Since there are 10 videos per person in the VidTIMIT database, we generated 320 videos for each version, resulting in a total of 620 videos with swapped faces. For the audio, we kept the original track of each video, i.e., no manipulation was done to the audio channel.

We assess the vulnerability of face recognition to deep morph videos using two state-of-the-art systems, based on the VGG [13] and Facenet⁶ [17] neural networks. For the detection of deep morphs, we applied several baseline methods from the presentation attack detection domain, treating deep morph videos as digital presentation attacks [1], including simple principal component analysis (PCA) and linear discriminant analysis (LDA) approaches, and an approach based on image quality metrics (IQM) and a support vector machine (SVM) [4, 20].

To allow researchers to verify, reproduce, and extend our work, we provide the database of Deepfake videos, coined DeepfakeTIMIT⁷, together with the face recognition and deep morph detection systems and the corresponding scores, as an open source Python package⁸.

Figure 2: Screenshots of the original videos from the VidTIMIT database and of the low quality (LQ) and high quality (HQ) deep morphs. [Figure: panels (g) Original 1, (h) Original 2, (i) LQ swap 1 → 2, (j) HQ swap 1 → 2, (k) LQ swap 2 → 1, (l) HQ swap 2 → 1.]
4 https://github.com/shaoanlu/faceswap-GAN
5 http://conradsanderson.id.au/vidtimit/
6 https://github.com/davidsandberg/facenet
7 https://www.idiap.ch/dataset/deepfaketimit
8 Source code: https://gitlab.idiap.ch/bob/bob.report.deepfakes
2. Database of deep morph videos
As the original data, we took videos from the VidTIMIT database⁵. The database contains 10 videos for each of 43 subjects, which were shot in a controlled environment with people facing the camera and reciting predetermined short phrases. From these 43 subjects, we manually selected 16 pairs such that the subjects in each pair have similar prominent visual features, e.g., mustaches or hair styles. Using the GAN-based algorithm from the available code⁴, for each pair of subjects, we generated videos in which their faces are replaced by GAN-generated deep morphs (see the example screenshots in Figure 2). For each pair of subjects, we trained two different GAN models and generated two versions of the deep morphs:
1. The low quality (LQ) model has input and output images (facial regions only) of size 64 × 64. About 200 frames from the videos of each subject, extracted at 4 fps from the original videos, were used for training. The training was run for 100 000 iterations and took about 4 hours per model on a Tesla P40 GPU.
2. The high quality (HQ) model has an input/output image size of 128 × 128. About 400 frames extracted at 8 fps from the videos were used for training, which was run for 200 000 iterations (about 12 hours on a Tesla P40 GPU).
Different blending techniques were also used when generating the deep morph videos with the two models. With the LQ model, for each frame of an input video, the generator of the GAN model was applied to the face region to produce the fake counterpart. A facial mask was then detected using the CNN-based face segmentation algorithm proposed in [12], and the generated fake face was blended with the face in the target video using this mask. For the HQ model, the blending was based on the alignment of facial landmarks (detected with the publicly available MTCNN model [22]) between the generated fake face and the original face in the target video. Finally, histogram normalization was applied to the blended result to adjust for the lighting conditions, which makes the result more realistic (see Figure 2).
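To make the blending step concrete, here is a minimal sketch, in Python with OpenCV, NumPy, and scikit-image, of one plausible way to composite a generated face into a target frame using a soft mask followed by histogram matching. The mask source (CNN segmentation for LQ, landmark alignment for HQ) and the exact normalization differ between the two pipelines, and every name below is illustrative rather than the authors' actual code.

import cv2
import numpy as np
from skimage.exposure import match_histograms

def blend_generated_face(target_frame, fake_face, mask, bbox):
    # target_frame: HxWx3 uint8 original video frame
    # fake_face:    generator output for the face region (uint8)
    # mask:         single-channel float mask in [0, 1], e.g., from a face
    #               segmentation network or a landmark-based convex hull
    # bbox:         (x, y, w, h) of the detected face in the target frame
    x, y, w, h = bbox
    region = target_frame[y:y + h, x:x + w]

    # Resize the generator output and the mask to the detected face region.
    fake = cv2.resize(fake_face, (w, h))
    m = cv2.resize(mask.astype(np.float32), (w, h))

    # Feather the mask edges so the blending seam is less visible.
    m = cv2.GaussianBlur(m, (15, 15), 0)[..., None]

    # Histogram matching adjusts the fake face to the target lighting,
    # analogous to the histogram normalization described above.
    fake = match_histograms(fake, region, channel_axis=-1)
    fake = np.clip(fake, 0, 255).astype(np.uint8)

    # Alpha-blend the fake face over the original face region.
    out = target_frame.copy()
    out[y:y + h, x:x + w] = (m * fake + (1.0 - m) * region).astype(np.uint8)
    return out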

Figure 3: Histograms showing the vulnerability of VGG and Facenet based face recognition to high quality deep morphs. [Figure: panel (a) VGG-based face recognition, FAR 85.6%; panel (b) FaceNet-based face recognition, FAR 95.0%. Each panel plots the probability densities of genuine, zero-effort impostor, and Deepfake scores against score values, with the EER threshold and the resulting FAR marked.]

2.1. Evaluation protocol


When evaluating the vulnerability of face recognition in the licit scenario, i.e., without deep morph videos, we used the original VidTIMIT⁵ videos of the 32 subjects for which we generated the corresponding deep morph videos. In this scenario, we used 2 videos of each subject for enrollment and the other 8 videos as probes, for which we computed verification scores.
From the scores, for each possible threshold θ, we computed the metrics commonly used for evaluating classification systems: the false acceptance rate (FAR) and the false rejection rate (FRR). The threshold at which FAR and FRR are equal yields the equal error rate (EER), which is commonly used as a single-value metric of system performance.
To evaluate the vulnerability of face recognition in the tampered scenario, we use the deep morph videos (10 for each of the 32 subjects) as probes and compute the corresponding scores using the enrollment models from the licit scenario. To understand whether face recognition perceives deep morphs as similar to the genuine original videos, we report the FAR computed at the EER threshold θ from the licit scenario. If the FAR for deep morph videos is significantly higher than the one computed in the licit scenario, the face recognition system cannot distinguish the synthetic videos from the originals and is therefore vulnerable to deep morphs.
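The protocol amounts to a few lines of code. The following sketch, a minimal illustration rather than the authors' released package, finds the EER threshold on licit-scenario scores and then reports the FAR of deep morph scores at that threshold; the score arrays are random placeholders.

import numpy as np

def eer_threshold(genuine, impostors):
    # Find the threshold where FAR and FRR are (approximately) equal.
    # Scores are similarities: higher means "more likely the same person".
    candidates = np.sort(np.concatenate([genuine, impostors]))
    gaps = [abs(np.mean(impostors >= t) - np.mean(genuine < t)) for t in candidates]
    return candidates[int(np.argmin(gaps))]

rng = np.random.default_rng(0)
genuine = rng.normal(0.8, 0.1, 256)       # placeholder licit genuine scores
impostors = rng.normal(0.2, 0.1, 2048)    # placeholder zero-effort impostor scores
deep_morphs = rng.normal(0.75, 0.1, 320)  # placeholder deep morph probe scores

theta = eer_threshold(genuine, impostors)
far_morph = np.mean(deep_morphs >= theta)  # FAR at the licit EER threshold
print(f"FAR of deep morphs at the licit EER threshold: {100 * far_morph:.2f}%")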
Table 1: Baseline detection systems for low quality (LQ) and high quality (HQ) deep morph videos. EER and FRR at FAR = 10% are computed on the Test set.

Database         Detection system   EER (%)   FRR@FAR=10% (%)
LQ deep morph    Pixels+PCA+LDA       39.48             78.10
LQ deep morph    IQM+PCA+LDA          20.52             66.67
LQ deep morph    IQM+SVM               3.33              0.95
HQ deep morph    IQM+SVM               8.97              9.05

When evaluating deep morph detection, we treat it as a binary classification problem and evaluate the ability of detection approaches to distinguish original videos from deep morph videos. All videos in the dataset, including the genuine and fake parts, were split into training (Train) and evaluation (Test) subsets. To avoid bias during training and testing, we ensured that the same subject does not appear in both sets. We did not introduce a development set, which is typically used to tune hyperparameters such as the threshold, because the dataset is not large enough. Therefore, for the deep morph detection systems, we report the EER and the FRR (at the threshold where FAR = 10%) on the Test set.
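A subject-disjoint split of this kind can be produced, for example, with scikit-learn's GroupShuffleSplit by grouping videos by subject identity; this is a sketch of the idea with placeholder metadata, not necessarily how the authors implemented their split.

from sklearn.model_selection import GroupShuffleSplit

# Placeholder metadata: one entry per video, 32 subjects with 10 videos each.
video_paths = [f"subj{s:02d}_video{v}.avi" for s in range(32) for v in range(10)]
subjects = [s for s in range(32) for _ in range(10)]
labels = [0] * len(video_paths)  # 0 = genuine; deep morph videos would be 1

splitter = GroupShuffleSplit(n_splits=1, test_size=0.5, random_state=0)
train_idx, test_idx = next(splitter.split(video_paths, labels, groups=subjects))

# No subject identity appears in both subsets.
assert {subjects[i] for i in train_idx}.isdisjoint({subjects[i] for i in test_idx})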

3. Vulnerability of face recognition


We used publicly available pre-trained VGG and Facenet architectures for face recognition, taking the fc7 and bottleneck layers of the respective networks as features and using cosine distance as the classifier. For a given test face, the confidence score of whether it belongs to the pre-enrolled model of a person is the cosine distance between the average feature vector, i.e., the model, and the feature vector of the test face. Both are state-of-the-art recognition systems, with VGG reaching 98.95% [13] and Facenet 99.63% [17] accuracy on the labeled faces in the wild (LFW) dataset.
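The enrollment-and-scoring scheme can be sketched as follows with generic embeddings; extract_embedding stands in for the fc7 or bottleneck layer of the actual networks and is an assumed helper, not part of the paper's code.

import numpy as np

def extract_embedding(face_image):
    # Placeholder for the network forward pass (e.g., VGG fc7 or
    # Facenet bottleneck layer); returns a 1-D feature vector.
    raise NotImplementedError

def enroll(face_images):
    # The model of a person is the average embedding of the enrollment faces.
    feats = np.stack([extract_embedding(img) for img in face_images])
    return feats.mean(axis=0)

def score(model, probe_image):
    # Cosine similarity between the enrolled model and a probe face;
    # higher means more likely the same person.
    probe = extract_embedding(probe_image)
    return float(np.dot(model, probe) / (np.linalg.norm(model) * np.linalg.norm(probe)))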
We conducted the vulnerability analysis of the VGG and Facenet-based face recognition systems on the low quality (LQ) and high quality (HQ) face swaps of the VidTIMIT⁵ database. In the licit scenario, where only original videos are present, both systems performed very well, with an EER of 0.03% for the VGG-based and 0.00% for the Facenet-based system. Using the EER threshold from the licit scenario, we computed the FAR for the scenario where deep morph videos are used as probes. In this case, the FAR for VGG is 88.75% on LQ deep morphs and 85.62% on HQ deep morphs, and for Facenet it is 94.38% and 95.00% on LQ and HQ deep morphs, respectively. To illustrate this vulnerability, we plot the score histograms for the high quality deep morph videos in Figure 3. The histograms show a considerable overlap between the deep morph and genuine scores, with a clear separation from the zero-effort impostor scores (the probes from the licit scenario).
These results make it clear that neither the VGG nor the Facenet based system can effectively distinguish GAN-generated synthetic faces from the original ones. The fact that the more advanced Facenet system is more vulnerable is also consistent with the findings on presentation attacks [11].

4. Detection of deep morph videos


We considered several baseline deep morph detection systems:
• Pixels+PCA+LDA: raw faces as features with a PCA-LDA classifier; 99% retained variance results in a transform matrix with 446 dimensions.
• IQM+PCA+LDA: IQM features with a PCA-LDA classifier; 95% retained variance results in a transform matrix with 2 dimensions.
• IQM+SVM: IQM features with an SVM classifier; each video receives a score averaged over 20 frames.
The systems based on image quality measures (IQM) are borrowed from the domain of presentation attack (including replay attack) detection, where such systems have shown good performance [4, 20]. As the IQM feature vector, we used 129 measures of image quality, including signal-to-noise ratio, specularity, blurriness, etc., combining the features from [4] and [20].
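As a rough illustration of the IQM+SVM pipeline, the sketch below computes a small stand-in set of quality features per frame and averages SVM scores over 20 frames per video. The three toy features merely gesture at the 129 IQM measures combined from [4, 20]; all names are illustrative.

import numpy as np
import cv2
from sklearn.svm import SVC

def iqm_features(frame):
    # Toy stand-ins for image quality measures: sharpness, brightness,
    # and contrast. The systems in the paper use 129 such measures.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
    return np.array([sharpness, gray.mean(), gray.std()])

def video_score(clf, frames):
    # Average the per-frame SVM decision scores over 20 frames.
    feats = np.stack([iqm_features(f) for f in frames[:20]])
    return float(clf.decision_function(feats).mean())

# Training (placeholder): X is an (n_frames, 3) matrix of features from
# genuine (label 0) and deep morph (label 1) training frames.
# clf = SVC(kernel="rbf").fit(X, y)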
The results for all detection systems are presented in Table 1. They demonstrate that the IQM+SVM system detects deep morph videos with reasonably high accuracy, although videos generated with the HQ model pose a more serious challenge. This suggests that more advanced face swapping techniques will be even more challenging to detect.
5. Conclusion
In this paper, we demonstrated that state-of-the-art VGG and Facenet-based face recognition algorithms are vulnerable to the deep morph videos of the DeepfakeTIMIT database and fail to distinguish such videos from the original ones, with false acceptance rates of up to 95.00%. We also evaluated several baseline detection algorithms and found that a technique based on image quality measures with an SVM classifier can detect HQ deep morph videos with an 8.97% equal error rate.
However, continued advancements in the development of GAN-generated faces will result in more challenging videos, which will be harder to detect with the existing algorithms. Therefore, new databases and new, more generic detection methods need to be developed in the future.

References
[1] A. Agarwal, R. Singh, M. Vatsa, and A. Noore. Swapped! digital face presentation attack detection via weighted local magnitude
pattern. In IEEE International Joint Conference on Biometrics (IJCB), pages 659–665, Oct 2017.
[2] H. Allcott and M. Gentzkow. Social media and fake news in the 2016 election. Journal of Economic Perspectives, 31(2):211–236,
2017.
[3] M. Ferrara, A. Franco, and D. Maltoni. The magic passport. In IEEE International Joint Conference on Biometrics (IJCB), pages 1–7, Sep. 2014.
[4] J. Galbally and S. Marcel. Face anti-spoofing based on general image quality assessment. In International Conference on Pattern
Recognition, pages 1173–1178, Aug 2014.
[5] D. Güera and E. J. Delp. Deepfake video detection using recurrent neural networks. In IEEE International Conference on Advanced
Video and Signal Based Surveillance (AVSS), pages 1–6, Nov 2018.
[6] P. Isola, J. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial networks. In IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), pages 5967–5976, July 2017.
[7] P. Korshunov and S. Marcel. Vulnerability assessment and detection of Deepfake videos. In International Conference on Biometrics
(ICB 2019), Crete, Greece, June 2019.
[8] I. Korshunova, W. Shi, J. Dambre, and L. Theis. Fast face-swap using convolutional neural networks. In IEEE International
Conference on Computer Vision (ICCV), pages 3697–3705, Oct 2017.
[9] R. S. S. Kramer, M. O. Mireku, T. R. Flack, and K. L. Ritchie. Face morphing attacks: Investigating detection with humans and
computers. Cognitive Research: Principles and Implications, 4(1):28, Jul 2019.
[10] A. Makrushin, T. Neubert, and J. Dittmann. Automatic generation and detection of visually faultless facial morphs. In Proceedings of
International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP), pages
39–50. INSTICC, SciTePress, 2017.
[11] A. Mohammadi, S. Bhattacharjee, and S. Marcel. Deeply vulnerable: a study of the robustness of face recognition to presentation
attacks. IET Biometrics, 7(1):15–26, 2018.
[12] Y. Nirkin, I. Masi, A. T. Tuan, T. Hassner, and G. Medioni. On face segmentation, face swapping, and face perception. In IEEE
International Conference on Automatic Face Gesture Recognition (FG), pages 98–105, May 2018.
[13] O. M. Parkhi, A. Vedaldi, and A. Zisserman. Deep face recognition. In BMVC, 2015.
[14] H. X. Pham, Y. Wang, and V. Pavlovic. Generative adversarial talking head: Bringing portraits to life with a weakly supervised neural
network. arXiv.org, 2018.
[15] A. Rössler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Nießner. Faceforensics: A large-scale video dataset for forgery
detection in human faces. arXiv.org, 2018.
[16] U. Scherhag, C. Rathgeb, J. Merkle, R. Breithaupt, and C. Busch. Face recognition systems under morphing attacks: A survey. IEEE
Access, 7:23012–23026, Feb. 2019.
[17] F. Schroff, D. Kalenichenko, and J. Philbin. FaceNet: A unified embedding for face recognition and clustering. In IEEE Conference
on Computer Vision and Pattern Recognition (CVPR), pages 815–823, June 2015.
[18] C. Seibold, W. Samek, A. Hilsmann, and P. Eisert. Detection of face morphing attacks by deep learning. In C. Kraetzer, Y.-Q.
Shi, J. Dittmann, and H. J. Kim, editors, Digital Forensics and Watermarking, pages 107–120, Cham, 2017. Springer International
Publishing.
[19] J. Thies, M. Zollhöfer, M. Stamminger, C. Theobalt, and M. Nießner. Face2Face: Real-time face capture and reenactment of RGB videos. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2387–2395, June 2016.
[20] D. Wen, H. Han, and A. K. Jain. Face spoof detection with image distortion analysis. IEEE Transactions on Information Forensics
and Security, 10(4):746–761, April 2015.
[21] X. Yang, Y. Li, and S. Lyu. Exposing deep fakes using inconsistent head poses. In IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP), pages 8261–8265, May 2019.
[22] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE
Signal Processing Letters, 23(10):1499–1503, Oct 2016.
