DEEP-FAKE DETECTION: MACHINE LEARNING APPROACHES TO COMBAT DEEP-FAKE THREATS
1Velpooru Venkata Sai Thushar, 2Rupani Varun Shiva Krishna, 3Kondadi Tejith
1,2,3Students, Department of CSE, Institute of Aeronautical Engineering, Dundigal, Hyderabad-500043, Telangana, India
ABSTRACT
The term "deep-fake" emerged towards the end of 2017 within the Reddit community, a popular American social media platform focused on content rating and discussion. Since then, deep-fake technology has rapidly advanced, enabling the creation of incredibly lifelike media material that convincingly mimics individuals' appearances. However, its potential for misuse, including the spread of disinformation and the deception of users, has sparked major apprehension. In response, researchers have been actively exploring novel approaches to develop effective deep-fake detection systems. This abstract provides an overview of the latest advancements in employing machine learning techniques for deep-fake detection, aiming to mitigate the risks associated with its misuse.
The proposed approach trains machine learning models on a broad dataset that contains both real and deep-fake material. It uses essential characteristics extracted from eye movements and facial expressions to distinguish between real and fake content. The focus of the study is on using deep neural networks and ensuring that they can adapt to new developments in deep-fake generation technology.
To assess the model's precision, accuracy, and recall, validation and testing are performed on separate datasets. Ethical questions linked to privacy and consent are central to the study framework, addressing concerns connected with the analysis of media material. The research stresses the continual improvement of detection algorithms, including updates to counter evolving deep-fake tactics efficiently.
The outcomes of this research seek to contribute to the development of practical and ethical deep-fake detection technologies. As the threat landscape advances, it is vital to stay at the forefront of technical innovation to prevent the malicious use of synthetic media. This research provides useful information for the ongoing efforts to limit the hazards associated with deep-fake propagation.
Keywords: Deepfake Detection, Multi-task Cascaded Convolutional Networks (MTCNN), InceptionResnetV1, GradCAM, Gradio
discriminator. Common loss functions include binary cross-entropy or Wasserstein distance (a short sketch of these two losses is given after the list below).

6) Fine-Tuning and Refinement: After the initial training, the model may undergo fine-tuning and refinement to improve the quality of generated deep-fakes. This may involve adjusting hyperparameters, using larger datasets, or employing advanced techniques such as self-attention mechanisms.

7) Post-Processing: The generated deep-fakes may undergo post-processing to enhance their realism. This could include adding imperfections, adjusting lighting, or smoothing transitions between frames.
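For concreteness, the two discriminator losses named above can be written as short PyTorch routines. This is a minimal illustrative sketch rather than code from the paper's pipeline; the tensor shapes and the random stand-in scores are assumptions.

```python
# Illustrative sketch (not from the paper): discriminator losses commonly used
# when training deep-fake generators.
import torch
import torch.nn.functional as F

def bce_discriminator_loss(d_real_logits, d_fake_logits):
    """Binary cross-entropy loss: real samples labelled 1, generated samples 0."""
    real_loss = F.binary_cross_entropy_with_logits(
        d_real_logits, torch.ones_like(d_real_logits))
    fake_loss = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.zeros_like(d_fake_logits))
    return real_loss + fake_loss

def wasserstein_discriminator_loss(d_real_scores, d_fake_scores):
    """Wasserstein (critic) loss: widen the score gap between real and fake."""
    return d_fake_scores.mean() - d_real_scores.mean()

# Random scores stand in for discriminator outputs on a batch of 8 samples.
real, fake = torch.randn(8, 1), torch.randn(8, 1)
print(bce_discriminator_loss(real, fake).item())
print(wasserstein_discriminator_loss(real, fake).item())
```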
8) Deployment: Once trained, the deep-fake model can be deployed to generate manipulated images or videos. These deep-fakes can be shared online or used for various purposes, ranging from entertainment to malicious deception.

and lighting reflections, especially in glasses, ensuring they align with movement. Examine facial hair, moles, and blinking patterns for authenticity. Pay close attention to lip movements, as discrepancies in syncing may indicate manipulation. Overall, meticulous observation of these cues can help detect subtle anomalies indicative of deep-fake alterations.

These are a few parameters that we can look for to detect deep-fakes.

4.2 Deep-fake detection mathematical model

The mathematical model we used involves several components:
Pre-trained parameters: The pretrained parameters of InceptionResnetV1 often involve weights learned from vast datasets containing facial images. This pretraining phase equips the model with the capability to extract distinct facial features, which proves beneficial for tasks related to individual recognition.

Feature Extraction: A face-containing input photo is sent through the network in order to identify a face using InceptionResnetV1. Features at various degrees of abstraction are taken from the photo as it moves through the network's layers. These features record details including high-level traits, textures, and face shapes.

Embedding Generation: The outcome derived from the ultimate layer of the network, prior to the classification layer, is frequently utilized as a feature vector or embedding to portray the input face. This embedding encapsulates essential details about the facial traits crucial for recognition purposes.

Distance Metric: When comparing two face embeddings to ascertain whether they belong to the same individual, a distance metric such as Euclidean distance or cosine similarity is frequently employed. Smaller distances imply higher similarity between embeddings, indicating a higher likelihood that the faces originate from the same person (a short sketch of this comparison follows this list).

Classification (Optional): In some instances, the output from the embedding layer might undergo processing via a classification layer to execute tasks like face verification (ascertaining if two faces pertain to the same individual) or face identification (assigning identities to faces).
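The embedding and distance steps listed above can be sketched with the MTCNN and InceptionResnetV1 implementations from facenet-pytorch. The file names and the comparison threshold below are placeholders for illustration, not values taken from the paper.

```python
# Illustrative sketch: face embeddings with InceptionResnetV1 and a Euclidean
# distance comparison (file paths and threshold are hypothetical).
import torch
from PIL import Image
from facenet_pytorch import MTCNN, InceptionResnetV1

mtcnn = MTCNN(image_size=160)                              # face detector / cropper
resnet = InceptionResnetV1(pretrained='vggface2').eval()   # embedding network

def embed(path):
    face = mtcnn(Image.open(path).convert('RGB'))          # aligned face tensor or None
    if face is None:
        raise ValueError(f"no face found in {path}")
    with torch.no_grad():
        return resnet(face.unsqueeze(0))                   # 512-dimensional embedding

e1, e2 = embed('face_a.jpg'), embed('face_b.jpg')
distance = torch.dist(e1, e2).item()                       # Euclidean distance d(E1, E2)
print(f"distance = {distance:.3f}")
print("likely same person" if distance < 1.0 else "likely different people")
```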
b.2) Mathematical model for face recognition with InceptionResnetV1:

1. Input Image: We have an input image I.

2. CNN Operations: The image is processed through a deep neural network (InceptionResnetV1) to extract features:

F = CNN(I)

where F represents the features extracted from the image.

3. Embedding Generation: The extracted features are converted into an embedding:

E = Embed(F)

4. Distance Metric: Two embeddings E1 and E2 can be compared using a distance such as:

d(E1, E2) = ||E2 - E1||

5. Classification: Optionally, the embedding E can be used for classification:

P = Classify(E)

where P represents the probability distribution over classes.

In summary, the simplified mathematical model involves processing the input image through a CNN to extract features, converting these features into an embedding, and optionally using the embedding for tasks such as distance calculation or classification. A minimal sketch of the optional classification step is given below.
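As one way to realise the optional step P = Classify(E), the 512-dimensional embedding can be passed to a small classification head. The two-class (real/fake) head below is an assumed architecture for illustration, not the exact classifier reported in the paper.

```python
# Illustrative sketch: a classification head over an InceptionResnetV1 embedding.
# The layer size and the two-class (real/fake) setup are assumptions.
import torch
import torch.nn as nn

class EmbeddingClassifier(nn.Module):
    def __init__(self, embedding_dim=512, num_classes=2):
        super().__init__()
        self.head = nn.Linear(embedding_dim, num_classes)

    def forward(self, embedding):
        logits = self.head(embedding)         # unnormalised class scores
        return torch.softmax(logits, dim=-1)  # P = Classify(E), a distribution over classes

classifier = EmbeddingClassifier()
E = torch.randn(1, 512)                       # stand-in for an embedding from the backbone
P = classifier(E)
print(P)                                      # e.g. tensor([[0.47, 0.53]])
```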
c) GradCAM for Explainability: GradCAM (Gradient-weighted Class Activation Mapping) is used to visualize the regions of the input image that are most influential in the model's decision-making process. It computes the gradients of the target class output with respect to the feature maps of the specified layers in the model. This helps in understanding which parts of the image contribute the most to the model's prediction.
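A minimal GradCAM routine consistent with this description can be written with PyTorch hooks; the target layer (typically a late convolutional block of the backbone) and the classifier output shape assumed below are illustrative choices, not the paper's exact configuration.

```python
# Illustrative sketch of GradCAM: weight each feature map by the average gradient
# of the target class score with respect to it, then combine and apply ReLU.
import torch
import torch.nn.functional as F

def grad_cam(model, target_layer, image, target_index=0):
    activations, gradients = {}, {}

    def fwd_hook(module, inp, out):
        activations['maps'] = out              # feature maps of the chosen layer

    def bwd_hook(module, grad_in, grad_out):
        gradients['maps'] = grad_out[0]         # gradients flowing into those maps

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)
    try:
        output = model(image)                   # forward pass, shape (N, num_classes)
        model.zero_grad()
        output[0, target_index].backward()      # gradient of the target class score
    finally:
        h1.remove()
        h2.remove()

    weights = gradients['maps'].mean(dim=(2, 3), keepdim=True)  # global-average gradients
    cam = F.relu((weights * activations['maps']).sum(dim=1))    # weighted combination
    cam = cam / (cam.max() + 1e-8)                              # normalise to [0, 1]
    return cam  # heatmap over the target layer's spatial grid
```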
d) Sigmoid Activation for Classification: The output of the model is passed through a sigmoid activation function, which squashes the output values between 0 and 1. This output represents the confidence score for the input image being classified as a real or fake face. The sigmoid activation function used in classification is:

σ(x) = 1 / (1 + e^(-x))

Output Interpretation: The output of the sigmoid function represents the probability or confidence score that an input belongs to the positive class (e.g., class 1). For binary classification, this is often interpreted as the probability of the input belonging to the "positive" class.

Decision Boundary: In binary classification, a decision boundary is typically set at 0.5. Inputs with sigmoid output greater than 0.5 are classified as belonging to the positive class, while inputs with output less than 0.5 are classified as belonging to the negative class.

Training: During the training process, the sigmoid function is applied to the output of the last layer of the neural network. The parameters of the network (weights and biases) are adjusted using optimization algorithms such as gradient descent to minimize the difference between the predicted outputs and the true labels of the training data.

Loss Function: The output of the sigmoid function is often used as input to a loss function, such as binary cross-entropy loss, which quantifies the difference between the predicted probabilities and the true labels. This loss is minimized during training to improve the model's ability to correctly classify inputs. A short numerical sketch of these operations is given below.

Fig3: Sigmoid function graph
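The sigmoid output, the 0.5 decision boundary, and the binary cross-entropy loss described above can be made concrete in a few lines of PyTorch; the logit values and the label convention (1 = fake) are arbitrary assumptions for illustration.

```python
# Illustrative sketch: sigmoid confidence, 0.5 decision boundary, and BCE loss.
import torch
import torch.nn.functional as F

logits = torch.tensor([2.0, -1.0, 0.3])       # arbitrary raw model outputs
labels = torch.tensor([1.0, 0.0, 1.0])        # assumed convention: 1 = fake, 0 = real

probs = torch.sigmoid(logits)                 # squash to (0, 1)
preds = (probs > 0.5).float()                 # decision boundary at 0.5

loss = F.binary_cross_entropy(probs, labels)  # binary cross-entropy on probabilities
# Equivalent and numerically safer when starting from raw logits:
loss_from_logits = F.binary_cross_entropy_with_logits(logits, labels)

print(probs)                                  # tensor([0.8808, 0.2689, 0.5744])
print(preds)                                  # tensor([1., 0., 1.])
print(loss.item(), loss_from_logits.item())
```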
e) Confidence Scores: The confidence scores for real and fake faces are computed based on the sigmoid output. A score closer to 1 indicates higher confidence in the predicted class. The confidence scores for classifying whether the input image contains a real or fake face are calculated using a sigmoid activation function applied to the output of the face recognition model. Here is how we have computed them:
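One common way to derive the two confidence scores from a single sigmoid output, consistent with the description above, is sketched below; the single-logit convention and the example value are assumptions rather than the paper's exact computation.

```python
# Illustrative sketch: deriving real/fake confidence scores from a sigmoid output.
# Assumed convention: the model emits one logit, and sigmoid(logit) is P(fake).
import torch

def confidence_scores(logit: torch.Tensor):
    p_fake = torch.sigmoid(logit)        # probability the face is fake
    p_real = 1.0 - p_fake                # complementary probability for real
    label = "fake" if p_fake > 0.5 else "real"
    return label, {"real": p_real.item(), "fake": p_fake.item()}

label, scores = confidence_scores(torch.tensor(1.2))
print(label, scores)                     # -> 'fake', real ≈ 0.23, fake ≈ 0.77
```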