
Criminal Identification - Basepaper


Received 13 December 2024, accepted 23 December 2024, date of publication 25 December 2024, date of current version 3 January 2025.

Digital Object Identifier 10.1109/ACCESS.2024.3523101

Leveraging Lightweight Hybrid Ensemble Distillation (HED) for Suspect Identification With Face Recognition
VAISHNAVI MUNUSAMY AND SUDHA SENTHILKUMAR
School of Computer Science and Engineering, Vellore Institute of Technology, Vellore 632014, India
Corresponding author: Sudha Senthilkumar (sudha.s@vit.ac.in)

ABSTRACT Face recognition biometric systems identify individuals by extracting their facial characteristics. However, these systems often fail or misclassify faces because of external factors, obstructions, and varying environmental conditions. Traditional models cannot effectively handle these variations, leading to inaccuracies. Moreover, the complexity and computational demands of advanced models can hinder their real-time application. In this study, the Hybrid Ensemble Distillation (HED) model addresses these issues by leveraging both knowledge distillation and an ensemble of pre-trained models (VGG16, ResNet50, and DenseNet121) to improve classification accuracy and efficiency. The model combines the strengths of these architectures while utilizing data augmentation techniques such as GANs to enhance the training dataset. The proposed model demonstrated high efficiency and accuracy, with the teacher model achieving 98.42% accuracy and the student model reaching 96.78% validation accuracy, thereby highlighting the efficacy of knowledge distillation. It also showed progressive improvements in the validation accuracy and loss reduction over 350 epochs, emphasizing the robustness of the training process. This lightweight method helps identify suspects or individuals because the model was trained using 360-degree images in the dataset, ensuring comprehensive feature extraction from multiple angles. The reduced computational requirements and high accuracy make this approach suitable for real-time applications, thereby enhancing its practicality for various human identification tasks.

INDEX TERMS Person identification, suspects/criminal identification, face recognition, knowledge distillation, ensemble method.

The associate editor coordinating the review of this manuscript and approving it for publication was Kumaradevan Punithakumar.
© 2024 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
2112 VOLUME 13, 2025
V. Munusamy, S. Senthilkumar: Leveraging Lightweight HED for Suspect Identification

I. INTRODUCTION
Face detection and recognition technologies have garnered significant attention in recent years, primarily due to their high level of security and resistance to tampering. The human face encompasses diverse biological traits, such as appearance, facial expression, and skin color, which inherently vary among individuals. This variability, coupled with external factors like changes in hairstyle, eye positioning, makeup, image sharpness, and lighting, introduces considerable challenges for accurate face detection and recognition. Among these factors, variations in illumination stand out as a major obstacle, adversely impacting facial expression analysis and recognition performance. Despite technological advancements, face recognition systems still face limitations, especially in low-illumination environments. State-of-the-art models such as DeepFace (developed by Facebook in 2014) [1], FaceNet (developed by Google in 2015) [2], and SphereFace [3] have shown near-human performance in optimal lighting conditions. However, their effectiveness deteriorates in dim settings, highlighting the need for improved approaches to handle diverse environmental conditions. Addressing these challenges is crucial for improving the robustness and accuracy of face detection and recognition systems.
Today, face recognition has moved beyond unlocking phones or enabling financial transactions, finding application in attendance management, public safety, security systems,

and more. However, many current person identification systems focus on attire, dressing patterns, or color rather than facial or gestural recognition. Ideally, these systems should mimic human vision: humans can recognize known individuals from a distance or through partial visibility. Furthermore, most recognition systems prioritize superficial features like attire or color patterns over intrinsic characteristics like facial and gestural traits, limiting their applicability in scenarios involving partial visibility or long-distance recognition.
This study advances beyond existing literature by addressing these critical gaps through the development of a Hybrid Ensemble Distillation (HED) model. The HED model is designed to:
• Enhance robustness against low illumination, motion blur, and partial occlusion.
• Simulate human-like recognition capabilities to ensure accurate identification even under challenging conditions.
• Reduce computational complexity for real-time deployment in practical applications.
The proposed approach is validated using a custom dataset specifically curated to simulate diverse environmental conditions, enabling a direct evaluation of its performance against classical models such as VGG16, ResNet50, and DenseNet121. By benchmarking these models, the study demonstrates the superior robustness, efficiency, and real-time applicability of the HED model compared to existing methods.
Key Contributions:
• Enhanced Robustness: Development of a lightweight model tailored for real-time applications, reducing the likelihood of false positives in face recognition.
• Human Vision Simulation: Implementation of a recognition system based on human vision principles, ensuring accurate identification even in challenging scenarios.
• Comprehensive Benchmarking: Evaluation of classical object detection and recognition algorithms, including VGG16, ResNet50, and DenseNet121, using the custom dataset to benchmark their performance in face recognition tasks.
• Real-Time Suitability: Optimization of the model for real-time applications, making it suitable for public safety, surveillance, and other time-sensitive domains.
Thus, this work addresses critical gaps in the field, including the handling of illumination variability, robustness under adverse conditions, and lightweight implementation for practical use. Section II surveys the literature on face recognition and suspect identification. A detailed architecture of the proposed model is discussed in Section III, while Section IV highlights the lightweight nature of the model. Section V presents an in-depth analysis of the experimental results, followed by a discussion of the limitations in Section VI. Finally, the findings and future research directions are summarized in Section VII.

II. RELATED WORK
Zennayi et al. [4] used YOLOv5s for person detection and employed the Multi-Hypothesis Data Association Tracking (MHDT) method [5] to track individuals across frames. The image acquisition module uses a standard CCTV camera to capture and preprocess the images. Location analysis involves tracking the movements of individuals to ensure that they do not access unauthorized areas. RetinaFace is used for detecting faces within bounding boxes, and a deep learning algorithm facilitates face recognition. Unauthorized access identification combines movement and identity data to detect and report unauthorized access.
Heterogeneous feature extraction and face identification are the two primary elements of the RHDA (Robust Heterogeneous Discriminative Analysis) method, which handles identification based on single-picture samples [6]. After creating intrinsic and penalty graphs, two discriminative manifold embeddings, DSME ("Discriminative Single-Manifold Embedding") and DMME ("Discriminative Multi-Manifold Embedding"), are proposed in heterogeneous feature extraction in order to produce heterogeneous subspace representations for image patches based on a Fisher-like criterion. Two distance metrics, patch-to-patch and patch-to-manifold, were combined to create a fusion technique for face identification. This method uses joint majority voting to identify each unlabeled query sample by utilizing heterogeneous subspace representations. The process involves partitioning facial images into non-overlapping patches, constructing local dictionaries, and projecting patches into subspaces. The Fisher-like criterion helps preserve within-class patch relationships while suppressing between-class similarities. Finally, joint majority voting combines the outputs from both distance metrics, improving identification accuracy and robustness against various facial variations by aggregating complementary information from the two metrics.
The application of Wavelet Neural Networks (WNNs) has been explored to predict criminal suspect characteristics from complex, nonlinear crime data, addressing challenges such as the curse of dimensionality and overfitting that plague traditional models [7]. WNNs, which merge wavelet transforms with neural networks, excel in terms of fitting accuracy and generalization. The methodology includes data preprocessing, feature selection using the information gain method, and parameter optimization via Particle Swarm Optimization (PSO) for support vector machines within the WNN [8]. The model was trained and validated, predicting suspect features based on case and victim data, which were then matched against a suspect pool. Key functions of wavelet analysis, such as the mother wavelet function and discrete wavelet transform, enable a detailed signal analysis at multiple scales. The WNN structure utilizes the Morlet and Mexican Hat functions and is applied to face feature recognition, evaluated by precision, recall, and similarity


measures. To address the big data challenge, the study employs Hadoop for distributed computing, improving efficiency through parallelized MapReduce implementations for feature selection. In addition, Haar features are used for grayscale changes in images, which are crucial for face recognition. A performance analysis on the CIFAR dataset shows that TI-ResGWNN outperforms traditional methods with fewer parameters. Thus, WNNs offer a robust solution for predicting suspect characteristics, leveraging wavelet analysis, neural networks, optimization techniques, and distributed computing to enhance law enforcement investigations.
The face detection method employs a Haar Cascade Classifier to identify faces in the images. Web scraping utilizes the Python requests library and BeautifulSoup to extract images and information from specified websites, ensuring that data are continuously updated without requiring retraining [9]. Feature extraction uses OpenCV multiscale detection, converting images to grayscale and HSV to extract features stored as unique integer arrays, mapped via deep learning. In the template comparison, the Facepplib API in Python compares face vectors by calculating similarity levels based on feature matching, using confidence values to determine matches. If the confidence exceeds a threshold, the faces are considered identical and the relevant information is displayed. The author utilized user-provided suspect images and dynamically accessed web images of criminals and missing children, including video frames processed at intervals to reduce redundancy.
Sandhaya et al. [10] proposed a method for face detection and identification of suspects based on deep learning. Initially, face detection was performed on input images or video frames using an SSD model with ResNet-10, implemented through OpenCV DNN (Deep Neural Networks), which detects and crops faces. To compare faces, the encoder of an autoencoder converts images to embedding vectors, which are then compared using cosine similarity. The system employs a subset of the LFW dataset for model training owing to hardware constraints. The autoencoder model includes Conv2D and AveragePooling2D layers, with batch normalization for accuracy. Finally, the system uses cosine similarity to compare the angles between vectors, thereby determining the likelihood that the input image matches an image of a criminal in the database.
Convolutional Neural Networks (CNNs) have been applied to image recognition tasks such as face detection and criminal identification. The CNN architecture [11] involves several key operations: convolution, ReLU as a non-linear activation, max, average, and global pooling, fully connected layers, and an output layer, each with specific formulas to compute the output dimensions and values. In addition, the Haar Cascade classifier is employed for face detection, utilizing a cascade of Haar-like features to scan input images or video streams. These features capture the intensity changes across image regions, and their responses are computed using integral images. The classifier, trained using positive and negative images, applied a cascaded approach with thresholds and decision rules to detect faces, subsequently facilitating face recognition or identification tasks.
To handle face recognition of suspects, with challenges like tilted or side faces, the Haar Cascade machine learning approach has been combined with a DNN, and an extra preprocessing stage is introduced, involving scaling the detected face to a predetermined size and adjusting the alignment using facial landmarks [12]. A Local Binary Pattern Histogram (LBPH) was then used to identify the preprocessed face. The system was implemented using TensorFlow, dlib, and OpenCV. The Haar Cascade, known for its speed, employs three basic features (edge, line, and four-rectangle) but has limitations such as false positives and sensitivity to face orientation changes. In contrast, the DNN-based detector, which uses the SSD model and ResNet-10 architecture, handles various face orientations and occlusions effectively but is slower. The workflow of the system includes image capture, face detection using Haar and DNN, face alignment, training of the recognizer, and recognition by comparing new faces with the trained database.
The VGG-16 architecture employs distance metrics to compare facial embeddings and generate identification probabilities, and aims to reduce false positives in face recognition systems by implementing prediction findings and tracking objects to accumulate recognition ratings for each face [13]. To address the time constraints in real-time identification, downsampling techniques are applied to reduce the processing time without compromising accuracy. A comprehensive classification score was obtained by combining various indicators, including face proximity and confidence scores. In addition, the method incorporates a dynamic threshold to validate the identification results, ensuring reliable predictions in diverse scenarios. This approach offers the potential benefit of enhancing security in crime-prone areas by providing accurate and timely surveillance analysis.
Gupta et al. [14] suggested a method to capture the notion of similarity in a user's mind by associating positive and negative image samples with selected and non-selected images, respectively. In order to accomplish this, a fully connected neural network is trained with a Separating Cluster Loss (SCLoss) to project pretrained base representations onto a lower-dimensional space. This ensures that comparable images are closer together and dissimilar ones are farther apart in the projected space. Disentangled representation learning is utilized to provide favorable initialization, ensuring robustness to noise and distortion in real-time scenarios. The SCLoss objective maximizes the similarity between selected images and minimizes it with non-selected images, enabling flexible use during online training. To ensure that newly acquired images are linked to the projected space, which is composed of previously created clusters, anchoring


was included to preserve the idea of similarity between iterations. This method makes it easier to train the projection network to recognize the user's similarities and allows for effective inference in real-time applications.
An agile methodology has been employed for person and face recognition, suspicious suit identification, facial expression identification, wrinkle extraction, gender identification, and facial feature extraction for age identification [15]. For instance, CNN has been employed for face recognition, the LeNet architecture for suit identification, and machine learning techniques for facial expression identification and gender classification. Techniques such as disentangled representation learning, CNN, and the dlib library were employed for image classification and feature extraction, whereas training datasets such as the FER-2013 emotion dataset and the IMDB WIKI dataset were utilized for emotion classification and gender identification, respectively. The system underwent continuous training and testing to improve its accuracy and efficiency in real-world scenarios.
A pre-trained FaceNet model was utilised for a criminal identification system using its own criminal dataset [16]. Facial detection was handled by Multi-Task Cascaded Convolutional Networks (MTCNN) [17], which helps to recognize the facial landmarks. Each face was enumerated to determine its prediction and embedding from the training and test sets. The classification model, a linear Support Vector Machine, was effective for differentiating between face embeddings.
According to Lombroso's theory, criminals are believed to possess innate physical traits that predispose them to crime [18]. In line with this theory, Amjid et al. [19] proposed a model incorporating deep learning techniques, such as ResNet50 and SVM. The ResNet50 model prediction layer generates results based on the SVM analysis, which depends on the structure of the neural network. The features extracted by ResNet50, particularly the facial features, correspond to those associated with the Lombroso theory of criminal characteristics.
The surveyed methods for face recognition face challenges such as occlusions, varying lighting, and sensitivity to face orientation. They are resource-heavy, require high-end GPUs for real-time processing, and are computationally demanding, particularly for low-resolution images. These methods often yield higher false positive rates and are less effective for detecting non-frontal faces. They require extensive training data, preprocessing, and significant computational resources, making them less suitable for real-time applications on standard hardware. These approaches are sensitive to variations in image quality and orientation, and dynamic thresholds may require fine-tuning. Additionally, extensive user input for training and diverse datasets are required, in addition to their complexity and computational burden. Some methods are ethically questionable and limited by the validity of their underlying theories in modern contexts.

III. HYBRID ENSEMBLE DISTILLATION MODEL ARCHITECTURE
Ensemble learning combines multiple models to improve results, and in this study, through the application of knowledge distillation, a smaller, more effective model for image classification was developed by mimicking the performance of larger pre-trained models. Specifically, stacking is employed, which involves training multiple base models, such as VGG16, ResNet50, and DenseNet121, that process the input images to produce feature representations. These outputs are concatenated and passed through additional dense layers, acting as the meta-learner, to obtain the final classification. The workflow is shown in Figure 1.
FIGURE 1. The workflow of hybrid ensemble distillation method.

A. DATA PREPARATION
The images were read from their respective directories, resized to a uniform size of 50×50 pixels, and appended with their corresponding labels to create a structured dataset. This resized dataset ensured consistency and reduced the computational complexity. Subsequently, the data were shuffled to ensure a random distribution and then split into features and labels. Finally, the processed data were saved using the pickle module, making them ready for future use in model training and validation.
In the initial stage, the image dataset was loaded and preprocessed. The dataset comprises images (X) and corresponding labels (y); the images are normalized to the [0, 1] range by dividing the pixel values by 255.0. The labels were converted to a categorical format for use with the categorical cross-entropy loss function. The data were then split into training and validation sets using a 90-10 split.

B. TEACHER MODEL ARCHITECTURE
The architecture of the teacher model integrates three prominent convolutional neural networks: VGG16, ResNet50, and DenseNet121, all initialized with random weights.
• VGG16 is a simple and uniform architecture specifically designed for object detection and classification [20], [21], [22], [23], [24], [25]. This provided a strong foundation for feature extraction. VGG16 acquires low-level features, including edges and texture, in its early layers, and extracts high-level or complex features, such as shape and object parts, in its deeper layers.
• ResNet-50 is a deep CNN architecture and a member of the Residual Network (ResNet) family. ResNet-50 consists of several stacked residual blocks. Each residual block has two main paths: the identity path, which passes the input to the next


layer, and the convolutional path, which employs a series of convolutional operations. The outputs of these operations are added element-wise to the identity path. It uses bottleneck blocks, composed of 1×1, 3×3, and 1×1 convolutions [11], [26], [27], [28], [29], [30]. These bottleneck blocks help reduce computational complexity. Pre-trained weights are often available for ResNet-50, trained on large datasets, ImageNet for instance, making it easier to transfer knowledge to new tasks with relatively small datasets. Skip connections allow free flow of the gradient during training, alleviating the vanishing gradient problem and enabling the training of very deep networks.
• DenseNet-121 offers efficient parameter usage, an improved gradient flow, and feature reuse. DenseNet-121 requires fewer parameters than traditional CNNs of similar depth owing to its dense connectivity and use of bottleneck layers. This design lowers the computational complexity and enhances the performance of the network [31], [32], [33], [34], [35], [36], [37], [38], [39], [40], [41], [42]. The dense connections between layers facilitate a better gradient flow, which simplifies the training of deep networks by diminishing issues such as the vanishing gradient problem. Additionally, the reuse of features across layers enhances the learning efficiency, often resulting in superior performance in various image classification tasks.
Each base model processes the same input image, and feature extraction is performed using GlobalAveragePooling2D layers. The extracted features from each base model were concatenated and passed through two dense layers with 126 and 64 units, respectively, incorporating L2 regularization and dropout to alleviate overfitting. The final output layer employed softmax activation to predict the five classes in the dataset. The model was trained with the Adam optimizer and a learning rate of 0.0001, using early stopping and learning rate reduction on plateau as callbacks.
Figure 2 shows the total number of layers in the hybrid model, which is the sum of the layers in each branch and the additional layers after concatenation. In general, VGG16 has 13 convolutional layers and 3 fully connected layers; however, in this model, only the 13 convolutional layers were used. ResNet50 and DenseNet121 employ 49 and 120 convolutional layers, respectively. Each model was followed by 1 Global Average Pooling layer.
• VGG16 Branch: 13 convolutional layers + 1 Global Average Pooling layer = 14 layers
• ResNet50 Branch: 49 convolutional layers + 1 Global Average Pooling layer = 50 layers
• DenseNet121 Branch: 120 convolutional layers + 1 Global Average Pooling layer = 121 layers
• Post-Concatenation Layers: 2 dense layers (126 and 64 units) + 2 dropout layers (each 50%) + 1 dense layer (units equal to the number of classes, 5) + 1 softmax layer = 6 layers
The fully connected layers in VGG16 were removed to reduce the computational complexity and model size, making the hybrid model lightweight and suitable for real-time applications. This adjustment helps maintain high performance while ensuring that the model can be efficiently deployed on standard hardware.

C. KNOWLEDGE DISTILLATION
Knowledge distillation is used to transform the complex model into a simple and small model. This helps reduce the computational complexity and makes the model less expensive to evaluate. Knowledge distillation has two training stages: training the teacher network and training the student network. The teacher network is a large complex model with a high capacity for knowledge, but this capacity may not be utilized fully for each task, as a task exploits only a portion of its knowledge. Therefore, the distilled knowledge is transferred to the student network (a smaller, simpler model), which mimics the teacher model with little loss of accuracy. The objective is to minimize the size and computational demands of the student model while ensuring that its performance remains comparable to that of the teacher model. The workflow of knowledge distillation is shown in Figure 4, and its process is described as follows:

1) TEACHER MODEL
The teacher model is made up of several complex architectures (such as VGG16, ResNet50, and DenseNet121) that have been trained to produce predictions for a particular dataset. These networks have a great deal of capacity and can pick up a wide range of features. For every class, the teacher model generates logits, or raw predictions.

2) SOFT TARGETS
Knowledge distillation uses soft targets instead of hard targets (i.e., one-hot encoded labels) to train the student model. The probabilities produced by the teacher model following the application of a softmax function are known as soft targets. These soft targets improve the generalization of the student model by offering more detail about what the teacher model predicts, such as the relative likelihood of each class.

3) TEMPERATURE SCALING
The teacher model logits undergo temperature scaling to produce softer probabilities. Temperature is a hyperparameter that affects the sharpness of the probability distribution. Higher temperatures produce softer (flatter) probabilities, which help the student model comprehend how confident the teacher is in each class. A temperature of three is used in this case.

4) DISTILLATION LOSS
One important factor that supports the student model learning process is the distillation loss. It is computed as the categorical cross-entropy between the student and the soft


targets (the teacher's predicted probabilities). The overall loss comprises:
• The standard categorical cross-entropy loss, evaluated by comparing the student prediction with the actual label.
• The distillation loss, which assesses how well the student model prediction matches the soft targets of the teacher model.
The custom loss function is divided into two components:
Standard Categorical Cross-Entropy Loss: This component is utilized to assess the student model predictions against the actual labels (one-hot encoded).
Distillation Loss: This is calculated as the categorical cross-entropy between the predictions of the student model and the soft targets provided by the teacher model (logits adjusted by the temperature).
The final loss function can be expressed as:
$$\text{Total Loss} = \alpha \cdot \mathrm{CE}(y_{\text{student}}, y_{\text{true}}) + (1 - \alpha) \cdot \mathrm{CE}(y_{\text{student}}, y_{\text{soft}})$$
where $y_{\text{student}}$ are the predictions of the student model, $y_{\text{true}}$ are the true labels, $y_{\text{soft}}$ are the soft targets from the teacher model, and $\alpha$ is a hyperparameter that balances the two loss components.
FIGURE 2. The sequential steps involved in the Hybrid Ensemble Distillation (HED) method.

5) TRAINING THE STUDENT MODEL
The student model is designed to emulate the performance of the teacher model while being more compact and efficient. The training process is organized as follows:

D. STUDENT MODEL ARCHITECTURE
The architecture of the student model is considerably smaller than that of the teacher model, comprising three convolutional layers with 32, 64, and 128 filters, followed by a MaxPooling2D layer, Global Average Pooling, and a dense layer containing 64 units. The output layer employs a softmax activation function for multi-class classification.

1) OPTIMIZATION STRATEGY
The training of the student model employs the Adam optimizer with a learning rate set at 0.0001. Additionally, callbacks are implemented for:


FIGURE 3. Detailed architecture of the hybrid ensemble model.

• Learning Rate Reduction: This mechanism decreases the learning rate when the validation loss stabilizes, facilitating quicker convergence of the model.
• Early Stopping: This feature halts the training process if there is no improvement in validation loss over a predetermined number of epochs, thereby mitigating the risk of overfitting.

2) TRAINING LOOP
The student model undergoes training for 350 epochs, during which the teacher model generates soft targets for each training instance. The student model learns to approximate the predictions of the teacher through the process of distillation.

IV. LIGHTWEIGHT NATURE OF THE MODEL FOR REAL-TIME SUSPECT IDENTIFICATION
A. REDUCED NUMBER OF PARAMETERS
Traditional models such as VGG16, ResNet50, and DenseNet121 have large parameter counts, contributing to their high computational cost. In contrast, the proposed model reduces parameters by employing just two dense layers (126 and 64 units) after feature concatenation. This architecture minimizes memory usage and accelerates inference time. The total number of parameters in this model is significantly smaller compared to the 138 million parameters in VGG16 and 25 million parameters in ResNet50, making the model highly efficient while retaining performance.

B. FLOPS (FLOATING POINT OPERATIONS PER SECOND) ANALYSIS
The computational complexity of the model is reduced by leveraging Global Average Pooling (GAP) layers in base models such as VGG16, ResNet50, and DenseNet121. GAP layers significantly decrease the number of parameters passed to fully connected layers, lowering the computational burden during inference. The FLOPS of this model is much lower compared to ResNet50 (approximately 4.1 GFLOPS) and DenseNet121, demonstrating the lightweight nature of the model and its suitability for real-time applications without compromising accuracy.
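To make the effect of Global Average Pooling on the dense head concrete, the following minimal sketch counts the parameters of the 126-unit and 64-unit dense layers plus a 4-class output layer, once fed with pooled features and once with flattened feature maps. It assumes 224×224 inputs (so each backbone ends in a 7×7 feature map) and counts only the dense head, not the backbones; the helper names are illustrative and not taken from the authors' code.

```python
# Channels produced by the last convolutional block of each backbone.
GAP_CHANNELS = {"VGG16": 512, "ResNet50": 2048, "DenseNet121": 1024}

# Assumption: 224x224 inputs, for which each backbone ends in a 7x7 feature map.
SPATIAL_POSITIONS = 7 * 7

# Dense head: the two hidden layers named in the text plus a 4-class output.
HEAD_UNITS = [126, 64, 4]


def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def head_params(n_features, units=HEAD_UNITS):
    """Total parameters of a stack of dense layers fed with n_features inputs."""
    total = 0
    for n_out in units:
        total += dense_params(n_features, n_out)
        n_features = n_out
    return total


# With GAP, every channel contributes a single value to the concatenated vector.
gap_features = sum(GAP_CHANNELS.values())
# Without GAP, every spatial position of every channel would be passed on.
flat_features = gap_features * SPATIAL_POSITIONS

gap_head = head_params(gap_features)
flat_head = head_params(flat_features)

print(f"GAP features:       {gap_features:,} -> head parameters {gap_head:,}")
print(f"Flattened features: {flat_features:,} -> head parameters {flat_head:,}")
```

Under these assumptions the pooled head is roughly 48 times smaller than a flattened one, which illustrates why pooling before concatenation keeps the fully connected part of the model light.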


FIGURE 4. Workflow of knowledge distillation and its process.

C. MEMORY USAGE OPTIMIZATION
With its reduced parameter count and efficient architecture, the memory footprint of the model is significantly lower than the memory requirements of traditional models like VGG16 and ResNet50. This low memory usage makes the model suitable for deployment on resource-constrained devices, ensuring seamless performance in real-time suspect identification tasks.

D. INFERENCE TIME OPTIMIZATION
The model achieves faster inference times, essential for real-time decision-making. On the tested hardware system (HP Z2 G4 GPU with 32GB RAM and Intel i9-9900K processor), the model processes an image much faster than larger models like VGG16 and ResNet50. This speed ensures rapid responses in real-time applications, such as suspect identification.

E. EFFICIENCY ON MODERATE HARDWARE
Evaluated on a system with moderate computational resources (HP Z2 G4 GPU, 32GB RAM, Intel i9-9900K processor), the model achieves competitive accuracy while maintaining fast inference times and low resource consumption. This demonstrates its scalability and suitability for deployment in environments with limited computational power, ensuring efficiency in real-world use cases.

Table 1 shows that the HED Teacher Model, designed for training purposes, achieves high accuracy (98.42%) with a loss of 0.2314. However, it is computationally intensive, requiring 1.8 GFLOPS, 12.5 million parameters, and 150 MB of memory, making it suitable for resource-rich environments. In contrast, the HED Student Model, distilled from the teacher model, achieves comparable accuracy (96.78%) with negligible loss while reducing FLOPS by 72% and parameters by 74%, requiring only 0.5 GFLOPS and 3.2 million parameters. Memory usage drops to 60 MB, making the student model highly efficient for resource-constrained devices. The extended training period of 350 epochs compensates for the reduction in complexity, ensuring reliable performance in real-time applications. Together, these reductions ensure fast inference, low memory usage, and computational efficiency. This balance between performance and efficiency, achieved through architectural optimizations and model distillation, makes the model ideal for real-time suspect identification in both resource-rich and resource-constrained environments, effectively addressing the challenges of real-time deployment.

TABLE 1. Comparison of HED teacher and student models.

V. EXPERIMENTAL RESULT
A. DATASET
One of the primary challenges in human identification is the scarcity of large, labeled datasets that capture a wide range of real-world scenarios. Our dataset addresses this by including 250 images per individual for four participants, resulting in a total of 1000 images. Each individual was photographed under various controlled and uncontrolled conditions to simulate real-world challenges. The dataset includes:
• Mugshots (standard frontal face images),
• Masked images (with the face partially covered),
• Side profiles (both left and right),
• Concealed images (where the lower portion of the face is hidden),
• Images taken under poor lighting conditions (night effects),
• Blurred images (to simulate motion or poor camera focus),
• Inverted images (to test the model’s ability to handle rotated perspectives), and
• Pictures captured from a height of up to 22 feet (to mimic surveillance camera angles in public areas).
This diversity in image capture conditions introduces variability that mimics real-world scenarios and makes the dataset a good challenge for human identification tasks, even though it is relatively small. However, to improve model robustness and performance, we employed data augmentation techniques and synthetic data generation through GANs.

B. DATA AUGMENTATION
The data augmentation strategy is a crucial component of the performance improvement of our model, especially in scenarios where acquiring a large volume of real-world data is impractical. In our approach, we use the Augmentor library for conventional augmentations, such as Gaussian blur,

median blur, and changes to brightness, contrast, saturation, and hue. Additionally, Gaussian noise and salt-and-pepper noise are introduced to further diversify the dataset. These are combined in an augmentation pipeline, which also includes random distortions, flipping, and zooming.
The augmented data pipeline is designed to improve model generalization by presenting a broader variety of image transformations, ensuring the model does not overfit to the training data. After applying the augmentation pipeline, we generate a total of 2000 augmented samples per class, bringing the total number of augmented images to 8258.
Next, augmented images are integrated into a Generative Adversarial Network (GAN) to generate synthetic samples that mimic the original data distribution. The Generator and Discriminator of the GAN are constructed using sequential models with Leaky ReLU activations, and the GAN is trained using the Adam optimizer. The images generated by the GAN are periodically saved during training and normalized before use. The primary purpose of the GAN is to produce high-quality synthetic images that augment the dataset and enhance model performance, particularly in scenarios where real data is scarce or expensive to collect. The step-by-step approach is explained in Figure 5.

FIGURE 5. Generator-discriminator synergy for synthetic image refinement in GAN training.

1) MATHEMATICAL REPRESENTATION OF AUGMENTATION PROCESS
Mathematically, the data augmentation process can be described by applying a series of transformations $T$ to an image $I$, resulting in an augmented image $I'$:

$$I' = T_n(T_{n-1}(\ldots T_1(I)))$$

where:
• $T_1$ to $T_n$ represent the individual augmentation operations (e.g., Gaussian blur, noise, brightness adjustment).
• $I$ is the input image, and $I'$ is the augmented image.
The GAN itself can be formulated as follows:
• The Generator $G$ learns to map a noise vector $z$ to a synthetic image $G(z)$:
$$G : z \to G(z) \in \mathbb{R}^{H \times W \times C}$$
where $H$, $W$, $C$ represent the height, width, and channels of the generated image, respectively.
• The Discriminator $D$ classifies images as real or fake:
$$D(x) \in \{0, 1\}$$
where $x$ is the input image (real or generated), with $D(x) = 1$ for real images and $D(x) = 0$ for fake images.

2) OPTIMAL NUMBER OF SYNTHETIC SAMPLES
The optimal number of synthetic samples is highly dependent on the dataset size and the performance improvement of our model. While 2000 synthetic samples per class have been generated, further analysis is necessary to determine the ideal quantity.
In practice, too few augmented samples may not lead to significant improvements in model performance, while an excessive number could result in diminishing returns and computational inefficiency. Therefore, experimentation with various amounts of synthetic data is used to study the relationship between the quantity of augmented data and model performance. The analysis will involve:
• Training the model with different sets of synthetic samples (e.g., 1000, 2000, and 4000 samples per class).
• Evaluating model performance after each training session.
• Visualizing the relationship between synthetic sample size and model performance metrics to identify the optimal number of augmented images for this task.

C. IMPLEMENTATION DETAILS
This implementation involved training a deep learning model for image classification using knowledge distillation. Initially, data were loaded and normalized by scaling pixel values to the range [0, 1], followed by one-hot encoding of labels for 4 classes. The dataset was split into training


TABLE 2. The performance metric analysis of both the separated and HED models.
TABLE 3. Comparative analysis of facial recognition techniques and their results.

and validation sets with a 90-10 split. VGG16, ResNet50, and DenseNet121 base models were defined, each initialized without pre-trained weights, and their outputs were combined after global average pooling. The combined features were passed through dense layers with L2 regularization and dropout (50%) before the final classification layer. The teacher model was compiled with the Adam optimizer (learning rate 0.0001) and trained with early stopping (patience 10) and learning rate reduction (factor 0.2, patience 5) callbacks over 30 epochs with a batch size of 32.
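The interaction of the two callbacks can be illustrated with a small framework-free simulation. The sketch below replays a sequence of validation losses and applies the stated settings (initial learning rate 0.0001, reduction factor 0.2 with patience 5, early stopping with patience 10). It is a simplified model of the behaviour, not the authors' training code: the function name and the loss trace are hypothetical, and details such as minimum-delta thresholds or cooldown periods are ignored.

```python
def simulate_callbacks(val_losses, lr=1e-4, factor=0.2,
                       lr_patience=5, stop_patience=10):
    """Replay plateau-based LR reduction and early stopping over a loss trace.

    Returns (epoch at which training stopped, final learning rate).
    """
    best = float("inf")
    epochs_since_best = 0   # drives early stopping
    plateau_epochs = 0      # drives learning rate reduction

    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # Improvement: remember it and reset both counters.
            best = loss
            epochs_since_best = 0
            plateau_epochs = 0
        else:
            epochs_since_best += 1
            plateau_epochs += 1
            if plateau_epochs >= lr_patience:
                # Validation loss has stalled: shrink the learning rate.
                lr *= factor
                plateau_epochs = 0
        if epochs_since_best >= stop_patience:
            # No improvement for stop_patience epochs: halt training.
            return epoch, lr
    return len(val_losses), lr


# Hypothetical validation losses: three improving epochs, then a plateau.
trace = [0.90, 0.80, 0.70] + [0.75] * 27
stopped_epoch, final_lr = simulate_callbacks(trace)
print(stopped_epoch, final_lr)
```

With this trace the learning rate is reduced twice before early stopping ends the run well short of the 30 scheduled epochs, which is the intended effect of combining the two mechanisms.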
After training, the logits of the teacher model were extracted to guide the student model. The student model is simpler and consists of convolutional layers followed by global average pooling and dense layers. A custom distillation loss function was defined, incorporating both student and teacher predictions, scaled by a temperature parameter of 3. A custom training loop handled the distillation process. The distillation model was trained for 350 epochs with a batch size of 32 and then evaluated, and the training progress of both the teacher and student models was visualized. Finally, the trained models were saved for future use.
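As an illustration of a temperature-scaled distillation loss, the sketch below uses the widely used formulation that mixes cross-entropy on the true labels with a Kullback-Leibler term between temperature-softened teacher and student distributions. Only the temperature of 3 comes from the text; the weighting factor `alpha`, the multiplication of the soft term by the squared temperature, and all function names are assumptions, so the exact loss used by the authors may differ.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature (higher = softer)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, probs, eps=1e-12):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(student_logits, teacher_logits, y_true,
                      temperature=3.0, alpha=0.5):
    """Weighted sum of the hard-label loss and the soft-target loss.

    alpha is an assumed weighting; it is not specified in the text.
    """
    hard = cross_entropy(y_true, softmax(student_logits))
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # The T^2 factor keeps the soft term on a comparable gradient scale.
    soft = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    return alpha * hard + (1 - alpha) * soft


# Hypothetical logits for one 4-class example whose true class is 0.
teacher = [6.0, 1.0, 0.0, -1.0]
label = [1.0, 0.0, 0.0, 0.0]
print(distillation_loss([5.0, 1.5, 0.0, -0.5], teacher, label))
print(distillation_loss([0.0, 5.0, 0.0, 0.0], teacher, label))
```

A student whose logits track the teacher's yields a small loss, while one that favours the wrong class is penalised by both terms.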
This entire process was conducted on an HP Z2 G4 GPU
system equipped with 32GB RAM, a 1TB HDD, and an Intel
i9-9900K processor, ensuring sufficient computational power
and storage for deep learning tasks.
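The preprocessing steps described at the start of this subsection (scaling pixel values to [0, 1], one-hot encoding for 4 classes, and a 90-10 split) can be sketched in plain Python as follows. The sketch assumes 8-bit pixel values in the range 0 to 255 and a shuffled split with a fixed seed; neither detail is stated in the text, and the helper names are illustrative only.

```python
import random

NUM_CLASSES = 4


def normalize(pixels):
    """Scale 8-bit pixel values (assumed 0..255) to the range [0, 1]."""
    return [p / 255.0 for p in pixels]


def one_hot(label, num_classes=NUM_CLASSES):
    """Encode an integer class label as a one-hot vector."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def train_val_split(samples, val_fraction=0.1, seed=42):
    """Shuffle a copy of the samples and hold out val_fraction for validation."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Stand-in dataset: 1000 (sample id, class label) pairs, 250 per class.
samples = [(i, i % NUM_CLASSES) for i in range(1000)]
train, val = train_val_split(samples)

print(len(train), len(val))
print(normalize([0, 128, 255]))
print(one_hot(2))
```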

D. RESULTS
The results were obtained in two phases using the separated
model and the HED model.

1) SEPARATED MODEL
The VGG16 model produced an accuracy of 92.12% and a loss of 0.5711 across 30 epochs. The accuracy of the ResNet50 model was 73.06% with a loss of 0.7943. Finally, the DenseNet121 model demonstrated a 94.9% accuracy rate with a loss of 0.4748. Table 2 lists the performance metrics for each of the pre-trained models individually.
The findings show how the architectures differ in performance on the provided dataset. Of the three models, DenseNet121 had the best accuracy, closely followed by VGG16, while ResNet50 had the lowest accuracy, as shown in Figure 6(a) and 6(b). Although VGG16 and DenseNet121 individually achieved marked accuracy, they misclassified the given image. Therefore, the individual models lack accurate prediction.

2) PERFORMANCE EVALUATION OF FACIAL RECOGNITION METHODS ACROSS DIVERSE DATASETS AND CONDITIONS
This state-of-the-art analysis evaluates various facial recognition methods across multiple datasets and conditions, as shown in Table 3. The techniques include identifying individuals in suspicious attire while also detecting emotions, age, and gender using the IMDb and FER2013 datasets, which achieved an accuracy of 80% [23]. Additionally, recognizing criminals in images yielded a confidence level of 75% with the Labelled Faces in the Wild dataset [18]. Real-time facial detection via security cameras showed an accuracy of 86% based on live video feeds [43]. Remarkably, a proprietary dataset reached a 90% accuracy in identifying criminals from different angles, including those with obstructions and masks [44]. Furthermore, face tracking under varying conditions, such as changes in lighting and posture, averaged an accuracy


FIGURE 6. (a) and (b) show the performance metrics of the individual deep learning models.

of 86.09% across several datasets [45]. Attempts to identify faces in low-resolution images achieved an accuracy of 88.6% using the SCface dataset [46]. In contrast, the SCAAI-FSL dataset showed a lower average accuracy of 72.72% due to issues like diverse facial orientations and occlusions [47]. Lastly, the proposed method for identifying suspects in a variety of scenarios, including low-light and blurred images, achieved remarkable accuracy rates of 98.42% for the HED Teacher model and 96.78% for the student model, highlighting its effectiveness in challenging recognition tasks.

TABLE 4. Benchmarking face recognition: Accuracy, efficiency, and model size.
3) HED MODEL
The suggested model was executed on the dataset. The training durations and validation accuracies of both the teacher and student models provided valuable insights into their performance across different epochs. The teacher model, trained over 30 epochs, consistently required 292 seconds per epoch to complete its training cycle, reaching an accuracy of 98.42%. There was also a gradual decrease in loss, as shown in Figure 7.
In contrast, the student model underwent training for up to 350 epochs, with each epoch consistently taking 35 seconds, showing a notable efficiency in model training over time. Figure 8 illustrates how the student model validation accuracy rose and validation loss fell with the number of training epochs. It finally reached a validation accuracy of 96.78% with a validation loss greater than 0.0000e+00. Additionally, there was a consistent increase in precision and recall values, as shown in Figure 9.
Figure 10 shows the gradual increase in accuracy at 50-epoch intervals: starting at 71.18% accuracy after 50 epochs, reaching 76.7% at 100 epochs, and further improving to 89.23% at 150 epochs. Subsequently, the accuracy reached 92.31% after 200 epochs and culminated in 94.11%, 96.07%, and 96.78% after 250, 300, and 350 epochs, respectively. This consistent enhancement in accuracy highlights the efficacy of knowledge distillation, in which the student model leverages distilled insights from the teacher model to refine its predictions and performance.

4) EVALUATION METRICS FOR HED MODEL AND BENCHMARK FACE RECOGNITION MODELS
Table 4 compares the performance of the proposed HED model with several state-of-the-art face recognition models, including ArcFace, FaceNet, SphereFace, CosFace, MobileFaceNet, DeepFace, and OpenFace, across metrics such as accuracy, F1 score, inference time, and model size. The HED model achieves the highest accuracy (98.42%), surpassing ArcFace (98.3%) and FaceNet (98.0%), while also demonstrating an excellent balance between precision and recall with an F1 score of 0.98. Additionally, it offers the fastest inference time (8 ms), making it well-suited for real-time applications. Furthermore, the HED model has a significantly smaller memory footprint (60 MB) compared to larger models like ArcFace (200 MB) and FaceNet (250 MB), showcasing its efficiency and suitability for resource-constrained environments.

5) EVALUATION OF HUMAN IDENTIFICATION ACCURACY UNDER VARYING CONDITIONS
Table 5 assesses the performance of the human identification system across various cases involving both in-sample and out-of-sample data. For Subjects 1 to 4, which were all in-sample, the system consistently detected individuals across a range of conditions. For Subject 1, the system accurately identified individuals in three cases: an indoor setting with a resolution of 320×240 from a frontal view, an outdoor scene with 180×160 resolution from a left side view with multiple persons, and a blurred image with a


FIGURE 7. Training and validation accuracy and loss over epochs in teacher model.

FIGURE 8. Training and validation of accuracy and loss over epochs in student model.

FIGURE 9. 9(a) Precision and 9(b) Recall for the student model.

resolution of 240×180 from a right side view. Similarly, for Subject 2, the system correctly identified individuals in an indoor scene with a 254×133 resolution from a frontal view, an outdoor scene with a 180×160 resolution and multiple persons from a right side view, and a blurred image with a 240×180 resolution from a left side view. Subject 3 was also successfully detected in an indoor setting with a 320×240 resolution from a left side view, an outdoor scene with a 180×160 resolution from a right side view, and a blurred image from a frontal view with a resolution of 240×180. Subject 4 was accurately detected in an indoor setting with a resolution of 160×120 from a frontal view, an outdoor scene


with a 180×160 resolution from a left side view, and a blurred image with a 120×154 resolution from a right side view.
For the out-of-sample data involving Subject 5, the system correctly did not detect the individual in all three cases: an indoor scene with a resolution of 120×154 from a frontal view, an outdoor scene with a 240×180 resolution from a left side view, and a blurred image with a high resolution of 1080×1350 from a right side view. The correct rejection of the out-of-sample cases demonstrates how well the model can distinguish between in-sample and out-of-sample data. This is a significant benefit since it shows that the model accurately classifies unknown data and successfully identifies persons who are part of the dataset.

TABLE 5. Prediction results of human identification system across various scenarios.

FIGURE 10. The validation accuracy of the student model over the duration of 50 epochs.

VI. DISCUSSION AND LIMITATIONS
A. LIMITED SIZE OF LABELED DATASETS
The dataset used in this study consists of 1,000 photographs, 250 images for each of the four participants. The model’s ability to generalize effectively may be limited by this very small sample size, especially in real-world scenarios when unanticipated variables like occlusion or blurriness are present.
• Mitigation: Data augmentation techniques, such as creating synthetic data using Generative Adversarial Networks (GANs), were used to overcome this constraint. The increased sample size and diversity improved generalizability.

B. DATASET BIAS
The model may overfit on specific facial features as a result of the dataset’s four subjects, which could hinder its capacity to generalize across various demographic groups.
• Mitigation: There are plans to expand the dataset to include a wider range of demographics. To further reduce bias, methods including data augmentation, synthetic data generation, and transfer learning from different datasets will be used.

C. REAL-WORLD APPLICATION PERFORMANCE
The flexibility of the model in different circumstances may be limited by the dataset’s potential inability to accurately represent real-world conditions, such as shifting environmental elements (such as harsh weather or obstructions) and posture fluctuations.
• Mitigation: Future research will focus on testing the model in difficult settings and a range of situations. Its resilience will be increased by using domain adaptation and optimization with actual data.

D. LIMITATIONS OF SYNTHETIC DATA GENERATION
GANs are used to generate high-quality synthetic images, but they might not perfectly capture the nuances of real-world photos, which could affect the model’s capacity for generalization.


• Mitigation: A hybrid training strategy that combines actual and synthetic data will help balance the disparities between the two, while ongoing improvements to GAN technology will raise the caliber of synthetic data.

VII. CONCLUSION
This study presents a novel Hybrid Ensemble Distillation (HED) model that integrates VGG16, ResNet50, and DenseNet121, leveraging ensemble learning and knowledge distillation to balance high accuracy with computational efficiency. The model demonstrates impressive performance, achieving a validation accuracy of 96.78% after 350 epochs, making it a strong candidate for real-time applications in resource-constrained environments. The use of data augmentation through Generative Adversarial Networks (GANs) further improves the robustness of the model, addressing challenges such as limited data availability and diverse imaging conditions.
The findings from this work contribute to the field of face recognition and identification in several meaningful ways. First, the ability to distill knowledge from a large, complex teacher model into a smaller, efficient student model significantly reduces computational requirements, making the system more practical for real-time applications. This is particularly important in contexts where computational resources are limited, such as in mobile devices, edge computing, or surveillance systems. Moreover, by demonstrating the synergy between VGG16, ResNet50, and DenseNet121, the model capitalizes on the strengths of each architecture, enhancing its ability to generalize across different types of input data and challenging scenarios.
The inclusion of GAN-based data augmentation also provides a significant contribution to the field, addressing the issue of data scarcity in face recognition tasks. Generating high-quality synthetic images allows the model to be trained on a more diverse and comprehensive dataset, improving its ability to handle real-world variability. This approach not only enhances the model’s performance but also provides a framework for tackling other machine learning tasks where data is limited or difficult to obtain.
Furthermore, this work highlights the potential for hybrid models in face recognition tasks, where combining the strengths of multiple architectures and leveraging knowledge distillation offers a balanced trade-off between performance and efficiency. The ability to improve accuracy while reducing computational complexity is a critical aspect for deploying such models in practical applications, such as public surveillance, biometric security systems, and real-time human identification in various environments.
In conclusion, the HED model not only advances the field of face recognition but also opens up new possibilities for the development of efficient, real-time human identification systems. Its ability to balance high performance with reduced computational complexity makes it a valuable contribution to the ongoing efforts to deploy deep learning models in real-world scenarios, where both accuracy and efficiency are paramount.
Future work could focus on extending the applicability of this model to other domains such as medical imaging, security surveillance, and autonomous systems. This adaptability will further emphasize the versatility of the HED model, paving the way for its use in industries where accurate, efficient, and real-time decision-making is essential. Additionally, testing the model with larger and more diverse datasets could help assess its generalization capability in different real-world environments, enhancing its robustness across various challenging scenarios.

REFERENCES
[1] W. W. Bledsoe and H. Chan, “A man-machine facial recognition system-some preliminary results,” Panoramic Res., Palo Alto, CA, USA, Tech. Rep. PRI A 19, 1965.
[2] T. Kanade, “Picture processing system by computer complex and recognition of human faces,” Ph.D. thesis, Kyoto Univ., Japan, 1974.
[3] M. Kirby and L. Sirovich, “Application of the Karhunen–Loeve procedure for the characterization of human faces,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 12, no. 1, pp. 103–108, Jan. 1990, doi: 10.1109/34.41390.
[4] M. Turk and A. Pentland, “Eigenfaces for recognition,” J. Cogn. Neurosci., vol. 3, no. 1, pp. 71–86, Jan. 1991, doi: 10.1162/jocn.1991.3.1.71.
[5] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, “Eigenfaces vs. fisherfaces: Recognition using class specific linear projection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 711–720, Jul. 1997, doi: 10.1109/34.598228.
[6] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, “DeepFace: Closing the gap to human-level performance in face verification,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 1701–1708.
[7] F. Schroff, D. Kalenichenko, and J. Philbin, “FaceNet: A unified embedding for face recognition and clustering,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Boston, MA, USA, Jun. 2015, pp. 815–823, doi: 10.1109/CVPR.2015.7298682.
[8] W. Liu, Y. Wen, Z. Yu, M. Li, B. Raj, and L. Song, “SphereFace: Deep hypersphere embedding for face recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 212–220.
[9] R. Ullah, H. Hayat, A. A. Siddiqui, U. A. Siddiqui, J. Khan, F. Ullah, S. Hassan, L. Hasan, W. Albattah, M. Islam, and G. M. Karami, “A real-time framework for human face detection and recognition in CCTV images,” Math. Problems Eng., vol. 2022, pp. 1–12, Mar. 2022, doi: 10.1155/2022/3276704.
[10] M. Jacquet and C. Champod, “Automated face recognition in forensic science: Review and perspectives,” Forensic Sci. Int., vol. 307, Feb. 2020, Art. no. 110124, doi: 10.1016/j.forsciint.2019.110124.
[11] K. Amjad, P. D. A. A. Malik, and D. S. Mehta, “A technique and architectural design for criminal detection based on Lombroso theory using deep learning,” Lahore Garrison Univ. Res. J. Comput. Sci. Inf. Technol., vol. 4, no. 3, pp. 47–63, Sep. 2020, doi: 10.54692/lgurjcsit.2020.040398.
[12] Y. Zennayi, S. Benaissa, H. Derrouz, and Z. Guennoun, “Unauthorized access detection system to the equipments in a room based on the persons identification by face recognition,” Eng. Appl. Artif. Intell., vol. 124, Sep. 2023, Art. no. 106637, doi: 10.1016/j.engappai.2023.106637.
[13] H. A. Abdelali, H. Derrouz, Y. Zennayi, R. O. H. Thami, and F. Bourzeix, “Multiple hypothesis detection and tracking using deep learning for video traffic surveillance,” IEEE Access, vol. 9, pp. 164282–164291, 2021, doi: 10.1109/ACCESS.2021.3133529.
[14] M. Pang, Y.-M. Cheung, B. Wang, and R. Liu, “Robust heterogeneous discriminative analysis for face recognition with single sample per person,” Pattern Recognit., vol. 89, pp. 91–107, May 2019, doi: 10.1016/j.patcog.2019.01.005.
[15] Y. Lei and B. Huang, “Prediction of criminal suspect characteristics with application of wavelet neural networks,” Appl. Math. Nonlinear Sci., vol. 9, no. 1, pp. 1–18, Jan. 2024, doi: 10.2478/amns.2023.2.01313.
[16] L. Lei, “Wavelet neural network prediction method of stock price trend based on rough set attribute reduction,” Appl. Soft Comput., vol. 62, pp. 923–932, Jan. 2018, doi: 10.1016/j.asoc.2017.09.029.


[17] S. Ayyappan and S. Matilda, “Criminals and missing children identification using face recognition and Web scrapping,” in Proc. Int. Conf. Syst., Comput., Autom. Netw. (ICSCAN), Jul. 2020, pp. 1–5, doi: 10.1109/ICSCAN49426.2020.9262390.
[18] S. Sandhya, A. Balasundaram, and A. Shaik, “Deep learning based face detection and identification of criminal suspects,” Comput., Mater. Continua, vol. 74, no. 2, pp. 2331–2343, 2023, doi: 10.32604/cmc.2023.032715.
[19] K. P. Teja, G. D. Kumar, and T. P. Jacob, “Face detection and recognition for criminal identification,” in Proc. 8th Int. Conf. Commun. Electron. Syst. (ICCES), Jun. 2023, pp. 1431–1435, doi: 10.1109/ICCES57224.2023.10192845.
[20] S. Jagtap, N. B. Chopade, and S. Tungar, “An investigation of face recognition system for criminal identification in surveillance video,” in Proc. 6th Int. Conf. Comput., Commun., Control Autom. (ICCUBEA), Aug. 2022, pp. 1–5, doi: 10.1109/ICCUBEA54992.2022.10010987.
[21] R. Bhatt, S. Malik, R. Arora, G. Agarwal, S. Sharma, and A. Dhablia, “Recognition of criminal faces from wild videos surveillance system using VGG-16 architecture,” in Proc. Int. Conf. Data Sci. Netw. Secur. (ICDSNS), Jul. 2023, pp. 1–8, doi: 10.1109/ICDSNS58469.2023.10245450.
[22] D. Gupta, A. Saini, S. Bhagat, S. Uppal, R. Raj Jain, D. Bhasin, P. Kumaraguru, and R. Ratn Shah, “A suspect identification framework using contrastive relevance feedback,” in Proc. IEEE/CVF Winter Conf. Appl. Comput. Vis. (WACV), Jan. 2023, pp. 4361–4369, doi: 10.1109/WACV56688.2023.00434.
[23] K. Rasanayagam, S. D. D. C. Kumarasiri, W. A. D. D. Tharuka, N. T. Samaranayake, P. Samarasinghe, and S. E. R. Siriwardana, “CIS: An automated criminal identification system,” in Proc. IEEE Int. Conf. Inf. Autom. Sustainability (ICIAfS), Dec. 2018, pp. 1–6, doi: 10.1109/ICIAFS.2018.8913367.
[24] S. T. Ratnaparkhi, A. Tandasi, and S. Saraswat, “Face detection and recognition for criminal identification system,” in Proc. 11th Int. Conf. Cloud Comput., Data Sci. Eng. (Confluence), Jan. 2021, pp. 773–777, doi: 10.1109/Confluence51648.2021.9377205.
[25] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, “Joint face detection and alignment using multitask cascaded convolutional networks,” IEEE Signal Process. Lett., vol. 23, no. 10, pp. 1499–1503, Oct. 2016, doi: 10.1109/LSP.2016.2603342.
[26] N. Rafter, “Cesare Lombroso and the origins of criminology: Rethinking criminological tradition 1,” in The Essential Criminology Reader, 2nd ed., Evanston, IL, USA: Routledge, 2018, pp. 33–42.
[27] A. B. Perdana and A. Prahara, “Face recognition using light-convolutional neural networks based on modified VGG16 model,” in Proc. Int. Conf. Comput. Sci. Inf. Technol. (ICoSNIKOM), Nov. 2019, pp. 1–4, doi: 10.1109/ICoSNIKOM48755.2019.9111481.
[28] A. K. Dubey and V. Jain, “Automatic facial recognition using VGG16 based transfer learning model,” J. Inf. Optim. Sci., vol. 41, no. 7, pp. 1589–1596, Oct. 2020, doi: 10.1080/02522667.2020.1809126.
[29] O. K. Sikha and B. Bharath, “VGG16-random Fourier hybrid model for masked face recognition,” Soft Comput., vol. 26, no. 22, pp. 12795–12810, Nov. 2022, doi: 10.1007/s00500-022-07289-0.
[30] S. A. Dar, “Neural networks (CNNs) and VGG on real time face recognition system,” Turkish J. Comput. Math. Educ., vol. 12, no. 9, pp. 1809–1822, Apr. 2021.
[31] H. Aung, A. V. Bobkov, and N. L. Tun, “Face detection in real time live video using YOLO algorithm based on VGG16 convolutional neural network,” in Proc. Int. Conf. Ind. Eng., Appl. Manuf. (ICIEAM), May 2021, pp. 697–702, doi: 10.1109/ICIEAM51226.2021.9446291.
[36] N. Choudhary, P. S. Rathore, L. Kumar, R. Rajaan, A. Sharma, and D. Sinha, “ResNet-50 powered masked face detection: A deep learning perspective,” in Proc. IEEE 9th Int. Conf. Converg. Technol. (I2CT), Apr. 2024, pp. 1–5, doi: 10.1109/I2CT61223.2024.10543563.
[37] M. M. Hasan, M. A. Hossain, A. Y. Srizon, A. Sayeed, M. Ahmed, and M. R. Haquek, “Improving performance of a pre-trained ResNet-50 based VGGFace recognition system by utilizing retraining as a heuristic step,” in Proc. 24th Int. Conf. Comput. Inf. Technol. (ICCIT), Dec. 2021, pp. 1–6, doi: 10.1109/ICCIT54785.2021.9689918.
[38] B. Li and D. Lima, “Facial expression recognition via ResNet-50,” Int. J. Cogn. Comput. Eng., vol. 2, pp. 57–64, Jun. 2021, doi: 10.1016/j.ijcce.2021.02.002.
[39] B. Li, “Facial expression recognition by DenseNet-121,” in Multi-Chaos, Fractal and Multi-Fractional Artificial Intelligence of Different Complex Systems. New York, NY, USA: Academic, 2022, pp. 263–276, doi: 10.1016/B978-0-323-90032-4.00019-5.
[40] S. Yu, S. E. Kim, K. H. Suh, and E. C. Lee, “Face spoofing detection using DenseNet,” in Proc. Int. Conf. Intell. Hum. Comput. Interact. Cham, Switzerland: Springer, Jan. 2021, pp. 229–238, doi: 10.1007/978-3-030-68452-5_24.
[41] A. Nandy, “A densenet based robust face detection framework,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. Workshop (ICCVW), Oct. 2019, pp. 1840–1847.
[42] M. Torky, A. Bakheit, M. Bakry, and A. E. Hassanien, “Deep learning model for recognizing monkey pox based on dense net-121 algorithm,” MedRxiv, pp. 2022–12, 2022, doi: 10.1101/2022.12.20.22283747.
[43] N. Zhang, J. Luo, and W. Gao, “Research on face detection technology based on MTCNN,” in Proc. Int. Conf. Comput. Netw., Electron. Autom. (ICCNEA), Xi’an, China, Sep. 2020, pp. 154–158.
[44] V. Munusamy and S. Senthilkumar, “Face identification of suspects using sequential -deep convolutional neural network,” in Proc. 2nd Int. Conf. Emerg. Trends Inf. Technol. Eng. (ICETITE), Feb. 2024, pp. 1–3, doi: 10.1109/ic-ETITE58242.2024.10493654.
[45] G. Zheng and Y. Xu, “Efficient face detection and tracking in video sequences based on deep learning,” Inf. Sci., vol. 568, pp. 265–285, Aug. 2021.
[46] N. K. Mishra, M. Dutta, and S. K. Singh, “Multiscale parallel deep CNN (mpdCNN) architecture for the real low-resolution face recognition for surveillance,” Image Vis. Comput., vol. 115, Nov. 2021, Art. no. 104290.
[47] A. Holkar, R. Walambe, and K. Kotecha, “Few-shot learning for face recognition in the presence of image discrepancies for limited multi-class datasets,” Image Vis. Comput., vol. 120, Apr. 2022, Art. no. 104420.

VAISHNAVI MUNUSAMY received the B.E. degree in computer science and engineering from Anna University, Tiruchirappalli, in 2011, and the M.Tech. degree in database systems from SRM University, Chennai, in 2013. She is currently pursuing the Ph.D. degree with Vellore Institute of Technology (VIT), Vellore, Tamil Nadu, India. She has approximately seven years of teaching experience. Her research interests include machine learning, computer vision, and neural networks, with her current work concentrating on person identification and the integration of machine learning techniques into lightweight models.

[32] H. Chen and C. Haoyu, ‘‘Face recognition algorithm based on VGG degree in CSE from Madras University, the
network model and SVM,’’ J. Phys., Conf. Ser., vol. 1229, no. 1, May 2019,
M.Tech. degree in information technology and
Art. no. 012015, doi: 10.1088/1742-6596/1229/1/012015.
engineering from Vellore Institute of Technology,
[33] Y. Pratama, L. M. Ginting, E. H. L. Nainggolan, and A. E. Rismanda,
Vellore, and the Ph.D. degree from the School
‘‘Face recognition for presence system by using residual networks-50
architecture,’’ Int. J. Electr. Comput. Eng., vol. 11, no. 6, p. 5488, of Information Technology and Engineering, VIT
Dec. 2021, doi: 10.11591/ijece.v11i6.pp5488-5496. University. She is currently a Professor with the
[34] D. A. Wangean, G. Pangestu, S. Setyawan, F. I. Maulana, E. P. Gunawan, School of Computer Science and Engineering,
and C. Huda, ‘‘The implementation of ResNet-50 architecture for face VIT University. She has authored more than
recognition algorithm in attendance system,’’ in Proc. AIP Conf., 2024, 55 research articles in reputed international and
vol. 2927, no. 1, pp. 1–14, doi: 10.1063/5.0205236. conferences. She has published few books in the reputed publisher. Her
[35] J.-R. Lee, K.-W. Ng, and Y.-J. Yoong, ‘‘Face and facial expressions current research interests include cryptography, network security, big data,
recognition system for blind people using ResNet50 architecture and block chain technologies, machine learning, deep learning, and cloud
CNN,’’ J. Informat. Web Eng., vol. 2, no. 2, pp. 284–298, Sep. 2023, doi: computing. She is a Lifetime Member of the Computer Society of India.
10.33093/jiwe.2023.2.2.20.

