Criminal Identification - Basepaper
ABSTRACT Face recognition biometric systems focus on identifying individuals by extracting their facial characteristics. However, these systems often fail or misclassify individuals because of external factors, obstructions, and varying environmental conditions. Traditional models cannot effectively handle these variations, leading to inaccuracies. Moreover, the complexity and computational demands of advanced models can hinder their real-time application. In this study, the Hybrid Ensemble Distillation (HED) model addresses these issues by leveraging both knowledge distillation and an ensemble of pre-trained models (VGG16, ResNet50, and DenseNet121) to enhance classification precision and efficiency. The model combines the strengths of these architectures while utilizing data augmentation techniques, such as GANs, to enrich the training dataset. The proposed model demonstrated high efficiency and accuracy, with the teacher model achieving 98.42% accuracy and the student model reaching 96.78% validation accuracy, thereby highlighting the efficacy of knowledge distillation. It also showed progressive improvements in validation accuracy and loss reduction over 350 epochs, emphasizing the robustness of the training process. This lightweight method helps identify suspects or individuals because the model was trained using 360-degree images in the dataset, ensuring comprehensive feature extraction from multiple angles. The reduced computational requirements and high accuracy make this approach suitable for real-time applications, thereby enhancing its practicality for various human identification tasks.
… and more. However, many current person identification systems focus on attire, dressing patterns, or color rather than facial or gestural recognition. Ideally, these systems should mimic human vision: humans can recognize known individuals from a distance or through partial visibility. Furthermore, most recognition systems prioritize superficial features such as attire or color patterns over intrinsic characteristics such as facial and gestural traits, limiting their applicability in scenarios involving partial visibility or long-distance recognition.

This study advances beyond the existing literature by addressing these critical gaps through the development of a Hybrid Ensemble Distillation (HED) model. The HED model is designed to:
• Enhance robustness against low illumination, motion blur, and partial occlusion.
• Simulate human-like recognition capabilities to ensure accurate identification even under challenging conditions.
• Reduce computational complexity for real-time deployment in practical applications.

The proposed approach is validated using a custom dataset specifically curated to simulate diverse environmental conditions, enabling a direct evaluation of its performance against classical models such as VGG16, ResNet50, and DenseNet121. By benchmarking these models, the study demonstrates the superior robustness, efficiency, and real-time applicability of the HED model compared with existing methods.

Key Contributions:
• Enhanced Robustness: Development of a lightweight model tailored for real-time applications, reducing the likelihood of false positives in face recognition.
• Human Vision Simulation: Implementation of a recognition system based on human vision principles, ensuring accurate identification even in challenging scenarios.
• Comprehensive Benchmarking: Evaluation of classical object detection and recognition algorithms, including VGG16, ResNet50, and DenseNet121, using the custom dataset to benchmark their performance in face recognition tasks.
• Real-Time Suitability: Optimization of the model for real-time applications, making it suitable for public safety, surveillance, and other time-sensitive domains.

Thus, this work addresses critical gaps in the field, including the handling of illumination variability, robustness under adverse conditions, and lightweight implementation for practical use. Section II surveys the related literature on face recognition and suspect identification. The detailed architecture of the proposed model is discussed in Section III, while Section IV highlights the lightweight nature of the model. Section V presents an in-depth analysis of the experimental results, followed by a discussion of the limitations in Section VI. Finally, the findings and future research directions are summarized in Section VII.

II. RELATED WORK
A YOLOv5s model for person detection was proposed by Zennayi et al. [12], who also employed the Multi-Hypothesis Data Association Tracking (MHDT) method [13] for tracking individuals across frames. The image acquisition module uses a standard CCTV camera to capture and preprocess the images. Location analysis involves tracking the movements of individuals to ensure that they do not access unauthorized areas. RetinaFace is used to detect faces within bounding boxes, and a deep learning algorithm facilitates face recognition. Unauthorized access identification combines movement and identity data to detect and report unauthorized access.

Heterogeneous feature extraction and face identification are the two primary elements of the Robust Heterogeneous Discriminative Analysis (RHDA) method, which handles identification based on single-picture samples [14]. After creating intrinsic and penalty graphs, two discriminative manifold embeddings, DSME (Discriminative Single-Manifold Embedding) and DMME (Discriminative Multi-Manifold Embedding), are proposed for heterogeneous feature extraction in order to produce heterogeneous subspace representations for image patches based on a Fisher-like criterion. Two distance metrics, patch-to-patch and patch-to-manifold, were combined to create a fusion technique for face identification. The method uses joint majority voting to identify each unlabeled query sample by utilizing the heterogeneous subspace representations. The process involves partitioning facial images into non-overlapping patches, constructing local dictionaries, and projecting patches into subspaces. The Fisher-like criterion helps preserve within-class patch relationships while suppressing between-class similarities. Finally, joint majority voting combines the outputs from both distance metrics, improving identification accuracy and robustness against various facial variations by aggregating complementary information from the two metrics.

The application of Wavelet Neural Networks (WNNs) has been explored to predict criminal suspect characteristics from complex, nonlinear crime data, addressing challenges such as dimensionality catastrophes and overfitting that plague traditional models [15]. WNNs, which merge wavelet transforms with neural networks, excel in fitting accuracy and generalization. The methodology includes data preprocessing, feature selection using the information gain method, and parameter optimization via Particle Swarm Optimization (PSO) for the support vector machines within the WNN [16]. The model was trained and validated, predicting suspect features from case and victim data, which were then matched against a suspect pool. Key functions of wavelet analysis, such as the mother wavelet function and the discrete wavelet transform, enable detailed signal analysis at multiple scales. The WNN structure utilizes the Morlet and Mexican Hat functions and is applied to face feature recognition, evaluated by precision, recall, and similarity measures. To address the big data challenge, the study employs Hadoop for distributed computing, improving efficiency through parallelized MapReduce implementations for feature selection. In addition, Haar features are used to capture grayscale changes in images, which are crucial for face recognition. A performance analysis on the CIFAR dataset shows that TI-ResGWNN outperforms traditional methods with fewer parameters. Thus, WNNs offer a robust solution for predicting suspect characteristics, leveraging wavelet analysis, neural networks, optimization techniques, and distributed computing to enhance law enforcement investigations.
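As an illustration of the mother wavelets named above, the following sketch defines the standard real-valued Morlet and Mexican Hat functions; the exact scaling and normalization used in [15] are not given in this excerpt, so these forms are assumptions.

```python
import numpy as np

def morlet(t: np.ndarray, w0: float = 5.0) -> np.ndarray:
    """Real-valued Morlet mother wavelet: a cosine modulated by a Gaussian."""
    return np.cos(w0 * t) * np.exp(-0.5 * t ** 2)

def mexican_hat(t: np.ndarray) -> np.ndarray:
    """Mexican Hat (Ricker) wavelet: proportional to the negative
    second derivative of a Gaussian."""
    return (1.0 - t ** 2) * np.exp(-0.5 * t ** 2)

# A wavelet neuron evaluates a dilated and translated wavelet on a weighted
# input sum, psi((w @ x - b) / a), with the dilation a and translation b
# learned alongside the weights w during training.
```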
The face detection method in [17] employs a Haar Cascade Classifier to identify faces in images. Web scraping utilizes the Python requests library and BeautifulSoup to extract images and information from specified websites, ensuring that the data are continuously updated without requiring retraining. Feature extraction uses OpenCV's detectMultiScale, converting images to grayscale and HSV to extract features stored as unique integer arrays, mapped via deep learning. In the template comparison, the Facepplib API in Python compares face vectors by calculating similarity levels based on feature matching, using confidence values to determine matches. If the confidence exceeds a threshold, the faces are considered identical and the relevant information is displayed. The authors utilized user-provided suspect images and dynamically accessed web images of criminals and missing children, including video frames processed at intervals to reduce redundancy.
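As a concrete illustration of the Haar cascade detection step described above, the following OpenCV sketch uses the stock frontal-face cascade; the cascade choice, input file name, and detectMultiScale parameters are illustrative assumptions rather than details from [17].

```python
import cv2

# Load OpenCV's bundled frontal-face Haar cascade (an assumed choice).
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("suspect.jpg")               # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # Haar features use grayscale

# Scan the image at multiple scales; each hit is an (x, y, w, h) face box.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```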
Sandhya et al. [18] proposed a deep learning based method for face detection and identification of suspects. Initially, face detection was performed on input images or video frames using an SSD model with ResNet-10, implemented through the OpenCV DNN (Deep Neural Network) module, which detects and crops faces. To compare faces, the encoder of an autoencoder converts images to embedding vectors, which are then compared using cosine similarity. The system employs a subset of the LFW dataset for model training owing to hardware constraints. The autoencoder model includes Conv2D and AveragePooling2D layers, with batch normalization for accuracy. Finally, the system uses cosine similarity to compare the angles between vectors, thereby determining the likelihood that the input image matches an image of a criminal in the database.
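The embedding comparison step can be sketched as follows; the 128-dimensional embeddings and the 0.8 decision threshold are illustrative assumptions, not values reported in [18].

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for encoder outputs: encoder(query_face) and a stored embedding.
query_emb = np.random.rand(128)
db_emb = np.random.rand(128)

if cosine_similarity(query_emb, db_emb) > 0.8:  # assumed match threshold
    print("Probable match with a database entry")
```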
A Convolutional Neural Network (CNN) supports image recognition tasks such as face detection and criminal identification. The CNN architecture in [19] involves several key operations: convolution, ReLU as a non-linear activation, max, average, and global pooling, fully connected layers, and an output layer, each with specific formulas to compute the output dimensions and values. In addition, a Haar Cascade
classifier is employed for face detection, utilizing a cascade of Haar-like features to scan input images or video streams. These features capture intensity changes across image regions, and their responses are computed using integral images. The classifier, trained using positive and negative images, applies a cascaded approach with thresholds and decision rules to detect faces, subsequently facilitating face recognition or identification tasks.

To handle face recognition of suspects under challenges such as tilted or side faces, the Haar Cascade machine learning approach has been combined with a DNN, and an extra preprocessing stage is introduced, involving scaling the detected face to a predetermined size and adjusting the alignment using facial landmarks [20]. A Local Binary Pattern Histogram (LBPH) recognizer was then used to identify the preprocessed face. The system was implemented using TensorFlow, dlib, and OpenCV. The Haar Cascade, known for its speed, employs three basic feature types (edge, line, and four-rectangle) but has limitations such as false positives and sensitivity to changes in face orientation. In contrast, the DNN-based detector, which uses the SSD model and ResNet-10 architecture, handles various face orientations and occlusions effectively but is slower. The workflow of the system includes image capture, face detection using Haar and DNN, face alignment, training of the recognizer, and recognition by comparing new faces with the trained database.
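DNN-based detectors of this kind are commonly run through OpenCV's DNN module; the sketch below assumes the ResNet-10 SSD Caffe files distributed with OpenCV's samples, which may differ from the exact setup in [20].

```python
import cv2

# SSD face detector with a ResNet-10 backbone, loaded via OpenCV DNN.
# The file names are those shipped with OpenCV samples (an assumption).
net = cv2.dnn.readNetFromCaffe(
    "deploy.prototxt", "res10_300x300_ssd_iter_140000.caffemodel")

img = cv2.imread("frame.jpg")
h, w = img.shape[:2]

# The model expects a 300x300, mean-subtracted BGR blob.
blob = cv2.dnn.blobFromImage(cv2.resize(img, (300, 300)), 1.0,
                             (300, 300), (104.0, 177.0, 123.0))
net.setInput(blob)
detections = net.forward()

for i in range(detections.shape[2]):
    confidence = float(detections[0, 0, i, 2])
    if confidence > 0.5:  # assumed confidence threshold
        x0, y0, x1, y1 = (detections[0, 0, i, 3:7] * [w, h, w, h]).astype(int)
        print(f"face at ({x0}, {y0})-({x1}, {y1}), confidence {confidence:.2f}")
```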
The VGG-16 architecture in [21] employs distance metrics to compare facial embeddings and generate identification probabilities, and aims to reduce false positives in face recognition systems by combining prediction results with object tracking to accumulate recognition scores for each face. To address the time constraints of real-time identification, downsampling techniques are applied to reduce processing time without compromising accuracy. A comprehensive classification score was obtained by combining various indicators, including face proximity and confidence scores. In addition, the method incorporates a dynamic threshold to validate identification results, ensuring reliable predictions in diverse scenarios. This approach offers the potential benefit of enhancing security in crime-prone areas by providing accurate and timely surveillance analysis.

Gupta et al. [22] suggested a method to capture the notion of similarity in a user's mind by associating positive and negative image samples with selected and non-selected images, respectively. To accomplish this, a fully connected neural network is trained with a Separating Cluster Loss (SCLoss) to project pretrained base representations onto a lower-dimensional space. This ensures that comparable images are closer together and dissimilar ones are farther apart in the projected space. Disentangled representation learning is utilized to provide favorable initialization, ensuring robustness to noise and distortion in real-time scenarios. The SCLoss objective maximizes the similarity between selected images and minimizes it with non-selected images, enabling flexible use during online training. To ensure that newly acquired images are linked to the projected space, which is composed of previously created clusters, anchoring …
FIGURE 2. The sequential steps involved in the Hybrid Ensemble Distillation (HED) method.
… soft targets (the teacher's anticipated probabilities). The overall loss comprises:
• the standard categorical cross-entropy loss, evaluated by comparing the student prediction with the actual label, and
• the distillation loss, which assesses how well the student model's prediction matches the soft targets of the teacher model.

The custom loss function is thus divided into two components. Standard Categorical Cross-Entropy Loss: this component is utilized to assess the student model's predictions against the actual labels (one-hot encoded). Distillation Loss: this is calculated as the categorical cross-entropy between the predictions of the student model and the soft targets provided by the teacher model (logits adjusted by the temperature). The final loss function can be expressed as:

Total Loss = α · CE(y_student, y_true) + (1 − α) · CE(y_student, y_soft)

where y_student are the predictions of the student model, y_true are the true labels, y_soft are the soft targets from the teacher model, and α is a hyperparameter that balances the two loss components.
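A minimal TensorFlow/Keras sketch of this custom loss follows; the α value of 0.5 and temperature of 5.0 are assumed hyperparameters, as the excerpt does not state them.

```python
import tensorflow as tf

ALPHA, TEMPERATURE = 0.5, 5.0  # assumed values; not given in the excerpt
cce = tf.keras.losses.CategoricalCrossentropy()

def distillation_loss(y_true, student_logits, teacher_logits):
    # Hard-label term: student predictions vs. one-hot ground truth.
    hard_loss = cce(y_true, tf.nn.softmax(student_logits))
    # Soft-target term: student vs. teacher, both softened by the temperature.
    soft_targets = tf.nn.softmax(teacher_logits / TEMPERATURE)
    soft_preds = tf.nn.softmax(student_logits / TEMPERATURE)
    soft_loss = cce(soft_targets, soft_preds)
    # Total Loss = alpha * CE(y_student, y_true) + (1 - alpha) * CE(y_student, y_soft)
    return ALPHA * hard_loss + (1.0 - ALPHA) * soft_loss
```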
5) TRAINING THE STUDENT MODEL
The student model is designed to emulate the performance of the teacher model while being more compact and efficient. The training process is organized as follows.

D. STUDENT MODEL ARCHITECTURE
The architecture of the student model is considerably smaller than that of the teacher model, comprising three convolutional layers with 32, 64, and 128 filters, followed by a MaxPooling2D layer, Global Average Pooling, and a dense layer containing 64 units. The output layer employs a softmax activation function for multi-class classification.
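A Keras sketch of this student network is given below; the 3×3 kernels, 224×224 input, pooling placement, and four-class head are assumptions, since the excerpt specifies only the filter counts, the pooling types, the 64-unit dense layer, and the softmax output.

```python
from tensorflow.keras import layers, models

NUM_CLASSES = 4  # the four enrolled subjects in the custom dataset

student = models.Sequential([
    layers.Input(shape=(224, 224, 3)),        # assumed input resolution
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),  # multi-class output
])
```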
1) OPTIMIZATION STRATEGY
The training of the student model employs the Adam optimizer with a learning rate set at 0.0001. Additionally, callbacks are implemented for:
• Learning Rate Reduction: this mechanism decreases the learning rate when the validation loss stabilizes, facilitating quicker convergence of the model.
• Early Stopping: this feature halts the training process if there is no improvement in validation loss over a predetermined number of epochs, thereby mitigating the risk of overfitting.
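These two callbacks map directly onto Keras built-ins; the reduction factor and patience values below are illustrative assumptions, while the Adam optimizer and the 0.0001 learning rate come from the text.

```python
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.optimizers import Adam

optimizer = Adam(learning_rate=1e-4)  # learning rate as stated in the text

callbacks = [
    # Decrease the learning rate when the validation loss stabilizes.
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5),
    # Halt training when validation loss stops improving.
    EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True),
]
```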
2) TRAINING LOOP
The student model undergoes training for 350 epochs, during which the teacher model generates soft targets for each training instance. The student model learns to approximate the predictions of the teacher through the process of distillation.
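A sketch of this distillation loop follows, reusing distillation_loss, student, and optimizer from the earlier sketches; train_ds is a hypothetical tf.data pipeline of (images, labels) batches, teacher is assumed to be the trained teacher model, and both models are assumed to expose pre-softmax logits here.

```python
import tensorflow as tf

EPOCHS = 350  # training length as stated in the text

for epoch in range(EPOCHS):
    for images, labels in train_ds:  # hypothetical tf.data pipeline
        # The frozen teacher produces soft targets for each batch.
        teacher_logits = teacher(images, training=False)
        with tf.GradientTape() as tape:
            student_logits = student(images, training=True)
            loss = distillation_loss(labels, student_logits, teacher_logits)
        grads = tape.gradient(loss, student.trainable_variables)
        optimizer.apply_gradients(zip(grads, student.trainable_variables))
```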
IV. LIGHTWEIGHT NATURE OF THE MODEL FOR REAL-TIME SUSPECT IDENTIFICATION
A. REDUCED NUMBER OF PARAMETERS
Traditional models such as VGG16, ResNet50, and DenseNet121 have large parameter counts, contributing to their high computational cost. In contrast, the proposed model reduces parameters by employing just two dense layers (126 and 64 units) after feature concatenation. This architecture minimizes memory usage and accelerates inference time. The total number of parameters in this model is significantly smaller than the 138 million parameters of VGG16 and the 25 million parameters of ResNet50, making the model highly efficient while retaining performance.
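A sketch of this feature-concatenation design is given below: Global Average Pooling features from the three frozen, pre-trained backbones feed the two dense layers (126 and 64 units). The 224×224 input, shared preprocessing (per-backbone preprocess_input is omitted for brevity), and four-class head are assumptions.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16, ResNet50, DenseNet121

inp = layers.Input(shape=(224, 224, 3))  # assumed shared input resolution
feats = []
for Backbone in (VGG16, ResNet50, DenseNet121):
    base = Backbone(include_top=False, weights="imagenet", input_tensor=inp)
    base.trainable = False  # pre-trained backbones stay frozen
    # GAP collapses each feature map, keeping the dense head small.
    feats.append(layers.GlobalAveragePooling2D()(base.output))

x = layers.Concatenate()(feats)                  # fused ensemble features
x = layers.Dense(126, activation="relu")(x)
x = layers.Dense(64, activation="relu")(x)
out = layers.Dense(4, activation="softmax")(x)   # assumed four-subject head
teacher = models.Model(inp, out)
```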
B. FLOPS (FLOATING POINT OPERATIONS PER SECOND) ANALYSIS
The computational complexity of the model is reduced by leveraging Global Average Pooling (GAP) layers on top of base models such as VGG16, ResNet50, and DenseNet121. GAP layers significantly decrease the number of parameters passed to the fully connected layers, lowering the computational burden during inference. The FLOPS of this model is much lower than that of ResNet50 (approximately 4.1 GFLOPS) and DenseNet121, demonstrating the lightweight nature of the model and its suitability for real-time applications without compromising accuracy.

C. MEMORY USAGE OPTIMIZATION
With its reduced parameter count and efficient architecture, the memory footprint of the model is significantly lower than the memory requirements of traditional models such as VGG16 and ResNet50. This low memory usage makes the model suitable for deployment on resource-constrained devices, ensuring seamless performance in real-time suspect identification tasks.

D. INFERENCE TIME OPTIMIZATION
The model achieves faster inference times, which is essential for real-time decision-making. On the tested hardware (a GPU-equipped HP Z2 G4 workstation with 32GB RAM and an Intel i9-9900K processor), the model processes an image much faster than larger models such as VGG16 and ResNet50. This speed ensures rapid responses in real-time applications such as suspect identification.

E. EFFICIENCY ON MODERATE HARDWARE
Evaluated on a system with moderate computational resources (HP Z2 G4 workstation, 32GB RAM, Intel i9-9900K processor), the model achieves competitive accuracy while maintaining fast inference times and low resource consumption. This demonstrates its scalability and suitability for deployment in environments with limited computational power, ensuring efficiency in real-world use cases.
Table 1 shows that the HED Teacher Model, designed for training purposes, achieves high accuracy (98.42%) with a loss of 0.2314. However, it is computationally intensive, requiring 1.8 GFLOPS, 12.5 million parameters, and 150 MB of memory, making it suitable for resource-rich environments. In contrast, the HED Student Model, distilled from the teacher model, achieves comparable accuracy (96.78%) with negligible loss while reducing FLOPS by 72% and parameters by 74%, requiring only 0.5 GFLOPS and 3.2 million parameters. This significantly reduces memory usage to 60 MB, making it highly efficient for resource-constrained devices. The extended training period of 350 epochs compensates for the reduction in complexity, ensuring reliable performance in real-time applications. These optimizations result in a 72% reduction in FLOPS and a 74% reduction in parameters compared with the teacher model, ensuring fast inference, low memory usage, and computational efficiency. This balance between performance and efficiency, achieved through architectural optimizations and model distillation, makes the model ideal for real-time suspect identification in both resource-rich and resource-constrained environments, effectively addressing the challenges of real-time deployment.

V. EXPERIMENTAL RESULT
A. DATASET
One of the primary challenges in human identification is the scarcity of large, labeled datasets that capture a wide range of real-world scenarios. Our dataset addresses this by including 250 images per individual for four participants, resulting in a total of 1000 images. Each individual was photographed under various controlled and uncontrolled conditions to simulate real-world challenges. The dataset includes:
• Mugshots (standard frontal face images),
• Masked images (with the face partially covered),
• Side profiles (both left and right),
• Concealed images (where the lower portion of the face is hidden),
• Images taken under poor lighting conditions (night effects),
• Blurred images (to simulate motion or poor camera focus),
• Inverted images (to test the model's ability to handle rotated perspectives), and
• Pictures captured from a height of up to 22 feet (to mimic surveillance camera angles in public areas).

This diversity in image capture conditions introduces variability that mimics real-world scenarios and makes the dataset a good challenge for human identification tasks, even though it is relatively small. However, to improve model robustness and performance, we employed data augmentation techniques and synthetic data generation through GANs.

B. DATA AUGMENTATION
The data augmentation strategy is a crucial component of the performance improvement of our model, especially in scenarios where acquiring a large volume of real-world data is impractical. In our approach, we use the Augmentor library for conventional augmentations, such as Gaussian blur, …
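A sketch of such an Augmentor pipeline is shown below; the operations and probabilities are illustrative assumptions (Augmentor has no built-in Gaussian blur, so a blur of that kind would typically be added as a custom operation or applied separately, for example with Pillow).

```python
import Augmentor

# Illustrative augmentation pipeline over the raw dataset directory
# (the path and all parameters are assumptions, not values from the paper).
p = Augmentor.Pipeline("dataset/raw_faces")
p.rotate(probability=0.6, max_left_rotation=15, max_right_rotation=15)
p.flip_left_right(probability=0.4)
p.zoom_random(probability=0.4, percentage_area=0.9)
p.random_brightness(probability=0.5, min_factor=0.6, max_factor=1.4)

p.sample(4000)  # write the augmented samples to an output folder
```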
TABLE 2. The performance metric analysis of both the separated and HED models.
TABLE 3. Comparative analysis of facial recognition techniques and their results.
D. RESULTS
The results were obtained in two phases using the separated
model and the HED model.
FIGURE 6. 6(a) and 6(b) show the performance metrics of the individual deep learning models.
… of 86.09% across several datasets [45]. Attempts to identify faces in low-resolution images achieved an accuracy of 88.6% using the SCface dataset [46]. In contrast, the SCAAI-FSL dataset showed a lower average accuracy of 72.72% due to issues such as diverse facial orientations and occlusions [47]. Lastly, the proposed method for identifying suspects in a variety of scenarios, including low-light and blurred images, achieved remarkable accuracy rates of 98.42% for the HED Teacher model and 96.78% for the Student model, highlighting its effectiveness in challenging recognition tasks.

TABLE 4. Benchmarking face recognition: Accuracy, efficiency, and model size.
3) HED MODEL
The proposed model was executed on the dataset. The training durations and validation accuracies of both the teacher and student models provided valuable insights into their performance across different epochs. The teacher model, trained over 30 epochs, consistently required 292 seconds per epoch to complete its training cycle, reaching an accuracy of 98.42%. There was also a gradual decrease in loss, as shown in Figure 7.

In contrast, the student model underwent training for up to 350 epochs, with each epoch consistently taking 35 seconds, showing notable efficiency in model training over time. Figure 8 illustrates how the student model's validation accuracy rose and its validation loss fell with the number of training epochs. It finally reached a validation accuracy of 96.78% with a validation loss only marginally greater than 0.0000e+00. Additionally, there was a consistent increase in precision and recall values, as shown in Figure 9.

Figure 10 shows the gradual increase in accuracy at 50-epoch intervals: starting at 71.18% accuracy after 50 epochs, reaching 76.7% at 100 epochs, and further improving to 89.23% at 150 epochs. Subsequently, the accuracy was significant, achieving 92.31% after 200 epochs and culminating in impressive 94.11%, 96.07%, and 96.78% accuracy after 250, 300, and 350 epochs, respectively. This consistent enhancement in accuracy highlights the efficacy of knowledge distillation, in which the student model leverages distilled insights from the teacher model to refine its predictions and performance.

4) EVALUATION METRICS FOR HED MODEL AND BENCHMARK FACE RECOGNITION MODELS
Table 4 compares the performance of the proposed HED model with several state-of-the-art face recognition models, including ArcFace, FaceNet, SphereFace, CosFace, MobileFaceNet, DeepFace, and OpenFace, across metrics such as accuracy, F1 score, inference time, and model size. The HED model achieves the highest accuracy (98.42%), surpassing ArcFace (98.3%) and FaceNet (98.0%), while also demonstrating an excellent balance between precision and recall with an F1 score of 0.98. Additionally, it offers the fastest inference time (8 ms), making it well-suited for real-time applications. Furthermore, the HED model has a significantly smaller memory footprint (60 MB) compared with larger models such as ArcFace (200 MB) and FaceNet (250 MB), showcasing its efficiency and suitability for resource-constrained environments.

5) EVALUATION OF HUMAN IDENTIFICATION ACCURACY UNDER VARYING CONDITIONS
Table 5 assesses the performance of the human identification system across various cases involving both in-sample and out-of-sample data. For Subjects 1 to 4, which were all in-sample, the system consistently detected the individuals across a range of conditions. For Subject 1, the system accurately identified the individual in three cases: an indoor setting with a resolution of 320×240 from a frontal view, an outdoor scene with a 180×160 resolution from a left side view with multiple persons, and a blurred image with a resolution of 240×180 from a right side view.
FIGURE 7. Training and validation accuracy and loss over epochs for the teacher model.
FIGURE 8. Training and validation accuracy and loss over epochs for the student model.
FIGURE 9. 9(a) Precision and 9(b) Recall for the student model.
Similarly, for Subject 2, the system correctly identified the individual in an indoor scene with a 254×133 resolution from a frontal view, an outdoor scene with a 180×160 resolution and multiple persons from a right side view, and a blurred image with a 240×180 resolution from a left side view. Subject 3 was also successfully detected in an indoor setting with a 320×240 resolution from a left side view, an outdoor scene with a 180×160 resolution from a right side view, and a blurred image from a frontal view with a resolution of 240×180. Subject 4 was accurately detected in an indoor setting with a resolution of 160×120 from a frontal view, an outdoor scene with a 180×160 resolution from a left side view, and a blurred image with a 120×154 resolution from a right side view.

For the out-of-sample data involving Subject 5, the system correctly did not detect the individual in all three cases: an indoor scene with a resolution of 120×154 from a frontal view, an outdoor scene with a 240×180 resolution from a left side view, and a blurred image with a high resolution of 1080×1350 from a right side view. That the model correctly rejected the out-of-sample cases demonstrates how well it can distinguish between in-sample and out-of-sample data. This is a significant benefit, since it shows that the model accurately classifies unknown data while successfully identifying persons that are part of the dataset.

FIGURE 10. The validation accuracy of the student model at 50-epoch intervals.

VI. DISCUSSION AND LIMITATIONS
A. LIMITED SIZE OF LABELED DATASETS
The dataset used in this study consists of 1,000 photographs, 250 images for each of the four participants. The model's …

B. DATASET BIAS
The model may overfit to specific facial features as a result of the dataset's four subjects, which could hinder its capacity to generalize across various demographic groups.
• Mitigation: There are plans to expand the dataset to include a wider range of demographics. To further reduce bias, methods including data augmentation, synthetic data generation, and transfer learning from different datasets will be used.

C. REAL-WORLD APPLICATION PERFORMANCE
The flexibility of the model in different circumstances may be limited by the dataset's potential inability to accurately represent real-world conditions, such as shifting environmental elements (for example, harsh weather or obstructions) and posture fluctuations.
• Mitigation: Future research will focus on testing the model in difficult settings and a range of situations. Its resilience will be increased by using domain adaptation and optimization with actual data.
D. LIMITATIONS OF SYNTHETIC DATA GENERATION
GANs are used to generate high-quality synthetic images, but they might not perfectly capture the nuances of real-world photographs, which could affect the model's capacity for generalization.
• Mitigation: A hybrid training strategy that combines actual and synthetic data will help balance the disparities between the two, while ongoing improvements to GAN technology will raise the caliber of synthetic data.
VII. CONCLUSION
This study presents a novel Hybrid Ensemble Distillation (HED) model that integrates VGG16, ResNet50, and DenseNet121, leveraging ensemble learning and knowledge distillation to balance high accuracy with computational efficiency. The model demonstrates impressive performance, achieving a validation accuracy of 96.78% after 350 epochs, making it a strong candidate for real-time applications in resource-constrained environments. The use of data augmentation through Generative Adversarial Networks (GANs) further improves the model's robustness, addressing challenges such as limited data availability and diverse imaging conditions.

The findings from this work contribute to the field of face recognition and identification in several meaningful ways. First, the ability to distill knowledge from a large, complex teacher model into a smaller, efficient student model significantly reduces computational requirements, making the system more practical for real-time applications. This is particularly important in contexts where computational resources are limited, such as mobile devices, edge computing, or surveillance systems. Moreover, by demonstrating the synergy between VGG16, ResNet50, and DenseNet121, the model capitalizes on the strengths of each architecture, enhancing its ability to generalize across different types of input data and challenging scenarios.

The inclusion of GAN-based data augmentation also provides a significant contribution to the field, addressing the issue of data scarcity in face recognition tasks. By generating high-quality synthetic images, the model can be trained on a more diverse and comprehensive dataset, improving its ability to handle real-world variability. This approach not only enhances the model's performance but also provides a framework for tackling other machine learning tasks where data is limited or difficult to obtain.

Furthermore, this work highlights the potential of hybrid models in face recognition tasks, where combining the strengths of multiple architectures and leveraging knowledge distillation offers a balanced trade-off between performance and efficiency. The ability to improve accuracy while reducing computational complexity is critical for deploying such models in practical applications, such as public surveillance, biometric security systems, and real-time human identification in various environments.

In conclusion, the HED model not only advances the field of face recognition but also opens up new possibilities for the development of efficient, real-time human identification systems. Its ability to balance high performance with reduced computational complexity makes it a valuable contribution to the ongoing efforts to deploy deep learning models in real-world scenarios, where both accuracy and efficiency are paramount.

Future work could focus on extending the applicability of this model to other domains such as medical imaging, security surveillance, and autonomous systems. This adaptability will further emphasize the versatility of the HED model, paving the way for its use in industries where accurate, efficient, and real-time decision-making is essential. Additionally, testing the model with larger and more diverse datasets could help assess its generalization capability in different real-world environments, enhancing its robustness across various challenging scenarios.

REFERENCES
[1] W. W. Bledsoe and H. Chan, "A man-machine facial recognition system: Some preliminary results," Panoramic Res., Palo Alto, CA, USA, Tech. Rep. PRI A 19, 1965.
[2] T. Kanade, "Picture processing system by computer complex and recognition of human faces," Ph.D. thesis, Kyoto Univ., Kyoto, Japan, 1974.
[3] M. Kirby and L. Sirovich, "Application of the Karhunen-Loeve procedure for the characterization of human faces," IEEE Trans. Pattern Anal. Mach. Intell., vol. 12, no. 1, pp. 103-108, Jan. 1990, doi: 10.1109/34.41390.
[4] M. Turk and A. Pentland, "Eigenfaces for recognition," J. Cogn. Neurosci., vol. 3, no. 1, pp. 71-86, Jan. 1991, doi: 10.1162/jocn.1991.3.1.71.
[5] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. fisherfaces: Recognition using class specific linear projection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 711-720, Jul. 1997, doi: 10.1109/34.598228.
[6] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, "DeepFace: Closing the gap to human-level performance in face verification," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 1701-1708.
[7] F. Schroff, D. Kalenichenko, and J. Philbin, "FaceNet: A unified embedding for face recognition and clustering," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Boston, MA, USA, Jun. 2015, pp. 815-823, doi: 10.1109/CVPR.2015.7298682.
[8] W. Liu, Y. Wen, Z. Yu, M. Li, B. Raj, and L. Song, "SphereFace: Deep hypersphere embedding for face recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 212-220.
[9] R. Ullah, H. Hayat, A. A. Siddiqui, U. A. Siddiqui, J. Khan, F. Ullah, S. Hassan, L. Hasan, W. Albattah, M. Islam, and G. M. Karami, "A real-time framework for human face detection and recognition in CCTV images," Math. Problems Eng., vol. 2022, pp. 1-12, Mar. 2022, doi: 10.1155/2022/3276704.
[10] M. Jacquet and C. Champod, "Automated face recognition in forensic science: Review and perspectives," Forensic Sci. Int., vol. 307, Feb. 2020, Art. no. 110124, doi: 10.1016/j.forsciint.2019.110124.
[11] K. Amjad, P. D. A. A. Malik, and D. S. Mehta, "A technique and architectural design for criminal detection based on Lombroso theory using deep learning," Lahore Garrison Univ. Res. J. Comput. Sci. Inf. Technol., vol. 4, no. 3, pp. 47-63, Sep. 2020, doi: 10.54692/lgurjcsit.2020.040398.
[12] Y. Zennayi, S. Benaissa, H. Derrouz, and Z. Guennoun, "Unauthorized access detection system to the equipments in a room based on the persons identification by face recognition," Eng. Appl. Artif. Intell., vol. 124, Sep. 2023, Art. no. 106637, doi: 10.1016/j.engappai.2023.106637.
[13] H. A. Abdelali, H. Derrouz, Y. Zennayi, R. O. H. Thami, and F. Bourzeix, "Multiple hypothesis detection and tracking using deep learning for video traffic surveillance," IEEE Access, vol. 9, pp. 164282-164291, 2021, doi: 10.1109/ACCESS.2021.3133529.
[14] M. Pang, Y.-M. Cheung, B. Wang, and R. Liu, "Robust heterogeneous discriminative analysis for face recognition with single sample per person," Pattern Recognit., vol. 89, pp. 91-107, May 2019, doi: 10.1016/j.patcog.2019.01.005.
[15] Y. Lei and B. Huang, "Prediction of criminal suspect characteristics with application of wavelet neural networks," Appl. Math. Nonlinear Sci., vol. 9, no. 1, pp. 1-18, Jan. 2024, doi: 10.2478/amns.2023.2.01313.
[16] L. Lei, "Wavelet neural network prediction method of stock price trend based on rough set attribute reduction," Appl. Soft Comput., vol. 62, pp. 923-932, Jan. 2018, doi: 10.1016/j.asoc.2017.09.029.
[17] S. Ayyappan and S. Matilda, "Criminals and missing children identification using face recognition and Web scrapping," in Proc. Int. Conf. Syst., Comput., Autom. Netw. (ICSCAN), Jul. 2020, pp. 1-5, doi: 10.1109/ICSCAN49426.2020.9262390.
[18] S. Sandhya, A. Balasundaram, and A. Shaik, "Deep learning based face detection and identification of criminal suspects," Comput., Mater. Continua, vol. 74, no. 2, pp. 2331-2343, 2023, doi: 10.32604/cmc.2023.032715.
[19] K. P. Teja, G. D. Kumar, and T. P. Jacob, "Face detection and recognition for criminal identification," in Proc. 8th Int. Conf. Commun. Electron. Syst. (ICCES), Jun. 2023, pp. 1431-1435, doi: 10.1109/ICCES57224.2023.10192845.
[20] S. Jagtap, N. B. Chopade, and S. Tungar, "An investigation of face recognition system for criminal identification in surveillance video," in Proc. 6th Int. Conf. Comput., Commun., Control Autom. (ICCUBEA), Aug. 2022, pp. 1-5, doi: 10.1109/ICCUBEA54992.2022.10010987.
[21] R. Bhatt, S. Malik, R. Arora, G. Agarwal, S. Sharma, and A. Dhablia, "Recognition of criminal faces from wild videos surveillance system using VGG-16 architecture," in Proc. Int. Conf. Data Sci. Netw. Secur. (ICDSNS), Jul. 2023, pp. 1-8, doi: 10.1109/ICDSNS58469.2023.10245450.
[22] D. Gupta, A. Saini, S. Bhagat, S. Uppal, R. Raj Jain, D. Bhasin, P. Kumaraguru, and R. Ratn Shah, "A suspect identification framework using contrastive relevance feedback," in Proc. IEEE/CVF Winter Conf. Appl. Comput. Vis. (WACV), Jan. 2023, pp. 4361-4369, doi: 10.1109/WACV56688.2023.00434.
[23] K. Rasanayagam, S. D. D. C. Kumarasiri, W. A. D. D. Tharuka, N. T. Samaranayake, P. Samarasinghe, and S. E. R. Siriwardana, "CIS: An automated criminal identification system," in Proc. IEEE Int. Conf. Inf. Autom. Sustainability (ICIAfS), Dec. 2018, pp. 1-6, doi: 10.1109/ICIAFS.2018.8913367.
[24] S. T. Ratnaparkhi, A. Tandasi, and S. Saraswat, "Face detection and recognition for criminal identification system," in Proc. 11th Int. Conf. Cloud Comput., Data Sci. Eng. (Confluence), Jan. 2021, pp. 773-777, doi: 10.1109/Confluence51648.2021.9377205.
[25] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, "Joint face detection and alignment using multitask cascaded convolutional networks," IEEE Signal Process. Lett., vol. 23, no. 10, pp. 1499-1503, Oct. 2016, doi: 10.1109/LSP.2016.2603342.
[26] N. Rafter, "Cesare Lombroso and the origins of criminology: Rethinking criminological tradition," in The Essential Criminology Reader, 2nd ed., Evanston, IL, USA: Routledge, 2018, pp. 33-42.
[27] A. B. Perdana and A. Prahara, "Face recognition using light-convolutional neural networks based on modified VGG16 model," in Proc. Int. Conf. Comput. Sci. Inf. Technol. (ICoSNIKOM), Nov. 2019, pp. 1-4, doi: 10.1109/ICoSNIKOM48755.2019.9111481.
[28] A. K. Dubey and V. Jain, "Automatic facial recognition using VGG16 based transfer learning model," J. Inf. Optim. Sci., vol. 41, no. 7, pp. 1589-1596, Oct. 2020, doi: 10.1080/02522667.2020.1809126.
[29] O. K. Sikha and B. Bharath, "VGG16-random Fourier hybrid model for masked face recognition," Soft Comput., vol. 26, no. 22, pp. 12795-12810, Nov. 2022, doi: 10.1007/s00500-022-07289-0.
[30] S. A. Dar, "Neural networks (CNNs) and VGG on real time face recognition system," Turkish J. Comput. Math. Educ., vol. 12, no. 9, pp. 1809-1822, Apr. 2021.
[31] H. Aung, A. V. Bobkov, and N. L. Tun, "Face detection in real time live video using YOLO algorithm based on VGG16 convolutional neural network," in Proc. Int. Conf. Ind. Eng., Appl. Manuf. (ICIEAM), May 2021, pp. 697-702, doi: 10.1109/ICIEAM51226.2021.9446291.
[32] H. Chen and C. Haoyu, "Face recognition algorithm based on VGG network model and SVM," J. Phys., Conf. Ser., vol. 1229, no. 1, May 2019, Art. no. 012015, doi: 10.1088/1742-6596/1229/1/012015.
[33] Y. Pratama, L. M. Ginting, E. H. L. Nainggolan, and A. E. Rismanda, "Face recognition for presence system by using residual networks-50 architecture," Int. J. Electr. Comput. Eng., vol. 11, no. 6, p. 5488, Dec. 2021, doi: 10.11591/ijece.v11i6.pp5488-5496.
[34] D. A. Wangean, G. Pangestu, S. Setyawan, F. I. Maulana, E. P. Gunawan, and C. Huda, "The implementation of ResNet-50 architecture for face recognition algorithm in attendance system," in Proc. AIP Conf., 2024, vol. 2927, no. 1, pp. 1-14, doi: 10.1063/5.0205236.
[35] J.-R. Lee, K.-W. Ng, and Y.-J. Yoong, "Face and facial expressions recognition system for blind people using ResNet50 architecture and CNN," J. Informat. Web Eng., vol. 2, no. 2, pp. 284-298, Sep. 2023, doi: 10.33093/jiwe.2023.2.2.20.
[36] N. Choudhary, P. S. Rathore, L. Kumar, R. Rajaan, A. Sharma, and D. Sinha, "ResNet-50 powered masked face detection: A deep learning perspective," in Proc. IEEE 9th Int. Conf. Converg. Technol. (I2CT), Apr. 2024, pp. 1-5, doi: 10.1109/I2CT61223.2024.10543563.
[37] M. M. Hasan, M. A. Hossain, A. Y. Srizon, A. Sayeed, M. Ahmed, and M. R. Haquek, "Improving performance of a pre-trained ResNet-50 based VGGFace recognition system by utilizing retraining as a heuristic step," in Proc. 24th Int. Conf. Comput. Inf. Technol. (ICCIT), Dec. 2021, pp. 1-6, doi: 10.1109/ICCIT54785.2021.9689918.
[38] B. Li and D. Lima, "Facial expression recognition via ResNet-50," Int. J. Cogn. Comput. Eng., vol. 2, pp. 57-64, Jun. 2021, doi: 10.1016/j.ijcce.2021.02.002.
[39] B. Li, "Facial expression recognition by DenseNet-121," in Multi-Chaos, Fractal and Multi-Fractional Artificial Intelligence of Different Complex Systems. New York, NY, USA: Academic, 2022, pp. 263-276, doi: 10.1016/B978-0-323-90032-4.00019-5.
[40] S. Yu, S. E. Kim, K. H. Suh, and E. C. Lee, "Face spoofing detection using DenseNet," in Proc. Int. Conf. Intell. Hum. Comput. Interact. Cham, Switzerland: Springer, Jan. 2021, pp. 229-238, doi: 10.1007/978-3-030-68452-5_24.
[41] A. Nandy, "A densenet based robust face detection framework," in Proc. IEEE/CVF Int. Conf. Comput. Vis. Workshop (ICCVW), Oct. 2019, pp. 1840-1847.
[42] M. Torky, A. Bakheit, M. Bakry, and A. E. Hassanien, "Deep learning model for recognizing monkey pox based on dense net-121 algorithm," medRxiv, 2022, doi: 10.1101/2022.12.20.22283747.
[43] N. Zhang, J. Luo, and W. Gao, "Research on face detection technology based on MTCNN," in Proc. Int. Conf. Comput. Netw., Electron. Autom. (ICCNEA), Xi'an, China, Sep. 2020, pp. 154-158.
[44] V. Munusamy and S. Senthilkumar, "Face identification of suspects using sequential-deep convolutional neural network," in Proc. 2nd Int. Conf. Emerg. Trends Inf. Technol. Eng. (ICETITE), Feb. 2024, pp. 1-3, doi: 10.1109/ic-ETITE58242.2024.10493654.
[45] G. Zheng and Y. Xu, "Efficient face detection and tracking in video sequences based on deep learning," Inf. Sci., vol. 568, pp. 265-285, Aug. 2021.
[46] N. K. Mishra, M. Dutta, and S. K. Singh, "Multiscale parallel deep CNN (mpdCNN) architecture for the real low-resolution face recognition for surveillance," Image Vis. Comput., vol. 115, Nov. 2021, Art. no. 104290.
[47] A. Holkar, R. Walambe, and K. Kotecha, "Few-shot learning for face recognition in the presence of image discrepancies for limited multi-class datasets," Image Vis. Comput., vol. 120, Apr. 2022, Art. no. 104420.

VAISHNAVI MUNUSAMY received the B.E. degree in computer science and engineering from Anna University, Tiruchirappalli, in 2011, and the M.Tech. degree in database systems from SRM University, Chennai, in 2013. She is currently pursuing the Ph.D. degree with Vellore Institute of Technology (VIT), Vellore, Tamil Nadu, India. She has approximately seven years of teaching experience. Her research interests include machine learning, computer vision, and neural networks, with her current work concentrating on person identification and the integration of machine learning techniques into lightweight models.

SUDHA SENTHILKUMAR received the B.E. degree in CSE from Madras University, the M.Tech. degree in information technology and engineering from Vellore Institute of Technology, Vellore, and the Ph.D. degree from the School of Information Technology and Engineering, VIT University. She is currently a Professor with the School of Computer Science and Engineering, VIT University. She has authored more than 55 research articles in reputed international journals and conferences. She has also published a few books with reputed publishers. Her current research interests include cryptography, network security, big data, blockchain technologies, machine learning, deep learning, and cloud computing. She is a Lifetime Member of the Computer Society of India.