DL MiniProject
A PROJECT REPORT ON
BACHELOR OF ENGINEERING
Computer Engineering
BY
CERTIFICATE
Prof. R. C. PACHHADE
(Guide)
Place: Ahmednagar
Date:
CERTIFICATE
This is to certify that Take Swapnil Rajendra has completed the Project Report
work under my guidance and supervision, and that I have verified the work for its
originality in documentation, problem statement, and results presented in the project.
Any reproduction of other necessary work is with prior permission, has been given due
ownership, and is included in the references.
Place:
Date: (Prof. R. C. Pachhade )
ACKNOWLEDGEMENT
Face recognition is a rapidly developing and widely applied aspect of biometric
technologies. Its applications are broad, ranging from law enforcement to consumer
applications and industry efficiency and monitoring solutions. The recent advent of
affordable, powerful GPUs and the creation of huge face databases has drawn research
focus primarily to the development of increasingly deep neural networks designed for all
aspects of face recognition tasks, ranging from detection and preprocessing to feature
representation and classification in verification and identification solutions. However,
despite these improvements, real-time, accurate face recognition is still a challenge,
primarily due to the high computational cost associated with the use of Deep
Convolutional Neural Networks (DCNN), and the need to balance accuracy requirements
with time and resource constraints.
Other significant issues affecting face recognition relate to occlusion, illumination, and
pose invariance, which cause a notable decline in accuracy in both traditional
handcrafted solutions and deep neural networks. This survey provides a critical analysis
and comparison of modern state-of-the-art methodologies, their benefits, and their
limitations. It provides comprehensive coverage of both deep and shallow solutions, as
they stand today, and highlights areas requiring future development and improvement.
This review is aimed at facilitating research into novel approaches, and further
development of current methodologies by scientists and engineers, whilst imparting an
informative and analytical perspective on currently available solutions to end users in
industry, government, and consumer contexts.
Contents
1 SYNOPSIS
1.1 Project Title
1.2 Technical Keywords
1.3 Problem definition
2 TECHNICAL KEYWORDS
2.1 Technical Keywords
2.2 Area of Project
3 INTRODUCTION
3.1 Introduction
3.2 Deep Learning
3.3 Objectives
3.4 Problem definition
4 Methodology
4.1 Preprocessing
4.2 Face Detection
4.3 CNN
4.4 Training
4.5 Testing
4.6 Deployment
5 Architecture
5.1 Proposed Architecture
5.2 Statistical Shape Models
5.3 Convolutional Layer
5.3.1 Convolutional
5.3.2 Max-Pooling
5.3.3 Fully-Connected
5.4 Pooling layer
5.5 Batch Normalization
5.6 Rectified Linear Unit
5.7 Batch size
5.8 Epochs
Chapter 1
SYNOPSIS
Chapter 2
TECHNICAL KEYWORDS
Chapter 3
INTRODUCTION
3.1 Introduction
The implementation of human face recognition has gained significant attention in
recent years due to its wide range of applications in various fields such as security,
surveillance, biometrics, and human-computer interaction. Human face recognition refers
to the automated identification or verification of individuals based on their facial
features. With the advancements in computer vision, image processing, and machine
learning techniques, human face recognition systems have become more accurate, reliable,
and efficient. These systems use a combination of algorithms, models, and datasets to
extract and analyze facial features from images or video streams. The primary goal of
implementing human face recognition is to develop a system that can accurately identify
or verify individuals in real-time scenarios. This involves capturing or obtaining facial
images, detecting facial landmarks, extracting relevant features, and comparing them
against a database of known faces. The system then matches the captured face with the
stored representations to determine the identity of the person.
Face recognition is a visual pattern recognition problem. Given an arbitrary input
image, a face recognition system searches a database and outputs the identity of the
people in the image. A face recognition system generally consists of four modules, as
depicted in Figure 1: detection, alignment, feature extraction, and matching.
Localization and normalization (face detection and alignment) are processing steps
performed before face recognition (facial feature extraction and matching). Face
detection segments the face areas from the background. In the case of video, the
detected faces may need to be tracked using a face tracking component. Face detection
provides coarse estimates of the location and scale of each detected face, whereas face
alignment aims at achieving more accurate localization and thereby normalizing faces.
Facial components, such as the eyes, nose, mouth, and facial outline, are located; based
on these location points, the input face image is normalized with respect to geometrical
properties, such as size and pose, using geometrical transforms or morphing.
Chapter 4
Methodology
4.1 Preprocessing
A large dataset of face images is collected, including images of different individuals
under different lighting and pose conditions. Data preprocessing: The face images are
preprocessed to remove noise, align the faces, and normalize the illumination. Feature
extraction: The preprocessed face images are then fed into a deep neural network to
extract high-level features that capture the important characteristics of a face. The
neural network typically consists of several layers of convolutional and pooling operations,
followed by fully connected layers that produce a feature vector. The data are divided
into training and testing sets.
4.3 CNN
... on the input face image. In image processing, this technique is commonly used to
enhance the contrast. The network consists of three convolution layers, three batch
normalization (BN) layers, three rectified linear unit (ReLU) layers, two max-pooling
layers, one fully connected layer, and one softmax regression layer. Each connection
layer represents a linear mapping of different types of data. Figure 4 shows the
architecture of this network. The feature sets of an input image are extracted through
the convolution and pooling layers. Furthermore, the feature set of each layer is the
input of the next layer, and the feature set of a convolution layer can be related to
some feature sets of the previous layer. In order to study the effect of the network
model proposed in our paper, we use the Face96 database, which consists of 50 people
with 20 photos per person, a total of 1000 pictures, including variations in facial
appearance, small posture changes, illumination, facial pose, facial expression,
background, angle, and distance from the camera. In the preprocessing step, the images
are scaled to a resolution of 112x92 pixels. We trained the network for 50 epochs with
an initial learning rate of 0.0001, using a CPU as hardware.
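The layer sequence described above can be checked with a little shape arithmetic. The sketch below is illustrative only: the report does not state kernel sizes, strides, padding, or where the two pooling layers sit, so 3x3 convolutions with padding 1 and 2x2 max-pooling with stride 2 after the first two convolution blocks are assumptions, as is reading 112x92 as height x width. Batch normalization and ReLU leave the spatial size unchanged, so they are omitted.

```python
def window_output_size(size, kernel, stride=1, padding=0):
    """Length of one spatial dimension after a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(height, width, layers):
    """Return (name, height, width) after each layer in `layers`."""
    shapes = []
    for name, kernel, stride, padding in layers:
        height = window_output_size(height, kernel, stride, padding)
        width = window_output_size(width, kernel, stride, padding)
        shapes.append((name, height, width))
    return shapes


# Hypothetical hyperparameters (not given in the report):
# (layer name, kernel size, stride, padding)
LAYERS = [
    ("conv1", 3, 1, 1),
    ("pool1", 2, 2, 0),
    ("conv2", 3, 1, 1),
    ("pool2", 2, 2, 0),
    ("conv3", 3, 1, 1),
]

# Assumed input: 112 rows by 92 columns.
shapes = trace_shapes(112, 92, LAYERS)
for name, h, w in shapes:
    print(f"{name}: {h} x {w}")
```

Under these assumptions each pooling layer halves both dimensions, so the fully connected layer would receive 28 x 23 feature maps. Different kernel or padding choices would change these numbers.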
4.4 Training
The extracted features are then used to train the neural network to distinguish between
different faces. This is typically done using a supervised learning approach, where the
network is trained on a labeled dataset of face images and their corresponding identities.
4.5 Testing
After the neural network has been trained, it can be tested on a separate dataset to
evaluate its performance. This typically involves measuring the accuracy of the network
in correctly identifying the individuals in the test dataset.
4.6 Deployment
Once the neural network has been trained and tested, it can be deployed in a real-
world application for face recognition. This typically involves capturing a face image,
preprocessing it, and then feeding it into the neural network to obtain a feature vector.
The feature vector is then compared to a database of known faces to determine the identity
of the individual in the image. Overall, human face recognition using DNNs is a complex
process that requires a large amount of data, sophisticated neural network architectures,
and careful preprocessing and training. However, with the increasing availability of large
datasets and powerful computing resources, DNN-based face recognition systems have
become increasingly accurate and effective in real-world applications.
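The comparison step in deployment can be sketched as a nearest-neighbour search over stored feature vectors. This is a minimal illustration, not the project's actual matcher: the report does not say which similarity measure or threshold was used, so cosine similarity, the 0.8 threshold, the three-element vectors, and the names `identify`, `person_a`, and `person_b` are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def identify(query, gallery, threshold=0.8):
    """Return (name, score) of the best match, or (None, score) if below threshold."""
    best_name, best_score = None, -1.0
    for name, vector in gallery.items():
        score = cosine_similarity(query, vector)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Toy gallery of known faces; real feature vectors would be much longer.
gallery = {
    "person_a": [0.9, 0.1, 0.2],
    "person_b": [0.1, 0.8, 0.5],
}

name, score = identify([0.85, 0.15, 0.25], gallery)
print(name, round(score, 3))
```

The threshold lets the system answer "unknown" instead of forcing every probe onto the closest enrolled identity, which matters when strangers can appear in front of the camera.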
Chapter 5
Architecture
The proposed model is based on an object recognition benchmark. According to this
benchmark, all the tasks related to an object recognition problem can be grouped under
three main components: Backbone, Neck, and Head. Here, the backbone corresponds to a
baseline convolutional neural network capable of extracting information from images and
converting it to a feature map. In the proposed architecture, the concept of transfer
learning is applied to the backbone to utilize the already learned attributes of a
powerful pre-trained convolutional neural network in extracting new features for the
model.
5.2 Statistical Shape Models
A face shape can be represented by $n$ points as a $2n$-element vector,
$x = (x_1, y_1, \ldots, x_n, y_n)^T$. Given $s$ training face images, there are $s$
shape vectors $x_i$. Before we can perform statistical analysis on these vectors, it is
important that the shapes represented are in the same coordinate frame.
Figure 5 illustrates the shape model.
5.3.2 Max-Pooling
After each convolutional layer, there may be a pooling layer. The pooling layer takes
small rectangular blocks from the convolutional layer and subsamples it to produce a
single output from that block. There are several ways to do this pooling, such as taking
the average or the maximum, or a learned linear combination of the neurons in the block.
Our pooling layers will always be max-pooling layers; that is, they take the maximum of
the block they are pooling.
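The max-pooling operation described above can be sketched in a few lines. This is an illustrative pure-Python version, assuming a 2x2 block with stride 2 (the report does not state the pooling size) and a toy 4x4 feature map; the function name `max_pool_2d` is hypothetical.

```python
def max_pool_2d(feature_map, size=2, stride=2):
    """Subsample a 2-D feature map by taking the maximum of each block."""
    rows = len(feature_map)
    cols = len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, stride):
        pooled_row = []
        for c in range(0, cols - size + 1, stride):
            # Collect the values of one size x size block and keep the largest.
            block = [
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            pooled_row.append(max(block))
        pooled.append(pooled_row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 2],
    [7, 2, 9, 4],
    [0, 1, 3, 5],
]

pooled = max_pool_2d(feature_map)
print(pooled)  # [[6, 2], [7, 9]]
```

Each 2x2 block collapses to a single value, so the 4x4 map becomes 2x2 while the strongest activation in each region is preserved.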
5.3.3 Fully-Connected
Finally, after several convolutional and max-pooling layers, the high-level reasoning in
the neural network is done via fully connected layers. A fully connected layer takes all
neurons in the previous layer (be it fully connected, pooling, or convolutional) and
connects them to every single neuron it has. Fully connected layers are not spatially
located anymore (you can visualize them as one-dimensional), so there can be no
convolutional layers after a fully connected layer.
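To make the "one-dimensional" view concrete, the sketch below flattens a stack of feature maps, applies one fully connected layer, and turns the result into class probabilities with softmax. It is a toy illustration: the weights, biases, and the 2x2 input are made-up values, and the helper names `flatten`, `dense`, and `softmax` are hypothetical rather than taken from the project code.

```python
import math


def flatten(feature_maps):
    """Turn a list of 2-D feature maps into one flat list of activations."""
    return [value for fmap in feature_maps for row in fmap for value in row]


def dense(inputs, weights, biases):
    """Fully connected layer: every output neuron sees every input."""
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs)) + b
        for neuron_weights, b in zip(weights, biases)
    ]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# One 2x2 feature map and a two-class output layer (illustrative values).
feature_maps = [[[1.0, 0.0], [0.0, 1.0]]]
weights = [
    [0.5, 0.0, 0.0, 0.5],
    [0.0, 0.5, 0.5, 0.0],
]
biases = [0.0, 0.0]

flat = flatten(feature_maps)
logits = dense(flat, weights, biases)
probs = softmax(logits)
print(flat, logits, probs)
```

Once the maps are flattened, the spatial layout is gone, which is why a convolution cannot meaningfully follow this layer.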
5.8 Epochs
The number of epochs denotes how many times the entire dataset has passed forward and
backward through the neural network, i.e., one epoch is when every image has been seen
once during training. Nevertheless, this concept should not be confused with iterations.
The number of iterations corresponds to the total number of forward and backward passes,
with each pass using one batch; it depends on the batch size, the number of epochs, and
the size of the dataset.
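The relationship between epochs and iterations is simple arithmetic. The sketch below assumes 700 training images (70% of a 1000-image dataset) and 50 epochs, as mentioned elsewhere in this report; the batch size of 32 is a hypothetical value, since the report does not state one.

```python
import math


def total_iterations(num_samples, batch_size, epochs):
    """Total forward/backward passes: batches per epoch times number of epochs."""
    batches_per_epoch = math.ceil(num_samples / batch_size)
    return batches_per_epoch * epochs


# 700 training images, assumed batch size 32, 50 epochs.
iterations = total_iterations(num_samples=700, batch_size=32, epochs=50)
print(iterations)  # 1100
```

The ceiling accounts for the last, smaller batch in each epoch: 700 images in batches of 32 give 22 batches, the final one holding only 28 images.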
Chapter 6
6.1 Dataset
We used a publicly available face recognition dataset called "Labeled Faces in the Wild"
(LFW) for our project. The LFW dataset contains more than 13,000 images of faces
collected from the web. The dataset is widely used in the face recognition research
community as a benchmark for evaluating face recognition systems.
We used a subset of the LFW dataset, which contained images of 500 individuals, with
10 images per person captured under different lighting conditions, facial expressions, and
poses. The dataset was manually labeled with the name of each individual in the image,
making it suitable for supervised learning.
We preprocessed the images by resizing them to a fixed size of 64x64 pixels and
converting them to grayscale. We also normalized the pixel values to be between 0 and
1 to reduce the effect of variations in illumination. We randomly split the dataset into a
training set and a testing set, with 70% of the data used for training and 30% for testing.
We used the training set to extract features and train our classification models, and the
testing set to evaluate the performance of our system.
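The preprocessing and split described above can be sketched without any imaging library. This is an illustration under stated assumptions: images are nested lists of 8-bit RGB tuples, grayscale conversion uses the common ITU-R BT.601 luma weights (the report does not say which conversion was used), resizing to 64x64 is omitted because it needs an image library, and the helper names and the fixed seed are hypothetical.

```python
import random


def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) pixels to single luminance values."""
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
        for row in rgb_image
    ]


def normalize(gray_image):
    """Scale 8-bit pixel values into the range [0, 1]."""
    return [[pixel / 255.0 for pixel in row] for row in gray_image]


def split_dataset(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of `samples` and cut it into training and testing parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# A tiny 1x2 "image": one white pixel and one black pixel.
image = [[(255, 255, 255), (0, 0, 0)]]
prepared = normalize(to_grayscale(image))

# Stand-in for a dataset: 1000 sample indices split 70% / 30%.
train_ids, test_ids = split_dataset(range(1000))
print(prepared, len(train_ids), len(test_ids))
```

Fixing the seed makes the split reproducible, so reported accuracy can be tied to a specific partition of the data. For identification tasks it may also be worth splitting per person so every identity appears in both sets, which this simple random split does not guarantee.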
... can lead to errors of other forms, such as false positives. Our system achieved a
high accuracy of 99.67% on the dataset used, which consists of 1000 images. The database
was divided into 70% training data and 30% validation data.
Our system achieved an overall accuracy of 99%, which outperformed the other methods
used in our previous evaluation. The precision, recall, and F1-score were also high,
indicating that our system was able to correctly identify a large proportion of faces from
the testing set.
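For reference, the metrics named above can be computed from true and predicted labels as follows. The labels in this sketch are made-up toy data, not the project's results, and the function names are hypothetical; precision, recall, and F1 are shown for a single identity treated as the positive class.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1-score for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy labels for three identities (illustrative only).
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]

acc = accuracy(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(y_true, y_pred, positive="a")
print(acc, precision, recall, f1)
```

A false positive here is an image of someone else labelled as identity "a"; a false negative is an image of "a" labelled as someone else. Averaging the per-identity scores over all identities gives the macro-averaged figures often reported for multi-class recognition.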
CONCLUSION
In conclusion, we have successfully built and deployed a human face recognition model
using a convolutional neural network. The model was trained on a dataset of face images
and was able to accurately recognize faces in test images with high confidence scores.
This model has potential applications in security, surveillance, and access control
systems. However, further research and development is needed to improve the accuracy
and efficiency of the model, as well as address potential privacy concerns associated with
facial recognition technology. Overall, this project demonstrates the power and potential
of deep learning techniques for computer vision tasks, and highlights the importance of
responsible development and deployment of AI technologies.