FER 2013 Project Report
DEPARTMENT OF CSM
MARRI LAXMAN REDDY
INSTITUTE OF TECHNOLOGY AND MANAGEMENT
(AUTONOMOUS)
(Affiliated to JNTUH, Approved by AICTE New Delhi and Accredited by NBA & NAAC with A Grade)
(JULY 2023)
CERTIFICATE
This is to certify that the project titled "Face Emotion Identification Using
Convolutional Neural Networks and Machine Learning", being submitted by
VIDAGOTTI ROHIN (207Y1A6635) and SOMA SAIKIRAN (207Y1A6643)
in IV B.Tech I Semester in Computer Science and Engineering (CSM), is a record
of bonafide work carried out by them. The results embodied in this report have not been
submitted to any other university for the award of any degree.
I hereby declare that the Mini Project Report entitled "Face Emotion
Identification Using Convolutional Neural Networks and Machine Learning",
submitted for the B.Tech degree, is entirely my own work carried out with the help of my
team member, and all ideas and references have been duly acknowledged. It does not
contain any work submitted for the award of any other degree.
DATE:
VIDAGOTTI ROHIN
(207Y1A6635)
SOMA SAIKIRAN
(207Y1A6643)
ACKNOWLEDGEMENT
I am happy to express my deep sense of gratitude to the Principal of the college, Dr. K.
Venkateswara Reddy, Professor, Department of Computer Science and
Engineering, Marri Laxman Reddy Institute of Technology & Management, for having
provided me with adequate facilities to pursue my project.
I sincerely thank my seniors and all the teaching and non-teaching staff of the
Department of Computer Science for their timely suggestions, healthy criticism and
motivation during the course of this work.
I would also like to thank my classmates for always being there whenever I needed
help or moral support. With great respect and obedience, I thank my parents and
brother, who were the backbone behind my deeds.
Finally, I express my immense gratitude to all the other individuals who have either
directly or indirectly contributed, at the right time, to the development and success of
this work.
ABSTRACT
CONTENTS
1 Introduction
1.1 Introduction
1.2 Existing System
1.3 Problem Statement
1.4 Proposed System
1.4.1 Proposed System
1.4.2 Objective
2 Literature Review
3 Requirements and Domain Information
3.1 Requirement Specification
3.1.1 Hardware Requirements
3.1.2 Software Requirements
3.2 Domain Information
4 System Methodology
4.1 Architecture of Proposed System
4.2 Algorithm
4.3 System Design
4.3.1 Data Flow Diagrams
4.3.2 Class Diagram
4.3.3 UML Diagram
5 Experimentation and Analysis
5.1 Experimentation
5.2 Results
5.3 Testing
5.3.1 Types of Testing
5.3.2 Test Cases
6 Conclusion and Future Scope
6.1 Conclusion
6.2 Future Scope
References
PAPER PUBLICATION
LIST OF TABLES
LIST OF FIGURES
FIGURE NO.  DESCRIPTION
1.1.1  FER Procedure for an Image
1.1.2  Facial Landmarks to be Extracted from a Face
3.2.1  Example of Deep Learning
4.1.1  Architecture
4.1.2  System Design
4.2.1  Example of CNN
4.2.2  CNN Architecture
4.2.3  Kernel Process
4.2.4  Strides Process
4.2.5  Padding Process
4.2.6  Fully Connected Layer
4.3.1  Flow Chart
4.3.2  Class Diagram
4.3.3  User Module
4.3.4  Software Module
4.3.5  Sequence Diagram
5.1.1  Example of Happy Face
5.1.2  Example of Angry Face
5.1.3  Example of Surprise Face
5.1.4  Example of Sad Face
5.1.5  Example of Disgust Face
5.1.6  Example of Fear Face
5.1.7  Example of Neutral Face
5.1.8  Image Identification and Classification
5.2.1  Screenshot of Execution
5.2.2  Screenshot of Angry
5.2.3  Screenshot of Fearful
5.2.4  Expression of Surprised
LIST OF ABBREVIATIONS
ABBREVIATION  DESCRIPTION
CHAPTER 1
INTRODUCTION
1.1 INTRODUCTION
Even without saying a word, a person can convey a vast range of
emotions. Facial expressions convey a person's thoughts, feelings, and intentions, and
facial expression recognition software can detect these expressions in a photograph
of a person's face. In the early 1970s, the American psychologists Ekman and Friesen
identified a common set of six universally shared "basic emotions"
(anger, fear, disgust, sadness, surprise, and happiness). Facial expression detection has
gained a lot of attention recently because of its impact on clinical practice, social
robotics, and education. According to a number of studies, emotions have a substantial
impact on education. Although teachers already receive feedback
from exams and questionnaires, these methods are not necessarily the most effective.
Teachers can use the facial expressions of their pupils to adjust their teaching strategies
and resources. This study uses Convolutional Neural Networks (CNNs), a deep
learning method widely used in image classification, to identify students' emotions
through facial expression analysis. A multistage image processing method is used to
extract feature representations. Each of the seven emotions is recognized in a
three-step process that begins with face detection and ends with recognition.
Many academics are intrigued by Facial Emotion Recognition (FER) and seek to use it
to improve instruction in the classroom. According to Tang et al., students' facial
expressions can be used to gauge the success of classroom teaching. Their system
includes face detection, facial recognition, and facial expression detection, and employs
ULGBPHS and KNN to sort and categorize data. Savva et al. reported an investigation
of students' emotional states during active classroom instruction via a web application;
live footage from webcams installed in classrooms was analysed using machine
learning algorithms.
Students' emotional states can be identified and monitored in real time by an e-learning
system, and the authors came up with a proposal for how to do this. Eye and
head movement data can be used to infer a student's emotional state in an online
learning environment. A Facial Emotion Recognition System (FERS) was created by
Ayvaz et al. to recognize the emotional states and motivations of students in
videoconference-style e-learning. KNN and SVM were shown to be the most accurate
machine learning algorithms, followed by Random Forest and Classification &
Regression Trees.
When it comes to improving the quality and memorability of their lectures, Kim
and co-workers devised a system that provides real-time recommendations to
instructors so they can adjust their nonverbal behaviour, such as body language and
facial expressions, in real time. The Haar Cascade technique was used to detect facial
expressions in a virtual learning environment based on facial emotion detection
using the JAFFE database. Chiou et al. used wireless sensor network technology to
build an intelligent classroom management system that enables teachers to swiftly
switch instruction modes to prevent wasting time.
Artificial Intelligence (AI) and Machine Learning (ML) are widely employed
in many domains. In data mining, they have been used to detect insurance fraud, and
clustering-based data mining has been used to identify patterns in stock market data. ML
algorithms have played a significant role in pattern recognition and pattern
classification problems such as FER, Electroencephalography (EEG) analysis and spam
detection. ML can be used to provide cost-effective, reliable and low-computation-time
FER solutions.
FER typically has four steps. The first is to detect a face in an image and draw
a rectangle around it; the next step is to detect landmarks in this face region. The
third step is to extract spatial and temporal features from the facial components. The
final step is to feed the extracted features to a classifier that produces the recognition
results. Figure 1.1.1 shows the FER procedure for an input image where a face region
and facial landmarks are detected. Facial landmarks are visually salient points such as
the tip of the nose and the ends of the eyebrows and the mouth, as shown in Figure
1.1.2. The pairwise positions of two landmark points or the local texture around a
landmark are used as features. Table 1.1 gives the definitions of 64 primary and
secondary landmarks. The spatial and temporal features are extracted from the face and
the expression is determined as one of the facial categories using pattern classifiers.
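The following minimal Python sketch illustrates these four steps; it is an assumption for illustration rather than the exact code used in this project. Face detection uses an OpenCV Haar cascade, the landmark and temporal-feature steps are indicated only as comments (they would need an additional landmark detector), and the classification step is shown against a hypothetical saved model.

import cv2
import numpy as np

# Step 1: detect a face and draw a rectangle around it (Haar cascade shipped with OpenCV).
cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
image = cv2.imread("student.jpg")                    # hypothetical input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)

for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

    # Step 2: landmark detection would go here (e.g. with a separate landmark library).
    # Step 3: extract spatial features; as a simplification we use the raw pixel
    #         intensities of the cropped, resized face region.
    face = cv2.resize(gray[y:y + h, x:x + w], (48, 48)) / 255.0
    features = face.reshape(1, 48, 48, 1)

    # Step 4: feed the features to a trained classifier (hypothetical model file).
    # model = keras.models.load_model("fer_cnn.h5")
    # probabilities = model.predict(features)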
When it comes to improving the quality and memorability of lectures, Kim and
co-workers have built a system that provides real-time recommendations to instructors so
they can adjust their body language and facial expressions in real time. Facial emotion
identification using Haar Cascades was proposed to detect emotions in
a virtual learning environment utilizing data from the JAFFE database. An intelligent
classroom management system was developed by Chiou et al. using wireless sensor
network technology; this system helps teachers quickly switch instruction modes in
order to save time. People's emotions, however, can never be predicted with complete precision.
The expression of emotion through facial features has been an object of interest since
the time of Aristotle. The topic gained momentum only after 1955, when a list of
general emotions was established and several parametrized frameworks were proposed.
Encouraged by Deep Learning and Computer Vision, building automated
recognition systems has received a great deal of attention within the Computer
Science community. Studying face-to-face communication, Mehrabian concluded
that emotions are conveyed to the extent of 55% through facial expressions. This
implies that if the computer could capture and interpret the emotions of the user,
communication would become more natural and appropriate, especially in situations
where a computer assumes the role of a human.
Detailed facial motions will be captured, and the appropriate emotion will be detected
using a deep learning algorithm such as a CNN. The system will determine the best
classifiers for recognizing particular emotions, where single and multi-layered networks
will be tested. Different resolutions of the images representing faces, as well as
images of the mouth and eye regions, will be included. On the basis of the test
results, a cascade of neural networks will be proposed. The cascade will recognize the six
basic emotions and the neutral expression. Depending upon the resultant emotion, the module
will suggest songs or suitable tasks to cheer up a person and enhance his or her mood.
collection of activation maps. This system will display the emotion name together with its
percentage level. For example, if the user is happy, the detected emotion does not consist
of happiness alone; there will be a mixture of other emotions, such as surprise mixed with
happiness or anger mixed with happiness. The percentages are displayed according to this
mixture: for surprise mixed with happiness, the happiness level may be around 77% and the
surprise level around 60%, and these values change dynamically as the person's emotion
changes.
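A minimal sketch of how such percentage levels could be displayed from a classifier's output is shown below; the emotion order and the probability values are illustrative assumptions, not values produced by this project.

# Hypothetical softmax output of the network for one frame (values sum to 1.0).
emotions = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]
probabilities = [0.02, 0.01, 0.03, 0.62, 0.02, 0.25, 0.05]

# Display every emotion with its percentage level, strongest first.
for name, p in sorted(zip(emotions, probabilities), key=lambda pair: -pair[1]):
    print(f"{name}: {p * 100:.1f}%")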
1.4.2 Objective
CHAPTER 2
LITERATURE REVIEW
Many academics are intrigued by Facial Emotion Recognition (FER) and intend to put
it to good use in the classroom. Based on students' facial expressions, Tang et al.
claim that classroom teaching efficacy can be assessed. Their system includes data
collection, face detection, face recognition, facial expression recognition, and post-
processing, and employs ULGBPHS and KNN to sort and categorize data. Savva et al.
used a web application to examine the emotional states of students who were taking part
in active classroom instruction; machine learning techniques were employed to examine
camera footage from schools. According to Whitehill et al., students' facial expressions
can be used to gauge their level of engagement in class. Using Gabor features and the
SVM algorithm, students' involvement in cognitive skill training software can be tracked.
The videos were annotated by human judges, who provided the authors with labels.
A computer vision and machine learning technique was then used to determine
the emotional state of students playing an educational game, in a school computer lab,
designed to teach the fundamental principles of classical mechanical design. Another
system was built that can identify and track the emotional state of students in real time and
provide feedback to improve the e-learning environment. E-learning systems can use eye
and head movement to infer important information about students' moods and energy
levels in the classroom. Students' emotional states and motivation can be tracked using a
Facial Emotion Recognition System (FERS) in videoconference-style e-learning, created by
Ayvaz and colleagues. Among the machine learning approaches evaluated, SVM and KNN
have the highest accuracy rates, followed by Random Forest and Classification &
Regression Trees. When it comes to improving the quality and memorability of lectures,
Kim and co-workers have built a system that provides real-time recommendations to
instructors so they can adjust their body language and facial expressions in real time.
Facial emotion identification using Haar Cascades was proposed to detect emotions in a
virtual learning environment utilizing data from the JAFFE database. Smart classroom
management systems that help teachers quickly change instruction modes were built using
wireless sensor networks by Chiou et al.
2.2 "Automatic Facial Expression Analysis of Students in Teaching Environments," in
Biometric Recognition, vol. 9428, ed. J. Yang, Z. Sun, S. Shan, W. Zheng and J. Feng,
Cham: Springer International Publishing, 2015, pp. 439-447.
"Recognizing student facial expressions: A web application," in IEEE Global
Engineering Education Conference (EDUCON), Tenerife, 2018, p. 1459-1462, A. Savva,
V. Stylianou, K. Kyriacou, and F. Domenach.
The research reported in this paper was carried out with the purpose of
analysing the emotions of students engaged in hands-on, face-to-face classroom training.
Live video feeds from classroom webcams are fed into machine learning algorithms. The
visualization program was designed as a web-based application so that the professor can
examine it remotely. An emotional chronology of student reactions helps the lecturer and
other interested parties improve educational content delivery. Artificial Intelligence (AI)
and Machine Learning (ML) are introduced briefly. A wide range of information is being
obtained in today's world from a variety of sources. To maximize the value of a company's
existing resources, it is common for data that was collected for one purpose to be used for
another. Even though most businesses have security cameras in place to prevent theft, the
footage from these cameras can be used in a variety of ways. In the future, an intelligent
system may evaluate images to reveal consumer emotions, and even estimate customer
contentment; in other words, rate the entire customer purchasing experience. The ability
to recognize and analyse emotions could thus be a powerful instrument for business
success.
The implementation consists of three main parts: data collection, processing, and
aggregation. The data is collected by a client application running on a PC in the classroom.
Pupils in the classroom are photographed using the computer's webcam at regular intervals.
An external API analyzes these photos for emotional content, and the results of the analysis
are sent to a central repository via a RESTful service. The central server aggregates the
data through several HTTP queries using RESTful Application Programming Interfaces
(APIs), and clients use REST APIs to retrieve data from the repository, which is then
viewed on their end. The final outcome is a display that is relevant to the requester. The
network architecture is a good match for the heavy HTTP traffic predicted for this project.
The project's increasing traffic was also identified as a major security issue, and measures
have been put in place to prevent anyone from accessing or tampering with the data. The
webcam-recording client application can be installed and used by an ordinary computer
user in a matter of minutes. Instructors should not have to deal with the system's
complexity; instead, they have a simple application that allows them to record at the push
of a single button. In addition to the webcam's low-level API and the web client's API
calls, there is a GUI for visual feedback and ease of use. The application works as follows:
first, it guides the user through the setup; it then takes a few pictures with the webcam at
regular intervals, uses an external API to identify the emotions captured in those
photographs, and sends the results to an emotion repository server; this procedure is then
repeated. Steps 2, 3 and 4 are run asynchronously so that they do not interfere with the
main GUI or prevent the webcam from taking further photographs. The recordings are
acquired by NoSQL database-backed servers located on the Internet. Servers can be
classified into two categories: dedicated and cloud. Using HTTP, clients are able to access
the internal database through the server. Among other things, the server provides a
web-based interface for CRUD tasks on the internal database.
Clients can also access a RESTful API, and the server provides a safe and secure way for
customers to log in. Virtual machines can be used to run any operating system (OS)
remotely, and the server can handle many requests at once. The server was written in Go
(Golang). Second, the database: in most cases, the data is captured with a time stamp and
location; other than that, only the user's personal data is used to verify their identity. A
NoSQL solution was chosen because of the non-relational nature of the emotion data and
the need for schemaless properties. MongoDB was chosen because it is an open-source
technology that supports statically typed communication between the database and Go.
Because the server was going to be accessible through the Internet, it needed a public
address, and Microsoft Azure was chosen as the cloud solution. Third, the client's
visualization program: it was designed as a web-based application so that the professor can
examine it remotely. The user interface was developed using HTML and CSS. Vanilla
JavaScript with asynchronous JavaScript and XML (Ajax) is used to connect to the server,
and AngularJS made data binding possible. As a whole, the functionality of this application
was built using a number of different technologies, including C#, Go, JavaScript, HTML,
the NoSQL database MongoDB, Microsoft Azure and Microsoft Cognitive Services.
Y. Kim, T. Soyata, and R. F. Behnagh, "Towards Emotionally Aware AI Smart Classroom:
Current Issues and Directions for Engineering and Education," IEEE Access, vol. 6,
pp. 5308-5331, 2018.
This is what the following study suggests. The proposed technology can help an in-class
speaker improve their presentation quality and memorability by allowing the presenter to
make real-time adjustments and corrections to their non-verbal behaviour, such as hand
gestures, facial expressions, and body language. The proposed approach includes emotion
detection, deep learning-based recognition, and mobile cloud computing. These
technologies, and the computing requirements of a system that includes them, are examined
in this study in great detail. Based on these requirements, a system feasibility analysis is
undertaken. Most of the system's components can be built using the most up-to-date research.
Emotional intelligence is an issue that engineers and educators are currently trying to
solve. Humans provide inputs 1 and 2, and the system computes and sends back a response
to one of the individuals; this is how the smart classroom system was designed. It is not
easy to properly quantify human behaviour since, in the human-machine link, the system
relies only on quantitative data. The smart classroom "system" is made up of students,
computational architecture, and educational philosophy. Any scenario in which humans
speak with each other in order to exchange knowledge or information can benefit greatly
from the implementation of such a system; salespeople, medics, security staff, and military
personnel stationed abroad would all benefit from this training. Voice intonation and body
movements, as well as other nonverbal signals including eye contact and facial expressions,
play a significant part in human communication. Although understudied today, their
theoretical underpinnings can be quantified and integrated into machine-based education.
Machine intelligence-driven systems allow students to receive critical feedback during
practice presentations in front of the "machine," while avoiding the anxiety or shame of a
poor presentation in front of people.
One of the two human sources of input, which may be biased, can be fed back
to the other source using the new system design paradigm. This design introduces four
important research questions. Both inputs to the machine learning algorithm must be
established on a strictly quantitative basis in order to be usable; because of this, the system
must have quantified "input" boxes and a machine intelligence platform (Box III) that can
learn the relationship between these quantified inputs. While existing research investigates
Multimodal Learning Analytics (MLA), in which the acquired multi-modal presentation
data is used to create a single quantitative value, a new algorithm set must be built for this
purpose. This study makes the following important contributions: integration of multimodal
sensing and emotion recognition; quantification of important human variables in a smart
classroom, such as crowd scores and behavioural cues; demonstration and verification of
the proposed system design with a template smart classroom at SUNY Albany; and a bridge
between engineering and education. The remaining sections of the paper are outlined below.
Section II outlines the proposed system design. The sections that follow look
at the current state of technology needed to put this system into action. Section III examines
methods for measuring human-based metrics, such as crowd scores. Section IV examines
nonverbal human communication metrics such as facial expressions and body language in
general, as well as voice-related metrics. Section V concentrates on real-time algorithms for
high-intensity computations. Section VI establishes a detailed algorithmic/computing
infrastructure for the proposed system, and Section VII provides a feasibility study. The
problems that still remain to be solved are described in depth. A smart classroom system is
then described that includes code to extract audio, visual, and cognitive load vectors from a
presentation, a crowd of peers and experts, and deep learning algorithms that learn "best
practices" in training mode and estimate crowd scores during presentation mode (together
with a machine learning engine). Final remarks are made in Section IX.
Two ideas are proposed in order to design the system. For Box III, machine
intelligence can learn how presenter behaviour affects presentation quality in Training
Mode and then convey this information to the presenter (Box IV) in Presentation Mode
without distracting them in real time. Boxes I and II, which include presenter and listener
input, can be quantified despite their subjective natures. Both assumptions are tested
by streaming raw audio and video data from a presenter to the cloud during a presentation.
The Analysis Engine (Box I) translates this raw data into processed audio and visual feature
vectors to measure the behavioural signals of the presenter, such as vocal emotion, face
movement and body gesture. Pupil dilation and other facial cues can also be used to
estimate the cognitive strain of presenters (the [C] vector). To make quantifiable
measurements, the Crowd Annotation Engine (Box II) uses votes from the crowd of expert
or peer listeners (for example, the proposed Crowd Score vector [S]). The Deep Learning
Engine (Box III) learns from the [A] and [V] vectors which behaviours, such as open-handed
motions, result in the highest crowd ratings (quantified by the [S] vector). The Feedback
Engine (Box IV) provides real-time feedback on the presenter's performance.
An audio/video data acquisition component, a pre-processing component, and a
massively parallel computing component are all necessary parts of the suggested
techniques. For presentations with visual feedback, the presenter's cognitive burden
(quantified by the [C] vector) is taken into account when constructing the feedback.
Presenters can use these methods to alter their body language, voice intonation and hand
gestures to improve their presentation. Many design issues must be overcome as well.
A system for quantifying the presenter's multimodal cues (Box I) as well as the
listeners' subjective input (the [S] vector in Box II) must first be devised. Educational
psychometric investigations are needed to determine how to compute [S]. Using
well-established crowd-sensing methods, a valid [S] vector may be generated by excluding
outliers and findings that may be erroneous or biased. Box III also requires deep neural
networks capable of learning the complex non-linear relationship between the [A], [V],
and [S] vectors in Training Mode and simulating a crowd in Presentation Mode by
providing an estimated score vector for the audience. On-the-fly adjustment of a parametric
feedback engine (Box IV) is required, along with an investigation of optimal visual and
haptic feedback alternatives for distinct presenters.
CHAPTER 3
REQUIREMENTS AND DOMAIN INFORMATION
Jupyter (or)
DEEP LEARNING: Deep learning (also known as deep structured learning) is part
of a broader family of machine learning methods based on artificial neural networks with
representation learning. Learning can be supervised, semi-supervised or
unsupervised. Deep-learning architectures such as deep neural networks, deep belief
networks, graph neural networks, recurrent neural networks and convolutional neural
networks have been applied to fields including computer vision, speech recognition,
natural language processing, machine translation, bioinformatics, drug design, medical
image analysis, material inspection and board game programs, where they have produced
results comparable to and in some cases surpassing human expert performance.
The adjective "deep" in deep learning refers to the use of multiple layers in the
network. Early work showed that a linear perceptron cannot be a universal classifier, but
that a network with a nonpolynomial activation function with one hidden layer of
unbounded width can. Deep learning is a modern variation which is concerned with an
unbounded number of layers of bounded size, which permits practical application and
optimized implementation, while retaining theoretical universality under mild conditions.
Deep learning models are capable of focusing on the accurate features
themselves, requiring only a little guidance from the programmer, and are very helpful in
solving the problem of dimensionality. Deep learning algorithms are used especially
when we have a huge number of inputs and outputs.
Deep learning has evolved from machine learning, which itself is a subset of
artificial intelligence. As the idea behind artificial intelligence is to mimic human
behaviour, the idea of deep learning is likewise "to build algorithms that can mimic the
brain".
Deep learning is implemented with the help of neural networks, and the motivation
behind neural networks is the biological neuron, which is nothing but a brain cell.
In the example given above, the raw image data is provided to the first layer, the
input layer. This input layer then determines patterns of local contrast, that is, it
differentiates on the basis of colours, luminosity, and so on. The first hidden layer then
determines facial features, i.e., it fixates on the eyes, nose, and lips, and it matches those
facial features to the correct face template. The second hidden layer then actually
determines the correct face, as can be seen in the above image, after which it is sent to the
output layer. Likewise, more hidden layers can be added to solve more complex problems,
for example, finding a particular kind of face with a dark or light complexion. As the
number of hidden layers increases, we are able to solve more complex problems.
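As a small illustration of this layered idea (not the project's actual network), the Keras sketch below stacks an input layer, two hidden layers and an output layer; the input size of 48*48 flattened pixels and the layer widths are assumptions.

from keras.models import Sequential
from keras.layers import Dense

# Input layer -> two hidden layers -> output layer, mirroring the face example above.
model = Sequential([
    Dense(128, activation="relu", input_shape=(48 * 48,)),  # 1st hidden layer: local patterns
    Dense(64, activation="relu"),                           # 2nd hidden layer: face-level features
    Dense(7, activation="softmax"),                         # output layer: 7 emotion classes
])
model.summary()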
ARCHITECTURES
A deep belief network (DBN) is a class of deep neural network that comprises multiple
layers of belief networks.
Next, the formerly trained features are treated as visible units, which perform
learning of features.
Lastly, when the learning of the final hidden layer is accomplished, the
whole DBN is trained.
forward network. To minimize the prediction error, the backpropagation algorithm can
be used to update the weight values.
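To make the weight-update idea concrete, the following NumPy sketch performs one gradient-descent step for a single linear neuron with a squared-error loss; it is a toy illustration with assumed values, not part of the project code.

import numpy as np

x = np.array([0.5, 0.2, 0.1])   # input features
w = np.array([0.1, 0.4, 0.3])   # current weights
t = 1.0                          # target output
lr = 0.1                         # learning rate

y = np.dot(w, x)                 # forward pass
error = y - t                    # prediction error
grad = error * x                 # gradient of 0.5*(y - t)^2 with respect to w
w = w - lr * grad                # backpropagation-style weight update
print("updated weights:", w)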
4. RESTRICTED BOLTZMANN MACHINE
RBMs are yet another variant of Boltzmann machines. Here the neurons present
in the input layer and the hidden layer have symmetric connections between them;
however, there are no internal connections within a layer. In contrast to RBMs,
Boltzmann machines do have internal connections inside the hidden layer. These
restrictions in RBMs help the model train efficiently.
5. AUTOENCODERS
Self-Driving cars:
Voice Controlled Assistance:
When we talk about voice-controlled assistance, Siri is the first thing that comes
to mind. You can tell Siri whatever you want it to do, and it will search for it and display
the results for you.
Whatever image you upload, the algorithm will generate a caption accordingly. For
example, for a blue-coloured eye, it will display the image of a blue-coloured eye with a
caption at the bottom.
With the help of deep learning, automatic machine translation can convert text from one
language into another.
CHAPTER 4
SYSTEM METHODOLOGY
The user can access the system through an Android or web application. The user's
face is then detected, and after the face is captured, it is pre-processed and the extracted
features are stored in the image database. These features are then sent to the trained neural
network, which uses them to detect the emotion and obtain the results. Based on these
results, the system provides relevant recommendations to the user: the user will find some
recommended tasks or videos on the screen, according to the resultant mood, in order to
improve their mood.
a) Input Image
d) Recommendation kernel
e) Result
First of all, when the user enters the application, it detects the user's face. This
image is then divided into different sections of the face, such as the forehead, eyebrows,
lower eye, right cheek and left cheek. After all the pre-processing is done, a convolutional
neural network is trained on the given dataset, and with every epoch the accuracy
increases. The trained network then detects the emotion from the user's image and
accordingly suggests tasks to change the mood of a sad or depressed person.
4.2 ALGORITHM
algorithms. While in primitive methods the filters are hand-engineered, with enough
training CNNs have the ability to learn these filters/characteristics.
A CNN takes an image as input, which is processed and classified under a certain
category such as dog, cat, lion or tiger. The computer sees an image as an array of pixels,
depending on the resolution of the image. Based on the image resolution, it sees h * w * d,
where h = height, w = width and d = dimension (the number of channels). For example, an
RGB image is a 6 * 6 * 3 array, whereas a grayscale image is a 4 * 4 * 1 array.
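A quick way to see the h * w * d representation is to inspect image arrays directly; the sketch below builds dummy arrays with the shapes mentioned above (the pixel values are arbitrary placeholders).

import numpy as np

rgb_image = np.zeros((6, 6, 3), dtype=np.uint8)   # h=6, w=6, d=3 colour channels
gray_image = np.zeros((4, 4, 1), dtype=np.uint8)  # h=4, w=4, d=1 channel

print(rgb_image.shape)   # (6, 6, 3)
print(gray_image.shape)  # (4, 4, 1)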
In a CNN, each input image passes through a sequence of convolution layers with
filters (also known as kernels), pooling layers, and fully connected layers. After that, the
softmax function is applied to classify an object with probabilistic values between 0 and 1.
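A minimal Keras sketch of such a stack, with convolution layers, pooling, a fully connected layer and a softmax output, is given below. The 48*48*1 input and the exact layer sizes are assumptions for illustration; the project's actual architecture may differ.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(48, 48, 1)),  # convolution with 3*3 kernels
    MaxPooling2D(pool_size=(2, 2)),                                  # pooling
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(128, activation="relu"),                                   # fully connected layer
    Dense(7, activation="softmax"),                                  # 7 emotion classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])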
CONVOLUTIONAL LAYER
The convolution layer is the first layer used to extract features from an input image.
By learning image features using small squares of input data, the convolutional layer
preserves the relationship between pixels. It is a mathematical operation that takes two
inputs: an image matrix and a kernel (filter).
Figure 4.2.3: Kernel Process
Consider a 5*5 image whose pixel values are 0 or 1, and a 3*3 filter matrix.
The convolution of the 5*5 image matrix with the 3*3 filter matrix is
called the "feature map" and is shown as the output.
Convolution of an image with different filters can perform operations such as blurring,
sharpening, and edge detection.
Strides:
The stride is the number of pixels by which the filter is shifted over the input matrix.
When the stride is equal to 1, the filter moves 1 pixel at a time; similarly, when the stride
is equal to 2, the filter moves 2 pixels at a time. The following figure shows how the
convolution works with a stride of 2.
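The effect of the stride (and of padding) on the output size can be checked with the standard formula output = floor((n - k + 2p) / s) + 1, where n is the input size, k the filter size, p the padding and s the stride. A small sketch with assumed values:

def conv_output_size(n, k, s, p):
    # Output size of a convolution: floor((n - k + 2p) / s) + 1
    return (n - k + 2 * p) // s + 1

print(conv_output_size(7, 3, 1, 0))  # stride 1, no padding -> 5
print(conv_output_size(7, 3, 2, 0))  # stride 2, no padding -> 3
print(conv_output_size(7, 3, 1, 1))  # stride 1, padding 1  -> 7 (size preserved)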
Shrinking outputs
Max pooling:
Average pooling:
Average pooling performs down-scaling by dividing the input into rectangular pooling
regions and computing the average value of each region.
Sum Pooling:
The sub-regions for sum pooling or mean pooling are set exactly the same as for
max pooling, but instead of using the max function we use the sum or the mean.
The fully connected layer is a layer in which the input from the other layers is
flattened into a vector and fed forward. It transforms the output into the desired number
of classes for the network.
In the above diagram, the feature map matrix is converted into a vector
(x1, x2, x3, ..., xn) with the help of the fully connected layers. These features are combined
to create a model, and an activation function such as softmax or sigmoid is applied to
classify the outputs as car, dog, truck, etc.
4.3 SYSTEM DESIGN
Facial recognition requires several phases: detection of face images, pre-processing
of face images, retrieval of facial features, alignment of face images, and identification of
face images. There are primarily two types of feature extraction: one is geometric
attribute extraction, and the other is a procedure focused on overall statistical
characteristics. The geometric feature-based approach is widely used to describe the
locations of the facial organs as the features for classification.
4.3.3 UML Diagrams
Diagrams in the Unified Modelling Language (UML) based on use cases are
known as use case diagrams. Their primary goal is to graphically depict the system's
actors, their goals (expressed as use cases), and any interdependencies between those use
cases. Using a use case diagram, you can show which actors are responsible for which
system functions.
Sequence Diagram
Diagrams that show how processes interact with one another, and in what order,
are known as sequence diagrams in the Unified Modelling Language (UML). A sequence
diagram is based on a Message Sequence Chart. Sequence diagrams are also known as
event diagrams, scenario diagrams, and timing diagrams.
CHAPTER 5
EXPERIMENTATION AND ANALYSIS
5.1 EXPERIMENTATION
The human face is captured using the PC's built-in webcam or an external webcam.
From the live stream the face is extracted, and all other unwanted components are
discarded. To achieve this efficiently and comprehensively we have chosen a CNN to
identify and extract the faces. For this we have utilized the OpenCV library
(specifically, its cascade classifier).
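A minimal OpenCV sketch of this capture-and-extract step is given below; it is an assumption for illustration rather than the project's exact script. It reads frames from the default webcam, runs the Haar cascade classifier, and keeps only the face region.

import cv2

face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)                       # PC webcam (use 1 for an external camera)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        face_region = gray[y:y + h, x:x + w]    # keep the face, ignore everything else
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
    cv2.imshow("Face extraction", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):       # press q to quit
        break

cap.release()
cv2.destroyAllWindows()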
Pre-processing:
Pre-processing is a common name for operations on images at the lowest level of
abstraction, where both input and output are intensity images. The aim of pre-processing
is to improve the image data by suppressing unwanted distortions and enhancing image
features that are important for further processing.
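As an illustration (with an assumed target size and assumed steps), pre-processing of a detected face region might look like this: conversion to grayscale, histogram equalization to reduce lighting distortion, resizing, and scaling of intensities.

import cv2
import numpy as np

def preprocess(face_bgr, size=(48, 48)):
    """Prepare a cropped face for the network: grayscale, equalize, resize, normalize."""
    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY)
    equalized = cv2.equalizeHist(gray)              # suppress lighting distortion
    resized = cv2.resize(equalized, size)
    return resized.astype(np.float32) / 255.0       # intensities scaled to [0, 1]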
Region Splitting:
For emotion recognition, the main regions of the face under consideration are the
eyebrows and the mouth. Splitting the face into the mouth and eyebrow regions is called
region splitting.
Emotion Classification:
After the sub-task of feature extraction is completed, the person's reaction is
produced together with its percentage level.
Happy
Angry
Surprise
Sad
Disgust
Fear
Neutral
Happy:
Happiness normally includes a smile, with both corners of the mouth rising, the eyes
squinting and wrinkles appearing at the corners of the eyes. The underlying practical role
of the smile, which expresses joy, remains a puzzle. Some scientists believe that the smile
was initially a sign of fear: monkeys and primates bared clenched teeth to show predators
that they are harmless. A smile prompts the brain to release endorphins that help reduce
pain and produce a feeling of well-being. The positive feelings that a smile can create can
help in managing fear. A smile can also create positive feelings in someone who witnesses
it, and may even prompt them to smile as well.
Angry:
Surprise:
Surprise includes widened eyes and sometimes an open mouth. The ability to open the
eyes so wide is assumed to help widen the visual field, although studies show that it does
not really do so, and to enable the rapid eye movement that helps in spotting threats.
Opening the mouth enables one to breathe quietly and thereby avoid being noticed by an
enemy. The eyebrows are raised, creating wrinkles on the forehead; the eyes are opened to
the maximum, with the upper eyelids raised as high as possible; the lips are stretched
horizontally towards the ears; and the jaw is pulled slightly backwards, as is evident from
the horizontal wrinkles on the neck.
Sad:
Sadness includes a slight pulling down of the lip corners, while the inner sides of the
eyebrows rise. Darwin explained this expression as suppressing the urge to cry. The
control over the upper lip is greater than the control over the lower lip, and thus the lower
lip drops. When a person cries out while weeping, the eyes are closed so as to shield them
from the blood pressure that builds up in the face. Therefore, when we have the urge to
cry and want to stop it, the eyebrows rise to keep the eyes from closing.
Disgust:
Disgust includes a wrinkled nose and mouth, and sometimes even the tongue coming
out. This expression imitates a person who has tasted bad food and wants to spit it out, or
who smells a foul odour. In an obvious, extreme disgust expression, the eyebrows are
lowered, forming a 'V' over the nose and producing wrinkles on the forehead; the eyes are
narrowed to shut out the source of the disturbance; the chin is slightly pulled backwards
and a circular wrinkle appears.
Fear:
Fear includes widened eyes and sometimes an open mouth. The ability to open the
eyes so wide is assumed to help widen the visual field, although studies show that it does
not really do so, and to enable the rapid eye movement that helps in spotting threats.
Opening the mouth enables one to breathe quietly and thereby avoid being noticed by an
enemy. The eyebrows are raised, creating wrinkles on the forehead; the eyes are opened to
the maximum, with the upper eyelids raised as high as possible; the lips are stretched
horizontally towards the ears; and the jaw is pulled slightly backwards, as is evident from
the horizontal wrinkles on the neck.
Neutral:
Neutral does not correspond to any of the reactions such as happy, surprised, sad,
disgusted or angry. This expression is a simple one, where the lips and eyes are in their
normal position, indicating that the user is not showing any reaction. Neutral is the default
emotion, and every change of expression starts from it.
WORKING
OpenCV is a commonly used library for facial extraction. OpenCV utilizes machine
learning algorithms to search for faces within an image. Since faces are so complicated,
there is not one simple test that can tell whether a face has been found or not. Instead,
there are thousands of small patterns and features that must be matched. The algorithms
break the task of recognizing the face into thousands of smaller, bite-sized tasks, each of
which is easy to solve. These tasks are also called classifiers.
For something like a face, there may be 7,000 or more classifiers, all of which
must match for a face to be detected (within error limits, of course). But therein lies the
problem: for face detection, the algorithm starts at the top left of an image and moves down
across small blocks of data, looking at each block. Like a series of waterfalls, the OpenCV
cascade breaks the problem of detecting faces into multiple stages. For each block, it
performs a very rough and quick test; if that passes, it performs a slightly more detailed
test, and so on.
The algorithm may have 20 to 40 of these cascade stages, and it will only detect a
face if all stages are cleared. The advantage is that most of the image will return a negative
during the first few stages, which means the algorithm will not waste time testing all 6,000
features on it. Instead of taking hours, face detection can now be done in real time. Because
face detection is such a common task, OpenCV comes with a number of built-in cascades
for detecting everything from faces to eyes, hands and legs.
expressions is an essential step towards successful relations. Expressions and emotions
go hand in hand: specific combinations of facial muscle actions reflect a particular
emotion.
For some expressions we cannot really tell, just by looking at them, what the
opposite expression is; but when we consider all the muscles involved at the same time, it
becomes very clear. An interesting explanation of the opposing expression's functional
source is based on inhibition. When a person or an animal is trying to prevent itself from
performing a certain action, one way is to use the antagonist muscles. Indeed, when a
stimulation signal is sent to a muscle, an inhibitory signal is sent automatically to the
opposing muscle. Facial expressions that can be explained through such opposites relate
to hostility and to keeping one's distance.
CNN Process:
Figure 5.1.8 shows how the system recognizes objects in pictures. We use a particular
kind of artificial neural network, a Convolutional Neural Network (CNN). The name
comes from the most important operation in the network, called convolution. Simple cells
activate, for instance, when they detect basic fixed shapes such as lines at a particular
orientation and position. Complex cells have larger receptive fields and their output is not
sensitive to the exact position within the field; they keep responding to a particular
stimulus even though its absolute position relative to the eyes changes. Complex here
means more flexible. In vision, the receptive field of a single sensory neuron is the region
of the retina in which a stimulus will influence the firing of that neuron (that is, will
activate the neuron). Each sensory neuron cell has similar receptive fields, and their fields
overlap.
5.2 RESULTS
Screenshots:
Figure 5.2.2: Expression of Angry
from keras.models import Sequential
from keras.layers import Dense
import time
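The remaining training code is not reproduced in this report; the sketch below shows one conventional way such a script could continue, assuming the FER-2013 images are arranged in data/train and data/test folders by emotion (the paths, batch size and epoch count are assumptions, and the model-fitting lines are indicated as comments).

from keras.preprocessing.image import ImageDataGenerator

# Assumed layout: data/train/<emotion>/*.png and data/test/<emotion>/*.png (FER-2013 style).
datagen = ImageDataGenerator(rescale=1.0 / 255)
train_gen = datagen.flow_from_directory("data/train", target_size=(48, 48),
                                        color_mode="grayscale", batch_size=64,
                                        class_mode="categorical")
test_gen = datagen.flow_from_directory("data/test", target_size=(48, 48),
                                       color_mode="grayscale", batch_size=64,
                                       class_mode="categorical")

start = time.time()
# `model` would be the compiled CNN (for example, as sketched in Section 4.2); training could be:
# model.fit(train_gen, epochs=30, validation_data=test_gen)
# model.save("fer_cnn.h5")
print("elapsed:", round(time.time() - start, 2), "seconds")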
5.3 TESTING
Errors are discovered during testing. The goal of testing is to find any and all flaws
in a product or service. Components and subassemblies, as well as finished products, can
be tested to ensure their functionality. Testing is the process of exercising software to make
sure that the software system does not break in an unacceptable way and meets all of its
needs and expectations. There is a wide variety of test types, each designed to meet a
certain need.
Unit Testing:
When a program's logic is tested, it can be assured that the program's inputs
and outputs are valid. It is essential to test each step of the decision-making process
as well as the internal code flow. Testing is done on the application piece by piece: after
each unit is completed, it is integrated into the whole. This invasive structural test requires
familiarity with the design of the system beforehand. This type of test is used to evaluate
a specific business process, application, or system configuration at the component level.
Unit tests are one method of verifying that a business process adheres to its documented
criteria.
Integration Testing:
An integration test's job is to make sure that all of the software components that
have been combined work together as a single unit. Integration testing is less concerned
with individual screen or field outcomes; even if each component has passed unit testing,
it verifies that the combination is still correct and consistent. Exercising the interfaces
between two or more pieces of software is the primary goal of integration testing.
Functional Testing:
Functional tests are organized and prepared in accordance with the requirements,
key functionalities, or specific test cases they are designed to evaluate. The system must
also be tested for its capacity to cover all of the required data fields, pre-programmed
processes, and subsequent operations. Before functional testing is finished, additional tests
are identified and the value of the existing ones is assessed.
System Testing:
It is only through system testing that you can be certain that the integrated software
system meets the required specifications. A collection of tests is run on a particular
configuration to ensure that the results are in line with what was predicted. The
configuration-oriented system integration test is one system testing method. Pre-driven
process links and integration points are the primary focus of system testing.
White Box Testing:
White box testing is a type of software testing in which the tester has knowledge of
the inner workings, structure, or purpose of the software. It serves a purpose: it is used to
test areas that cannot be reached from the black-box level.
Black Box Testing:
Black box testing is performed without any knowledge of the inner workings of the
module being tested. To write a black box test, a definitive source document, such as a
specification or requirements document, must be in place first. During this testing method,
the program under scrutiny is treated as a black box: one cannot "see" into it. Only inputs
and outputs are tested; the internal structure of the software is not taken into account.
A unit test typically comprises three stages: planning, cases and scripting, and the
unit test itself. In the first stage, the unit test is planned and reviewed. In the next stage,
the test cases and scripts are made, and then the code is tested.
Test-driven development requires that developers first write failing unit tests.
Then they write code and refactor the application until the test passes. TDD typically
results in an explicit and predictable code base.
Unit testing involves only those characteristics that are vital to the performance
of the unit under test. This encourages developers to modify the source code without
immediate concern about how such changes might affect the functioning of other units
or the program as a whole. Once all of the units in a program have been found to be
working in the most efficient and error-free manner possible, larger components of the
program can be evaluated by means of integration testing. Unit tests should be performed
frequently and can be done manually or automated. Teams employing a manual method
may have an instructional document detailing each step in the process; however, automated
testing is the more common approach to unit tests. Automated approaches commonly use
a testing framework to develop test cases. These frameworks are also set up to flag and
report any failed test cases while providing a summary of test cases.
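As a sketch of such an automated approach (using the pytest framework, which is not named in this report, and a hypothetical helper function and sample file names), a unit test for the emotion classifier could assert that a labelled sample image produces the expected prediction:

import pytest

# `predict_emotion` is a hypothetical helper that loads the trained model,
# pre-processes an image file and returns the most probable emotion label.
from emotion_detector import predict_emotion

@pytest.mark.parametrize("image_path, expected", [
    ("tests/samples/happy.png", "Happy"),
    ("tests/samples/angry.png", "Angry"),
    ("tests/samples/disgust.png", "Disgust"),
])
def test_predicted_label_matches_expected(image_path, expected):
    assert predict_emotion(image_path) == expected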
Both manual field testing and detailed functional testing are planned.
Indicators of success:
Aspects to be tested: it is important to make sure that all of the data is entered
correctly.
Integrity verification:
There were no failures in any of the tests listed above; no issues were found.
Acceptance Tests:
All of the test cases listed above passed with flying colours. No issues were
found.
5.3.2 Test Cases
5 T5 Disgust Disgust Pass
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
In this project, the expressions of faces are effectively identified by processing
a dataset consisting of various facial expressions, which is coded in Python for
classification. Our proposed architecture recognizes the emotion of a human face
dynamically. Here, the main parameters considered are the positions of the eyes and the
mouth, and the emotion is recognized according to changes in those positions. In addition,
the system displays the percentage of every reaction of a person dynamically. From the
data, every single image is processed in such a way that the network takes a portion of the
image, keeps cropping it, and tries to get the average or maximum information out of it,
which is termed pooling.
6.2 FUTURE SCOPE
REFERENCES
[1] R. G. Harper, A. N. Wiens, and J. D. Matarazzo, Nonverbal communication: the state
of the art. New York: Wiley, 1978.
[2] P. Ekman and W. V. Friesen, "Constants across cultures in the face and emotion,"
Journal of Personality and Social Psychology, vol. 17, no. 2, pp. 124-129, 1971.
[3] C. Tang, P. Xu, Z. Luo, G. Zhao, and T. Zou, "Automatic Facial Expression Analysis
of Students in Teaching Environments," in Biometric Recognition, vol. 9428,
J. Yang, J. Yang, Z. Sun, S. Shan, W. Zheng, and J. Feng, Eds. Cham: Springer
International Publishing, 2015, pp. 439-447.
[5] J. Whitehill, Z. Serpell, Y.-C. Lin, A. Foster, and J. R. Movellan, "The Faces of
Engagement: Automatic Recognition of Student Engagement from Facial Expressions,"
IEEE Transactions on Affective Computing, vol. 5, no. 1, pp. 86-98, Jan. 2014.
[7] Krithika L.B and Lakshmi Priya GG, "Student Emotion Recognition System (SERS)
for e-learning Improvement Based on Learner Concentration Metric," Procedia Computer
Science, vol. 85, pp. 767-776, 2016.
[9] Y. Kim, T. Soyata, and R. F. Behnagh, "Towards Emotionally Aware AI Smart
Classroom: Current Issues and Directions for Engineering and Education," IEEE Access,
vol. 6, pp. 5308-5331, 2018.
[11] C.-K. Chiou and J. C. R. Tseng, "An intelligent classroom management system
based on wireless sensor networks," in 2015 8th International Conference on Ubi-Media
Computing (UMEDIA), Colombo, Sri Lanka, 2015, pp. 44-48.
[13] A. Fathallah, L. Abdi, and A. Douik, "Facial Expression Recognition via Deep
Learning," in 2017 IEEE/ACS 14th International Conference on Computer Systems and
Applications (AICCSA), Hammamet, 2017, pp. 745-750.
[14] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple
features," in Proceedings of the 2001 IEEE Computer Society Conference on Computer
Vision and Pattern Recognition (CVPR 2001), Kauai, HI, USA, 2001, vol. 1,
pp. I-511-I-518.
[16] OpenCV, opencv.org.
[17] Keras, keras.io.
[18] TensorFlow, tensorflow.org.
[19] aionlinecourse.com/tutorial/machine-learning/convolution-neural-network.
Accessed 20 June 2019.
[20] S. Albawi, T. A. Mohammed, and S. Al-Zawi, "Understanding of a convolutional
neural network," in 2017 International Conference on Engineering and Technology
(ICET), Antalya, 2017, pp. 1-6.