
A

MINI PROJECT REPORT


ON
FACE EMOTION RECOGNITION USING CONVOLUTIONAL
NEURAL NETWORKS AND MACHINE LEARNING.
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING
(AI&ML)
Submitted by
(BATCH: CSM-24)
SOMA SAIKIRAN: 207Y1A6643
VIDAGOTTI ROHIN: 207Y1A6635
Under the guidance
of
Ms. UDAY SREE
Associate Professor

DEPARTMENT OF CSM
MARRI LAXMAN REDDY
INSTITUTE OF TECHNOLOGY AND MANAGEMENT
(AUTONOMOUS)
(Affiliated to JNTUH, Approved by AICTE New Delhi and Accredited by NBA & NAAC with 'A' Grade)

(JULY 2023)
CERTIFICATE

This is to certify that the project titled "Face Emotion Identification Using
Convolutional Neural Networks and Machine Learning" is being submitted by
VIDAGOTTI ROHIN (207Y1A6635) and SOMA SAIKIRAN (207Y1A6643)
in IV B.Tech I Semester in Computer Science and Engineering (CSM), and is a record
of bonafide work carried out by them. The results embodied in this report have not been
submitted to any other university for the award of any degree.

Internal Guide HOD

Principal External Examiner


DECLARATION

I hereby declare that the Mini Project Report entitled "Face Emotion
Identification Using Convolutional Neural Networks and Machine Learning",
submitted for the B.Tech degree, is entirely my own work, carried out with the help of my
team member, and that all ideas and references have been duly acknowledged. It does not
contain any work submitted for the award of any other degree.

DATE:
VIDAGOTTI ROHIN
(207Y1A6635)

SOMA SAIKIRAN
(207Y1A6643)
ACKNOWLEDGEMENT

I am happy to express my deep sense of gratitude to the Principal of the college, Dr. K.
Venkateswara Reddy, Professor, Department of Computer Science and Engineering,
Marri Laxman Reddy Institute of Technology & Management, for having provided me
with adequate facilities to pursue my project.

I would like to thank B. Ravi Prasad, Assoc. Professor and Head of the Department of
CSM, Marri Laxman Reddy Institute of Technology & Management, for having
provided the freedom to use all the facilities available in the department, especially the
laboratories and the library.

I am very grateful to my project guide Ms. UDAY SREE, Associate Prof., Department of
Computer Science and Engineering, Marri Laxman Reddy Institute of Technology &
Management, for her extensive patience and guidance throughout my project work.

I sincerely thank my seniors and all the teaching and non-teaching staff of the
Department of Computer Science for their timely suggestions, healthy criticism and
motivation during the course of this work.

I would also like to thank my classmates for always being there whenever I needed
help or moral support. With great respect and obedience, I thank my parents and
brother, who were the backbone behind my deeds.

Finally, I express my immense gratitude to the other individuals who have either directly
or indirectly contributed, at the right time, to the development and success of this work.
ABSTRACT

Emotion recognition systems play an important role in many fields, particularly image
processing, medical science and machine learning. Driven by human needs, the impact and
potential uses of automatic emotion recognition have been growing across a wide range of
applications, including human-computer interaction, robot control and driver state
monitoring. To date, however, robust recognition of facial expressions from images and
videos is still a challenging task because of the difficulty of accurately extracting useful
emotional features. These features are commonly represented in different forms, such as
static, dynamic, point-based geometric or region-based appearance features. Facial
movement features, which include feature position and shape changes, are generally caused
by the movements of facial elements and muscles during an emotion. Emotion recognition
has many applications and plays a vital part in fault detection and in gaming applications.
In this project the emotion recognition is dynamic, rather than uploading an image and
finding the emotion, and this is achieved with the help of a machine learning concept called
the Convolutional Neural Network (CNN), one of the most familiar deep learning
concepts. The main motive for using this concept is to maintain accuracy. The CNN
consists of many intermediate stages which play an important role in producing accurate
output. The layers of a CNN are the input layer, hidden layers and output layer. The
hidden layers are used to update the weights, biases and activation functions. Using the
CNN methodology, the parts of the image that are unnecessary for emotion recognition
are eliminated accurately, which reduces the elimination task to a few simple steps.
TABLE OF CONTENTS
TOPIC NAME PAGE NO
LIST OF TABLES i
LIST OF FIGURES ii

LIST OF ABBREVIATIONS iii

1 Introduction 1
1.1 Introduction 1
1.2 Existing System 5
1.3 Problem Statement 5
1.4 Proposed System 6
1.4.1 Proposed System 6
1.4.2 Objective 7
2 Literature Review 8
3 Requirements and Domain Information 18
3.1 Requirement Specification 18
3.1.1 Hardware Requirements 18
3.1.2 Software Requirements 18
3.2 Domain Information 18
4 System Methodology 25
4.1 Architecture of Proposed System 25
4.2 Algorithm 26
4.3 System Design 33
4.3.1 Data Flow Diagrams 33
4.3.2 Class Diagram 34
4.3.3 UML Diagram 35
5 Experimentation and Analysis 37
5.1 Experimentation 37
5.2 Results 48
5.3 Testing 57
5.3.1 Types of Testing 57
5.3.2 Test Cases 62
6 Conclusion and Future Scope 64
6.1 Conclusion 64
6.2 Future Scope 65

References 66
PAPER PUBLICATION 69
LIST OF TABLES

TABLE NO. DESCRIPTION PAGE NO.


1.1 Definition of 64 primary and secondary landmarks 4
5.3.2 Test Cases 62

i
LIST OF FIGURES
FIGURE NO. DESCRIPTION PAGE NO.
1.1.1 FER Procedure for an Image 3
1.1.2 Facial Landmarks to be Extracted from a Face 4
3.2.1 Example of Deep Learning 20
4.1.1 Architecture 25
4.1.2 System Design 25
4.2.1 Example of CNN 27
4.2.2 CNN Architecture 28
4.2.3 Kernel Process 29
4.2.4 Strides Process 30
4.2.5 Padding Process 30
4.2.6 Fully Connected layer 32
4.3.1 Flow Chart 33
4.3.2 Class Diagram 34
4.3.3 User Module 35
4.3.4 Software Module 35
4.3.5 Sequence Diagram 36
5.1.1 Example of Happy Face 38
5.1.2 Example of Angry Face 39
5.1.3 Example of Surprise Face 40
5.1.4 Example of Sad Face 41
5.1.5 Example of Disgust Face 42
5.1.6 Example of Fear Face 43
5.1.7 Example of Neutral Face 44
5.1.8 Image Identification and Classification 47
5.2.1 Screenshot of Execution 48
5.2.2 Screenshot of Angry 49
5.2.3 Screenshot of Fearful 49
5.2.4 Expression of Surprised 50

ii
LIST OF ABBREVIATIONS

ABBREVIATION DESCRIPTION

CNN Convolutional Neural Network


FER Facial Emotions Recognition
KNN K Nearest Neighbor
SVM Support Vector Machine
API Application Programming Interface
XML Extensible Markup Language
HTTP Hypertext Transfer Protocol
MLA Multimodal Learning Analytics
UML Unified Modelling Language

iii
CHAPTER 1
INTRODUCTION

1.1 INTRODUCTION

Even without saying anything, a person can convey a vast range of
emotions. Facial expressions convey a person's thoughts, feelings and actions, and
facial expression recognition software can detect these expressions in a photograph
of a person's face. In the 1970s, the American psychologists Ekman and Friesen
created a common set of six globally shared sentiments termed the "basic emotions"
(angry, afraid, disgusted, sad, surprised, happy). Facial expression detection has
gained a lot of attention recently because of its impact on clinical practice, friendly
robots and education. According to a number of research studies, emotions have a substantial
impact on education. Although teachers already receive feedback
from exams and questionnaires, these methods aren't necessarily the most effective.
Teachers can use the facial expressions of their pupils to adapt their teaching strategies
and resources. This study uses Convolutional Neural Networks (CNNs), a deep
learning method widely used in image classification, to identify students' emotions
through facial expression analysis. A multistage image processing method is used to
extract feature representations. Each of the seven emotions can be recognized in a
three-step process that begins with face detection and ends with recognition.

Many academics are intrigued by this newfound prominence of Facial Expression
Recognition (FER) and seek to use it to improve instruction in the classroom. According
to Tang et al., students' facial expressions can be used to gauge the success of classroom
teaching. Their system covers facial recognition, face detection and facial expression
detection, and employs ULGBPHS and KNN to sort and categorize the data. Savva et al.
reported an investigation of students' emotional states during active classroom
instruction via a web application, in which live footage from webcams installed in
classrooms was analysed using machine learning algorithms.

1
Students' emotional states can be identified and monitored in real time by an e-learning
system, and the authors came up with a proposal for how to do this. Eye and head
movement data can be used to infer a student's emotional state in an online learning
environment. A Facial Emotion Recognition System (FERS) was created by Ayvaz et al.
to recognize the emotional states and motivations of students in videoconference-style
e-learning. KNN and SVM were shown to be the most accurate machine learning
algorithms, followed by Random Forest and Classification & Regression Trees.

To improve the quality and memorability of their lectures, Kim and co-workers devised
a system that provides real-time recommendations to instructors so they can adjust their
nonverbal behaviour, such as body language and facial expressions, in real time. The
Haar Cascade technique was used to detect facial expressions in a virtual learning
environment based on facial emotion detection using the JAFFE database. Chiou et al.
used wireless sensor network technology to build an intelligent classroom management
system that enables teachers to swiftly switch instruction modes to prevent wasting time.

Facial emotions are important factors in human communication that help to


understand the intentions of others. In general, people infer the emotional state of
other people, such as joy, sadness and anger, using facial expressions and vocal tones.
Facial expressions are one of the main information channels in interpersonal
communication. Therefore, it is natural that facial emotion research has gained a lot
of attention over the past decade with applications in perceptual and cognitive
sciences. Interest in automatic Facial Emotion Recognition (FER) has also been
increasing recently with the rapid development of Artificial Intelligence (AI)
techniques. They are now used in many applications and their exposure to humans
is increasing. To improve Human-Computer Interaction (HCI) and make it more
natural, machines must be provided with the capability to understand the surrounding
environment, especially the intentions of humans. Machines can capture their
environment state through cameras and sensors. In recent years, Deep Learning (DL)
algorithms have proven to be very successful in capturing environment states. Emotion
detection is necessary for machines to better serve their purpose, since it delivers
information about the inner state of humans. A machine can use a sequence of facial
images with DL techniques to determine human emotions.

Artificial Intelligence (AI) and Machine Learning (ML) are widely employed
in many domains. In data mining, they have been used to detect insurance fraud, and
clustering-based data mining has been used to identify patterns in stock market data. ML
algorithms have played a significant role in pattern recognition and pattern
classification problems such as FER, Electroencephalography (EEG) analysis and spam
detection. ML can be used to provide cost-effective, reliable and low-computation-time
FER solutions.

FER typically has four steps. The first is to detect a face in an image and draw
a rectangle around it, and the next step is to detect landmarks in this face region. The
third step is extracting spatial and temporal features from the facial components. The
final step is to use a Feature Extraction (FE) classifier to produce the recognition
results from the extracted features. Figure 1.1 shows the FER procedure for an input
image where a face region and facial landmarks are detected. Facial landmarks are
visually salient points such as the end of the nose and the ends of the eyebrows and the
mouth, as shown in Figure 1.2. The pairwise positions of two landmark points or the
local texture of a landmark are used as features. Table 1.1 gives the definitions of the 64
primary and secondary landmarks. The spatial and temporal features are extracted
from the face and the expression is determined as one of the facial categories
using pattern classifiers. A minimal sketch of the first step is shown below.
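As a minimal sketch of the first step (face detection with a bounding rectangle), the snippet below uses OpenCV's bundled Haar cascade; the image path is a placeholder, and the later steps (landmark detection, feature extraction, classification) are only indicated in comments.

# Minimal sketch of FER step 1: detect a face and draw a rectangle around it.
# Assumes OpenCV is installed; "input.jpg" is a placeholder path.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

image = cv2.imread("input.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Step 1: detect face regions in the image.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)

# Draw a rectangle around each detected face; the later steps (landmark
# detection, feature extraction, classification) operate on these regions.
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("faces_marked.jpg", image)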

Figure 1.1.1: FER procedure for an image


3
Figure 1.1.2: Facial landmarks to be extracted from a face
Primary landmarks                        Secondary landmarks
Number   Definition                      Number         Definition
16       Left eyebrow outer corner       1              Left temple
19       Left eyebrow inner corner       8              Chin tip
22       Right eyebrow inner corner      2-7, 9-14      Cheek contours
25       Right eyebrow outer corner      15             Right temple
28       Left eye outer corner           16-19          Left eyebrow contours
30       Left eye inner corner           22-25          Right eyebrow corners
32       Right eye inner corner          29, 33         Upper eyelid centers
34       Right eye outer corner          31, 35         Lower eyelid centers
41       Nose tip                        36, 37         Nose saddles
46       Left mouth corner               40, 42         Nose peaks (nostrils)
52       Right mouth corner              38-40, 42-45   Nose contours
63, 64   Eye centers                     47-51, 53-62   Mouth contours

Table 1.1: Definition of 64 primary and secondary landmarks


4
Deep Learning (DL) based FER approaches greatly reduce the dependence on
face-physics-based models and other pre-processing techniques by enabling end-to-end
learning directly from the input images. Among deep learning models, Convolutional
Neural Networks (CNNs) are the most popular. With a CNN, an input image is filtered
through convolutional layers to produce a feature map. This map is then input to fully
connected layers, and the facial expression is recognized as belonging to a class based
on the output of the FE classifier.

1.2 EXISTING SYSTEM

Students' emotional states can be identified and monitored in real time by an e-learning
system, and the authors came up with a proposal for how to do this. E-learning systems
use eye and head movement to infer important information about students' moods and
energy levels in the classroom. Students' emotional states and motivation can be tracked
using a Facial Emotion Recognition System (FERS) in videoconference-style e-learning,
created by Ayvaz and colleagues. There are a variety of machine learning approaches
available, but SVM and KNN have the highest accuracy rates, followed by Random
Forest and Classification & Regression Trees.

To improve the quality and memorability of lectures, Kim and co-workers have built a
system that provides real-time recommendations to instructors so they can adjust their
body language and facial expressions in real time. Facial emotion identification using
Haar Cascades was proposed by the authors to detect emotions in a virtual learning
environment utilizing data from the JAFFE database. An intelligent classroom
management system was developed by Chiou et al. using wireless sensor network
technology; it aids teachers in quickly switching instruction modes in order to save time.
However, people's emotions can never be predicted precisely.

1.3 PROBLEM STATEMENT

The analysis of sentiments through facial expressions has been an object of interest since
the time of Aristotle. The topic took off only after 1955, when a list of universal emotions
was established and several parametrized frameworks were proposed. Encouraged by
Deep Learning and Computer Vision, building automated recognition systems has received
a lot of attention within the Computer Science community. Regarding communication,
Mehrabian concluded in his study that in face-to-face communication, emotions are
transmitted to the extent of 55% through facial expressions. This implies that if the
computer could capture and interpret the emotions of the user, communication would be
more natural and appropriate, especially if we consider situations where a computer
would assume the role of a human.

1.4 PROPOSED SYSTEM

1.4.1 Proposed System

Detailed facial motions will be captured, and the appropriate emotion will be detected
using a deep learning algorithm such as a CNN. The system will determine the best
classifiers for recognizing particular emotions, where single and multi-layered networks
will be tested. Different resolutions of the images representing faces, as well as
images including the regions of the mouth and eyes, will be included. On the basis of the test
results, a cascade of neural networks will be proposed. The cascade will recognize six
basic emotions and the neutral expression. Depending upon the resulting emotion, the module
will suggest songs or suitable tasks to cheer up a person and enhance his/her mood.

As a solution to the above problem statement, we apply the concept of the
Convolutional Neural Network. Although many neural network algorithms rely on back
propagation alone, we choose the Convolutional Neural Network because of unique
features such as pooling. The actual input image is examined for features, and the
proposed system ignores background distractions and produces accurate output. An
activation map is produced for each filter that is applied; the largest rectangle is one patch
to be down-sampled, and the activation maps are consolidated by means of down-sampling.
Another group of activation maps is created by passing the filters over the stack that was
down-sampled first, and a second down-sampling gathers this second group of activation
maps. The system will display the emotion names with their percentage levels. For
example, if the user is happy, the result does not contain only happiness; there will be a
mixture of some other additional emotions.

For instance, the happiness may be shock mixed with happiness, or anger mixed with
happiness. According to this mixture, the percentages will be displayed: if shock is mixed
with happiness, the percentage of happy may be approximately 77% and the shock may be
approximately 60%. These values change dynamically when the emotion of the person
changes, as illustrated in the sketch below.
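One simple way to obtain such per-emotion percentage levels is to scale the network's soft-max output scores; the sketch below assumes a fixed emotion order matching the training labels, which is an illustrative choice rather than the exact implementation.

# Sketch: turning a model's per-class scores into percentage labels.
# The emotion order is an assumption; it must match the training labels.
import numpy as np

EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def to_percentages(scores):
    """Convert raw class scores to softmax probabilities in percent."""
    exp = np.exp(scores - np.max(scores))   # numerically stable softmax
    probs = exp / exp.sum()
    return {e: round(100 * p, 1) for e, p in zip(EMOTIONS, probs)}

# Example: a face that is mostly happy with some surprise mixed in.
print(to_percentages(np.array([0.1, 0.0, 0.2, 2.5, 0.1, 1.4, 0.3])))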

The Convolutional Neural Network (CNN) architecture we propose to use to
analyze students' facial expressions is described in this section. To begin, the algorithm
looks for faces in the input image and then crops and normalizes the found faces to a
48x48 size. The CNN then uses these facial photographs as input. The facial expression
recognition result is the final output (anger, happiness, sadness, disgust, fear, surprise
or neutral). A small sketch of this pre-processing step follows.
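A minimal sketch of the pre-processing described above is given below, assuming a face box from a detector and a trained Keras model with a 48x48x1 input; both names are placeholders.

# Sketch of the pre-processing described above: crop each detected face,
# resize it to 48x48 grayscale and normalize it before giving it to the CNN.
# "model" is assumed to be a trained Keras model with a 48x48x1 input.
import cv2
import numpy as np

def preprocess_face(gray_frame, box):
    x, y, w, h = box
    face = gray_frame[y:y + h, x:x + w]          # crop the detected face
    face = cv2.resize(face, (48, 48))            # normalize to 48x48 pixels
    face = face.astype("float32") / 255.0        # scale pixel values to [0, 1]
    return face.reshape(1, 48, 48, 1)            # add batch and channel axes

# Usage (assumes `faces` from the detector and a trained `model`):
# probs = model.predict(preprocess_face(gray, faces[0]))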

1.4.2 Objective

The main objective is to prepare a solution for the facial expression recognition
problem by separating it into sub-problems and dividing those sub-problems into sub-groups
of particular action units. This method focuses not only on two-class problems, which tell
whether an action unit is on or off, but also on multi-class problems that inform the user
about multiple action units occurring at the same time. For this purpose, we can use
distinct methodologies and methods for feature extraction, normalization, selection and
classification, while taking the computational complexity and timing constraints into
account. The project objective is to implement face recognition in a manner that is well
suited to a run-time implementation framework. Various algorithms and strategies are
considered for accomplishing this objective. This type of face recognition framework can
be broadly utilized in our day-to-day life in various sectors. We trust that human life can
be greatly assisted by this technology.

7
CHAPTER 2

LITERATURE REVIEW
Many academics are intrigued by Facial Expression Recognition (FER) and intend to put
it to good use in the classroom. Based on students' facial expressions, Tang et al.
claim that classroom teaching efficacy can be assessed. Their system includes data
collection, face detection, face recognition, facial expression recognition and post-
processing, and employs ULGBPHS and KNN to sort and categorize data. Savva et al.
used a web application to examine the emotional states of students who were taking part
in active classroom instruction; machine learning techniques were employed to examine
camera footage from schools. According to Whitehill et al., students' facial expressions
can be used to gauge their level of engagement in class. Using Gabor features and the
SVM algorithm, students' involvement with cognitive skill training software may be
tracked. The videos were annotated by human judges, who provided the authors with labels.

A computer vision and machine learning technique was then used to determine
the emotional state of students playing an educational game, designed to teach the
fundamental principles of classical mechanical design, in a school computer lab. The
authors built a system that can identify and track the emotional state of students in real
time and provide feedback to improve the e-learning environment. E-learning systems use
eye and head movement to infer important information about students' moods and energy
levels in the classroom. Students' emotional states and motivation can be tracked using a
Facial Emotion Recognition System (FERS) in videoconference-style e-learning, created by
Ayvaz and colleagues. There are a variety of machine learning approaches available, but
SVM and KNN have the highest accuracy rates, followed by Random Forest and
Classification & Regression Trees. To improve the quality and memorability of lectures,
Kim and co-workers have built a system that provides real-time recommendations to
instructors so they can adjust their body language and facial expressions in real time.
Facial emotion identification using Haar Cascades was proposed by the authors to detect
emotions in a virtual learning environment utilizing data from the JAFFE database. Smart
classroom management systems that assist teachers' ability to quickly change instruction
modes are built using wireless sensor networks by Chiou et al.

"The State of the Art," by R. G. Harper, A. N. Wiens and J. D. Matarazzo, published
in 1978 by Wiley, New York. Gestures and body language are just two examples of
nonverbal communication. To communicate information, a signal is used; without even
realizing it, we communicate with our bodies in a variety of ways. It is possible to
communicate without using words: information can be conveyed verbally and vocally,
vocally but not verbally, or neither vocally nor verbally. Nonverbal behaviour is connected
to verbal behaviour in a number of ways. In order to maintain moment-to-moment regulation
and the structure of interpersonal communication, nonverbal phenomena play a critical role.
Using nonverbal cues to create hierarchy and priority among communicators also serves
to signal communication direction and provide feedback on communication quality.
Almost without noticing it, we have come to treat the space around us as a medium
of communication.

"Automatic Facial Expression Analysis of Students in Teaching Environments," in
Biometric Recognition, vol. 9428, J. Yang, Z. Sun, S. Shan, W. Zheng and J. Feng, Eds.,
Cham: Springer International Publishing, 2015, pp. 439-447.

It is common knowledge that students' facial expressions are an excellent way
to gauge their comprehension of what their teachers are saying. In order to address the issue
of the high costs and low efficiency of employing human analysts, the authors developed
an efficient prototype system that automatically analyses pupils' expressions. Uniform
Local Gabor Binary Pattern Histogram Sequence (ULGBPHS) is the fusion technique used
in this approach. Using the K-nearest neighbour (KNN) classifier, an average recognition
rate of 79 percent was achieved on the students' expression database. According to the
results of the study, teachers' evaluations can be improved using the proposed system.

9
"Recognizing student facial expressions: A web application," in IEEE Global
Engineering Education Conference (EDUCON), Tenerife, 2018, p. 1459-1462, A. Savva,
V. Stylianou, K. Kyriacou, and F. Domenach.

The research reported in this paper is being carried out with the purpose of
analysing the emotions of students engaged in hands-on, face-to-face classroom training.
Live video feeds from classroom webcams are incorporated into machine learning
algorithms. So that the professor can examine the visualization program remotely, it was
designed as an internet-based app. An emotional chronology of student reactions helps
the lecturer and other interested parties improve educational content delivery. The paper
also gives a short introduction to Artificial Intelligence (AI) and Machine Learning (ML).
A wide range of information is being obtained in today's world from a variety of sources.
To maximize the value of a company's existing resources, it is common to leverage data
that was collected for one reason for another purpose. Even though most businesses have
security cameras in place to prevent theft, the footage from these cameras can be used in
a variety of ways. In the future, an intelligent system may evaluate images to reveal
consumer emotions and even estimate customer contentment; in other words, rate the
entire customer purchasing experience. It is also possible that the ability to recognize and
analyse emotions could be a powerful instrument for business success.

It would be advantageous for businesses to be able to recognize and profit from the
emotions of their customers. Aside from enhancing virtual reality experiences, emotion
recognition can be used to track TV viewers' preferences and boost security measures in
public settings, among many other uses. In terms of APIs, there are numerous options for
detecting emotions. Examples include Microsoft's Project Oxford, the Kairos Emotion
Analysis API, EmoVu, Nviso, Affectiva and the RESTful Emotient Web API. Face
recognition APIs and SDKs abound, including Noldus' Face Reader API and Sightcorp's
Insight SDK (Software Development Kit). For emotional profiling, emotions can be
extracted from a piece of text using APIs such as IBM Watson. When text analysis can
be used to forecast how people will feel, it is called affective computing.
10
Using machine learning, facial expressions may be identified totally automatically.
These facial expressions can then be broken down into their various emotional subtypes:
scientists such as Alm et al. study emotions like rage and disgust as well as happiness and
sadness. This study shows how emotion recognition in education may be utilized in a
conventional classroom with a face-to-face teaching situation.

The implementation consists of three main parts: data collection, processing and
aggregation. The data is collected by a client application running on a PC in the classroom.
Pupils in the classroom are photographed using the computer's webcam at regular intervals.
An external API analyzes these photos for emotional content before transmitting them.
The results of the analysis are sent to a central repository via a RESTful service. Several
HTTP requests are made by the central server to aggregate the data using RESTful
Application Programming Interfaces (APIs). Clients can transmit and receive data through
these APIs. REST APIs are used by clients to retrieve data from the repository, which is
then viewed on their end. The final outcome is a display that is relevant to the requester.
The network is a good match for the heavy HTTP traffic predicted for this project. The
project's increasing traffic was also addressed as a major security issue, and security
measures have been put in place to prevent anyone from accessing or tampering with the
data. The webcam recording client application can be installed and used by an ordinary
computer user in a matter of minutes. Instructors should not be concerned with the
system's complexity but should instead have a simple application that allows them to
record at the push of a single button. In addition to the webcam's low-level API and the
web client's API calls, there is a GUI for visual feedback and ease of use. The app works
as follows: using the webcam, it takes a few pictures at regular intervals; an external API
is then used to identify the emotions captured in the photographs; the results are obtained;
and the data is sent to an emotion repository server afterwards. This procedure is repeated.
Steps 2, 3 and 4 are run asynchronously so that they do not interfere with the main GUI
or prevent the webcam from taking any more photographs. The recordings are acquired by
NoSQL database-based servers located on the Internet. Servers can be classified into two
categories: dedicated and cloud. Using HTTP, clients are able to access the internal
database through the server. Some examples of what the server does are as follows: it
provides a web-based interface for CRUD tasks on the internal database.


A RESTful API can be accessed by clients, and the server provides a safe and secure way
for users to log in. Virtual machines can be used to run any operating system (OS)
remotely, and as many requests as possible can be handled at once. The server was created
using Go (Golang). Secondly, the database: in most cases, the data is captured with a
time stamp and location; other than that, only the user's personal data can be utilized to
verify their identity. A NoSQL solution was chosen because of the non-relational nature
of the emotions data and the need for schema-less properties. MongoDB was chosen
because there is an open-source driver that supports statically typed communication
between the database and Go. Because the server was going to be accessible through the
Internet, it needed a physical address; as a result, Microsoft Azure was chosen as the
cloud solution. Finally, the client's visualization program: so that the professor can
examine the visualization remotely, it was designed as an internet-based app. The user
interface was developed using HTML and CSS. The vanilla JavaScript framework uses
asynchronous JavaScript and XML (Ajax) to communicate with the server, and AngularJS
made data binding possible. As a whole, the functionality of this application was built
using a number of different technologies, including C#, Go and JavaScript, as well as
HTML, a NoSQL database called MongoDB, Microsoft Azure and Microsoft Cognitive
Services.

"Issues and Prospects for Engineering and Education in a Smart Classroom with
Emotionally-Aware AI," IEEE Access, 2018, pp. 5308-5331.

In the future smart classrooms we envision, real-time sensing and machine
intelligence can significantly improve the learning experience for students and teachers.
Given the current state of the art, existing engineering advances can be utilised as
components of a smart classroom, and a smart classroom system built from these
components is what this study suggests. The suggested technology can help an in-class
speaker improve their presentation quality and memorability by allowing the presenter to
make real-time adjustments and corrections to their non-verbal behaviour, such as hand
gestures, facial expressions and body language. The proposed approach includes emotion
detection, deep learning-based recognition and mobile cloud computing. These
technologies, and the computing requirements of a system that includes them, are examined
in this study in great detail. Based on these requirements, a system feasibility analysis is
undertaken. Most of the system's components can be built using the most up-to-date
research.

Integrating these technologies into an overall system architecture, modifying the
algorithms for real-time execution, and defining acceptable educational factors for use in
the algorithms are the major obstacles. It is necessary to solve current issues in engineering
and education in order to adopt the suggested approach. By the year 2024, the authors
envision that haptic gloves will be commonplace in the smart classroom, and students
and teachers alike will be able to practice presentations in front of their peers, much like
our current training and practice environments. Presentations would be "live," with
immediate feedback on nonverbal behaviour, including body language and voice
intonation, delivered via haptic gloves and a feedback visualization dashboard in real
time, allowing presenters to improve their effectiveness and emotional intelligence as
teachers immediately afterward. A GPU cluster, which can perform complex tasks in
milliseconds, processes the multimodal audio/visual data of the presenters (e.g., through
cameras and microphones) to determine the presenter's behavioural state in "presentation
mode," using a high-speed network to transmit the data to the cloud. This is no longer
science fiction, and the vision in this work is based on current issues and research
directions drawn from the most up-to-date behavioural recognition research. Emotional
intelligence is an issue that engineers and educators are currently trying to solve. Humans
provide inputs 1 and 2, and the system computes and sends back a response to one of the
individuals; this is the smart classroom system that was designed. It is not easy to properly
quantify human behaviour since, when it comes to the human-machine link, the system
relies only on quantitative data. Our
"system" for smart classrooms is made up of students, computational architecture, and
educational philosophy. Any scenario in which humans speak with each other in order to
exchange knowledge or information can benefit greatly from the implementation of such
a system. Salespeople, medics, and security staff, as well as military stationed abroad,
will all benefit from this training. Voice intonation and body movements, as well as other
nonverbal communications including eye contact and facial expressions, play a
significant part in human communication. Though understudied today, their theoretical
underpinnings can be quantified and integrated into machine-based education. Machine
intelligence-driven systems allow students to receive critical feedback during practice
presentations in front of the "machine," while avoiding presentation anxiety or shame due
to a poor presentation.

One of the two human sources of input, which may be biased, can be sent back
to the other source using the new system design paradigm. This design introduces four
important research questions. Both inputs to the machine learning algorithm must be
established on a strictly quantitative basis in order to be usable; because of this, the
system must have "input" boxes. A machine intelligence platform (Box III) is needed that
can learn the relationship between these quantified inputs; while existing research
investigates Multimodal Learning Analytics (MLA), in which the acquired multi-modal
presentation data is used to create a single quantitative value, it is necessary to build a
new algorithm set for this purpose. This study makes the following important
contributions: integration of multimodal sensing and emotion recognition; quantification
of important human variables in a smart classroom such as crowd scores and behavioural
cues; demonstration and verification of the proposed system design with a template smart
classroom at SUNY Albany; and a bridge between engineering and education. The
sections of this paper that follow are listed below.
15
Section II outlines the proposed system design. The sections that follow look at the
current state of the technology needed to put this system into action: Section III examines
methods for measuring human-based metrics, such as crowd scores. Section IV examines
nonverbal human communication metrics such as facial expressions and body language
in general, as well as voice-related metrics. Section V concentrates on real-time algorithms
for high-intensity computations. Section VI establishes a detailed algorithmic/computing
infrastructure for the proposed system, and Section VII provides a feasibility study. The
problems that still have to be solved are described in great depth: a smart classroom system
that includes code to extract audio, visual and cognitive load vectors from a presentation,
a crowd of peers and experts, and deep learning algorithms that learn "best practices" in
training mode and estimate crowd scores during presentation mode (as well as a machine
learning engine). Final remarks are made in Section IX.

We propose two ideas in order to design the system. For Box III, machine
intelligence can learn how presenter behaviour affects presentation quality in Training
Mode and then convey this information to the presenter (Box IV) in Presentation Mode
without distracting them in real time. Boxes I and II, which include presenter and listener
input, can be quantified despite their subjective natures. We plan to test both assumptions
by streaming raw audio and video data from a presenter to the cloud during a presentation.
The Analysis Engine (Box I) translates this raw data into processed audio and visual feature
vectors to measure the behavioural signs of the presenter, such as vocal emotion, facial
movement and body gesture. Pupil dilation or other facial expressions can also be used to
estimate the cognitive strain of presenters (the [C] vector). To make quantifiable
measurements, the Crowd Annotation Engine (Box II) uses votes from the crowd (experts
or peer listeners), such as the proposed Crowd Score Vector [S]. Using the Deep Learning
Engine (Box III) and the [A] and [V] vectors, the system can learn, for example, that
open-handed motions result in the highest crowd ratings (quantified by the [S] vector).
The feedback engine (Box IV) provides real-time feedback on the presenter's performance.
An audio/video data acquisition component, a pre-processing component and a
massively parallel computing component are all necessary for the suggested
techniques. For presentations with visual feedback, the presenter's cognitive burden
(quantified by the [C] vector) is taken into account when constructing the feedback.
Presenters can use these methods to alter their body language, voice intonation and hand
gestures to improve their presentation. Many design issues must be overcome as well.

A system for quantifying the presenter's multimodal cues (Box I), as well as the
listeners' subjective input (the [S] vector in Box II), must first be devised. Using
educational psychometric investigations, it is necessary to investigate how to compute
[S]. Using well-established crowd-sensing methods, a valid [S] vector may be generated
by excluding outliers and findings that may be erroneous or biased. Box III also requires
deep neural networks capable of learning the complex non-linear relationship between
the [A], [V] and [S] vectors in Training Mode and simulating a crowd in Presentation
Mode by providing an estimated score vector for the audience. On-the-fly adjustment of
a parametric feedback engine (Box IV) is necessitated by an investigation of optimal
visual and haptic feedback alternatives for distinct presenters.

17
CHAPTER 3

REQUIREMENTS & DOMAIN INFORMATION

3.1 REQUIREMENT SPECIFICATION

3.1.1 Hardware Requirements

Enthought Python / Canopy / VS Code users will have different hardware


requirements depending on the software they are developing. Storage of big arrays and
objects in memory will demand more RAM, whereas faster processors are needed for
applications that need to do several calculations or operations at once.

• Operating system: Windows, Linux

• Processor: minimum intel i3

• RAM: minimum 4GB

• Hard Disk: minimum 250 GB

3.1.2 Software Requirements

It is possible to get an overall picture of the project's strengths and weaknesses by


analysing the requirements and implementation restrictions.

 Python IDE 3.7 version (or)

 Anaconda 3.7 (or)

 Jupyter (or)

 Google Colaboratory (Colab)

3.2 DOMAIN INFORMATION

DEEP LEARNING: Deep learning (also known as deep structured learning) is part
of a broader family of machine learning methods based on artificial neural networks with
representation learning. Learning can be supervised, semi-supervised or
unsupervised. Deep-learning architectures such as deep neural networks, deep belief
networks, graph neural networks, recurrent neural networks and convolutional neural
networks have been applied to fields including computer vision, speech recognition,
natural language processing, machine translation, bioinformatics, drug design, medical
image analysis, material inspection and board game programs, where they have produced
results comparable to and in some cases surpassing human expert performance.

Artificial neural networks (ANNs) were inspired by information processing and


distributed communication nodes in biological systems. ANNs have various differences
from biological brains. Specifically, neural networks tend to be static and symbolic, while
the biological brain of most living organisms is dynamic and analogue.

The adjective "deep" in deep learning refers to the use of multiple layers in the
network. Early work showed that a linear perceptron cannot be a universal classifier, but
that a network with a nonpolynomial activation function with one hidden layer of
unbounded width can. Deep learning is a modern variation which is concerned with an
unbounded number of layers of bounded size, which permits practical application and
optimized implementation, while retaining theoretical universality under mild conditions.

Deep learning is a branch of machine learning, which is a subset of artificial
intelligence. Since neural networks imitate the human brain, so does deep learning.
In deep learning, nothing is programmed explicitly. Basically, it is a class of machine
learning that makes use of numerous nonlinear processing units to perform feature
extraction as well as transformation. The output from each preceding layer is taken
as input by each successive layer.

Deep learning models are capable of focusing on the accurate features themselves,
requiring only a little guidance from the programmer, and are very helpful in solving
the problem of dimensionality. Deep learning algorithms are used especially when we
have a huge number of inputs and outputs.

19
Since deep learning has evolved from machine learning, which itself is a subset of
artificial intelligence, and as the idea behind artificial intelligence is to mimic human
behaviour, the idea of deep learning is likewise to build algorithms that can mimic
the brain.

Deep learning is implemented with the help of neural networks, and the idea behind
the motivation of neural networks is the biological neuron, which is nothing but a
brain cell.

Figure 3.2.1: Example of Deep learning

In the example given above, we provide the raw image data to the first layer, the input
layer. This input layer will then determine the patterns of local contrast, that is, it will
differentiate on the basis of colours, luminosity, etc. The first hidden layer will then
determine facial features, i.e., it will fixate on eyes, nose and lips, and then it will fixate
those facial features on the correct face template. So, the second hidden layer will
actually determine the correct face, as can be seen in the image above, after which the
result is sent to the output layer. Likewise, more hidden layers can be added to solve
more complex problems, for example, finding a particular kind of face having a darker
or lighter complexion. As the number of hidden layers increases, we are able to solve
more complex problems.

20
ARCHITECTURES

DEEP NEURAL NETWORKS

A deep neural network is a neural network with a certain level of complexity, which
means several hidden layers are encompassed between the input and output layers.
Deep neural networks are highly proficient at modelling and processing non-linear
relationships.

DEEP BELIEF NETWORKS

A deep belief network is a class of deep neural network that comprises multiple
layers of belief networks.

Steps to perform DBN training:

With the help of the Contrastive Divergence algorithm, a layer of features is
learned from the visible units.

Next, the formerly trained features are treated as visible units, which then perform
learning of further features.

Lastly, when the learning of the final hidden layer is accomplished, the whole DBN
is trained.

TYPES OF DEEP LEARNING NETWORKS

1.FEED FORWARD NEURAL NETWORK

A feed-forward neural network is none other than an Artificial Neural Network
which ensures that the nodes do not form a cycle. In this kind of neural network, all the
perceptrons are organized within layers, such that the input layer takes the input and
the output layer generates the output. Since the hidden layers do not link with the outside
world, they are named hidden layers. Each of the perceptrons contained in one single
layer is associated with each node in the subsequent layer, so it can be concluded that
all of the nodes are fully connected. There is no visible or invisible connection between
the nodes in the same layer, and there are no back-loops in the feed-forward network.
To minimize the prediction error, the backpropagation algorithm can be used to update
the weight values. A minimal sketch follows the list of applications below.

Applications: Data Compression, Pattern Recognition, Computer Vision, Speech
Recognition, Handwritten Character Recognition.
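A minimal sketch of such a feed-forward network, written with Keras and trained by backpropagation on dummy data, is given below; the layer sizes and the random data are illustrative assumptions.

# Minimal sketch of a feed-forward (fully connected) network in Keras.
# The layer sizes and the random data are placeholders for illustration.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(64, activation="relu", input_shape=(100,)),  # hidden layer
    Dense(32, activation="relu"),                      # hidden layer
    Dense(10, activation="softmax"),                   # output layer
])

# Backpropagation updates the weights to minimize the prediction error.
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

x = np.random.rand(256, 100)                           # dummy inputs
y = np.eye(10)[np.random.randint(0, 10, 256)]          # dummy one-hot labels
model.fit(x, y, epochs=2, batch_size=32, verbose=0)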

2. RECURRENT NEURAL NETWORK

Recurrent neural networks are yet another variation of feed-forward networks.
Here each of the neurons present in the hidden layers receives an input with a specific
delay in time. The recurrent neural network mainly accesses the preceding information of
existing iterations. For example, to guess the succeeding word in any sentence, one must
have knowledge of the words that were previously used. It not only processes the inputs
but also shares the lengths as well as weights across time. It does not let the size of the
model increase with the increase in input size. However, the problems with recurrent
neural networks are that they have slow computational speed, they do not contemplate
any future input for the current state, and they have difficulty remembering prior
information.

Applications: Machine Translation, Robot Control, Time Series Prediction,


Speech Recognition, Music Composition.

3. CONVOLUTIONAL NEURAL NETWORKS

Convolutional Neural Networks are a special kind of neural network mainly used
for image classification, clustering of images and object recognition. These networks
enable unsupervised construction of hierarchical image representations. To achieve the
best accuracy, deep convolutional neural networks are preferred over any other neural
network.

Applications: Identify Faces, Image Recognition, Video Analysis, Drug


Discovery

22
4. RESTRICTED BOLTZMANN MACHINE

RBMs are yet another variant of Boltzmann Machines. Here the neurons present
in the input layer and the hidden layer have symmetric connections between them.
However, there are no internal connections within each respective layer. In contrast to
RBMs, Boltzmann machines do have internal connections inside the hidden layer.
These restrictions in RBMs help the model to train efficiently.

Applications: Filtering, Feature Learning, Classification, Risk Detection, Business


and Economic analysis.

5. AUTOENCODERS

An autoencoder neural network is another kind of unsupervised machine learning
algorithm. Here the number of hidden cells is smaller than the number of input cells,
but the number of input cells is equivalent to the number of output cells. An autoencoder
network is trained to produce output similar to the fed input, which forces AEs to find
common patterns and generalize the data. Autoencoders are mainly used for smaller
representations of the input, and they help in the reconstruction of the original data from
compressed data. The algorithm is comparatively simple, as it only requires the output
to be identical to the input. A minimal sketch follows the list of applications below.

Applications: Classification, Clustering, Feature Compression.
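A minimal sketch of an autoencoder in Keras is given below; the input size of 784 and the 32-unit bottleneck are illustrative assumptions, and the dummy data stands in for real images.

# Minimal sketch of an autoencoder: fewer hidden cells than input cells,
# output trained to reproduce the input. Sizes here are placeholders.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

autoencoder = Sequential([
    Dense(32, activation="relu", input_shape=(784,)),  # encoder (compressed representation)
    Dense(784, activation="sigmoid"),                  # decoder (reconstruction)
])
autoencoder.compile(optimizer="adam", loss="mse")

x = np.random.rand(512, 784)                 # dummy data standing in for images
autoencoder.fit(x, x, epochs=2, verbose=0)   # the target equals the input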

DEEP LEARNING APPLICATIONS

Self-Driving cars:

In self-driving cars, the system is able to capture the images around it by processing a
huge amount of data, and it then decides what action to take: turn left, turn right or stop.
Deciding the appropriate action in this way will further reduce the accidents that happen
every year.

23
Voice Controlled Assistance:

When we talk about voice-controlled assistance, Siri is the first thing that comes to mind.
You can tell Siri whatever you want it to do for you, and it will search for it and display
the result for you.

Automatic Image Caption Generation:

Whatever image you upload, the algorithm will generate a caption accordingly. If you
say blue-coloured eye, it will display a blue-coloured eye with a caption at the bottom of
the image.

Automatic Machine Translation:

With the help of automatic machine translation, we are able to convert one
language into another with the help of deep learning.

24
CHAPTER 4
SYSTEM METHODOLOGY

4.1. ARCHITECTURE OF PROPOSED SYSTEM

Figure 4.1.1: Architecture

Figure 4.1.2: System Design

25
The user can access the system through an Android or web application. The user's face
will be detected, and after the face is captured, it will be pre-processed and the extracted
features will be stored in the image database. These features will then be sent to the
trained neural network, which will use them to detect the emotion and obtain the results.
Based on these results, the system will provide relevant recommendations to the user.
The user will then find some recommended tasks or some videos on his or her screen,
as per the resulting mood, in order to improve their mood.

The general system design comprises the following modules:

a) Input Image

b) Training using Convolutional Network

c) Analysis and Classification

d) Recommendation kernel

e) Result

First of all, when the user enters the application, it detects the user's face. This image is
then divided into different sections of the face, such as the forehead, eyebrows, lower
eyes, right cheek and left cheek. After all the pre-processing is done, the given dataset is
trained with the convolutional neural network, and with every epoch the accuracy
increases. Then, from the user's image, the system detects the emotion and accordingly
suggests tasks to change the mood of a sad or depressed person, as sketched below.
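The run-time flow described above could look roughly like the following sketch; the model file name, emotion label order and suggestion table are assumptions for illustration, not the exact implementation.

# Sketch of the run-time flow described above: detect the face, predict the
# emotion with the trained CNN and map it to a simple recommendation.
# The model path, label order and suggestions are assumptions.
import cv2
import numpy as np
from tensorflow.keras.models import load_model

EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]
SUGGESTIONS = {"sad": "play an uplifting song", "angry": "suggest a breathing exercise"}

model = load_model("fer_cnn.h5")                     # hypothetical trained model
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

frame = cv2.imread("user_frame.jpg")                 # placeholder captured frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
for (x, y, w, h) in detector.detectMultiScale(gray, 1.3, 5):
    face = cv2.resize(gray[y:y + h, x:x + w], (48, 48)) / 255.0
    probs = model.predict(face.reshape(1, 48, 48, 1))[0]
    emotion = EMOTIONS[int(np.argmax(probs))]
    print(emotion, SUGGESTIONS.get(emotion, "keep up the good mood"))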

4.2 ALGORITHM

CONVOLUTIONAL NEURAL NETWORKS

A Convolutional Neural Network (CNN) is a Deep Learning algorithm which can
take in an input image, assign importance (learnable weights and biases) to various
aspects/objects in the image, and differentiate one from the other. The pre-processing
required in a CNN is much lower compared to other classification algorithms. While in
primitive methods filters are hand-engineered, with enough training, CNNs have the
ability to learn these filters/characteristics.

The convolutional neural network is one of the main approaches to image
classification and image recognition. Scene labelling, object detection, and face
recognition are some of the areas where convolutional neural networks are widely used.

A CNN takes an image as input and classifies it under a certain category such as dog,
cat, lion, or tiger. The computer sees the image as an array of pixels whose size depends on
the image resolution: it is represented as h × w × d, where h is the height, w the width and
d the depth (number of channels). For example, a 6 × 6 RGB image is a 6 × 6 × 3 array, and a
4 × 4 grayscale image is a 4 × 4 × 1 array.

Figure 4.2.1: Example of CNN

Convolutional neural networks are a special type of neural network that roughly
imitates human vision. Over the years CNNs have become a very important part of many
computer vision applications, and hence a part of any computer vision course.

In a CNN, each input image passes through a sequence of convolution layers with
filters (also known as kernels), pooling layers, and fully connected layers. After that, we
apply the softmax function to classify the object with probability values between 0 and 1.

Figure 4.2.2: CNN Architecture

A CNN typically has three layers:


 A convolutional layer,

 A pooling layer, and

 A fully connected layer

CONVOLUTIONAL LAYER

The convolution layer is the first layer used to extract features from an input image. By
learning image features over small squares of input data, convolution preserves the
relationship between pixels. It is a mathematical operation that takes two inputs: the image
matrix and a filter (also called a kernel).

 The dimension of the image matrix is h × w × d.

 The dimension of the filter is fh × fw × d.

 The dimension of the output is (h − fh + 1) × (w − fw + 1) × 1.

Figure 4.2.3: Kernel Process

Consider, for example, a 5 × 5 image whose pixel values are 0 or 1 and a 3 × 3 filter
matrix. The convolution of the 5 × 5 image matrix with the 3 × 3 filter matrix produces
the output called the "feature map".

Convolving an image with different filters can perform operations such as blurring,
sharpening, and edge detection.
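
To make the output-size formula concrete, the small NumPy sketch below convolves a 5 × 5 binary image with a 3 × 3 filter; the pixel and filter values are made-up examples, and the resulting feature map is (5 − 3 + 1) × (5 − 3 + 1) = 3 × 3.

import numpy as np

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])        # 5 x 5 input with pixel values 0/1

kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])             # 3 x 3 filter

h, w = image.shape
fh, fw = kernel.shape
feature_map = np.zeros((h - fh + 1, w - fw + 1))   # (h - fh + 1) x (w - fw + 1)

for i in range(h - fh + 1):
    for j in range(w - fw + 1):
        # multiply the covered patch element-wise with the kernel and sum the result
        feature_map[i, j] = np.sum(image[i:i + fh, j:j + fw] * kernel)

print(feature_map.shape)   # (3, 3)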

Strides:

Stride is the number of pixels by which the filter shifts over the input matrix. When
the stride is 1, the filter moves 1 pixel at a time; when the stride is 2, it moves 2 pixels
at a time. The following figure shows how the convolution works with a stride of 2.

Figure 4.2.4: Strides Process


Padding:

Padding plays a crucial role in building the convolutional neural network.


If the image shrinks at every layer and the network has hundreds of layers, the
filtered image left at the end will be very small.

Figure 4.2.5: Padding Process


It is clear from the picture above that a corner pixel is covered by the filter only
once, while a middle pixel is covered many times, which means we have more information
about the middle pixels. This gives two downsides (a short Keras sketch after this list
shows how zero padding changes the output size):

 Shrinking outputs

 Losing information on the corner of the image.
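
As a hedged illustration of this effect (layer sizes are assumptions, not the report's configuration), the Keras snippet below compares "valid" (no padding) and "same" (zero padding) on a 48 × 48 single-channel input.

from keras.models import Sequential
from keras.layers import Conv2D

# With "valid" padding the map shrinks; with "same" zero padding it keeps its size.
for pad in ("valid", "same"):
    model = Sequential([Conv2D(8, (3, 3), padding=pad, input_shape=(48, 48, 1))])
    print(pad, model.output_shape)
# valid -> (None, 46, 46, 8)
# same  -> (None, 48, 48, 8)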


POOLING LAYER

The pooling layer plays an important role in reducing the spatial size of an image and
the number of parameters when the images are too large. Pooling is a "downscaling" of the
image obtained from the previous layers; it can be compared to shrinking an image to reduce
its pixel density. Spatial pooling is also called downsampling or subsampling: it reduces
the dimensionality of each feature map but retains the important information.

There are the following types of spatial pooling:

Max pooling:

Max pooling is a sample-based discretization process. Its main objective is to
downscale an input representation, reducing its dimensionality and allowing assumptions
to be made about the features contained in the binned sub-regions.

Max pooling is done by applying a max filter to non-overlapping sub-regions of the
initial representation.

Average pooling:

Average pooling downscales by dividing the input into rectangular pooling regions and
computing the average value of each region.

Syntax (MATLAB Deep Learning Toolbox): layer = averagePooling2dLayer(poolSize)

layer = averagePooling2dLayer(poolSize, Name, Value)

Sum Pooling:

The sub-regions for sum pooling or mean pooling are set exactly the same as for max
pooling, but instead of the max function the sum or the mean is used.
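
A brief Keras sketch of the two most common variants (the 48 × 48 × 8 input is an arbitrary example): 2 × 2 pooling halves each spatial dimension, with max pooling keeping the strongest activation in each window and average pooling keeping the mean.

from keras.models import Sequential
from keras.layers import MaxPooling2D, AveragePooling2D

print(Sequential([MaxPooling2D((2, 2), input_shape=(48, 48, 8))]).output_shape)
print(Sequential([AveragePooling2D((2, 2), input_shape=(48, 48, 8))]).output_shape)
# both print (None, 24, 24, 8); only the aggregation rule (max vs. mean) differs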

FULLY CONNECTED LAYER

In the fully connected layer, the input from the previous layers is flattened into a
vector and fed forward. This layer transforms the output into the desired number of
classes for the network.

Figure 4.2.6: Fully Connected layer

In the diagram above, the feature map matrix is converted into a vector x1, x2, x3, ..., xn
with the help of the fully connected layers. These features are combined to create the model,
and an activation function such as softmax or sigmoid is applied to classify the outputs as a
car, dog, truck, etc.
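
A minimal sketch of how the flattening and classification step looks in Keras; the 48 × 48 × 1 input and the seven output classes mirror the rest of this report, while the layer widths are assumptions.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(48, 48, 1)),
    MaxPooling2D((2, 2)),
    Flatten(),                         # feature maps flattened into the vector x1 ... xn
    Dense(128, activation="relu"),     # fully connected layer combining the features
    Dense(7, activation="softmax"),    # class probabilities for the seven emotions
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])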

4.3 SYSTEM DESIGN

4.3.1 Data Flow Diagrams:

Figure 4.3.1: Flow Chart

Facial recognition involves several phases: detection of face images, pre-processing,
retrieval of facial features, alignment, and identification. There are primarily two types
of feature extraction: geometric attribute extraction and a procedure focused on overall
statistical characteristics. The geometric feature-based approach is widely used to describe
the locations of facial organs as the features for classification.

4.3.2 Class Diagram

In software engineering, this sort of static structural diagram is called a class
diagram in the Unified Modelling Language (UML): it shows the system's classes, their
attributes, their operations (or methods), and the relationships between those classes.
It explains which information belongs to which class.

Figure 4.3.2.: Class Diagram

4.3.3 UML Diagrams

Use Case Diagram

A use case diagram is a UML diagram built around use cases. Its primary purpose is to
depict graphically the system's actors, their goals (expressed as use cases), and any
dependencies between those use cases. Using a use case diagram, you can show which actors
interact with which system functions, and depict the actors present in the system.

Figure 4.3.3: User Module

Figure 4.3.4: Software Module

Sequence Diagram

In the Unified Modelling Language (UML), diagrams that show how processes interact
with one another and in what order are known as sequence diagrams. A sequence diagram is
based on a Message Sequence Chart. Sequence diagrams are also known as event diagrams,
scenario diagrams, and timing diagrams.

Figure 4.3.5: Sequence Diagram

CHAPTER 5

EXPERIMENTATION & ANALYSIS

5.1 EXPERIMENTATION

The human face is captured using the PC's built-in or an external webcam. From
that live stream the face is extracted and all other unwanted components are discarded.
To achieve efficiency and comprehensiveness we use a CNN to identify and extract the
faces, together with the OpenCV library (specifically, its cascade classifier).

Pre-processing:

Pre-processing is a common name for operations on images at the lowest level of
abstraction, where both input and output are intensity images. The aim of pre-processing
is to improve the image data by suppressing unwanted distortion or enhancing image
features that are important for further processing.

Region Splitting:

For emotion recognition, the main regions of the face under consideration are the
eyebrows and the mouth, and separating them out is called region splitting.

Emotion Classification:

After the feature-extraction subtask is completed, the person's reaction is produced
along with its percentage level. The possible reactions are:

 Happy

 Angry

 Surprise

 Sad

 Disgust

 Fear

 Neutral

Happy:

A happy expression normally includes a smile with both corners of the mouth rising,
squinting eyes, and wrinkles appearing at the eye corners. The original practical role of
the smile, which signals happiness, remains a puzzle. Some researchers believe the smile
was originally a sign of fear: monkeys and apes bared clenched teeth to show predators
that they are harmless. Smiling prompts the brain to release endorphins, which help reduce
pain and produce a feeling of well-being; those positive feelings can help in coping with
fear. A smile can also create positive feelings in someone who witnesses it, and may even
prompt that person to smile as well.

Figure 5.1.1: Example of Happy Face

Angry:

Anger includes three fundamental features: bared teeth, eyebrows lowered and drawn
together on the inner side, and narrowed eyes. The function is clearly preparation for
attack: the teeth are ready to bite and threaten the opponent, while the eyes and eyebrows
narrow to protect the eyes without closing completely, so that the adversary stays in view.
If the eyebrows are lowered and drawn together without staring eyes, i.e. without raised
upper eyelids, the expression does not indicate anger. Raised upper eyelids are a
requirement for the anger expression, because when we are angry we stare intensely at the
source of our anger to threaten it, and intense staring is not possible without raising
the upper eyelids.

Figure 5.1.2: Example of Angry Face

Surprise:

Surprise includes widened eyes and sometimes an open mouth. Opening the eyes so wide
is supposed to widen the visual field, although studies show that it does not actually do
so; the rapid eye movement that accompanies it can help in spotting threats. Opening the
mouth makes it possible to breathe quietly and so avoid being detected by an enemy. The
eyebrows are raised and drawn together, producing wrinkles on the forehead; the eyes are
opened as wide as possible, with the upper eyelids raised as high as they can go; the lips
are stretched horizontally towards the ears; and the jaw is pulled slightly backwards, as
is evident from the horizontal wrinkles on the neck.

Figure 5.1.3: Example of Surprise Face

Sad:

Sadness includes a slight pulling down of the lip corners while the inner sides of the
eyebrows rise. Darwin explained this expression as suppression of the urge to cry. The
control over the upper lip is greater than the control over the lower lip, so the lower
lip droops. When a person screams during a cry, the eyes are closed to shield them from
the blood pressure that builds up in the face; therefore, when we feel the urge to cry and
want to stop it, the eyebrows rise to keep the eyes from closing.

Figure 5.1.4: Example of Sad Face

Disgust:

Disgust includes a wrinkled nose and mouth, and sometimes even the tongue coming out.
The expression imitates a person who has tasted bad food and wants to spit it out, or who
smells a foul odour. In an extreme disgust expression, the eyebrows are lowered, forming a
'V' over the nose and producing wrinkles on the forehead; the eyes are narrowed to shut
out the source of disgust; and the chin is pulled slightly backwards, with a circular
wrinkle showing.

Figure 5.1.5: Example of Disgust face

Fear:

Fear shares most of its features with surprise: widened eyes and sometimes an open
mouth. Opening the eyes so wide is supposed to widen the visual field, although studies
show that it does not actually do so; the rapid eye movement that accompanies it can help
in spotting threats. Opening the mouth makes it possible to breathe quietly and so avoid
being detected by an enemy. The eyebrows are raised and drawn together, creating wrinkles
on the forehead; the eyes are opened as wide as possible, with the upper eyelids raised as
high as they can go; the lips are stretched horizontally towards the ears; and the jaw is
pulled slightly backwards, as is evident from the horizontal wrinkles on the neck.

Figure 5.1.6: Example of Fear Face

Neutral:

Neutral does not involve any of the reactions such as happy, surprised, sad, disgusted
or angry. It is a simple expression in which the lips and eyes are in their normal
position, indicating that the user is not showing any reaction. Neutral is the default
emotion, and every change of reaction starts from it.

Figure 5.1.7: Example of Neutral Face

WORKING

Face Extraction Process:

OpenCV is the most commonly used library for face extraction. OpenCV uses machine
learning algorithms to search for faces within an image. Because faces are so complex,
there is no single simple test that decides whether a face has been found or not; instead,
there are thousands of small patterns and features that must be matched. The algorithms
break the task of recognizing a face into thousands of smaller, bite-sized tasks, each of
which is easy to solve. These tasks are also called classifiers.

For something like a face, there may be at least 7,000 classifiers, all of which must
match (within error limits) for a face to be detected. Therein lies the problem: for face
detection, the algorithm starts at the top left of an image and moves across small blocks
of data, looking at each block in turn. Like a series of waterfalls, the OpenCV cascade
breaks the problem of detecting faces into several stages: for each block it performs a
very rough and quick test, and only if that passes does it run a slightly more detailed
test, and so on.

The algorithm may have 20 to 40 such cascade stages, and it will only detect a face if
the block clears every one of them. The advantage is that most of the image returns a
negative during the first few stages, which means the algorithm does not waste time
testing all 6,000 features on it. Instead of taking hours, face detection can now be done
in real time. Because face detection is such a common task, OpenCV comes with a number of
built-in cascades for detecting everything from faces to eyes to hands and legs.
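
A minimal OpenCV sketch of this cascade-based detection, assuming the bundled frontal-face Haar cascade and the default webcam; it is only an illustration of the idea, not the report's exact code.

import cv2

# Load one of OpenCV's built-in Haar cascades for frontal faces.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)                      # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Every region reported here has passed all stages of the cascade.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("faces", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):      # press q to quit
        break
cap.release()
cv2.destroyAllWindows()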

Understanding human facial expressions, and the study of expressions in general, has
many aspects, from computer analysis and emotion recognition to lie detection, airport
security, nonverbal communication and even the role of expressions in art. Improving the
ability to read expressions is an essential step towards successful relationships.
Expressions and emotions go hand in hand: particular combinations of facial muscle
actions reflect a particular emotion.

For certain emotions it is very hard, and perhaps impossible, to avoid the
corresponding facial expression. For example, a person trying to ignore a boss's
irritating offensive remark by keeping a neutral expression may still show a brief flash
of the emotion. This phenomenon of a short, involuntary facial expression shown on a
person's face according to the emotion experienced is called a 'micro expression'. The
antithesis phenomenon refers to the way that certain muscle movements represent an
emotion while the opposite muscle movements represent the opposite emotion. A good
explanation of the expression that represents helplessness can be given using antithesis.
The helplessness body gesture includes spreading the hands to the sides, spreading the
fingers and shrugging the shoulders; its facial expression includes pulling down the
bottom lip and raising the eyebrows. Darwin explained the features of this expression
using the antithesis principle: he found that these movements are the opposite of the
movements of a man who is prepared to confront something.

The movements of a person preparing for something look like this: closed hands and
fingers (as if preparing for a fight, for instance), hands held close to the body for
protection, and the neck raised and tense. In a helpless situation, shrugging the
shoulders releases the neck. As for the face: the eyebrows are low (as in an attacking or
determined mode), and the upper lip may reveal the teeth. The functional source of the
antithesis can be explained by examining the muscles, to be precise the antagonist
muscles. Each muscle has an antagonist muscle that performs the opposite movement:
spreading the fingers is a movement done by some muscles, and closing the fingers is done
by their antagonists.

For some expressions we cannot tell, just by looking at them, what the opposite
expression is; but if we instead look at the muscles involved, it becomes very clear. An
interesting explanation of the functional source of antithesis is based on inhibition:
when a person or an animal tries to prevent itself from performing a certain action, one
way is to use the antagonist muscles. Indeed, when a stimulation signal is sent to a
muscle, an inhibitory signal is automatically sent to its antagonist. The facial
expressions that can be explained by antithesis are all related to aggression and
avoidance.

CNN Process:

We have to show an algorithm a huge number of images before it is able to generalize
the information and make predictions for images it has never seen. Computers 'see'
differently than we do: their world consists only of numbers. Every image can be
represented as a 2-dimensional array of numbers, known as pixels. The fact that computers
see images differently does not mean we cannot train them to recognize patterns, as we
do; we simply need to think about what an image is in a different way.

Figure 5.1.8: Image Identification & Classification

Fig 5.1.8 shows how the algorithm is taught to recognize objects in images: we use a
particular kind of artificial neural network, a Convolutional Neural Network (CNN). The
name comes from the most important operation in the network, called convolution. Simple
cells activate, for instance, when they detect basic fixed shapes such as lines in a
particular area and at a particular angle. Complex cells have larger receptive fields, and
their output is not sensitive to the exact position within the field; they keep responding
to a given stimulus even though the absolute direction of the eyes changes. 'Complex' here
means more flexible. In vision, a receptive field is a small region of the retina in which
a stimulus will affect the firing of a particular neuron (that is, will activate the
neuron). Each sensory neuron has a similar receptive field, and the fields overlap.

5.2 RESULTS

Screenshots:

Figure 5.2.1: Screenshot of Execution

Figure 5.2.2: Expression of Angry

Figure 5.2.3: Expression of Fearful


Figure 5.2.4: Expression of Surprised

SAMPLE SOURCE CODE:

from keras.models import Sequential
from keras.layers import Dense
import time
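
The remaining source-code pages did not reproduce in this copy of the report. As a hedged reconstruction only, a model matching the description in the conclusion (four convolutional layers, four max-pooling layers and two fully connected layers, trained on 48 × 48 grayscale FER-2013 images with seven emotion classes) could look like this; the filter counts, dropout and hyper-parameters are assumptions.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", padding="same", input_shape=(48, 48, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation="relu", padding="same"),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation="relu", padding="same"),
    MaxPooling2D((2, 2)),
    Conv2D(256, (3, 3), activation="relu", padding="same"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(256, activation="relu"),     # first fully connected layer
    Dropout(0.5),
    Dense(7, activation="softmax"),    # angry, disgust, fear, happy, sad, surprise, neutral
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=50, batch_size=64)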

5.3 TESTING

Errors are discovered during testing. The goal of testing is to find any and all flaws
in a product. Components and subassemblies, as well as the finished product, can be tested
to ensure their functionality. Testing is the process of exercising software to make sure
the system meets its requirements and user expectations and does not fail in an
unacceptable way. There is a wide variety of test types, each designed to meet a specific
need.

5.3.1 Types of Testing

Unit Testing:

Unit testing validates the program's internal logic and checks that program inputs
produce valid outputs. Each decision branch and the internal code flow should be tested.
Testing is done on individual pieces of the application; after each unit is completed, it
is integrated into the whole. This is an invasive, structural test that requires knowledge
of the system's design beforehand. Unit tests evaluate a specific business process,
application, or system configuration at the component level, and they are one way of
verifying that a business process adheres to its documented criteria.

Unit testing is a technique in which a particular module is tested by the developer to
check whether it contains any errors. The primary focus of unit testing is to test an
individual unit of the system in order to analyse, detect, and fix errors.

Integration Testing:

An integration test's job is to make sure that software components that have been
combined work together as a single unit. Integration testing is more concerned with the
interaction of components than with individual screen or field outcomes: even if each
component has passed unit testing, the combination must still be shown to be correct and
consistent. Exposing problems that arise when two or more pieces of software are
interfaced is the primary goal of integration testing.

Functional Testing:

Functional tests provide systematic demonstrations that the functions under test are
available as specified by the business and technical requirements, the system
documentation, and the user guides.

Functional testing is centered on the following items:

Valid Input: identified classes of valid input must be accepted.

Invalid Input: identified classes of invalid input must be rejected.

Functions: identified functions must be exercised.

Output: identified classes of application outputs must be exercised.

Systems/Procedures: interfacing systems or procedures must be invoked.

Functional tests are organized and prepared according to the requirements, key
functions, or specific test cases they are designed to evaluate. In addition, the system is
tested for its coverage of the required data fields, predefined processes, and successive
operations. Before functional testing is finished, additional tests are identified and the
value of the existing ones is assessed.

System Testing:

System testing ensures that the entire integrated software system meets the required
specifications. A collection of tests is run on a particular configuration to ensure known
and predictable results. An example of system testing is the configuration-oriented system
integration test. System testing focuses on pre-driven process links and integration
points.

White Box Testing:

White box testing is a type of software testing in which the tester has knowledge of
the inner workings and structure of the software, or at least of its purpose. It is used to
test areas that cannot be reached from the black box level.

Black Box Testing:

Black box testing is testing the software without any knowledge of the inner workings,
structure, or language of the module being tested. Black box tests must be written from a
definitive source document, such as a specification or requirements document. In this
method, the program under test is treated as a black box: one cannot "see" into it. The
test provides inputs and checks outputs without considering how the software works
internally.

Unit testing is a component of test-driven development (TDD), a pragmatic methodology
that builds a product carefully through continual testing and revision. It is also the
first level of software testing, performed before other methods such as integration
testing. Unit tests are typically isolated to ensure a unit does not rely on any external
code or functions. Testing can be done manually but is often automated.

A unit test typically comprises three stages: planning, writing test cases and
scripts, and running the unit test itself. In the first stage the unit test is planned and
reviewed; next the test cases and scripts are written; then the code is tested.

Test-driven development requires that developers first write failing unit tests.
Then they write code and refactor the application until the test passes. TDD typically
results in an explicit and predictable code base.

Each test case is run independently in an isolated environment, so as to ensure a lack
of dependencies in the code. The software developer should code criteria to verify each
test case, and a testing framework can be used to report any failed tests. Developers
should not write a test for every line of code, as this may take up too much time; instead
they should focus tests on code that could affect the behaviour of the software being
developed.

Unit testing involves only those characteristics that are vital to the performance of
the unit under test. This encourages developers to modify the source code without
immediate concern about how such changes might affect the functioning of other units or
the program as a whole. Once all of the units in a program have been found to be working
as efficiently and error-free as possible, larger components of the program can be
evaluated by means of integration testing. Unit tests should be performed frequently and
can be run manually or automated. Teams using a manual approach may keep an instructional
document detailing each step in the process; however, automated testing is the more common
approach. Automated approaches commonly use a testing framework to develop test cases;
these frameworks flag and report any failed test cases and provide a summary of the test
run.

As part of the software development lifecycle, unit testing is often performed in
conjunction with coding. Alternatively, it can be carried out as a separate process.
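
A hedged sketch of what a unit test for this project could look like, using Python's built-in unittest framework; the stand-in model and the seven-class assumption mirror the rest of the report, but this is not the project's actual test code.

import unittest
import numpy as np
from keras.models import Sequential
from keras.layers import Flatten, Dense

class TestEmotionModel(unittest.TestCase):
    def setUp(self):
        # Hypothetical stand-in model; a real test would load the trained CNN instead.
        self.model = Sequential([
            Flatten(input_shape=(48, 48, 1)),
            Dense(7, activation="softmax"),
        ])

    def test_output_has_seven_emotion_classes(self):
        probs = self.model.predict(np.zeros((1, 48, 48, 1)), verbose=0)
        self.assertEqual(probs.shape, (1, 7))

    def test_probabilities_sum_to_one(self):
        probs = self.model.predict(np.zeros((1, 48, 48, 1)), verbose=0)
        self.assertAlmostEqual(float(probs.sum()), 1.0, places=5)

if __name__ == "__main__":
    unittest.main()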

Test strategy and approach:

Field testing will be performed manually, and functional tests will be written in detail.

Test objectives:

• All field entries must work properly, without errors.

• Pages must be activated only from the identified links.

• The entry screen, messages, and responses must work correctly.

Features to be tested:

• Verify that the entries are of the correct format.

• No duplicate entries should be allowed.

• All links should take the user to the correct page.

Integration testing results:

All of the test cases listed above passed successfully. No defects were encountered.

Acceptance Testing:

User acceptance testing is a critical phase of any project and requires significant
participation by the end users; it confirms that the system meets the functional
requirements. All of the acceptance test cases passed successfully, and no defects were
found.

5.3.2 Test Cases

S. No   Test case ID   Input        Expected output   Actual output   Test case pass/fail

1       T1             Face image   Happy             Happy             Pass

2       T2             Face image   Angry             Angry             Pass

3       T3             Face image   Surprise          Surprise          Pass

4       T4             Face image   Sad               Sad               Pass

5       T5             Face image   Disgust           Disgust           Pass

6       T6             Face image   Fear              Fear              Pass

7       T7             Face image   Neutral           Neutral           Pass

Figure 5.3.2: Test Cases

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

6.1 CONCLUSION

A Convolutional Neural Network model for students' facial expression recognition was
presented in this report. Four convolutional layers, four max-pooling layers, and two
fully connected layers make up the model under consideration. The system uses a Haar-like
detector to locate students' faces and classifies them into seven facial expressions:
surprise, fear, disgust, sadness, happiness, anger, and neutral. On the FER 2013 database,
the proposed model achieved an accuracy of 70%. Our facial expression detection system
makes it easier to recognize how well students understand a presenter's message.

In this project, facial expressions are effectively identified by processing a dataset
of various facial expressions, with the classification coded in Python. The proposed
architecture recognizes the emotion of a human face dynamically. The main parameters
considered are the positions of the eyes and the mouth, and the emotion is recognized from
changes in those positions. In addition, the system dynamically displays the percentage for
every reaction of a person. Each image is processed by repeatedly taking portions of it and
extracting the average or maximum information from them, which is termed pooling.

We developed various CNNs for a facial expression recognition problem and


evaluated their performances using different post-processing and visualization
techniques. The results demonstrated that deep CNNs are capable of learning facial
characteristics and improving facial emotion detection. Also, the hybrid feature sets did
not help in improving the model accuracy, which means that the convolutional networks
can intrinsically learn the key facial features by using only raw pixel data.

6.2 FUTURE SCOPE

Facial emotion recognition is an emerging field, so considering other neural networks
such as Recurrent Neural Networks (RNNs) may improve the accuracy. Feature extraction is
similar to the pattern recognition used in intelligence, military and forensic
applications for identification purposes, so techniques such as the CapsNet algorithm for
pattern recognition can also be considered. Deep-learning-based approaches require a large
labelled dataset, significant memory, and long training and testing times, which makes
them difficult to implement on mobile and other resource-limited platforms. Thus, simpler
solutions with lower data and memory requirements should be developed.

REFERENCES
[1] R. G. Harper, A. N. Wiens, and J. D. Matarazzo, Nonverbal communication: the state
of the art. New York: Wiley, 1978.

[2] P. Ekman and W. V. Friesen, “Constants across cultures in the face and emotion,”
Journal of Personality and Social Psychology, vol. 17, no 2, p. 124-129, 1971.

[3] C. Tang, P. Xu, Z. Luo, G. Zhao, and T. Zou, “Automatic Facial Expression Analysis
of Students in Teaching Environments,” in Biometric Recognition, vol. 9428,
J. Yang, J. Yang, Z. Sun, S. Shan, W. Zheng, and J. Feng, Eds. Cham: Springer International
Publishing, 2015, pp. 439-447.

[4] A. Savva, V. Stylianou, K. Kyriacou, and F. Domenach, “Recognizing student facial


expressions: A web application,” in 2018 IEEE Global Engineering Education
Conference (EDUCON), Tenerife, 2018, p. 1459- 1462.

[5] J. Whitehill, Z. Serpell, Y.-C. Lin, A. Foster, and J. R. Movellan, “The Faces of
Engagement: Automatic Recognition of Student Engagement from Facial Expressions,”
IEEE Transactions on Affective Computing, vol. 5, no. 1, pp. 86-98, Jan. 2014.

[6] N. Bosch, S. D'Mello, R. Baker, J. Ocumpaugh, V. Shute, M. Ventura, L. Wang and


W. Zhao, “Automatic Detection of Learning-Centered Affective States in the Wild,” in
Proceedings of the 20th International Conference on Intelligent User Interfaces - IUI ’15,
Atlanta, Georgia, USA, 2015, p. 379-388.

[7] Krithika L.B and Lakshmi Priya GG, “Student Emotion Recognition System (SERS)
for e-learning Improvement Based on Learner Concentration Metric,” Procedia Computer
Science, vol. 85, p. 767-776, 2016.

[8] U. Ayvaz, H. Gürüler, and M. O. Devrim, “USE OF FACIAL EMOTION


RECOGNITION IN ELEARNING SYSTEMS,” Information Technologies and Learning
Tools, vol. 60, no. 4, p. 95, Sept. 2017.

[9] Y. Kim, T. Soyata, and R. F. Behnagh, “Towards Emotionally Aware AI Smart
Classroom: Current Issues and Directions for Engineering and Education,” IEEE Access,
vol. 6, p. 5308-5331, 2018.

[10] D. Yang, A. Alsadoon, P. W. C. Prasad, A. K. Singh, and A. Elchouemi, “An


Emotion Recognition Model Based on Facial Recognition in Virtual Learning
Environment,” Procedia Computer Science, vol. 125, p. 2-10, 2018.

[11] C.-K. Chiou and J. C. R. Tseng, “An intelligent classroom management system
based on wireless sensor networks,” in 2015 8th International Conference on Ubi- Media
Computing (UMEDIA), Colombo, Sri Lanka, 2015, p. 44-48.

[12] I. J. Goodfellow et al., “Challenges in Representation Learning: A report on three


machine learning contests,” arXiv:1307.0414 [cs, stat], July 2013.

[13] A. Fathallah, L. Abdi, and A. Douik, “Facial Expression Recognition via Deep
Learning,” in 2017 IEEE/ACS 14th International Conference on Computer Systems and
Applications (AICCSA), Hammamet, 2017, p. 745-750.

[14] P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple
features,” in Proceedings of the 2001 IEEE Computer Society Conference on Computer
Vision and Pattern Recognition. CVPR 2001, Kauai, HI, USA, 2001, vol. 1, p. I-511-I-
518.

[15] Y. Freund and R. E. Schapire, “A Decision-Theoretic Generalization of On-Line


Learning and an Application to Boosting,” Journal of Computer and System Sciences,
vol. 55, no. 1, pp. 119-139, Aug. 1997.

[16] OpenCV, opencv.org.

[17] Keras, keras.io.

[18] TensorFlow, tensorflow.org.

[19] aionlinecourse.com/tutorial/machine-learning/convolution-neural-network.
Accessed 20 June 2019.

[20] S. Albawi, T. A. Mohammed, and S. Al-Zawi, “Understanding of a convolutional
neural network,” in 2017 International Conference on Engineering and Technology
(ICET), Antalya, 2017, p. 1-6.

[21] ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/. Accessed 05 July


2019.
