Group Members
May, 2025
Abstract
Chapter 1
Introduction
Facial expressions are among the most powerful and universal forms of non-verbal communication. They play a vital role in human interaction, conveying a range of emotions, such as happiness, sadness, anger, surprise, fear, and disgust, without the use of spoken language. With the advancement of artificial intelligence and computer vision, automatic recognition and interpretation of facial cues has become an increasingly important research area known as Facial Emotion Recognition (FER).
FER systems aim to detect human faces from images or video input and classify the
facial expressions into discrete emotion categories. These systems have a wide array
of real-world applications, including mental health monitoring, driver alertness systems,
adaptive e-learning platforms, entertainment, and human-computer interaction.
Around the world, researchers and developers have explored both traditional and modern
deep learning techniques to tackle FER:
• Traditional approaches often rely on handcrafted features such as Local Binary Patterns (LBP), Gabor filters, or geometric landmarks, followed by machine learning models such as SVMs or decision trees. However, these methods typically struggle in complex, real-world settings due to their sensitivity to lighting conditions, occlusions, and facial variability.
• Modern deep learning approaches, by contrast, learn features directly from the data and have achieved considerably higher accuracy in challenging datasets.
To overcome the limitations of traditional methods, our project utilizes a deep learning-based FER pipeline powered by a Vision Transformer (ViT). The model is trained on a labeled emotion dataset to classify facial expressions from still images and is then deployed in a real-time application using webcam input.
The input to our system is a live video feed (webcam) capturing real-time facial expressions. The output is a bounding box drawn around each detected face along with a predicted emotion label such as "happy", "sad", "angry", etc., updated frame-by-frame. This allows for real-time facial emotion recognition and visualization.
Chapter 2
Our project follows this pipeline: collecting facial expression data, analyzing the data, pre-processing it, building and implementing a Vision Transformer (ViT) model, and then evaluating the model's performance on several metrics.
2.1 Data collection and pre-processing
Data collection
The Facial Emotion Recognition dataset contains 35,887 grayscale images of human faces, each labeled with one of seven basic emotion categories: Angry, Disgust, Fear, Happy, Sad, Surprise, and Neutral. Each image is of size 48×48 pixels, capturing a wide range of facial expressions in varied lighting conditions, poses, and backgrounds. This diversity makes the dataset suitable for training robust and real-world applicable facial emotion recognition models. The dataset is pre-split into a training set of 28,821 images and a test set of 7,066 images, enabling effective model development and unbiased evaluation.
Figure 2.2: Class Distribution of Images per Label in the Facial Emotion Recognition train set
Figure 2.3: Class Distribution of Images per Label in the Facial Emotion Recognition test set
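As a concrete illustration, the per-class counts behind Figures 2.2 and 2.3 can be obtained with a few lines of torchvision code. This is a minimal sketch that assumes the Kaggle archive has been extracted into per-class data/train/ and data/test/ folders; the paths are assumptions about the local layout, not part of the dataset itself.

    from collections import Counter
    from torchvision import datasets

    train_ds = datasets.ImageFolder("data/train")
    test_ds = datasets.ImageFolder("data/test")
    print(f"train: {len(train_ds)} images, test: {len(test_ds)} images")

    # Images per emotion label in the training split
    counts = Counter(train_ds.targets)
    for idx, name in enumerate(train_ds.classes):
        print(f"{name:>9}: {counts[idx]}")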
The class distribution in the FER training set exhibits significant variation in the number of images across different emotion categories. While some classes such as Happy contain a large number of samples (7,164), others like Disgust are severely underrepresented (436 images). This imbalance can introduce bias during model training, leading to poor generalization and misclassification of minority emotions. Such skewed distributions may result in decision boundaries that overly favor majority classes. To mitigate this issue, we apply oversampling techniques, which increase the number of instances in underrepresented classes by replicating existing samples. This helps balance the dataset and improves the model's ability to learn from all emotion categories more effectively.
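A minimal sketch of this replication-based oversampling is shown below; it operates on a plain list of integer labels (such as the targets attribute from the previous sketch). The function name and interface are illustrative, not taken from our codebase.

    import random
    from collections import defaultdict

    def oversample_indices(targets, seed=42):
        """Return dataset indices in which every class appears equally often."""
        rng = random.Random(seed)
        by_class = defaultdict(list)
        for idx, label in enumerate(targets):
            by_class[label].append(idx)

        max_count = max(len(idxs) for idxs in by_class.values())
        balanced = []
        for idxs in by_class.values():
            # keep every original sample, then pad with random duplicates
            extra = [rng.choice(idxs) for _ in range(max_count - len(idxs))]
            balanced.extend(idxs + extra)

        rng.shuffle(balanced)
        return balanced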
Data pre-processing
The Facial Expression Recognition (FER) dataset from Kaggle is primarily used in this research for training and testing. The original training set is split into two subsets: 80% for training and 20% for validation. During preprocessing, each image is assigned a numerical label based on its corresponding emotion class. This label encoding enables the model to interpret categorical emotions as numeric targets during training. Each image is resized and normalized to a standardized format to ensure consistent input dimensions for the model.
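One way to express the resize-and-normalize step is with torchvision transforms. The target size and normalization statistics below are assumptions based on the ViT processor described later, not values quoted from our configuration.

    from torchvision import transforms

    # Resize the 48x48 grayscale faces to the ViT input size and normalize.
    # A mean/std of 0.5 matches the default ViTImageProcessor configuration;
    # treat these values as assumptions if a different processor is used.
    preprocess = transforms.Compose([
        transforms.Grayscale(num_output_channels=3),   # ViT expects three channels
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
    ])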
Due to significant class imbalance in the dataset, where emotions like Happy are overrepresented and Disgust is severely underrepresented, oversampling is applied to the minority classes in the training set. This is done by replicating images in the underrepresented classes until each class contains an approximately equal number of samples. This technique reduces bias toward majority classes and improves the model's ability to generalize across all emotion categories.
After oversampling, the dataset is restructured into a standardized format suitable for deep learning workflows. Emotion labels are converted into consistent integer identifiers, and the dataset is organized to ensure type consistency and efficient access during training. Data is subsequently loaded in mini-batches of 32 and shuffled at the beginning of each epoch to promote training stability and improve generalization performance.
Figure 2.4: Class Distribution of the Facial Emotion Recognition train set after oversampling
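Combining the previous sketches, the balanced training data could be wrapped in a PyTorch DataLoader as follows. This is again a sketch under the folder-layout, transform, and oversampling assumptions above, not a verbatim excerpt of our training script.

    from torch.utils.data import DataLoader, Subset
    from torchvision import datasets

    # Apply the resize/normalize transform from above and balance the classes
    # with the replicated index list produced by oversample_indices.
    train_ds = datasets.ImageFolder("data/train", transform=preprocess)
    balanced_train = Subset(train_ds, oversample_indices(train_ds.targets))

    # Mini-batches of 32, reshuffled at the start of every epoch.
    train_loader = DataLoader(balanced_train, batch_size=32, shuffle=True)

    images, labels = next(iter(train_loader))
    print(images.shape, labels.shape)   # torch.Size([32, 3, 224, 224]) torch.Size([32])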
2.2 Model Design and Implementation
We have developed a facial emotion classification model based on the Vision Transformer
(ViT) architecture, specifically using the pretrained version vit-base-patch16-224-in21k
developed by Google. The model is designed to extract hierarchical features from input
images and perform classification across seven basic emotions: sad, disgust, angry, neutral, fear, surprise, and happy.
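A minimal sketch of how such a model can be instantiated with the Hugging Face transformers library is shown below; the label ordering is illustrative, and the exact mapping in our trained checkpoint may differ.

    from transformers import ViTForImageClassification, ViTImageProcessor

    EMOTIONS = ["sad", "disgust", "angry", "neutral", "fear", "surprise", "happy"]

    processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
    model = ViTForImageClassification.from_pretrained(
        "google/vit-base-patch16-224-in21k",
        num_labels=len(EMOTIONS),                              # seven emotion classes
        id2label={i: name for i, name in enumerate(EMOTIONS)},
        label2id={name: i for i, name in enumerate(EMOTIONS)},
    )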
The Vision Transformer (ViT) model processes each input image by resizing it to 224×224
pixels and dividing it into non-overlapping patches of size 16×16 pixels. This results in
a total of 196 patches per image. Each patch is flattened into a vector and linearly
projected into a fixed-dimensional embedding. A learnable positional embedding is then
added to each patch embedding to preserve spatial structure across the image. These
196 positional patch embeddings are passed into a sequence of Transformer Encoder
layers. Each encoder layer consists of two main components: a Multi-Head Self-Attention
mechanism and a feed-forward neural network (MLP), both with layer normalization and
residual (skip) connections. Within the attention mechanism, each patch embedding is
transformed into a Query (Q), Key (K), and Value (V) vector. Attention scores are
computed for every pair of patches using the formula:
\[
\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V
\]
These scores determine how strongly one patch should attend to others, allowing the
model to capture global contextual relationships across the image. Multi-head attention
enables the model to learn from different perspectives simultaneously. The outputs of
all attention heads are concatenated and passed through the MLP to extract deeper and
more complex features. At the final stage, a special classification token is used as the
aggregate representation of the image, which is passed through a separate MLP head to
produce the predicted emotion class.
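To make the formula concrete, here is a minimal, self-contained sketch of single-head scaled dot-product attention in PyTorch. It is for illustration only; the actual model uses the multi-head implementation inside the pretrained ViT.

    import math
    import torch

    def scaled_dot_product_attention(Q, K, V):
        """Q, K, V: tensors of shape (num_patches, d_k)."""
        d_k = Q.size(-1)
        scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)  # pairwise patch similarities
        weights = torch.softmax(scores, dim=-1)            # attention weights per patch
        return weights @ V                                 # weighted sum of value vectors

    # 196 patch embeddings with a toy dimension of 64
    x = torch.randn(196, 64)
    out = scaled_dot_product_attention(x, x, x)
    print(out.shape)  # torch.Size([196, 64])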
2.3 Evaluation Metrics
We evaluate the model's effectiveness based on four metrics: Accuracy, Precision, Recall, and Top-K Accuracy.
2.3.1 Accuracy
Accuracy measures the overall correctness of the model when predicting facial expressions
across all emotion classes.
\[
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\]
In this context, True Positive (TP) represents the number of correctly predicted samples
for a specific emotion class, while True Negative (TN) refers to the number of correctly
identified samples that do not belong to that class. False Positive (FP) occurs when the
model incorrectly predicts a face as belonging to a certain emotion, and False Negative
(FN) happens when the model fails to detect an emotion it should have.
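Because the task has seven classes, these quantities are computed per class. The toy sketch below (a made-up 3×3 confusion matrix, not our results) shows how TP, FP, FN, and TN for one class are derived:

    import numpy as np

    # Toy 3-class confusion matrix: rows are true classes, columns are predictions
    cm = np.array([[50,  5,  2],
                   [ 4, 60,  6],
                   [ 3,  7, 40]])

    c = 0                               # class under evaluation, e.g. "angry"
    TP = cm[c, c]                       # class c predicted as c
    FP = cm[:, c].sum() - TP            # other classes predicted as c
    FN = cm[c, :].sum() - TP            # class c predicted as something else
    TN = cm.sum() - TP - FP - FN        # predictions not involving class c at all
    print(TP, FP, FN, TN)               # 50 7 7 113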
2.3.2 Precision
Precision is the ratio of correctly predicted samples for a given emotion class among all samples predicted to belong to that class.
\[
\mathrm{Precision} = \frac{TP}{TP + FP}
\]
2.3.3 Recall
Recall is the proportion of correctly predicted samples for a given emotion among all actual samples of that emotion.
\[
\mathrm{Recall} = \frac{TP}{TP + FN}
\]
2.3.4 Top-K Accuracy
Top-K Accuracy extends the traditional accuracy metric by evaluating whether the correct emotion label is among the model's top K predictions. For example, Top-1 Accuracy corresponds to standard accuracy, while Top-3 Accuracy considers a prediction correct if the true label is within the top three predicted classes.
This metric is especially valuable in facial expression recognition, where subtle differences
between emotions can lead to prediction ambiguity. Top-K Accuracy provides a more
forgiving and realistic assessment of model performance in such scenarios.
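These four metrics can be computed directly from the model's predictions, for example with scikit-learn. The sketch below uses random stand-in arrays for the test labels and the model's per-class scores, and the weighted averaging choice is an assumption rather than a documented detail of our evaluation.

    import numpy as np
    from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                                 top_k_accuracy_score)

    # Stand-ins for the real outputs: y_true are integer test labels (0..6) and
    # probs are per-class scores from the model (random here, for illustration).
    rng = np.random.default_rng(0)
    y_true = rng.integers(0, 7, size=100)
    probs = rng.random((100, 7))
    y_pred = probs.argmax(axis=1)

    accuracy = accuracy_score(y_true, y_pred)
    # Weighted averaging is one reasonable choice for imbalanced test classes.
    precision = precision_score(y_true, y_pred, average="weighted", zero_division=0)
    recall = recall_score(y_true, y_pred, average="weighted", zero_division=0)
    top3 = top_k_accuracy_score(y_true, probs, k=3, labels=list(range(7)))

    print(f"accuracy={accuracy:.4f}  precision={precision:.4f}  "
          f"recall={recall:.4f}  top-3={top3:.4f}")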
The system workflow diagram comprises the following stages; a code sketch of the complete loop follows the list.
1. Model Setup
The system begins by loading a fine-tuned Vision Transformer (ViT) model specifically trained for facial emotion classification. This model is capable of identifying various emotional states such as happy, sad, angry, surprise, and more. To ensure consistency and compatibility with the model's expected input, an image processor is applied. This processor includes a transformation pipeline that standardizes the input images by resizing, converting to tensors, and normalizing based on the model's training configuration.
2. Face Detector
The system employs OpenCV's Haar Cascade classifier to identify faces in real-time video frames. This lightweight and efficient face detection technique scans each frame from the webcam and locates facial regions. Once detected, each face is cropped from the original frame and prepared for emotion analysis in the next stage.
3. Emotion Prediction
The cropped face images undergo preprocessing steps such as resizing, normalization, and format conversion to align with the ViT model's input requirements. These processed images are then passed through the model, which predicts the most probable emotion category.
4. Visualization
Each detected face is highlighted with a bounding box, and the predicted emotion label is overlaid on the video frame, providing real-time visual feedback as described in the introduction.
5. Data Logging
To support further analysis and potential improvements, the system logs each detection event. It saves the coordinates of detected faces along with their corresponding emotion predictions into a CSV file. Additionally, cropped face images are stored locally, allowing users to examine individual samples or augment datasets for re-training or evaluation purposes.
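The sketch below ties these stages together, reusing the processor and model objects from the earlier model sketch. The webcam index, window name, file paths, and CSV layout are illustrative assumptions rather than excerpts from our application.

    import csv
    import os
    import time
    import cv2
    import torch
    from PIL import Image

    # Haar cascade face detector bundled with OpenCV
    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    os.makedirs("faces", exist_ok=True)           # folder for cropped face images
    log_file = open("detections.csv", "w", newline="")
    log = csv.writer(log_file)
    log.writerow(["x", "y", "w", "h", "emotion"])

    cap = cv2.VideoCapture(0)                     # default webcam
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

        for (x, y, w, h) in faces:
            crop = frame[y:y + h, x:x + w]
            face = Image.fromarray(cv2.cvtColor(crop, cv2.COLOR_BGR2RGB))
            inputs = processor(images=face, return_tensors="pt")   # ViT processor from earlier
            with torch.no_grad():
                logits = model(**inputs).logits
            emotion = model.config.id2label[int(logits.argmax(-1))]

            # draw the bounding box and label, then log the detection
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.putText(frame, emotion, (x, y - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
            log.writerow([x, y, w, h, emotion])
            cv2.imwrite(f"faces/{int(time.time() * 1000)}_{emotion}.png", crop)

        cv2.imshow("FER", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):     # press q to quit
            break

    cap.release()
    cv2.destroyAllWindows()
    log_file.close()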
Chapter 3
This section presents the performance of our model on the test set from the Facial Expression Recognition (FER) dataset. The evaluation was conducted using multiple performance metrics, including accuracy, precision, recall, and Top-K accuracy. These metrics help assess the effectiveness of the model in handling both common and rare emotion classes.
3.1 Results
The facial emotion recognition model demonstrates strong and consistent performance across all major evaluation metrics. It achieved an overall accuracy of 87.93%, with precision and recall of 87.91% and 87.93%, respectively. These metrics suggest that the model is not only accurate but also reliable in maintaining a balance between detecting true positives and minimizing false predictions.
Figure 3.1: Top-K Accuracy on the test set
Further insight is provided by the Top-K Accuracy analysis, which shows that the model’s
performance significantly improves when more prediction options are considered. While
the Top-1 accuracy is already high at around 88%, it climbs to 95% for Top-2, and
continues to increase, approaching 99% by Top-5. This indicates that the model can
still identify the correct emotion within its top predictions, which is especially useful in
applications where multi-label or ranked predictions are acceptable.
Figure 3.2: Confusion Matrix for test set
The confusion matrix offers a deeper look at class-wise performance. The model excels at detecting emotions such as "disgust" and "sad", which exhibit very few misclassifications. However, some confusion remains among similar expressions; for instance, "angry" is occasionally predicted as "happy" or "neutral", and "happy" sometimes overlaps with "surprise" or "fear". These confusions are likely due to overlapping facial features between these emotions, suggesting potential for improvement via better feature separation or emotion-specific data augmentation.
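For reference, a confusion matrix like the one in Figure 3.2 can be produced with scikit-learn. The sketch reuses the y_true and y_pred arrays from the metrics example and the EMOTIONS list from the model sketch, and assumes the label indices follow that same ordering.

    import matplotlib.pyplot as plt
    from sklearn.metrics import ConfusionMatrixDisplay

    # y_true / y_pred as in the metrics sketch; EMOTIONS as in the model sketch
    ConfusionMatrixDisplay.from_predictions(
        y_true, y_pred,
        labels=list(range(len(EMOTIONS))),
        display_labels=EMOTIONS,
    )
    plt.title("Confusion matrix on the FER test set")
    plt.tight_layout()
    plt.show()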
Model        Accuracy   Precision   Recall
ResNet-18    71%        71%         71%
ViT          87.93%     87.91%      87.93%
Table 3.1: Comparison of our ViT model with the ResNet-18 baseline
ViT performs significantly better than traditional CNN-based models such as ResNet-18 because Convolutional Neural Networks (CNNs) typically use small kernels to extract local features from an image. While this is effective for capturing fine-grained patterns, it limits the model's ability to understand long-range dependencies across spatial regions. In contrast, Vision Transformers divide the image into patches and process all patches simultaneously using self-attention, allowing the model to learn global spatial relationships, such as those between the eyes and mouth. In the context of facial emotion recognition (FER), emotional expressions are often represented by a combination of changes across multiple facial regions rather than localized features alone. Therefore, ViT offers a significant advantage by modeling these broader spatial interactions, leading to improved performance in recognizing complex emotions.
During the development of the facial expression recognition model, we encountered several challenges that negatively impacted performance. One of the major issues was class imbalance, where the number of images across different emotion categories was highly uneven. For example, the "Happy" class had significantly more images compared to classes like "Disgust" or "Surprise" (as shown in Figure X). This imbalance led the model to favor majority classes during prediction, resulting in lower accuracy for minority classes. For instance, the model was more likely to misclassify subtle emotions like "Disgust" as "Neutral" or "Angry" due to the lack of sufficient training samples for the underrepresented classes.
3.3 Model Limitations
Despite the strong performance of our model, there are still several limitations to consider. First, the model's accuracy is highly dependent on the diversity and representativeness of the dataset. In real-world scenarios, variations such as lighting conditions, occlusions, and head poses can negatively impact performance, especially if these variations are underrepresented in the training data.
Chapter 4
4.1 Conclusion
In this project, we introduced a Vision Transformer (ViT)-based model for real-time facial
expression recognition using live webcam input. The system captures frames from a live
video stream, detects faces, and classifies emotions into seven categories: Angry, Disgust,
Fear, Happy, Neutral, Sad, and Surprise. Experimental results indicate that the ViT
model performs well in recognizing emotions across varying facial expressions, achieving
strong evaluation results, including high accuracy (87.93%) and balanced precision and recall scores.
Although the model achieved promising results, several areas can be improved in future
work. Firstly, we aim to expand the dataset to include more diverse facial expressions,
ethnicities, lighting conditions, and age groups to enhance the model’s generalization and
robustness in real-world scenarios.
Finally, we plan to develop a user-friendly application that integrates our FER model
with live webcam input, allowing real-time emotion detection with visual feedback. This
application would serve as a practical demonstration of the model’s potential in domains
such as education, healthcare, and human-computer interaction.
References