Facial Expression Recognition System Using Weight-Based Approach
ABSTRACT
In this paper we present a method to identify the facial expressions of a user by
processing images taken from a webcam. This is done by passing the image
through three stages: face detection, feature extraction, and expression recognition.
Face detection is accomplished using Haar-like features, prioritizing reduced
detection time over accuracy. Facial features are then extracted from their
respective regions of interest using Gabor filters and classified based on facial
ratios and symmetry. We plotted a minimal number of facial points (12 points) and
compared the positions of these points on the neutral-expression image and the
current-expression image. This allowed us to recognize 8 expressions based on the
facial movements made, as opposed to only the 6 basic expressions identified in
previous work. We reached an accuracy of approximately 92% for expression
recognition when tested on the JAFFE database, a considerable level of
performance given the simplicity of the algorithm involved.
The similarity between two Gabor jets J and J', with amplitudes a_i, a'_i and phases \phi_i, \phi'_i over the 40 filter responses, is

S(J, J') = \frac{\sum_{i=1}^{40} a_i a'_i \cos(\phi_i - \phi'_i)}{\sqrt{\sum_{i=1}^{40} a_i^{2} \, \sum_{i=1}^{40} {a'_i}^{2}}}   (3)

This similarity measure returns real values in the range [−1, +1], where a value closer to +1 means a higher similarity between the input jets.

In order to search for a given feature on a new face image, a general representation of that facial feature point is required. As proposed in [6], the general appearance of each feature point can be modeled by bundling the Gabor jets extracted from several manually marked examples of that feature point (e.g., eye corners) in a stack-like structure called a “Gabor bunch” [10].
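As a minimal sketch of how equation (3) can be evaluated, assuming each jet is stored as 40 complex filter responses (so that a_i and phi_i are their magnitudes and phases), the similarity could be computed as below; the function names and data layout are illustrative, not taken from the paper.

    import numpy as np

    def jet_similarity(jet_a: np.ndarray, jet_b: np.ndarray) -> float:
        """Phase-sensitive similarity of two Gabor jets, as in equation (3).
        Each jet is assumed to be a length-40 complex vector: one response
        per Gabor filter (5 scales x 8 orientations)."""
        a, phi_a = np.abs(jet_a), np.angle(jet_a)
        b, phi_b = np.abs(jet_b), np.angle(jet_b)
        num = np.sum(a * b * np.cos(phi_a - phi_b))
        den = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
        return float(num / den)          # value in [-1, +1]

    def bunch_similarity(jet: np.ndarray, bunch: np.ndarray) -> float:
        """A 'Gabor bunch' is a stack of jets from manually marked examples;
        a candidate jet is compared against every jet in the bunch and the
        best similarity is kept."""
        return max(jet_similarity(jet, example) for example in bunch)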
4 ALGORITHM

4.1 Face Detection
1. Read the 640x480 input image.
2. Convert the image to grayscale.
3. Use a trained database to find the portion of the given image that gives the best results for face detection using Haar-like features.
4. Return the points at which faces are centered in the image.
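The paper uses its own trained database of Haar-like features; as a hedged sketch of steps 1–4, the snippet below substitutes OpenCV's stock frontal-face cascade and an assumed input file name.

    import cv2

    # Stand-in cascade; the paper trains its own Haar database.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    frame = cv2.imread("input_640x480.jpg")          # step 1: read the input image
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # step 2: convert to grayscale

    # step 3: Haar-like features locate the best face region(s)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    # step 4: return the points at which faces are centered
    centers = [(x + w // 2, y + h // 2) for (x, y, w, h) in faces]
    print(centers)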
… matrix back into the 27x18 dimension.
c. for each scale s from 1 to 5
       for each orientation j from 1 to 8
           result = fft2(double(W27x18), 32, 32);
           Features = ifft2(Gabor(s,j) * result, 27, 18);
       end for (j)
   end for (s)
The Gabor features are converted to a vector by reducing the matrix by a factor of nine for faster calculations. Hence, 2160 features are presented per image. All images and their Gabor features are saved, and the procedure is repeated for all positive and negative images in the database. The network is then initialized and trained using this image database to reach a low error rate.
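The loop above filters the 27x18 patch with each of the 5x8 Gabor kernels in the frequency domain (fft2/ifft2). The sketch below mirrors that idea in NumPy/OpenCV but uses direct spatial filtering for brevity, which is equivalent up to boundary handling; the kernel parameters are assumptions, not the paper's values. The 3x3 block reduction at the end is one way to obtain the 2160 features per image (27x18x40 / 9 = 2160).

    import cv2
    import numpy as np

    def gabor_features(patch: np.ndarray) -> np.ndarray:
        """Filter a 27x18 grayscale patch with a 5-scale x 8-orientation Gabor
        bank and reduce the result nine-fold (3x3 block averaging), giving
        27*18*40/9 = 2160 features. Kernel parameters are illustrative."""
        assert patch.shape == (27, 18)
        feats = []
        for s in range(1, 6):                       # 5 scales
            for j in range(8):                      # 8 orientations
                kernel = cv2.getGaborKernel(
                    ksize=(9, 9), sigma=1.5 * s, theta=j * np.pi / 8,
                    lambd=2.0 * s, gamma=0.5)
                resp = cv2.filter2D(patch.astype(np.float64), -1, kernel)
                # reduce 27x18 -> 9x6 by averaging 3x3 blocks
                small = resp.reshape(9, 3, 6, 3).mean(axis=(1, 3))
                feats.append(small.ravel())
        return np.concatenate(feats)                # length 40 * 54 = 2160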
4.2.2. Detection of any one feature (image_scan)
Create two templates (C2, C3) of the feature to be detected.
Produce convolutions of the templates with the given image (C1):
    Corr_1 = double(conv2(C1, C2, 'same'));
    Corr_2 = double(conv2(C1, C3, 'same'));
Save the combined state of the two convolution results in a cell matrix; 0 would indicate that no detection is possible and 1 would indicate a possible detection.
while there exists (x, y) in Cell whose state is 1, do
    Let the element be (i, j).
    Cut out the 27x18 image centered at this point.
    Invalidate the state so it is not detected again.
    Find the simulated network value calculated using the net matrix and the given 'imcut' image.
    Save the point (i, j) if the simulated network value is greater than the required threshold.
end while
Find the centroid of the area covered by the group of points that had acceptable network values; this indicates the feature detection point.
Return the value of all the possible feature positions detected.
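A hedged sketch of the image_scan idea follows: candidate pixels are proposed by convolving two templates of the feature with the ROI, and each surviving candidate's 27x18 patch is scored; the centroid of accepted candidates gives the detected point. The score_patch callable stands in for the paper's simulated-network value, and the rule combining the two convolutions into the cell matrix is an assumption.

    import numpy as np
    from scipy.signal import convolve2d

    def image_scan(C1, C2, C3, score_patch, threshold):
        """C1: grayscale ROI image; C2, C3: two templates of the feature.
        score_patch: callable returning the trained network's response for a
        27x18 patch (a stand-in for the paper's simulated network)."""
        corr_1 = convolve2d(C1.astype(float), C2.astype(float), mode="same")
        corr_2 = convolve2d(C1.astype(float), C3.astype(float), mode="same")

        # Cell matrix: 1 marks a possible detection, 0 rules it out
        # (the combination rule used here is an assumption).
        cell = ((corr_1 > corr_1.mean()) & (corr_2 > corr_2.mean())).astype(int)

        accepted = []
        ys, xs = np.nonzero(cell)
        for i, j in zip(ys, xs):
            cell[i, j] = 0                           # "invalidate the state"
            top, left = i - 13, j - 9                # 27x18 patch centred at (i, j)
            if top < 0 or left < 0 or top + 27 > C1.shape[0] or left + 18 > C1.shape[1]:
                continue
            imcut = C1[top:top + 27, left:left + 18]
            if score_patch(imcut) > threshold:       # simulated network value
                accepted.append((i, j))

        if not accepted:
            return None, []
        centroid = tuple(np.mean(accepted, axis=0))  # centroid = feature point
        return centroid, accepted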
4.2.3. Collective detection of all features
Create Regions Of Interest (as shown in Fig 3).
Apply ratios to classify among the detected points, as seen in the figure.
Apply symmetry to synchronize some feature point positions.
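The ratio and symmetry rules are not spelled out numerically in this excerpt, so the sketch below only illustrates the general idea under stated assumptions: left/right feature pairs are made consistent by mirroring their horizontal offsets about the nose, and the pairing of point names is hypothetical.

    import numpy as np

    def enforce_symmetry(points, nose):
        """points: dict mapping feature names (e.g. 'LS1', 'LS2') to (x, y).
        Left/right pairs are averaged in their distance from the nose axis so
        that both sides stay symmetric; the pairing below is an assumption."""
        pairs = [("LS1", "LS2"), ("BT1", "BT2"), ("BS1", "BS2")]
        nx = nose[0]
        for left, right in pairs:
            if left in points and right in points:
                dx = (abs(points[left][0] - nx) + abs(points[right][0] - nx)) / 2
                y = (points[left][1] + points[right][1]) / 2
                points[left] = (nx - dx, y)
                points[right] = (nx + dx, y)
        return points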
4.3 Expression Recognition
Weights are assigned to each expression based on the movements made on the face when a neutral image of the user is compared with an image of the user making an expression. Some movements are given negative weights, hence decreasing the probability of that expression. The resulting sum after these weights are applied gives the overall probability of that expression.
1. Positive weights are assigned according to the priority of movements for each expression, as given in Table 1.
2. Negative weights are assigned when the movement made rules out the possibility of the expression in question.
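As a hedged sketch (the actual weight values live in Table 1, which is not reproduced in this excerpt), expression scoring can be read as a signed weighted sum over the movements of Table 2, computed by comparing point positions on the neutral and current images. The weights, the movement subset, and the point name "N" for the nose are placeholders.

    import numpy as np

    def movements(neutral, current):
        """Differences in the Table 2 movements between the neutral and the
        current image. Both arguments are dicts mapping point names to (x, y)."""
        def dist(a, b, pts):
            return float(np.hypot(pts[a][0] - pts[b][0], pts[a][1] - pts[b][1]))
        return {
            "NLB":    dist("N", "LB", current)    - dist("N", "LB", neutral),
            "LS1LS2": dist("LS1", "LS2", current) - dist("LS1", "LS2", neutral),
            "LSN":    dist("LS1", "N", current)   - dist("LS1", "N", neutral),
            "BTN":    dist("BT1", "N", current)   - dist("BT1", "N", neutral),
        }

    # Placeholder weights; the paper's actual priorities are given in Table 1.
    WEIGHTS = {
        "smile":    {"LS1LS2": +2.0, "NLB": +1.0, "BTN": -1.0},
        "surprise": {"BTN": +2.0, "NLB": +1.5, "LS1LS2": -0.5},
    }

    def recognise(neutral, current):
        m = movements(neutral, current)
        scores = {expr: sum(w * m.get(k, 0.0) for k, w in table.items())
                  for expr, table in WEIGHTS.items()}
        return max(scores, key=scores.get)       # expression with the highest score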
5 FUNCTIONAL ARCHITECTURE

… into a cascade of classifiers. An input window is passed from one classifier in the cascade to the next as long as each classifier classifies the window as a face. The threshold of each classifier is set to a high detection rate; the rates are such that the later stages cover the harder detection procedures. Once a face is detected, the Haar classifier crops the image for ROI detection (see Fig. 2).

5.2. ROI Detection
The next step in the automatic facial point detection is to determine the Region Of Interest (ROI) for each point, that is, to define a large region which contains the point that we want to detect. To achieve this we first detect the nose and then calculate the regions of interest for the other features relative to the nose, as it forms the central point of the image. This can be seen in Fig. 3.

5.3. Feature Extraction
The feature vector for each facial point is extracted from the 27x18-pixel image patch centered on that point. This feature vector is used to learn the pertinent point's patch template and, in the testing stage, to predict whether the current point represents a certain facial point or not. This 27x18-pixel image patch is extracted from the grayscale image of the ROI and from 40 representations of the ROI obtained by filtering the ROI with a bank of 40 Gabor filters at 8 orientations and 5 spatial frequencies. Thus, 486x40 = 19440 features are obtained, which are reduced by a factor of nine so that 2160 features represent one facial point. Each feature contains the following information: (i) the position of the pixel inside the 27x18-pixel image patch, and (ii) whether the pixel originates from a grayscale or from a Gabor-filtered representation of the ROI.

6 RESULTS
Movement   Meaning
NLB        Lip's vertical movement
LS1LS2     Lip corners' movement; a positive value implies widening of the mouth
LSN        Distance of the lip corners from the nose
ETN        Distance of the eye top from the nose
ETEB       Width of the eyes
BTN        Amount by which the brow raises/lowers
BTET       Distance between the eye and the top of the eyebrow

Table 2: Movements recorded in the graph.

Fig 6: Inaccurate detections of the lip bottom and eyebrow tops, on a larger scale than the other points produced in the image.
Feature point             Detection rate
Left eye bottom (LEB)     89%
Right eye top (RET)       83%
Right eye bottom (REB)    86%
Left brow top (BT1)       81%
Right brow top (BT2)      82%
Left brow side (BS1)      93%
Right brow side (BS2)     90%
Left lip corner (LS1)     79%
Right lip corner (LS2)    80%
Lip bottom (LB)           84%

Table 4: Facial feature point detection rates for the JAFFE database.

7 CONCLUSION

Results obtained for expression recognition were 92% accurate and the feature point detections were 86% accurate, as inferred from the individual accuracies given in Table 4. In future work we will investigate the effect of using a reduced number of features for classification. We also plan to conduct extensive experimental studies using other publicly available face databases. Additionally, we plan to use better methods of classification and a larger image database, which should increase the performance of facial feature detection.
8 REFERENCES