
Affective Computing

Prof. Jainendra Shukla


Prof. Abhinav Dhall
Department of Computer Science and Engineering
Indraprastha Institute of Information Technology, Delhi

Week - 06
Lecture - 01
Emotion Analysis with Text

Hello and welcome. I am Abhinav Dhall from the Indian Institute of Technology Ropar.
Friends, today we are going to discuss the analysis of a user's emotion through the text
modality; this is part of the ongoing lecture series in Affective Computing. So, till now we
have discussed how we can use the voice signal of a user, the facial expressions, and the head
pose of a user to understand the emotion which either the user is feeling or the one which is
perceived.

Now, there is another modality which is very commonly used in the affective computing
community, and that is text; the reason is quite obvious. We see so many documents
around us, and we see conversations happening on chatting platforms. You would have seen
that billions of users on platforms such as WhatsApp, Telegram and so forth are conversing
with text.

So, how can we understand the affective state of a user when the communication medium is
text? In contrast to the analysis with faces and voice, what is really interesting to note
with text is that it is not just about what is being communicated by a user in real time;
it is also about, let us say, looking at the emotion conveyed in a chapter of a book, or the
emotion conveyed by a particular comment which was posted a few months ago on a website.

So, both online and offline text is being analyzed for understanding the emotion in different
use cases.

(Refer Slide Time: 02:24)

So, moving forward in this lecture, first we are going to talk about why text is important as a
modality for understanding the affect of a user. Then we are going to look at some of the
applications. From this we will go a bit back in time: we will explore how emotion has
been conveyed through typography, how different fonts are used to convey different
emotions to a user when he or she is reading certain text.

And then we will move on to the databases. So, we will look at some of the resources which
are available in the affective computing community where text has been labeled for its
perceived affect.

(Refer Slide Time: 03:15)

Now, let us say we are looking at a typical conversation, and this conversation is
between a user and a virtual avatar. So, we can have the camera modality, and we can have the
microphone, right. Now, when you are recording what the person is saying, you can also use
speech to text in this pursuit and understand the lexical content.

What is being said conveys the emotion; how is it being said; is the communication more
metaphorical, or are there some implicit meanings which are being conveyed, right? So, there
are several applications which are based on this construct of how emotion is conveyed
through text.

Now, typically, when we talk about emotions in text, friends, you will see that in the literature
the keywords emotion and sentiment are sometimes used interchangeably. However, there
is a simple yet extremely important difference. Emotion is what a user is feeling: either
the user can tell his or her emotional state, or it is what is perceived by another person
who is communicating with the user, the perceived emotion.

Now, with respect to text, let us say you are reading a comment about a certain product on a
website. In this case, we are interpreting, as a third person, the affect conveyed by the
written text in the comment. This is the sentiment. So, what is the sentiment which a
particular line of text is communicating to the user who is reading it?

Now, this type of sentiment analysis in text is based on categorizing the text according
to its affective relevance. So, you read a comment on a website regarding a product; after
you finish reading, you can actually tell whether the user who was writing his or her
feedback about that product felt positive or negative about it, right.

Further, this is not only limited to comments on these online platforms; it is also very
commonly used in understanding opinion for markets, right. Let us say a news article comes
in about a certain company, about a certain product. Now, what is the sentiment which is
conveyed in that news article?

And with that, we can try to understand, let us say, what the general public opinion is
about the particular context regarding which that news article is written. Further, emotion
is also commonly used in computer-assisted creativity. An example of that is personalized
advertisement, which can be created by understanding the affective state of the user; the
same can then be conveyed in the form of text to the user.

So that the appropriate emotion is conveyed in this text. Now, this can lead to a more
persuasive communication with the user, right. Let us say the text which you read in an
image-based advertisement uses words which convey, let us say, the excitement which a
product could bring to a user, right. So, that can be conveyed within how the framing of
the text is done. Further, for adding expressivity in human-computer interaction, we use
the emotion which is reflected through text.

What does that mean? Let us say a user is interacting with a machine, and the machine is going
to give verbal feedback. On the user's side there is a speech-to-text conversion: the
speech-to-text conversion of the user's utterance gives us the lexical content of what the
user was saying. And when the machine has to give its verbal feedback, it will do
text-to-speech synthesis.

What that means is that the appropriate text which needs to be replied back to the user can
have emotional words added to it, so as to convey the appropriate emotion, right. That
means word selection for conveying the correct affect. And then, of course, from the
understanding perspective, the speech-to-text of what the user says is extremely important.

And in both these cases, text-based affect analysis comes into the picture. You need to pick
up the right words to convey to the user, and you need to understand the text which the user
spoke to you. The same is also, friends, applicable in a use case such as a question answering
system. So, how do we convey the emotion behind, let us say, an answer which a machine is
giving to a user, right?

So, that would be based on the word selection and so forth. That means your system needs to
pick up the right words which will convey the right emotion.

(Refer Slide Time: 09:31)

Now, from this, let us move to how emotion has been conveyed, over the past decades, in the
field of typography, ok. Typically, typographers would use minute details like these to
shape how the content is interpreted in terms of the emotion which is supposed to be
conveyed by a text.

Now, friends, if you notice the letters uppercase A and lowercase a in two different fonts,
Georgia and Verdana, you can actually see here that the way these characters are drawn in
the two fonts is, in a very subtle way, going to convey the affect which is supposed to be
understood by the user. This is not only limited to the shape of the characters, but to
their placement as well.

Now, observe this example here. In this case, you notice that there is a difference in the
gaps between the letters. So, if one wanted to read this stanza, it would of course not
only be a bit difficult to read out, but it would also be non-trivial to understand, at times,
the affect which is conveyed. Now, if we improve this in the second example, wherein at least
the gaps are more standardized, we see that the reading, and then further the interpretation,
improves.

However, when we introduce symmetry with respect to these gaps, in the third case, notice
how trivial it is to read: 'and poetry must be collected and published to lay the base for a
modern culture', and so forth, right. So, the emotion is now conveyed more clearly, and the
content is also understood by the user.

(Refer Slide Time: 11:55)

Now, there is this very interesting work which was proposed in 1924 by Poffenberger and
Barrows. What they are saying is that the way in which a line is drawn, the style of the line,
would represent the emotion which is supposed to be conveyed by, let us say, a word which is
written using the style of those lines.

Now, notice this, friends. Here you have different styles in which lines can be drawn; for
example, A to R are the different styles. Now, with the curvature and the frequency of
change of the line, you could have, let us say, smooth lines, or you could have these
zigzag, seesaw-pattern lines.

If you use these, you can convey different types of emotions, ok. For example, when we want
to convey an angry, furious kind of emotion, we can use these seesaw, angular,
forward-sloping lines; and when we use these lines, these shapes, we create the words. For
example, here you see 'angry': you can actually understand that the intensity of the
emotion which is supposed to be conveyed by that word varies across these two different
iterations.

So, of course, this is very subjective. I would say that the second iteration of 'angry',
to me, is far more intense as compared to the first iteration, where smoother lines are
used, right. So, if you are using these angular patterns, you are able to convey a far
more intense angry emotion.

Now, let us pick up the sad emotion, right. In this case, if you have a gentler curve which
is sloping down, and then you use the suggested typeface equivalent, you can actually
observe that it is slanting backwards and sloping down. So, text in this style would be
perceived a bit more seriously, in keeping with the gravity of the content it is conveying.

(Refer Slide Time: 15:26)

Now, if you were to compare that with happy and friendly, then it is observed that if you
have these gentle, curved, balanced lines, which are not sloping down like sad, then this
emotion is more easily conveyed, right. So, just by changing the typeface, the orientation
of the basic components which come together for the word formation, we can have the user
interpret the emotion from the text easily, and this is how you can convey the emotional
content clearly to the user.
Now, let us look at a relatively recent work. In their work from 2008, Juni and Gross got a
set of labelers (Refer Time: 15:40) who added ratings, ok. These ratings were given to
articles as being, let us say, funnier and angrier, or in other words more satirical, and
they wanted to compare the Times New Roman font with the Arial font.

Now, this kind of questionnaire was given, and what you can actually see here is, let us say,
the question 'which looks the happiest', with these three options, right. So, of course,
this is subjective, but the orientation in which the text is written, which is of course
the font style, was shown to have an effect on the user.

So, that means the same word would convey its emotional content a bit differently depending
on the font. They found, for example, that when you look at 'calm', the word calm itself is
better conveyed in the second version, because of the stability which is there in the
letters, right.

(Refer Slide Time: 16:58)

Now, let us look at another very relevant and interesting work. In this work, Larson and
Picard wanted to understand aesthetics and its effect on reading, ok. Now, what do we have
here? We have two versions of the same text, and if you observe closely, they are in
different font styles.

What they found was that good typography affects the perceived affect and the real affect
of the person, right. So, it could elevate the mood when you are reading, let us say, this
article as compared to the same article in a different style.

(Refer Slide Time: 17:52)

So, to this end, they looked at two tasks. The first, friends, is the relative subjective
duration (RSD). In this, the participants' perception of how long they had been performing a
task was evaluated. What was seen is that under poor typographic conditions, the
participants underestimated the duration of the task by 24 seconds on average.

So, let us say they had been performing a task for n seconds, and the text which they were
reading as part of the task was given with poor typographic conditions. They would say: well,
you know, the duration for which I have been working on this task is roughly 24 seconds less
than the actual duration. Why? Because they had more cognitive load due to the poor
typography.

However, when text in the good typography condition was presented, they underestimated the
duration by an average of 3 minutes and 18 seconds, which means the reading was far easier,
right; and that is why the perception of time passing by was faster. Now, what we learn is
that good quality typography is responsible for greater engagement during the reading task:
the user is more engaged and it is a more immersive experience.

Now, they also did a candle task. This is an old cognitive task from 1945, proposed by
Duncker. The task is: you have a candle, you have matchsticks, and you have some pins. You
want to take this candle and attach it to the wall in such a way that when you burn the
candle, the wax does not fall on the table, right.

Now, in this particular case, from the study, 4 of the 10 participants who read the
instructions in the good typographic condition solved the task, while 0 of the 9 participants
in the poor typographic condition did. So, of course, the participants who were given the
same content in good typographic conditions performed better in this case.

(Refer Slide Time: 20:32)

Now, with respect to how emotion is represented: in the beginning, I have already used terms
such as the positive or negative emotion which is perceived, let us say, from a comment. But
in a lot of cases, when you are looking at text, this might not be enough, ok. So, we see
that in works looking at the text modality for emotion, positive and negative are very
commonly used.

But fine-grained emotion annotation, even though it is more effective, is less used in these
works; of course, there are very obvious reasons for that, right: data annotation and so
forth. Now, let us take an example, friends, with two emotions, fear and anger. Both of
these express a negative opinion of a person, let us say, towards something.

But the latter is more relevant in marketing or in socio-political monitoring of the public
sentiment, right. So, anger is more relevant in marketing or socio-political monitoring; but
both fear and anger are negative, which means we need a more fine-grained representation of
emotion when we are looking at text.

It has also been shown that when people are fearful, they tend to have a more pessimistic
view of the future, while angry people tend to have a more optimistic view, right. Even
though both fear and anger are negative in nature, the intention of the people is different,
right. So, when we are trying to understand users' affective state through the text modality,
fine-grained annotation is clearer and more useful.

Further, fear generally is a passive emotion, while anger is more likely to lead to an action,
right. So, there is a very fundamental difference in what a user would do after experiencing
these two emotions. Therefore, you would like to have a more fine-grained representation,
not just simply positive or negative.

(Refer Slide Time: 23:12)

Now, the question, of course, friends, is: when you are looking at text and you want to go
for fine-grained emotion representation, do we go for categorical classes, or do we look at
continuous emotion representation through dimensions such as valence, arousal and dominance?
Now, the dimensional model has been used much less in the emotion detection literature, but
in recent works it has been shown to be more promising in the case of text data.

Further, if you look at an example, it is essential to identify the difference between fear
and anger, right. Because fear and anger lead to different consequences, we would, let us
say, like categorical labels; but because fine-grained categorical labeling is difficult,
the same can also be represented on the continuous dimensions.

Now, when you look at fear: on the valence axis its representation is negative; on the
arousal axis the intensity can be low or high; and on the dominance axis fear is submissive.
Now, compare that with the anger emotion. On the valence axis it is negative, so this is the
same as fear; on the arousal axis it could likewise be of low or high intensity; but on the
dominance axis anger is dominant. Beyond this, where exactly fear and anger are placed on
the valence and arousal axes would have an effect on better understanding the emotional
state of the user. A small sketch of this dimensional placement follows below.
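To make the dimensional idea concrete, here is a minimal Python sketch of placing emotions as points in valence-arousal-dominance (VAD) space. The coordinates are illustrative values in [-1, 1], loosely in the spirit of published PAD norms; they are my assumption, not numbers from the lecture slides.

```python
# Emotions as points in valence-arousal-dominance (VAD) space.
# The coordinates below are illustrative, not taken from the lecture.
VAD = {
    "fear":  (-0.64, 0.60, -0.43),  # negative, aroused, submissive
    "anger": (-0.51, 0.59,  0.25),  # negative, aroused, dominant
}

def vad_distance(e1, e2):
    """Euclidean distance between two emotions in VAD space."""
    return sum((a - b) ** 2 for a, b in zip(VAD[e1], VAD[e2])) ** 0.5

# Fear and anger share valence and arousal but separate along dominance:
print(round(vad_distance("fear", "anger"), 3))
```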

(Refer Slide Time: 25:24)

Now, in the same direction, let us look at some examples, ok. These examples will convey to
us how complex emotions can be when we are analyzing the emotion conveyed by a user in
the text modality. So, let me read a statement, friends: "The cook was frightened when he
heard the order, and said to Cat-skin, You must have let a hair fall into the soup; if it be so,
you will have a good beating".

Now, here you have a complex statement whose sub-parts also reflect different emotions;
however, it is mainly expressing fear. Now, let us look at the second example. The
statement is: "When therefore she came to the castle gate she saw him, and cried aloud for
joy."

Now, if you notice the content, you have the word 'cried' and you have the word 'joy'. Is it
sad? Is it happy? We can only understand the emotional meaning when we look at the whole
statement and its semantics, right. So, this statement is actually an expression of joy,
even though the word 'cried' is there in it, right. This reflects how complex emotion
representation in text can be.

Now, let us look at another example, friends. The statement is: "Gretel was not idle; she ran
screaming to her master, and cried: You have invited a fine guest", ok. Now, if you look at
the emotion, this is actually an expression of anger and disgust. However, notice the end
of the statement, 'you have invited a fine guest': there is a bit of anger and there is a
bit of sarcasm as well, but it is only understood when you analyze the whole statement,
right.

So, you can actually see the relation between some words which are towards the end but
which are related to the words at the beginning, right. The emotion is conveyed when you
start linking the words, and this represents the complexity of emotions when you are looking
at text.

(Refer Slide Time: 27:58)

Now, emotions can be implicit, right. Emotion expression is very context-sensitive and
complex; we have seen that in these three examples. It has been noted that a considerable
portion of emotion expressions are not explicit: you would be trying to understand the
conveyed emotion implicitly through the text, ok. For example, 'be laid off', or another
statement, 'go on a first date'.

Now, these contain emotional information without using any explicit emotional lexicon,
right. So, there is an implicit emotion representation here. In 2009, Erik Cambria and others
proposed an approach to overcome this issue by building a knowledge base which merges
common-sense and affective knowledge.

For example: spending time with friends causes happiness; getting into a car wreck makes one
angry, right. So, by adding common sense and affect knowledge, we can say: if this happens,
this would be the generic affect associated with it.

(Refer Slide Time: 29:26)

So, emotions can also be represented metaphorically, right. If you see, expressions of many
emotions, such as anger, are metaphorical; therefore, they cannot be assessed from the
literal meaning of the expression. Listen to these examples, friends: 'he lost his cool', or
'you make my blood boil'. These are metaphorical in nature: the blood is not actually
boiling; it is the emotion of anger which is conveyed, right.

Now, it is difficult to create a lexical or machine learning method which can identify
emotions in such text without first solving the problem of understanding the metaphorical
expression, right. So, for these kinds of statements, we need to understand the metaphor in
order to be able to solve the riddle of which emotion is being conveyed by the text.

Now, emotions are very complex conceptual structures, and this structure can be studied by
systematic investigation of expressions that are understood metaphorically. So, given that
they are such a complex construct, we need to understand the expressions which are
understood metaphorically.

That means, of course, that if you wanted to create a system which could understand
implicitly presented emotions and metaphorical ones, you would need datasets, and in those
you would need samples where emotion is presented metaphorically.

(Refer Slide Time: 31:12)

Now, in the same direction, friends, the problem of detecting emotions from text is a
multi-class classification problem. This is simply because of the great complexity of human
emotion: how people present emotion, what they speak, and how they speak. Of course, this
is all about the inter- and intra-subject variability.

Then, some speakers would make a statement in which the emotion is implicitly represented,
of which we saw some examples. Some subjects will very commonly use metaphors, which means
that first we need to understand the metaphor itself.

And then there is the importance of context in identifying emotions, right: you need to
understand in what context something was said. Is the context a comment about a product, or
is the context that someone is orating, telling about how their day was? In that particular
use-case scenario, you would then be understanding the emotion, right.

So, the system needs to understand the context as well. Further, as we have seen with
voice-based and face-based emotion analysis, there is cross-cultural and intra-cultural
variation of emotion. Even if you are looking at the same language, different people will
express the same content in different manners.

Expressing the same content in different manners means there is a lot of variation, even
when, let us say, the subjects are trying to convey the same emotion, but in different
styles, right. So, this actually makes the task complex, and that means we will be required
to have a multi-class approach here.

And then, of course, there are a lot more challenges, right: when you are talking about text,
the moment you go to different languages, emotions will be represented differently, metaphors
will be different, and phrases will be different, so the systems will need to adapt to these
different language and cultural scenarios. So, there are a large number of challenges when we
are looking at text-based emotion analysis.

(Refer Slide Time: 34:05)

Now, friends, let us look at some of the standard resources which are available for
text-based emotion understanding. There is a dataset called ISEAR, the International Survey
on Emotion Antecedents and Reactions, by Scherer and Wallbott from 1994.

In this, 3000 people were asked to report situations in which they experienced each of the
seven major emotions, so it is a categorical approach, and they were also asked how they
reacted to them, right. Essentially, the user recounts an event and then writes about it;
while writing, they are recalling the event, and so the emotion is elicited.

Now, the other one, friends, is the EmotiNet knowledge base. In this, the authors started
from around 1000 samples from the ISEAR dataset and clustered examples within each emotion
category based on language similarity.

(Refer Slide Time: 35:16)

Now, another dataset is Alm's annotated fairy tale dataset, proposed in 2005. In this dataset,
there are 1580 sentences from children's fairy tales, annotated with the categories from
Ekman's classical categorical emotion representation. Then there is the SemEval-2007 data by
Strapparava and Mihalcea, in which we have 1250 news headlines extracted from news websites,
again annotated for the categorical emotions.

(Refer Slide Time: 35:55)

Now, let us look at another resource, friends. This is also a very commonly used resource,
called the Affective Norms for English Words (ANEW). ANEW contains 2000 words which are
annotated based on the dimensional model, along the dimensions of valence, arousal and
dominance.

So, here, for example, you see the words, with the mean valence, arousal and dominance
presented for each particular word, and in the brackets you see the standard deviation.
These were collected from a large number of labelers, and that is how you have the mean and
the standard deviation.

(Refer Slide Time: 36:45)

Now, if we observe the plot of pleasure versus arousal, where the data points are for men
and women, you notice that these are quite inter-related; they overlay each other. However,
you will see that the data coming from the male labelers lies a bit more towards the outside
of the female labelers' data.

So, essentially, for the female labelers the arousal and pleasure values are a bit more
concentrated. This is just to show you that there is a bit of a difference in the labels
based on gender as well. Of course, when you have n labelers who are labeling the perceived
affect of certain statements, text in this case, we are going to observe some differences.

And that is why we would like to compute basic statistics around the labels which are
assigned by different labelers so that we can have a final label for the statement.

(Refer Slide Time: 37:51)

Another resource, friends, is SentiWordNet. This is a lexical resource that focuses on the
polarity of subjective terms, so it is a bit of a different approach. What we are saying is
that, for opinion mining, you can have an objectivity score, which is Obj = 1 - (Pos + Neg),
ok. The rationale is as follows: a given text has a factual nature if there is no positive
or negative opinion present in it, ok.

Now, let us look at an example. Here you see the positive score, the negative score, and the
resulting opinion. If you pick a phrase, for example 'a short life': in this case there is a
word with a bit of a negative connotation, right. So, nothing positive, P equals 0, negative
is 0.125, and then of course we subtract: 1 - (0 + 0.125) = 0.875. So, you know, this is the
opinion.

Let us take another example, friends. In this case, let us say the phrases are 'he was short
and stocky', 'short in stature', 'a short smokestack', right. So, you have a more negative
connotation, and the objectivity score is 0.25, as compared to the earlier case where you had
0.875. So, this is the opinion carried by the statement.
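As a minimal sketch of the objectivity computation just described (the helper function name is mine; the scores follow the lecture's two examples):

```python
# SentiWordNet-style objectivity: Obj = 1 - (Pos + Neg).
def objectivity(pos, neg):
    return 1.0 - (pos + neg)

print(objectivity(0.0, 0.125))  # "a short life" -> 0.875
print(objectivity(0.0, 0.75))   # assuming Pos=0, Neg=0.75, this
                                # reproduces the slide's 0.25
```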

(Refer Slide Time: 39:35)

Now, let us look at another standard resource: the NRC Word-Emotion Association Lexicon.
Collected in 2013, this is based on prior work which largely focused on positive and
negative sentiment, ok. So, the authors said: well, let us crowdsource and annotate 14000
words in the English language. When you are crowdsourcing, you can have a large number of
labelers annotating the data.

They also have different versions of the lexicon in over 100 languages, translated using
Google Translate, and this, friends, is the heat map, ok. If you look at the sentiment, it
just shows the overall label frequencies: this one has the highest frequency and this one
has the lowest.

(Refer Slide Time: 40:30)

Now, for some other lexicon resources: there is the Linguistic Inquiry and Word Count
(LIWC), another very commonly used resource, proposed in 2001. It contains 6400 words which
were annotated for emotions, and each word or word stem belongs to one or more word
categories; there are more than 70 categories. An example is the word 'cried', which is part
of four word categories: sadness, negative emotion, overall affect, and past-tense verb.

Another one, friends, is WordNet-Affect, proposed in 2004. It was developed from WordNet
through the selection and tagging of a subset of synsets representing affective meanings, so
it is a subset of the WordNet dataset. Then there is DepecheMood, which was proposed in 2014;
this one is again based on crowdsourcing, to annotate 35000 words, so it is a somewhat
larger resource as compared to the earlier ones.

(Refer Slide Time: 41:45)

Now, typically, for categorical models the NRC data is used as the lexicon; when we are
talking about the continuous dimensions, ANEW is used as the lexicon. Further, friends, the
emotion of a text can be assigned based on closeness: the cosine similarity of its vector
to the vectors for each category. So, this is one way.

You are saying: well, I have a representation for the text; now I am going to use the cosine
similarity metric, and I am going to assign a category or dimension. So, you have some
samples, some data points, which already have a category or some dimensional emotion
intensity assigned to them, and a new sample comes in.

You compute the cosine similarity between this sample and the samples which already have
labels, and you assign the label of the one which is closest, right. So, the closest sample
from the data lends its labels to the new sample.

Now, this simply means that you define a similarity between a given input text I and an
emotion class E_j. The categorical classification is then formally represented as taking the
class with the maximum similarity, E* = argmax_j sim(I, E_j); if that maximum similarity is
greater than a threshold, you say it is a non-neutral emotion.

If it is neutral, then essentially you are saying that the similarity between the input text
and every emotion class is less than the threshold, right. So, you can define the class in
such a manner. But it simply means two things: you need a representation for the text, and
you need the cosine similarity metric (you can use some other metric for computing the
closeness as well).

And you keep a threshold where you say: well, if the similarity is larger than this particular
threshold, there is an emotion conveyed; if the similarity is less than the threshold, then
this is a neutral statement. So, in this way you can have unsupervised learning for
understanding emotion from text; a minimal sketch of this idea is given below.
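Here is a minimal sketch of this unsupervised, thresholded-similarity scheme. The toy class vectors and the threshold value are illustrative assumptions; in practice the vectors would come from a lexicon such as NRC or ANEW.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def classify(text_vec, class_vecs, threshold=0.3):
    # Pick the emotion class with maximum similarity; fall back to
    # "neutral" when even the best similarity is below the threshold.
    sims = {label: cosine(text_vec, v) for label, v in class_vecs.items()}
    best = max(sims, key=sims.get)
    return best if sims[best] > threshold else "neutral"

class_vecs = {
    "joy":   np.array([0.9, 0.1, 0.0]),  # illustrative lexicon vectors
    "anger": np.array([0.1, 0.9, 0.2]),
}
print(classify(np.array([0.8, 0.2, 0.1]), class_vecs))  # -> "joy"
```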

So, friends, with this we reach the end of lecture 1 of Text Based Emotion Analysis. What we
discussed was: why text based analysis is important; what the applications are; how the
subtle details in typography, how fonts are represented and how the basic components which
form the lines (Refer Time: 44:45), when they come together, represent different emotions to
the user and have an effect in inducing emotions in the user. And later on, we discussed some
of the most commonly used resources for text based affect understanding.

Thank you.

Affective Computing
Prof. Jainendra Shukla
Prof. Abhinav Dhall
Department of Computer Science and Engineering
Indraprastha Institute of Information Technology, Delhi

Week - 06
Lecture - 02
Emotion Analysis with Text

Hello and welcome. I am Abhinav Dhall from the Indian Institute of Technology, Ropar, and
friends, today we are going to discuss the analysis of emotion from text. This is the
second lecture on text analysis for emotion recognition in the Affective Computing series.

So, in the last lecture we discussed the importance of analyzing emotion from text; the text
could be in the form of a tweet, or could be a document. Then we discussed how the simple
organization of text, in terms of the font and the curvature of the lines which constitute
the font, essentially typography, affects the perception of emotion.

(Refer Slide Time: 01:11)

And today, first I am going to discuss with you a very important building block of a text
based emotion recognition system, which is the features: how are we going to understand the
context, the linguistics; how are we going to represent the text in a tweet or a document?

Later on, I am going to discuss with you a few methods, proposed in the academic community,
which predict emotion from text.

Later on, we are going to switch gears a bit and discuss how emotion recognition can be
performed when a conversation is happening. When two people are, let us say, conversing
through text, the perception of the emotion and the understanding of the emotional state of
the user can be very dynamic. So, how does the emotion change, how does the perception
change, over the conversation flow? We are going to look into that.

(Refer Slide Time: 02:19)

So, let us start with one of the most commonly used feature representations for text. To
link it to how we were using audio and video for emotion recognition, recall we discussed
that we can have a bag-of-words based representation, right. We can represent parts of an
image or parts of a speech sample as a word and then create a representation.

Similarly, when we are talking about a bag of words for text, typically what we are saying
is: let us say you have a set of documents. For the sake of the example, imagine the
documents contain information about different cities in India, ok. So, this one, let us say,
talks about Delhi, the second document talks about Kolkata, the third one talks about
Chennai, and so forth, right. So, we have a long list of documents.

Now, the content in each document will be different, which means the number of words in each
document will be different. What we want to do in the plain-vanilla bag-of-words
representation, friends, is take all the documents which are present during training and
create a large repository which contains all the words, all the words in my training
documents, ok.

From this, we can apply a simple clustering technique like k-means, which we discussed in
facial expression based emotion recognition, and that is going to give me a dictionary
containing the frequently occurring words. So, you can say these frequently occurring words
are the building blocks of my documents.

And once I have identified these frequently occurring words, I can then have a vector
representation for each document separately, which is essentially the frequency of occurrence
of these important words in my document, right. So, that is a vector representation.

So, for example, if you are talking about Delhi, then words such as 'parliament',
'government' and 'history' might be more common within the Delhi document, as compared to,
let us say, a document discussing another city which is not the capital and is, let us say,
a newer city, right.

So, you will have a histogram, a vectorized representation; then, by the standard method,
you can train a machine learning model and predict the emotion which, let us say, the text
is conveying about the city, right. Now, as you would have figured out, friends, what are we
simply doing here?
simply doing is?

We are counting the number of words, the unique words in a document and then trying to see
the frequently used words across all the documents, how many times they are occurring in my
current document. Now, imagine when you were talking about the input stage you know the
first step, wherein I said well you create a repository of all the words in all the documents.
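As a minimal sketch of this pipeline, assuming scikit-learn is available (the toy city documents stand in for a real training corpus):

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "Delhi is the capital of India, home to the parliament and government",
    "Kolkata is known for its colonial history and literature",
    "Chennai lies on the coast of the Bay of Bengal",
]

vectorizer = CountVectorizer()             # builds the word repository
X = vectorizer.fit_transform(docs)         # one frequency histogram per document
print(vectorizer.get_feature_names_out())  # the learned vocabulary
print(X.toarray())                         # vectorized documents, ready for a classifier
```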

(Refer Slide Time: 06:48)

So far we were considering the words individually. What does that mean? Let us say a line in
my document says 'Delhi is the capital of India'. You treat each word here separately:
'Delhi' is one word, 'is' is one word, 'the' is one word, and so forth, right. Now, very
early on, a concept was introduced, especially for natural language processing, where we
said: well, let us also have combinations of words which occur together. That is referred
to as n-grams.

So, when n equals 1, you have the scenario where 'Delhi' is a separate word, 'is' is a
separate word, 'the' is a separate word: separate entities. When n equals 2, notice what
happens: in the same statement, 'Delhi is the capital of India', I take two consecutive
words together.

So, 'Delhi is' is now one unit, 'the capital' is another unit, and 'of India' is another
unit, ok. One could move a step further and also take three neighbors together as one unit,
as one word. What would that be in our example statement? When n equals 3, a 3-gram, a
trigram, 'Delhi is the' is one word, one unit.

And once you have established the different combinations of words and the appropriate
n-gram representation (n equals 2, 3 and so on, depending on the value), you can then repeat
all the steps which we just discussed for bag of words, wherein the words before k-means
could now be combinations of sequential words; each word here could be, for example,
'Delhi is the'.

Now, that is one unit, one word in the bag-of-words sense, and one could then train the
system. Why would that be useful? Simply because when you encounter an individual unit like
'Delhi is the', from the meaning perspective, what we understand is that after this, some
information about Delhi is going to be presented, right.

So, here we say 'Delhi is the', and then, for a trigram where n equals 3, you would have
'capital of India' as a second unit, right. So, we are also learning the relationship between
sequentially occurring words; that is another way of extracting information for our final
goal, which is understanding emotion from text, right.
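The same vectorizer can treat n-grams as the units instead of single words; a short sketch on the running example, again assuming scikit-learn:

```python
from sklearn.feature_extraction.text import CountVectorizer

sentence = ["Delhi is the capital of India"]

for n in (2, 3):
    vec = CountVectorizer(ngram_range=(n, n))  # bigrams, then trigrams
    vec.fit(sentence)
    print(n, list(vec.get_feature_names_out()))
# 2 ['capital of', 'delhi is', 'is the', 'of india', 'the capital']
# 3 ['capital of india', 'delhi is the', 'is the capital', 'the capital of']
```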

Now, similar to this, there is another concept. You can say: well, I am looking at
individual words, or at sequential words as one unit; how about I also take into
consideration the importance of a word, ok? Now, what could the importance of a word be?
There are different ways to describe this. One could say: well, if the word 'Delhi' is
repeated in a document several times, there is a high probability that the document is
discussing Delhi, something in Delhi, something around Delhi, right.

On the other hand, let us say you have the word "the". Now, "the" is a common word and it is
repeated across different documents. So, in a vector representation of a document (I am just
going to draw a histogram here), the word "the" could be repeated multiple times while other
words have similar or smaller frequency; and in other documents' histograms you would find
that "the" is again very high in value.

So, "the" has a large frequency, but it does not help me much in discriminating between the
types of documents; and, in the context of emotion recognition, it does not help me much in
discriminating between, let us say, a document whose perceived emotion is class x and
another document whose perceived emotion is class y. Because "the" occurs so many times, it
is not really helping me much, right.

So, how do we take care of this situation; how can we encode this information within our
representation of the text? For that, friends, we have the very popular and commonly used
TF-IDF representation.

(Refer Slide Time: 12:22)

Now, what is that? The first part is the term frequency, TF. Simply, it means how many times
a word appears in a document: for example, how many times the word 'Delhi' occurs in a
document. If that word occurs multiple times, then it could be important, ok. But you have
seen that words like "the" and "a" also occur multiple times, in all the documents, right.

So, that means I cannot depend only on the term frequency; what would I need? To balance it,
I need the inverse document frequency, IDF. What it does is tell us the importance, the
relevance, of a word. How? Well, for a given term, I compute its relevance by taking the
number of documents I have, let us say N, divided by the number of documents in which this
particular term occurs, let us say M; "the" occurs in many of them. I would like to use this
ratio as a weight, ok, and so that I do not give a very harsh treatment to words which occur
in several of the documents, I simply take its logarithm: IDF(t) = log(N / M).

So, this factor gives a weight, essentially an importance or saliency, if you want to
understand it in those terms. For a given term, its TF-IDF value is its term frequency
multiplied by its inverse document frequency: TF-IDF(t, d) = TF(t, d) × IDF(t). Now, 'Delhi'
may be a common keyword in some of the documents, repeated multiple times there.

So, you will see a high value of TF for it, and its IDF will not be low, which means it is
salient. However, the word "the" will have a high TF but a low IDF, because it occurs many
times across many documents. A small numerical sketch is given below.
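A minimal sketch of these definitions (library implementations such as scikit-learn's TfidfVectorizer use smoothed variants of the same idea):

```python
import math
from collections import Counter

docs = [
    "delhi is the capital of india".split(),
    "the parliament of india sits in delhi".split(),
    "the weather is pleasant".split(),
]

N = len(docs)
# df[t] = number of documents containing term t
df = Counter(term for doc in docs for term in set(doc))

def tfidf(term, doc):
    tf = doc.count(term)          # term frequency in this document
    idf = math.log(N / df[term])  # inverse document frequency
    return tf * idf

print(tfidf("delhi", docs[0]))  # salient: occurs in only some documents
print(tfidf("the", docs[0]))    # "the" occurs everywhere -> idf = log(1) = 0
```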

(Refer Slide Time: 14:50)

Now, friends, let us look at a system proposed for text-based emotion recognition; we are
going to discuss a classification task. This task, proposed by Mishne and others, is for
classifying the perceived mood in blog posts. The approach of the authors was as follows:
they curated a large dataset of about 815000 posts from a website called LiveJournal, where
the writers also indicated their current mood while they were writing a particular blog
post.

There were 132 mood categories, different keywords which the writers gave to their
particular blogs to describe how they were feeling. Now, let us look at the features which
the authors used. The first, which we have already discussed, is the bag-of-words based
representation. The authors also added what is referred to as part of speech.

Part-of-speech tagging essentially marks the words in your data as being a noun, an
adjective or a verb, because the number of nouns, the number of adjectives and the number of
verbs in a particular document also gives us vital meta-information for the task. Second,
the authors used what is referred to as pointwise mutual information.

This is a feature which was proposed back in 1999, and pointwise mutual information (PMI)
simply gives us the degree of association between two terms: how much two terms are related.
Then, based on pointwise mutual information, the other feature which the authors used is
pointwise mutual information - information retrieval (PMI-IR).

This is a work from 2001, and it estimates the probabilities for PMI using search engine
hits. So, you use pointwise mutual information, and you see what the probability of
retrieving a certain result is, given certain keywords, based on the PMI feature, which is
essentially how close two terms are, their degree of association. The authors then combined
these features and performed classification.
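A minimal sketch of the PMI computation, estimating probabilities from counts; the counts here are made up for illustration, and PMI-IR would replace them with search-engine hit counts:

```python
import math

def pmi(n_x, n_y, n_xy, n_total):
    # PMI(x, y) = log( p(x, y) / (p(x) * p(y)) )
    p_x, p_y, p_xy = n_x / n_total, n_y / n_total, n_xy / n_total
    return math.log(p_xy / (p_x * p_y))

# Out of 1000 documents: "happy" in 100, "joy" in 80, both together in 40.
print(pmi(100, 80, 40, 1000))  # positive -> the terms are associated
```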

(Refer Slide Time: 17:28)

Now, we saw a paradigm shift, right. As with any machine learning system, earlier, support
vector machines and naive Bayes kinds of algorithms were used; then deep neural networks
came back around 2011-2012, which affected vision analysis and speech analysis, and the same
effect was felt in text analysis as well.

To this end, we are now going to talk about representation-learning based features: how they
can be used to map an input text to its emotion category. To mention some of the most
popular ones used in the community, there is Word2Vec, then an extension of Word2Vec called
FastText, and later the GloVe feature representation. So, let us look at what Word2Vec is.

(Refer Slide Time: 18:42)

Now, friends, Word2Vec was proposed by Mikolov and others in 2013. In this representation,
the authors leveraged the ability of a network to understand the presence of certain words
together: if you look at a statement, words which are in a sequence are related to each
other. So, can we use this very simple observation and learn a vectorized representation?

Now, what that means is as follows. Recall what an autoencoder was: you have a neural
network where the first part is your encoder and the second part is a decoder. In between,
we have what is referred to as a latent representation, which is essentially nothing but a
compressed representation of the input; and using this compressed representation, which we
input into the decoder, we get a reconstruction of the original input, let me call it I'.

Now, since we are talking about text, and Word2Vec in particular: if we input a series of
words into the encoder, then I would like to learn the relationship between them, and I am
going to use the latent representation, which essentially gives me the vectorized
representation of the input words. So, what we do is input into our network a one-hot
encoding.

So, let us say the statement is 'Delhi is the capital of India'. Now, when the word 'Delhi'
is under focus, you set the corresponding index to 1, and every other word gets a 0. Now, in
this case of Word2Vec, for 'Delhi is the capital of India', you could use
'Delhi is ___ of India'.

Now, the word 'capital' is linked to the country which comes afterwards, and it is linked to
'Delhi' as well, because Delhi is the capital of India. Therefore, the authors said: well,
let us have two ways of computing the representation. The first one is referred to as the
continuous bag of words. Let us see what that is.

(Refer Slide Time: 21:32)

So, here you have your words as the input. You are saying 'Delhi is the ...'; I am not going
to input 'capital' here, ok. So, this word is not input; the words around it are the input,
'Delhi is the' here and later 'of India', ok. Now, these are one-hot encoded vector
representations of the words. What are we saying?

The network will now learn the representation for this word by combining its context, the
content which comes before and the content which comes after, and that is going to give me
the output representation of 'capital'.

Another representation which was proposed in the same work is called the skip-gram. In that,
you are saying: well, you should be able to predict the neighboring words, right. So, if you
had 'capital': 'capital city', 'capital of a country'; the words before and after have a
relationship with the word which is being input, so how about trying to predict the
neighborhood words?

And in both these independent pursuits, the continuous bag of words and the skip-gram, we
are learning the representation, ok. So, this is a very powerful representation, friends,
which has been extensively used for computing vector representations of input words. Now,
there have been some famous, widely used extensions of Word2Vec; the example I will give
you is Doc2Vec, right.

So, what is the vector representation for a document? Again, that is based on the concept
of Word2Vec. Now, recall we wanted emotions, right. So, what will that mean? You take the
vector representation for each word using a pre-trained Word2Vec network, then you do
pooling, as we have discussed earlier, and then you can train a machine learning system
to predict. A small sketch follows below.
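A minimal sketch with the gensim library (assuming gensim 4.x is installed); sg=0 selects the continuous-bag-of-words variant and sg=1 the skip-gram variant, and the tiny corpus is only for illustration, since useful embeddings need large corpora or a pre-trained model:

```python
import numpy as np
from gensim.models import Word2Vec

corpus = [
    "delhi is the capital of india".split(),
    "kolkata is a city in india".split(),
]

model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=0)

word_vec = model.wv["capital"]                               # one vector per word
doc_vec = np.mean([model.wv[w] for w in corpus[0]], axis=0)  # mean pooling
print(word_vec.shape, doc_vec.shape)                         # (50,) (50,)
```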

(Refer Slide Time: 23:52)

Now, another representation which I would like to mention to you, friends, is the Global
Vectors for word representation, proposed in 2014, ok; in short, it is popularly referred to
as GloVe. This is an unsupervised technique to learn the representation, and it is simply
based on creating, and then learning from, the word-to-word co-occurrence matrix.

So, the authors created a matrix which tells us the probability of words occurring together;
based on that, we learn a network, and we get a representation which we can use later.

(Refer Slide Time: 24:42)

Now, let us look at one example where this representation learning is used. So, friends, we
have seen bag of words, ok; we know this term now. From bag of words there is an extension
called bag of concepts. So, I do not want to input individual words, or bigrams or trigrams
(two or three words together as one unit); I want to input into my system a vector which is
created from a bag of concepts, right.

Now, let us look at one such work, proposed by Kim and others. Here, the raw text data comes
in and you extract the representation using Word2Vec; that gives you, for a document, the
representation of each word. What we do then is compute a clustering on the representations
extracted from Word2Vec, and later we apply a weighting scheme similar to the TF-IDF which
we have seen, right.

So, we are actually looking at the importance of each word. Now, notice that in the
plain-vanilla bag of words, when you are vectorizing an input sample, you increment a bin of
your histogram by, let us say, 1 whenever a particular word arrives. In this case, we want
the histogram to be a weighted accumulation of the content in a certain bin, right. So, you
apply your TF-IDF weighting, and then you get a representation which you can further use for
predicting the emotion.

(Refer Slide Time: 26:44)

Here is another system, friends, on similar lines. This work by Kratzwald and others is
'Deep learning for affective computing: text-based emotion recognition in decision support'.
So, how can we use the affective states for prediction, for the decisions which a machine
could take, and also for assisting the user?
So, what do we have here? We have a set of documents as training data, and we have a
pre-trained network which is trained on a sentiment-labelled corpus. Now, let us look at the
features: the authors first extract a bag of words and word embeddings from the pre-trained
network.

Then you predict with these features: you do feature fusion of the two and then you predict
the emotion. In parallel, you use the word embeddings, which we have seen how to extract in
the earlier slides; then you extract the features from this transfer-learning based model,
concatenate them, and then you have a recurrent neural network.

You combine the outputs together, so you do decision fusion, and that gives you the
affective states. Now, one thing to note here, friends, and I would like to draw your
attention to it, is the second part: when you have the word embeddings and the features from
transfer learning input into a recurrent neural network, what is typically happening in your
RNN? Let us take again the example statement which we have been using: 'Delhi is the capital
of India'.

(Refer Slide Time: 28:41)

Now, in an RNN, what you are saying is essentially: you get a feature representation for a
word and input it to the cell, and along with that, the feature representation of the second
word is available to the cell. So, there is a sequence which is being followed, right. You
could say that I am learning the pattern of the sequence in which the words arrive in my
statement, in a recurrent fashion, in the recurrent neural network way, right.

So, you have the cells, and you input the representation from the prior step along with the
current representation. This has been very commonly used in natural language processing:
recurrent neural networks, LSTMs, Bi-LSTMs; they process the information sequentially, as
in the sketch below.
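A minimal sketch of this sequential processing in PyTorch (assuming torch is installed); the dimensions and the number of emotion classes are illustrative:

```python
import torch
import torch.nn as nn

class TextEmotionRNN(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=50, hidden=64, n_classes=7):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, token_ids):        # (batch, seq_len)
        x = self.emb(token_ids)          # (batch, seq_len, emb_dim)
        _, (h_n, _) = self.lstm(x)       # h_n: hidden state after the last word
        return self.out(h_n[-1])         # (batch, n_classes) emotion logits

# A batch of two token-id sequences of length 6, e.g. "Delhi is the capital of India".
logits = TextEmotionRNN()(torch.randint(0, 1000, (2, 6)))
print(logits.shape)  # torch.Size([2, 7])
```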

(Refer Slide Time: 29:23)

Now, the community has slowly started moving to other approaches, where we say: well, we do
not need a sequential approach; how about if I were able to parallelize? These are also
referred to as non-autoregressive models, and after this I will show you a few examples of
them as well, ok.

Now, let us look at another system, proposed by Shelke and others, which uses social media
data as input to predict the emotion from a social media post. Notice that in a social media
post you are not only going to have text; you could also have emoticons.

So, in this very interesting work, the authors first pre-process the input text from the
social media posts; these are the different platforms from which they fetch the information.
You see, they do tokenization, that is, creating word-level individual units; they remove
punctuation; they remove stop words during stop-word removal; they remove URLs; and they
also do lemmatization.

Now, for the emoticons, they also add labels. For example, for the joy emoticon, let us say
the representation for that is a one. And for the text, they use another mood analysis
resource, the DepecheMood emotion lexicon.

So, they combine and extract features from the emoticons and from the text, and then they
do a machine learning based ranking: they rank the presence of the emoticons along with the
text, and then input that into a deep neural network to predict the emotion which is
perceived from, let us say, a post on Facebook or on Twitter, and so forth.

(Refer Slide Time: 31:10)

Now, let us look at another work, friends. In this work the authors, Batbaatar and others,
propose a model called the semantic emotion neural network for emotion recognition from text.
What do we have in this work? We have the input words, and we extract two types of embeddings
from them.

The first comes from a pre-trained semantic encoder; it takes one word as input at a time and
gives you a representation which relates to the semantic information. In parallel, the same
word is input into an encoder which gives us a representation carrying the emotion information.
For the former, there is a recurrent neural network followed by a fully connected layer.

In the emotion channel we use a CNN-based representation. We fuse the two, get the hidden
representation, and then concatenate. So, what is happening here? We are doing feature fusion
and then predicting the emotion class. So, what did we achieve here? We had a semantic
pre-trained representation, we had an emotion representation, and then we fused the two
together.

(Refer Slide Time: 32:28)

Now, as I was mentioning to you earlier, these non-recurrent techniques are coming in, where we
can input the data in parallel. After Word2Vec-type representations, the community has moved to
attention-based systems. Now, attention simply asks: an input statement comes in; what is
important, what is not so important, and how do you learn that? Well, there is a whole
mechanism, and I am writing it here for you to check out separately.

So, there is a paper called 'Attention is all you need'. This is a seminal work proposed in
2017, which discusses how attention can be applied to a network, particularly for text tasks.
And this has given birth to the neural networks which are referred to as transformers. Now,
these are non-recurrent methods; that is, the tokens which are input into them, the words which
are input into them, are processed in parallel.

And similar to Word2Vec, there is a transformer-based representation which is extremely common
now in natural language processing, and hence for natural-language-processing-based emotion
prediction; it is called BERT, your Bidirectional Encoder Representations from Transformers.
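As a rough sketch of how such a representation can be extracted in practice, here is a minimal
example assuming the Hugging Face transformers library and the publicly available
bert-base-uncased checkpoint; this is not the exact pipeline of any of the works discussed
here.

from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Delhi is the capital of India", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One contextual vector per token, computed in parallel rather than recurrently.
token_embeddings = outputs.last_hidden_state   # shape: (1, num_tokens, 768)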

So, here is a work by Huang and others. What are they doing here? They model the utterances:
they do utterance pre-processing and they look at personality tokenization. This is the
pre-processing part. They input it into a pre-trained BERT model, which is called friend BERT,
and the second input, from the emotion corpora, goes into a chat BERT. These, as the names
suggest, have of course been trained on different types of data.

With the representations which they get, they then do a pre-training on the input data, using a
masked language model and next sentence prediction, that is, predicting what would come
afterwards. The output coming from the chat BERT is used for emotion pre-training on the
Twitter data. They then do fine-tuning, and then they do the evaluation, ok. So, you can use
these BERT-based representations nowadays as well.

(Refer Slide Time: 34:51)

Now, here is another work. This is by Kumar and others, and it is called a BERT-based
dual-channel explainable text emotion recognition system, ok. Now, notice that it is
explainable. You have the input representation for each word; these are the tokens which you
input into your BERT module.

You get the feature embeddings from the BERT module; then we have an RNN-CNN model and a
CNN-RNN model, which are there just to exploit the temporal relationships. We concatenate the
feature outputs and then we have a classifier. Now, this classifier tells us one of these four
states: anger, happy, hate or sad.

Further, the authors also propose an explainability module, which looks at the intra-cluster
and inter-cluster distances of the outputs to explain why we have anger or happy as the
particular label which is predicted. So, notice how the transition has happened for text-based
emotion recognition systems.

In the beginning we were using bag-of-words and bag-of-concepts based representations, and now,
in 2023, we are using systems such as BERT, which are based on transformers.

(Refer Slide Time: 36:08)

Now, let us change the conversation a bit, friends. We have been talking about emotion
recognition in text when the text is coming from, let us say, a blog, a document, a tweet or a
post on social media. There is another dimension, rather a very important dimension, to emotion
analysis: that is, during conversations. A conversation could be between two humans, so a
human-human conversation; it could be a human-machine conversation; or it could involve several
humans and machines conversing together.

So, as the conversation happens, the emotion changes and the intensity changes. Now, what we
have here on the screen is an example from Poria and others, ok. What you see is a dialogue
between two characters; this is coming from a very popular sitcom. As the dialogue proceeds, so
here is the time axis, you would notice that the subject shows different emotions, right.

Here this fellow, Chandler, is showing joy, then there is a neutral, then there is a surprise
afterwards. Then input comes from the other person: he says something, then this guy Joey says
something, now an emotion is elicited, and you can see that Chandler is actually showing
surprise, and so forth.

So, you see how the emotion varies here, based on the text, for both the subjects, right. That
means that for emotion recognition when a conversation is happening, we would require a more
dynamic approach. We require an approach which would utilize what the other person said and
what the user under focus replied back, right. And you could use the time series information as
well to get the context, the long-term context.

(Refer Slide Time: 37:58)

Now, here is an example of a work where emotion recognition is performed during a conversation.
This work, by Hu and others, is titled DialogueCRN: contextual reasoning networks for emotion
recognition during conversations. What we have here, friends, is the timeline and the input
conversation between individuals.

What we do is extract the feature representations which give us the situation-level context and
also the speaker-level context; again, these are feature representations. Then the authors
propose an utterance module; this extracts the situation in which the conversation is
happening, again using neural networks.

In parallel, we analyse the speaker-level clues as well, that is, what a person said
individually. This information is fused, and then there is a classifier to predict the emotion
during the conversation, right. So, two takeaways from this: analyse the situation, and analyse
the content which the current person is speaking. The situation is going to vary based on how
the conversation is happening, and the conversation is also going to be affected by the
situation and the context: where a person is, what they are speaking, and so forth.

(Refer Slide Time: 39:22)

Now, here is another work, by Yeh and others; this work is an interaction-aware network for
speech emotion recognition in spoken dialogues, alright. Now, what do you see here? Some
conversation text over time. So, you extract the utterances; here you have a GRU, you know,
your recurrent networks.

And you also add attention; again, this comes from the work I was referring to, 'Attention is
all you need'. Then you look at the utterances of the speakers in parallel. So, you have M and
F, let us say a male speaker and a female speaker, and then again a bi-directional GRU extracts
the representations, which are concatenated to predict the emotion.

Notice how, as compared to text-based emotion recognition, the architectures change when there
is a conversation. Because one person speaks, then the other person speaks, right. So, the
system needs to not only analyse the content spoken by one subject at a given time, but in
parallel also look at the conversation happening as a whole.

(Refer Slide Time: 40:34)

Here is another work, friends, by Lian and others, 2019, which is called domain adversarial
learning for emotion recognition, ok. What you have here are the utterances as input, now in
the form of text and audio coming in together. You may wonder why text and audio. Well, it is
possible that the conversation is happening in the voice modality; you do speech-to-text and
you get the text which was being spoken, ok.

Now, notice how, for the different utterances, the dynamics of the conversation are mapped
through a GRU; then you again add an attention layer, and at different time steps we predict
both the emotion and the speaker, and so forth.

(Refer Slide Time: 41:21)

Now, friends, this was a brief introduction to the very wide variety of works which have been
proposed in the literature for emotion recognition during conversations. I also invite you to
look at two of the survey works. These are very detailed surveys which look at the different
aspects of affect prediction using text, if you are interested in going deeper into this area.

So, friends, with this we come to the end of today's lecture. We discussed the different
features which have been proposed in the literature for analysing text, which is essentially
about creating a vector representation. We started with the bag-of-words representation, we
talked about what an n-gram is, then we looked at the concept of TF-IDF, and later on how we
can have a bag of concepts.

From this we moved on to how, with the progress in deep neural networks, we use representation
learning in the form of pre-trained networks such as Word2Vec. From there the community has
moved to attention-based systems, wherein we now use transformer-like architectures for
predicting the emotion. And in the same context we moved a bit further, to the case where, let
us say, a conversation is happening.

When you see a machine and a person interacting, or two human beings interacting, there is a
very dynamic play of emotion happening. Therefore, the system needs to not only understand the
individual person's utterances, what a person said, but also analyse the relationship between
the utterances.

Thank you.

Affective Computing
Prof. Jainendra Shukla
Prof. Abhinav Dhall
Department of Computer Science and Engineering
Indraprastha Institute of Information Technology, Delhi

Week - 06
Lecture - 20
Tutorial: on Emotion Recognition using Text

Hi everyone, I welcome you all to this Affective Computing tutorial on Emotion Recognition
using Text. In this tutorial, we will work with text data and try to see how to extract
emotions from the text information.

(Refer Slide Time: 00:39)

So, directly starting with the dataset: for this tutorial, we will be using the DailyDialog
dataset, which is essentially a manually labelled multi-turn dialogue dataset. This dataset
contains 13,118 multi-turn dialogues, and the dialogues in it reflect daily life
communications. One can easily download this dataset from this link, and the data is published
under a Creative Commons license.

(Refer Slide Time: 01:09)

So, talking about the dataset file information: the dataset contains a couple of files. Among
these, we will be interested in the dialogues_text file, which contains the transcribed
dialogues, and the second file of interest, dialogues_emotion.txt, which contains the emotion
annotation for the original dialogues_text file.

The dialogue annotation is a number from 0 to 6, where 0 represents no emotion, 1 represents
anger, 2 represents disgust, 3 fear, 4 happiness, 5 sadness and 6 surprise.

(Refer Slide Time: 01:55)

Now, let me give you a very brief overview of this tutorial. We will be performing the
following experiments on Google Colab. First, we will start with data preparation, where we
will read the text files from Google Drive and clean them: removing HTML tags, non-alphabetic
characters and extra white space, and removing stop words.

Later, we will use the common feature extraction methods for text data, namely bag of words,
TF-IDF and Word2Vec (Refer Time: 02:34). And after extracting these features, we will perform
emotion classification using machine learning classifiers.

(Refer Slide Time: 02:45)

We will start the coding part by importing all the essential libraries, and the code will look
something like this. After importing the libraries, we will define our data and label paths;
the code will look something like this. After defining the path variables, we will write the
code to read all the text files from Google Drive and save them into Python lists.

For that, I will write a function which will look something like the sketch below. You can
simply pause this video and try to go through each line of this code.
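A minimal sketch of such a reading function is given below. It assumes the DailyDialog release
format, where each line of dialogues_text.txt is one dialogue with utterances separated by the
'__eou__' marker, and each line of dialogues_emotion.txt carries space-separated labels, one
per utterance; the function and variable names here are mine, not necessarily those of the
tutorial notebook.

def read_data(data_path, label_path):
    # Each line of the text file is one dialogue; '__eou__' separates utterances.
    with open(data_path, encoding="utf-8") as f:
        dialogues = [line.strip().split("__eou__") for line in f]
    # Each line of the label file holds space-separated emotion ids (0-6).
    with open(label_path, encoding="utf-8") as f:
        labels = [line.strip().split() for line in f]

    texts, emotions = [], []
    for utterances, dialogue_labels in zip(dialogues, labels):
        for utterance, label in zip(utterances, dialogue_labels):
            if utterance.strip():
                texts.append(utterance.strip())
                emotions.append(int(label))
    return texts, emotions

# data, labels = read_data(data_path, label_path)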

(Refer Slide Time: 03:44)

And now we will simply call this function, and our data will look something like this. So, we
have the input data and the corresponding labels here. Let me also show you a couple of data
instances. The very first line in our dataset is 'the kitchen stinks'. Let me show some other
line, maybe the second one; the second line says, 'I will throw out the garbage'. Or I can show
some random line, say the line at position 67: 'May I sit here?'. So, a question is being asked
here.

These lines belong to conversations, and each line has been annotated into some emotion class.
I can show you the emotion classes as well; here you can see each line belonging to some
emotion class, for example, zero belongs to neutral. So, after reading these text files, our
first task will be to clean them.

Most of this text might contain some noise in the form of punctuation marks or hyperlinks. So,
before our analysis, we will remove all such potential noise from the text data. For that, I
will write a function, and my function will look something like this.

(Refer Slide Time: 05:56)

My function is clean_text; I pass text into this function, and it tries to remove any HTML
tags, then removes any non-alphabetic characters, and removes any extra white space in the
text. In the final step, it simply converts the text into lowercase and returns it; see the
sketch below.
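A minimal sketch of such a cleaning function, using Python's re module; the exact regular
expressions are my assumption of a common baseline:

import re

def clean_text(text):
    text = re.sub(r"<.*?>", " ", text)          # remove HTML tags
    text = re.sub(r"[^a-zA-Z]", " ", text)      # keep alphabetic characters only
    text = re.sub(r"\s+", " ", text).strip()    # collapse extra white space
    return text.lower()                         # lowercase the text

# data = [clean_text(line) for line in data]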

So, let me run this cleaning function on my whole data. My data is now clean, so let me show
you a couple of text instances and compare the original data with the cleaned data. As we can
see here, there are no extra spaces, the full stop sign is also removed, and all the text is in
lowercase.

Let me show you another instance (Refer Time: 07:05). Here again, you can see that all the
punctuation marks are removed and all the text is in lowercase only. So, after performing the
cleaning operation on our text data, we will start with feature extraction. Our first feature
will be bag of words, which we will implement using CountVectorizer.

CountVectorizer is basically a tool used in natural language processing to convert the words in
a document into a numerical representation, one that can be easily understood by a machine
learning algorithm. It counts the number of occurrences of each word and creates a table with
the count for each word in the document. This table can then be used for various tasks such as
text classification, sentiment analysis or topic modelling.

To build our bag-of-words feature representation, we will use the built-in CountVectorizer
class from the sklearn library. The code will look something like the sketch below. We use
CountVectorizer and also pass the argument stop_words='english'. This argument enables
CountVectorizer to remove all the stop words of the English language from our text.

After removing the stop words, it builds the bag-of-words (BoW) representation. If you want to
remove some particular set of stop words, you can instead pass a list of words in this
argument. As we can see, CountVectorizer has converted the text into a vector representation,
and I can see the dimension of that vector representation here.
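A minimal sketch of this step, assuming 'data' is the list of cleaned utterances from earlier:

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(stop_words="english")   # drops English stop words
X_bow = vectorizer.fit_transform(data)               # sparse matrix: (n_samples, vocab_size)
print(X_bow.shape)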

Now, I can use this representation to train a machine learning classifier. In this case, we
will use the multinomial Naive Bayes algorithm to classify the emotion classes. Before that, we
just need to divide our data into the respective train and test sets. For that, I will use the
basic train_test_split function from sklearn, with a test size of, let us say, 33 percent.

So, we have divided our data into the respective train and test sets. Now, I will use the
multinomial Naive Bayes classifier and look at our classification results; a sketch follows
below. Here I can see that we get a train score of 84 percent and a test score of 81 percent.
Given that we have a 6-class classification problem and the chance level comes to around 16
percent, an 84 percent train accuracy and 81 percent test accuracy is a good score here.
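A minimal sketch of the split and the classifier, using the 33 percent test size mentioned
above; the random_state is an arbitrary choice of mine for reproducibility, and the scores in
the comments are the ones reported in this lecture run:

from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

X_train, X_test, y_train, y_test = train_test_split(
    X_bow, labels, test_size=0.33, random_state=42)

nb = MultinomialNB()
nb.fit(X_train, y_train)
print("train score:", nb.score(X_train, y_train))   # ~0.84 in the lecture run
print("test score:", nb.score(X_test, y_test))      # ~0.81 in the lecture run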

After using CountVectorizer, we can try another feature representation technique known as
TF-IDF. TF-IDF is basically a better version of CountVectorizer: it takes into account the
importance of a word in a document by multiplying the word count by the inverse document
frequency.

So, what is this inverse document frequency? IDF is calculated as the logarithm of the ratio of
the total number of documents to the number of documents containing the word. This way, words
that are frequent in a particular document but rare in the corpus as a whole are given more
weightage. This approach can help to better capture the meaning and importance of a word in a
document.

In general, TF-IDF is considered a more advanced and more effective technique than
CountVectorizer. To use this technique, I will again use a built-in class from the sklearn
library, and our code will look something like this. Now we have obtained the TF-IDF
representations.

So, we can train our classifier on these features and see whether we get any improvement in our
emotion classification accuracy. For this, I will again divide my data into train and test
splits, with the TF-IDF features as input; a sketch follows below.
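A minimal sketch of the TF-IDF step, mirroring the bag-of-words code above:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

tfidf = TfidfVectorizer(stop_words="english")
X_tfidf = tfidf.fit_transform(data)                  # sparse matrix: (n_samples, vocab_size)

X_train, X_test, y_train, y_test = train_test_split(
    X_tfidf, labels, test_size=0.33, random_state=42)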

(Refer Slide Time: 12:02)

And I will reuse my multinomial Naive Bayes code and see whether I get any better performance
here. In this case, we can see that we get a slightly better test accuracy. I would also like
to see whether a change of classifier has any effect on our performance.

For this, I will use LinearSVC, which is basically a support vector machine with a linear
kernel; a sketch follows below. Here, we can see that we get a slightly better train score, but
the test score is somewhat similar. So, after the CountVectorizer and TF-IDF representations,
we will move to our third type of representation, which is called Word2Vec.
third type of representation, which is called Word2Vec.

Word2Vec is a type of neural network model used for natural language processing. It was
developed by researchers at Google back in 2013. The purpose of Word2Vec is to create word
embeddings, which are numerical representations of words that can be used in machine learning
models.

Word2Vec takes a large corpus of text as input and produces a vector for each word in the
corpus. These vectors are designed to capture the semantic relationships between the words. In
our case, we will be using a pre-trained Word2Vec model.

We will download this pre-trained model using the gensim API, and the code will look something
like the sketch below. Downloading may take a good amount of time, because this is a fairly
large model, so please have some patience.
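A minimal sketch of the download step, assuming the gensim downloader API and its pre-trained
word2vec-google-news-300 model; note that the download is large, over a gigabyte:

import gensim.downloader as api

# Loads (and on first use downloads) the pre-trained Google News vectors.
w2v_model = api.load("word2vec-google-news-300")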

After downloading our model, we will write a function named prepare_word2vec. This function
will basically prepare training data consisting of vectors from the Word2Vec model, and the
code will look something like the sketch below. After that, I will simply create my Word2Vec
representations by calling this function.
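A minimal sketch of such a function; averaging the vectors of in-vocabulary words per utterance
is my assumption of a common baseline, not necessarily the exact notebook code:

import numpy as np

def prepare_word2vec(texts, model, dim=300):
    # One fixed-length vector per utterance: the mean of its word vectors.
    features = []
    for text in texts:
        vectors = [model[word] for word in text.split() if word in model]
        features.append(np.mean(vectors, axis=0) if vectors else np.zeros(dim))
    return np.array(features)

# X_w2v = prepare_word2vec(data, w2v_model)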

(Refer Slide Time: 14:40)

This code might also take a little bit of time, ok. Now my Word2Vec representations are
created. I will simply apply my train-test split to these Word2Vec representations, and later I
will train my machine learning classifier. The code will look something like this.

Let us see how a linear SVM classifies our emotion classes using these Word2Vec
representations. As we can see, the Word2Vec representations give us a very nice score using
LinearSVC. So, concluding this tutorial: we explored a public dataset for classifying different
sorts of emotions.

We started with cleaning the text data by removing HTML tags, non-alphabetic characters and
extra white spaces. Then we used three different feature extraction methods, bag of words,
TF-IDF and Word2Vec, and we used machine-learning-based classifiers for classifying the emotion
classes.

Affective Computing
Prof. Jainendra Shukla
Prof. Abhinav Dhall
Department of Computer Science and Engineering
Indraprastha Institute of Information Technology, Delhi

Week - 07
Lecture - 01
Emotions in Physiological Signals

Welcome friends. In today's class we are going to look at the Emotions in Physiological
Signals, a very interesting, and in fact very dear to my heart, kind of topic. So, here is the
agenda for today's class.

(Refer Slide Time: 00:38)

In this class, we are going to first look at why we are interested in the emotions in
physiological signals, and at some of the relevant physiological signals. We will be looking at
the emotions in the heart, not only metaphorically but also physiologically, and we will also
try to understand the emotions in another very popular physiological signal, which is known as
skin conductance.

We will also look at the emotions in the EEG signal, which is commonly known as the brain
activity signal. And finally, we will conclude with a discussion of some additional
physiological signals that can also be used for recognising, understanding and analysing the
emotions; perfect.

(Refer Slide Time: 01:27)

So, let us start first with trying to understand the physiological signals themselves.

(Refer Slide Time: 01:32)

So, the first thing that we have to understand is that physiological signals originate from the
activity of the autonomic nervous system, and hence they cannot be triggered by any conscious
or intentional control. Now, without using so many jargon words, what this essentially means is
that they cannot be controlled, faked or consciously produced, the way we can do, for example,
with facial expressions.

And since they cannot be consciously or intentionally controlled, it means that you cannot fake
them; at least, we cannot easily fake the emotions that get produced in the physiological
signals. And in fact, that is what has given rise to the popularity of the emotions in
physiological signals and their analysis. Another important characteristic related to
physiological signals and emotions is that they do not really require the user to pay a lot of
attention in order to provide the emotion data that is present in the physiological signals.

So, for example, if you want to get the emotions from facial expressions, you may have to ask
the participant or user to, you know, look in a particular direction; there should be a certain
standard setup. All of this requires the user to pay a bit of attention to how they are
sitting, how they are looking at the camera, and so on and so forth.

But in this case, nothing of this sort is required. And for the same reason, it is very helpful
even for getting the emotions of individuals who may have some sort of attention deficiency as
well. So, that was the second advantage. More importantly, another reason why it has become
very popular to analyze the emotions in physiological signals is the advancement in wearable
technologies.

The recent advancements that we have seen over the last one or two decades have enabled the
acquisition of physiological signals, such as the ones that we are going to discuss now, in an
unintrusive manner and also, I would say, in a relatively low-cost fashion.

So, basically, this possibility of capturing the physiological signals using a low-cost device
and in an unintrusive manner is what has also given rise to the popularity of the emotions and
their analysis in physiological signals.

(Refer Slide Time: 04:13)

So, these are some of the physiological signals that are commonly used when we try to
understand the emotions, right. For example, the very first one is blood pressure. I believe
blood pressure is something that we all know. And of course, while the emotions may not be
very, very expressive in the blood pressure, there have been certain studies which have looked
at the emotions in the blood pressure.

Then the other one is the electrocardiogram. The electrocardiogram relates to the cardiac
activity of the human. As you can see in the diagram on the right side, it refers to the
cardiac activity of the humans, and from this cardiac activity, that is, using the ECG signals,
we get the heart rate.

And then with the heart rate we can get the emotions that are expressed in the heart activity
of the humans. Another very, very popular method is the electroencephalogram, also known as the
EEG signal, as you can see on the right side of the screen.

So basically, to record the EEG, we attach a cap-like device to the head of the human, and
then, with the help of certain electrodes, we try to capture the brain activity of the
individual, through which we try to analyze the emotions.

So, that is the EEG signal. The other one is the electromyogram. The electromyogram, as you can
see here, refers to the activity of the muscles, right.

So again, it involves the attachment of some electrodes to the muscles of the body, through
which we try to get a certain expression of the body posture, and through that, in turn, an
understanding of the emotions. It is not such a popular method when it comes to emotion
analysis, but nevertheless this is another option. Another very popular method is the galvanic
skin response.

The galvanic skin response, as you can see here on the right side of the image, refers to the
skin conductance. It is also known as skin conductance or electrodermal activity. It refers, as
the name implies, to the conductance of the skin of the human body, through which we can
understand certain components of the emotion, such as arousal, and so on and so forth.

Again, we will look at it in a bit more detail. Of course, another component is respiration,
which is the respiratory activity of the humans. We have all seen that the respiration itself
may change, to a good extent, depending upon the type of the emotions that we are experiencing,
and hence it has also been used in research, in the literature, to analyze and understand the
emotions.

Last, but not the least, is the temperature of the human body. The temperature of the human
body has also been found to be correlated with the expression of certain types of emotions.
Nevertheless, this is by no means an exhaustive list; apart from these, for example, you can
also look at the EOG signals, which capture the activity of the human eyes.

Similarly, for example, you can also look at gaze patterns by making use of eye-tracking
devices, and so on and so forth. But nevertheless, for the sake of this course, we will mostly
stick to the ECG signals; we will look at the EEG signals, we will look at the GSR, and of
course, by virtue of the ECG, we will also look at the heart rate, right. So, these are the few
signals that we are going to analyze, and more or less many of the findings can be extended to
the other signals as well; perfect.

(Refer Slide Time: 08:02)

So, with that, let us try to look at the emotions and how it is being expressed and how that can
help us in the analysis of the emotions in the heart rate; perfect.

(Refer Slide Time: 08:12)

So, about the heart rate: we all know, of course, that the heart has been shown to be
metaphorically correlated with the emotions in so many ways, in poems, in literature, in arts,
in movies.

But physiology says that it relates to the emotions not only in a metaphorical way, but
definitely in a physiological manner. And one of the reasons it relates to the emotions is that
the activity of the heart also interacts with the brain, and ultimately it impacts how we
experience an emotion.

One of the best examples of this is the following: imagine that you are giving a presentation,
or, for example, you are delivering a lecture such as myself now, and you are trying to recall
something that you have to present in front of an audience or, for example, even in an online
setting.

And if you are not able to recall it, then imagine what happens. Imagine it is a big audience;
it is a very important presentation. What happens is that suddenly you start feeling anxious,
you start feeling aroused, and then, you know, you actually start feeling sweaty.

You know, there is a saying that you start feeling sweaty here; lots of sweatiness, and that is
how you know. So, if you try to analyse this particular activity: your brain is trying to
recall some stats, some facts.

At the same time, your heart is getting the signal from the brain that, ok, maybe it is not
able to recall; and since the brain also communicates the importance of the presentation for
you, the body accordingly emulates the experience that we are having at that moment in front of
the audience, and hence we start feeling sweaty, or even our heart rate starts increasing,
right.

So, that is one very good example of how the activity of the heart correlates with the activity
of the brain, and how, overall, it impacts the emotions that we experience. Specifically, the
heart rate has usually been shown to have a very good correlation with arousal.

If you recall, when we were looking at the arousal, valence and dominance model, the VAD or the
PAD model, which is the dimensional model of the emotions, then arousal was the component which
was giving you the energy that was present in an emotion.

So basically, the heart rate has been shown to have a good correlation with arousal, and hence
it can help us in so many ways in trying to understand the emotional state in very good detail.
Overall, it is not hard to realize that the heart rate has been shown to be a good indicator of
overall physical activation and effort.

So imagine, you know, if you are simply walking, your heart rate is steady and smooth, but the
moment you start running, of course, you can immediately notice an increase in the heart rate,
right. And for the same reason, it is not very hard to understand how it can be an indicator of
overall physical activation and effort.

Nevertheless, when it comes to the emotional activity, it has also been found to be correlated
with fear, panic, anger, appreciation, etcetera, right. So, these are some of the feelings,
and mostly, while analysing these, we try to look at the emotional arousal using the heart
rate.

(Refer Slide Time: 11:51)

Now, we have understood how the heart rate can help us in analysing different emotional states,
particularly emotional arousal.

Let us now try to understand how the heart rate is measured. Many of us have seen the use of
the ECG, the electrocardiogram, for measurement of the heart rate. The way the ECG measures the
heart rate is that it monitors the electrical changes on the surface of the skin.

In many cases, what happens is that there are multiple electrodes, usually 4 electrodes, which
are placed on four different parts of the human body. For example, in one particular scenario,
as you can see in the image, one electrode is placed on the right arm, one electrode on the
left arm, one electrode on the right leg, and similarly another electrode on the left leg.

So, this is the four-electrode configuration that is commonly used; similarly, other
configurations can be used, for example, one electrode here, one here, one here and one here.
This kind of configuration is also used to monitor the electrical changes on the surface of the
skin, through which we try to observe the heart rate.

That is how we get the heart rate. Now, of course, the next question that you would like to ask
is whether the amount of electricity that we observe on the human skin is too much; and of
course it is not, we are not walking electrical transformers, right. The electrical signal is
very, very small, on the order of microvolts.

But the electrodes that we use in electrocardiography are designed in such a way that they can
pick up even these small electrical changes on the skin in a reliable fashion, and that is how
we measure the heart rate of the human body. Another very popular method is the use of the PPG
signal, which is also known as photoplethysmography.

In photoplethysmography, or the PPG signal, what we do is measure the pulse signal at various
locations on the body. It is very common, for example, to attach a PPG clip on the fingertip,
on the ear lobe, or on other capillary tissue, such as on the legs.

If you look at the diagram on the right-hand side showing how a PPG clip is attached: it makes
use of sensors which usually consist of some IR LEDs and some photodiodes. So, basically, what
happens is that the IR LEDs emit light, and then, depending upon whether the blood is actually
flowing through the veins at that time or not, a certain amount of the light gets absorbed.
Accordingly, the remaining light is collected by the photodiode that is beneath the PPG clip.

Now, of course, depending upon how much light was emitted and how much light was collected, it
can really analyse whether the blood was flowing or not. And what does that mean? Of course,
when your heart pumps the blood, it has to go somewhere at a certain point; then it stops for a
very short amount of time, then the heart pumps again, and there is a rise in the blood flow,
right.

So basically, these are the two configurations in which the IR LED and the photodiodes are
placed in the PPG clip, and that is how it measures the blood flow and accordingly tries to
estimate the heart rate of the human body. This is the same kind of PPG clip that you may have
seen being very, very popularly used, for example, in oximeters as well.

So, you may have seen the oximeters: you simply place the oximeter on one of your fingertips
and then you get the heart rate from there. And as you can see, this attachment is of course
much quicker in comparison to the ECG setup. In the ECG setup, you need to use the 4-electrode
system, which is kind of very intrusive.

In comparison, the PPG clips are relatively easy to use and of course less bothersome or
cumbersome for the participants, and hence they are a bit more popular for the analysis of the
heart rate signals in comparison to the ECG. But we will discuss down the line that, of course,
this compromises how effectively and precisely the heart rate is calculated. Nevertheless, it
is a popular choice for the analysis of the heart rate.

(Refer Slide Time: 16:36)

So, these are the two popular methods through which we observe the heart rate in physiological
experiments. Now, once we have observed and obtained the heart rate, there are certain
parameters that we can use to analyse its effects, right. And of course, one simple parameter
that we can talk about is the heart rate itself.

The heart rate is typically expressed in beats per minute, and, as we already understand, by
heart rate we simply mean the frequency of the complete heartbeat, from its generation to the
beginning of the next, within a particular time window, right.

And we have already seen that an increased heart rate typically reflects increased arousal. As
I was giving the example: if you are giving a presentation in front of an audience and you are
not able to recall a certain very important fact or statistic that you want to present, you
will immediately feel aroused, and accordingly you will feel that there is an increase in the
heart rate. Of course, this is the raw data that you have. Another commonly used measure is the
inter-beat interval.

The inter-beat interval, as the name itself suggests, rather than looking at how frequently the
heartbeat occurs, simply looks at the time interval between the individual beats of the heart.
So, for example, one beat, another beat: what is the time interval between these two beats?
Usually, this time interval is measured in units of milliseconds, as opposed to the frequency
that we see in the heart rate. But these two, as you can see, are the raw data itself, and the
raw data many times does not give you a lot of information. Hence, there is another very
popularly used parameter, which is the heart rate variability.

The heart rate variability, as the name itself suggests, expresses the natural variation of the
inter-beat values from beat to beat. What this means you can see in the diagram given below:
this particular diagram is giving you two and a half seconds of heartbeat data, right.

So, if you look at one particular heartbeat here, this is basically a typical ECG signal, how
the typical ECG signal looks. In the typical ECG signal, apart from the P and the T waves, you
have this QRS complex, which is the central and most visually obvious part of the ECG tracing.

The QRS complex represents the depolarization of the right and the left ventricles of the heart
and, at the same time, the contraction of the large ventricular muscles. So, basically, this is
the most important part of the ECG signal, which we use to analyse ECG abnormalities and so on
and so forth.

This QRS complex occurs together as one event, and hence it is referred to as a single QRS
waveform; so, one heartbeat can be represented as a single QRS waveform. Now, the inter-beat
interval essentially represents the distance between the R peak of one waveform and the R peak
of the next waveform, right.

So, the inter-beat interval is represented as, let us say, 859 milliseconds here, which
corresponds to about 70 beats per minute. Of course, it can be translated to beats per minute:
if the inter-beat interval is 793 milliseconds, then how much is that going to be in beats per
minute?

Taking the example of 793: if the inter-beat interval is 793 milliseconds, what is going to be
the beats per minute? It is simply a matter of time conversion: 1 second is equal to 1000
milliseconds, and there are 60 seconds per minute. So, if you compute 60 times 1000 divided by
793, you will get something around 76 beats per minute, and that is how you calculate the beats
per minute, right.
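In formula form, the conversion used here is:

\text{BPM} = \frac{60 \times 1000}{\text{inter-beat interval in ms}}, \qquad
\text{for example, } \frac{60000}{793} \approx 75.7 \approx 76\ \text{BPM}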

So, the heart rate variability essentially gives you the variation between the inter-beat
values from beat to beat, or from one R peak to the next R peak. For example, you can see that
while the first inter-beat interval was 859 milliseconds, for the second it was 793
milliseconds and for the third 720 milliseconds.

So, there is a variation between the times at which these beats occur, right, and it turns out
that the heart rate variability is a very important parameter when it comes to the analysis of
emotional arousal as well as emotion regulation.

(Refer Slide Time: 22:31)

For example, heart rate variability has been found to decrease under conditions of acute time
pressure and emotional stress. What this means is that the heartbeat is more consistent when we
are under a lot of stress.

Now, that may sound a bit unnatural, but what it simply means is that whenever you have a low
heart rate variability, you are not able to cope with the stress. Because, in order to cope
with the stress, or with some physically arousing event that is happening around you, you need,
for example, a greater supply of blood from your heart, and accordingly the heart should pump
at a faster rate, right. Similarly, if you are relaxed, it should be pumping at a lower rate.
So, there should be good variation in the rate at which the heartbeats arrive; but low heart
rate variability means there is no variation, or very little variation, in the heartbeat.

And accordingly, what it means is that you remain stressed, you remain aroused, and you are not
able to cope with the conditions that are happening around you. In contrast, consider the case
where you have a higher heart rate variability; so, not a higher heart rate, but a higher heart
rate variability.

Higher heart rate variability has been shown to be correlated with a relaxed state, which
simply means that your body has a strong ability to tolerate stress, or is already recovering
from prior accumulated stress. And you can come across any number of studies which have
consistently shown that higher heart rate variability is, of course, desirable.

Now, I think you have understood that; but more importantly, the healthier you are, at least
the healthier your heart is and the healthier you are at the emotional level, the higher the
heart rate variability you will have. Not a higher heart rate, but a higher heart rate
variability.

(Refer Slide Time: 24:39)

And it is such a powerful source of information when it comes to the analysis of the emotions,
not only emotional arousal but also emotion regulation. So, we will see a very beautiful
example here. Imagine that there are two conditions: one we will call condition A, and the
other condition B.

The waveform that we see here is giving you, let us say, the ECG data for condition A over a
certain period of time, and for more or less the same period of time B is representing the ECG
data for condition B. Of course, you can see that this is nothing but the QRS wave, right: this
is the Q, this is the R, this is the S.

So, for the analysis purpose, as we said, we simply look at the R-to-R interval, the distance
between the R peaks, and accordingly we try to analyse the beats per minute; perfect. Now, let
us say these are the two conditions A and B, and just from looking at the heart rate we are not
able to make much sense out of it. But since we have the inter-beat intervals, we can calculate
the average R-R distance for condition A as well as for condition B.

Now, I hope it is not hard to understand how we calculate this. For condition A, for example,
the average R-R interval is simply (744 + 427 + 498 + 931) divided by 4, which comes out to 650
milliseconds, and similarly you can calculate it for condition B as well.

Now, one interesting thing to observe when we calculate the average R-R for condition A and
condition B is that for both conditions we get similar values: 650 milliseconds for condition
A, 650 milliseconds for condition B. So, in this case, we are not able to extract much
information out of it.

(Refer Slide Time: 27:04)

Now, let us look at another interesting characteristic derived from the heart rate variability,
which is known as the RMSSD. RMSSD, as the name itself says, means the root mean square of the
successive differences, and the way it is calculated is that you take the differences between
successive R-R intervals, square them, average them, and take the square root.

So, for example, you calculate (RR1 minus RR2) squared, similarly (RR2 minus RR3) squared, and
so on; you take the mean of these and then you simply take the square root, right. That is how
it becomes the root mean square of the successive differences. And it turns out that the RMSSD
has been found to be more informative than, let us say, the average R-R interval.
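Written out as a formula, for N inter-beat intervals RR_1, ..., RR_N:

\mathrm{RMSSD} = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N-1} \left( RR_{i+1} - RR_i \right)^2}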

Let us try to calculate the RMSSD for condition A. For condition A, this will be equal to (744
minus 427) squared, plus (427 minus 498) squared; of course, since we are taking the square, it
does not matter whether we subtract the smaller from the bigger or the bigger from the smaller;
plus (498 minus 931) squared.

You then take the mean of the entire thing; the mean here is over the three successive
differences; and finally you take the square root of the whole expression. So, let me summarize
the complete calculation: 744 minus 427 turns out to be 317, so we get 317 squared; 427 minus
498 becomes 71 squared; similarly, 498 minus 931 becomes 433 squared, right. Then you take the
average of everything, that is, divide by 3. Altogether, if you sum it up and take the average,
it becomes equal to the square root of 97673, which is around 312.526, so you can roughly say
it is equal to 313 milliseconds; perfect.
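As a quick cross-check of this arithmetic, a few lines of Python (my own illustration, not from
the lecture) reproduce the same value:

import numpy as np

rr_a = np.array([744, 427, 498, 931])           # condition A inter-beat intervals (ms)
rmssd_a = np.sqrt(np.mean(np.diff(rr_a) ** 2))  # root mean square of successive differences
print(f"{rmssd_a:.3f}")                         # ~312.527, i.e. about 313 ms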

(Refer Slide Time: 30:07)

So, this is how you calculate the RMSSD for condition A. Similarly, you can calculate the RMSSD
for condition B, ok. Let me just erase this, so that you can see it very clearly. I hope you
can now understand how to calculate it, right.

So, just as you calculated the RMSSD for A, in the same way you can calculate the RMSSD for B.
For B, I will just write it down for you, and the rest of it you can calculate easily.

It becomes (630 minus 675) squared; please pay attention that I am calculating this for B; plus
(675 minus 655) squared, plus the third squared difference, and of course all of this divided
by 3, right. If you calculate this, it comes out to be around 31 milliseconds. So, now we have
some interesting results here for the two RMSSD values.

(Refer Slide Time: 31:13)

So, again, to highlight: we have condition A and condition B. When we calculated the average
R-R values for both conditions, we got exactly the same values, even though the conditions are
different. But when we calculated the RMSSD, for A it came out to be 313 and for B it came out
to be 31.

So, now we can definitely observe a clear difference between condition A and condition B. What
can we say from this difference? Condition B is showing the lower heart rate variability;
please pay attention, this is what the RMSSD is representing: the heart rate variability,
right.

So, condition B represents the lower heart rate variability, and we already know that lower
heart rate variability is associated with a more arousing condition, right. For the same
reason, we can conclude that condition B in this case is physiologically more arousing than
condition A; of course, assuming that both were observed under standard conditions and from the
same participant, right.

So, this is a very nice example of how, where the simple heart rate cannot distinguish or
discriminate between two conditions, the heart rate variability can easily do so.

(Refer Slide Time: 32:37)

Now, let us look at some of the factors that we left aside before. We already know that there
are two ways in which we can capture the heart rate: either using the ECG or using the PPG. We
already talked about how the ECG is a bit more accurate a measurement of the heart rate than
the PPG.

But of course, we also saw that it is a bit intrusive and not very comfortable for the
participant. For the same reason, PPG-based measurements have been used very commonly, and it
has been shown that if we take longer time windows for the analysis of the heart rate as
derived from the PPG signals, then they can give accurate estimates of the heart rate.

For example, one common rule of thumb here is that if you take the data of at least 5 minutes,
roughly corresponding to 300 samples if we assume 60 beats per minute, that is, if you do the
analysis of the heart rate over a period of 5 minutes as collected from the PPG signals, then
roughly it gives you the same accuracy as given by the ECG signals.

So, that is one thing that you want to keep in mind when you are using the PPG-based heart rate
measurement over the ECG-based heart rate measurement, ok. That was about the measurement. One
other important thing that we have to keep in mind is that emotion is not the only factor that
affects the heart rate, and hence the heart rate variability.

There are several factors, including, for example, age, posture, level of physical
conditioning, breathing frequency, etcetera. Many of these factors we often refer to as
individual variability. You may also have come across this term while we were talking about the
emotions in week 2, I believe.

So, for example, with age: as the age increases, the heart rate variability has been shown to
decrease. And this is also one of the reasons why you will see that elderly people become
anxious more easily in comparison to younger adults, right.

Similarly, the posture: if you are not sitting in a comfortable posture, your body will of
course have to put in more effort in order to stabilize you, and for the same reason it will
impact the heart rate accordingly, and so on and so forth.

(Refer Slide Time: 35:25)

So, you have to keep this in mind while doing the analysis of the emotions in the heart rate.
Nevertheless, once you have overcome these issues, the heart rate can give you a good estimate
of the emotional state, particularly with respect to arousal.

But now there is a catch. If you only look at the heart rate, it can tell us whether there is
arousal or not, but it will not be able to tell you whether the arousal was because of positive
or negative stimulus content; alternatively, while it can tell you about the arousal, it cannot
tell you about the valence, that is, the direction of the emotion.

It does not have a lot to say about the direction of the valence. And why? Simply because both
positive and negative stimuli have been shown to result in an increase in arousal, triggering
changes in the heart rate.

And hence, while both impact the arousal, you do not have a way to discriminate whether the
arousal was because of negative or positive stimuli or, alternatively, what the valence
associated with the observed arousal was. Hence, the heart rate has been shown to be closely
related to the arousal, and it is used essentially for the analysis of the arousal only.

So, as I said that, it you may not want to use the heart rate or the ECG or the PPG
measurements for the analysis of the valence, when it come to the when it comes to the
analysis of the emotions. Now, the question is what can you do then?

Of course, in this case, what we usually do when we make use of the
physiological signals, or in general any other sensor as well, is to make use of
multi-modal data. In this case, for example, the heart rate can simply be combined with
certain other analysis, such as the facial expression analysis that you have already seen.

It can also be combined with some other physiological signals, such as the EEG, or, for
example, with the eye tracking modalities, in order to understand a bit more about the
direction of the emotion rather than just the arousal content that is there in the emotion.
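As a toy sketch of such decision-level fusion (the thresholds, score ranges and quadrant labels
below are purely illustrative assumptions, not from any particular study), one could combine an
arousal score derived from heart rate with a valence score from a facial expression model:

def fuse_arousal_valence(hr_arousal, face_valence):
    # Toy decision-level fusion: heart rate supplies arousal and a
    # facial-expression model supplies valence, both scaled to [-1, 1].
    high_arousal = hr_arousal > 0.0
    positive_valence = face_valence > 0.0
    if high_arousal and positive_valence:
        return "excited / happy"
    if high_arousal and not positive_valence:
        return "angry / afraid"
    if positive_valence:
        return "calm / content"
    return "sad / bored"

# Elevated heart rate (arousal +0.7) with negative facial valence (-0.4)
# lands in the high-arousal, negative-valence quadrant.
print(fuse_arousal_valence(0.7, -0.4))  # -> angry / afraid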

(Refer Slide Time: 37:45)

Perfect. So, with that we finish the part with the heart rate. Now, next we are going to talk
about the skin conductance and the emotions.

Affective Computing
Prof. Jainendra Shukla
Prof. Abhinav Dhall
Department of Computer Science and Engineering
Indraprastha Institute of Information Technology, Delhi

Week - 07
Lecture - 22
Tutorial Emotions in Physiological Signals

Hello, I am Shrivatsa Mishra. In the last lecture, we saw how we could use PsychoPy to
present stimuli as well as collect sensor data. PsychoPy allows us to integrate several devices
to collect multiple different types of data including physiological signals. Since we have also
seen how emotions can be extrapolated from physiological data, in this lecture, I shall explain
how we can analyze Electrodermal Activity or EDA using Python.

EDA has been closely linked to autonomic emotional and cognitive processing, and EDA is
widely used as a sensitive index of emotional processing and sympathetic activity.
Investigations of EDA have also been used to illuminate wider areas of inquiry such as
physiology, personality disorders, conditioning and neuropsychology.

We shall use sample EDA data collected through an Empatica E4 device. This is a wearable
device that can stream data through Bluetooth or store that data locally. We can extract
that data using the Empatica E4 app, and for analysis, we will use the flirt module in Python.

(Refer Slide Time: 01:33)

(Refer Slide Time: 01:41)

Hello, I am Ritik Vatsal and I will be introducing you to the Empatica E4 wristband data
collection device. The Empatica E4 is a wristband that collects your physiological data. It has
a single button and an LED light on the surface, and two sensor points on the back with
electrodes that need to touch your skin when you wear the device. To wear the device, just
place it on your wrist with this side on the top, and make sure that it is comfortable
enough that you can do daily tasks.

But it is tight enough that both the electrodes at the bottom touch your skin at all times. After
that, we are ready to start the device. To start the device, just press and hold the top button for
three seconds. A blue light would come on indicating that the device has started.

The blue light would start blinking for about 15 to 20 seconds while the device initializes the
setup and checks all the sensors. After the watch has finished blinking, which can take up to a
minute, the light turns red. When the light is red, you know that the recording has started.

After some time, the red light would fade off and the light would go dark. That would
mean that the data is still being recorded, but the watch has turned off the light to preserve
battery. While recording, you can single press the button to mark events in real time on the
watch, like this.

The watch light would again turn on for a short time. Finally, to turn off the watch and stop
data recording, you can just press and hold the button for three seconds, and that is it. The
watch has now stopped data recording.

(Refer Slide Time: 03:15)

This is the splash screen of the E4 manager application that you would be greeted by when
you first log into the application. On this screen, you now need to connect your device to the
USB port on your laptop or PC. I will do that now, ok.

(Refer Slide Time: 03:29)

Now, as you can see, my watch is ready and I see the one session that I just recorded. To
move forward with this, I just simply click the Sync Sessions button and all the sessions
would be synced and would be available for me to view, ok.

(Refer Slide Time: 03:50)

Now, we can click the view session button and we can see all the sessions that have been
recorded. This is the session for today’s date which has just been processed, ok. By clicking
this button, we can view more about the session, yeah.

(Refer Slide Time: 04:01)

(Refer Slide Time: 04:06)

So, this is the E4 website where all the session details are stored. You can just click on the
sessions icon, and there the sessions are sorted in a date wise manner, which you can see in
the top left menu. We just click on today’s date and we can see today’s session, which was of
2 minutes and 1 second. We can either download this or view it in more detail.

(Refer Slide Time: 04:22)

The red lines denote the markers that we placed on the watch by pressing the button. By
going back to the Sessions menu, we can download this whole session in a zip file format.

(Refer Slide Time: 04:35)

(Refer Slide Time: 04:41)

So, after we download the zip file, we can simply extract it (Extract Here), and we will see
all the different modalities that the device has recorded in a simple and convenient CSV
format. Shrivatsa will explain how to extract the data further from this. Thank you.

(Refer Slide Time: 04:50)

The Empatica E4 device provides us with multiple types of data, such as the heart rate
variability and inter-beat intervals, as well as electrodermal activity. For this assignment, we
shall just be using the electrodermal activity. Now, this data is stored in a CSV file. The
first value in the CSV file is the start time of the entire recording. This is stored locally on the
Empatica E4, or using the computer if you are streaming it on (Refer Time: 05:13)

You can reset the clock according to whatever system you want by just connecting it and
using the Empatica app. The second value is the frequency at which the data is collected. This
is in hertz. So, since this value for the EDA is 4, it means 4 data points are collected every
second. Now, let us move on to the code. For this, I shall be using Google Colab as it is easy
to use and readily available.
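Before turning to flirt, here is a minimal sketch of how that raw file layout could be parsed by
hand (assuming the export is named EDA.csv and stores one value per row, with the first row the
Unix start time and the second row the frequency, as just described):

import pandas as pd

# Row 1: start time (Unix timestamp); row 2: sampling frequency in Hz;
# remaining rows: the EDA samples themselves.
raw = pd.read_csv("EDA.csv", header=None)
start_time = float(raw.iloc[0, 0])
frequency = float(raw.iloc[1, 0])   # 4.0 means 4 samples per second
values = raw.iloc[2:, 0].astype(float)

# Build a timestamp index so successive samples are 1/frequency apart.
index = pd.date_range(
    start=pd.to_datetime(start_time, unit="s"),
    periods=len(values),
    freq=pd.Timedelta(seconds=1 / frequency),
)
eda = pd.Series(values.values, index=index, name="eda")
print(eda.head())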

(Refer Slide Time: 05:45)

The data used is a 5 minute sample collected during another study using the Empatica E4 itself.
We will start by installing the flirt module in Python using the pip command. This will take some
time, but flirt is a library that will allow you to read the data very easily as well as extract
features from the same data.

(Refer Slide Time: 06:03)

(Refer Slide Time: 06:07)

Now that this has been installed, we will import the library. We will be importing the flirt
library as well as its reader. We will also be importing NumPy, Seaborn as well as Matplotlib to
graph the data itself.
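The imports would look roughly like this (the flirt.reader.empatica module path follows the
flirt documentation as I recall it; treat it as an assumption to verify against your installed
version):

import flirt
import flirt.reader.empatica   # reader helpers for Empatica E4 CSV exports

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt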

(Refer Slide Time: 06:22)

Importing these libraries usually takes some time. Now that they have been imported, let us
import the data itself. The data is stored in an EDA dot csv file and we can simply get it
through a reader function in the flirt module itself. Upon running it and printing it, we find
that it is stored in a data frame.
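A sketch of that step (the function name read_eda_file_into_df is taken from the flirt Empatica
reader API as I recall it, and is likewise an assumption to verify):

import flirt.reader.empatica

# Read the raw EDA.csv into a pandas DataFrame with a datetime index
# and a single column of EDA values.
eda_df = flirt.reader.empatica.read_eda_file_into_df("EDA.csv")
print(eda_df.head())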

A data frame is a pandas data type, and in this one there are two columns: the date and time of
when each sample was recorded, as well as the EDA value for that sample. Since the frequency is
4 Hz, every successive sample is at a difference of 0.25 seconds.

(Refer Slide Time: 07:01)

Now, let us graph this data. There are two ways we can graph it: we could use
Matplotlib or Seaborn, and I can show you both. For Matplotlib, we will just use the plt
dot plot function to plot the EDA values, and then do the same using Seaborn.
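For instance, reusing the eda_df DataFrame from the reading step (the column name eda is my
assumption about the reader's output):

import matplotlib.pyplot as plt
import seaborn as sns

# Option 1: plain Matplotlib line plot of the raw EDA values.
plt.plot(eda_df["eda"])
plt.xlabel("time")
plt.ylabel("EDA (microsiemens)")
plt.show()

# Option 2: the equivalent Seaborn plot.
sns.lineplot(x=eda_df.index, y=eda_df["eda"])
plt.show()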

(Refer Slide Time: 07:22)

As we can see, at the start there is a large variation. This can be ignored, as we usually take
smaller chunks.

(Refer Slide Time: 07:27)

Next, let us move on to extracting the features. We can simply use the get_eda_features
function in the flirt module to get the features. This will take some time. Now we have
got the data.
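A hedged sketch of the call (the window length and step size below are illustrative values; the
signature follows flirt's documented get_eda_features as I recall it):

import flirt

# Sliding-window feature extraction; flirt decomposes the signal into
# tonic and phasic components and computes statistics per window.
eda_features = flirt.get_eda_features(
    eda_df["eda"],        # the EDA series read earlier
    window_length=60,     # window size in seconds (illustrative)
    window_step_size=1,   # stride in seconds (illustrative)
)
print(eda_features.columns)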

(Refer Slide Time: 07:46)

(Refer Slide Time: 07:52)

We can look at what all features it has extracted. There are two main components to the
overall complex referred to as EDA. The first is the tonic level EDA. This relates to the
signal's slower acting components as well as background characteristics. The most common
measure of this component is the Skin Conductance Level or SCL.

Changes in the SCL are thought to reflect general changes in the autonomic response.
The other component is the phasic component, which refers to the signal's faster changing
elements. The Skin Conductance Response or SCR is the major element in
this.

Recent evidence suggests that both components are important and rely on different neural
mechanisms. Crucially, it is important to be aware that the phasic SCR, which often receives
the most attention, only makes up a small proportion of the overall EDA complex.

(Refer Slide Time: 08:56)

Now, let us graph the two different values: the tonic mean as well as the phasic mean. We
graph this using Seaborn's lineplot function. Over here, we put the x value as the datetime and
the y value as the tonic mean, and similarly for the phasic mean. Upon graphing it, in our case,
we obtain this graph. As we can see, the tonic value is always greater than the phasic value.
This is because, as stated, phasic values make up a smaller percentage of the total value as
compared to the tonic values.
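A sketch of that plot, reusing eda_features from the extraction step (the column names
tonic_mean and phasic_mean are assumptions about flirt's output):

import matplotlib.pyplot as plt
import seaborn as sns

# Tonic and phasic means per window against time; the tonic level
# should dominate, with the phasic part a small fraction of the total.
sns.lineplot(x=eda_features.index, y=eda_features["tonic_mean"], label="tonic mean")
sns.lineplot(x=eda_features.index, y=eda_features["phasic_mean"], label="phasic mean")
plt.xlabel("time")
plt.ylabel("EDA component")
plt.legend()
plt.show()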

In summary, what we have done is: imported the flirt library, imported the data which is
stored in a CSV file, read it, graphed the base EDA data, extracted the phasic and tonic levels
from the EDA data, as well as graphed them. Using just these two basic components, we are able
to extract a lot of different things about the data itself, and we will need to look into these
in greater depth to understand more. Even now, there is much research being done in the
field and new advances are being made.

Thank you for listening to me.
