ANN Text and Sequence Processing

This document discusses recurrent neural networks (RNNs) and their application to sequence modeling problems. It begins by explaining why RNNs are useful for sequence data and how they differ from traditional feedforward neural networks. It then covers the basic mechanics of RNNs, including how they can retain information about previous elements in a sequence through their hidden state. The document discusses different types of RNN architectures for one-to-one, one-to-many, many-to-one, and many-to-many problems and provides examples like language translation. It also covers backpropagation through time for training RNNs on variable length sequences.


ATNW-4118

Artificial Neural Networks


BSCS-8 / BSSE-8

Lecture # 9

Text and Sequence Processing
Instructor: Tanzila Kehkashan
Why Sequence Models?
• Examples of Sequence Data

Conversion of FFNN into a RNN
• An RNN works on the principle of saving the output of a layer and feeding it back into the input, so that earlier elements of the sequence can influence the prediction at the current step.

• Nodes in the different layers of the feedforward network are compressed to form a single recurrent layer.
• A, B, and C are the parameters of the network that are learned to improve the output of the model.

Fully Connected Recurrent Neural Network
• At any given time t, the network combines the current input x(t) with the hidden state carried over from the previous step, which summarizes x(t-1) and earlier inputs.
• The output at any given time is fed back into the network to improve subsequent outputs.

How Does RNN Work?

Motivating Example
• X : Harry and Hermione invented a new spell.

• Now, to represent each word of the sentence, we use x<t>:


• x<1> = Harry
• x<2> = Hermione, and so on
• For the above sentence, the output will be:
• y = 1 0 1 0 0 0 0
• Here, 1 means the corresponding word is part of a person’s name (and 0 means it is not).
• Below are a few common notations we generally use:
• Tx = length of input sentence
• Ty = length of output sentence
• x(i) = ith training example
• x(i)<t> = tth element of the ith training example
• Tx(i) = length of the ith input sentence
Why Sequence Models?
How do we represent an individual word in a sequence?
• This is where we lean on a vocabulary, or a dictionary.
• This is a list of words that we use in our representations.
• A vocabulary might look like this:
• Size of the vocabulary might vary depending on the application.
• One potential way of making a vocabulary is by picking up the most frequently occurring
words from the training set.
• Now, suppose we want to represent the word ‘harry’, which sits at the 4075th position in our vocabulary.
• We one-hot encode against this vocabulary to represent ‘harry’:
• To generalize, x<t> is a one-hot encoded vector.
• We put a 1 in the 4075th position and a 0 in every remaining position.
• If a word is not in our vocabulary, we create an unknown <UNK> tag and add it to the vocabulary.
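
As a rough illustration (not from the slides), a one-hot encoding step might look like the Python sketch below; the toy vocabulary and word positions are assumptions made only for this example.

import numpy as np

# Toy vocabulary; a real one might hold ~10,000 frequent words plus <UNK>.
vocab = ["a", "and", "harry", "hermione", "invented", "new", "spell", "<UNK>"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word, word_to_index, vocab_size):
    # Return a vector of length vocab_size with 1 at the word's index, 0 elsewhere.
    # Words missing from the vocabulary are mapped to the <UNK> entry.
    vec = np.zeros(vocab_size)
    idx = word_to_index.get(word.lower(), word_to_index["<UNK>"])
    vec[idx] = 1.0
    return vec

x_t = one_hot("Harry", word_to_index, len(vocab))  # x<t> for the word "harry"
print(x_t)  # 1.0 at the position of "harry" in this toy vocabulary, 0 everywhere else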

Why not Use a Standard Neural Network?
• We use Recurrent Neural Networks to learn a mapping from X to Y when either X or Y, or both X and Y, are sequences.
• But why can’t we just use a standard neural network for these sequence problems?
• Example: Suppose we build the below neural network:

• Problems:
1. Inputs and outputs do not have a fixed length, i.e., some input sentences may have 10
words while others have more or fewer. The same is true for the eventual output.
2. We will not be able to share features learned across different positions of text if we use
a standard neural network

Recurrent Neural Network (RNN) Model
• We need a representation that will help us to parse through different sentence lengths as
well as reduce the number of parameters in the model.
• This is where we use a recurrent neural network. This is what a typical RNN looks like:

• A RNN takes the first word (x<1>) and feeds it into a neural network layer which predicts an
output (y’<1>).
• This process is repeated until the last time step x<Tx> which generates the last output y’<Ty>.
• This is the architecture for the case where the input and output sequences have the same number of words (Tx = Ty).

Recurrent Neural Network (RNN) Model
• The RNN scans through the data from left to right.
• Note that the parameters that the RNN uses for each time step are shared.
• We will have parameters shared between each input and hidden layer (Wax), every timestep
(Waa) and between the hidden layer and the output (Wya).
• So if we are making predictions for x<3>, we will also have information about x<1> and x<2>.
• A potential weakness of RNN is that it only takes information from the previous timesteps
and not from the ones that come later.
• This problem can be solved using bi-directional RNNs. For now, let’s look at forward
propagation steps in a RNN model:
• a<0> is a vector of all zeros and we calculate the further activations similar to that of a
standard neural network:
• a<0> = 0
• a<1> = g(Waa * a<0> + Wax * x<1> + ba)

• y’<1> = g(Wya * a<1> + by)
Recurrent Neural Network (RNN) Model
• Similarly, we can calculate the output at each time step. The generalized form of these
formulae can be written as:
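• In standard notation, consistent with the per-step formulas above, these are:
• a<t> = g(Waa * a<t-1> + Wax * x<t> + ba)
• y’<t> = g(Wya * a<t> + by)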

• We can write these equations in an even simpler way:

• We horizontally stack Waa and Wax to get Wa. a<t-1> and x<t> are stacked vertically.
• Rather than carrying around 2 parameter matrices, we now have just 1 matrix.
• And that, in a nutshell, is how forward propagation works for recurrent neural networks.
• Recurrent neural nets use the backpropagation algorithm, but it is applied at every timestep. This
is commonly known as Backpropagation Through Time (BPTT).
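
Before moving on to backpropagation, here is a minimal NumPy sketch of the forward pass just described, using the stacked-matrix form; the layer sizes, random weights, and dummy inputs are assumptions made only for this example.

import numpy as np

np.random.seed(0)
n_a, n_x, n_y, T_x = 8, 5, 2, 4            # hidden size, input size, output size, sequence length (arbitrary)

Waa = np.random.randn(n_a, n_a) * 0.01     # hidden-to-hidden weights
Wax = np.random.randn(n_a, n_x) * 0.01     # input-to-hidden weights
Wya = np.random.randn(n_y, n_a) * 0.01     # hidden-to-output weights
ba, by = np.zeros((n_a, 1)), np.zeros((n_y, 1))

Wa = np.hstack([Waa, Wax])                 # horizontally stack Waa and Wax into a single matrix

def rnn_forward(xs):
    # xs: list of input column vectors x<1>, ..., x<Tx>. Returns activations and predictions.
    a = np.zeros((n_a, 1))                 # a<0> is a vector of all zeros
    activations, outputs = [], []
    for x in xs:
        stacked = np.vstack([a, x])        # stack a<t-1> and x<t> vertically
        a = np.tanh(Wa @ stacked + ba)     # a<t> = g(Wa * [a<t-1>; x<t>] + ba)
        y_hat = 1.0 / (1.0 + np.exp(-(Wya @ a + by)))  # y'<t> = sigmoid(Wya * a<t> + by)
        activations.append(a)
        outputs.append(y_hat)
    return activations, outputs

xs = [np.random.randn(n_x, 1) for _ in range(T_x)]    # dummy input sequence
activations, outputs = rnn_forward(xs)
print(len(outputs), outputs[0].shape)                 # one prediction per timestep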

Backpropagation Through Time (BPTT)
• Backpropagation steps work in the opposite direction to forward propagation.
• We have a loss function which we need to minimize in order to generate accurate
predictions. The loss function is given by:

• We calculate the loss at every timestep and finally sum all these losses to calculate the final
loss for a sequence:
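• For the binary name/not-name labels used earlier, a standard choice (written here in the slides’ notation) is the per-timestep cross-entropy loss, summed over the sequence:
• L<t>(y’<t>, y<t>) = - y<t> * log(y’<t>) - (1 - y<t>) * log(1 - y’<t>)
• L = sum over t = 1, ..., Ty of L<t>(y’<t>, y<t>)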

• In forward propagation, we move from left to right, i.e., increasing the indices of time t.
• In backpropagation, we are going from right to left, i.e., going backward in time (hence the
name backpropagation through time).
• So far, we have seen scenarios where the length of input and output sequences was equal.
• But what if the length differs?

Application of RNNs
• We can have different types of RNNs to deal with use cases where sequence length differs.

• In other words, there are several ways to structure an RNN model:

• One to One: given some scores of a championship, you can predict the winner.

Application of RNNs
• One to Many: given an image, you can predict what the caption is going to be.

• Consider the example of music generation, where we want to predict the lyrics using the music as input.
• In such scenarios, the input is just a single word (or a single integer), and the output can be of varied length.
• The RNN architecture for this type of problem looks like the below:

Application of RNNs
• Many to One: given a tweet, you can predict the sentiment of that tweet.
• We pass a sentence to the model and it returns a sentiment or rating corresponding to that sentence.
• Here the input sequence can have varied length, whereas there will only be a single output.
• The RNN architecture for such problems will look something like this:
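
As a rough sketch of how such a many-to-one model could be wired up (not taken from the slides), using Keras; the vocabulary size, layer sizes, and number of rating classes are assumptions.

import tensorflow as tf

# Many-to-one sketch: a sequence of word indices in, a single sentiment class out.
vocab_size, embed_dim, hidden_units, num_classes = 10000, 64, 32, 5   # illustrative values

model = tf.keras.Sequential([
    tf.keras.Input(shape=(None,)),                 # a variable-length sequence of word indices
    tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embed_dim),
    tf.keras.layers.SimpleRNN(hidden_units),       # keeps only the final hidden state (many-to-one)
    tf.keras.layers.Dense(num_classes, activation="softmax"),  # one prediction for the whole sequence
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()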

Application of RNNs
• Many to Many: given an English sentence, you can translate it to its German equivalent.

• Named Entity Recognition examples fall under this category.
• We have a sequence of words, and for each word, we have to predict whether it is a name or not.

• The RNN architecture for such a problem looks like this:
• For every input word, we predict a corresponding output word.

Application of RNNs

• Many to Many: consider the machine translation application where we take an input sentence in one language and translate it into another language.
• It is a many-to-many problem, but the length of the input sequence might or might not be equal to the length of the output sequence.

• In such cases, we have an encoder part and a decoder part.
• The encoder part reads the input sentence and the decoder translates it to the output sentence:

Language Model and Sequence Generation
• Suppose we are building a speech recognition system and we hear the sentence “the apple
and pear salad was delicious”.
• What will the model predict – “the apple and pair salad was delicious” or “the apple and pear
salad was delicious”?
• A speech recognition system picks a sentence by using a language model, which predicts the
probability of each candidate sentence.
• But how do we build a language model?
• Suppose we have an input sentence: Cats average 15 hours of sleep a day.
• Steps to build a language model will be:
• Step 1 – Tokenize the input, i.e. create a dictionary
• Step 2 – Map these words to one-hot encoded vectors. We can add an <EOS> tag, which
represents the End Of Sentence (a small sketch of these two steps follows this list)
• Step 3 – Build an RNN model
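
A minimal Python sketch of steps 1 and 2, assuming a tiny vocabulary built only for this one example sentence:

# Steps 1-2: tokenize the sentence and map each word to an index (then one-hot encode as before).
sentence = "Cats average 15 hours of sleep a day."

vocab = ["cats", "average", "15", "hours", "of", "sleep", "a", "day", "<EOS>", "<UNK>"]
word_to_index = {w: i for i, w in enumerate(vocab)}

def tokenize(text):
    # Lowercase, drop the final period, split on whitespace, and append the <EOS> tag.
    return text.lower().rstrip(".").split() + ["<EOS>"]

tokens = tokenize(sentence)
indices = [word_to_index.get(w, word_to_index["<UNK>"]) for w in tokens]
print(tokens)   # ['cats', 'average', '15', 'hours', 'of', 'sleep', 'a', 'day', '<EOS>']
print(indices)  # each index can then be turned into a one-hot vector and fed to the RNN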

Language Model and Sequence Generation
• We take the first input word and make a prediction for that.
• The output here tells us the probability of each word in the dictionary being the first word.
• The second output tells us the probability of the next word given the first input word:

• Each step in our RNN model looks at some set of preceding words to predict the next word.
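• Put differently (a standard chain-rule factorization, written in the slides’ notation), the model scores a whole sentence as:
• P(y<1>, y<2>, ..., y<Ty>) = P(y<1>) * P(y<2> | y<1>) * P(y<3> | y<1>, y<2>) * ... * P(y<Ty> | y<1>, ..., y<Ty-1>)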

Challenges in Training a RNN Model
Issues with backpropagation
1. Vanishing Gradient
• Consider the two sentences:
• The cat, which already ate a bunch of food, was full.
• The cat, which already ate a bunch of food, were full.
• Which of the above two sentences is grammatically correct?
• It’s the first one.

• Basic RNNs are not good at capturing long term dependencies.


• This is because during backpropagation, gradients from an output y would have a hard
time propagating back to affect the weights of earlier layers. So, in basic RNNs, the
output is highly affected by inputs closer to that word.
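
• A standard way to see this (not spelled out on the slide): the gradient reaching an early timestep is a product of many per-step factors, roughly (∂a<t>/∂a<t-1>) * (∂a<t-1>/∂a<t-2>) * ... * (∂a<2>/∂a<1>).
• When these factors are consistently smaller than 1, the product shrinks toward 0 (vanishing gradient); when they are consistently larger than 1, it grows without bound (exploding gradient).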

Challenges in Training a RNN Model
1. Vanishing Gradient
• When using backpropagation, the goal is to calculate the error, which is found by taking the
difference between the actual (target) output and the model output and squaring it.

Challenges in Training a RNN Model
Issues with backpropagation
2. Exploding Gradient
• Exploding gradients arise in a similar way, but here the weights change drastically
instead of negligibly.

Challenges in Training a RNN Model
How to overcome these Challenges?

Long Short Term Memory (LSTM) Networks
• LSTM networks are a special kind of RNN that is capable of learning long-term
dependencies.
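
• For reference (the standard LSTM formulation, written in the same style as the RNN formulas above), the cell is governed by its gates:
• Forget gate: f<t> = sigmoid(Wf * [a<t-1>, x<t>] + bf)
• Update (input) gate: i<t> = sigmoid(Wi * [a<t-1>, x<t>] + bi)
• Candidate memory: c~<t> = tanh(Wc * [a<t-1>, x<t>] + bc)
• Cell state: c<t> = f<t> * c<t-1> + i<t> * c~<t>
• Output gate: o<t> = sigmoid(Wo * [a<t-1>, x<t>] + bo)
• Hidden state: a<t> = o<t> * tanh(c<t>)
• The cell state c<t> acts as a memory line that lets information flow across many timesteps, which is what allows LSTMs to capture long-term dependencies.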

LSTM Network – Use case

• We will feed the LSTM with correct sequences from the text: three symbols as
inputs and one labeled symbol as the target.
• Eventually the neural network will learn to predict the next symbol correctly.

LSTM Network – Use case

• How to Train the Network?

LSTM Network – Use case

• A unique integer value is assigned to each symbol, because the LSTM input layer can only
work with numeric (real-valued) inputs.
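
A minimal Python sketch of this preparation step; the sample text, the window of three symbols, and the variable names are assumptions made only for this example.

# Map each unique symbol (word) to an integer and build (3 inputs -> 1 label) training pairs.
text = "long ago the mice had a general council to consider what measures they could take"
symbols = text.split()

unique_symbols = sorted(set(symbols))
symbol_to_int = {s: i for i, s in enumerate(unique_symbols)}   # unique integer per symbol

window = 3                                  # three input symbols per training example
X, y = [], []
for i in range(len(symbols) - window):
    X.append([symbol_to_int[s] for s in symbols[i:i + window]])  # integer-encoded inputs
    y.append(symbol_to_int[symbols[i + window]])                 # the next symbol is the label

print(X[0], "->", y[0])   # first training pair: three integers and the integer of the next symbol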

