ANN Text and Sequence Processing
Lecture # 9: Text and Sequence Processing
Why Sequence Models?
• Examples of Sequence Data
Conversion of an FFNN into an RNN
• An RNN works on the principle of saving the output of a particular layer and feeding it back to
the input in order to predict the layer's output.
• Nodes in the different layers of the neural network are compressed to form a single recurrent
layer.
• A, B, and C are the parameters of the network used to improve the output of the model.
Fully Connected Recurrent Neural Network
• At any given time t, the current input is a combination of the input at x(t) and x(t-1).
• The output at any given time is fed back into the network to improve that output.
How Does RNN Work?
Motivating Example
• X : Harry and Hermione invented a new spell.
Why Sequence Models?
How do we represent an individual word in a sequence?
• This is where we lean on a vocabulary, or a dictionary.
• This is a list of words that we use in our representations.
• A vocabulary might look like this:
• Size of the vocabulary might vary depending on the application.
• One potential way of making a vocabulary is by picking up the most frequently occurring
words from the training set.
• Now, suppose we want to represent the word ‘harry’, which is in the 4075th position in our
vocabulary.
• We one-hot encode against this vocabulary to represent ‘harry’: we put a 1 in the 4075th
position and a 0 in every remaining position (see the sketch after this list).
• To generalize, each x<t> is a one-hot encoded vector.
• If a word is not in our vocabulary, we create an unknown <UNK> tag and add it to the
vocabulary.
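A minimal sketch of this one-hot representation; the vocabulary entries and the zero-based index 4075 below are illustrative assumptions, not the lecture's actual dictionary:

```python
import numpy as np

vocab_size = 10000
# Illustrative vocabulary entries; the lecture places 'harry' at position 4075.
word_to_index = {"a": 0, "aaron": 1, "harry": 4075, "<UNK>": 9999}

def one_hot(word):
    """Return a vocab_size-dimensional vector with a 1 at the word's index, 0 elsewhere."""
    x = np.zeros(vocab_size)
    x[word_to_index.get(word, word_to_index["<UNK>"])] = 1.0
    return x

x_t = one_hot("harry")
print(int(x_t[4075]))   # 1 -- every other entry is 0
```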
Why not Use a Standard Neural Network?
• We use Recurrent Neural Networks to learn a mapping from X to Y when either X or Y, or both,
are sequences.
• But why can’t we just use a standard neural network for these sequence problems?
• Example: Suppose we build the below neural network:
• Problems:
1. Inputs and outputs do not have a fixed length, i.e., some input sentences are 10 words long
while others are longer or shorter than 10 words. The same is true for the eventual output.
2. With a standard neural network, we will not be able to share features learned across
different positions of the text.
Recurrent Neural Network (RNN) Model
• We need a representation that will help us to parse through different sentence lengths as
well as reduce the number of parameters in the model.
• This is where we use a recurrent neural network. This is what a typical RNN looks like:
• An RNN takes the first word (x<1>) and feeds it into a neural network layer, which predicts an
output (y’<1>).
• This process is repeated until the last time step x<Tx>, which generates the last output y’<Ty>.
• This network assumes that the number of words in the input and the output is the same.
Recurrent Neural Network (RNN) Model
• An RNN scans through the data in a left-to-right sequence.
• Note that the parameters the RNN uses for each time step are shared.
• We will have parameters shared between each input and the hidden layer (Wax), between
timesteps (Waa), and between the hidden layer and the output (Wya).
• So if we are making predictions for x<3>, we will also have information about x<1> and x<2>.
• A potential weakness of an RNN is that it only takes information from the previous timesteps
and not from the ones that come later.
• This problem can be solved using bi-directional RNNs. For now, let’s look at the forward
propagation steps in an RNN model:
• a<0> is a vector of all zeros and we calculate the further activations similar to that of a
standard neural network:
• a<0> = 0
• a<1> = g(Waa * a<0> + Wax * x<1> + ba)
• y<1> = g’(Wya * a<1> + by)
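A minimal numpy sketch of these two forward-propagation equations; the dimensions and the tanh/softmax choices for g and g’ are assumptions, since the slide does not specify them:

```python
import numpy as np

n_a, n_x, n_y = 64, 10000, 10000          # hidden, input, and output sizes (illustrative)
rng = np.random.default_rng(0)
Waa = rng.standard_normal((n_a, n_a)) * 0.01
Wax = rng.standard_normal((n_a, n_x)) * 0.01
Wya = rng.standard_normal((n_y, n_a)) * 0.01
ba, by = np.zeros(n_a), np.zeros(n_y)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

a0 = np.zeros(n_a)                         # a<0> is a vector of all zeros
x1 = np.zeros(n_x); x1[4075] = 1.0         # one-hot x<1> ('harry' from the earlier example)

a1 = np.tanh(Waa @ a0 + Wax @ x1 + ba)     # a<1> = g(Waa * a<0> + Wax * x<1> + ba)
y1 = softmax(Wya @ a1 + by)                # y<1> = g'(Wya * a<1> + by)
```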
Recurrent Neural Network (RNN) Model
• Similarly, we can calculate the output at each time step. The generalized form of these
formulae can be written as:
• We horizontally stack Waa and Wax to get Wa, while a<t-1> and x<t> are stacked vertically (see
the sketch after this slide’s bullets).
• Rather than carrying around two parameter matrices, we now have just one matrix.
• And that, in a nutshell, is how forward propagation works for recurrent neural networks.
• Recurrent neural nets use the backpropagation algorithm, but it is applied at every timestep. It
is commonly known as Backpropagation Through Time (BPTT).
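A small check of the stacked-matrix claim above: multiplying the horizontally stacked Wa by the vertically stacked [a<t-1>; x<t>] gives the same result as the two separate products (shapes here are illustrative).

```python
import numpy as np

n_a, n_x = 5, 8
rng = np.random.default_rng(1)
Waa = rng.standard_normal((n_a, n_a))
Wax = rng.standard_normal((n_a, n_x))
a_prev = rng.standard_normal(n_a)
x_t = rng.standard_normal(n_x)

Wa = np.hstack([Waa, Wax])                 # horizontally stack Waa and Wax
stacked = np.concatenate([a_prev, x_t])    # stack a<t-1> and x<t> vertically

# The single-matrix form matches the two-matrix form.
assert np.allclose(Wa @ stacked, Waa @ a_prev + Wax @ x_t)
```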
Backpropagation Through Time (BPTT)
• Backpropagation steps work in the opposite direction to forward propagation.
• We have a loss function which we need to minimize in order to generate accurate
predictions. The loss function is given by:
• We calculate the loss at every timestep and finally sum all these losses to calculate the final
loss for a sequence:
• In forward propagation, we move from left to right, i.e., increasing the indices of time t.
• In backpropagation, we are going from right to left, i.e., going backward in time (hence the
name backpropagation through time).
• So far, we have seen scenarios where the length of input and output sequences was equal.
• But what if the length differs?
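The slide's loss formula is not reproduced in this text; a common choice (assumed below) is a cross-entropy loss per timestep, summed over the sequence:

```python
import numpy as np

def timestep_loss(y_hat, y):
    """Per-timestep loss L<t>(y'<t>, y<t>): cross-entropy between predicted and true one-hot word."""
    return -np.sum(y * np.log(y_hat + 1e-12))

def sequence_loss(y_hats, ys):
    """Total loss for the sequence: the sum of the per-timestep losses."""
    return sum(timestep_loss(y_hat, y) for y_hat, y in zip(y_hats, ys))
```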
Application of RNNs
• We can have different types of RNNs to deal with use cases where sequence length differs.
• One to One: given some scores of a championship, you can predict the winner.
Application of RNNs
• One to Many: given an image, you can predict what the caption is going to be.
Application of RNNs
• Many to One: given a tweet, you can predict the sentiment of that tweet.
• We pass a sentence to the model and it returns the sentiment or rating corresponding to that
sentence.
Application of RNNs
• Many to Many: given an English sentence, you can translate it to its German equivalent.
Application of RNNs
• Many to Many: consider the machine translation application where we take an input
sentence in one language and translate it into another language.
• It is a many-to-many problem, but the length of the input sequence might or might not be
equal to the length of the output sequence.
Language Model and Sequence Generation
• Suppose we are building a speech recognition system and we hear the sentence “the apple
and pear salad was delicious”.
• What will the model predict – “the apple and pair salad was delicious” or “the apple and pear
salad was delicious”?
• The speech recognition system picks a sentence by using a language model, which predicts the
probability of each sentence.
• But how do we build a language model?
• Suppose we have an input sentence: Cats average 15 hours of sleep a day.
• Steps to build a language model will be (steps 1–2 are sketched in code after this list):
• Step 1 – Tokenize the input, i.e., create a dictionary
• Step 2 – Map these words to one-hot encoded vectors. We can add an <EOS> tag, which
represents the End Of Sentence
• Step 3 – Build an RNN model
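A minimal sketch of steps 1–2 for the example sentence; the dictionary indices produced here are illustrative, not the lecture's:

```python
sentence = "Cats average 15 hours of sleep a day"

# Step 1 - tokenize and append the End Of Sentence tag
tokens = sentence.lower().split() + ["<EOS>"]

# Step 2 - map each token to a dictionary index (then one-hot encode as shown earlier)
vocab = {w: i for i, w in enumerate(sorted(set(tokens)) + ["<UNK>"])}
indices = [vocab.get(t, vocab["<UNK>"]) for t in tokens]
print(tokens)
print(indices)
```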
Language Model and Sequence Generation
• We take the first input word and make a prediction for it.
• The first output tells us the probability of each word in the dictionary being the first word.
• The second output tells us the probability of the predicted word given the first input word:
• Each step in our RNN model looks at some set of preceding words to predict the next word.
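Illustratively, the language model scores a whole sentence as the product of these conditional word probabilities; the numbers below are made up, not model outputs:

```python
p_y1 = 0.002           # P("cats") from the first output
p_y2_given_y1 = 0.03   # P("average" | "cats") from the second output
p_y3_given_prev = 0.1  # P("15" | "cats average") from the third output

p_prefix = p_y1 * p_y2_given_y1 * p_y3_given_prev
print(p_prefix)        # probability the model assigns to "cats average 15"
```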
Challenges in Training a RNN Model
Issues with backpropagation
1. Vanishing Gradient
• Consider the two sentences:
• The cat, which already ate a bunch of food, was full.
• The cat, which already ate a bunch of food, were full.
• Which of the above two sentences is grammatically correct?
• It’s the first one.
Challenges in Training a RNN Model
1. Vanishing Gradient
• When using backpropagation, the goal is to calculate the error, which is found by taking the
difference between the actual output and the model output and squaring it.
• As this error is propagated backward through many timesteps, the gradient is multiplied by the
recurrent weights again and again; when those factors are small, it shrinks toward zero, so the
network struggles to learn long-range dependencies like the ‘cat … was’ agreement above.
Challenges in Training a RNN Model
Issues with backpropagation
2. Exploding Gradient
• The exploding gradient works in a similar way, but in reverse: the repeated multiplications make
the gradient grow, so the weights change drastically instead of negligibly.
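A toy scalar illustration of both problems (a simplification, since the real gradient involves weight matrices): repeatedly multiplying by a recurrent weight below 1 shrinks the gradient toward zero, while a weight above 1 blows it up.

```python
def backprop_scale(weight, timesteps=50):
    """Scale picked up by a gradient flowing back through `timesteps` steps (scalar toy model)."""
    grad = 1.0
    for _ in range(timesteps):
        grad *= weight
    return grad

print(backprop_scale(0.9))   # ~0.005 -> vanishing gradient
print(backprop_scale(1.1))   # ~117   -> exploding gradient
```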
Challenges in Training a RNN Model
How to overcome these Challenges?
Long Short Term Memory (LSTM) Networks
• LSTM networks are a special kind of RNN, capable of learning long-term dependencies.
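The gate mechanics behind this can be sketched as follows; this is the standard LSTM formulation, and the parameter names and shapes below are illustrative rather than taken from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, a_prev, c_prev, params):
    """One LSTM timestep using the standard gate equations."""
    Wf, bf, Wi, bi, Wc, bc, Wo, bo = params
    concat = np.concatenate([a_prev, x_t])

    f = sigmoid(Wf @ concat + bf)         # forget gate: what to drop from the cell state
    i = sigmoid(Wi @ concat + bi)         # input gate: what new information to write
    c_tilde = np.tanh(Wc @ concat + bc)   # candidate values for the cell state
    c_next = f * c_prev + i * c_tilde     # cell state carries the long-term memory
    o = sigmoid(Wo @ concat + bo)         # output gate
    a_next = o * np.tanh(c_next)          # new hidden state
    return a_next, c_next

# Tiny usage with random parameters.
n_a, n_x = 4, 3
rng = np.random.default_rng(0)
params = []
for _ in range(4):
    params += [rng.standard_normal((n_a, n_a + n_x)) * 0.1, np.zeros(n_a)]
a, c = lstm_step(rng.standard_normal(n_x), np.zeros(n_a), np.zeros(n_a), params)
```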
LSTM Network – Use case
• We will feed an LSTM with correct sequences from the text: three symbols as inputs and one
labeled symbol as the output.
• Eventually the neural network will learn to predict the next symbol correctly.
LSTM Network – Use case
• A unique integer value is assigned to each symbol because an LSTM can only process
real-numbered inputs.
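A minimal sketch of this preprocessing; the text and vocabulary here are illustrative placeholders, not the story used on the slides:

```python
text = "the quick brown fox jumps over the lazy dog".split()   # illustrative symbol sequence

# Assign a unique integer to each symbol, since the LSTM consumes real numbers.
symbol_to_int = {s: i for i, s in enumerate(sorted(set(text)))}

# Build training pairs: three input symbols -> one labeled (next) symbol.
window = 3
pairs = [(text[i:i + window], text[i + window]) for i in range(len(text) - window)]
encoded = [([symbol_to_int[s] for s in xs], symbol_to_int[y]) for xs, y in pairs]
print(pairs[0], "->", encoded[0])
```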