RNN and LSTM - Explanation by Example
Explanation using examples
We will attempt to explain the functionality of
• RNNs
• LSTMs
RNN – Guess what we have for Dinner tonight?
• Every night for dinner, we have either:
₋ Pizza, or
₋ Sushi, or
₋ Waffles
Guess the dinner tonight?
The prediction works like a voting process.
Outputs: the 3 choices
• pizza,
• sushi,
• waffles
Inputs: whatever can affect what we have for dinner, for example,
• day of the week,
• month,
• a late meeting
Pizza, Sushi, Waffles, & repeat - Re-examine the data
Let’s simplify our assumptions.
Assume that the choice of dinner does not depend on the day of the week, the month, or late meetings.
Instead, assume that the data follows a simple pattern:
• pizza,
• sushi,
• waffles, and
• repeat.
Therefore, we just need to know what we had last night.
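As a rough illustration of this simplified pattern, here is a minimal sketch in Python (the dictionary and function names are ours, purely for illustration):

NEXT_DINNER = {"pizza": "sushi", "sushi": "waffles", "waffles": "pizza"}

def predict_tonight(last_night):
    # Under the simplified pattern, tonight's dinner is fully determined
    # by what we had last night.
    return NEXT_DINNER[last_night]

print(predict_tonight("sushi"))  # -> waffles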
What happens if we do not know what we had last night?
What do we need to know to predict tonight's dinner?
Generally, we need to know either:
• a prediction of what we might have had last night, or
• information about the dinner last night.
Side note - Vectors
The native language of neural networks (NNs) is vectors.
Side note - Vectors as statements
ONE-HOT ENCODING
For example, the statement “It is Tuesday” can be encoded as a vector with a 1 in the Tuesday position and 0s everywhere else.
Side note - One Hot Vector for our example
A vector is a list of values.
We have 3 choices for dinner:
₋ Pizza,
₋ Sushi,
₋ Waffles
For example, [0 1 0]' encodes Sushi (a 1 in the second position, 0s elsewhere).
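A minimal sketch of this one-hot encoding in Python, assuming the ordering pizza, sushi, waffles (the helper name is ours):

import numpy as np

DINNERS = ["pizza", "sushi", "waffles"]  # assumed ordering

def one_hot(choice):
    # Return a vector with a 1 in the position of the chosen dinner and 0s elsewhere.
    vec = np.zeros(len(DINNERS))
    vec[DINNERS.index(choice)] = 1.0
    return vec

print(one_hot("sushi"))  # -> [0. 1. 0.]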
Input/Output vector
- Input: two vectors
1. A vector for the prediction of dinner for yesterday
2. A vector of the new information, i.e., what we actually had for dinner last night
Recurrent Neural Networks
RNN - Create a feedback loop from the output to the input
Dinner example - Unwrapped recurrent network
Now we can go back as far (in time) as we want.
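A minimal sketch of this feedback idea in Python, using a hand-made weight matrix that simply rotates the one-hot dinner vector one step forward in the pizza -> sushi -> waffles cycle (the matrix is an illustrative stand-in for a trained network):

import numpy as np

# Each column says which dinner follows the dinner in that position
# (pizza -> sushi, sushi -> waffles, waffles -> pizza).
W = np.array([[0., 0., 1.],
              [1., 0., 0.],
              [0., 1., 0.]])

prediction = np.array([0., 1., 0.])   # last known dinner: sushi
for night in range(3):
    prediction = W @ prediction       # feed the output back in as the next input
    print(prediction)                 # waffles, then pizza, then sushi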
Example: A network to write a children’s book
The collection (dictionary) of words that we have for writing this book is rather small:
₋ Doug
₋ Jane
₋ Spot
₋ saw
₋ .
The input/output vectors for this example:
1. A vector of the new information (it): it indicates the current word, e.g., if the current word is Doug then the vector is [0 1 0 0 0 0]'
2. A vector of the prediction of the words (Pt)
3. A vector of the words that may come next (Pt-1)
Trained RNN – new information vector (it)
Let’s try to work out this RNN.
Similarly, if the new word is a name, we expect that the trained net would point to:
₋ saw, or
₋ .
Working out our RNN
If the present word is
₋ saw, or
₋ .
then the trained net should point to a name, as a name should appear after “saw” or “.”.
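A minimal sketch of the rule the trained RNN is expected to capture for this vocabulary; the function below is a hand-written illustration of that rule, not actual learned weights:

NAMES = ["Doug", "Jane", "Spot"]

def expected_next(word):
    # Words the trained net should point to after the current word.
    if word in NAMES:
        return ["saw", "."]
    return NAMES  # after "saw" or "." a name should appear

print(expected_next("Doug"))  # -> ['saw', '.']
print(expected_next("saw"))   # -> ['Doug', 'Jane', 'Spot']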
A representation for our RNN
The input is a collection (concatenation) of the new information and the predicted values.
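A minimal sketch of that concatenation for the book example, assuming a 6-element word vector as in the slides; the weights are random placeholders and tanh is just a common squashing choice:

import numpy as np

rng = np.random.default_rng(0)
vocab_size = 6

new_info = np.zeros(vocab_size)
new_info[1] = 1.0                       # e.g., the current word is Doug: [0 1 0 0 0 0]'
prev_prediction = np.zeros(vocab_size)  # prediction carried over from the previous step

x = np.concatenate([new_info, prev_prediction])    # the network's actual input
W = rng.normal(size=(vocab_size, 2 * vocab_size))  # untrained weights, shapes only
prediction = np.tanh(W @ x)                        # one recurrent step
print(prediction.shape)                            # -> (6,)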
What may not work so far?
Problem: the network only looks back one time step. With such a short memory it can make mistakes, for example predicting “Doug” again right after “Doug” has just appeared.
RNN
A simple architecture of an RNN: the output is fed back to the input through a delay.
The input is a combination of the new information and the previous prediction.
How do we fix this?
We need to modify the existing architecture.
How do we add a memory component?
Introduction of the memory component
Adding a memory component.
Side note - Element-by-Element Addition/Plus Junction
Side note - Element-by-Element Multiplication/Times Junction
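A minimal sketch of the two junctions, using element-by-element operations on numpy arrays:

import numpy as np

a = np.array([0.2, 0.5, 0.9])
b = np.array([0.1, 0.5, 0.1])

print(a + b)  # plus junction: element-by-element addition        -> [0.3 1.  1. ]
print(a * b)  # times junction: element-by-element multiplication -> [0.02 0.25 0.09]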
Gating
We can use the times junction to control what percentage of an input (a signal) goes through, i.e., gating (see the sketch after the sigmoid side note below).
Side note - Sigmoid Function
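A minimal sketch tying the sigmoid to gating: the sigmoid squashes any value into the range 0 to 1, and multiplying a signal by that value element by element lets through the corresponding fraction of each component:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

signal = np.array([0.8, 0.6, 0.4])
gate = sigmoid(np.array([10.0, 0.0, -10.0]))  # roughly 1.0, 0.5, 0.0
print(gate * signal)                          # roughly [0.8, 0.3, 0.0]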
Memory Component: forget & keep
The memory component works on the prediction from the last round:
• to forget some of the previous prediction, and
• to keep the rest.
How does the forget gate work?
1. A combination of the previous prediction and the new information goes through net1, and a prediction is made accordingly (net1: what to predict).
2. A copy of the prediction from the last round is given to the forget gate (net2: what to forget).
Note: net2 is different from net1; its task is to learn what to forget and when to forget it.
We do not necessarily need to send the entire prediction to the input/output; that is handled by the selection gate (net3: what to select).
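A minimal sketch of the forget gate under the description above: a small net (net2 here) looks at the previous prediction and the new information and produces 0-to-1 values that are multiplied, element by element, into what is remembered. The weights are random placeholders, not trained values:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
vocab_size = 6
W_forget = rng.normal(size=(vocab_size, 2 * vocab_size))  # net2: what to forget (placeholder)

new_info = np.zeros(vocab_size)
new_info[1] = 1.0                         # current word, e.g., Doug
prev_prediction = rng.random(vocab_size)  # prediction from the last round
memory = rng.random(vocab_size)           # what is currently remembered

forget_gate = sigmoid(W_forget @ np.concatenate([new_info, prev_prediction]))
memory = forget_gate * memory             # keep only what the gate lets through
print(memory)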
How does the selection gate work?
In the previous layer (forget/keep) we combined our memory with our prediction. The selection gate (net3) decides which part of that combination is actually sent out as the prediction.
Add an ignore/attention layer – net4
To ignore some of the possible predictions.
How does the ignore layer work?
Some of the possible predictions that are not immediately relevant are ignored, so that the predictions held in memory do not become unnecessarily complicated (by having too many of them) going forward.
Where does learning happen?
LSTM Structure
(figure: the LSTM structure, with its four nets labeled ① to ④)
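A minimal sketch of one full LSTM step built from the pieces above (a prediction net, a forget gate, a selection gate, and an ignore/attention gate). This follows a standard LSTM formulation, which may differ in detail from the structure in the figure, and the weights are random placeholders:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(new_info, prev_prediction, memory, W_pred, W_forget, W_select, W_ignore):
    # Concatenate the new information with the previous prediction, as before.
    x = np.concatenate([new_info, prev_prediction])
    candidate = np.tanh(W_pred @ x)   # what to predict (candidate values)
    forget = sigmoid(W_forget @ x)    # what to forget from memory
    ignore = sigmoid(W_ignore @ x)    # what to ignore (attention on the candidates)
    select = sigmoid(W_select @ x)    # what to select for the output

    memory = forget * memory + ignore * candidate  # update the memory component
    prediction = select * np.tanh(memory)          # expose only the selected part
    return prediction, memory

n = 6  # dictionary size assumed from the slides
rng = np.random.default_rng(0)
W_pred, W_forget, W_select, W_ignore = (rng.normal(size=(n, 2 * n)) for _ in range(4))
prediction, memory = lstm_step(np.eye(n)[1], np.zeros(n), np.zeros(n),
                               W_pred, W_forget, W_select, W_ignore)
print(prediction.shape, memory.shape)  # -> (6,) (6,)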
Side note
• A multiplicative input gate unit learns to protect the constant error flow within the memory cell from perturbation by irrelevant inputs.
Running a simple example
Assume this LSTM is already trained.
Information going through
① So far we have “Jane saw Spot.” and the new word is “Doug”.
② We also know from the previous prediction that the next word can be “Doug”, “Jane”, or “Spot”.
③ We pass this information through nets 1, 2, 3, and 4 to:
1. Predict
2. Ignore
3. Forget
4. Select
net1 - Prediction Step
④ The new word is “Doug”, so net1 should predict that the next word is “saw”.
Also, net1 should know that since the new word is “Doug”, it should not see the word “Doug” again very soon.
net2 - Ignore Step
This example is simple; we do not need to focus on ignoring anything here.
⑤ This prediction of
₋ “saw”
₋ “not Doug”
is passed forward.
net3 - Forget Step
net4 - Selection Step
The selection mechanism (net4) has learned that when the most recent word was a name, the next word is either
• “saw”, or
• “.”
net1 - Prediction Step
net3 - Forget Step
Now the other thing that we need to consider is our previous set of possibilities.
net3 - Forget Step
At the forgetting gate we know that the last word that occurred was “saw”, so the network can forget it, but it should keep any predictions about names.
For net3:
• forget “saw”
• keep “not Doug”
Now, at the plus junction, we have:
• a positive vote for “Doug”
• a positive vote for “not Doug” (or, equivalently, a negative vote for “Doug”)
They cancel each other out, so after this point the network has only “Jane” and “Spot”. Those get passed forward.
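A minimal numeric sketch of that cancellation at the plus junction, with the five dictionary words in an assumed order; the votes are illustrative values, not network outputs:

import numpy as np

words = ["Doug", "Jane", "Spot", "saw", "."]
new_prediction = np.array([1.0, 1.0, 1.0, 0.0, 0.0])   # a name should come next
kept_memory    = np.array([-1.0, 0.0, 0.0, 0.0, 0.0])  # "not Doug", kept by the forget gate

combined = new_prediction + kept_memory                # plus junction
print(dict(zip(words, combined)))                      # Doug cancels out; Jane and Spot remain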
net4 - Selection Step
The selection gate knows that after “saw” a name should come next, and the surviving candidates are “Jane” and “Spot”, not “Doug”.
That is because an LSTM can look back two, three, many time steps and use that information to make good predictions about what is going to happen next.
Note: vanilla recurrent neural networks can actually look back several time steps as well, but not very many.
LSTM Applications
• Translation of text from one language to another language
Even though translation is not a word-to-word process (it is a phrase-to-phrase, or in some cases a sentence-to-sentence, process), LSTMs are able to represent the grammar structures that are specific to each language. In effect, they find the higher-level idea and translate it from one mode of expression to another, using just the bits and pieces that we walked through.
LSTM Applications
• Translation of speech to text
Speech is just a set of signals that vary in time. The LSTM takes them and uses them to predict what text (what word) is being spoken, and it can use the recent history of words to make a better guess about what is coming next.
LSTM Applications
• LSTMs are a great fit for any information that is embedded in time, like audio and video.
• Such information is inherently sequential, and actions taken now can influence what is sensed and what should be done many time steps down the line.
Some interesting applications