
CS5740: Natural Language Processing

Spring 2017

Recurrent Neural Networks

Instructor: Yoav Artzi

Adapted from Yoav Goldberg’s Book and slides by Sasha Rush


Overview
• Finite state models
• Recurrent neural networks (RNNs)
• Training RNNs
• RNN Models
• Long short-term memory (LSTM)
Text Classification
• Consider the example:
– Goal: classify sentiment
How can you not see this movie?
You should not see this movie.
• Model: unigrams and bigrams
• How well will the classifier work?
– Similar unigrams and bigrams
• Generally: need to maintain a state to
capture distant influences
Finite State Machines
• Simple, classical way of representing
state
• Current state: saves necessary past
information
• Example: email address parsing
Deterministic Finite State Machines
• S – states
• Σ – vocabulary
• s_0 ∈ S – start state
• R: S × Σ → S – transition function

• What does it do?
– Maps input w_1, …, w_n to states s_1, …, s_n
– For all i ∈ 1, …, n:
s_i = R(s_{i-1}, w_i)
• Can we use it for POS tagging? Language
modeling?
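To make the definitions concrete, here is a minimal sketch of a deterministic acceptor in Python (not from the slides; the states, transition table, and the toy email-like pattern are invented for illustration):

```python
# Minimal DFA acceptor sketch (illustrative; states and pattern are invented).
# R maps (state, symbol) -> state; the acceptor decides based on the final state.

START, LOCAL, AT, DOMAIN = "start", "local", "at", "domain"
ACCEPTING = {DOMAIN}

def R(state, symbol):
    """Transition function R: S x Sigma -> S for a toy email-like pattern."""
    if state == START and symbol.isalnum():
        return LOCAL
    if state == LOCAL and symbol.isalnum():
        return LOCAL
    if state == LOCAL and symbol == "@":
        return AT
    if state in (AT, DOMAIN) and symbol.isalnum():
        return DOMAIN
    return "reject"

def accepts(w):
    s = START                      # s_0
    for w_i in w:                  # s_i = R(s_{i-1}, w_i)
        s = R(s, w_i)
    return s in ACCEPTING          # decision based on the final state s_n

print(accepts("ab@cd"))   # True
print(accepts("ab@"))     # False
```

The acceptor/transducer/encoder distinction on the next slide only changes what is done with the states s_i; the transition machinery stays the same.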
Types of State Machines
• Acceptor
– Compute the final state s_n and make a decision based on it: y = O(s_n)
• Transducer
– Apply a function y_i = O(s_i) to produce an output for each intermediate state
• Encoder
– Compute the final state s_n, and use it in another model
Recurrent Neural Networks
• Motivation:
– Neural network model, but with state
– How can we borrow ideas from FSMs?
• RNNs are FSMs …
– … with a twist
– No longer finite in the same sense
RNN
• S = ℝ^{d_hid} – hidden state space
• Σ = ℝ^{d_in} – input space
• s_0 ∈ S – initial state vector
• R: ℝ^{d_in} × ℝ^{d_hid} → ℝ^{d_hid} – transition function
• Simple definition of R (Elman, 1990):
R_Elman(s, x) = tanh([x; s] W + b)

* Notation: vectors and matrices are bold


RNN
• Map from a dense input sequence to a dense representation
– x_1, …, x_n → s_1, …, s_n
– For all i ∈ 1, …, n:
s_i = R(s_{i-1}, x_i)
– R is parameterized, and parameters are shared between all steps
– Example:
s_4 = R(s_3, x_4) = … = R(R(R(R(s_0, x_1), x_2), x_3), x_4)
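A minimal sketch of the Elman transition R and its unrolling over a sequence, written in numpy; the dimensions and random parameters are made-up placeholders:

```python
import numpy as np

d_in, d_hid = 4, 8                       # assumed toy dimensions
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(d_in + d_hid, d_hid))   # shared across all steps
b = np.zeros(d_hid)

def R_elman(s, x):
    """Elman transition: R(s, x) = tanh([x; s] W + b)."""
    return np.tanh(np.concatenate([x, s]) @ W + b)

def unroll(xs, s0=None):
    """Map x_1..x_n to states s_1..s_n with s_i = R(s_{i-1}, x_i)."""
    s = np.zeros(d_hid) if s0 is None else s0
    states = []
    for x in xs:
        s = R_elman(s, x)
        states.append(s)
    return states

xs = [rng.normal(size=d_in) for _ in range(5)]
states = unroll(xs)
print(len(states), states[-1].shape)     # 5 (8,)
```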
RNNs
• Hidden states 𝒔3 can be used in different
ways
• Similar to finite state machines
– Acceptor
– Transducer
– Encoder
• Output function maps vectors to symbols:
O: ℝ^{d_hid} → ℝ^{d_out}
• For example: single layer + softmax
O(s_i) = softmax(s_i W + b)
Graphical Representation
[Figures: recursive representation and unrolled representation of an RNN]
Training
• RNNs are trained with SGD and Backprop
• Define loss over outputs
– Depends on supervision and task
• Backpropagation through time (BPTT)
– Run forward propagation
– Run backward propagation
– Update all weights
• Weights are shared between time steps
– Sum the contributions of each time step to the gradient
• Inefficient
– Batching helps: common, but tricky to implement with variable-size models (the unrolled graph differs per sequence length)
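As an illustration of BPTT in practice, here is a minimal training step using PyTorch's built-in Elman RNN; the task (per-step tagging), sizes, and random data are assumptions, not part of the slides:

```python
import torch
import torch.nn as nn

# Toy transducer: tag each time step (e.g., POS tagging). Sizes are made up.
vocab_size, d_in, d_hid, n_tags = 100, 32, 64, 10
emb = nn.Embedding(vocab_size, d_in)
rnn = nn.RNN(d_in, d_hid, batch_first=True)     # Elman RNN; weights shared over time
out = nn.Linear(d_hid, n_tags)                  # per-step output layer
params = list(emb.parameters()) + list(rnn.parameters()) + list(out.parameters())
opt = torch.optim.SGD(params, lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randint(0, vocab_size, (8, 20))       # batch of 8 sequences, length 20
y = torch.randint(0, n_tags, (8, 20))           # per-step supervision

states, _ = rnn(emb(x))                         # forward pass: unroll over all time steps
loss = loss_fn(out(states).reshape(-1, n_tags), y.reshape(-1))
loss.backward()                                 # BPTT: gradients summed over time steps
opt.step()
opt.zero_grad()
```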
RNN: Acceptor Architecture
• Only care about the output from the last hidden
state
• Train: supervised, loss on prediction
• Example:
– Text classification
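A sketch of the acceptor architecture as a text classifier in PyTorch; the vocabulary, dimensions, and two-class output are illustrative assumptions:

```python
# Acceptor sketch: only the final hidden state s_n feeds the classifier.
import torch
import torch.nn as nn

class RNNAcceptor(nn.Module):
    def __init__(self, vocab_size=100, d_in=32, d_hid=64, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_in)
        self.rnn = nn.RNN(d_in, d_hid, batch_first=True)
        self.out = nn.Linear(d_hid, n_classes)   # decision y = O(s_n)

    def forward(self, x):
        _, s_n = self.rnn(self.emb(x))           # s_n: (1, batch, d_hid)
        return self.out(s_n.squeeze(0))          # class logits (e.g., sentiment)

logits = RNNAcceptor()(torch.randint(0, 100, (4, 12)))
print(logits.shape)                              # torch.Size([4, 2])
```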
Language Modeling
• Input: X = x_1, …, x_n
• Goal: compute p(X)
• Bi-gram decomposition:
p(X) = ∏_{i=1}^{n} p(x_i | x_{i-1})
• With RNNs, can do non-Markovian models:
p(X) = ∏_{i=1}^{n} p(x_i | x_1, …, x_{i-1})
RNN: Transducer Architecture
• Predict output for every time step
Language Modeling
• Input: X = x_1, …, x_n
• Goal: compute p(X)
• Model:
p(X) = ∏_{i=1}^{n} p(x_i | x_1, …, x_{i-1})
p(x_i | x_1, …, x_{i-1}) = O(s_i) = O(R(s_{i-1}, x_i))
O(s_i) = softmax(s_i W + b)
• Predict the next token ŷ_i as we go:
ŷ_i = argmax O(s_i)
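A minimal numpy sketch of the transducer language model: each step updates the state, scores the next token, and predicts greedily; all sizes, parameters, and token ids are made up:

```python
import numpy as np

# Toy RNN language model (sizes and parameters are invented placeholders).
vocab_size, d_in, d_hid = 50, 16, 32
rng = np.random.default_rng(0)
E = rng.normal(scale=0.1, size=(vocab_size, d_in))       # token embeddings
W = rng.normal(scale=0.1, size=(d_in + d_hid, d_hid))
b = np.zeros(d_hid)
W_o = rng.normal(scale=0.1, size=(d_hid, vocab_size))    # output layer
b_o = np.zeros(vocab_size)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def step(s_prev, token):
    """State update s_i = R(s_{i-1}, x_i) and output distribution O(s_i)."""
    s = np.tanh(np.concatenate([E[token], s_prev]) @ W + b)
    return s, softmax(s @ W_o + b_o)

tokens = [3, 17, 8, 42]                                  # made-up token ids
s, log_p, y_hat = np.zeros(d_hid), 0.0, []
for i, token in enumerate(tokens):
    s, p = step(s, token)                                # p: distribution over the next token
    y_hat.append(int(np.argmax(p)))                      # greedy prediction as we go
    if i + 1 < len(tokens):
        log_p += np.log(p[tokens[i + 1]])                # accumulate log p(x_{i+1} | x_1..x_i)
print(log_p, y_hat)
```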
RNN: Transducer Architecture
• Predict output for every time step
• Examples:
– Language modeling
– POS tagging
– NER
RNN: Encoder Architecture
• Similar to acceptor
• Difference: last state is used as input to
another model and not for prediction
O(s_i) = s_i → y_n = s_n
• Example:
– Sentence embedding
Bidirectional RNNs
• RNN decisions are based on historical data only
– How can we account for future input?
• When is it relevant? Feasible?
Bidirectional RNNs
• RNN decisions are based on historical data only
– How can we account for future input?
• When is it relevant? Feasible?
• When all of the input is available at once, so not for real-time input, for example.
• Probabilistic model, for example for language modeling:
p(X) = ∏_{i=1}^{n} p(x_i | x_1, …, x_{i-1}, x_{i+1}, …, x_n)
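In practice, frameworks such as PyTorch expose this directly; here is a minimal sketch (sizes are placeholders) where each position's representation concatenates a forward and a backward state:

```python
import torch
import torch.nn as nn

# Bidirectional RNN sketch: one RNN reads left-to-right, one right-to-left,
# and each position's representation concatenates both directions.
d_in, d_hid = 16, 32
birnn = nn.RNN(d_in, d_hid, batch_first=True, bidirectional=True)

x = torch.randn(2, 10, d_in)          # batch of 2 sequences, length 10
states, _ = birnn(x)
print(states.shape)                   # torch.Size([2, 10, 64]): forward + backward states
```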
Deep RNNs
• Can also make RNNs deeper (vertically) to
increase the model capacity
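A one-line sketch of stacking, assuming PyTorch, where the number of vertical layers is just a constructor argument; the states of layer k become the inputs of layer k+1:

```python
import torch
import torch.nn as nn

# Deep (stacked) RNN sketch: three vertically stacked Elman layers.
deep_rnn = nn.RNN(input_size=16, hidden_size=32, num_layers=3, batch_first=True)
states, _ = deep_rnn(torch.randn(2, 10, 16))   # top-layer states for each time step
```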
RNN: Generator
• Special case of the transducer architecture
• Generation conditioned on s_0
• Probabilistic model:
p(X | s_0) = ∏_{i=1}^{n} p(x_i | x_1, …, x_{i-1}, s_0)
Example: Caption Generation
• Given: image I
• Goal: generate a caption
• Set s_0 = CNN(I)
• Model:
p(X | I) = ∏_{i=1}^{n} p(x_i | x_1, …, x_{i-1}, I)

Examples from Karpathy and Fei-Fei (2015)
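A minimal greedy-generation sketch conditioned on s_0; the CNN features are faked with a random vector, and all sizes, parameters, and the BOS/EOS conventions are illustrative assumptions:

```python
import numpy as np

# Generator sketch: condition on s_0 (standing in for CNN image features projected
# to the state size) and greedily emit tokens until an end-of-sequence symbol.
vocab_size, d_in, d_hid, EOS = 50, 16, 32, 0
rng = np.random.default_rng(1)
E = rng.normal(scale=0.1, size=(vocab_size, d_in))
W = rng.normal(scale=0.1, size=(d_in + d_hid, d_hid))
b = np.zeros(d_hid)
W_o = rng.normal(scale=0.1, size=(d_hid, vocab_size))

def generate(s0, max_len=10, bos=1):
    s, token, out = s0, bos, []
    for _ in range(max_len):
        s = np.tanh(np.concatenate([E[token], s]) @ W + b)   # s_i = R(s_{i-1}, x_i)
        token = int(np.argmax(s @ W_o))                      # greedy: argmax O(s_i)
        if token == EOS:
            break
        out.append(token)
    return out

s0 = rng.normal(size=d_hid)          # stands in for CNN(I)
print(generate(s0))
```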
Sequence-to-Sequence
• Connect encoder and
generator
• Many alternatives:
– Set the generator's initial state s_0 to the encoder's final state s_n
– Concatenate the encoder output with each step's input during generation
• Examples:
– Machine translation
– Chatbots
– Dialog systems
• Can also generate
other sequences – not
only natural language!
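A minimal encoder-decoder sketch in PyTorch, following the first alternative above (the encoder's final state initializes the generator); vocabularies, sizes, and the greedy decoding loop are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Encoder-decoder sketch: the encoder's final state initializes the generator.
src_vocab, tgt_vocab, d, BOS = 100, 120, 64, 1
src_emb, tgt_emb = nn.Embedding(src_vocab, d), nn.Embedding(tgt_vocab, d)
encoder = nn.RNN(d, d, batch_first=True)
decoder = nn.RNN(d, d, batch_first=True)
out = nn.Linear(d, tgt_vocab)

src = torch.randint(0, src_vocab, (1, 7))        # one source sequence
_, h = encoder(src_emb(src))                     # h: encoder's final state s_n

tokens, token = [], torch.tensor([[BOS]])
for _ in range(10):                              # greedy generation, conditioned on h
    state_seq, h = decoder(tgt_emb(token), h)
    token = out(state_seq[:, -1]).argmax(dim=-1, keepdim=True)
    tokens.append(int(token))
print(tokens)
```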
Long-range Interactions
• Promise: Learn long-range interactions of
language from data
• Example:
How can you not see this movie?
You should not see this movie.
• Sometimes: requires "remembering" the early state
– Key signal here is at s_1, but the gradient is at s_n
Long-term Gradients
• Gradients go through (many) multiplications
• OK at the last layers → close to the loss
• But: an issue for early layers
• For example, the derivative of tanh:
d/dx tanh(x) = 1 − tanh²(x)
– Large activation → the gradient disappears
• In other activation functions, values can become larger and larger
Exploding Gradients
• Common when there is no saturation in the activation (e.g., ReLU) and we get exponential blowup
• Result: reasonable short-term gradients, but bad long-term ones
• Common heuristic:
– Gradient clipping: bound all gradients by a maximum value
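For example, in PyTorch, clipping is a single call between the backward pass and the optimizer step; the model and threshold below are placeholders:

```python
import torch
import torch.nn as nn

# Gradient clipping sketch: rescale gradients whose norm exceeds a maximum value.
model = nn.RNN(16, 32, batch_first=True)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

out, _ = model(torch.randn(4, 25, 16))
loss = out.pow(2).mean()                 # placeholder loss
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)   # bound gradient norm
opt.step()
opt.zero_grad()
```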
Vanishing Gradients
• Occurs when multiplying small values
– For example: when tanh saturates
• Mainly affects long-term gradients
• Solving this is more complex
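A tiny numeric illustration of the saturation effect described above (the input value and number of steps are arbitrary):

```python
import numpy as np

# When tanh saturates, its derivative 1 - tanh(x)^2 is tiny, and a product of
# many such factors drives long-range gradients toward zero.
x = 3.0
d = 1 - np.tanh(x) ** 2
print(d)            # ~0.0099: a single saturated step already shrinks the gradient
print(d ** 20)      # ~1e-40: after 20 such steps the long-term gradient vanishes
```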
Long Short-term Memory (LSTM)

Hochreiter and Schmidhuber (1997)


LSTM vs. Elman RNN
LSTM

[Figure: LSTM cell diagram (input, cell state, output); image by Tim Rocktäschel]

f_t = σ(W_f [h_{t-1}; x_t] + b_f)                              (forget gate)
i_t = σ(W_i [h_{t-1}; x_t] + b_i)                              (input gate)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(W_c [h_{t-1}; x_t] + b_c)     (cell state update)
o_t = σ(W_o [h_{t-1}; x_t] + b_o)                              (output gate)
h_t = o_t ⊙ tanh(c_t)                                          (hidden state / output)
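A minimal numpy implementation of one LSTM step following the equations above; the dimensions and random parameters are toy placeholders:

```python
import numpy as np

# LSTM cell step following the equations above (toy sizes, random parameters).
d_in, d_hid = 8, 16
rng = np.random.default_rng(0)
Wf, Wi, Wc, Wo = (rng.normal(scale=0.1, size=(d_hid, d_hid + d_in)) for _ in range(4))
bf, bi, bc, bo = (np.zeros(d_hid) for _ in range(4))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, c_prev, x_t):
    z = np.concatenate([h_prev, x_t])                 # [h_{t-1}; x_t]
    f_t = sigmoid(Wf @ z + bf)                        # forget gate
    i_t = sigmoid(Wi @ z + bi)                        # input gate
    c_t = f_t * c_prev + i_t * np.tanh(Wc @ z + bc)   # new cell state
    o_t = sigmoid(Wo @ z + bo)                        # output gate
    h_t = o_t * np.tanh(c_t)                          # hidden state / output
    return h_t, c_t

h, c = np.zeros(d_hid), np.zeros(d_hid)
for x_t in rng.normal(size=(5, d_in)):                # run over a length-5 toy sequence
    h, c = lstm_step(h, c, x_t)
print(h.shape, c.shape)                               # (16,) (16,)
```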
