
DS303: Introduction to Machine Learning

Sequence Modeling Techniques: RNNs and LSTMs

Manjesh K. Hanawal

April 16, 2025


Recurrent Neural Network (RNN)
Motivation

▶ Traditional neural networks (e.g., feedforward networks, CNNs) struggle with sequential data due to their lack of temporal modeling.
▶ Inputs are treated independently, ignoring order and context, so they are not suitable for time-dependent tasks.
▶ Real-world problems often involve sequences with dependencies across time (e.g., language, speech, stock prices).
▶ Need for a model that can:
  ▶ Handle variable-length sequences
  ▶ Capture temporal patterns and contextual dependencies
▶ RNNs are designed specifically for sequential data, enabling memory of past inputs and dynamic behavior over time.
Examples

▶ Videos as image sequences: recognizing activities like Surya Namaskar requires capturing temporal context.
▶ Part-of-speech tagging: tag prediction depends on the current and previous words.
Introduction: RNN

▶ Recurrent Neural Networks (RNNs) are a class of neural networks designed to process sequential data.
▶ RNNs maintain a hidden state that captures information from previous time steps, allowing them to learn temporal dependencies.
▶ They are widely used in tasks such as language modeling, text generation, speech recognition, and time-series forecasting.
▶ The recurrent structure enables the network to use its memory to influence current outputs based on past inputs.

Figure: Basic RNN structure for sequence learning
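
To make the idea of a hidden state carried across time concrete, here is a minimal illustrative sketch using PyTorch's nn.RNN layer; the framework choice, layer sizes, and tensor shapes are assumptions made for this example and are not part of the slides.

```python
# Minimal sketch of a recurrent layer carrying a hidden state across time.
# All sizes and shapes below are illustrative assumptions.
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)  # tanh recurrence

x = torch.randn(4, 10, 8)    # 4 sequences, 10 time steps, 8 features each
h0 = torch.zeros(1, 4, 16)   # initial hidden state: (num_layers, batch, hidden)

outputs, h_n = rnn(x, h0)    # outputs: hidden state at every time step
print(outputs.shape)         # torch.Size([4, 10, 16])
print(h_n.shape)             # torch.Size([1, 4, 16]) -- final hidden state
```

The final hidden state h_n summarizes the whole input sequence, which is the "memory of past inputs" referred to above.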


Important Design Patterns of RNN: (1/3)

▶ Recurrent networks that produce an output at each time step and have recurrent connections between hidden units.

Figure: Type 1 RNN: An input sequence is mapped to an output sequence, with recurrent connections across time. The network uses shared parameters (U for input-to-hidden, W for hidden-to-hidden, V for hidden-to-output) and computes a loss at each time step. The right side shows the unrolled computational graph.
Important Design Patterns of RNN: (2/3)

▶ Recurrent networks that produce an output at each time step and have recurrent connections only from the output at one time step to the hidden units at the next time step.

Figure: Type 2 RNN: Recurrence exists only from the output to the hidden layer. Each time step is trained independently, enabling parallelization. While less expressive than fully recurrent RNNs, this structure is simpler and easier to train.
Important Design Patterns of RNN: (3/3)

▶ Recurrent networks with recurrent connections between hidden units that read an entire sequence and then produce a single output.

Figure: Type 3 RNN: A sequence-to-one model that processes the entire input sequence and produces a single output at the end. Useful for tasks like classification, where a fixed-size representation summarizes the sequence.
Backpropagation Through Time (BPTT) (1/2)

▶ Recurrent neural networks compute hidden states and outputs for sequences using shared weights across time.
▶ At each time step t = 1 to τ, we apply (see the sketch after this slide):

  a^(t) = W h^(t−1) + U x^(t) + b        (pre-activation)
  h^(t) = tanh(a^(t))                    (hidden state)
  o^(t) = V h^(t) + c                    (output logits)
  ŷ^(t) = softmax(o^(t))                 (predicted output)

▶ Parameters: U (input-to-hidden), W (hidden-to-hidden), V (hidden-to-output), with biases b, c.
▶ Total loss over the sequence:

  L = − Σ_{t=1}^{τ} log p_model( y^(t) | x^(1), ..., x^(t) )

▶ BPTT unfolds the RNN through time and applies backpropagation to compute gradients over all time steps.
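
A minimal NumPy sketch of this forward recurrence and loss. The random initialization, tiny dimensions, and the helper name rnn_forward are illustrative assumptions, not from the slides.

```python
# Forward pass of a vanilla RNN, following the equations above.
# All shapes and initial values below are illustrative assumptions.
import numpy as np

def rnn_forward(x_seq, y_seq, U, W, V, b, c, h0):
    """x_seq: list of input vectors x^(t); y_seq: list of target class indices y^(t)."""
    h_prev, hs, probs, loss = h0, [h0], [], 0.0
    for x_t, y_t in zip(x_seq, y_seq):
        a_t = W @ h_prev + U @ x_t + b      # pre-activation
        h_t = np.tanh(a_t)                  # hidden state
        o_t = V @ h_t + c                   # output logits
        e = np.exp(o_t - o_t.max())
        yhat_t = e / e.sum()                # softmax
        loss += -np.log(yhat_t[y_t])        # negative log-likelihood term
        hs.append(h_t)
        probs.append(yhat_t)
        h_prev = h_t
    return hs, probs, loss

# Tiny usage example with random data.
rng = np.random.default_rng(0)
n_in, n_h, n_out, T = 5, 8, 3, 4
U = rng.normal(0.0, 0.1, (n_h, n_in))
W = rng.normal(0.0, 0.1, (n_h, n_h))
V = rng.normal(0.0, 0.1, (n_out, n_h))
b, c, h0 = np.zeros(n_h), np.zeros(n_out), np.zeros(n_h)
x_seq = [rng.normal(size=n_in) for _ in range(T)]
y_seq = [int(rng.integers(n_out)) for _ in range(T)]
hs, probs, loss = rnn_forward(x_seq, y_seq, U, W, V, b, c, h0)
print(loss)
```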
Backpropagation Through Time (BPTT) (2/2)

▶ BPTT is used to compute gradients for RNNs by unrolling the network over time.
▶ Gradients are computed by backpropagating from the final time step to the beginning.
▶ Assuming a softmax output and negative log-likelihood loss:

  ∂L / ∂o_i^(t) = ŷ_i^(t) − 1_{i, y^(t)}

▶ Gradient at the hidden layer at the final step:

  ∇_{h^(τ)} L = V^⊤ ∇_{o^(τ)} L

▶ Recursive update for t < τ:

  ∇_{h^(t)} L = W^⊤ ( ∇_{h^(t+1)} L ⊙ (1 − (h^(t+1))^2) ) + V^⊤ ∇_{o^(t)} L

▶ Parameter gradients (e.g., for W and U) are accumulated over time steps:

  ∇_W L = Σ_t diag(1 − (h^(t))^2) (∇_{h^(t)} L) h^(t−1)⊤
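
Continuing the forward-pass sketch from the previous slide, the following NumPy function implements this gradient recursion; the variable names and the list-of-arrays layout are assumptions carried over from that sketch.

```python
# Backpropagation through time for the vanilla RNN sketched earlier.
# hs = [h^(0), ..., h^(tau)] and probs = [yhat^(1), ..., yhat^(tau)] come from rnn_forward.
import numpy as np

def bptt(x_seq, y_seq, hs, probs, U, W, V):
    T = len(x_seq)
    dU, dW, dV = np.zeros_like(U), np.zeros_like(W), np.zeros_like(V)
    db, dc = np.zeros(W.shape[0]), np.zeros(V.shape[0])
    dh_next = np.zeros(W.shape[0])          # gradient arriving from step t+1
    for t in reversed(range(T)):
        do = probs[t].copy()
        do[y_seq[t]] -= 1.0                 # dL/do^(t) = yhat^(t) - one_hot(y^(t))
        h_t, h_prev = hs[t + 1], hs[t]
        dV += np.outer(do, h_t)
        dc += do
        dh = V.T @ do + dh_next             # dL/dh^(t): output path + recurrent path
        da = dh * (1.0 - h_t ** 2)          # back through tanh: diag(1 - (h^(t))^2)
        dW += np.outer(da, h_prev)          # accumulate over time steps
        dU += np.outer(da, x_seq[t])
        db += da
        dh_next = W.T @ da                  # W^T term passed back to step t-1
    return dU, dW, dV, db, dc
```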
Challenge of Long-Term Dependencies (1/2)

▶ Main Problem: Gradients in RNNs tend to:
  ▶ Vanish most of the time
  ▶ Explode rarely, but severely
▶ Why It Happens:
  ▶ Repeated application of functions at each time step causes gradient decay/explosion
  ▶ Long-term dependencies involve products of many Jacobians → exponentially smaller gradients
▶ Simplified Example (see the numeric sketch after this list):
  ▶ h^(t) = W^⊤ h^(t−1), which recursively gives h^(t) = (W^⊤)^t h^(0)
  ▶ If W = Q Λ Q^⊤ with orthogonal Q, then h^(t) = Q Λ^t Q^⊤ h^(0)
  ▶ Eigenvalues:
    ▶ |λ| < 1 → vanishing gradients
    ▶ |λ| > 1 → exploding gradients
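
A quick numeric check of the eigenvalue argument: the eigenvalues 0.9 and 1.1 and the horizon lengths below are arbitrary illustrative choices, not from the slides.

```python
# Repeatedly applying W scales each eigen-component by lambda^t.
import numpy as np

for lam in (0.9, 1.1):                           # |lambda| < 1 vs |lambda| > 1
    W = np.diag([lam, lam])                      # diagonal W with eigenvalue lam
    h0 = np.ones(2)
    for t in (10, 50, 100):
        h_t = np.linalg.matrix_power(W.T, t) @ h0    # h^(t) = (W^T)^t h^(0)
        print(f"lambda={lam}, t={t}, ||h_t|| = {np.linalg.norm(h_t):.3e}")
# lambda=0.9: the norm shrinks toward 0 (vanishing); lambda=1.1: it blows up (exploding)
```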
Challenge of Long-Term Dependencies (2/2)

▶ Comparison to Non-Recurrent Networks:
  ▶ With separate weights at each step, careful variance control can stabilize gradients
  ▶ Sussillo (2014) shows deep feedforward networks avoid vanishing gradients via scaling
▶ Empirical Findings:
  ▶ Bengio et al. (1994): SGD fails to train standard RNNs on sequences of just 10–20 steps
▶ Why It's Hard to Avoid:
  ▶ Even stable memory representations cause gradients to vanish
  ▶ Long-term dependencies yield smaller gradients than short-term ones
▶ Conclusion:
  ▶ Learning long-term dependencies is fundamentally hard for RNNs
  ▶ It remains one of the core challenges in deep learning
Long Short-Term Memory (LSTM)
LSTM: Long Short-Term Memory

Key Idea: LSTM introduces self-loops to allow gradients to flow for long durations.

Why LSTM?
▶ Standard RNNs struggle with long-term dependencies due to vanishing/exploding gradients.
▶ LSTM uses gated mechanisms to control information flow.

Self-loop Weighting:
▶ The self-loop weight is dynamically controlled by another hidden unit.
▶ This allows flexible memory retention based on context.

LSTM has proven successful in applications such as:
▶ Handwriting and speech recognition
▶ Machine translation and image captioning
LSTM Cell Structure

Components:
▶ Forget gate f_t
▶ Input gate i_t
▶ Output gate o_t
▶ Cell state c_t (long-term memory)
▶ Hidden state h_t (short-term memory)

Figure: LSTM cell block diagram

Gating Equations:

  f_t = σ(W_f [h_{t−1}, x_t] + b_f)
  i_t = σ(W_i [h_{t−1}, x_t] + b_i)
LSTM Cell Update and Output

State and Output Updates:

  c̃_t = tanh(W_c [h_{t−1}, x_t] + b_c)
  c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t
  o_t = σ(W_o [h_{t−1}, x_t] + b_o)
  h_t = o_t ⊙ tanh(c_t)
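
A minimal NumPy sketch of a single cell step implementing these gate and state equations; the weight shapes (each W acting on the concatenation [h_{t−1}, x_t]) and the sigmoid helper are assumptions made for this example.

```python
# One LSTM cell step, following the gate and state equations above.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, Wf, Wi, Wc, Wo, bf, bi, bc, bo):
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(Wf @ z + bf)               # forget gate
    i_t = sigmoid(Wi @ z + bi)               # input gate
    c_tilde = np.tanh(Wc @ z + bc)           # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde       # new cell state (long-term memory)
    o_t = sigmoid(Wo @ z + bo)               # output gate
    h_t = o_t * np.tanh(c_t)                 # new hidden state (short-term memory)
    return h_t, c_t
```

The additive update of c_t is the self-loop mentioned earlier: it lets information, and hence gradients, pass across many time steps with far less attenuation than a purely multiplicative tanh recurrence.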
Key Benefits:
▶ Learns long-term dependencies
▶ Gated structure mitigates vanishing/exploding gradients
▶ Strong empirical performance in many NLP and vision tasks
