
Deep Learning and Artificial Intelligence Epiphany 2024

Lecture 14: Recurrent neural networks


James Liley

Reading list and references

[Calin, 2020, Chapter 17]


[Zhang et al., 2021, Chapters 9,10]

Contents

1 Introduction
2 Dynamical systems
3 Recurrent neural networks
4 Training recurrent neural networks
    4.1 Backpropagation through time
    4.2 Problems
5 Long short-term memory networks (LSTMs)
    5.1 LSTM setup
6 Applications

1 Introduction
So far, we have only really looked at feedforward neural networks, characterised by an architecture in which we
start with inputs, progress through layers, and get to an output.
In this lecture we will look at recurrent neural networks, and depart from this architecture a bit. Recurrent
neural networks are neural networks specialised to model sequential data.

2 Dynamical systems
We shall start by considering a deterministic dynamical system.
Definition 1. A deterministic discrete dynamical system {ht}, t ∈ {0, 1, 2, . . . }, is a sequence of random variables given by

ht = f(ht−1; θ)    (1)

for t ≥ 1, where f (the ‘transition function’) is a Borel-measurable function and θ is a set of parameters.
We treat the states ht as random variables, although all randomness comes from the initial state h0. We denote by H(t) the σ-algebra generated by ht, and note that H(t) ⊆ H(t − 1).
Let’s quickly establish a simple property of these objects (example 17.1.1, Calin [2020]):
Theorem 1. Suppose {ht} is a deterministic discrete dynamical system with transition function f, and that f satisfies ||∂f/∂h||_2 ≤ λ < 1. Then we have

P(lim_{t→∞} ht = c) = 1    (2)

for some constant c; that is, ht → c almost surely.
Proof. By the mean-value theorem, we have

||f(h; θ) − f(h′; θ)|| ≤ sup_h ||∂f/∂h|| · ||h − h′|| ≤ λ||h − h′||;

that is, f is a contraction. In particular, along any fixed trajectory we have ||ht+1 − ht|| ≤ λ^{t−1}||h2 − h1||, so the trajectory is Cauchy and converges to the unique fixed point c of f(·; θ), which does not depend on the starting point.
Recalling that the ht are random variables, suppose we consider two values taken by h0. Let ω, ω′ be the corresponding points in the sample space of h0, so the values taken are h0(ω), h0(ω′). Consider the values h1(ω) = f(h0(ω); θ), h1(ω′) = f(h0(ω′); θ), h2(ω) = f(h1(ω); θ), h2(ω′) = f(h1(ω′); θ), . . . . We have

||ht(ω) − ht(ω′)|| = ||f(ht−1(ω); θ) − f(ht−1(ω′); θ)|| ≤ λ||ht−1(ω) − ht−1(ω′)||
                  ≤ λ^2 ||ht−2(ω) − ht−2(ω′)|| ≤ . . . ≤ λ^{t−1} ||h1(ω) − h1(ω′)|| → 0    (3)

for all ω, ω′. Hence P({ω : lim_{t→∞} ht(ω) = c}) = 1, as needed.

In this case, the sequence H(t) of σ-algebras gradually loses all its information: the information I(ht) tends to 0. This is analogous to the vanishing gradient problem which we encountered in the chapter on training of neural networks.
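To make Theorem 1 concrete, here is a minimal numerical sketch in Python (an illustration, not from the lecture; the map, sizes and seed are assumptions): we iterate a contraction f(h) = tanh(W h + b) with ||W||_2 = 0.5 < 1 from two different initial states and watch the trajectories merge.

import numpy as np

# Minimal sketch of Theorem 1 (illustrative; W, b and sizes are assumptions).
rng = np.random.default_rng(0)
W = 0.5 * np.eye(3)            # ||W||_2 = lambda = 0.5 < 1
b = rng.normal(size=3)

def f(h):
    # Transition function; |tanh'| <= 1, so ||df/dh||_2 <= ||W||_2 < 1.
    return np.tanh(W @ h + b)

h, h_prime = rng.normal(size=3), rng.normal(size=3)   # two draws of h0
for t in range(1, 31):
    h, h_prime = f(h), f(h_prime)
    if t % 10 == 0:
        print(t, np.linalg.norm(h - h_prime))  # decays roughly like 0.5**t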
We now consider a more general dynamical system, in which we inject new randomness at each t, governed by
a random process Xt (for our purposes, a random process is just a sequence of random variables, not necessarily
independent, but defined on the same probability space).
Definition 2. A discrete dynamical system {ht}, t ∈ N, is a sequence of random variables given by

ht = f(ht−1, Xt; θ)    (4)

for t ≥ 1, where f is a Borel-measurable function, Xt is a random process, and θ is a set of parameters.
Now our sequence of σ-algebras is a little more complicated. Recalling our notation of S(x) for the σ-algebra generated by x, and denoting It = S(Xt), we have

S({ht−1, Xt}) = S(S(ht−1) ∪ S(Xt)) = S(H(t − 1) ∪ It)

and hence

S(ht) ⊆ S({ht−1, Xt}) = S(H(t − 1) ∪ It)
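The contrast with the deterministic case can again be sketched numerically (an illustration under assumed choices: i.i.d. Gaussian Xt, which Definition 2 does not require). Trajectories from different initial states still merge, since f remains a contraction in h, but the common trajectory now keeps moving with the input process rather than settling at a constant.

import numpy as np

# Driven version of the earlier contraction: h_t = f(h_{t-1}, X_t; theta).
rng = np.random.default_rng(1)
W, U = 0.5 * np.eye(3), np.eye(3)   # contraction in h; assumed parameters

def f(h, x):
    return np.tanh(W @ h + U @ x)

h, h_prime = rng.normal(size=3), rng.normal(size=3)   # two draws of h0
for t in range(1, 31):
    x = rng.normal(size=3)          # fresh randomness X_t at each step
    h, h_prime = f(h, x), f(h_prime, x)
    if t % 10 == 0:
        # ||h - h'|| still vanishes, but h itself does not settle to a constant.
        print(t, np.linalg.norm(h - h_prime), np.linalg.norm(h))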


3 Recurrent neural networks


Let’s suppose that we draw a discrete dynamical system as in figure 1, and add an ‘output’ from each ht as an
outcome Yt as in figure 2.

Figure 1: Illustration of a dynamical system with random process Xt

Figure 2: Illustration of a dynamical system with random process Xt and output Yt

If we consider each block Xt → ht → Yt separately, and model this as a neural network with output Yt, hidden layer ht and input Xt, then we get what is called a recurrent neural network.
Making this setup a bit more formal, let us say that the sample space of ht is R^k, and the sample space of Xt is R^p. We set this up as:

ht = ϕ(W ht−1 + U Xt + b)
Yt = V ht + c

with appropriately sized matrices W, U and V, and bias vectors b and c. We immediately note a few points:


1. Each ht depends also on ht−1: this is what distinguishes a recurrent neural network from a series of feedforward networks.
2. The matrices W, U, and V do not depend on t. This corresponds to the function f being fixed over t.
3. The values ht are vectors with all elements in the range of ϕ().
4. The parameter θ of the transition function f is (W, U, b).
5. We can quickly see the resemblance to a hidden Markov model.
6. Recurrent neural networks typically use the activation function ϕ() = tanh().


The essence of the ‘sequence’ of data comes from the matrix W (or the function f in our dynamical system
formulation).
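As a concrete sketch of this forward pass (a minimal illustration; the dimensions, random initialisation and sequence length are assumptions, not from the lecture):

import numpy as np

# Forward pass h_t = phi(W h_{t-1} + U X_t + b), Y_t = V h_t + c, phi = tanh.
rng = np.random.default_rng(2)
k, p, m, T = 4, 3, 2, 5        # hidden, input, output sizes; sequence length

W, U, b = rng.normal(size=(k, k)), rng.normal(size=(k, p)), rng.normal(size=k)
V, c = rng.normal(size=(m, k)), rng.normal(size=m)

h = np.zeros(k)                # initial hidden state h_0
X = rng.normal(size=(T, p))    # input sequence X_1, ..., X_T

for t in range(T):
    h = np.tanh(W @ h + U @ X[t] + b)   # same W, U, b at every time step
    Y = V @ h + c                       # output Y_t
    print(t + 1, Y)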

4 Training recurrent neural networks


Recurrent neural networks encounter training examples Xt, Zt sequentially, updating as they go. This does not lend itself directly to the backpropagation algorithm we have studied for feedforward networks.
Suppose our network has input Xt, output Yt, and target output Zt, over T time steps in total. We have a series of losses Lt measuring how close each Yt is to the corresponding Zt, with total loss L given by

L = ∑_{t=1}^{T} Lt

As usual, we have a range of choices for the per-step loss Lt.

4.1 Backpropagation through time


We will consider backpropagation for T = 2 with activation function ϕ. We have:

hi = ϕ(W hi−1 + U Xi + b)
Yi = V hi + c

for i ∈ {1, 2}. We will use the shorthand:

ai = W hi−1 + U Xi + b    (5)

so that hi = ϕ(ai).
In order to perform gradient descent on the loss function, we need to compute five gradients:

∇L = (∂L/∂W, ∂L/∂V, ∂L/∂U, ∂L/∂b, ∂L/∂c)    (6)

We start with

∂hi/∂W = ∂ϕ(ai)/∂W = ϕ′(ai) ∂ai/∂W = ϕ′(ai) hi−1

We similarly have

∂hi/∂b = ϕ′(ai)    (7)
The value Li = Loss(Yi, Zi) depends on hi only through Yi. Let us write γi = ∂Li/∂Yi. Now we have

∂Li/∂hi = (∂Li/∂Yi)(∂Yi/∂hi) = γi V

and

∂hi/∂hi−1 = ∂ϕ(ai)/∂hi−1 = ϕ′(ai) ∂ai/∂hi−1 = ϕ′(ai) W

and

∂hi/∂U = ∂ϕ(ai)/∂U = ϕ′(ai) ∂ai/∂U = ϕ′(ai) Xi
Now we can compute the gradients we need. Since Li depends on V only through Yi, we have

∂L/∂V = ∑_i ∂Li/∂V
      = ∑_i (∂Li/∂Yi)(∂Yi/∂V)
      = ∑_i (∂Li/∂Yi) ∂(V hi + c)/∂V
      = ∑_i γi hi

and

∂L/∂c = ∑_i ∂Li/∂c
      = ∑_i (∂Li/∂Yi)(∂Yi/∂c)
      = ∑_i (∂Li/∂Yi) ∂(V hi + c)/∂c
      = ∑_i ∂Li/∂Yi = ∑_i γi

Computing the gradient of L with respect to W gets difficult. The value L1 depends on W only through h1, but L2 depends on W through both h1 and h2 (see figure 2), and in general Li depends on W through all the values h1, h2, . . . , hi. To make it more difficult, hi also depends on hi−1. Let us start by considering just L2:

∂L2/∂W = (∂L2/∂h2)(dh2/dW)
       = (∂L2/∂h2) · d/dW ϕ(W h1 + U X2 + b)
       = (∂L2/∂h2) (∂h2/∂W + (∂h2/∂h1)(dh1/dW))
       = γ2 V (ϕ′(a2) h1 + ϕ′(a2) W ϕ′(a1) h0)    (8)
To calculate ∂Li/∂W, let us denote

di = dhi/dW    (9)

so d1 = ϕ′(a1) h0, and d2 = ϕ′(a2) h1 + ϕ′(a2) W ϕ′(a1) h0 = ϕ′(a2) h1 + ϕ′(a2) W d1. Now, working recursively:

di = dhi/dW = d/dW ϕ(W hi−1 + U Xi + b) = ∂hi/∂W + (∂hi/∂hi−1)(dhi−1/dW) = ϕ′(ai) hi−1 + ϕ′(ai) W di−1

and, finally,

∂L/∂W = ∑_i ∂Li/∂W
      = ∑_i (∂Li/∂hi)(dhi/dW)
      = ∑_i γi V di

Similar derivations hold for ∂L/∂U and ∂L/∂b, which are left as exercises.
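As a sanity check on this derivation, here is a minimal scalar sketch (k = p = 1; the squared-error loss Li = (Yi − Zi)^2/2, so that γi = Yi − Zi, is an assumption for illustration): it compares ∑_i γi V di with a finite-difference estimate of ∂L/∂W for T = 2.

import numpy as np

# Scalar check of backpropagation through time for T = 2 (illustrative).
rng = np.random.default_rng(3)
W, U, b, V, c = rng.normal(size=5)
h0 = rng.normal()
X = rng.normal(size=2)               # inputs X_1, X_2
Z = rng.normal(size=2)               # targets Z_1, Z_2

def total_loss(W):
    # L = sum_i L_i with L_i = (Y_i - Z_i)^2 / 2 (assumed loss).
    h, L = h0, 0.0
    for i in range(2):
        h = np.tanh(W * h + U * X[i] + b)
        Y = V * h + c
        L += 0.5 * (Y - Z[i]) ** 2
    return L

# Forward pass, storing pre-activations a_i and states h_i.
a1 = W * h0 + U * X[0] + b; h1 = np.tanh(a1)
a2 = W * h1 + U * X[1] + b; h2 = np.tanh(a2)
dphi = lambda a: 1 - np.tanh(a) ** 2          # phi'(a) for phi = tanh

d1 = dphi(a1) * h0                            # d_i = dh_i/dW, recursively
d2 = dphi(a2) * h1 + dphi(a2) * W * d1
g1, g2 = V * h1 + c - Z[0], V * h2 + c - Z[1] # gamma_i = dL_i/dY_i

eps = 1e-6
print(g1 * V * d1 + g2 * V * d2)              # sum_i gamma_i V d_i
print((total_loss(W + eps) - total_loss(W - eps)) / (2 * eps))  # numerical check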

4.2 Problems
This does not look great: there are lots of ϕ′(ai) and W factors getting multiplied together, and indeed recurrent neural networks are very susceptible to both exploding gradients and vanishing gradients. If the eigenvalues of W are all less than 1 in magnitude, we are very susceptible to vanishing gradients; if any of them exceed 1 in magnitude, we are susceptible to exploding gradients.
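A quick numerical illustration of this (assuming, for simplicity, a linear activation so that ϕ′ = 1 and the repeated factor in di is just W): the T-step product W^T shrinks or explodes geometrically with the eigenvalues of W.

import numpy as np

# Gradient scale after 50 time steps under the recursion d_i ~ W d_{i-1}.
for rho in (0.9, 1.1):
    W = rho * np.eye(2)                  # eigenvalues all equal to rho
    P = np.linalg.matrix_power(W, 50)    # product of 50 copies of W
    print(rho, np.linalg.norm(P, 2))     # ~0.9**50 = 5e-3 vs ~1.1**50 = 117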

5 Long short-term memory networks (LSTMs)


Not examinable
A tidy way to (partly) overcome the vanishing gradient problem intrinsic to recurrent neural networks is the long short-term memory network (LSTM; Hochreiter and Schmidhuber [1997]).


A recurrent network can be thought of as having long-term ‘memory’ encoded by weights, which are updated
during training, and change slowly. The activations of the network (from calculating a forward pass) constitute
short-term memory. In a sense, the vanishing gradient problem arises because long-term memory is too salient,
and the network cannot react to new information quickly. LSTMs introduce a new type of memory encoded by a
particular type of neuron.
To each hi we attribute a memory Ci , sometimes called an internal state. We may control whether the internal
state:
• should be affected by a given input,
• should affect the output,
• should be changed.

This is easier if we actually formalise it.

5.1 LSTM setup


The flow of information through an LSTM is illustrated in figure 3.

Figure 3: Information flow through a long short-term memory network. Adapted from Zhang et al. [2021].

The flow of information consists of a few steps. Data passes through a series of ‘gates’: ‘forget’, ‘input’ and ‘output’.
In detail (adapted from Zhang et al. [2021]):
1. We begin with what we previously had: at time t, data Xt and a hidden state ht−1 from the previous time step; we will output a current hidden state ht and an output Yt. We also have a ‘memory’ Ct−1 from the previous time step, and we will output Ct.

2. We begin by computing four functions F, I, O and C̃ of ht−1 and Xt. All are neural networks, parametrised by weight matrices WF, WI, WO, WC̃ for ht−1 and UF, UI, UO, UC̃ for Xt respectively, and biases bF, bI, bO, bC̃. The first three use logistic activation functions ϕ(x) = σ(x) = (1 + exp(−x))^{−1}, so have values in (0, 1), and C̃ uses ϕ(x) = tanh(x), so has values in (−1, 1).

F(ht−1, Xt) = σ(WF ht−1 + UF Xt + bF)
I(ht−1, Xt) = σ(WI ht−1 + UI Xt + bI)
O(ht−1, Xt) = σ(WO ht−1 + UO Xt + bO)
C̃(ht−1, Xt) = tanh(WC̃ ht−1 + UC̃ Xt + bC̃)    (10)

The function F is short for ‘forget’, I for ‘input’ and O for ‘output’. The ‘forget’ function tells us how much of the old memory Ct−1 we will forget, and the ‘input’ function tells us how much of the new input we will retain. The dimensions of F, I, O and C̃ are the same as those of Ct.


3. We combine F with the previous memory Ct−1 using a Hadamard product ⊙ (element-wise multiplication),
and combine I and C̃ the same way. We add these together to get the new memory state Ct :

Ct = F (ht−1 , Xt ) ⊙ Ct−1 + I(ht−1 , Xt ) ⊙ C̃(ht−1 , Xt )

4. Finally, we take the Hadamard product of O(ht−1, Xt) and tanh(Ct) to get the new hidden state ht:

   ht = O(ht−1, Xt) ⊙ tanh(Ct)

Why does this help? In a sense, it allows resetting of the hidden state, and the network learns when to do this, in
particular learning to skip irrelevant observations.
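For concreteness, here is a minimal sketch of a single LSTM step following equation (10) and steps 3–4 (the dimensions, random initialisation and omission of the output Yt are assumptions for illustration):

import numpy as np

# One LSTM step: gates F, I, O use the logistic function; C-tilde uses tanh.
rng = np.random.default_rng(4)
k, p = 4, 3                                # sizes of h_t / C_t and of X_t

sigma = lambda x: 1 / (1 + np.exp(-x))     # logistic function, values in (0, 1)
Ws = {g: rng.normal(size=(k, k)) for g in "FIOC"}   # W_F, W_I, W_O, W_Ctilde
Us = {g: rng.normal(size=(k, p)) for g in "FIOC"}   # U_F, U_I, U_O, U_Ctilde
bs = {g: np.zeros(k) for g in "FIOC"}               # biases

def lstm_step(h_prev, C_prev, x):
    F = sigma(Ws["F"] @ h_prev + Us["F"] @ x + bs["F"])    # forget gate
    I = sigma(Ws["I"] @ h_prev + Us["I"] @ x + bs["I"])    # input gate
    O = sigma(Ws["O"] @ h_prev + Us["O"] @ x + bs["O"])    # output gate
    C_tilde = np.tanh(Ws["C"] @ h_prev + Us["C"] @ x + bs["C"])
    C = F * C_prev + I * C_tilde       # step 3: Hadamard products, new memory
    h = O * np.tanh(C)                 # step 4: new hidden state
    return h, C

h, C = np.zeros(k), np.zeros(k)
for x in rng.normal(size=(5, p)):      # a length-5 input sequence
    h, C = lstm_step(h, C, x)
print(h, C)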

6 Applications
Exercises

1. For a deterministic discrete dynamical system {ht}, show that H(t) = H(t − 1) for all random variables h1 of appropriate dimension if and only if the transition function f(x; θ) is invertible in x.
2. Let αi = dhi/dW, βi = ∂hi/∂W, and γi = ∂hi/∂hi−1. Find a recursion for αi, and show that for n > 1

   αn = βn + ∑_{i=1}^{n−1} ( ∏_{j=i+1}^{n} γj ) βi    (11)

3. Derive the backpropagation-through-time formula for ∂L/∂U.

4. What are the consequences in an LSTM if F = 0? What if F = 1?


5. What are the consequences in an LSTM if I = 0? What if I = 1?

References
Ovidiu Calin. Deep Learning Architectures: A Mathematical Approach. Springer, 2020.
Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola. Dive into Deep Learning. arXiv preprint arXiv:2106.11342, 2021. URL https://d2l.ai/index.html.
