Lecture 11
[Handwritten slide annotation: each unit u of a recurrent layer has a state h and a memory cell; in a seq2seq RNN, specifically an LSTM, gating values between 0 (off) and 1 (on), e.g. 0.5, control what is remembered through time.]
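To make the gating idea concrete, here is a minimal NumPy sketch of one LSTM cell update; the weight names (W_f, W_i, W_o, W_c), dimensions, and parameter dictionary are illustrative assumptions, not the lecture's notation. Sigmoid gates produce values between 0 and 1 that scale how much old memory is kept and how much new content is written.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, p):
    """One LSTM time step: sigmoid gates in [0, 1] decide what the memory cell keeps."""
    z = np.concatenate([h_prev, x_t])             # combined input [h_{t-1}; x_t]
    f = sigmoid(p["W_f"] @ z + p["b_f"])          # forget gate: how much old memory to keep
    i = sigmoid(p["W_i"] @ z + p["b_i"])          # input gate: how much new content to write
    o = sigmoid(p["W_o"] @ z + p["b_o"])          # output gate: how much memory to expose
    c_tilde = np.tanh(p["W_c"] @ z + p["b_c"])    # candidate memory content
    c_t = f * c_prev + i * c_tilde                # e.g. f = 0.5 keeps half of the old memory
    h_t = o * np.tanh(c_t)                        # gated hidden state
    return h_t, c_t

# Tiny usage example with random (illustrative) parameters
d_h, d_x = 4, 3
rng = np.random.default_rng(0)
p = {k: rng.normal(size=(d_h, d_h + d_x)) for k in ("W_f", "W_i", "W_o", "W_c")}
p.update({k: np.zeros(d_h) for k in ("b_f", "b_i", "b_o", "b_c")})
h, c = lstm_cell_step(rng.normal(size=d_x), np.zeros(d_h), np.zeros(d_h), p)
```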
Motivational Example
• Introduction to RNN:
• https://www.youtube.com/watch?v=LHXXI4-IEns (10 min)
• The vanishing gradient problem causes us to move away from plain RNNs toward LSTMs.
• Set the dimensionality of the parameter matrix V_j such that V_j h_j^t results in a vector of dimension c (# classes).
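As a sketch of that dimensionality point, assume hidden states of size d and c classes; the matrix name V and the example sizes below are illustrative, not taken from the lecture.

```python
import numpy as np

d, c = 64, 10                       # hidden-state size d, number of classes c (example values)
rng = np.random.default_rng(0)
V = rng.normal(size=(c, d)) * 0.01  # output parameter matrix of shape (c, d)

h_t = rng.normal(size=d)            # hidden state of the recurrent layer at time t
logits = V @ h_t                    # V h_t is a vector of dimension c: one score per class
probs = np.exp(logits) / np.exp(logits).sum()   # softmax over the c classes
print(probs.shape)                  # (10,)
```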
Unfolding
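Unfolding (unrolling) replicates the same recurrent cell once per time step, turning the whole sequence into a feed-forward graph that backpropagation through time can traverse. A minimal NumPy sketch of a vanilla RNN unrolled over T steps, with assumed weight names W, U, b:

```python
import numpy as np

def unroll_rnn(X, W, U, b, h0):
    """Unfold a vanilla RNN: apply the same cell (same W, U, b) at every time step."""
    h, states = h0, []
    for x_t in X:                              # X has shape (T, input_dim)
        h = np.tanh(W @ h + U @ x_t + b)       # h_t = tanh(W h_{t-1} + U x_t + b)
        states.append(h)
    return np.stack(states)                    # shape (T, hidden_dim)

rng = np.random.default_rng(0)
T, d_x, d_h = 5, 3, 4
H = unroll_rnn(rng.normal(size=(T, d_x)),
               rng.normal(size=(d_h, d_h)), rng.normal(size=(d_h, d_x)),
               np.zeros(d_h), np.zeros(d_h))
print(H.shape)                                 # (5, 4)
```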
LSTM and GRU
• Watch: https://www.youtube.com/watch?v=8HyCNIVRbSU (11min)
[Diagram: LSTM and GRU cells, showing input x_t, hidden states h_{t-1} and h_t, and (for the LSTM) cell states c_{t-1} and c_t.]
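For comparison with the LSTM step above, a minimal NumPy sketch of one GRU step (no separate cell state; an update gate z and a reset gate r). The weight names W_z, W_r, W_h and the parameter dictionary are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, p):
    """One GRU time step: the update gate z blends the old state with a candidate state."""
    zx = np.concatenate([h_prev, x_t])
    z = sigmoid(p["W_z"] @ zx + p["b_z"])      # update gate
    r = sigmoid(p["W_r"] @ zx + p["b_r"])      # reset gate
    h_tilde = np.tanh(p["W_h"] @ np.concatenate([r * h_prev, x_t]) + p["b_h"])
    return (1 - z) * h_prev + z * h_tilde      # h_t: no separate cell state, unlike the LSTM
```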
Minimal Gated Recurrent Unit
Each “cell” is made up of multiple units (size ℓ). One unit shown here:
“Forget” or “Update” gate (true for both: the single gate controls both what is forgotten from the old state and what is updated with new content)
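A minimal NumPy sketch of one unit under a common MGU formulation (a single forget/update gate f); the weight names are illustrative assumptions, and the exact equations should be checked against the Heck & Salem reading listed below.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mgu_step(x_t, h_prev, p):
    """One MGU time step: a single gate f acts as both the forget and the update gate."""
    f = sigmoid(p["W_f"] @ np.concatenate([h_prev, x_t]) + p["b_f"])
    h_tilde = np.tanh(p["W_h"] @ np.concatenate([f * h_prev, x_t]) + p["b_h"])
    return (1 - f) * h_prev + f * h_tilde      # forget part of h_{t-1}, update with h_tilde
```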
Advanced RNN Architectures
• Other important extensions to RNNs include:
• A generalization of an RNN is a recursive neural network
• bi-directional RNNs
• RNNs with attention (see extended Ch6 material on course wiki)
• Attention-only networks = transformers…
• sequence-to-sequence RNN models.
• Frequently used to build neural machine translation models and other models for text to
text transformations.
• Will see this later in the textbook (section 7.7)…
• Combinations of CNN+LSTM
• Image Captioning
• Video:
• CNN on individual frames to extract feature vectors, LSTM for the time series (see the sketch after this list)
• Or look at 3D convolutions with a fixed time window
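A minimal PyTorch-style sketch of the CNN+LSTM idea for video: a small per-frame CNN encoder feeds an LSTM over time. The module names, layer sizes, and class structure are illustrative assumptions, not the lecture's code.

```python
import torch
import torch.nn as nn

class FrameCNN(nn.Module):
    """Tiny per-frame CNN mapping each RGB frame to a feature vector (illustrative)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                    # global pooling -> (B, 16, 1, 1)
        )
        self.fc = nn.Linear(16, feat_dim)

    def forward(self, x):                               # x: (B, 3, H, W)
        return self.fc(self.conv(x).flatten(1))

class CNNLSTM(nn.Module):
    """Run the CNN on each frame, then an LSTM over the resulting feature sequence."""
    def __init__(self, feat_dim=128, hidden=64, n_classes=10):
        super().__init__()
        self.cnn = FrameCNN(feat_dim)
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, video):                           # video: (B, T, 3, H, W)
        B, T = video.shape[:2]
        feats = self.cnn(video.flatten(0, 1)).view(B, T, -1)   # per-frame feature vectors
        _, (h_n, _) = self.lstm(feats)                  # LSTM over the time series
        return self.head(h_n[-1])                       # classify from the final hidden state
```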
Textbook Recommended Readings for RNN:
• An extended version of Chapter 6 with RNN unfolding, bidirectional RNN, and attention
• The Unreasonable Effectiveness of Recurrent Neural Networks by Andrej Karpathy (2015)
• Recurrent Neural Networks and LSTM by Niklas Donges (2018)
• Understanding LSTM Networks by Christopher Olah (2015)
• Introduction to RNNs by Denny Britz (2015)
• Implementing a RNN with Python, Numpy and Theano by Denny Britz (2015)
• Backpropagation Through Time and Vanishing Gradients by Denny Britz (2015)
• Implementing a GRU/LSTM RNN with Python and Theano by Denny Britz (2015)
• Simplified Minimal Gated Unit Variations for Recurrent Neural Networks by Joel Heck and
Fathi Salem (2017)
Transformers: “Attention is all you need”
Great 1-hour Transformers Tutorial
• Transformers (“Attention is all you need” 2017)
• Attention: watch ~11 mins from 10min mark: Transformers with Lucas Beyer
• “LSTM is Dead. Long Live Transformers” (2019)
• https://www.youtube.com/watch?v=S27pHKBEp30 (~45min)
• Warning: we won’t cover NLP for a few weeks…
• Code and pre-trained models: github.com/huggingface/transformers
• “The Illustrated Transformer”: http://jalammar.github.io/illustrated-transformer/
• Transformers replacing CNN for image analysis…
• “An Image Is Worth 16x16 Words: Transformers For Image Recognition At Scale”
https://arxiv.org/pdf/2010.11929.pdf (2021)
• ViT = Vision Transformer
• Break image into patches; flatten each patch into a vector; add positional information (where
did patch come from within image?); get ‘sequence’ of encoded patches; compute key, value,
query using linear layer; compute attention; MLP/FFNN; …
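A minimal PyTorch-style sketch of the ViT front end just described (break the image into patches, flatten each patch, linearly embed it, add positional information); the patch size, dimensions, and the class name PatchEmbed are illustrative assumptions, and the attention and MLP/FFNN blocks that follow are omitted.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Split an image into patches, flatten each patch, and embed it as a token."""
    def __init__(self, img_size=224, patch=16, in_ch=3, dim=192):
        super().__init__()
        self.n_patches = (img_size // patch) ** 2
        self.proj = nn.Linear(in_ch * patch * patch, dim)             # flatten -> linear embed
        self.pos = nn.Parameter(torch.zeros(1, self.n_patches, dim))  # positional information
        self.patch = patch

    def forward(self, x):                               # x: (B, 3, H, W)
        B, C, H, W = x.shape
        p = self.patch
        # Carve out non-overlapping p x p patches, then flatten each into one vector
        x = x.unfold(2, p, p).unfold(3, p, p)           # (B, C, H/p, W/p, p, p)
        x = x.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * p * p)
        return self.proj(x) + self.pos                  # 'sequence' of encoded patches

tokens = PatchEmbed()(torch.randn(2, 3, 224, 224))
print(tokens.shape)                                     # torch.Size([2, 196, 192])
```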