Lecture 11

Lecture 11 introduces recurrent neural networks (RNN) and Long Short-Term Memory (LSTM) networks, focusing on their application in analyzing sequential data. Key concepts include the differences between RNNs and multi-layer perceptrons (MLPs), the function of gating sub-units in LSTMs, and a software framework for building RNNs. The lecture also covers advanced RNN architectures, including bi-directional RNNs and attention mechanisms, and suggests additional readings and resources for further understanding.


SYSC4415

Introduction to Machine Learning

Lecture 11

Prof James Green


jrgreen@sce.Carleton.ca
Systems and Computer Engineering, Carleton University
Learning Objectives for Lecture 11
• Introduce recurrent neural networks (RNN) and Long Short-Term
Memory (LSTM) networks for analyzing sequential data
• Understand how a recurrent neural network (RNN) differs from an MLP
• Understand the function of each gating sub-unit within an LSTM
• Introduce at least one software framework for building, training, and
testing RNN/LSTM
Pre-Lecture Assignment
• Chapter 6.2.2 (pp. 72-75; 4 pages)
• https://www.youtube.com/watch?v=WCUNPb-5EYI
• (RNN and LSTM at a conceptual level; 26min, or 17min @ 1.5 speed)
Key terms
• Recurrent neural networks (RNNs), state, softmax function,
backpropagation through time, long short-term memory (LSTM),
Gated Recurrent Unit (GRU), minimal gated GRU, bi-directional RNNs,
attention, sequence-to-sequence RNN, recursive neural network.
In-Class Activities
• Review key concepts in the chapter through discussion,
PollEverywhere questions
• Tutorial: Review a Jupyter notebook that builds, trains, and tests a
LSTM network using Keras
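For reference, below is a minimal sketch of the kind of Keras pipeline such a notebook covers; the vocabulary size, layer widths, and random stand-in data are placeholder assumptions, not the notebook's actual values.

    # Minimal Keras LSTM for sequence classification (illustrative sketch only;
    # vocabulary size, layer sizes, and the random data are placeholder assumptions).
    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    vocab_size, seq_len, num_classes = 1000, 20, 3   # hypothetical values

    model = keras.Sequential([
        layers.Embedding(input_dim=vocab_size, output_dim=32),  # token ids -> dense vectors
        layers.LSTM(64),                                        # final hidden state summarizes the sequence
        layers.Dense(num_classes, activation="softmax"),        # class probabilities
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

    # Random stand-in data, just to show the build/train/test calls.
    X = np.random.randint(0, vocab_size, size=(256, seq_len))
    y = np.random.randint(0, num_classes, size=(256,))
    model.fit(X, y, epochs=2, batch_size=32, validation_split=0.2)
    model.evaluate(X, y)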
Notes:
• RNNs are used to label, classify, and generate sequences; they differ from feedforward neural networks (FNNs) in that they contain loops.
• Each unit u of a recurrent layer l has a state h_{l,u}; this state is the memory unit of the RNN.
• Sequence-to-sequence (seq2seq) models are typically built with LSTMs.
• LSTM cells combine signals by element-by-element addition and apply gating (on/off) with values between 0 and 1 (e.g. 1.0, 0.5, 0.0) to decide what to remember and what to forget.
• The parameters (the weight matrices and bias vectors of each layer and unit) are found using gradient descent with backpropagation through time.
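To make the notes above concrete, here is a one-time-step sketch of a simple recurrent layer in NumPy; the dimensions, weight names, and tanh/softmax choices are illustrative assumptions rather than the textbook's exact notation.

    # One time step of a simple recurrent layer (illustrative sketch; names and sizes are made up).
    import numpy as np

    d_in, d_state, n_classes = 4, 5, 3          # hypothetical dimensions
    W = np.random.randn(d_state, d_in)          # input weights
    U = np.random.randn(d_state, d_state)       # recurrent weights: this is the "loop"
    b = np.zeros(d_state)
    V = np.random.randn(n_classes, d_state)     # maps the state to one score per class
    c = np.zeros(n_classes)

    x_t = np.random.randn(d_in)                 # input at time t
    h_prev = np.zeros(d_state)                  # state from the previous time step (the layer's memory)

    h_t = np.tanh(W @ x_t + U @ h_prev + b)     # new state h_t
    scores = V @ h_t + c
    y_t = np.exp(scores - scores.max()) / np.exp(scores - scores.max()).sum()   # softmax output
    # In training, W, U, b, V, c are found by gradient descent with backpropagation through time.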
Motivational Example
• Introduction to RNN:
• https://www.youtube.com/watch?v=LHXXI4-IEns (10 min)
• The vanishing gradient problem causes us to move away from plain RNNs towards LSTMs.
• Set the dimensionality of the parameter matrix V_j such that V_j h_{j,t} results in a vector of dimension C (the number of classes).
Recurrent Neural Network Hidden Layers


Softmax function: softmax(z)_j = exp(z_j) / Σ_k exp(z_k), which turns a vector of scores into a probability distribution over the C classes.
Input sequence: x_1, x_2, …, x_T (one feature vector per time step).
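A small worked example of the softmax, assuming three classes and an arbitrary score vector:

    # Softmax turns raw scores into probabilities that sum to 1 (toy scores, C = 3 classes).
    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())      # subtract the max for numerical stability
        return e / e.sum()

    scores = np.array([2.0, 1.0, 0.1])
    print(softmax(scores))           # approx. [0.66, 0.24, 0.10], summing to 1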
RNN Unfolding/unrolling
http://colah.github.io/posts/2015-08-Understanding-LSTMs/

• Left: https://www.youtube.com/watch?v=S0XFd0VMFss (2-6 of 8min)

• Right: https://www.youtube.com/watch?v=_h66BW-xNgk&index=1&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI (~15 min mark)


RNN Unfolding
(Figure: the same RNN cell drawn unrolled, one copy per time step.)
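Unfolding/unrolling simply means that the same cell, with the same weights, is applied once per time step; a NumPy sketch with toy sizes and random weights:

    # Unrolling a recurrent layer over a sequence: the same W, U, b are reused at every step.
    import numpy as np

    T, d_in, d_state = 6, 4, 5                    # toy sequence length and sizes
    X = np.random.randn(T, d_in)                  # input sequence x_1 ... x_T
    W = np.random.randn(d_state, d_in)
    U = np.random.randn(d_state, d_state)
    b = np.zeros(d_state)

    h = np.zeros(d_state)                         # initial state h_0
    states = []
    for t in range(T):                            # one copy of the cell per time step in the unrolled graph
        h = np.tanh(W @ X[t] + U @ h + b)
        states.append(h)
    states = np.stack(states)                     # shape (T, d_state): one hidden state per time step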
LSTM and GRU
• Watch: https://www.youtube.com/watch?v=8HyCNIVRbSU (11min)

(Figure: an LSTM cell, with cell state c_{t-1} → c_t and hidden state h_{t-1} → h_t, beside a GRU cell, with hidden state h_{t-1} → h_t only; both take input x_t.)
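In Keras (used in the tutorial), LSTM and GRU layers are drop-in replacements for one another; a quick sketch, with an arbitrary layer width:

    # LSTM vs. GRU in Keras: same interface, different internal gating (width is arbitrary).
    from tensorflow.keras import layers

    lstm_layer = layers.LSTM(64, return_sequences=True)   # carries a cell state c_t and a hidden state h_t
    gru_layer = layers.GRU(64, return_sequences=True)     # carries only a hidden state h_t, with fewer gates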
Minimal Gated Recurrent Unit
Each “cell” is made up of multiple units (size_l of them). One unit is shown here, with a single “forget”/“update” gate.

Textbook: Minimal Gated GRU (gated recurrent unit)

1) New potential memory cell value h~_t, a function of the inputs and h_{t-1}; g1 = tanh.
2) Memory forget gate, a function of the inputs and h_{t-1}; g2, the “gate function”, uses the sigmoid function to produce the gate value; 1 = forget = take the new value, 0 = keep = ignore the new value.
3) New memory cell value: either take the new h~ value or keep h_{t-1}, depending on the gate value and h_{t-1}.
4) Vector of new memory cell values, one per unit in this layer.
5) Output vector; g3 = softmax. (See the NumPy sketch below.)

Discussion of dimensions of signals in an LSTM: https://mmuratarat.github.io//2019-01-19/dimensions-of-lstm
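A NumPy sketch of the minimal gated unit described in steps 1)-5) above; the weight names and sizes are illustrative assumptions, and the output step is left as a comment.

    # Minimal gated recurrent unit: one time step for a layer of `units` cells (illustrative only).
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    d_in, units = 4, 5                                     # hypothetical sizes
    W_h, U_h, b_h = np.random.randn(units, d_in), np.random.randn(units, units), np.zeros(units)
    W_g, U_g, b_g = np.random.randn(units, d_in), np.random.randn(units, units), np.zeros(units)

    x_t = np.random.randn(d_in)
    h_prev = np.zeros(units)

    h_tilde = np.tanh(W_h @ x_t + U_h @ h_prev + b_h)      # 1) new candidate memory value (g1 = tanh)
    gate = sigmoid(W_g @ x_t + U_g @ h_prev + b_g)         # 2) forget/update gate in [0, 1] (g2 = sigmoid)
    h_t = gate * h_tilde + (1.0 - gate) * h_prev           # 3)-4) blend new and old memory, one value per unit
    # 5) an output layer, e.g. softmax(V @ h_t + c), would then map h_t to class probabilities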


Advanced RNN Architectures
• Other important extensions to RNNs include:
• A generalization of an RNN is a recursive neural network
• bi-directional RNNs
• RNNs with attention (see extended Ch6 material on course wiki)
• Attention-only networks = transformers…
• sequence-to-sequence RNN models.
• Frequently used to build neural machine translation models and other models for text to
text transformations.
• Will see this later in the textbook (section 7.7)…
• Combinations of CNN+LSTM
• Image Captioning
• Video:
• CNN on individual frames to extract feature vectors, LSTM for the time series (see the sketch after this list)
• Or look at 3D convolutions with a fixed time window
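One common way to wire up the frame-wise CNN plus LSTM idea in Keras is with TimeDistributed; the frame size, filter counts, and class count below are made-up placeholders.

    # CNN per frame + LSTM over time for video classification (a sketch; all sizes are made up).
    from tensorflow import keras
    from tensorflow.keras import layers

    frames, height, width, channels, num_classes = 16, 64, 64, 3, 10

    frame_encoder = keras.Sequential([                     # small CNN applied to a single frame
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.GlobalAveragePooling2D(),                   # one feature vector per frame
    ])

    model = keras.Sequential([
        layers.Input(shape=(frames, height, width, channels)),
        layers.TimeDistributed(frame_encoder),             # same CNN weights applied to each frame
        layers.LSTM(64),                                   # model the sequence of frame feature vectors
        layers.Dense(num_classes, activation="softmax"),
    ])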
Textbook Recommended Readings for RNN:
• An extended version of Chapter 6 with RNN unfolding, bidirectional RNN, and attention
• The Unreasonable Effectiveness of Recurrent Neural Networks by Andrej Karpathy (2015)
• Recurrent Neural Networks and LSTM by Niklas Donges (2018)
• Understanding LSTM Networks by Christopher Olah (2015)
• Introduction to RNNs by Denny Britz (2015)
• Implementing a RNN with Python, Numpy and Theano by Denny Britz (2015)
• Backpropagation Through Time and Vanishing Gradients by Denny Britz (2015)
• Implementing a GRU/LSTM RNN with Python and Theano by Denny Britz (2015)
• Simplified Minimal Gated Unit Variations for Recurrent Neural Networks by Joel Heck and
Fathi Salem (2017)
Transformers: “Attention is all you need”
Great 1-hour Transformers Tutorial
• Transformers (“Attention is all you need” 2017)
• Attention: watch ~11 mins from 10min mark: Transformers with Lucas Beyer
• “LSTM is Dead. Long Live Transformers” (2019)
• https://www.youtube.com/watch?v=S27pHKBEp30 (~45min)
• Warning: we won’t cover NLP for a few weeks…
• Code and pre-trained models: github.com/huggingface/transformers
• “The Illustrated Transformer”: http://jalammar.github.io/illustrated-transformer/
• Transformers replacing CNN for image analysis…
• “An Image Is Worth 16x16 Words: Transformers For Image Recognition At Scale”
https://arxiv.org/pdf/2010.11929.pdf (2021)
• ViT = Vision Transformer
• Break image into patches; flatten each patch into a vector; add positional information (where
did patch come from within image?); get ‘sequence’ of encoded patches; compute key, value,
query using linear layer; compute attention; MLP/FFNN; …
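A rough NumPy sketch of the patch-and-attend steps listed above (single attention head, random placeholder weights; a real ViT adds a class token, multi-head attention, layer normalization, and residual connections):

    # Toy single-head self-attention over image patches (illustrative only; all weights are random).
    import numpy as np

    def softmax_rows(z):
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    img = np.random.rand(32, 32, 3)                        # made-up image
    P, d = 8, 64                                           # patch size and embedding dimension

    # Break the image into 8x8 patches and flatten each patch into a vector.
    patches = [img[i:i+P, j:j+P].reshape(-1)
               for i in range(0, 32, P) for j in range(0, 32, P)]   # 16 patches of length 8*8*3
    X = np.stack(patches)

    # Linear patch embedding plus (here, random) positional information -> "sequence" of encoded patches.
    E = np.random.randn(X.shape[1], d)
    pos = np.random.randn(X.shape[0], d)
    Z = X @ E + pos

    # Compute query, key, value with linear layers, then attention.
    Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))
    Q, K, V = Z @ Wq, Z @ Wk, Z @ Wv
    attn = softmax_rows(Q @ K.T / np.sqrt(d))              # how much each patch attends to every other patch
    out = attn @ V                                         # would then pass through an MLP/FFNN block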
