CISC 867 Deep Learning: 12. Recurrent Neural Networks
Sequential Data
Time Series
Dimensionality of Vectors
Time Series Terminology
Example: The Jena Climate Dataset
Jena Climate Dataset: A Closer Look
Reading the Data
fname = "jena_climate_2009_2016.csv"
with open(fname) as f:
data = f.read()
lines = data.split("\n")
lines = lines[1:] # The first line in the file is header information
temperature = np.zeros((len(lines),))
raw_data = np.zeros((len(lines), len(header) - 1))
for i, line in enumerate(lines):
values = [float(x) for x in line.split(",")[1:]]
temperature[i] = values[1]
raw_data[i] = values
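A quick sanity check of the parsed arrays (the printed values are not from the slides; the temperature column is in degrees Celsius):

print(raw_data.shape)       # (num_timesteps, num_features)
print(temperature.shape)    # (num_timesteps,)
print(temperature[:3])      # first few temperature readings (°C)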
Visualizing the Data
import matplotlib.pyplot as plt

plt.plot(range(len(temperature)), temperature)   # the full temperature series
plt.show()
plt.plot(range(1440), temperature[:1440])        # the first 10 days (144 samples per day)
plt.show()
Inputs and Target Outputs
Creating a Training Set
• What is the smallest and largest legal value for the start point?
• Smallest: 0
• Largest: time series length – 720 – 24*6.
– Why? We need enough room to choose 720 input elements, plus enough room to look 24 hours (24*6 = 144 ten-minute steps) past the last element, to get the target value that we aim to forecast. A sketch of this windowing follows below.
[Figure: an input window of length 720, followed 24 hours later by the target value.]
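A minimal sketch of how (input, target) pairs could be built from the temperature array under these constraints; the names input_length, delay, inputs, and targets are illustrative, not from the slides:

import numpy as np

input_length = 720            # number of consecutive time steps in each input window
delay = 24 * 6                # 24 hours ahead, at 6 samples per hour

inputs, targets = [], []
# start ranges from 0 up to len(temperature) - 720 - 24*6, inclusive
for start in range(len(temperature) - input_length - delay + 1):
    inputs.append(temperature[start : start + input_length])
    targets.append(temperature[start + input_length + delay - 1])

inputs = np.array(inputs)     # shape: (num_samples, 720)
targets = np.array(targets)   # shape: (num_samples,)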
[Figure: a layer of three units 𝑈2,1, 𝑈2,2, 𝑈2,3, one per time step; each unit receives the two inputs of its time step through the shared weights 𝑤2,1,1 and 𝑤2,1,2, and the recurrent weight 𝑢2,1 connects 𝑈2,1 to 𝑈2,2 and 𝑈2,2 to 𝑈2,3.]
• Notice that unit 𝑈2,2 receives inputs not only from input units, but also from second-layer unit 𝑈2,1.
• Similarly, unit 𝑈2,3 receives inputs not only from input units, but also from second-layer unit 𝑈2,2.
• These connections between units of the same layer are called recurrent connections.
• Notice that all units in the second layer share the same weights.
• The weights connecting the two input units to a second-layer unit are denoted with the same two symbols.
• There is also a new symbol, the recurrent weight 𝑢2,1, for the recurrent connections between 𝑈2,1 and 𝑈2,2, and between 𝑈2,2 and 𝑈2,3.
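A small numpy sketch of this shared-weight, recurrent computation, assuming two inputs per time step, a single second-layer unit, and a tanh activation (the activation choice and the numbers are assumptions made only for illustration):

import numpy as np

w = np.array([0.5, -0.3])    # shared input weights (w_{2,1,1}, w_{2,1,2})
u = 0.8                      # shared recurrent weight u_{2,1}
b = 0.0                      # bias

x = np.array([[1.0, 2.0],    # inputs at time step 1
              [0.5, -1.0],   # inputs at time step 2
              [2.0, 0.0]])   # inputs at time step 3

z = 0.0                      # no previous output before the first time step
for t in range(len(x)):
    # Each unit sees its own inputs plus the previous unit's output,
    # always through the same (shared) weights.
    z = np.tanh(w @ x[t] + u * z + b)
    print(f"z_2,{t+1} = {z:.4f}")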
from tensorflow import keras

# The same recurrent structure in Keras: inputs of shape (3, 2)
# (3 time steps, 2 features per step) feeding a SimpleRNN layer with 1 unit.
model = keras.Sequential([keras.Input(shape=(3, 2)),
                          keras.layers.SimpleRNN(1)])
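As a quick check (not stated on the slide), this layer has 2 input weights, 1 recurrent weight, and 1 bias, for 4 trainable parameters:

import numpy as np

model.summary()                    # 4 trainable parameters
out = model(np.zeros((5, 3, 2)))   # a batch of 5 sequences, 3 time steps, 2 features
print(out.shape)                   # (5, 1): one output per sequence (from the last time step)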
# The same structure, now with three units per time step.
model = keras.Sequential([keras.Input(shape=(3, 2)),
                          keras.layers.SimpleRNN(3)])
• Here the SimpleRNN layer has three units per time step.
• We do not show the connections and weights anymore.
– Within a time step, all three 2nd-layer units are connected to both inputs.
– All 2nd-layer units from the previous step are inputs to all 2nd-layer units in the next step.
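One way to see this connectivity (not on the slide) is to inspect the layer's weight shapes:

rnn = model.layers[0]
kernel, recurrent_kernel, bias = rnn.get_weights()
print(kernel.shape)             # (2, 3): each of the 2 inputs connects to each of the 3 units
print(recurrent_kernel.shape)   # (3, 3): each unit at step t-1 connects to each unit at step t
print(bias.shape)               # (3,)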
[Figure: a SimpleRNN(3) layer over Input(2), unrolled across time steps t−1, t, t+1, with units 𝑈2,1,1 … 𝑈2,3,3 indexed by time step and unit.]
An RNN Network for Jena Climate
model = keras.Sequential([keras.Input(shape=input_shape),
                          keras.layers.SimpleRNN(16),
                          keras.layers.Dense(1)])

callbacks = [keras.callbacks.ModelCheckpoint("jena_LSTM1_16.keras",
                                             save_best_only=True)]
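A hedged sketch of how this model might be trained; train_dataset and val_dataset stand in for datasets of (input window, target) pairs such as the ones built earlier, and the rmsprop/MSE/MAE choices are assumptions, not from the slide:

model.compile(optimizer="rmsprop", loss="mse", metrics=["mae"])
history = model.fit(train_dataset,
                    epochs=10,
                    validation_data=val_dataset,
                    callbacks=callbacks)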
# return_sequences=False (the default): the SimpleRNN layer outputs only its
# activations at the last time step, so Dense(1) sees a single 16-dimensional vector.
model = keras.Sequential([keras.Input(shape=input_shape),
                          keras.layers.SimpleRNN(16, return_sequences=False),
                          keras.layers.Dense(1)])
# return_sequences=True: the SimpleRNN layer outputs its activations at every
# time step; Flatten concatenates them before the final Dense layer.
model = keras.Sequential([keras.Input(shape=input_shape),
                          keras.layers.SimpleRNN(16, return_sequences=True),
                          keras.layers.Flatten(),
                          keras.layers.Dense(1)])
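A small shape check of the two variants; the concrete input shape (120, 14) is used here purely for illustration:

import numpy as np

probe = np.zeros((1, 120, 14))
last_only = keras.Sequential([keras.Input(shape=(120, 14)),
                              keras.layers.SimpleRNN(16)])
per_step = keras.Sequential([keras.Input(shape=(120, 14)),
                             keras.layers.SimpleRNN(16, return_sequences=True)])
print(last_only(probe).shape)   # (1, 16): only the last time step
print(per_step(probe).shape)    # (1, 120, 16): one output vector per time step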
[Figure: an LSTM(M) layer over Input(N), unrolled across time steps t−1, t, t+1.]
SimpleRNN Computations at Time t
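For reference, the standard SimpleRNN update, written in the same notation as the LSTM equations below and assuming the usual tanh activation (the Keras default), is:

$z_t = \tanh(W x_t + U z_{t-1} + B)$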
[Figure: an LSTM(M) cell over Input(N) at time t, which receives the previous output z_{t−1} and the previous carry state c_{t−1}, and produces the new output z_t and carry state c_t.]
LSTM Computations at Time t
$i_t = \sigma(W_i x_t + U_i z_{t-1} + B_i)$
$k_t = \sigma(W_k x_t + U_k z_{t-1} + B_k)$
$f_t = \sigma(W_f x_t + U_f z_{t-1} + B_f)$
$c_t = i_t * k_t + c_{t-1} * f_t$
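A minimal numpy sketch of these four equations for a single time step; the sizes M and N and the random weights are placeholders chosen only to make the sketch runnable:

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

M, N = 4, 3                        # M units in the layer, N input features
rng = np.random.default_rng(0)
Wi, Ui, Bi = rng.normal(size=(M, N)), rng.normal(size=(M, M)), np.zeros(M)
Wk, Uk, Bk = rng.normal(size=(M, N)), rng.normal(size=(M, M)), np.zeros(M)
Wf, Uf, Bf = rng.normal(size=(M, N)), rng.normal(size=(M, M)), np.zeros(M)

x_t = rng.normal(size=N)           # input at time t
z_prev = np.zeros(M)               # previous output z_{t-1}
c_prev = np.zeros(M)               # previous carry  c_{t-1}

i_t = sigmoid(Wi @ x_t + Ui @ z_prev + Bi)
k_t = sigmoid(Wk @ x_t + Uk @ z_prev + Bk)
f_t = sigmoid(Wf @ x_t + Uf @ z_prev + Bf)
c_t = i_t * k_t + c_prev * f_t     # element-wise: new information plus retained carry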
An Intuitive Interpretation
• i_t and k_t determine what new information is written into the carry state c_t.
• f_t determines how much of the previous carry state c_{t−1} is kept, and how much is forgotten.
The Vanishing Gradient Problem
$\frac{\partial E}{\partial w_{2,1,1}} = \frac{\partial E}{\partial z_{2,3}} \frac{\partial z_{2,3}}{\partial a_{2,3}} \frac{\partial a_{2,3}}{\partial w_{2,1,1}}$
• Each term of $\partial E / \partial w_i$ will correspond to the influence of a single time step.
– The influence of time step 120 will be a product of 3 numbers.
– The influence of time step 119 will be a product of 5 numbers.
– The influence of time step 118 will be a product of 7 numbers.
…
– The influence of time step 1 will be a product of 241 numbers.
• So, the influence of time step 1 will be a product of 241 numbers, each of which is usually between 0 and 1.
• Such a product will be a very small quantity.
• Overall, the influence of a time step drops exponentially as we move from the end towards the beginning of the input time series.
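A tiny numeric illustration (0.9 is just an example of a factor between 0 and 1):

print(0.9 ** 3)      # 0.729: the most recent time step still has noticeable influence
print(0.9 ** 241)    # about 9.4e-12: the influence of time step 1 has all but vanished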
LSTMs and Vanishing Gradients
• The carry state c_t is updated additively ($c_t = i_t * k_t + c_{t-1} * f_t$), so information can travel across many time steps without passing through a squashing activation at every step; this gives gradients a more direct path and mitigates the vanishing gradient problem.
[Figure: an LSTM(M) layer over Input(N), unrolled across time steps t−1, t, t+1.]
Detour: ResNet
GRU
https://en.wikipedia.org/wiki/Gated_recurrent_unit
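A GRU layer can be dropped into the Keras models above in place of LSTM or SimpleRNN; a minimal sketch (the layer size 32 is arbitrary):

model = keras.Sequential([keras.Input(shape=input_shape),
                          keras.layers.GRU(32),
                          keras.layers.Dense(1)])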
Recurrent Dropout
model = keras.Sequential([keras.Input(shape=input_shape),
                          keras.layers.LSTM(32,
                                            dropout=0.3,             # dropout applied to the layer's inputs
                                            recurrent_dropout=0.25), # dropout applied to the recurrent state
                          keras.layers.Dropout(0.5),                 # ordinary dropout before the final Dense layer
                          keras.layers.Dense(1)])
Bidirectional Layers
[Figure: a recurrent layer over an input layer, unrolled across time steps t−1, t, t+1.]
[Figure: two recurrent layers side by side, each reading the inputs x_1, x_2, x_3 at t = 1, 2, 3. Left: a recurrent layer (SimpleRNN, LSTM, GRU) processing the information in chronological order, producing outputs z_1, z_2, z_3. Right: a recurrent layer (SimpleRNN, LSTM, GRU) processing the information in REVERSE chronological order, producing outputs z_4, z_5, z_6.]
• Both layers receive the exact same inputs.
• These two recurrent layers, combined, form what we call a bidirectional layer.
• The output of the bidirectional layer is the concatenation of the outputs of the two recurrent layers.
Bidirectional Layers in Keras
model = keras.Sequential([keras.Input(shape=input_shape),
                          keras.layers.Bidirectional(keras.layers.LSTM(32)),  # 32 units per direction; outputs are concatenated
                          keras.layers.Dense(1)])
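A quick shape check (the input shape (120, 14) is only for illustration); with the default merge mode the two directions' outputs are concatenated:

import numpy as np
from tensorflow import keras

probe = np.zeros((1, 120, 14))
bi = keras.layers.Bidirectional(keras.layers.LSTM(32))
print(bi(probe).shape)   # (1, 64): 32 forward outputs + 32 backward outputs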
RNN Summary