DS303 RNN LSTM
Manjesh K. Hanawal
Figure: Type 2 RNN: Recurrence exists only from the output to the
hidden layer. Each time step is trained independently, enabling
parallelization. While less expressive than fully recurrent RNNs, this
structure is simpler and easier to train.
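A rough sketch of this pattern in NumPy (parameter names W_x, W_o, W_y, the dimensions, and the use of teacher forcing are illustrative assumptions, not from the slide): the hidden state at time t sees only the previous output, so with teacher forcing each time step can be computed independently.

import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 4, 8, 3

# Illustrative parameters (assumed shapes, not from the slide).
W_x = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_o = rng.normal(scale=0.1, size=(hidden_dim, output_dim))
W_y = rng.normal(scale=0.1, size=(output_dim, hidden_dim))
b_h = np.zeros(hidden_dim)
b_y = np.zeros(output_dim)

def type2_step(x_t, o_prev):
    """One step: the hidden state depends on the previous *output*, not on h_{t-1}."""
    h_t = np.tanh(W_x @ x_t + W_o @ o_prev + b_h)
    o_t = W_y @ h_t + b_y
    return h_t, o_t

# With teacher forcing, o_prev is replaced by the ground-truth target of the
# previous step, so every time step can be computed independently (in parallel).
h1, o1 = type2_step(rng.normal(size=input_dim), np.zeros(output_dim))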
Important Design Patterns of RNN: (3/3)
▶ Recurrent networks with recurrent connections between
hidden units that read an entire sequence and then produce a
single output (see the sketch below).
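A minimal PyTorch sketch of this sequence-to-one pattern (the module, layer sizes, and class count are assumptions for illustration, not part of the slide): hidden-to-hidden recurrence runs over the whole sequence, and only the final hidden state is read out.

import torch
import torch.nn as nn

class SequenceToOne(nn.Module):
    """Reads an entire sequence and produces a single output (e.g. a class score)."""
    def __init__(self, input_dim=4, hidden_dim=16, num_classes=3):
        super().__init__()
        # Hidden-to-hidden recurrence is handled inside nn.RNN.
        self.rnn = nn.RNN(input_dim, hidden_dim, batch_first=True)
        self.readout = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                        # x: (batch, seq_len, input_dim)
        _, h_last = self.rnn(x)                  # h_last: (1, batch, hidden_dim)
        return self.readout(h_last.squeeze(0))   # one output per sequence

# Example: a batch of 2 sequences of length 10.
scores = SequenceToOne()(torch.randn(2, 10, 4))
print(scores.shape)                              # torch.Size([2, 3])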
Components:
▶ Forget gate ft
▶ Input gate it
▶ Output gate ot
▶ Cell state ct (long-term memory)
▶ Hidden state ht (short-term memory)
Figure: LSTM Cell Block Diagram
Gating Equations:
ft = σ(Wf [ht−1 , xt ] + bf )
it = σ(Wi [ht−1 , xt ] + bi )
ot = σ(Wo [ht−1 , xt ] + bo )
Candidate Cell State:
c̃t = tanh(Wc [ht−1 , xt ] + bc )
LSTM Cell Update and Output:
ct = ft ⊙ ct−1 + it ⊙ c̃t
ht = ot ⊙ tanh(ct )
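A minimal NumPy sketch of one LSTM cell step that follows these equations directly (the dimensions and the shapes of the weight matrices Wf, Wi, Wc, Wo acting on the concatenation [ht−1, xt] are illustrative assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o):
    """One LSTM step following the equations above; each W acts on [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)             # forget gate
    i_t = sigmoid(W_i @ z + b_i)             # input gate
    c_tilde = np.tanh(W_c @ z + b_c)         # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde       # cell state update (⊙ = elementwise)
    o_t = sigmoid(W_o @ z + b_o)             # output gate
    h_t = o_t * np.tanh(c_t)                 # hidden state (short-term memory)
    return h_t, c_t

# Example with assumed sizes: 4-dim input, 8-dim hidden/cell state.
rng = np.random.default_rng(0)
n_in, n_h = 4, 8
Ws = [rng.normal(scale=0.1, size=(n_h, n_h + n_in)) for _ in range(4)]
bs = [np.zeros(n_h) for _ in range(4)]
h, c = np.zeros(n_h), np.zeros(n_h)
h, c = lstm_step(rng.normal(size=n_in), h, c,
                 Ws[0], bs[0], Ws[1], bs[1], Ws[2], bs[2], Ws[3], bs[3])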
Key Benefits:
▶ Learns long-term dependencies
▶ Gated structure mitigates the vanishing/exploding gradient problem
▶ Strong empirical performance in many NLP and vision tasks