Unit-5-updated
Recurrent Connections
[Figure: an RNN cell. The input x_t enters through weights U, the previous state h_{t−1} feeds back through recurrent weights W, and the output ŷ_t is read out through V.]
h_t = ψ(U x_t + W h_{t−1})
ŷ_t = φ(V h_t)
Unrolling the Recurrence
[Figure: the recurrence unrolled over time. Each input x_1, x_2, x_3, . . . , x_τ enters its hidden state through U; the hidden states h_1, h_2, h_3, . . . , h_τ are chained through the shared recurrent weights W; each h_t produces an output ŷ_t through V.]
Feedforward Propagation
This is an RNN where the input and output sequences have the same length.
Feedforward operation proceeds from left to right.
Update equations:
a_t = b + W h_{t−1} + U x_t
h_t = tanh(a_t)
o_t = c + V h_t
ŷ_t = softmax(o_t)
Feedforward Propagation
The loss is simply the sum of the per-time-step losses.
If L_t is the negative log-likelihood of y_t given x_1, . . . , x_t, then
L = Σ_t L_t = − Σ_t log p_model(y_t | x_1, . . . , x_t)
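A minimal NumPy sketch of these update equations and of the summed negative log-likelihood loss (the dimensions, initialization and toy data below are illustrative assumptions, not from the slides):

```python
import numpy as np

def softmax(o):
    e = np.exp(o - o.max())
    return e / e.sum()

def rnn_forward(x_seq, y_seq, U, W, V, b, c, h0):
    """Forward propagation through a simple RNN, plus the summed
    negative log-likelihood loss over all time steps."""
    h, loss, y_hats = h0, 0.0, []
    for x_t, y_t in zip(x_seq, y_seq):          # left to right in time
        a_t = b + W @ h + U @ x_t               # a_t = b + W h_{t-1} + U x_t
        h = np.tanh(a_t)                        # h_t = tanh(a_t)
        o_t = c + V @ h                         # o_t = c + V h_t
        y_hat = softmax(o_t)                    # yhat_t = softmax(o_t)
        loss += -np.log(y_hat[y_t])             # L_t = -log p(y_t | x_1..x_t)
        y_hats.append(y_hat)
    return y_hats, loss

# toy dimensions (hypothetical): 4-dim inputs, 8-dim state, 3 output classes
rng = np.random.default_rng(0)
n_in, n_h, n_out, T = 4, 8, 3, 5
U = rng.normal(0, 0.1, (n_h, n_in))
W = rng.normal(0, 0.1, (n_h, n_h))
V = rng.normal(0, 0.1, (n_out, n_h))
b, c, h0 = np.zeros(n_h), np.zeros(n_out), np.zeros(n_h)
xs = [rng.normal(size=n_in) for _ in range(T)]
ys = [rng.integers(n_out) for _ in range(T)]
print(rnn_forward(xs, ys, U, W, V, b, c, h0)[1])
```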
Backward Propagation
[Figure: the unrolled graph for a length-5 sequence: inputs x_1, . . . , x_5 enter through U, hidden states h_1, . . . , h_5 are chained through W, and outputs ŷ_1, . . . , ŷ_5 are produced through V.]
BPTT
[Figure: back-propagation through time on the same unrolled graph. The gradients ∂L/∂V, ∂L/∂W and ∂L/∂U are accumulated by summing the contributions flowing back from every time step.]
Gradient Computation
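A hedged sketch of how these gradients can be accumulated by back-propagation through time for the simple RNN above (a softmax output with negative log-likelihood is assumed, so ∂L/∂o_t = ŷ_t − onehot(y_t)):

```python
import numpy as np

def rnn_bptt(x_seq, y_seq, U, W, V, b, c, h0):
    """Back-propagation through time for the simple tanh RNN:
    gradients of the summed NLL loss w.r.t. U, W, V, b, c."""
    # forward pass, caching states
    hs, y_hats, h = [h0], [], h0
    for x_t in x_seq:
        h = np.tanh(b + W @ h + U @ x_t)
        hs.append(h)
        o = c + V @ h
        e = np.exp(o - o.max())
        y_hats.append(e / e.sum())
    # backward pass
    dU, dW, dV = np.zeros_like(U), np.zeros_like(W), np.zeros_like(V)
    db, dc = np.zeros_like(b), np.zeros_like(c)
    dh_next = np.zeros_like(h0)                      # gradient flowing in from t+1
    for t in reversed(range(len(x_seq))):
        do = y_hats[t].copy()
        do[y_seq[t]] -= 1.0                          # dL/do_t = yhat_t - onehot(y_t)
        dV += np.outer(do, hs[t + 1]); dc += do
        dh = V.T @ do + dh_next                      # from output and from h_{t+1}
        da = (1.0 - hs[t + 1] ** 2) * dh             # back through tanh
        dW += np.outer(da, hs[t]); dU += np.outer(da, x_seq[t]); db += da
        dh_next = W.T @ da                           # pass gradient back to h_{t-1}
    return dU, dW, dV, db, dc
```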
Bidirectional RNNs
RNNs considered till now all have a causal structure: the state at time t captures information only from the past inputs x_1, . . . , x_{t−1} and the present input x_t
Sometimes we are interested in an output y(t) which may
depend on the whole input sequence
Example: Interpretation of a current sound as a phoneme may
depend on the next few due to co-articulation
Basically, in many cases we are interested in looking into the
future as well as the past to disambiguate interpretations
Bidirectional RNNs were introduced to address this need
(Schuster and Paliwal, 1997), and have been used in
handwriting recognition (Graves 2012, Graves and
Schmidhuber 2009), speech recognition (Graves and
Schmidhuber 2005) and bioinformatics (Baldi 1999)
Bidirectional RNNs
Bidirectional RNNs combine an RNN that moves forward through time, beginning from the start of the sequence, with another RNN that moves backward through time, beginning from the end of the sequence.
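A minimal sketch of this idea, assuming simple tanh RNNs for both directions and illustrative parameter names (fwd = (U, W, b) for the forward RNN, bwd for the backward one, V and c for the output):

```python
import numpy as np

def birnn_forward(x_seq, fwd, bwd, V, c):
    """A minimal bidirectional RNN: one tanh RNN runs forward in time,
    another runs backward, and each output uses both states."""
    def run(params, seq):
        U, W, b = params
        h, states = np.zeros(W.shape[0]), []
        for x_t in seq:
            h = np.tanh(b + W @ h + U @ x_t)
            states.append(h)
        return states
    h_fwd = run(fwd, x_seq)                  # forward state at t summarizes x_1 .. x_t
    h_bwd = run(bwd, x_seq[::-1])[::-1]      # backward state at t summarizes x_t .. x_τ
    # the output at time t therefore depends on the whole input sequence
    return [V @ np.concatenate([hf, hb]) + c for hf, hb in zip(h_fwd, h_bwd)]
```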
RNN with Fixed vector as input
We have considered RNNs in the context of a sequence of
vectors x(t) with t = 1, . . . , τ as input
Sometimes we are interested in taking only a single, fixed-size vector x as input, which generates the y sequence
Some common ways to provide an extra input to an RNN are:
As an extra input at each time step
As the initial state h_0
RNN with Fixed vector as input
The first option (an extra input at each time step) is the most common.
[Figure: panel (b) shows that adding depth lengthens the shortest paths linking different time steps; panel (c) mitigates this by introducing skip connections.]
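A minimal sketch of the first option, where a single fixed vector x conditions every step of the generated sequence (the weight matrix name R and the shapes are illustrative):

```python
import numpy as np

def generate_from_vector(x, R, W, V, b, c, h0, steps):
    """Sketch: one fixed vector x conditions the whole output sequence
    by entering the state update at every time step."""
    h, outputs = h0, []
    for _ in range(steps):
        h = np.tanh(b + W @ h + R @ x)   # R x acts like an extra, input-dependent bias
        o = c + V @ h
        outputs.append(o)                # e.g. softmax(o) would give yhat_t
    return outputs
```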
Recursive Neural Networks
The computational graph is structured as a deep tree rather than as a chain, as in an RNN.
Recursive Neural Networks
Successfully used to process data structures as input to neural networks (Frasconi et al
1997), Natural Language Processing (Socher et al 2011) and Computer vision (Socher
et al 2011)
Advantage: for sequences of length τ, the number of compositions of nonlinear operations can be reduced from O(τ) to O(log τ)
Choice of tree structure is not very clear
A balanced binary tree, that does not depend on the structure of the data has been used in many
applications
Sometimes domain knowledge can be used: Parse trees given by a parser in NLP (Socher et al 2011)
The computation performed by each node need not be the usual neuron computation - it could instead
be tensor operations etc (Socher et al 2013)
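A minimal sketch of a recursive net over a balanced binary tree, assuming the common composition h = tanh(W [h_left; h_right] + b) (all names and shapes below are illustrative):

```python
import numpy as np

def recursive_net(node, U_leaf, W, b):
    """Recursive net over a binary tree: leaves are input vectors, internal
    nodes combine their two children with shared weights."""
    if isinstance(node, np.ndarray):          # leaf: a word/input vector
        return np.tanh(U_leaf @ node)
    left, right = node                        # internal node: (left_subtree, right_subtree)
    h_l = recursive_net(left, U_leaf, W, b)
    h_r = recursive_net(right, U_leaf, W, b)
    return np.tanh(W @ np.concatenate([h_l, h_r]) + b)

# balanced binary tree over four leaves: ((x1, x2), (x3, x4))
rng = np.random.default_rng(0)
d, d_in = 6, 4
U_leaf = rng.normal(0, 0.1, (d, d_in))
W, b = rng.normal(0, 0.1, (d, 2 * d)), np.zeros(d)
leaves = [rng.normal(size=d_in) for _ in range(4)]
root = recursive_net(((leaves[0], leaves[1]), (leaves[2], leaves[3])), U_leaf, W, b)
print(root.shape)   # the root summarizes the whole sequence in O(log τ) compositions
```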
Long-Term Dependencies
Challenge of Long-Term Dependencies
Basic problem: Gradients propagated over many stages tend
to vanish (most of the time) or explode (relatively rarely).
Difficulty with long-term interactions (involving the multiplication of many Jacobians) arises from the exponentially smaller weights they receive compared to short-term interactions
Why do gradients explode or vanish?
Back-propagating through the recurrence multiplies the gradient by the Jacobian of the state transition at every time step, so over k steps the gradient norm is scaled by roughly the k-th power of the gain of that Jacobian.
This tells us that the gradient norm can shrink to zero or blow up exponentially fast, depending on the gain.
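A small numerical illustration of this effect for a purely linear recurrence (the matrix size, number of steps and gains below are arbitrary choices):

```python
import numpy as np

# Repeatedly multiplying a gradient vector by the (transposed) recurrent
# Jacobian scales its norm by roughly (gain)^k after k steps.
rng = np.random.default_rng(0)
n, steps = 32, 50
for scale in (0.8, 1.0, 1.2):                                # gain below, at, above 1
    W = scale * np.linalg.qr(rng.normal(size=(n, n)))[0]     # orthogonal matrix times a gain
    g = rng.normal(size=n)
    for _ in range(steps):
        g = W.T @ g                                          # one backward step through time
    print(f"gain {scale}: |grad| after {steps} steps = {np.linalg.norm(g):.3e}")
```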
Challenge of Long-Term Dependencies
Recurrent networks involve the composition of the same function multiple times, once per time step.
The function composition in RNNs somewhat resembles matrix multiplication.
Consider the recurrence relationship:
h_t = W^T h_{t−1}
This could be thought of as a very simple recurrent neural network without a nonlinear activation and lacking the input x.
This recurrence essentially describes the power method and can be written as:
h_t = (W^t)^T h_0
If W admits an eigendecomposition W = Q Λ Q^T with orthogonal Q, this becomes h_t = Q^T Λ^t Q h_0: eigenvalues with magnitude less than one decay to zero, and those with magnitude greater than one explode.
Solution 1: Echo State Networks
Idea: Set the recurrent weights such that they do a good job of
capturing past history and learn only the output weights
Methods: Echo State Networks, Liquid State Machines
The general methodology is called reservoir computing
How to choose the recurrent weights?
Echo State Networks
The usual recipe is to fix the recurrent weights at random and rescale them so that the spectral radius of the recurrent weight matrix is near 1, so that past inputs "echo" in the state without dying out or exploding; only the output weights are then trained, typically by linear regression.
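A minimal reservoir-computing sketch along these lines (the reservoir size, spectral radius, input scaling and ridge penalty are illustrative choices):

```python
import numpy as np

def echo_state_network(x_seq, y_targets, n_reservoir=100, spectral_radius=0.95,
                       ridge=1e-6, seed=0):
    """Sketch of reservoir computing: random, fixed recurrent weights rescaled
    to a chosen spectral radius; only the readout is learned (ridge regression)."""
    rng = np.random.default_rng(seed)
    n_in = x_seq[0].shape[0]
    W_in = rng.uniform(-0.5, 0.5, (n_reservoir, n_in))
    W = rng.normal(size=(n_reservoir, n_reservoir))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))   # rescale the gain
    h, states = np.zeros(n_reservoir), []
    for x_t in x_seq:                          # the reservoir is run, never trained
        h = np.tanh(W @ h + W_in @ x_t)
        states.append(h)
    H, Y = np.stack(states), np.stack(y_targets)
    # learn only the output weights: argmin ||H W_out - Y||^2 + ridge * ||W_out||^2
    W_out = np.linalg.solve(H.T @ H + ridge * np.eye(n_reservoir), H.T @ Y)
    return W_out
```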
Idea 1: Skip Connections - add direct connections from states in the distant past to states in the present, so gradients flow across fewer intermediate steps
Idea 2: Leaky Units - hidden units with linear self-connections whose weight is close to one, so they accumulate a slowly changing running average and remember information over longer durations
A Popular Solution: Gated Architectures
Long Short Term Memory
LSTM: Further Intuition
Gated Recurrent Unit
Lecture 6 Smaller Network: RNN
This is our fully connected network. If the input x_1, . . . , x_n is very large and growing, this network would become too large. Instead, we input one x_i at a time and re-use the same edge weights.
Recurrent Neural Network
How does an RNN reduce complexity?
[Figure: h_0 → f → h_1 → f → h_2 → f → h_3 → . . . , where each application of f consumes x_t and emits y_t, i.e. (h_t, y_t) = f(h_{t−1}, x_t).]
No matter how long the input/output sequence is, we only need one function f. If the f's were different at each step, it would become a feedforward NN. This may be treated as another compression relative to the fully connected network.
Deep RNN: h′, y = f1(h, x); g′, z = f2(g, y)
[Figure: two stacked recurrences. The lower layer f1 maps (h_{t−1}, x_t) to (h_t, y_t); the upper layer f2 maps (g_{t−1}, y_t) to (g_t, z_t), producing the outputs z_1, z_2, z_3, . . .]
Bidirectional RNN: y, h = f1(x, h); z, g = f2(g, x); p = f3(y, z)
[Figure: f1 runs forward over x_1, x_2, x_3 producing y_1, y_2, y_3 and states h; f2 runs backward producing z_1, z_2, z_3 and states g; at each step f3 combines y_t and z_t into the output p_t.]
Pyramid RNN
Higher layers operate on fewer time steps (neighbouring states are combined), which significantly speeds up training.
Naive RNN cell
[Figure: the basic cell f maps (h, x) to (h′, y).]
h′ = σ(W_h h + W_i x)
y = softmax(W_o h′)    (note that y is computed from h′)
LSTM cell state
[Figure: the cell state C_t runs horizontally through the cell, from C_{t−1} and h_{t−1}, between the forget gate and the input gate.]
The core idea is the cell state C_t: it is changed slowly, with only minor linear interactions, so it is very easy for information to flow along it unchanged.
The forget gate decides what to discard from C_{t−1}; the input gate decides which components are to be updated, and C̃_t provides the candidate change contents.
Why sigmoid or tanh? The sigmoid's 0-1 output acts as a gating switch. The vanishing gradient problem is already handled inside the LSTM, so would replacing tanh with ReLU be OK?
Naive RNN vs LSTM
[Figure: a naive RNN cell maps (h_{t−1}, x_t) to (h_t, y_t); an LSTM cell additionally carries the cell state, mapping (c_{t−1}, h_{t−1}, x_t) to (c_t, h_t, y_t).]
The four internal signals are computed from the current input and the previous hidden state:
z = tanh(W [x_t; h_{t−1}])    (candidate update)
z_i = σ(W_i [x_t; h_{t−1}])    (controls the input gate)
z_f = σ(W_f [x_t; h_{t−1}])    (controls the forget gate)
z_o = σ(W_o [x_t; h_{t−1}])    (controls the output gate)
In the "peephole" variant, c_{t−1} is appended to the gate inputs as well, with the corresponding weight matrices kept diagonal; the remaining signals are obtained in the same way.
Information flow of LSTM
(⊙ denotes element-wise multiplication)
c_t = z_f ⊙ c_{t−1} + z_i ⊙ z
h_t = z_o ⊙ tanh(c_t)
y_t = σ(W′ h_t)
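A minimal sketch of one LSTM step in this notation (bias terms are omitted and the parameter shapes are assumptions: each weight matrix maps the concatenated [x_t; h_{t−1}] to the state dimension):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x_t, h_prev, c_prev, W, Wi, Wf, Wo, W_out):
    """One LSTM step in the slide's notation: z is the candidate,
    z_i / z_f / z_o are the input, forget and output gates."""
    v = np.concatenate([x_t, h_prev])        # [x_t ; h_{t-1}]
    z  = np.tanh(W  @ v)                     # candidate update
    zi = sigmoid(Wi @ v)                     # input gate
    zf = sigmoid(Wf @ v)                     # forget gate
    zo = sigmoid(Wo @ v)                     # output gate
    c_t = zf * c_prev + zi * z               # c_t = z_f ⊙ c_{t-1} + z_i ⊙ z
    h_t = zo * np.tanh(c_t)                  # h_t = z_o ⊙ tanh(c_t)
    y_t = sigmoid(W_out @ h_t)               # y_t = σ(W' h_t)
    return y_t, h_t, c_t
```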
[Figure: LSTM cells chained over time. The pair (c_t, h_t) produced at step t, together with x_{t+1}, drives the gates z_f, z_i, z, z_o of the next step, producing y_{t+1}, c_{t+1} and h_{t+1}.]
Feedforward network vs RNN
Feedforward: t indexes the layer, and each layer has its own function, a_t = f_t(a_{t−1}) = σ(W_t a_{t−1} + b_t).
RNN: t indexes the time step, and the same f is applied at every step, consuming x_t and emitting y_t.
Applying a GRU-style cell to depth rather than time gives the Highway Network:
There is no input x_t at each step and no output y_t at each step; a_{t−1} is the output of the (t−1)-th layer and a_t is the output of the t-th layer.
There is no reset gate, only an update gate z:
h′ = σ(W a_{t−1})
z = σ(W′ a_{t−1})
Highway Network: a_t = z ⊙ a_{t−1} + (1 − z) ⊙ h′
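A minimal sketch of one such layer in the slide's notation (the layer size and the stacking loop below are illustrative):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def highway_layer(a_prev, W, W_gate):
    """One highway layer: a candidate h' and an update gate z decide how
    much of the previous layer's output to carry through unchanged."""
    h_cand = sigmoid(W @ a_prev)              # h' = σ(W a_{t-1})
    z = sigmoid(W_gate @ a_prev)              # z  = σ(W' a_{t-1})
    return z * a_prev + (1.0 - z) * h_cand    # a_t = z ⊙ a_{t-1} + (1-z) ⊙ h'

# stacking many such layers lets information skip layers via the gate z
rng = np.random.default_rng(0)
d = 16
a = rng.normal(size=d)
for _ in range(10):
    a = highway_layer(a, rng.normal(0, 0.1, (d, d)), rng.normal(0, 0.1, (d, d)))
print(a.shape)
```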
Grid LSTM
[Figure: a standard LSTM maps (c, h) and the input x to (c′, h′) along the time axis. A Grid LSTM carries LSTM-style memory along both time and depth, mapping (c, h, a, b) to (c′, h′, a′, b′) with the same gate structure z_f, z_i, z, z_o.]
You can generalize this to 3D, and more.
Applications of LSTM / RNN
Neural machine translation
[Figure: an LSTM-based sequence-to-sequence translation model.]
Sequence to sequence chat model
Chat with context
[Figure: example dialogue turns ("M: Hi", "M: Hello", "U: Hi") illustrating that the machine's reply to the same user utterance depends on the preceding context.]
Serban, Iulian V., Alessandro Sordoni, Yoshua Bengio, Aaron Courville, and Joelle Pineau, 2015. "Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models."
Baidu’s speech recognition using RNN
Attention
Image caption generation using attention (from CY Lee's lecture)
[Figure: a CNN produces one feature vector for each image region (its filter responses). z_0 is an initial parameter, which is also learned; it is matched against each region vector to produce attention weights (e.g. 0.7, or a map such as 0.0 / 0.8 / 0.2), and the weighted sum of the region vectors is used to generate the caption.]
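A minimal sketch of this attention step, using a plain dot product followed by softmax as the match function (an illustrative choice; the exact match function is an assumption here):

```python
import numpy as np

def attend(z, region_vectors):
    """Match a query z against each region vector, normalize the scores,
    and return the weighted sum of the regions."""
    scores = np.array([z @ r for r in region_vectors])     # match(z, region)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                                # e.g. [0.0, 0.8, 0.2, ...]
    context = sum(w * r for w, r in zip(weights, region_vectors))  # weighted sum
    return context, weights

# z0 is an initial query that is also learned; here it is random for illustration
rng = np.random.default_rng(0)
regions = [rng.normal(size=32) for _ in range(6)]   # one CNN feature vector per region
z0 = rng.normal(size=32)
context, weights = attend(z0, regions)
print(weights.round(2))
```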
Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo
Larochelle, Aaron Courville, “Describing Videos by Exploiting Temporal Structure”, ICCV,
2015
Thank You