
Chapter 5: Sequence Models

Unit: Deep Learning

1
Learning Outcomes
• At the end of this chapter, you will:
• Understand how to build and train Recurrent Neural Networks (RNNs) and commonly used variants such as GRUs and LSTMs.
• Be able to apply sequence models to natural language problems.
• Be able to apply sequence models to audio applications, including speech recognition.

2
What is sequential data?
• Sequences of numbers over time:
• Stock prices
• Earthquake sensor data
• Sequences of words: text
• Sound: speech and other audio signals
• Images: video

3
What tasks can be done with sequential data?
• Time Series: predicting future values of a time series, such as stock market prices.
• Image Captioning: describing an image by analyzing the action it depicts.
• Natural Language Processing: text mining and sentiment analysis.
• Machine Translation: translating text from one language to another.

4
What tasks can be done with sequential data?
• Speech recognition: transcribing spoken audio into text.
• Music generation: generating classical music using recurrent neural networks.
• DNA sequence analysis: Predicting Transcription Factor Binding Sites.

5
Different types of Sequence Modeling Tasks
[Figure: examples of sequence modeling tasks with different input/output structures — image classification, music generation, sentiment classification, machine translation, speech recognition, image captioning, video activity recognition]

6
CNN limitations
• Fixed input size: CNNs require a fixed input size. This can be a problem
when dealing with sequences of varying length, such as in natural
language processing or speech recognition.
• Lack of memory: CNNs have no memory of previous inputs, which
means that they can't easily capture temporal dependencies in
sequential data.
• Order invariance: CNNs are order-invariant, which means that they treat
all inputs equally regardless of their position in the sequence. This makes
it difficult for the network to capture the order-dependent patterns that
are often present in sequential data.

7
Recurrent Neural Network: RNN
• RNNs are designed to handle sequences of varying length
• They have memory that allows them to capture temporal
dependencies in the data.
• They also have the ability to process inputs in a sequential order,
which allows them to capture order-dependent patterns.
• RNNs have become the standard approach for processing sequential
data such as speech, text, and time series data.

8
Graphical Representation of RNN

[Figure: an RNN unrolled over time steps t1, t2, t3]
• Xi : input at ti

• hi : hidden state at ti
• Yi : output at ti
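As a sketch of the recurrence the figure depicts (the notation below is assumed here, not taken from the slide), each hidden state is computed from the current input and the previous hidden state, and the output is read from the hidden state:

$$h_t = f_1\big(W_{hh} h_{t-1} + W_{xh} x_t + b_h\big), \qquad \hat{y}_t = f_2\big(W_{hy} h_t + b_y\big)$$

where $f_1$ and $f_2$ are activation functions and the same weights are reused at every time step.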

9
Back Propagation Through Time (BPTT)

• Assume that f1 is a linear function,
• and that L is the MSE loss function.
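A minimal sketch under these assumptions (the concrete forms below are assumed, following the notation introduced above):

$$h_t = W_{hh} h_{t-1} + W_{xh} x_t, \qquad \hat{y}_t = W_{hy} h_t, \qquad L = \sum_t \big(y_t - \hat{y}_t\big)^2$$

Because $h_t$ depends on $W_{hh}$ both directly and through $h_{t-1}$, differentiating $L$ requires unrolling the recurrence over all earlier time steps; that unrolling is what BPTT performs.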

10
Back Propagation Through Time (BPTT)

• Expanding the gradient with the chain rule through the unrolled network gives a sum over time steps, where each term contains a product of factors:

$$\frac{\partial L}{\partial W} = \sum_{t} \sum_{k=1}^{t} \frac{\partial L_t}{\partial \hat{y}_t}\,\frac{\partial \hat{y}_t}{\partial h_t}\left(\prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}}\right)\frac{\partial h_k}{\partial W}$$

11
Back Propagation Through Time

12
RNN limitations
• Two common problems occur during backpropagation through time: vanishing and exploding gradients.

• Each factor in the product above has the form

$$\frac{\partial h_t}{\partial h_{t-1}} = \operatorname{diag}\!\big(f'(\cdot)\big)\, W_{hh}$$

• If f is a sigmoid function, its derivative is at most 0.25.
• If f is a tanh function, its derivative is at most 1.

13
RNN limitations

• Assume that every factor in the product is bounded by the same constant, i.e. $\left\|\frac{\partial h_j}{\partial h_{j-1}}\right\| \le \gamma$ for all $j$.

• If $\gamma < 1$, the repeated product shrinks exponentially with the sequence length, and the gradient will vanish.

• If $\gamma > 1$, the repeated product grows exponentially, and the gradient will explode.
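A quick numerical illustration (the numbers are chosen here, not taken from the slide): over 100 time steps,

$$0.9^{100} \approx 2.7 \times 10^{-5}, \qquad 1.1^{100} \approx 1.4 \times 10^{4},$$

so even factors only slightly below or above 1 drive the gradient toward zero or toward overflow.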

14
Exploding Gradients with vanilla RNNs
• To avoid exploding gradients:
• Gradient clipping: at each step, check whether the norm of the gradient exceeds a threshold and, if it does, rescale (normalize) it back to the threshold:

$$\text{if } \|g\| \ge \text{threshold}: \quad g \leftarrow \frac{\text{threshold}}{\|g\|}\, g$$

• where g is the gradient of the loss function.
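A minimal sketch of clip-by-norm in plain NumPy (the function and variable names are assumptions for illustration, not from the slide):

```python
import numpy as np

def clip_gradient(g, threshold):
    """Rescale gradient g so that its L2 norm never exceeds threshold."""
    norm = np.linalg.norm(g)
    if norm >= threshold:
        g = (threshold / norm) * g
    return g

# Example: a gradient with norm 50 is clipped back to norm 5
g = np.full(100, 5.0)                         # ||g|| = 5 * sqrt(100) = 50
print(np.linalg.norm(clip_gradient(g, 5.0)))  # -> 5.0
```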

15
Vanishing Gradients with vanilla RNNs
• To tackle the vanishing gradient problem, the following solutions are possible:
• Use ReLU instead of tanh or sigmoid activation functions.
• Proper initialization of the weights can reduce the effect of vanishing gradients.
• Truncated BPTT
• Using gated cells such as LSTMs or GRUs

16
Truncated BPTT
• A moving window over the sequence during the training process (a code sketch follows this list)

[Figure: forward and backward propagation within a moving window]

• Advantages:
• Helps to avoid exploding/vanishing gradients
• Much faster and less complex than full BPTT
• Disadvantage:
• Dependencies longer than the chunk length are not learned during the training process.
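A minimal sketch of truncated BPTT in PyTorch (the toy model, data shapes, and chunk length of 20 are assumptions for illustration): the hidden state is detached between chunks, so gradients stop flowing at chunk boundaries.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 1)
opt = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.01)
loss_fn = nn.MSELoss()

x = torch.randn(4, 100, 8)   # (batch, time, features) -- toy data
y = torch.randn(4, 100, 1)

chunk_len = 20               # the "window" length
h = None
for start in range(0, x.size(1), chunk_len):
    xc = x[:, start:start + chunk_len]
    yc = y[:, start:start + chunk_len]
    out, h = rnn(xc, h)                  # forward pass over one chunk
    loss = loss_fn(head(out), yc)
    opt.zero_grad()
    loss.backward()                      # backprop only within the chunk
    opt.step()
    h = h.detach()                       # truncate: no gradient across chunks
```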
17
Long Short-Term Memory (LSTM)

18
LSTM: Input Gate
• The input gate decides what new information will be stored in the long-term
memory.
• It only works with the information from the current input and the short-term
memory from the previous time step.

19
LSTM: Input Gate
• i1 : a filter which selects what information can pass through and what information is to be discarded.
• The sigmoid function transforms the values to be between 0 and 1:
• 0 indicates that part of the information is unimportant,
• 1 indicates that the information will be used.

• i2 : uses a tanh function to regulate the values flowing through the network.

• The final outcome represents the information to be kept in the long-term memory and used as the output.
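A sketch of the input-gate computation in common LSTM notation (the symbols below are assumptions; the slide only names the two branches i1 and i2):

$$i_1 = \sigma\big(W_i [h_{t-1}, x_t] + b_i\big), \qquad i_2 = \tanh\big(W_c [h_{t-1}, x_t] + b_c\big)$$

The element-wise product $i_1 \odot i_2$ is the new information that will be written into the long-term memory.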

20
Forget Gate
• The forget gate decides which information from the long-term memory should
be kept or discarded.
• Multiply the incoming long-term memory by a forget vector generated from the current input and the incoming short-term memory.
• Uses a sigmoid function.

• The outputs from the Input gate and the Forget gate
will undergo a pointwise addition to give a new version
of the long-term memory.
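A sketch of the forget gate and the memory update in common LSTM notation (symbols assumed, matching the input-gate sketch above):

$$f_t = \sigma\big(W_f [h_{t-1}, x_t] + b_f\big), \qquad c_t = f_t \odot c_{t-1} + i_1 \odot i_2$$

where $c_t$ denotes the long-term memory (cell state).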

21
Output Gate
• The output gate will take the current input, the previous short-term memory,
and the newly computed long-term memory to produce the new short-term
memory.
• The previous short-term memory and the current input are passed into a sigmoid function to create the third and final filter.
• The new long-term memory is passed through a tanh activation function; the product of the two results is the new short-term memory.
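A sketch of the output gate in the same assumed notation:

$$o_t = \sigma\big(W_o [h_{t-1}, x_t] + b_o\big), \qquad h_t = o_t \odot \tanh(c_t)$$

where $h_t$ is the new short-term memory (hidden state) passed to the next time step.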

22
Gated Recurrent Units: GRU
• A variant of the RNN architecture introduced in 2014 by Cho et al.
• Uses gating mechanisms to control and manage the flow of
information between cells in the neural network.

23
Reset Gate
• Multiply the current input and the previous hidden state by their respective weights and sum them before passing the sum through a sigmoid function.

• The previous hidden state will first be multiplied by a trainable weight and will then undergo an element-wise multiplication (Hadamard product) with the reset vector.
• This operation will decide which information is to be kept from the previous time
steps together with the new inputs.
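A sketch of the reset-gate computation in common GRU notation (symbols assumed, not taken from the slide):

$$r_t = \sigma\big(W_r x_t + U_r h_{t-1}\big), \qquad \tilde{h}_t = \tanh\big(W_h x_t + r_t \odot (U_h h_{t-1})\big)$$

The candidate state $\tilde{h}_t$ keeps only the parts of the previous hidden state that the reset vector $r_t$ lets through.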

24
Update Gate
• Both the Update and Reset gate vectors are created using the same formula, but the weights multiplied with the input and hidden state are unique to each gate.
• The purpose of the Update gate here is to help the model determine how much
of the past information stored in the previous hidden state needs to be retained
for the future.
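In the same assumed notation, the update gate is:

$$z_t = \sigma\big(W_z x_t + U_z h_{t-1}\big)$$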

25
Combining the outputs

• The purpose of this operation is for the Update gate to determine what portion of the new information should be stored in the hidden state.
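A sketch of the combination step in one common GRU convention (some presentations swap the roles of $z_t$ and $1 - z_t$):

$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$$

Here $z_t \odot h_{t-1}$ carries forward the retained past information and $(1 - z_t) \odot \tilde{h}_t$ admits the new candidate information.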

26
