RNN, LSTM, GRU


RNN:
• A Recurrent Neural Network (RNN) is a type of neural network in which the output from the previous step is fed as input to the current step.
• In traditional neural networks, all inputs and outputs are independent of each other.
• However, in cases where the next word of a sentence must be predicted, the previous words are required, so there is a need to remember them.
• Thus the RNN came into existence, solving this issue with the help of a hidden layer. The main and most important feature of an RNN is its hidden state, which remembers some information about the sequence.

• The state is also referred to as the memory state, since it remembers the previous inputs to the network. The RNN uses the same parameters for every input, because it performs the same task on all inputs and hidden states to produce the output.

• This reduces the complexity of the parameters, unlike other neural networks; a short sketch of this parameter sharing follows.
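A minimal PyTorch sketch of this parameter sharing (the layer sizes and sequence lengths are illustrative choices, not taken from the slides): the same weight matrices are reused at every time step, so the parameter count does not change with the length of the input sequence.

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)

short_seq = torch.randn(1, 5, 8)    # 1 sequence, 5 time steps, 8 features
long_seq = torch.randn(1, 50, 8)    # the same layer also handles 50 time steps

out_short, h_short = rnn(short_seq)
out_long, h_long = rnn(long_seq)

# Identical for both sequences: the weights are shared across time steps.
print(sum(p.numel() for p in rnn.parameters()))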

Types Of RNN:

There are four types of RNNs based on the number of inputs and
outputs in the network.

1. One to One
2. One to Many
3. Many to One
4. Many to Many

Examples: sentiment analysis, movie rating prediction.

Recurrent Neural Network Architecture:

• RNNs have the same input and output architecture as any other deep neural architecture. The difference lies in how information flows from input to output.
• Unlike deep neural networks, where each dense layer has its own weight matrix, an RNN uses the same weights across the whole network, i.e. at every time step.

How does an RNN work?:

• The Recurrent Neural Network consists of multiple fixed activation function units, one for each time step.
• Each unit has an internal state, called the hidden state of the unit. This hidden state represents the past knowledge that the network holds at a given time step.
• The hidden state is updated at every time step to reflect the change in the network's knowledge about the past.
• The hidden state is updated using the following recurrence relation:
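One standard form of this recurrence (the symbol names are a common convention, assumed here rather than taken from the slides) is h_t = tanh(W_xh · x_t + W_hh · h_{t-1} + b_h), where x_t is the input at time step t and h_{t-1} is the previous hidden state. A minimal NumPy sketch of the forward recurrence:

import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h, h0):
    """Run a simple RNN over a sequence. xs is a list of input vectors."""
    h = h0
    hidden_states = []
    for x in xs:
        # recurrence: new hidden state from the current input and the previous state
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        hidden_states.append(h)
    return hidden_states

# Illustrative sizes: 3 input features, 4 hidden units, 5 time steps.
rng = np.random.default_rng(0)
W_xh = rng.standard_normal((4, 3)) * 0.1
W_hh = rng.standard_normal((4, 4)) * 0.1
b_h = np.zeros(4)
xs = [rng.standard_normal(3) for _ in range(5)]
states = rnn_forward(xs, W_xh, W_hh, b_h, h0=np.zeros(4))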

Simple RNN:

Forward and Backward Propagation in
RNN

Forward Propagation

Backward Propagation

Issues of Standard RNNs:

1. Vanishing Gradient: Text generation, machine translation, and stock market prediction are just a few examples of the time-dependent, sequential-data problems that can be modelled with recurrent neural networks. In practice, however, the vanishing gradient problem makes RNNs difficult to train: gradients shrink as they are propagated back through many time steps, so the network struggles to learn long-range dependencies.

2. Exploding Gradient: An exploding gradient occurs when, during training, the gradient grows exponentially rather than decaying. Large error gradients accumulate during training and lead to very large updates to the neural network's weights, which is the source of this issue. A toy illustration of both effects follows.
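Both effects can be seen in a toy calculation: during backpropagation through time the gradient is repeatedly multiplied by roughly the same per-step factor, so a factor below 1 drives it toward zero and a factor above 1 blows it up. The numbers below are illustrative only.

# Toy illustration of vanishing vs. exploding gradients over 50 time steps.
# The per-step factor stands in for the repeated multiplication that occurs
# in backpropagation through time.
vanishing_factor = 0.9
exploding_factor = 1.1

grad_vanish = 1.0
grad_explode = 1.0
for step in range(50):
    grad_vanish *= vanishing_factor
    grad_explode *= exploding_factor

print(grad_vanish)   # about 0.005 -> early time steps receive almost no learning signal
print(grad_explode)  # about 117   -> weight updates become huge and unstable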

Training through RNN:

1. A single time step of the input is provided to the network.
2. The current state is calculated from the current input and the previous state.
3. The current hidden state ht becomes ht-1 for the next time step.
4. One can go through as many time steps as the problem requires and combine the information from all the previous states.
5. Once all the time steps are completed, the final current state is used to calculate the output.
6. The output is then compared to the actual output, i.e. the target output, and the error is generated.
7. The error is then back-propagated through the network to update the weights; the RNN is thus trained using Backpropagation Through Time (BPTT). A minimal training sketch follows.
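A minimal PyTorch sketch of this training loop, assuming a many-to-one setup (a sequence goes in, a single value comes out); the layer sizes, loss, and optimizer are illustrative choices:

import torch
import torch.nn as nn

# Illustrative sizes: 8 input features, 16 hidden units, 1 output value.
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
readout = nn.Linear(16, 1)
optimizer = torch.optim.SGD(list(rnn.parameters()) + list(readout.parameters()), lr=0.01)
loss_fn = nn.MSELoss()

x = torch.randn(32, 10, 8)   # batch of 32 sequences, 10 time steps each
y = torch.randn(32, 1)       # one target value per sequence

for epoch in range(5):
    optimizer.zero_grad()
    _, h_n = rnn(x)             # steps 1-5: forward pass through all time steps
    pred = readout(h_n[-1])     # final hidden state is used to calculate the output
    loss = loss_fn(pred, y)     # step 6: compare with the target output
    loss.backward()             # step 7: backpropagation through time
    optimizer.step()            # weight update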

Advantages and Disadvantages of Recurrent Neural Networks

• Advantages

1. An RNN remembers every piece of information through time. This ability to take previous inputs into account is what makes it useful for time series prediction. (This idea, extended with gating, is the basis of Long Short-Term Memory.)
2. Recurrent neural networks can even be combined with convolutional layers to extend the effective pixel neighborhood.

• Disadvantages

1. Vanishing and exploding gradient problems.
2. Training an RNN is a very difficult task.
3. It cannot process very long sequences when tanh or ReLU is used as the activation function.
Applications of Recurrent Neural Networks

1. Language modelling and text generation
2. Speech recognition
3. Machine translation
4. Image recognition, face detection
5. Time series forecasting

Variations of the Recurrent Neural Network (RNN)

• To overcome problems such as vanishing and exploding gradients, several advanced versions of RNNs have been developed. Some of these are:

1. Bidirectional Recurrent Neural Network (BRNN)
2. Long Short-Term Memory (LSTM)
Difference between RNN and Simple Neural Network
An RNN is considered the better version of a deep neural network when the data is sequential. The significant differences between RNNs and deep neural networks are:

• Weights: in an RNN, the weights are the same across all time steps of the network; in a deep neural network, the weights are different for each layer.
• Input type: RNNs are used when the data is sequential and the number of inputs is not predefined; a simple deep neural network has no special mechanism for sequential data, and the number of inputs is fixed.
• Parameters: the number of parameters in the RNN is higher than in a simple DNN; the number of parameters in the DNN is lower than in the RNN.
• Gradients: exploding and vanishing gradients are the major drawback of RNNs; these problems also occur in DNNs, but they are not the major problem there.
Bidirectional Recurrent Neural Network:

• Recurrent Neural Networks (RNNs) are a particular class of neural networks created for processing sequential input, including speech, text, and time-series data.
• Unlike feedforward neural networks, which process data as a fixed-length vector, RNNs process data as a sequence of vectors.
• Each vector is processed based on the hidden state from the previous step.
• By computing the hidden state from both the current input and the previous hidden state, the network can store information from earlier steps in the sequence in a kind of memory.
• RNNs are thus well suited for tasks that require knowledge of the context and the relationships among sequence elements.
• Although conventional RNNs can handle variable-length sequences, they often struggle with the vanishing gradient problem.
• Gradients become extremely small during backpropagation, making it difficult for the network to learn from the data.
• Many RNN variants, such as LSTMs and GRUs, which use gating mechanisms to regulate the flow of information and improve learning, have been created to address this.
Working of a Bidirectional Recurrent Neural Network
1. Inputting a sequence: A sequence of data points, each represented as a vector of the same dimensionality, is fed into the BRNN. The sequences may have different lengths.
2. Dual processing: The data is processed in both the forward and backward directions. In the forward direction, the hidden state at time step t is computed from the input at step t and the hidden state at step t-1. In the backward direction, the hidden state at step t is computed from the input at step t and the hidden state at step t+1.
3. Computing the hidden state: The hidden state at each step is computed by applying a non-linear activation function to a weighted sum of the input and the previous hidden state. This creates a memory mechanism that lets the network retain information from earlier steps in the sequence.
4. Determining the output: The output at each step is computed by applying a non-linear activation function to a weighted sum of the hidden state and the output weights. This output can either be the final output or serve as input to another layer of the network.
5. Training: The network is trained with a supervised learning approach, where the goal is to minimize the discrepancy between the predicted output and the actual output. During training, the network adjusts the weights of its input-to-hidden and hidden-to-output connections through backpropagation. A minimal usage sketch follows.
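A minimal PyTorch sketch of a bidirectional RNN layer (sizes are illustrative); PyTorch's bidirectional=True option runs one RNN forward and one backward over the sequence and concatenates their hidden states at each step:

import torch
import torch.nn as nn

brnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True, bidirectional=True)

x = torch.randn(4, 10, 8)    # batch of 4 sequences, 10 time steps, 8 features
output, h_n = brnn(x)

print(output.shape)  # (4, 10, 32): forward and backward states concatenated per step
print(h_n.shape)     # (2, 4, 16): final hidden state of each direction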

To calculate the output from a BRNN unit: the hidden state at time t is given by a combination of the forward hidden state Ht (forward) and the backward hidden state Ht (backward), and the output at any given time step is computed from this combined hidden state, as written out below.
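One common formulation, assuming the two directional states are concatenated and passed through a learned output weight matrix W_y with bias b_y (these symbols are a convention assumed here, not taken from the slides):

    H_t = [H_t(forward) ; H_t(backward)]     (concatenation of the two directions)
    y_t = g(W_y · H_t + b_y)                 (g is an output activation, e.g. softmax)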

The training of a BRNN is similar to the backpropagation through time (BPTT) algorithm. The BPTT algorithm works as follows:

• Unroll the network and calculate the errors at each time step.
• Update the weights and roll the network back up.
• However, because the forward and backward passes in a BRNN occur simultaneously, updating the weights for the two processes could happen at the same time, which produces inaccurate results. Therefore, a BRNN is trained with an approach that handles the forward and backward passes individually.

• Advantages of Bidirectional RNN:

• Context from both past and future: With the ability to process sequential input both forward and
backward, BRNNs provide a thorough grasp of the full context of a sequence. Because of this, BRNNs are
effective at tasks like sentiment analysis and speech recognition.
• Enhanced accuracy: BRNNs frequently yield more precise answers since they take both historical and
upcoming data into account.
• Efficient handling of variable-length sequences: When compared to conventional RNNs, which require
padding to have a constant length, BRNNs are better equipped to handle variable-length sequences.
• Resilience to noise and irrelevant information: BRNNs can be robust to noise and irrelevant information present in the data, because both the forward and backward passes offer useful information that supports the predictions made by the network.
• Ability to handle sequential dependencies: BRNNs can capture long-term links between sequence
pieces, making them extremely adept at handling complicated sequential dependencies.

• Disadvantages of Bidirectional RNN:

• Computational complexity: Given that they analyze data both forward and backward, BRNNs can be
computationally expensive due to the increased amount of calculations needed.
• Long training time: BRNNs can also take a long time to train because there are many parameters to optimize, especially on large datasets.
• Difficulty in parallelization: Due to the requirement for sequential processing in both the forward and
backward directions, BRNNs can be challenging to parallelize.
• Overfitting: BRNNs are prone to overfitting since their many parameters can produce overly complex models, especially when trained on small datasets.
• Interpretability: Due to the processing of data in both forward and backward directions, BRNNs can be
tricky to interpret since it can be difficult to comprehend what the model is doing and how it is producing
predictions.

LSTM
Long Short-Term Memory: LSTM

LSTM is an improved version of the recurrent neural network, designed by Hochreiter & Schmidhuber.

A traditional RNN has a single hidden state that is passed through time, which can make it difficult for the network to learn long-term dependencies. LSTM models address this problem by introducing a memory cell, a container that can hold information for an extended period.

LSTM architectures are capable of learning long-term dependencies in sequential data, which makes them well suited for tasks such as language translation, speech recognition, and time series forecasting.

LSTM Architecture:

• The LSTM architecture involves a memory cell that is controlled by three gates: the input gate, the forget gate, and the output gate. These gates decide what information to add to, remove from, and output from the memory cell.

• The input gate controls what information is added to the memory cell.

• The forget gate controls what information is removed from the memory cell.

• The output gate controls what information is output from the memory cell.

• This allows LSTM networks to selectively retain or discard information as it flows through the network, which allows them to learn long-term dependencies.

• The LSTM also maintains a hidden state, which acts as the short-term memory of the network. The hidden state is updated based on the input, the previous hidden state, and the memory cell's current state. A sketch of one LSTM step follows.
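A minimal NumPy sketch of a single LSTM step, using one standard formulation of the gate equations (the weight names and shapes are illustrative assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b are dicts keyed by 'i', 'f', 'o', 'g'."""
    i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])    # input gate
    f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])    # forget gate
    o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])    # output gate
    g = np.tanh(W['g'] @ x_t + U['g'] @ h_prev + b['g'])    # candidate cell content
    c = f * c_prev + i * g      # memory cell: keep part of the old, add part of the new
    h = o * np.tanh(c)          # hidden state (short-term memory) read out of the cell
    return h, c

# Illustrative sizes: 3 input features, 4 hidden units.
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((4, 3)) * 0.1 for k in 'ifog'}
U = {k: rng.standard_normal((4, 4)) * 0.1 for k in 'ifog'}
b = {k: np.zeros(4) for k in 'ifog'}
h, c = lstm_step(rng.standard_normal(3), np.zeros(4), np.zeros(4), W, U, b)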

Applications of LSTM

Some of the famous applications of LSTM include:

• Language Modeling: LSTMs have been used for natural language processing tasks
such as language modeling, machine translation, and text summarization. They can be
trained to generate coherent and grammatically correct sentences by learning the
dependencies between words in a sentence.
• Speech Recognition: LSTMs have been used for speech recognition tasks such as
transcribing speech to text and recognizing spoken commands. They can be trained to
recognize patterns in speech and match them to the corresponding text.
• Time Series Forecasting: LSTMs have been used for time series forecasting tasks
such as predicting stock prices, weather, and energy consumption. They can learn
patterns in time series data and use them to make predictions about future events.
• Anomaly Detection: LSTMs have been used for anomaly detection tasks such as
detecting fraud and network intrusion. They can be trained to identify patterns in data
that deviate from the norm and flag them as potential anomalies.
• Recommender Systems: LSTMs have been used for recommendation tasks such as
recommending movies, music, and books. They can learn patterns in user behavior and
use them to make personalized recommendations.
• Video Analysis: LSTMs have been used for video analysis tasks such as object detection, activity recognition, and action classification. They can be used in combination with other neural network architectures, such as Convolutional Neural Networks (CNNs), to analyze video data.
LSTM vs RNN:
Feature-by-feature comparison of LSTM (Long Short-Term Memory) and RNN (Recurrent Neural Network):

• Memory: an LSTM has a special memory unit that allows it to learn long-term dependencies in sequential data; a plain RNN does not have a memory unit.
• Directionality: an LSTM can be trained to process sequential data in both forward and backward directions; a plain RNN can only be trained to process sequential data in one direction.
• Training: an LSTM is more difficult to train than an RNN due to the complexity of the gates and memory unit; an RNN is easier to train than an LSTM.
• Long-term dependency learning: yes for LSTM; limited for RNN.
• Ability to learn sequential data: yes for both.
• Applications: LSTM: machine translation, speech recognition, text summarization, natural language processing, time series forecasting. RNN: natural language processing, machine translation, speech recognition, image processing, video processing.

Long Short-Term Memory (LSTM) is a powerful type of
recurrent neural network (RNN) that is well-suited for
handling sequential data with long-term dependencies. It
addresses the vanishing gradient problem, a common
limitation of RNNs, by introducing a gating mechanism that
controls the flow of information through the network. This
allows LSTMs to learn and retain information from the past,
making them effective for tasks like machine translation,
speech recognition, and natural language processing.

GRU
Gated Recurrent Unit Networks: GRU

Gated Recurrent Unit (GRU) is a type of recurrent neural network (RNN) that
was introduced by Cho et al. in 2014 as a simpler alternative to Long Short-
Term Memory (LSTM) networks. Like LSTM, GRU can process sequential data
such as text, speech, and time-series data.

The basic idea behind GRU is to use gating mechanisms to selectively update
the hidden state of the network at each time step. The gating mechanisms are
used to control the flow of information in and out of the network. The GRU has
two gating mechanisms, called the reset gate and the update gate.

The reset gate determines how much of the previous hidden state should be
forgotten, while the update gate determines how much of the new input
should be used to update the hidden state. The output of the GRU is calculated
based on the updated hidden state.
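A minimal NumPy sketch of a single GRU step, using one common formulation of the reset and update gates (the weight names and shapes are illustrative assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W, U, b):
    """One GRU time step. W, U, b are dicts keyed by 'r' (reset), 'z' (update), 'h' (candidate)."""
    r = sigmoid(W['r'] @ x_t + U['r'] @ h_prev + b['r'])               # reset gate
    z = sigmoid(W['z'] @ x_t + U['z'] @ h_prev + b['z'])               # update gate
    h_tilde = np.tanh(W['h'] @ x_t + U['h'] @ (r * h_prev) + b['h'])   # candidate state
    h = (1 - z) * h_prev + z * h_tilde   # blend old state and candidate via the update gate
    return h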

To solve the vanishing and exploding gradient problems often encountered when training a basic Recurrent Neural Network, many variations were developed. One of the most famous variations is the Long Short-Term Memory network (LSTM). One of the lesser-known but equally effective variations is the Gated Recurrent Unit network (GRU).

