RNN, LSTM, and GRU
RNN:
• A Recurrent Neural Network (RNN) is a type of neural network where the output from the previous step is fed as input to the current step.
• In traditional neural networks, all inputs and outputs are independent of each other.
• However, for tasks such as predicting the next word of a sentence, the previous words are needed, so the network must remember them.
• RNNs solve this with a hidden layer. The key feature of an RNN is its hidden state, which remembers information about the sequence.
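A minimal sketch of a single recurrent step, assuming NumPy; the weight names and sizes below are illustrative assumptions, not taken from the slides:

import numpy as np

input_size, hidden_size = 8, 16
W_xh = np.random.randn(hidden_size, input_size) * 0.01   # input-to-hidden weights
W_hh = np.random.randn(hidden_size, hidden_size) * 0.01  # hidden-to-hidden weights
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    # The new hidden state depends on the current input and on the
    # previous hidden state, which carries memory of earlier steps.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

h = np.zeros(hidden_size)
for x_t in np.random.randn(5, input_size):   # a sequence of 5 input vectors
    h = rnn_step(x_t, h)                       # the hidden state persists across steps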
Types Of RNN:
There are four types of RNNs based on the number of inputs and
outputs in the network.
1. One to One
2. One to Many
3. Many to One
4. Many to Many
Examples: sentiment analysis, movie rating.
Recurrent Neural Network Architecture:
• RNNs have the same input and output architecture as any other deep neural architecture; the difference lies in how information flows from input to output.
• Unlike deep neural networks, which use a different weight matrix for each dense layer, an RNN uses the same weights across the whole network: they are shared across all time steps.
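One way to see this weight sharing, sketched with Keras (assuming TensorFlow is installed; the layer sizes are illustrative): the parameter count of a SimpleRNN layer is the same no matter how long the sequence is, because the same matrices are reused at every time step.

import tensorflow as tf

# The same SimpleRNN layer applied to sequences of length 10 or 100 has an
# identical number of parameters: the weights are shared across time steps.
for seq_len in (10, 100):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(seq_len, 8)),  # (time steps, features per step)
        tf.keras.layers.SimpleRNN(16),
    ])
    print(seq_len, model.count_params())            # prints the same count both times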
How does an RNN work?
Simple RNN:
Forward and Backward Propagation in RNN
Forward Propagation
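For a standard RNN, forward propagation at time step $t$ can be written as (the notation here is assumed, not taken from the slides):
$h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$
$y_t = W_{hy} h_t + b_y$
where $x_t$ is the input, $h_t$ the hidden state, and $y_t$ the output; the same weight matrices are applied at every time step.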
Backward Propagation
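A minimal NumPy sketch of backpropagation through time (BPTT) for the vanilla RNN equations above, assuming a squared-error loss at every step; all names and sizes are illustrative:

import numpy as np

T, d_in, d_h = 5, 4, 8                      # sequence length, input size, hidden size
xs = np.random.randn(T, d_in)               # inputs
ts = np.random.randn(T, d_h)                # targets (same size as outputs, for simplicity)
W_xh = np.random.randn(d_h, d_in) * 0.1
W_hh = np.random.randn(d_h, d_h) * 0.1
W_hy = np.random.randn(d_h, d_h) * 0.1
b = np.zeros(d_h)

# Forward pass: store hidden states so they can be reused in the backward pass.
hs = [np.zeros(d_h)]
ys = []
for x in xs:
    hs.append(np.tanh(W_xh @ x + W_hh @ hs[-1] + b))
    ys.append(W_hy @ hs[-1])

# Backward pass: gradients flow back through every time step (BPTT); because the
# weights are shared, their gradients accumulate across all steps.
dW_xh, dW_hh, dW_hy, db = map(np.zeros_like, (W_xh, W_hh, W_hy, b))
dh_next = np.zeros(d_h)
for t in reversed(range(T)):
    dy = ys[t] - ts[t]                      # d(loss)/d(output) for squared error
    dW_hy += np.outer(dy, hs[t + 1])
    dh = W_hy.T @ dy + dh_next              # gradient from the output and from the future
    dz = (1 - hs[t + 1] ** 2) * dh          # backprop through tanh
    dW_xh += np.outer(dz, xs[t])
    dW_hh += np.outer(dz, hs[t])
    db += dz
    dh_next = W_hh.T @ dz                   # pass the gradient to the previous time step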
Issues of Standard RNNs:
Training through RNN:
Advantages and Disadvantages of Recurrent Neural Network
• Advantages
• Disadvantages
Difference between RNN and Simple Neural Network
An RNN is generally the better choice over a plain deep neural network when the data is sequential. The main differences between the two are:
• Recurrent Neural Network: the same weights are shared across all time steps of the network. Simple Deep Neural Network: each layer has its own weight matrix.
• Recurrent Neural Network: used when the data is sequential and the number of inputs is not predefined. Simple Deep Neural Network: has no special mechanism for sequential data, and the number of inputs is fixed.
• Recurrent Neural Network: exploding and vanishing gradients are its major drawback. Simple Deep Neural Network: these problems can also occur, but they are not its major issue.
Bidirectional Recurrent Neural Network:
Working of Bidirectional Recurrent Neural Network
1. Inputting a sequence: A sequence of data points, each represented as a vector of the same dimensionality, is fed into the BRNN. Sequences may have different lengths.
2. Dual processing: The data is processed in both the forward and backward directions. In the forward direction, the hidden state at time step t is computed from the input at step t and the hidden state at step t-1. In the backward direction, the hidden state at step t is computed from the input at step t and the hidden state at step t+1 (a sketch follows after this list).
3. Computing the hidden state: A non-linear activation function on the weighted sum of the input
and previous hidden state is used to calculate the hidden state at each step. This creates a memory
mechanism that enables the network to remember data from earlier steps in the process.
4. Determining the output: The output at each step is computed by applying a non-linear activation function to the weighted sum of the hidden state and the output weights. This output can either be the final output or serve as input to another layer of the network.
5. Training: The network is trained through a supervised learning approach where the goal is to
minimize the discrepancy between the predicted output and the actual output. The network adjusts
its weights in the input-to-hidden and hidden-to-output connections during training through
backpropagation.
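A minimal NumPy sketch of the dual (forward and backward) pass described above; all names and sizes are illustrative assumptions:

import numpy as np

T, d_in, d_h = 6, 4, 8
xs = np.random.randn(T, d_in)

def run_direction(xs, W_x, W_h, b):
    # Simple recurrent pass over a sequence of inputs (one direction).
    h = np.zeros(d_h)
    states = []
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ h + b)
        states.append(h)
    return states

# Separate parameters for the forward and backward directions.
params_f = (np.random.randn(d_h, d_in) * 0.1, np.random.randn(d_h, d_h) * 0.1, np.zeros(d_h))
params_b = (np.random.randn(d_h, d_in) * 0.1, np.random.randn(d_h, d_h) * 0.1, np.zeros(d_h))

h_forward = run_direction(xs, *params_f)                 # left-to-right pass
h_backward = run_direction(xs[::-1], *params_b)[::-1]    # right-to-left pass, re-aligned in time

# At each step, the combined representation sees context from past and future.
combined = [np.concatenate([hf, hb]) for hf, hb in zip(h_forward, h_backward)]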
To calculate the output from an RNN unit, we use the following
formula:
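A common formulation (the symbols below are assumed, not taken from the slide): separate hidden states are computed for the forward and backward passes and then combined for the output:
$\overrightarrow{h}_t = \phi(W_x^{(f)} x_t + W_h^{(f)} \overrightarrow{h}_{t-1} + b^{(f)})$
$\overleftarrow{h}_t = \phi(W_x^{(b)} x_t + W_h^{(b)} \overleftarrow{h}_{t+1} + b^{(b)})$
$y_t = W_y [\overrightarrow{h}_t ; \overleftarrow{h}_t] + b_y$
where $\phi$ is the activation function and $[\cdot\,;\cdot]$ denotes concatenation of the two hidden states.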
The training of a BRNN is similar to the backpropagation through time (BPTT) algorithm. In outline, BPTT works as follows:
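• Unroll (roll out) the network across all time steps of the input sequence.
• Run the forward pass and compute the error at every time step from the predicted and actual outputs.
• Propagate the errors backward through the unrolled network, accumulating gradients for the shared weights (for a BRNN, through both the forward and the backward pass).
• Update the weights with the accumulated gradients and roll the network back up.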
• Advantages of Bidirectional RNN:
• Context from both past and future: With the ability to process sequential input both forward and
backward, BRNNs provide a thorough grasp of the full context of a sequence. Because of this, BRNNs are
effective at tasks like sentiment analysis and speech recognition.
• Enhanced accuracy: BRNNs frequently yield more precise answers since they take both historical and
upcoming data into account.
• Efficient handling of variable-length sequences: When compared to conventional RNNs, which require
padding to have a constant length, BRNNs are better equipped to handle variable-length sequences.
• Resilience to noise and irrelevant information: BRNNs may be resistant to noise and irrelevant data
that are present in the data. This is so because both the forward and backward paths offer useful
information that supports the predictions made by the network.
• Ability to handle sequential dependencies: BRNNs can capture long-term links between sequence
pieces, making them extremely adept at handling complicated sequential dependencies.
• Disadvantages of Bidirectional RNN:
• Computational complexity: Because they analyze data both forward and backward, BRNNs are computationally more expensive due to the increased number of calculations needed.
• Long training time: BRNNs can also take a while to train because there are many parameters to optimize,
especially when using huge datasets.
• Difficulty in parallelization: Due to the requirement for sequential processing in both the forward and
backward directions, BRNNs can be challenging to parallelize.
• Overfitting: BRNNs are prone to overfitting since their many parameters can produce overly complex models, especially when trained on small datasets.
• Interpretability: Due to the processing of data in both forward and backward directions, BRNNs can be
tricky to interpret since it can be difficult to comprehend what the model is doing and how it is producing
predictions.
LSTM
Long Short-Term Memory: LSTM
A traditional RNN has a single hidden state that is passed through time, which can make it difficult for the network to learn long-term dependencies. LSTMs address this problem by introducing a memory cell, a container that can hold information for an extended period.
LSTM Architecture:
• The input gate controls what information is added to the memory cell.
• The forget gate controls what information is removed from the memory cell.
• The output gate controls what information is output from the memory cell.
• The LSTM maintains a hidden state, which acts as the short-term memory of the network. The hidden state is updated based on the input, the previous hidden state, and the memory cell’s current state (a minimal sketch follows below).
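A minimal NumPy sketch of one LSTM step with the three gates described above; the names and sizes are assumptions, not the exact notation from the slides:

import numpy as np

d_in, d_h = 4, 8

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate plus the candidate; each sees [h_prev, x_t].
W_f, W_i, W_o, W_c = (np.random.randn(d_h, d_h + d_in) * 0.1 for _ in range(4))
b_f, b_i, b_o, b_c = (np.zeros(d_h) for _ in range(4))

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W_f @ z + b_f)         # forget gate: what to remove from the memory cell
    i = sigmoid(W_i @ z + b_i)         # input gate: what to add to the memory cell
    o = sigmoid(W_o @ z + b_o)         # output gate: what to expose as the hidden state
    c_tilde = np.tanh(W_c @ z + b_c)   # candidate cell content
    c = f * c_prev + i * c_tilde       # memory cell update
    h = o * np.tanh(c)                 # short-term (hidden) state
    return h, c

h, c = np.zeros(d_h), np.zeros(d_h)
for x_t in np.random.randn(5, d_in):
    h, c = lstm_step(x_t, h, c)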
Applications of LSTM
• Language Modeling: LSTMs have been used for natural language processing tasks
such as language modeling, machine translation, and text summarization. They can be
trained to generate coherent and grammatically correct sentences by learning the
dependencies between words in a sentence.
• Speech Recognition: LSTMs have been used for speech recognition tasks such as
transcribing speech to text and recognizing spoken commands. They can be trained to
recognize patterns in speech and match them to the corresponding text.
• Time Series Forecasting: LSTMs have been used for time series forecasting tasks
such as predicting stock prices, weather, and energy consumption. They can learn
patterns in time series data and use them to make predictions about future events.
• Anomaly Detection: LSTMs have been used for anomaly detection tasks such as
detecting fraud and network intrusion. They can be trained to identify patterns in data
that deviate from the norm and flag them as potential anomalies.
• Recommender Systems: LSTMs have been used for recommendation tasks such as
recommending movies, music, and books. They can learn patterns in user behavior and
use them to make personalized recommendations.
• Video Analysis: LSTMs have been used for video analysis tasks such as object detection, activity recognition, and action classification. They can be used in combination with other neural network architectures, such as Convolutional Neural Networks (CNNs).
LSTM vs RNN:
• Directionality: an LSTM (Long Short-Term Memory) can be trained to process sequential data in both forward and backward directions, while an RNN (Recurrent Neural Network) can only be trained to process sequential data in one direction.
Long Short-Term Memory (LSTM) is a powerful type of
recurrent neural network (RNN) that is well-suited for
handling sequential data with long-term dependencies. It
addresses the vanishing gradient problem, a common
limitation of RNNs, by introducing a gating mechanism that
controls the flow of information through the network. This
allows LSTMs to learn and retain information from the past,
making them effective for tasks like machine translation,
speech recognition, and natural language processing.
GRU
Gated Recurrent Unit Networks: GRU
Gated Recurrent Unit (GRU) is a type of recurrent neural network (RNN) that
was introduced by Cho et al. in 2014 as a simpler alternative to Long Short-
Term Memory (LSTM) networks. Like LSTM, GRU can process sequential data
such as text, speech, and time-series data.
The basic idea behind GRU is to use gating mechanisms to selectively update
the hidden state of the network at each time step. The gating mechanisms are
used to control the flow of information in and out of the network. The GRU has
two gating mechanisms, called the reset gate and the update gate.
The reset gate determines how much of the previous hidden state should be
forgotten, while the update gate determines how much of the new input
should be used to update the hidden state. The output of the GRU is calculated
based on the updated hidden state.
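A minimal NumPy sketch of one GRU step with the reset and update gates described above; the notation and sizes are assumptions:

import numpy as np

d_in, d_h = 4, 8

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W_r, W_z, W_h = (np.random.randn(d_h, d_h + d_in) * 0.1 for _ in range(3))
b_r, b_z, b_h = (np.zeros(d_h) for _ in range(3))

def gru_step(x_t, h_prev):
    z_in = np.concatenate([h_prev, x_t])
    r = sigmoid(W_r @ z_in + b_r)      # reset gate: how much of the past to forget
    z = sigmoid(W_z @ z_in + b_z)      # update gate: how much new information to use
    h_tilde = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]) + b_h)  # candidate state
    return (1 - z) * h_prev + z * h_tilde   # blend the old state with the candidate

h = np.zeros(d_h)
for x_t in np.random.randn(5, d_in):
    h = gru_step(x_t, h)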
To address the vanishing and exploding gradients problem often encountered when training a basic Recurrent Neural Network, many variations were developed. The most famous is the Long Short-Term Memory network (LSTM). A lesser-known but equally effective variation is the Gated Recurrent Unit network (GRU).