
Recurrent Neural Network
Limitations of CNN
• Limited ability to process sequential data (no memory of past inputs).
• High computational requirements.
• Need a large amount of labeled data.
• Large memory footprint.
• Interpretability challenges.
• Tend to be much slower; training takes a long time.
ANN VS RNN
RNN (Recurrent Neural Network)
An RNN is a neural network designed to work with sequential data — like text, time series, or
audio. What makes RNNs unique is that they can remember previous inputs using internal
memory called hidden states, which are carried forward through time.

Time Step (t)
In RNNs, input data is processed one element at a time. Each element corresponds to a time step.

For example, the sentence "I love cats" is processed over 3 time steps:

t = 1 → "I"
t = 2 → "love"
t = 3 → "cats"

At each time step, the RNN updates its internal state and produces an output.
RNN
• Hidden State (hₜ): The hidden state is the memory of the network. It is a vector that captures what the model has learned from the sequence up to time t.
• hₜ = tanh(W * xₜ + U * hₜ₋₁ + b)
where xₜ is the current input,
hₜ₋₁ is the hidden state from the previous time step,
W and U are weight matrices, and
b is a bias vector.
Purpose: hₜ helps the model retain important information from the past. A small NumPy sketch of this update follows.
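A minimal sketch of the update; the dimensions (5 input features, 3 hidden units) are illustrative assumptions rather than values from the slides:

```python
import numpy as np

def rnn_step(x_t, h_prev, W, U, b):
    """One recurrent update: combine the current input with the previous memory."""
    return np.tanh(W @ x_t + U @ h_prev + b)

# Illustrative dimensions (assumed): 5 input features, 3 hidden units.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))   # input-to-hidden weights
U = rng.normal(size=(3, 3))   # hidden-to-hidden weights
b = np.zeros(3)               # bias vector

h_prev = np.zeros(3)          # initial memory
x_t = rng.normal(size=5)      # current input vector
h_t = rnn_step(x_t, h_prev, W, U, b)
print(h_t.shape)              # (3,)
```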
RNN
• tanh Activation Function: tanh (hyperbolic tangent) is a commonly used activation function in RNNs. It squashes values between -1 and 1 and adds non-linearity.
• hₜ = tanh(W * xₜ + U * hₜ₋₁ + b)
• Weights (W, U, V): RNNs use three different weight matrices:
• W → input-to-hidden weights
• U → hidden-to-hidden weights (how previous memory affects the current state)
• V → hidden-to-output weights (used for predictions)
• All these weights are shared across time steps, meaning the same transformation is applied at every time step.
RNN steps
• Let's say we are reading the sentence "I love AI".
• At each time step t, the RNN:
• Reads a word (converted into a vector)
• Combines it with memory from the previous step (hₜ₋₁)
• Applies the weights and tanh to get a new hidden state (hₜ)
• Uses hₜ to make a prediction or to continue processing the sequence (sketched in the loop below)
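A sketch of this loop in NumPy. The word vectors are random placeholders, since the slides do not specify an embedding, and the dimensions (5 input features, 3 hidden units, 1 output) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
words = "I love AI".split()

# Placeholder word vectors; in practice these would come from an embedding layer.
vectors = {w: rng.normal(size=5) for w in words}

W = rng.normal(size=(3, 5))   # input-to-hidden weights (shared across time steps)
U = rng.normal(size=(3, 3))   # hidden-to-hidden weights (shared)
V = rng.normal(size=(1, 3))   # hidden-to-output weights (shared)
b = np.zeros(3)

h = np.zeros(3)               # initial memory
for t, word in enumerate(words, start=1):
    x_t = vectors[word]
    h = np.tanh(W @ x_t + U @ h + b)   # new hidden state hₜ
    y_t = V @ h                        # prediction from the current memory
    print(f"t={t} ({word}): y_t={y_t}")
```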
RNN Steps

Step | What Happens | Why It Matters
1. Input | Convert words to vectors | So the network can understand them
2. Step 1 | Process first word + initial memory | Begin building context
3. Step 2 | Process next word + update memory | Add more context
4. Output | Generate prediction from memory | Produce output word
5. Loss | Compare prediction to real word | Know how wrong it was
6. BPTT | Adjust all weights through time | Learn better from sequences
7. Repeat | Train on many sequences | Improve general understanding
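Steps 4 to 7 are what Keras performs inside compile() and fit(). A minimal sketch with made-up toy data; the shapes follow the (timesteps, input features) convention used in the next slides:

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

# Toy data (made up): 8 sequences, 3 time steps, 5 input features per step.
X = np.random.rand(8, 3, 5).astype("float32")
y = np.random.randint(0, 2, size=(8,))

model = Sequential([
    Input(shape=(3, 5)),              # (timesteps, input features)
    SimpleRNN(3),                     # builds hidden states across time steps
    Dense(1, activation="sigmoid"),   # step 4: prediction from the final memory
])
model.compile(optimizer="adam", loss="binary_crossentropy")   # step 5: loss
model.fit(X, y, epochs=2, verbose=0)                          # steps 6-7: BPTT, repeat
```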
RNN Architecture
• An ANN works very well on fixed-size inputs.
• An RNN works with dynamic input sizes (e.g., movie reviews of different lengths).
• Unlike an ANN, an RNN works very well with sequential data whose order carries meaning.
• RNNs are a special case of ANNs: they have memory to store previous data.
• RNN input has shape (timesteps, input features). Example vocabulary of 5 words: (movie, was, good, bad, not).
RNN
The input of an RNN has shape (timesteps, input features).

Review | Sentiment
• movie was good | 1
• movie was bad | 0
• movie was not good | 0

With each word represented as a 5-dimensional (one-hot) vector over the vocabulary, the input shape is (3, 5) for the first two reviews and (4, 5) for the third.
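A sketch of how these shapes arise, assuming a simple one-hot encoding over the 5-word vocabulary (the slides give only the shapes, so the encoding choice is an assumption):

```python
import numpy as np

vocab = ["movie", "was", "good", "bad", "not"]
index = {w: i for i, w in enumerate(vocab)}

def one_hot_sequence(review):
    """Encode a review as a (timesteps, vocabulary_size) array."""
    words = review.split()
    seq = np.zeros((len(words), len(vocab)))
    for t, word in enumerate(words):
        seq[t, index[word]] = 1.0
    return seq

print(one_hot_sequence("movie was good").shape)      # (3, 5)
print(one_hot_sequence("movie was not good").shape)  # (4, 5)
```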
RNN
Review | Sentiment
• X1: movie (X11) was (X12) good (X13) | 1
• X2: movie (X21) was (X22) bad (X23) | 0
• X3: movie (X31) was (X32) not (X33) good (X34) | 0
Let's say the input weights are denoted by wi, the output weights by wo, the feedback-loop weights by wh, the bias for the hidden layer by bi, and the bias for the output layer by bo. The mathematical formulation for the above is then given by:
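The formula itself appears to have been an image in the original slides. A reconstruction consistent with the hidden-state update shown earlier (the sigmoid on the output is an assumption, chosen because the task is binary sentiment classification):

hₜ = tanh(wi · xₜ + wh · hₜ₋₁ + bi)
ŷ = sigmoid(wo · hₜ + bo), evaluated at the final time step of the review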
Backpropagation in RNN
Vanishing Gradient Problem in RNNs
• What is it?
• A training issue in RNNs where gradients become extremely small during backpropagation.
• Prevents the model from learning dependencies from earlier time steps.
• When does it happen?
• During Backpropagation Through Time (BPTT).
• Gradients are propagated backward across many time steps.
• Why does it happen?
• Gradients are repeatedly multiplied by weights during each step.
• If weights < 1, repeated multiplication makes gradients shrink exponentially (see the sketch after this list).
• This leads to very small updates for early layers.
• Impact on Learning:
• Long-term dependencies are not learned.
• The network “forgets” older inputs in a long sequence.
• Visual Example:
• Imagine a chain of signals fading like echoes — after a while, the signal (gradient) becomes too weak to matter.
• Affected Models:
• Mainly affects vanilla RNNs.
• LSTM and GRU were designed specifically to mitigate this issue.
• Where RNNs still work well:
• For short sequences or tasks that require recent context, like:
• Autocomplete
• Next-word prediction in short texts
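To make the "weights < 1" point concrete, a toy numeric sketch; the per-step factor 0.5 and the sequence length 50 are arbitrary choices for illustration:

```python
# During BPTT, the gradient reaching an early time step is (roughly) a product
# of per-step factors involving the recurrent weight and the tanh derivative.
recurrent_factor = 0.5   # arbitrary per-step factor with magnitude < 1
steps = 50               # arbitrary sequence length

gradient_signal = 1.0
for _ in range(steps):
    gradient_signal *= recurrent_factor

print(gradient_signal)   # about 8.9e-16: the signal from step 1 has effectively vanished
```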
RNN Applications
• Next-word prediction
• Image captioning [one-to-many RNN], music generation
• Google Translate
• Language detection
• Translation
• Question answering
RNN Architecture
RNN Types
• Sentiment analysis (many to one)
• Image captioning (one to many)
• POS tagging (many to many, fixed length)
• Google translation (many to many, variable length)
• Binary image classification (fixed input; an RNN is actually not required)
1. What is a Bidirectional RNN?
• A Bidirectional RNN processes the input sequence in both forward and backward directions.
• It has two RNNs: one reads the input from start to end, and the other from end to start.
• The outputs from both RNNs are combined, giving more context at each time step.
2. Why Use Bidirectional RNNs?
• The forward RNN captures past context; the backward RNN captures future context.
• Useful in tasks where understanding both previous and next words matters:
  - Named Entity Recognition (NER)
  - Part-of-Speech (POS) Tagging
  - Machine Translation
  - Speech Recognition
3. Architecture Overview
• Input sequence: x₁, x₂, ..., xₙ
• Forward RNN: h₁→, h₂→, ..., hₙ→
• Backward RNN: h₁←, h₂←, ..., hₙ←
• Output at each time step: yₜ = f(hₜ→, hₜ←)
• Commonly f is concatenation or summation.
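A small NumPy sketch of the combination step, assuming f is concatenation (the hidden-state values are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(2)
n_steps, hidden = 4, 3

# Placeholder hidden states from the two directions (shape: time steps x hidden units).
h_forward = rng.normal(size=(n_steps, hidden))    # h₁→ ... hₙ→
h_backward = rng.normal(size=(n_steps, hidden))   # h₁← ... hₙ←

# f = concatenation: every time step sees both past and future context.
y = np.concatenate([h_forward, h_backward], axis=-1)
print(y.shape)   # (4, 6): twice the hidden size per time step
```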
Limitations of RNN
• During backpropagation, gradients can become too small, leading to the vanishing gradient problem, or too large, resulting in the exploding gradient problem, as they propagate backward through time.
• In the case of vanishing gradients, the gradient may become so small that the network struggles to capture long-term dependencies effectively.
• The network can still converge during training, but it may take a very long time.
• In contrast, in the exploding gradient problem, large gradients can lead to numerical instability during training, causing the model to deviate from the optimal solution and making it difficult for the network to converge to a good minimum.
5. Summary
• Bidirectional RNNs improve context understanding by reading sequences in both directions.
• Especially powerful in NLP tasks.
• Easy to implement in Keras using the Bidirectional wrapper.
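The parameter summary and weight array below appear to come from a plain SimpleRNN model rather than the bidirectional one. The exact code is not shown in the slides, so the following is a reconstruction consistent with those numbers: 3 recurrent units over the 5-word vocabulary plus a single sigmoid output.

```python
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

# (5*3 + 3*3 + 3) + (3 + 1) = 31 trainable parameters, matching the summary below.
model = Sequential([
    Input(shape=(None, 5)),           # variable timesteps, 5 input features
    SimpleRNN(3, name="rnn"),
    Dense(1, activation="sigmoid"),
])
model.summary()
print(model.get_layer("rnn").get_weights()[0])   # the 5x3 input-to-hidden kernel shown below
```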
Total params: 31 (124.00 B)
Trainable params: 31 (124.00 B)
Non-trainable params: 0 (0.00 B)

array([[-0.22803581, -0.43848372,  0.28444105],
       [ 0.10338819, -0.04307288,  0.64785296],
       [-0.50025153,  0.42375594,  0.5810246 ],
       [ 0.21662587,  0.45042878,  0.09036738],
       [ 0.20034915, -0.86127126, -0.59780455]], dtype=float32)
• The vanishing gradient problem is a major hurdle for RNNs, especially vanilla RNNs. It occurs when gradients (the signals used to update weights during training) become very small, or vanish, as they propagate backward through the network during BPTT. This makes it difficult for the network to learn long-term dependencies in sequences, as information from earlier time steps can fade away.
• In RNNs, information from previous time steps is used to influence the current output. This is achieved by feeding the hidden state of one time step back as input to the next time step. However, during backpropagation through these steps:
• Gradients are multiplied by weights at each step.
• If these weights are all less than 1 in absolute value, the product becomes progressively smaller as we go back in time. This shrinks the error gradients for earlier time steps, making it difficult for the network to learn long-term dependencies present in the sequence.
• RNNs struggle to learn from long sequences due to vanishing gradients. This makes them unsuitable for tasks like predicting future events based on long passages. However, RNNs excel at analyzing recent inputs, which is perfect for short-term predictions like suggesting the next word on a mobile keyboard.
