
Recurrent Neural Network
Limitations of CNN
• Limited ability to process sequential data (no memory of past inputs).
• High computational requirements.
• Need a large amount of labeled data.
• Large memory footprint.
• Interpretability challenges.
• Tend to be much slower; training takes a long time.
ANN VS RNN
RNN (Recurrent Neural Network)
An RNN is a neural network designed to work with sequential data — like text, time series, or
audio. What makes RNNs unique is that they can remember previous inputs using internal
memory called hidden states, which are carried forward through time.

Time Step (t)
In RNNs, input data is processed one element at a time. Each element corresponds to a time step.

For example, the sentence "I love cats" is processed over 3 time steps:

t = 1 → "I"
t = 2 → "love"
t = 3 → "cats"

At each time step, the RNN updates its internal state and produces an output.
RNN
• Hidden State (hₜ): The hidden state is the memory of the network. It is a vector that captures what the model has learned from the sequence up to time t.
• hₜ = tanh(W * xₜ + U * hₜ₋₁ + b)
where xₜ is the current input,
hₜ₋₁ is the hidden state from the previous time step,
W and U are weight matrices, and
b is a bias vector.
Purpose: hₜ helps the model retain important information from the past. A small NumPy sketch of this update follows.
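A minimal sketch of the update; the dimensions (5 input features, 3 hidden units) are illustrative assumptions rather than values from the slides:

```python
import numpy as np

def rnn_step(x_t, h_prev, W, U, b):
    """One recurrent update: combine the current input with the previous memory."""
    return np.tanh(W @ x_t + U @ h_prev + b)

# Illustrative dimensions (assumed): 5 input features, 3 hidden units.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))   # input-to-hidden weights
U = rng.normal(size=(3, 3))   # hidden-to-hidden weights
b = np.zeros(3)               # bias vector

h_prev = np.zeros(3)          # initial memory
x_t = rng.normal(size=5)      # current input vector
h_t = rnn_step(x_t, h_prev, W, U, b)
print(h_t.shape)              # (3,)
```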
RNN
• tanh Activation Function: tanh (hyperbolic tangent) is a commonly used activation function in RNNs. It squashes values between -1 and 1 and adds non-linearity.
• hₜ = tanh(W * xₜ + U * hₜ₋₁ + b)
• Weights (W, U, V): RNNs use three different weight matrices:
• W → input-to-hidden weights
• U → hidden-to-hidden weights (how previous memory affects the current state)
• V → hidden-to-output weights (used for predictions)
• All these weights are shared across time steps, meaning the same transformation is applied at every time step.
RNN steps
• Let's say we are reading the sentence "I love AI".
• At each time step t, the RNN:
• Reads a word (converted into a vector)
• Combines it with memory from the previous step (hₜ₋₁)
• Applies the weights and tanh to get a new hidden state (hₜ)
• Uses hₜ to make a prediction or to continue processing the sequence (sketched in the loop below)
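A sketch of this loop in NumPy. The word vectors are random placeholders, since the slides do not specify an embedding, and the dimensions (5 input features, 3 hidden units, 1 output) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
words = "I love AI".split()

# Placeholder word vectors; in practice these would come from an embedding layer.
vectors = {w: rng.normal(size=5) for w in words}

W = rng.normal(size=(3, 5))   # input-to-hidden weights (shared across time steps)
U = rng.normal(size=(3, 3))   # hidden-to-hidden weights (shared)
V = rng.normal(size=(1, 3))   # hidden-to-output weights (shared)
b = np.zeros(3)

h = np.zeros(3)               # initial memory
for t, word in enumerate(words, start=1):
    x_t = vectors[word]
    h = np.tanh(W @ x_t + U @ h + b)   # new hidden state hₜ
    y_t = V @ h                        # prediction from the current memory
    print(f"t={t} ({word}): y_t={y_t}")
```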
RNN Steps

Step | What Happens | Why It Matters
1. Input | Convert words to vectors | So the network can understand them
2. Step 1 | Process first word + initial memory | Begin building context
3. Step 2 | Process next word + update memory | Add more context
4. Output | Generate prediction from memory | Produce output word
5. Loss | Compare prediction to real word | Know how wrong it was
6. BPTT | Adjust all weights through time | Learn better from sequences
7. Repeat | Train on many sequences | Improve general understanding
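Steps 4 to 7 are what Keras performs inside compile() and fit(). A minimal sketch with made-up toy data; the shapes follow the (timesteps, input features) convention used in the next slides:

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

# Toy data (made up): 8 sequences, 3 time steps, 5 input features per step.
X = np.random.rand(8, 3, 5).astype("float32")
y = np.random.randint(0, 2, size=(8,))

model = Sequential([
    Input(shape=(3, 5)),              # (timesteps, input features)
    SimpleRNN(3),                     # builds hidden states across time steps
    Dense(1, activation="sigmoid"),   # step 4: prediction from the final memory
])
model.compile(optimizer="adam", loss="binary_crossentropy")   # step 5: loss
model.fit(X, y, epochs=2, verbose=0)                          # steps 6-7: BPTT, repeat
```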
RNN Architecture
• An ANN works very well on fixed-size inputs.
• An RNN works with dynamic input sizes (e.g., movie reviews of different lengths).
• Unlike an ANN, an RNN works very well with sequential data whose order carries meaning.
• RNNs are a special case of ANNs: they have memory to store previous data.
• RNN input has shape (timesteps, input features). Example vocabulary of 5 words: (movie, was, good, bad, not).
RNN
The input of an RNN has shape (timesteps, input features).

Review | Sentiment
• movie was good | 1
• movie was bad | 0
• movie was not good | 0

With each word represented as a 5-dimensional (one-hot) vector over the vocabulary, the input shape is (3, 5) for the first two reviews and (4, 5) for the third.
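A sketch of how these shapes arise, assuming a simple one-hot encoding over the 5-word vocabulary (the slides give only the shapes, so the encoding choice is an assumption):

```python
import numpy as np

vocab = ["movie", "was", "good", "bad", "not"]
index = {w: i for i, w in enumerate(vocab)}

def one_hot_sequence(review):
    """Encode a review as a (timesteps, vocabulary_size) array."""
    words = review.split()
    seq = np.zeros((len(words), len(vocab)))
    for t, word in enumerate(words):
        seq[t, index[word]] = 1.0
    return seq

print(one_hot_sequence("movie was good").shape)      # (3, 5)
print(one_hot_sequence("movie was not good").shape)  # (4, 5)
```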
RNN
Review | Sentiment
• X1: movie (X11) was (X12) good (X13) | 1
• X2: movie (X21) was (X22) bad (X23) | 0
• X3: movie (X31) was (X32) not (X33) good (X34) | 0
Let's say the input weights are denoted by wi, the output weights by wo, the feedback-loop weights by wh, the bias for the hidden layer by bi, and the bias for the output layer by bo. The mathematical formulation for the above is then given by:
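The formula itself appears to have been an image in the original slides. A reconstruction consistent with the hidden-state update shown earlier (the sigmoid on the output is an assumption, chosen because the task is binary sentiment classification):

hₜ = tanh(wi · xₜ + wh · hₜ₋₁ + bi)
ŷ = sigmoid(wo · hₜ + bo), evaluated at the final time step of the review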
Backpropagation in RNN
Vanishing Gradient Problem in RNNs
• What is it?
• A training issue in RNNs where gradients become extremely small during backpropagation.
• Prevents the model from learning dependencies from earlier time steps.
• When does it happen?
• During Backpropagation Through Time (BPTT).
• Gradients are propagated backward across many time steps.
• Why does it happen?
• Gradients are repeatedly multiplied by weights during each step.
• If weights < 1, repeated multiplication makes gradients shrink exponentially (see the sketch after this list).
• This leads to very small updates for early layers.
• Impact on Learning:
• Long-term dependencies are not learned.
• The network “forgets” older inputs in a long sequence.
• Visual Example:
• Imagine a chain of signals fading like echoes — after a while, the signal (gradient) becomes too weak to matter.
• Affected Models:
• Mainly affects vanilla RNNs.
• LSTM and GRU were designed specifically to mitigate this issue.
• Where RNNs still work well:
• For short sequences or tasks that require recent context, like:
• Autocomplete
• Next-word prediction in short texts
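To make the "weights < 1" point concrete, a toy numeric sketch; the per-step factor 0.5 and the sequence length 50 are arbitrary choices for illustration:

```python
# During BPTT, the gradient reaching an early time step is (roughly) a product
# of per-step factors involving the recurrent weight and the tanh derivative.
recurrent_factor = 0.5   # arbitrary per-step factor with magnitude < 1
steps = 50               # arbitrary sequence length

gradient_signal = 1.0
for _ in range(steps):
    gradient_signal *= recurrent_factor

print(gradient_signal)   # about 8.9e-16: the signal from step 1 has effectively vanished
```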
RNN Applications
• Next-word prediction
• Image captioning [one-to-many RNN], music generation
• Google Translate
• Language detection
• Translation
• Question answering
RNN Architecture
RNN Types
• Sentiment analysis (many to one)
• Image captioning (one to many)
• POS tagging (many to many, fixed length)
• Google translation (many to many, variable length)
• Binary image classification (fixed input; an RNN is actually not required)
1. What is a Bidirectional RNN?
• A Bidirectional RNN processes the input sequence in both forward and backward directions.
• It has two RNNs: one reads the input from start to end, and the other from end to start.
• The outputs from both RNNs are combined, giving more context at each time step.
2. Why Use Bidirectional RNNs?
• The forward RNN captures past context; the backward RNN captures future context.
• Useful in tasks where understanding both previous and next words matters:
  - Named Entity Recognition (NER)
  - Part-of-Speech (POS) Tagging
  - Machine Translation
  - Speech Recognition
3. Architecture Overview
• Input sequence: x₁, x₂, ..., xₙ
• Forward RNN: h₁→, h₂→, ..., hₙ→
• Backward RNN: h₁←, h₂←, ..., hₙ←
• Output at each time step: yₜ = f(hₜ→, hₜ←)
• Commonly f is concatenation or summation.
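A small NumPy sketch of the combination step, assuming f is concatenation (the hidden-state values are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(2)
n_steps, hidden = 4, 3

# Placeholder hidden states from the two directions (shape: time steps x hidden units).
h_forward = rng.normal(size=(n_steps, hidden))    # h₁→ ... hₙ→
h_backward = rng.normal(size=(n_steps, hidden))   # h₁← ... hₙ←

# f = concatenation: every time step sees both past and future context.
y = np.concatenate([h_forward, h_backward], axis=-1)
print(y.shape)   # (4, 6): twice the hidden size per time step
```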
Limitations of RNN
• During backpropagation, gradients can become too small, leading to the vanishing gradient problem, or too large, resulting in the exploding gradient problem, as they propagate backward through time.
• In the case of vanishing gradients, the gradient may become so small that the network struggles to capture long-term dependencies effectively.
• The network can still converge during training, but it may take a very long time.
• In contrast, in the exploding gradient problem, large gradients can lead to numerical instability during training, causing the model to deviate from the optimal solution and making it difficult for the network to converge to a good minimum.
5. Summary
• Bidirectional RNNs improve context understanding by reading sequences in both directions.
• Especially powerful in NLP tasks.
• Easy to implement in Keras using the Bidirectional wrapper.
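The parameter summary and weight array below appear to come from a plain SimpleRNN model rather than the bidirectional one. The exact code is not shown in the slides, so the following is a reconstruction consistent with those numbers: 3 recurrent units over the 5-word vocabulary plus a single sigmoid output.

```python
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

# (5*3 + 3*3 + 3) + (3 + 1) = 31 trainable parameters, matching the summary below.
model = Sequential([
    Input(shape=(None, 5)),           # variable timesteps, 5 input features
    SimpleRNN(3, name="rnn"),
    Dense(1, activation="sigmoid"),
])
model.summary()
print(model.get_layer("rnn").get_weights()[0])   # the 5x3 input-to-hidden kernel shown below
```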
Total params: 31 (124.00 B)
Trainable params: 31 (124.00 B)
Non-trainable params: 0 (0.00 B)

array([[-0.22803581, -0.43848372,  0.28444105],
       [ 0.10338819, -0.04307288,  0.64785296],
       [-0.50025153,  0.42375594,  0.5810246 ],
       [ 0.21662587,  0.45042878,  0.09036738],
       [ 0.20034915, -0.86127126, -0.59780455]], dtype=float32)
• The vanishing gradient problem is a major hurdle for RNNs, especially vanilla RNNs. It occurs when gradients (the signals used to update weights during training) become very small, or vanish, as they propagate backward through the network during BPTT. This makes it difficult for the network to learn long-term dependencies in sequences, as information from earlier time steps can fade away.
• In RNNs, information from previous time steps is used to influence the current output. This is achieved by feeding the hidden state of one time step back as input to the next time step. However, during backpropagation through these steps:
• Gradients are multiplied by weights at each step.
• If these weights are all less than 1 in absolute value, the product becomes progressively smaller as we go back in time. This shrinks the error gradients for earlier time steps, making it difficult for the network to learn long-term dependencies present in the sequence.
• RNNs struggle to learn from long sequences due to vanishing gradients. This makes them unsuitable for tasks like predicting future events based on long passages. However, RNNs excel at analyzing recent inputs, which is perfect for short-term predictions like suggesting the next word on a mobile keyboard.
