Recurrent Neural Network
Limitations of CNN
• Limited ability to process sequential data (no memory of past inputs).
• High computational requirements.
• Need a large amount of labeled data.
• Large memory footprint.
• Interpretability challenges.
• Tend to be much slower, and training takes a long time.
ANN vs RNN
RNN (Recurrent Neural Network)
An RNN is a neural network designed to work with sequential data — like text, time series, or
audio. What makes RNNs unique is that they can remember previous inputs using internal
memory called hidden states, which are carried forward through time.
Time Step (t): In RNNs, input data is processed one element at a time; each element corresponds to a time step.
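To make the hidden state concrete, here is a minimal NumPy sketch of one recurrent step, h_t = tanh(W_xh·x_t + W_hh·h_{t-1} + b); the weight names and sizes are illustrative assumptions, not any particular library's API.

import numpy as np

# Illustrative sizes (assumed): 3-dimensional inputs, 4-dimensional hidden state
input_dim, hidden_dim = 3, 4
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(hidden_dim, input_dim))   # input-to-hidden weights
W_hh = rng.normal(size=(hidden_dim, hidden_dim))  # hidden-to-hidden (recurrent) weights
b = np.zeros(hidden_dim)

def rnn_step(x_t, h_prev):
    # One time step: mix the current input with the previous hidden state
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b)

# Process a toy sequence of 5 time steps, carrying the hidden state forward
h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):
    h = rnn_step(x_t, h)   # h now summarizes everything seen so far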
Examples: binary image classification works on fixed-length inputs and does not actually require an RNN; POS tagging, sentiment analysis, image captioning, and Google translation work on variable-length sequences, which is where RNNs are needed.
1. What is a Bidirectional RNN?
• A Bidirectional RNN processes the input sequence in both forward and backward directions.
• It has two RNNs: one reads the input from start to end, and the other from end to start.
• The outputs from both RNNs are combined, giving more context at each time step.
2. Why Use Bidirectional RNNs?
• The forward RNN captures past context; the backward RNN captures future context.
• Useful in tasks where understanding both previous and next words matters:
  - Named Entity Recognition (NER)
  - Part-of-Speech (POS) Tagging
  - Machine Translation
  - Speech Recognition
3. Architecture Overview
• Input sequence: x₁, x₂, ..., xₙ
• Forward RNN: h₁→, h₂→, ..., hₙ→
• Backward RNN: h₁←, h₂←, ..., hₙ←
• Output at each time step: yₜ = f(hₜ→, hₜ←)
• Commonly, f is concatenation or summation.
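A minimal NumPy sketch of this combination step, assuming simple tanh RNN cells and concatenation for f (the weight names, sizes, and random values are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(1)
input_dim, hidden_dim, T = 3, 4, 5
x = rng.normal(size=(T, input_dim))              # toy input sequence x_1 .. x_T

def run_rnn(seq, W_x, W_h, b):
    # Run a simple tanh RNN over a sequence and return all hidden states
    h = np.zeros(hidden_dim)
    states = []
    for x_t in seq:
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)

# Separate (untrained, randomly initialized) parameters for each direction
params_fwd = (rng.normal(size=(hidden_dim, input_dim)), rng.normal(size=(hidden_dim, hidden_dim)), np.zeros(hidden_dim))
params_bwd = (rng.normal(size=(hidden_dim, input_dim)), rng.normal(size=(hidden_dim, hidden_dim)), np.zeros(hidden_dim))

h_forward = run_rnn(x, *params_fwd)               # h_1→, ..., h_T→
h_backward = run_rnn(x[::-1], *params_bwd)[::-1]  # h_1←, ..., h_T← (re-aligned to time order)

# f = concatenation: each y_t has 2 * hidden_dim features
y = np.concatenate([h_forward, h_backward], axis=1)
print(y.shape)   # (5, 8)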
Limitations of RNN
• During backpropagation, gradients can become too small, leading to the
vanishing gradient problem, or too large, resulting in the exploding gradient
problem as they propagate backward through time.
• In the case of vanishing gradients, the gradients become so small that the network struggles to capture long-term dependencies effectively.
• The network can still converge during training, but it may take a very long time.
• In contrast, with the exploding gradient problem, large gradients can cause numerical instability during training, making the model deviate from the optimal solution and making it difficult for the network to converge to a good minimum.
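Exploding gradients are commonly mitigated by gradient clipping (vanishing gradients usually call for gated cells such as LSTM or GRU instead). A minimal sketch of clipping by global norm; the threshold of 1.0 is an arbitrary assumption:

import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    # Rescale all gradients if their combined L2 norm exceeds max_norm
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        grads = [g * (max_norm / total_norm) for g in grads]
    return grads

# Toy example: an oversized gradient is scaled down before the weight update
clipped = clip_by_global_norm([np.array([10.0, -20.0]), np.array([0.5])], max_norm=1.0)

In Keras, the same idea is exposed through the optimizer's clipnorm and clipvalue arguments.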
5. Summary
• Bidirectional RNNs improve context understanding by reading sequences in both directions.
• Especially powerful in NLP tasks.
• Easy to implement in Keras using the Bidirectional wrapper.
Example model summary output (Keras):
Total params: 31 (124.00 B)
Trainable params: 31 (124.00 B)
Non-trainable params: 0 (0.00 B)
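The parameter counts above come from the original slide's small demo model, whose exact layer sizes are not shown, so the sketch below uses assumed sizes and its summary will not necessarily reproduce those numbers. It only illustrates the Bidirectional wrapper mentioned in the summary:

from tensorflow import keras
from tensorflow.keras import layers

# Assumed toy configuration: sequences of 10 steps with 1 feature, 2 RNN units per direction
model = keras.Sequential([
    keras.Input(shape=(10, 1)),
    layers.Bidirectional(layers.SimpleRNN(2)),   # forward + backward SimpleRNNs, outputs concatenated
    layers.Dense(1, activation="sigmoid"),       # e.g. a binary classification head
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()   # prints Total/Trainable/Non-trainable params for this assumed model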