Semester - DL
● Computational Complexity: Since BiRNNs process data in both directions, they require
more computational resources and time compared to unidirectional RNNs.
● Memory Usage: The need to store both forward and backward states increases memory
usage, which can be a concern for large datasets or long sequences.
● Not Suitable for Real-time Processing: BiRNNs are less effective in real-time
applications where the future context is not available at the time of prediction.
● Training Difficulty: Like all RNNs, BiRNNs can still suffer from vanishing or
exploding gradient problems, making training difficult on long sequences.
5. What are the applications of RNN?
Recurrent Neural Networks (RNNs) have a wide range of applications in tasks involving
sequential data:
● Speech Recognition
● Natural Language Processing (NLP)
● Time Series Prediction
● Video Processing
● Music Generation
● Handwriting Recognition
● Anomaly Detection
● Recurrent neural networks are a class of artificial neural networks commonly used for
sequential data processing. Unlike feedforward neural networks, which process data in a
single pass, RNNs process data across multiple time steps, making them well suited for
modeling and processing text, speech, and time series.
● This architecture enables RNNs to maintain a "memory" of previous inputs, making them
ideal for tasks where context or temporal dependencies are important, such as speech
recognition, language modeling, and time series prediction. At each time step, an RNN
takes the current input and combines it with the previous hidden state to produce the
output and the next hidden state.
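A minimal NumPy sketch of that single recurrent step, assuming a tanh activation; the weight names (W_xh, W_hh, W_hy) and dimensions are illustrative, not from any particular library:

```python
import numpy as np

# Illustrative sizes (assumptions, not from the notes)
input_size, hidden_size, output_size = 4, 8, 3

rng = np.random.default_rng(0)
W_xh = rng.standard_normal((hidden_size, input_size)) * 0.1   # input -> hidden
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1  # hidden -> hidden (recurrence)
W_hy = rng.standard_normal((output_size, hidden_size)) * 0.1  # hidden -> output
b_h = np.zeros(hidden_size)
b_y = np.zeros(output_size)

def rnn_step(x_t, h_prev):
    """Combine the current input with the previous hidden state."""
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)  # next hidden state (the "memory")
    y_t = W_hy @ h_t + b_y                            # output at this time step
    return y_t, h_t

# Run over a short sequence, carrying the hidden state forward
h = np.zeros(hidden_size)
for x_t in rng.standard_normal((5, input_size)):      # 5 time steps
    y, h = rnn_step(x_t, h)
```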
Drawbacks:
● Struggles with long-term dependencies.
● Training is computationally expensive.
● Suffers from vanishing and exploding gradients.
● Number of Layers: A deep RNN stacks multiple layers of recurrent units together, whereas a
single-layer RNN has only one layer of recurrent units.
● Application: Deep RNNs are suitable for complex tasks like machine translation and speech
recognition, whereas single-layer RNNs are suitable for simpler sequential tasks.
● RNN Unfolding
RNN unfolding, or “unrolling,” is the process of expanding the recurrent structure over time
steps. During unfolding, each step of the sequence is represented as a separate layer in a
series, illustrating how information flows across each time step. This unrolling enables
backpropagation through time (BPTT), a learning process where errors are propagated across
time steps to adjust the network’s weights, enhancing the RNN’s ability to learn dependencies
within sequential data.
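As a rough sketch, unfolding amounts to applying the same cell repeatedly in a loop over time steps; with an autograd framework such as PyTorch, calling backward() on a loss computed from this unrolled graph carries out BPTT. The sizes and names below are illustrative assumptions:

```python
import torch
import torch.nn as nn

seq_len, input_size, hidden_size = 6, 4, 8   # illustrative sizes
cell = nn.RNNCell(input_size, hidden_size)   # one shared recurrent cell
readout = nn.Linear(hidden_size, 1)

x = torch.randn(seq_len, input_size)         # one example sequence
target = torch.randn(seq_len, 1)

# Unfold: apply the same cell at every time step, building one computation graph
h = torch.zeros(1, hidden_size)
outputs = []
for t in range(seq_len):
    h = cell(x[t].unsqueeze(0), h)           # same weights reused at each step
    outputs.append(readout(h))

loss = nn.functional.mse_loss(torch.cat(outputs), target)
loss.backward()                              # errors propagate back across all time steps (BPTT)
```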
1. Input Layer:
The input layer receives sequential data, where each input at time step t, denoted x_t, can
be a vector representing a feature (e.g., a word in NLP or a value in a time series).
2. Hidden Layer:
● The hidden layer consists of recurrent units (like simple RNN cells, LSTM cells, or GRU
cells).
● The key characteristic of the RNN is the presence of loops in the hidden layer, which
allows the network to maintain a hidden state h_t.
● The activation function (e.g., tanh or ReLU) introduces non-linearity to the model.
3. Output Layer:
● The output layer produces the result based on the hidden state at each time step.
● For tasks like sequence prediction, the output can be a probability distribution over
possible next states or labels.
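A small sketch tying the three layers together for a classification-style output; the module, variable names, and sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class SimpleRNNClassifier(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)  # hidden layer with recurrence
        self.fc = nn.Linear(hidden_size, num_classes)                 # output layer

    def forward(self, x):                     # x: (batch, time, features) -- the input layer
        h_all, _ = self.rnn(x)                # hidden state h_t at every time step
        logits = self.fc(h_all)               # output at every time step
        return logits.softmax(dim=-1)         # probability distribution over possible labels

model = SimpleRNNClassifier(input_size=4, hidden_size=8, num_classes=3)
probs = model(torch.randn(2, 5, 4))           # 2 sequences, 5 time steps each
```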
Varieties of RNN:
● Vanilla RNN: The basic form of RNN with simple recurrent connections
● Long Short-Term Memory (LSTM): A specialized RNN designed to overcome the
vanishing gradient problem. It uses memory cells and gates (forget, input, output) to
maintain and control information over long sequences
● Gated Recurrent Unit (GRU): Similar to LSTM, but with a simplified architecture. GRUs
combine the forget and input gates into a single update gate
● Bidirectional RNN (BiRNN): Processes sequences in both forward and backward
directions, allowing the network to capture context from both past and future information,
enhancing performance in tasks like machine translation.
● Deep RNN: An architecture where multiple layers of RNNs are stacked together,
allowing for more complex representations and better performance on tasks requiring
deeper temporal context.
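These varieties map onto standard layer classes in, for example, PyTorch; the sizes below are illustrative assumptions:

```python
import torch.nn as nn

input_size, hidden_size = 4, 8   # illustrative sizes

vanilla = nn.RNN(input_size, hidden_size)                       # simple recurrent connections
lstm    = nn.LSTM(input_size, hidden_size)                      # memory cell + forget/input/output gates
gru     = nn.GRU(input_size, hidden_size)                       # simplified gating (update/reset gates)
birnn   = nn.LSTM(input_size, hidden_size, bidirectional=True)  # forward + backward passes over the sequence
deep    = nn.LSTM(input_size, hidden_size, num_layers=3)        # multiple stacked recurrent layers
```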
2. DEEP RNN
A deep RNN is simply an RNN with multiple hidden layers stacked on top of each other. This
stacking allows the network to learn more complex patterns and representations from the data.
Each layer in a deep RNN can capture different levels of abstraction, making it more powerful
than a single-layer RNN.
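A minimal sketch of the stacking idea, where the hidden-state sequence of one recurrent layer becomes the input sequence of the next (equivalent to passing num_layers=2 to a single nn.LSTM); all names and sizes are illustrative:

```python
import torch
import torch.nn as nn

layer1 = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)   # lower layer: reads raw features
layer2 = nn.LSTM(input_size=8, hidden_size=8, batch_first=True)   # upper layer: higher-level abstraction

x = torch.randn(2, 10, 4)        # (batch, time, features)
h1, _ = layer1(x)                # hidden states of layer 1 at every time step
h2, _ = layer2(h1)               # layer 2 treats layer 1's hidden states as its input sequence
```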
1. Encoder:
○ Processes the input sequence and compresses it into a fixed-length context
vector (also called a thought vector or latent representation).
○ Captures the information and semantics of the entire input sequence.
2. Decoder:
○ Takes the context vector as input and generates the output sequence step by
step.
○ Decodes the compressed information into the target sequence.
Workflow:
1. The encoder processes the input sequence and produces the context vector C.
2. The decoder uses C as input to generate the output sequence one token at a time.
3. The process continues until a predefined end-of-sequence token is generated.
How it Works:
1. Encoder:
○ The encoder is typically an RNN (or LSTM/GRU) that reads the input sequence
and compresses it into a fixed-size vector called the context vector or hidden
state. This vector represents the entire input sequence in a summarized form.
○ The encoder processes the input sequence one step at a time, updating its
hidden state based on the current input and previous hidden state.
2. Decoder:
○ The decoder is another RNN that generates the output sequence based on the
context vector received from the encoder. At each time step, the decoder predicts
the next element in the output sequence, taking the previous hidden state and its
previous output as input.
○ In a translation task, for instance, the encoder reads the source sentence (e.g.,
English), and the decoder generates the target sentence (e.g., French) one word
at a time.
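A compact sketch of this encoder-decoder pattern using GRUs, decoding greedily until an end-of-sequence token; the vocabulary size, token ids, and EOS index are illustrative assumptions:

```python
import torch
import torch.nn as nn

vocab_size, emb_size, hidden_size, EOS = 100, 16, 32, 1   # illustrative values

embed   = nn.Embedding(vocab_size, emb_size)
encoder = nn.GRU(emb_size, hidden_size, batch_first=True)
decoder = nn.GRU(emb_size, hidden_size, batch_first=True)
project = nn.Linear(hidden_size, vocab_size)

# Encoder: compress the source sequence into a fixed-size context vector
src = torch.randint(2, vocab_size, (1, 7))        # one source sentence of 7 tokens
_, context = encoder(embed(src))                  # context vector: (1, 1, hidden_size)

# Decoder: generate the target one token at a time, starting from the context vector
token, hidden, output = torch.tensor([[EOS]]), context, []
for _ in range(20):                               # hard cap on output length
    step_out, hidden = decoder(embed(token), hidden)
    token = project(step_out).argmax(dim=-1)      # most likely next token
    if token.item() == EOS:                       # stop at the end-of-sequence token
        break
    output.append(token.item())
```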
Key Components:
● Bottleneck Problem: The entire input sequence is compressed into a single fixed-size
vector (context vector). If the sequence is long or complex, this vector may not capture
all the relevant information, leading to performance degradation, especially for long
sequences.
● Attention Mechanism: At each step of the decoding process, the model computes a set of
attention weights that determine which parts of the input sequence (from the encoder) are
most relevant for generating the current output token.
● This allows the model to focus on specific parts of the input, dynamically adjusting the
attention as it generates the output sequence.
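A minimal sketch of dot-product attention over the encoder outputs at a single decoding step; the tensor shapes and names are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

hidden_size, src_len = 32, 7                             # illustrative sizes
encoder_outputs = torch.randn(1, src_len, hidden_size)   # one hidden state per source position
decoder_hidden  = torch.randn(1, hidden_size)            # current decoder state

# Attention weights: how relevant is each source position to this output step?
scores  = encoder_outputs @ decoder_hidden.unsqueeze(-1)  # (1, src_len, 1) dot-product scores
weights = F.softmax(scores.squeeze(-1), dim=-1)           # (1, src_len), sums to 1

# Context: weighted sum of encoder outputs, recomputed at every decoding step
context = (weights.unsqueeze(-1) * encoder_outputs).sum(dim=1)   # (1, hidden_size)
```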