
What are Recurrent Neural Networks (RNN)

A recurrent neural network (RNN) is a type of artificial neural network (ANN) used in Apple's Siri and Google's voice search. An RNN remembers past inputs through an internal memory, which makes it useful for predicting stock prices, generating text, producing transcriptions, and performing machine translation.
In a traditional neural network, the inputs and outputs are independent of each other, whereas the output of an RNN depends on the prior elements of the sequence. Recurrent networks also share parameters across each layer of the network. Feedforward networks use different weights at each node, while an RNN reuses the same weights within each layer of the network; during gradient descent, these shared weights and biases are adjusted to reduce the loss.

RNN

The image above is a simple representation of a recurrent neural network. If we are forecasting stock prices using simple data [45, 56, 45, 49, 50, …], each input from X0 to Xt will contain a past value. For example, X0 will have 45, X1 will have 56, and these values are used to predict the next number in the sequence.
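As a rough sketch of this idea, the toy model below (untrained, with random weights, NumPy only) feeds the example prices through a single recurrent cell one value at a time; the hidden state h acts as the internal memory that carries information from X0 forward to Xt.

import numpy as np

np.random.seed(0)

prices = [45, 56, 45, 49, 50]                    # X0 ... Xt from the example above
hidden_size = 4

W_x = np.random.randn(hidden_size)               # input-to-hidden weights
W_h = np.random.randn(hidden_size, hidden_size)  # hidden-to-hidden weights
W_y = np.random.randn(hidden_size)               # hidden-to-output weights
b_h = np.zeros(hidden_size)

h = np.zeros(hidden_size)                        # the internal memory starts empty
for x in prices:
    x = x / 100.0                                # scale inputs so tanh does not saturate
    # the same W_x, W_h, and b_h are reused at every time step (parameter sharing)
    h = np.tanh(W_x * x + W_h @ h + b_h)

next_price = 100.0 * (W_y @ h)                   # (untrained) guess for the next value
print(next_price)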

How Recurrent Neural Networks Work


In an RNN, information cycles through a loop, so the output is determined by the current input together with previously received inputs.
The input layer X processes the initial input and passes it to the middle layer A. The middle layer can be thought of as multiple hidden layers, each with its own activation functions, weights, and biases. Because these parameters are shared across the hidden layers, the network does not create a separate layer for every time step; it creates one layer and loops over it.
Instead of standard backpropagation, recurrent neural networks use the backpropagation through time (BPTT) algorithm to determine the gradients. In backpropagation, the model adjusts its parameters by propagating errors from the output layer back to the input layer. BPTT additionally sums the error contributions from each time step, because the RNN shares its parameters across time steps. Learn more about RNNs and how they work in What are Recurrent Neural Networks?.
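The illustrative snippet below (PyTorch, with made-up numbers) unrolls an RNN over a short sequence and calls backward() once; because the same weights are used at every time step, the gradient BPTT computes for each shared weight is the sum of the contributions from all time steps.

import torch
import torch.nn as nn

torch.manual_seed(0)

rnn = nn.RNN(input_size=1, hidden_size=8, batch_first=True)
head = nn.Linear(8, 1)

# one sequence of 5 past values, shaped (batch, time, features)
x = torch.tensor([[[0.45], [0.56], [0.45], [0.49], [0.50]]])
target = torch.tensor([[0.52]])          # hypothetical value we want to predict next

out, h_n = rnn(x)                        # out holds the hidden state at every time step
pred = head(out[:, -1, :])               # read the prediction from the last time step

loss = nn.functional.mse_loss(pred, target)
loss.backward()                          # BPTT: gradients from all time steps
                                         # accumulate into the shared weights
print(rnn.weight_hh_l0.grad.shape)       # one gradient tensor per shared weight matrix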

Types of Recurrent Neural Networks


Feedforward networks map a single input to a single output, while recurrent neural networks are more flexible because the lengths of their inputs and outputs can vary. This flexibility allows RNNs to be used for music generation, sentiment classification, and machine translation.
There are four types of RNN, based on the lengths of their inputs and outputs; a short code sketch illustrating these patterns follows below.
●​ One-to-one is a simple neural network. It is commonly used for
machine learning problems that have a single input and output.
●​ One-to-many has a single input and multiple outputs. This is used
for generating image captions.
●​ Many-to-one takes a sequence of multiple inputs and predicts a
single output. It is popular in sentiment classification, where the input
is text and the output is a category.
●​ Many-to-many takes multiple inputs and outputs. The most common
application is machine translation.

Types of RNN
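The sketch below uses a single PyTorch nn.RNN layer with arbitrary, illustrative sizes to show how the four patterns differ only in which inputs are fed in and which hidden states are read out.

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)

# one-to-one needs no recurrence at all: it is an ordinary feedforward mapping.

# many-to-one (e.g. sentiment classification): read only the last time step
x_seq = torch.randn(1, 10, 8)            # a sequence of 10 inputs
out, h_n = rnn(x_seq)                    # out: (1, 10, 16), one hidden state per step
sentiment_logits = nn.Linear(16, 3)(out[:, -1, :])   # (1, 3)

# many-to-many (e.g. machine translation style tagging): map every step to an output
per_step_outputs = nn.Linear(16, 5)(out)             # (1, 10, 5)

# one-to-many (e.g. image captioning): feed one real input, then keep stepping
# the cell on placeholder (zero) inputs while carrying the hidden state forward
x_single = torch.randn(1, 1, 8)
out_t, h = rnn(x_single)
caption_steps = [out_t]
for _ in range(4):
    out_t, h = rnn(out_t.new_zeros(1, 1, 8), h)
    caption_steps.append(out_t)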

CNN vs. RNN


The convolutional neural network (CNN) is a feed-forward neural network capable of processing spatial data. It is commonly used for computer vision applications such as image classification. Simple neural networks are good at basic binary classification, but they cannot capture the pixel dependencies within images. The CNN model architecture consists of convolutional layers, ReLU layers, pooling layers, and fully connected output layers. You can learn CNNs by working on a project such as Convolutional Neural Networks in Python.

CNN Model Architecture
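As an illustration of that layer stack, the minimal PyTorch sketch below (arbitrary sizes, assuming 28x28 grayscale inputs and 10 classes) chains convolution, ReLU, pooling, and a fully connected output layer.

import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),          # 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),          # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),            # fully connected output layer (10 classes)
)

images = torch.randn(4, 1, 28, 28)        # a batch of 4 spatial inputs
print(cnn(images).shape)                  # torch.Size([4, 10])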

Key Differences Between CNN and RNN

●	CNN is suited to spatial data such as images, while RNN is suited to time series and sequential data.
●	During training, CNN uses standard backpropagation, while RNN uses backpropagation through time to calculate the gradients.
●	RNN places no restriction on the lengths of its inputs and outputs, whereas CNN takes fixed-size inputs and produces fixed-size outputs.
●	CNN is a feedforward network, while RNN uses loops to handle sequential data.
●	CNN is also used for video and image processing, while RNN is primarily used for speech and text analysis.

Limitations of RNN
Simple RNN models usually run into two major issues. Both are related to the gradient, which is the slope of the loss function.

1.	The vanishing gradient problem occurs when the gradient becomes so small that the parameter updates become insignificant; eventually the algorithm stops learning.
2.	The exploding gradient problem occurs when the gradient becomes too large, which makes the model unstable. In this case, large error gradients accumulate and the model weights grow too large. This issue can cause longer training times and poor model performance.
The simple solution to these issues is to reduce the number of hidden layers within
the neural network, which will reduce some complexity in RNNs. These issues can
also be solved by using advanced RNN architectures such as LSTM and GRU.
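The toy calculation below (not a real training run) shows the arithmetic behind both problems: backpropagating through T time steps multiplies the gradient by roughly the same recurrent factor T times, so a factor below 1 shrinks it towards zero and a factor above 1 blows it up.

# illustrative factors only; real networks multiply by per-step Jacobians
T = 50
grad_small = 1.0
grad_large = 1.0
for _ in range(T):
    grad_small *= 0.9      # recurrent factor < 1  -> gradient shrinks
    grad_large *= 1.1      # recurrent factor > 1  -> gradient grows

print(grad_small)          # ~0.005: updates become insignificant (vanishing)
print(grad_large)          # ~117:   updates become unstable (exploding)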

RNN Advanced Architectures


The repeating modules of a simple RNN have a basic structure with a single tanh layer. This simple structure suffers from short-term memory: it struggles to retain information from earlier time steps over long sequences. These problems can be addressed by long short-term memory (LSTM) and gated recurrent unit (GRU) architectures, which are capable of retaining information over long periods.

Simple RNN Cell

Long Short Term Memory (LSTM)

Long Short Term Memory (LSTM) is an advanced type of RNN designed to prevent both the vanishing and exploding gradient problems. Just like a simple RNN, an LSTM has repeating modules, but their structure is different. Instead of a single tanh layer, an LSTM has four interacting layers that communicate with each other. This four-layered structure helps the LSTM retain long-term memory, and it can be used for several sequential problems, including machine translation, speech synthesis, speech recognition, and handwriting recognition. You can gain hands-on experience with LSTMs by following the guide Python LSTM for Stock Predictions.
LSTM Cell
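As a minimal sketch (PyTorch, with illustrative sizes rather than a real dataset), the snippet below runs an LSTM over a batch of sequences and predicts the next value from the last time step; note that the LSTM returns both a hidden state and a cell state.

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=1, hidden_size=32, batch_first=True)
head = nn.Linear(32, 1)

x = torch.randn(8, 60, 1)          # 8 sequences of 60 past values each
out, (h_n, c_n) = lstm(x)          # the LSTM keeps a hidden state and a cell state
pred = head(out[:, -1, :])         # predict the next value from the last step
print(pred.shape)                  # torch.Size([8, 1])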

Gated Recurrent Unit (GRU)

The gated recurrent unit (GRU) is a variation of the LSTM; the two share design similarities and in some cases produce similar results. A GRU uses an update gate and a reset gate to address the vanishing gradient problem. These gates decide what information is important and pass it on to the output. The gates can be trained to retain information from many steps back without it vanishing over time, and to discard information that is irrelevant.

Unlike the LSTM, the GRU does not have a cell state Ct. It only has a hidden state ht, and due to its simpler architecture, a GRU has a lower training time than an LSTM. The GRU architecture is easy to understand: it takes the input xt and the hidden state from the previous time step ht-1 and outputs the new hidden state ht. You can get in-depth knowledge about GRUs at Understanding GRU Networks.
GRU Cell
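A matching sketch for the GRU is shown below; the only structural difference from the LSTM example above is that the GRU returns a single hidden state and no cell state.

import torch
import torch.nn as nn

gru = nn.GRU(input_size=1, hidden_size=32, batch_first=True)
head = nn.Linear(32, 1)

x = torch.randn(8, 60, 1)          # same illustrative input as the LSTM sketch
out, h_n = gru(x)                  # only a hidden state here, no cell state
pred = head(out[:, -1, :])
print(pred.shape)                  # torch.Size([8, 1])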
