
What is an Artificial Neural Network?

The term "Artificial Neural Network" is derived from biological neural networks, which form the structure of the human brain. Similar to the human brain, which has neurons interconnected with one another, artificial neural networks also have neurons that are interconnected to one another in various layers of the network. These neurons are known as nodes.

The given figure illustrates the typical diagram of a Biological Neural Network.

The typical Artificial Neural Network looks something like the given figure.

Dendrites from the Biological Neural Network represent inputs in Artificial Neural Networks, the cell nucleus represents Nodes, synapses represent Weights, and the Axon represents Output.

Relationship between the biological neural network and the artificial neural network:

| Biological Neural Network | Artificial Neural Network |
| --- | --- |
| Dendrites | Inputs |
| Cell nucleus | Nodes |
| Synapse | Weights |
| Axon | Output |

An Artificial Neural Network is an attempt, in the field of Artificial Intelligence, to mimic the network of neurons that makes up the human brain so that computers have an option to understand things and make decisions in a human-like manner. The artificial neural network is designed by programming computers to behave simply like interconnected brain cells.

There are on the order of 100 billion neurons in the human brain. Each neuron has somewhere in the range of 1,000 to 100,000 connection points. In the human brain, data is stored in a distributed manner, and we can extract more than one piece of this data from our memory in parallel when necessary. We can say that the human brain is made up of incredibly amazing parallel processors.

An Artificial Neural Network primarily consists of three layers:

Input Layer:

As the name suggests, it accepts inputs in several different formats provided by the programmer.

Hidden Layer:

The hidden layer sits between the input and output layers. It performs all the calculations needed to find hidden features and patterns.

Output Layer:

The input goes through a series of transformations using the hidden layer, which finally results in output that is conveyed using this layer.
The artificial neural network takes the inputs, computes the weighted sum of the inputs, and includes a bias. This computation is represented in the form of a transfer function.

The determined weighted total is passed as an input to an activation function to produce the output.

Activation functions choose whether a node should fire or not. Only the nodes that fire make it to the output layer. There are distinctive activation functions available that can be applied depending on the sort of task we are performing.
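To make this concrete, here is a minimal sketch of a single node: it computes the weighted sum of its inputs plus a bias (the transfer function) and passes it through an activation function. The inputs, weights, bias, and the simple threshold activation are made-up example values, not part of any particular library.

```python
import numpy as np

def step(z, threshold=0.0):
    """Simple threshold activation: fire (1) if the weighted sum exceeds the threshold."""
    return 1.0 if z > threshold else 0.0

# Illustrative values: three inputs, their weights, and a bias term.
x = np.array([0.5, 0.2, 0.8])         # inputs
weights = np.array([0.4, -0.6, 0.9])  # connection strengths
bias = 0.1                            # bias with an implicit input of 1

z = np.dot(weights, x) + bias         # weighted sum (transfer function)
output = step(z)                      # activation function decides whether the node fires
print(z, output)
```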

Advantages of Artificial Neural Networks (ANN)

Parallel processing capability:

The parallel structure of an artificial neural network allows it to perform more than one task simultaneously.

Storing data on the entire network:

Unlike in traditional programming, where data is stored in a database, the data here is stored on the whole network. The disappearance of a couple of pieces of data in one place doesn't prevent the network from working.

Capability to work with incomplete knowledge:

After training, an ANN may produce output even with incomplete data. The loss of performance here depends on the significance of the missing data.

Having a memory distribution:

For an ANN to be able to adapt, it is important to determine the examples and to train the network according to the desired output by demonstrating these examples to the network. The success of the network is directly proportional to the chosen instances, and if the event cannot be shown to the network in all its aspects, it can produce false output.

Having fault tolerance:

The corruption of one or more cells of an ANN does not prevent it from generating output, and this feature makes the network fault-tolerant.

Disadvantages of Artificial Neural Networks:

Assurance of proper network structure:

There is no particular guideline for determining the structure of artificial neural networks. The appropriate network structure is found through experience and trial and error.

Unrecognized behavior of the network:

This is the most significant issue with ANNs. When an ANN produces a solution, it does not provide insight concerning why and how. This decreases trust in the network.

Hardware dependence:

Artificial neural networks need processors with parallel processing power, as per their structure. The realization of the network is therefore hardware-dependent.

Difficulty of showing the problem to the network:

ANNs can only work with numerical data. Problems must be converted into numerical values before being introduced to the ANN. The presentation mechanism chosen here will directly impact the performance of the network, and it relies on the user's abilities.

How do artificial neural networks work?

An Artificial Neural Network can be best represented as a weighted directed graph, where the artificial neurons form the nodes. The associations between neuron outputs and neuron inputs can be viewed as directed edges with weights. The Artificial Neural Network receives the input signal from the external source in the form of a pattern or an image, represented as a vector. These inputs are then mathematically denoted by the notation x(n) for every n-th input.

Afterward, each input is multiplied by its corresponding weight (these weights are the details utilized by the artificial neural network to solve a specific problem). In general terms, these weights represent the strength of the interconnection between neurons inside the artificial neural network. All the weighted inputs are summed inside the computing unit.

If the weighted sum is equal to zero, a bias is added to make the output non-zero, or to otherwise scale up the system's response. The bias has an input of 1 and its own weight. Here the total of the weighted inputs can lie in the range from 0 to positive infinity. To keep the response within the limits of the desired value, a certain maximum value is benchmarked, and the total of the weighted inputs is passed through the activation function.

The activation function refers to the set of transfer functions used to achieve the desired output. There are different kinds of activation functions, but primarily either linear or non-linear sets of functions. Some of the commonly used activation functions are the binary, linear, and tan-hyperbolic sigmoidal activation functions. Let us take a look at each of them in detail:

Binary:

In the binary activation function, the output is either a one or a zero. To accomplish this, a threshold value is set up. If the net weighted input of the neuron is greater than the threshold, then the final output of the activation function is returned as one; otherwise the output is returned as zero.

Sigmoidal Hyperbolic:

The Sigmoidal Hyperbola function is generally seen as an "S"-shaped curve. Here the tan-hyperbolic function is used to approximate the output from the actual net input. The function is defined as:

F(x) = 1 / (1 + exp(-λx))

where λ is considered the steepness parameter.
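As a small illustration of the two activation functions just described, the sketch below implements a binary threshold and the sigmoidal function F(x) = 1 / (1 + exp(-λx)); the variable name `lam` for the steepness parameter is only a local naming choice for this example.

```python
import numpy as np

def binary_activation(net_input, threshold=0.0):
    """Return 1 if the net weighted input exceeds the threshold, else 0."""
    return np.where(net_input > threshold, 1.0, 0.0)

def sigmoid(x, lam=1.0):
    """Sigmoidal activation F(x) = 1 / (1 + exp(-lam * x)); lam is the steepness parameter."""
    return 1.0 / (1.0 + np.exp(-lam * x))

xs = np.linspace(-3, 3, 7)
print(binary_activation(xs))   # hard 0/1 outputs
print(sigmoid(xs, lam=2.0))    # smooth "S"-shaped outputs between 0 and 1
```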

Types of Artificial Neural Network:

There are various types of Artificial Neural Networks (ANN), modelled on the neurons and network functions of the human brain, and an artificial neural network performs its tasks in a similar manner. The majority of artificial neural networks have some similarities with their more complex biological counterpart and are very effective at their intended tasks, for example segmentation or classification.

Feedback ANN:

In this type of ANN, the output returns into the network to accomplish the best-evolved results internally. As per the University of Massachusetts Lowell Centre for Atmospheric Research, feedback networks feed information back into themselves and are well suited to solving optimization problems. Internal system error corrections utilize feedback ANNs.

Feed-Forward ANN:

A feed-forward network is a basic neural network comprising an input layer, an output layer, and at least one layer of neurons. Through assessment of its output by reviewing its input, the strength of the network can be noticed based on the group behaviour of the associated neurons, and the output is decided. The primary advantage of this network is that it figures out how to evaluate and recognize input patterns.

Types of neural network models explained

There are many different types of artificial neural networks, varying in complexity. They share the intended goal of mirroring the function of the human brain to solve complex problems or tasks. The structure of each type of artificial neural network in some way mirrors neurons and synapses. However, they differ in terms of complexity, use cases, and structure. Differences also include how artificial neurons are modelled within each type of artificial neural network, and the connections between each node. Other differences include how the data may flow through the artificial neural network, and the density of the nodes.

Five examples of the different types of artificial neural network include:

 Feedforward artificial neural networks

 Perceptron and Multilayer Perceptron neural networks

 Radial basis function artificial neural networks

 Recurrent neural networks

 Modular neural networks

Feedforward artificial neural networks

As the name suggests, a feedforward artificial neural network is one in which data moves in one direction between the input and output nodes. Data moves forward through layers of nodes and won’t cycle backwards through the same layers. Although there may be many different layers with many different nodes, the one-way movement of data makes feedforward neural networks relatively simple. Feedforward artificial neural network models are mainly used for simple classification problems. Such models perform beyond the scope of a traditional machine learning model, but don’t reach the level of abstraction found in a deep learning model.

Perceptron and Multilayer Perceptron neural networks

A perceptron is one of the earliest and simplest models of a neuron. A Perceptron model is a binary classifier, separating data into two different classifications. As a linear model it is one of the simplest examples of this type of artificial neural network.

Multilayer Perceptron artificial neural networks add complexity and density, with the capacity for many hidden layers between the input and output layer. Each individual node on a specific layer is connected to every node on the next layer. This means Multilayer Perceptron models are fully connected networks, and can be leveraged for deep learning.

They’re used for more complex problems and tasks such as complex classification or voice recognition. Because of the model’s depth and complexity, processing and model maintenance can be resource- and time-consuming.

Radial basis function artificial neural networks

Radial basis function neural networks usually have an input layer, a layer with radial basis function nodes with different parameters, and an output layer. The models can be used to perform classification, regression for time series, and to control systems. Radial basis functions calculate the absolute value (distance) between a centre point and a given point. In the case of classification, a radial basis function calculates the distance between an input and a learned classification. If the input is closest to a specific tag, it is classified as such.

A common use for radial basis function neural networks is in system control, such as systems that control power restoration after a power cut. The artificial neural network can understand the priority order for restoring power, prioritising repairs that affect the greatest number of people or core services.
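A minimal sketch of the distance-based idea described above: Gaussian radial basis functions are evaluated around a few centre points, and the input is assigned to the centre it activates most strongly. The centres, width, and class tags are made up for illustration, not learned values.

```python
import numpy as np

def rbf(x, centre, width=1.0):
    """Gaussian radial basis function of the distance between x and a centre point."""
    return np.exp(-np.linalg.norm(x - centre) ** 2 / (2 * width ** 2))

# Illustrative "learned" centres, one per class/tag.
centres = {"class_a": np.array([0.0, 0.0]), "class_b": np.array([3.0, 3.0])}

x = np.array([2.5, 2.8])
activations = {tag: rbf(x, c) for tag, c in centres.items()}
print(max(activations, key=activations.get))  # the input is classified by the nearest centre
```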

Recurrent neural networks

Recurrent neural networks are powerful tools when a model is designed to process sequential data. The model will move data forward and loop it backwards to previous steps in the artificial neural network to best achieve a task and improve predictions. The layers between the input and output layers are recurrent, in that relevant information is looped back and retained. Memory of outputs from a layer is looped back to the input, where it is held to improve the process for the next input. The flow of data is similar to feedforward artificial neural networks, but each node will retain information needed to improve each step. Because of this, models can better understand the context of an input and refine the prediction of an output. For example, a predictive text system may use memory of a previous word in a string of words to better predict the outcome of the next word. A recurrent artificial neural network would be better suited to understand the sentiment behind a whole sentence compared to more traditional machine learning models.

Recurrent neural networks are also used within sequence-to-sequence models, which are used for natural language processing. Two recurrent neural networks are used within these models, which consist of a simultaneous encoder and decoder. These models are used for reactive chatbots, translating language, or summarising documents.

Modular neural networks

A Modular artificial neural network consists of a series of networks or components that work together (though independently) to achieve a task. A complex task can therefore be broken down into smaller components. If applied to data processing or the computing process, the speed of the processing will be increased as smaller components can work in tandem.

Each component network performs a different subtask which, when combined, completes the overall task and output. This type of artificial neural network is beneficial as it can make complex processes more efficient, and can be applied to a range of environments.

Challenges of artificial neural network models

Although there is huge potential for leveraging artificial neural networks in machine learning, the approach comes with some challenges. Models are complex, and it can be difficult to explain the reasoning behind a decision in what is, in many cases, a black box operation. This makes the issue of explainability a significant challenge and consideration.

With all types of machine learning models, the accuracy of the final model depends heavily on the quantity and quality of training data available. A model built with an artificial neural network needs even more data and resources to train than a traditional machine learning model. This means millions of data points in contrast to the hundreds of thousands needed by a traditional machine learning model.

The most complex artificial neural networks are often referred to as deep neural networks, referencing the multi-layered network architecture. Deep learning models are usually trained using labelled training data, which is data with a defined input and output. This is known as supervised machine learning, unlike unsupervised machine learning which uses unlabelled, raw training data. The model will learn the features and patterns within the labelled training data, and learn to perform an intended task through the examples in the training data. Artificial neural networks need a huge amount of training data, more so than more traditional machine learning algorithms. This is in the realm of big data, so many millions of data points may be required.

Neural Network Architectures

Neural network architectures are the building blocks of deep learning models. They consist of interconnected nodes, called neurons, which are organized in layers. Each neuron receives inputs, computes mathematical operations, and produces outputs.

Main Components of Neural Network Architecture

Neural network architectures consist of several components that work together to process and learn from data. The main components of a neural network architecture are:

1. Input Layer: The input layer is the initial layer of the neural network and is responsible for receiving the input data. Each neuron in the input layer represents a feature or attribute of the input data.

2. Hidden Layers: Hidden layers are the intermediate layers between the input and output layers. They perform computations and transform the input data through a series of weighted connections. The number of hidden layers and the number of neurons in each layer can vary depending on the complexity of the task and the amount of data available.

3. Neurons (Nodes): Neurons, also known as nodes, are the individual computing units within a neural network. Each neuron receives input from the previous layer or directly from the input layer, performs a computation using weights and biases, and produces an output value using an activation function.

4. Weights and Biases: Weights and biases are parameters associated with the connections between neurons. The weights determine the strength or importance of the connections, while the biases introduce a constant that helps control the neuron's activation. These parameters are adjusted during the training process to optimize the network's performance.

5. Activation Functions: Activation functions are special mathematical formulas that add non-linear behaviour to the network and allow it to learn complex patterns. Common activation functions include the sigmoid function, the rectified linear unit (ReLU), and the hyperbolic tangent (tanh) function. Each neuron applies the activation function to the weighted sum of its inputs to produce the output. Each function behaves differently and has its own characteristics. They help the network process and transform the input information, making it more suitable for capturing the complexity of real-world data. Activation functions help neurons make decisions and capture intricate relationships in the data, making neural networks powerful tools for pattern recognition and accurate predictions.

6. Output Layer: The output layer is the final layer of the neural network and produces the network's predictions or outputs after processing the input data. The number of neurons in the output layer depends on the nature of the task. For binary classification tasks, where the goal is to determine whether something belongs to one of two categories (e.g., yes/no, true/false), the output layer typically consists of a single neuron. For multi-class classification tasks, where there are more than two categories to consider (e.g., classifying images into different objects), the output layer consists of multiple neurons.

7. Loss Function: The loss function measures the discrepancy between the network's predicted output and the true output. It quantifies the network's performance during training and serves as a guide for adjusting the weights and biases. For example, if the task involves predicting numerical values, like estimating the price of a house based on its features, the mean squared error loss function may be used. This function calculates the average of the squared differences between the network's predicted values and the true values. On the other hand, if the task involves classification, where the goal is to assign input data to different categories, a loss function called cross-entropy is often used. Cross-entropy measures the difference between the predicted probabilities assigned by the network and the true labels of the data. It helps the network understand how well it is classifying the input into the correct categories.

These components work together to process input data, propagate information through the network, and produce the desired output. The weights and biases are adjusted during the training process through optimization algorithms to minimize the loss function and improve the network's performance.

Learning largely involves adjustments to the synaptic connections that exist between the neurons.

Artificial Neural Networks (ANNs) are a type of machine learning model that are inspired by the structure and function of the human brain. They consist of layers of interconnected “neurons” that process and transmit information.

There are several different architectures for ANNs, each with their own strengths and weaknesses. Some of the most common architectures include:

Feedforward Neural Networks: This is the simplest type of ANN architecture, where the information flows in one direction from input to output. The layers are fully connected, meaning each neuron in a layer is connected to all the neurons in the next layer.

Recurrent Neural Networks (RNNs): These networks have a “memory” component, where information can flow in cycles through the network. This allows the network to process sequences of data, such as time series or speech.

Convolutional Neural Networks (CNNs): These networks are designed to process data with a grid-like topology, such as images. The layers consist of convolutional layers, which learn to detect specific features in the data, and pooling layers, which reduce the spatial dimensions of the data.

Autoencoders: These are neural networks that are used for unsupervised learning. They consist of an encoder that maps the input data to a lower-dimensional representation and a decoder that maps the representation back to the original data.

Generative Adversarial Networks (GANs): These are neural networks that are used for generative modeling. They consist of two parts: a generator that learns to generate new data samples, and a discriminator that learns to distinguish between real and generated data.

The model of an artificial neural network can be specified by three entities:

 Interconnections

 Activation functions

 Learning rules

Interconnections:

Interconnection can be defined as the way processing elements (neurons) in an ANN are connected to each other. Hence, the arrangement of these processing elements and the geometry of their interconnections are essential in an ANN.

These arrangements always have two layers that are common to all network architectures, the input layer and the output layer, where the input layer buffers the input signal and the output layer generates the output of the network. The third layer is the hidden layer, whose neurons are kept neither in the input layer nor in the output layer. These neurons are hidden from the people who are interfacing with the system and act as a black box to them. By increasing the number of hidden layers and neurons, the system’s computational and processing power can be increased, but the training of the system gets more complex at the same time.

There exist five basic types of neuron connection architecture:

1. Single-layer feed-forward network

2. Multilayer feed-forward network

3. Single node with its own feedback

4. Single-layer recurrent network

5. Multilayer recurrent network

1. Single-layer feed-forward network

In this type of network, we have only two layers, the input layer and the output layer, but the input layer does not count because no computation is performed in it. The output layer is formed when different weights are applied to the input nodes and the cumulative effect per node is taken. After this, the neurons of the output layer collectively compute the output signals.

2. Multilayer feed-forward network

This network also has a hidden layer that is internal to the network and has no direct contact with the external layer. The existence of one or more hidden layers makes the network computationally stronger. It is a feed-forward network because information flows from the input, through the intermediate computations, to determine the output Z. There are no feedback connections in which outputs of the model are fed back into itself.
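A minimal NumPy sketch of this multilayer feed-forward idea: information flows from the input layer through one hidden layer to an output Z, with no feedback connections. The layer sizes and random weights are illustrative, not trained values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer sizes: 3 inputs -> 4 hidden neurons -> 1 output (Z).
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

def forward(x):
    """One forward pass: input -> hidden layer -> output, with no feedback connections."""
    h = np.tanh(W1 @ x + b1)   # hidden layer, not directly visible from outside the network
    z = W2 @ h + b2            # output Z
    return z

print(forward(np.array([0.5, -0.2, 0.8])))
```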

3. Single node with its own feedback

Single Node with own Feedback

When outputs can be directed back as inputs to the same layer or preceding layer nodes, the result is a feedback network. Recurrent networks are feedback networks with closed loops. The figure above shows a single recurrent network having a single neuron with feedback to itself.

4. Single-layer recurrent network

This network is a single-layer network with a feedback connection, in which the processing element’s output can be directed back to itself, to another processing element, or both. A recurrent neural network is a class of artificial neural networks where connections between nodes form a directed graph along a sequence. This allows it to exhibit dynamic temporal behavior for a time sequence. Unlike feedforward neural networks, RNNs can use their internal state (memory) to process sequences of inputs.
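A small sketch of this recurrent idea: at every step the hidden state (the network's internal memory) is updated from the current input and the previous hidden state. The weight shapes and the tanh non-linearity are typical choices used here only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
input_size, hidden_size = 3, 5
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (the feedback loop)
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """The new hidden state depends on both the current input and the previous hidden state."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

h = np.zeros(hidden_size)
for x_t in rng.normal(size=(4, input_size)):  # a short made-up input sequence
    h = rnn_step(x_t, h)
print(h)
```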

5. Multilayer recurrent network

In this type of network, the processing element’s output can be directed to a processing element in the same layer and in the preceding layer, forming a multilayer recurrent network. These networks perform the same task for every element of a sequence, with the output being dependent on the previous computations. Inputs are not needed at each time step. The main feature of a Recurrent Neural Network is its hidden state, which captures some information about a sequence.

Error Correction Learning:

Error correction learning is a fundamental concept in machine learning where a system improves its performance by learning from mistakes. The idea is based on receiving feedback on errors made during predictions or decisions and using that information to adjust the model or system iteratively.

Key Techniques:

1. Loss Functions:

In supervised learning, models predict outcomes, and the difference between the predicted and actual values is called the "error" or "loss." The model uses a loss function to quantify this error.

 Mean Squared Error (MSE): Measures the average squared difference between the actual value and the predicted value, commonly used in regression tasks.

 Cross-Entropy Loss: Used in classification problems, it measures the difference between predicted probabilities and actual class labels. A smaller cross-entropy value indicates better predictions.

The primary goal is to minimize the loss, meaning the model's predictions are becoming closer to the actual values.
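A small sketch of the two loss functions just mentioned, computed on made-up predictions and targets; the helper names are chosen only for this example.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average squared difference between actual and predicted values."""
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy(y_true, p_pred, eps=1e-12):
    """Binary cross-entropy between true 0/1 labels and predicted probabilities."""
    p = np.clip(p_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

print(mse(np.array([3.0, 5.0]), np.array([2.5, 5.5])))                 # regression-style error
print(cross_entropy(np.array([1, 0, 1]), np.array([0.9, 0.2, 0.7])))   # classification-style error
```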

2. Gradient Descent & Backpropagation:

Gradient Descent is an optimization technique used to minimize the loss function. It works by computing the gradient (or slope) of the loss function with respect to each model parameter (weights and biases) and then adjusting those parameters in the direction that reduces the error.

Backpropagation is used specifically in neural networks to propagate the error backward from the output layer through the hidden layers to the input layer. Each layer’s weights are updated based on the calculated error, allowing the model to improve its accuracy by correcting its mistakes in a systematic way.

Key Steps:

 Compute the error (loss) based on current model predictions.

 Calculate the gradient of the loss function (i.e., how much each weight contributed to the error).

 Update the model weights to reduce the error.
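The three steps above can be illustrated with a tiny gradient-descent loop on a one-parameter linear model; the data, learning rate, and model are made up for this example and are not a general implementation.

```python
import numpy as np

# Made-up data generated by y = 2x, which the single weight w should learn.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x

w, eta = 0.0, 0.05  # initial weight and learning rate

for step in range(100):
    y_pred = w * x
    loss = np.mean((y - y_pred) ** 2)        # 1. compute the error (loss)
    grad = np.mean(-2 * x * (y - y_pred))    # 2. gradient of the loss w.r.t. the weight
    w -= eta * grad                          # 3. update the weight to reduce the error

print(w, loss)  # w approaches 2.0 as the loss shrinks
```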

3. Reinforcement Learning:

In reinforcement learning (RL), an agent learns by interacting with an environment. The agent receives rewards for correct actions and penalties for incorrect ones. Over time, the agent learns to maximize its cumulative reward by avoiding actions that lead to errors (negative rewards) and repeating actions that lead to rewards.

 Trial and Error: The agent uses trial and error to explore various actions and their outcomes.

 Learning from Mistakes: Errors (negative feedback) guide the agent toward better decision-making, correcting its policy over time.

4. Boosting Algorithms (AdaBoost):

Boosting algorithms are a form of ensemble learning that focus on correcting the mistakes of weaker classifiers by giving more weight to misclassified instances. AdaBoost, for example, iteratively adjusts the weight of misclassified samples, making the next classifier focus more on those errors. As a result, the combined model performs better.

 Each new classifier focuses more on the samples that the previous classifiers got wrong.

 The final prediction is made by combining the output of all classifiers, giving more weight to the ones that performed better on the harder examples.

Applications:

1. Neural Networks: Neural networks use backpropagation and gradient descent to correct errors, helping the model "learn" complex patterns over time.

2. Reinforcement Learning Systems: RL is used in robotics, game AI, and autonomous systems where the system continuously learns from the environment and corrects its mistakes.

3. Ensemble Learning: Algorithms like AdaBoost and XGBoost use error correction by refining weak classifiers' mistakes and combining them for better accuracy.

Conclusion:

Error correction learning is a critical concept in machine learning that allows systems to continuously improve by learning from their mistakes. By minimizing loss and adjusting to feedback, models can become highly accurate and effective in various tasks, from classification and regression to decision-making in complex environments.

Memory-based learning in neural networks involves using past information to make better predictions or decisions, particularly when the data has dependencies over time or within sequences. Instead of solely learning static patterns, these networks incorporate mechanisms to remember and recall relevant information from previous inputs or events. This ability to retain and manage memory is critical for tasks like language processing, time-series forecasting, and sequence-based decision-making.

Here are some key components and techniques used in memory-based learning:

### 1. **Recurrent Neural Networks (RNNs)**

RNNs were one of the earliest neural network architectures designed to handle sequential data. Unlike feedforward networks, RNNs have connections that form cycles, allowing them to maintain a hidden state or "memory" that captures information from previous inputs. When processing a sequence, RNNs update their hidden state at each step based on both the current input and the hidden state from the previous step. This allows them to retain contextual information over time. However, standard RNNs struggle with learning long-range dependencies because of issues like the vanishing gradient problem, where gradients during backpropagation become too small to update the network effectively.

### 2. **Long Short-Term Memory (LSTM)**

LSTMs were introduced to solve the vanishing gradient problem in traditional RNNs. They are designed to capture both short-term and long-term dependencies in data more effectively. LSTMs include a more sophisticated memory structure called the "cell state," which flows through the network and is modified by three gates: the input gate, the forget gate, and the output gate. The input gate controls how much new information should be added to the cell state, the forget gate determines how much information should be discarded, and the output gate decides how much information from the cell state should be passed on to the next step. This gating mechanism allows LSTMs to keep relevant information for a long time and forget irrelevant data, making them effective in learning long-term dependencies in sequential data.
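A compact NumPy sketch of one LSTM step with the three gates described above (input, forget, output) acting on a cell state; the weight shapes and initialization are illustrative, and biases are omitted to keep the sketch short.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
n_in, n_hid = 3, 4
# One weight matrix per gate plus the candidate cell update, acting on [x_t, h_prev].
Wf, Wi, Wo, Wc = (rng.normal(scale=0.1, size=(n_hid, n_in + n_hid)) for _ in range(4))

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([x_t, h_prev])
    f = sigmoid(Wf @ z)                    # forget gate: how much of the old cell state to discard
    i = sigmoid(Wi @ z)                    # input gate: how much new information to add
    o = sigmoid(Wo @ z)                    # output gate: how much of the cell state to expose
    c = f * c_prev + i * np.tanh(Wc @ z)   # updated cell state (long-term memory)
    h = o * np.tanh(c)                     # new hidden state passed to the next step
    return h, c

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):     # a short made-up input sequence
    h, c = lstm_step(x_t, h, c)
print(h)
```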
### 3. **Gated Recurrent Units (GRU)**

GRUs are a simpler variation of LSTMs that use fewer gates to control memory. They have two main gates: a reset gate and an update gate. The reset gate determines how much of the past information should be forgotten, while the update gate decides how much of the new input should be incorporated into the hidden state. GRUs are computationally faster than LSTMs and perform similarly on many tasks, making them a popular choice when simplicity and efficiency are important. Like LSTMs, GRUs are good at capturing long-term dependencies in data but are less computationally expensive to train.

### 4. **Attention Mechanism**

The attention mechanism is designed to overcome the limitations of RNNs and LSTMs when dealing with long sequences. Instead of processing sequences step-by-step and relying heavily on a fixed-length memory (as in LSTMs), the attention mechanism dynamically decides which parts of the input sequence to focus on when making predictions. It works by assigning different weights to different parts of the input sequence, allowing the network to attend to the most relevant parts while ignoring less important information. This is particularly useful in tasks like machine translation, where certain words in a sentence may be more important than others when translating to another language.
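A minimal sketch of the attention idea: each query assigns weights to every position of the input sequence via a softmax over similarity scores, then takes the weighted sum of the values. The matrix shapes are made up for illustration; this is the scaled dot-product form also used in transformers.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: weight the values V by how well queries match keys."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # similarity of each query to each key
    weights = softmax(scores, axis=-1)        # attention weights sum to 1 over the sequence
    return weights @ V, weights

rng = np.random.default_rng(3)
seq_len, d = 4, 8                             # 4 positions, dimension 8 (illustrative)
Q, K, V = (rng.normal(size=(seq_len, d)) for _ in range(3))
out, w = attention(Q, K, V)
print(w.round(2))  # each row shows which input positions a query attends to
```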

### 5. **Memory Networks (MemNN)**

Memory Networks explicitly integrate an external memory component that can be read from and written to, enabling the network to store and recall relevant facts during reasoning. In traditional neural networks, memory is implicit and short-term, tied to weights and hidden states, but in Memory Networks the memory is separate from the network’s computation, acting as a form of long-term storage. These networks are especially useful in tasks like question-answering, where the model needs to reference specific information (e.g., facts or passages of text) from a larger dataset. The memory module in these networks is typically organized as key-value pairs, where keys help retrieve relevant memories during a task.

### 6. **Neural Turing Machines (NTMs)**

Neural Turing Machines (NTMs) extend the idea of memory networks by combining a neural network controller (often an RNN) with a differentiable memory bank. NTMs are inspired by Turing machines, where the controller can perform read and write operations on memory, mimicking how a computer works with memory. NTMs are able to learn simple algorithms such as sorting or copying sequences, making them suitable for tasks that require complex memory manipulation. The external memory in NTMs is accessible by the controller through differentiable operations, allowing the entire system to be trained using gradient descent, similar to standard neural networks.

### 7. **Transformers**

Transformers represent a significant advancement in memory-based learning and have largely replaced RNNs and LSTMs in many sequential tasks. They rely entirely on attention mechanisms (specifically, self-attention) to model dependencies in data, without using any recurrent connections. In a transformer, each word (or element in the sequence) attends to every other word in the input sequence simultaneously, allowing the model to learn both local and global dependencies efficiently. This parallel processing capability makes transformers much faster to train than RNNs or LSTMs, especially on large datasets. Transformers are the backbone of many state-of-the-art models in natural language processing, such as BERT and GPT.

### 8. **Memory-Augmented Neural Networks (MANNs)**

Memory-Augmented Neural Networks (MANNs) expand on the concept of combining traditional neural networks with external memory. In MANNs, the neural network acts as a controller that interacts with an external memory system, which can store information over long periods. MANNs are particularly useful in tasks like few-shot learning, where the model needs to quickly adapt to new information with limited training data. By leveraging memory, MANNs can recall similar past examples and generalize better in these low-data scenarios.

Memory-based learning techniques are critical for applications where long-term dependencies, contextual understanding, or external knowledge retrieval are essential. These networks allow for more sophisticated reasoning, learning from sequences, and leveraging both short-term and long-term memory for improved predictions and decision-making.

Hebbian learning is one of the foundational learning rules in neural networks, often
summarized by the phrase "cells that fire together, wire together." It describes how the
strength of connections (synapses) between neurons can be adjusted based on the activity of
the neurons. Hebbian learning is biologically inspired and based on the principle that if two
neurons frequently activate together, the connection between them should be strengthened.

Key Concepts in Hebbian Learning:

1. Synaptic Weight Strengthening: When a neuron successfully activates another neuron, the connection between them becomes stronger. This corresponds to the idea that the correlation of neuron activities increases the synaptic weight.
2. Locality: The learning rule is local, meaning that only the interacting neurons and
their connections are involved in the weight update process.
3. Unsupervised Learning: Hebbian learning is generally considered an unsupervised
learning rule since it does not require labeled data or error signals to update weights.
The changes occur solely based on the activity levels of pre- and post-synaptic
neurons.

Mathematical Formula for Hebbian Learning

The basic formula for Hebbian learning in neural networks is:

$$\Delta w_{ij} = \eta \cdot x_i \cdot y_j$$

Where:
 $\Delta w_{ij}$ is the change in the weight between the $i$-th input neuron and the $j$-th output neuron.
 $\eta$ is the learning rate (a small positive constant).
 $x_i$ is the activation of the pre-synaptic neuron $i$ (input neuron).
 $y_j$ is the activation of the post-synaptic neuron $j$ (output neuron).

This rule states that the weight $w_{ij}$ will increase if both $x_i$ and $y_j$ are positive (i.e., if both neurons are firing together).
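A small sketch applying this Hebbian rule, Δw_ij = η · x_i · y_j, to a whole weight matrix; the random 0/1 activations and the learning rate are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n_in, n_out, eta = 3, 2, 0.1

W = np.zeros((n_out, n_in))                # weights w_ij from input neuron i to output neuron j

for _ in range(100):
    x = rng.integers(0, 2, size=n_in)      # pre-synaptic activations (0 or 1)
    y = rng.integers(0, 2, size=n_out)     # post-synaptic activations (0 or 1)
    W += eta * np.outer(y, x)              # delta_w_ij = eta * x_i * y_j: strengthen co-active pairs

print(W)  # weights grow fastest where inputs and outputs fired together most often
```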

Variants of Hebbian Learning

There are several variants and improvements to the basic Hebbian rule to make it more
suitable for different neural network architectures and tasks:

1. Oja's Rule: To prevent weights from growing indefinitely in Hebbian learning, Oja's rule introduces a normalization term:

$$\Delta w_{ij} = \eta \cdot y_j \cdot (x_i - y_j \cdot w_{ij})$$

This ensures that the weights stabilize over time, avoiding runaway weight growth.

2. Covariance Rule: A refinement of Hebbian learning that takes into account deviations from the mean activation level of neurons:

$$\Delta w_{ij} = \eta \cdot (x_i - \bar{x}) \cdot (y_j - \bar{y})$$

Here, $\bar{x}$ and $\bar{y}$ are the average activations of the pre- and post-synaptic neurons, respectively.

3. Anti-Hebbian Learning: In this rule, the connection strength between neurons is reduced when they are simultaneously active. This can be useful in cases where decorrelation between inputs is desirable:

$$\Delta w_{ij} = -\eta \cdot x_i \cdot y_j$$

Properties and Limitations

 Correlation-Based Learning: Hebbian learning is purely correlation-based. If two neurons are active at the same time, the weight between them increases, which can lead to stable patterns of activation (e.g., in associative memory tasks).
 Lack of Error Minimization: Unlike backpropagation in modern neural networks, Hebbian learning does not explicitly minimize an error function or loss. It only updates weights based on local activity patterns.
 Biological Plausibility: Hebbian learning closely models how neurons in the brain might adjust their connections based on repeated activation, making it biologically plausible. However, it is less powerful and less efficient for complex tasks compared to more advanced supervised learning methods like backpropagation.
Applications of Hebbian Learning

Hebbian learning is primarily used in:

 Associative Memory Networks: In Hopfield networks and other forms of associative memory, Hebbian learning helps create associations between patterns, allowing the network to retrieve entire patterns based on partial inputs.
 Self-Organizing Maps (SOMs): Hebbian learning rules play a role in the organization of neurons in self-organizing maps, where neurons adjust their connections based on the input data's similarity.
 Unsupervised Learning: Since Hebbian learning doesn't require labeled data, it is often used in unsupervised learning scenarios, where the network is trying to capture the underlying structure of the input data without explicit guidance.

In modern neural networks, Hebbian learning is not commonly used for training large-scale models like deep networks, but it remains an important concept in the study of biologically inspired and unsupervised learning algorithms.

Competitive learning in neural networks is an unsupervised learning process where neurons in the network compete to be activated. Only one neuron (or a small subset) in the network "wins" the competition and updates its weights, while the others do not. The key idea is that neurons learn to specialize, responding strongly to different input patterns.

Key Concepts in Competitive Learning

1. Competition Among Neurons: During training, neurons in a specific layer compete with each other based on their response to a given input. The neuron with the highest activation wins and adjusts its weights to be more like the input. Other neurons do not change their weights.
2. Winner-Takes-All Rule: In most competitive learning schemes, the neuron with the strongest response to an input is the only one allowed to update its weights. This is known as the winner-takes-all strategy. It ensures that different neurons learn to specialize on different regions of the input space.
3. Unsupervised Learning: Competitive learning is typically used in unsupervised learning, where there are no target outputs provided. The goal is for the network to discover patterns or clusters in the input data.
4. Weight Adjustment: The neuron that wins the competition updates its weights according to a learning rule. The most common rule is to make the weights more similar to the input, bringing the neuron's response closer to the pattern it "recognizes."

Weight update rule:

$$w_j = w_j + \eta (x - w_j)$$

Where:

o $w_j$ is the weight vector of the winning neuron $j$,
o $x$ is the input vector,
o $\eta$ is the learning rate.

This rule moves the weight vector of the winning neuron closer to the input vector.

Mathematical Framework of Competitive Learning

The competitive learning process can be described in three main steps:

1. Activation of Neurons: The network computes the activations of all neurons based on the input vector $x$. Typically, the activation is the dot product between the input and the neuron's weight vector $w_j$:

$$a_j = x \cdot w_j$$

Where:

o $a_j$ is the activation of neuron $j$,
o $x$ is the input vector,
o $w_j$ is the weight vector of neuron $j$.

2. Competition: The neuron with the highest activation is selected as the winner:

$$j^* = \arg\max_j (a_j)$$

The winning neuron $j^*$ is the one whose weight vector is closest to the input vector.

3. Weight Update: Only the winning neuron's weights are updated:

$$w_{j^*} = w_{j^*} + \eta (x - w_{j^*})$$

This adjustment moves the winning neuron's weights closer to the input, allowing it to specialize in recognizing similar inputs in the future.
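A minimal sketch of these three steps (activation, competition, weight update) for winner-takes-all competitive learning on made-up 2-D data; the number of neurons, the learning rate, and the cluster centres are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n_neurons, eta = 3, 0.1
W = rng.normal(size=(n_neurons, 2))           # one weight vector per competing neuron

# Made-up inputs drawn from three loose clusters.
data = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2)) for c in ([0, 0], [3, 0], [0, 3])])

for x in data:
    a = W @ x                                 # 1. activations: dot product with each weight vector
    winner = np.argmax(a)                     # 2. competition: the highest activation wins
    W[winner] += eta * (x - W[winner])        # 3. only the winner moves toward the input

print(W.round(2))  # each weight vector specializes toward one region of the input space
```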

Types of Competitive Learning

1. Basic Competitive Learning: This is the simple winner-takes-all approach, where only one neuron updates its weights based on each input.
2. Soft Competitive Learning: Instead of allowing only one neuron to win, soft competitive learning allows multiple neurons to update their weights. However, the neuron with the strongest activation updates its weights the most, and the others update theirs to a lesser extent.
3. Kohonen's Self-Organizing Maps (SOMs): SOMs are a form of competitive learning in which neurons are organized into a grid, and the winner neuron and its neighbors are updated. This creates a topological map where similar input patterns activate neighboring neurons, preserving the structure of the input space.
o Neighborhood Function: In SOMs, not just the winning neuron updates its weights, but also its neighbors, with the size of the update decreasing with distance from the winner.

Weight update rule for SOMs:

$$w_{ij}(t+1) = w_{ij}(t) + \eta(t) \cdot h_{ij}(t) \cdot [x(t) - w_{ij}(t)]$$

Where:

o $w_{ij}(t)$ is the weight of the neuron at position $(i, j)$,
o $\eta(t)$ is the learning rate at time $t$,
o $h_{ij}(t)$ is the neighborhood function, determining how much the neighboring neurons update their weights,
o $x(t)$ is the input vector at time $t$.

4. Learning Vector Quantization (LVQ): LVQ is a supervised extension of competitive learning. After the competition, the winning neuron updates its weights based on whether it was correctly or incorrectly activated for the given input class. If the winner's response was correct, it moves closer to the input vector; if incorrect, it moves away.

Applications of Competitive Learning

1. Clustering: Competitive learning can be used to discover clusters in the data, with
each neuron learning to represent one cluster. This is similar to k-means clustering,
where each neuron learns to recognize a centroid in the input space.
2. Dimensionality Reduction: Self-organizing maps (SOMs) can be used for
dimensionality reduction, where the high-dimensional input data is mapped to a
lower-dimensional grid while preserving the relationships between input patterns.
3. Feature Extraction: Neurons in competitive learning networks can learn to specialize
in recognizing different features of the input, making them useful for tasks like feature
extraction.
4. Data Compression: Neurons in a competitive learning network can serve as
codebook vectors for compressing data. Each input is represented by the winning
neuron’s weight vector, leading to lossy compression.

Advantages of Competitive Learning

 Specialization: Neurons specialize in recognizing different patterns, which is useful for tasks like clustering or feature extraction.
 Unsupervised Learning: It does not require labeled data, making it useful for discovering hidden patterns in data.
 Biological Plausibility: Competitive learning is inspired by biological neural systems, where neurons compete to represent different sensory inputs.

Limitations of Competitive Learning

 Sensitive to Initialization: Competitive learning can be sensitive to the initial values of the weight vectors. Poor initialization may lead to suboptimal solutions.
 Winner Takes All: The winner-takes-all approach may lead to some neurons never updating their weights, effectively becoming inactive or "dead" neurons.
 Lack of Flexibility: In basic competitive learning, only the winning neuron updates its weights, which may lead to slow learning if many neurons are involved in the competition.

Conclusion

Competitive learning is a powerful unsupervised learning mechanism used for clustering, feature extraction, and dimensionality reduction. By allowing neurons to compete and specialize, the network can learn distinct patterns in the data. While simple in its formulation, competitive learning has inspired more advanced techniques like self-organizing maps and learning vector quantization, expanding its applicability in machine learning tasks.
Basics of Reinforcement Learning
Reinforcement learning (RL) is a subfield of machine learning that
focuses on how an agent can learn to make independent decisions in
an environment in order to maximize the reward. It’s inspired by the
way animals learn via the trial and error method. Furthermore, RL aims
to create intelligent agents that can learn to achieve a goal by
maximizing the cumulative reward.
In RL, an agent applies some actions to an environment. Based on the
action applied, the environment rewards the agent. After getting the
reward, the agent moves to a different state and repeats this process.
Additionally, the reward can be positive as well as negative based on
the action taken by the agent.

The goal of the agent in reinforcement learning is to build an optimal
policy that maximizes the overall reward over time. This is typically done
using an iterative process. The agent interacts with the environment to
learn from experience and updates its policy to improve its decision-
making capability.
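A toy sketch of this agent-environment loop, using a made-up one-dimensional environment and a random policy; every name here is hypothetical and not a real library API.

```python
import random

def environment_step(state, action):
    """Toy environment: move left/right on a line; reaching position 5 gives a positive reward."""
    next_state = state + (1 if action == "right" else -1)
    if next_state == 5:
        return next_state, +10, True    # goal reached: positive reward, episode ends
    return next_state, -1, False        # small negative reward for every other step

state, total_reward = 0, 0
for _ in range(1000):                                       # cap the episode length
    action = random.choice(["left", "right"])               # agent picks an action (random policy here)
    state, reward, done = environment_step(state, action)   # environment responds with reward and next state
    total_reward += reward                                   # agent accumulates reward over time
    if done:
        break

print(total_reward)
```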

3. Credit Assignment Problem


The credit assignment problem (CAP) is a fundamental challenge in
reinforcement learning. It arises when an agent receives a reward for a
particular action, but the agent must determine which of its previous
actions led to the reward.
In reinforcement learning, an agent applies a set of actions in an
environment to maximize the overall reward. The agent updates its
policy based on feedback received from the environment. It typically
includes a scalar reward indicating the quality of the agent’s actions.
The credit assignment problem refers to the problem of measuring
the influence and impact of an action taken by an agent on future
rewards. The core aim is to guide the agents to take corrective actions
which can maximize the reward.
However, in many cases, the reward signal from the environment
doesn’t provide direct information about which specific actions the
agent should continue or avoid. This can make it difficult for the agent to
build an effective policy.
Additionally, there’re situations where the agent takes a sequence of
actions, and the reward signal is only received at the end of the
sequence. In these cases, the agent must determine which of its
previous actions positively contributed to the final reward.
It can be difficult because the final reward may be the result of a long
sequence of actions. Hence, the impact of any particular action on the
overall reward is difficult to discern.

4. Example
Let’s take a practical example to demonstrate the credit assignment
problem.
Suppose an agent is playing a game where it must navigate a maze to
reach the goal state. We place the agent in the top left corner of the
maze. Additionally, we set the goal state in the bottom right corner. The
agent can move up, down, left, right, or diagonally. However, it can’t
move through the states containing stones.
As the agent explores the maze, it receives a reward of +10 for reaching
the goal state. Additionally, if it hits a stone, we penalize the action by
providing a -10 reward. The goal of the agent is to learn from the
rewards and build an optimal policy that maximizes the gross reward
over time.
The credit assignment problem arises when the agent reaches the goal
after several steps. The agent receives a reward of +10 as soon as it
reaches the goal state. However, it’s not clear which actions are
responsible for the reward. For example, suppose the agent took a long
and winding path to reach the goal. Therefore, we need to determine
which actions should receive credit for the reward.
Additionally, it’s challenging to decide whether to credit the last action
that took it to the goal or credit all the actions that led up to the goal.
Let’s look at some paths which lead the agent to the goal state:

As we can see here, the agent can reach the goal state with three
different paths. Hence, it’s challenging to measure the influence of
each action. We can see the best path to reach the goal state is path 1.
Hence, the positive impact of the agent moving from state 1 to state 5
by applying the diagonal action is higher than any other action from
state 1. This is what we want to measure so that we can make optimal
policies like path 1 in this example.

5. Solutions
The credit assignment problem is a vital challenge in reinforcement
learning. Let’s talk about some popular approaches for solving the
credit assignment problem. Here we’ll present three popular
approaches: temporal difference (TD) learning, Monte Carlo methods,
and eligibility traces method.
TD learning is a popular RL algorithm that uses a bootstrapping
approach to assign credit to past actions. It updates the value function
of the policy based on the difference between the predicted reward and
the actual reward received at each time step. By bootstrapping the
value function from the predicted rewards of future states, TD learning
can assign credit to past actions even when the reward is delayed.
Monte Carlo methods are a class of RL algorithms that use full episodes
of experience to assign credit to past actions. These methods estimate
the expected value of a state by averaging the rewards obtained in the
episodes that pass through that state. By averaging the rewards
obtained over several episodes, Monte Carlo methods can assign credit
to actions that led up to the reward, even if the reward is delayed.
Eligibility traces are a method for assigning credit to past actions based
on their recent history. Eligibility traces keep track of the recent history
of state-action pairs and use a decaying weight to assign credit to each
pair based on how recently it occurred. By decaying the weight of older
state-action pairs, eligibility traces can assign credit to actions that led
up to the reward, even if they occurred several steps earlier.
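A small sketch of the temporal-difference idea from the first approach above: state values are updated from the difference between the predicted value and the observed reward plus the bootstrapped value of the next state. The tiny chain environment, transition probabilities, and parameters are made up for illustration.

```python
import random

# Made-up chain of states 0..4; entering the last state pays a reward of +10.
n_states, alpha, gamma = 5, 0.1, 0.9
V = [0.0] * n_states                      # value estimates for each state

for episode in range(500):
    s = 0
    while s != n_states - 1:
        # Move right with probability 0.8, otherwise left (clipped to the chain ends).
        s_next = min(s + 1, n_states - 1) if random.random() < 0.8 else max(s - 1, 0)
        r = 10.0 if s_next == n_states - 1 else 0.0
        # TD(0) update: bootstrap from the predicted value of the next state.
        td_error = r + gamma * V[s_next] - V[s]
        V[s] += alpha * td_error
        s = s_next

print([round(v, 2) for v in V])  # values increase toward the rewarding end of the chain
```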

6. Conclusion
In this tutorial, we discussed the credit assignment problem in
reinforcement learning with an example. Finally, we presented three
popular solutions that can solve the credit assignment problem.

The credit assignment problem in neural networks refers to the challenge of determining
how to appropriately assign "credit" or "blame" to individual neurons for their contributions
to the overall network's error. In other words, when a neural network makes an error or
successfully achieves a task, how do we determine which parts of the network (which
neurons, layers, or weights) are responsible for that outcome? This is crucial for guiding
weight updates during the learning process.

Key Concepts

1. Global Error, Local Responsibility: In neural networks, the output is typically influenced by the collective activity of many neurons, especially in deep networks with multiple layers. The credit assignment problem arises when trying to understand how each neuron or weight contributed to the overall error, which is global to the network.
2. Backward Propagation of Error: In feedforward networks, this problem is tackled
by backpropagation, which computes the gradient of the error with respect to each
weight. Backpropagation solves the credit assignment problem by efficiently
propagating the error backward through the network to assign credit to each neuron.
3. Temporal Credit Assignment: In recurrent neural networks (RNNs), where the
output at one time step depends on inputs from previous time steps, the credit
assignment problem becomes more complex. Here, credit must be assigned not just
spatially (across the network's layers) but also temporally (across different time
steps). The challenge is to determine which past activations are responsible for the
network's error at the current time step.
4. Credit Assignment in Reinforcement Learning: In reinforcement learning, the
credit assignment problem is even more challenging because rewards are often
delayed. The agent receives feedback (reward or punishment) long after making a
series of actions, so it becomes difficult to determine which specific actions or
decisions contributed to the outcome. Techniques like temporal difference (TD)
learning and Q-learning are used to address this by spreading the credit across the
series of actions leading up to the reward.

Types of Credit Assignment Problems

1. Structural (Spatial) Credit Assignment:


o This refers to the problem of determining how to distribute the credit for an
outcome across different parts of the network, such as neurons or weights in
different layers. For instance, in a deep network, how do we assign credit to
neurons in hidden layers for an error in the output layer?
o Backpropagation is the primary method used in neural networks to handle
structural credit assignment by computing the gradient of the error with
respect to each parameter (weight).
2. Temporal Credit Assignment:
o This refers to determining how to assign credit to actions or neurons that
occurred at different points in time, particularly in recurrent neural networks
or systems with memory.
o Backpropagation Through Time (BPTT) is the technique used in RNNs to
address temporal credit assignment by unrolling the network across time and
propagating the error backward through the time steps.

Solutions to the Credit Assignment Problem

1. Backpropagation (for Feedforward Networks):


o In traditional feedforward networks, the backpropagation algorithm solves
the credit assignment problem by computing the gradient of the error with
respect to each weight. This allows for assigning the correct "credit" or
"blame" to each connection between neurons.
o During backpropagation, the error at the output layer is propagated backward
through the network, with each neuron receiving an update to its weights
proportional to its contribution to the error.

The weight update rule is:
\Delta w_{ij} = -\eta \frac{\partial E}{\partial w_{ij}}
Where:
o \eta is the learning rate,
o E is the total error of the network,
o w_{ij} is the weight connecting neuron i to neuron j.
(A minimal numerical sketch of this gradient-based update appears after this list.)
2. Backpropagation Through Time (BPTT) for RNNs:
o In recurrent neural networks (RNNs), the network is unrolled through time,
and the backpropagation algorithm is applied to each time step. This allows
the network to assign credit to previous time steps, solving the temporal
credit assignment problem.
o BPTT propagates the error from the final time step backward through the
network across previous time steps.
3. Reinforcement Learning (RL) Algorithms:
o In RL, credit assignment is especially challenging due to the delayed nature of
rewards. Several algorithms handle this:
 Temporal Difference (TD) Learning: TD learning adjusts the value
of a state-action pair based on the difference between the predicted
value of the current state and the observed reward plus the value of the
next state.
 Q-Learning: Q-learning is an off-policy TD control algorithm that
estimates the value of actions to maximize future rewards.
 Policy Gradient Methods: These methods adjust the policy directly
based on rewards obtained, assigning credit to actions taken.
4. Attention Mechanisms:
o In modern deep learning, particularly in sequence models, attention
mechanisms help address the credit assignment problem by allowing the
model to focus on relevant parts of the input when making a decision. This
gives the network a more interpretable way of assigning credit to specific parts
of the input sequence.
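
To make the backpropagation update in item 1 concrete, below is a minimal NumPy sketch of structural credit assignment in a tiny 2-3-1 network with sigmoid activations and squared error. The network size, activation choice, and all values are assumptions made only for this illustration.

```python
import numpy as np

# Toy structural credit assignment via backpropagation.
# Assumed setup: 2 inputs, 3 hidden sigmoid units, 1 sigmoid output, squared error.
rng = np.random.default_rng(0)
x = np.array([0.5, -0.2])           # single input example
t = np.array([1.0])                 # target output
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)
eta = 0.5                           # learning rate

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Forward pass
h = sigmoid(W1 @ x + b1)            # hidden activations
y = sigmoid(W2 @ h + b2)            # network output
E = 0.5 * np.sum((t - y) ** 2)      # total error of the network
print(float(E))

# Backward pass: propagate the error to assign credit to each weight
delta2 = (y - t) * y * (1 - y)              # dE/dz at the output layer
delta1 = (W2.T @ delta2) * h * (1 - h)      # dE/dz at the hidden layer

# Each weight receives an update proportional to its contribution: dw = -eta * dE/dw
W2 -= eta * np.outer(delta2, h)
b2 -= eta * delta2
W1 -= eta * np.outer(delta1, x)
b1 -= eta * delta1
```

The hidden-layer deltas show how the global output error is shared out ("credited") across neurons that are not directly connected to the output.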

Challenges in Credit Assignment

1. Vanishing and Exploding Gradients:
o In deep networks or RNNs, the gradients propagated during backpropagation
can either shrink to near-zero (vanishing gradients) or grow excessively large
(exploding gradients). This makes it difficult to properly assign credit to
earlier layers or time steps. Various techniques, such as gradient clipping, the
use of LSTMs (Long Short-Term Memory), and batch normalization, help
mitigate this issue.
2. Delayed Rewards:
o In reinforcement learning, delayed rewards make it challenging to correctly
assign credit to actions that contributed to the final reward. Methods like TD
learning and eligibility traces are used to distribute credit across time.

Conclusion

The credit assignment problem is central to learning in neural networks, as it determines how to update weights to improve the network's performance. In feedforward networks,
backpropagation efficiently handles this problem, but for more complex architectures like
recurrent neural networks or reinforcement learning scenarios, more advanced techniques like
BPTT, temporal difference learning, and attention mechanisms are needed to accurately
assign credit to neurons or actions. Addressing this problem effectively is key to optimizing
network performance and achieving generalization in neural network models.

What is an Activation Function?


An activation function in the context of neural networks is a
mathematical function applied to the output of a neuron. The purpose of
an activation function is to introduce non-linearity into the model,
allowing the network to learn and represent complex patterns in the
data. Without non-linearity, a neural network would essentially behave
like a linear regression model, regardless of the number of layers it has.
The activation function decides whether a neuron should be activated or
not by taking the weighted sum of its inputs, adding a bias, and applying
a non-linear transformation to the result.
Explanation: A neural network has neurons that work together with their
weights, biases, and respective activation functions. In a neural network,
we update the weights and biases of
the neurons on the basis of the error at the output. This process is known
as back-propagation. Activation functions make the back-propagation
possible since the gradients are supplied along with the error to update
the weights and biases.
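As a rough illustration of the computation described above, the following sketch shows a single neuron forming a weighted sum of its inputs, adding a bias, and passing the result through an activation function (a sigmoid is assumed here); all the numbers are made up.

```python
import numpy as np

# A single artificial neuron: weighted sum of inputs plus a bias,
# passed through an activation function. Values are purely illustrative.
x = np.array([0.8, 0.2, -0.5])     # input features
w = np.array([0.4, -0.1, 0.6])     # weights
b = 0.05                           # bias

z = np.dot(w, x) + b               # weighted sum plus bias
a = 1.0 / (1.0 + np.exp(-z))       # sigmoid activation gives the neuron's output

print(z, a)
```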
Elements of a Neural Network
Input Layer: This layer accepts input features. It provides information
from the outside world to the network; no computation is performed at
this layer, nodes here just pass on the information (features) to the hidden
layer.
Hidden Layer: Nodes of this layer are not exposed to the outer world,
they are part of the abstraction provided by any neural network. The
hidden layer performs all sorts of computation on the features entered
through the input layer and transfers the result to the output layer.
Output Layer: This layer brings the information learned by the network
to the outer world.
Why do we need Non-linear activation function?
A neural network without an activation function is essentially just a linear
regression model. The activation function does the non-linear
transformation to the input making it capable to learn and perform more
complex tasks.
Mathematical proof
Suppose we have a Neural net like this :-

Elements of the diagram are as follows:


Hidden layer i.e. layer 1:
z(1) = W(1)X + b(1)
a(1) = z(1)
Here,
 z(1) is the vectorized output of layer 1
 W(1) be the vectorized weights assigned to neurons of hidden
layer i.e. w1, w2, w3 and w4
 X be the vectorized input features i.e. i1 and i2
 b is the vectorized bias assigned to neurons in hidden layer i.e.
b1 and b2
 a(1) is the vectorized form of any linear function.
(Note: We are not considering activation function here)

Layer 2 i.e. output layer :-


Note : Input for layer 2 is output from layer 1
z(2) = W(2)a(1) + b(2)
a(2) = z(2)
Calculation at Output layer
z(2) = (W(2) * [W(1)X + b(1)]) + b(2)
z(2) = [W(2) * W(1)] * X + [W(2)*b(1) + b(2)]
Let,
[W(2) * W(1)] = W
[W(2)*b(1) + b(2)] = b
Final output : z(2) = W*X + b
which is again a linear function
This observation shows that the output is still a linear function of the
input even after adding a hidden layer. Hence we can conclude that no
matter how many hidden layers we attach to the neural net, all layers
behave the same way, because the composition of two linear functions is
itself a linear function. A neuron cannot learn with just a linear function
attached to it; a non-linear activation function lets it learn according to
the error. Hence we need a non-linear activation function. (A small
numerical check of this collapse is sketched below.)
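The collapse argued above can be checked numerically. The sketch below builds two purely linear layers with arbitrary random weights and verifies that they are equivalent to a single linear layer with W = W(2)W(1) and b = W(2)b(1) + b(2).

```python
import numpy as np

# Numerical check that stacking two purely linear layers equals one linear layer.
# Matrix sizes and random values are arbitrary choices for the demo.
rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=4)   # hidden layer
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)   # output layer
X = rng.normal(size=2)                                  # an input vector

two_layers = W2 @ (W1 @ X + b1) + b2        # z(2) computed layer by layer

W = W2 @ W1                                  # collapsed weight matrix
b = W2 @ b1 + b2                             # collapsed bias
one_layer = W @ X + b                        # single equivalent linear layer

print(np.allclose(two_layers, one_layer))    # True: the composition is linear
```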
Variants of Activation Function
Linear Function
 Equation : A linear function has an equation similar to that of a
straight line, i.e. y = x
 No matter how many layers we have, if all of them are linear in
nature, the final activation of the last layer is nothing but a
linear function of the input of the first layer.
 Range : -inf to +inf
 Uses : Linear activation function is used at just one place i.e.
output layer.
 Issues : The derivative of a linear function is a constant that no
longer depends on the input “x”, so gradient descent gives every
input the same update and the function cannot introduce any
useful non-linear behaviour to our algorithm.
For example : Calculation of price of a house is a regression problem.
House price may have any big/small value, so we can apply linear
activation at the output layer. Even in this case the neural net must have a
non-linear function at its hidden layers.
Sigmoid Function
 It is a function which is plotted as ‘S’ shaped graph.
 Equation : A = 1/(1 + e^{-x})
 Nature : Non-linear. For X values between -2 and 2, the Y values
are very steep, which means small changes in x bring about large
changes in the value of Y.
 Value Range : 0 to 1
 Uses : Usually used in output layer of a binary classification,
where result is either 0 or 1, as value for sigmoid function lies
between 0 and 1 only so, result can be predicted easily to be 1 if
value is greater than 0.5 and 0 otherwise.
Tanh Function

 The activation that almost always works better than the sigmoid
function is the Tanh function, also known as the Tangent Hyperbolic
function. It is actually a mathematically shifted version of the
sigmoid function; both are similar and can be derived from each
other.
 Equation :-
f(x) = tanh(x) = 2/(1 + e^{-2x}) – 1
OR
tanh(x) = 2 * sigmoid(2x) – 1
 Value Range :- -1 to +1
 Nature :- non-linear
 Uses :- Usually used in hidden layers of a neural network, as its
values lie between -1 and 1; the mean of the hidden-layer activations
therefore comes out at 0 or very close to it, which helps center the
data by bringing the mean close to 0. This makes learning for the
next layer much easier.
RELU Function
 It stands for Rectified Linear Unit. It is the most widely used
activation function. Chiefly implemented in hidden layers of
Neural network.
 Equation :- A(x) = max(0,x). It gives an output x if x is positive
and 0 otherwise.
 Value Range :- [0, inf)
 Nature :- non-linear, which means we can easily backpropagate
the errors and have multiple layers of neurons being activated
by the ReLU function.
 Uses :- ReLU is less computationally expensive than tanh and
sigmoid because it involves simpler mathematical operations. At any
given time only a few neurons are activated, making the network
sparse and therefore efficient and easy to compute.
In simple words, RELU learns much faster than sigmoid and Tanh
function.
Softmax Function
The softmax function is also a type of sigmoid function but is handy
when we are trying to handle multi-class classification problems.
 Nature :- non-linear
 Uses :- Usually used when trying to handle multiple classes. The
softmax function is commonly found in the output layer of image
classification problems. It squeezes the output for each class
between 0 and 1 and divides by the sum of the outputs, so the
outputs form a probability distribution over the classes.
 Output:- The softmax function is ideally used in the output layer of
the classifier, where we are actually trying to obtain the
probabilities that define the class of each input.
 The basic rule of thumb is if you really don’t know what
activation function to use, then simply use RELU as it is a
general activation function in hidden layers and is used in most
cases these days.
 If your output is for binary classification then the sigmoid function is
a very natural choice for the output layer.
 If your output is for multi-class classification then Softmax is very
useful for predicting the probabilities of each class.
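
For reference, here are plain NumPy versions of the activation functions discussed above. Subtracting the maximum inside softmax is a common numerical-stability convention, not something required by the definitions above.

```python
import numpy as np

# Straightforward NumPy implementations of the activation function variants above.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))            # squashes values into (0, 1)

def tanh(x):
    return 2.0 * sigmoid(2.0 * x) - 1.0          # shifted/scaled sigmoid, range (-1, 1)

def relu(x):
    return np.maximum(0.0, x)                    # max(0, x), range [0, inf)

def softmax(x):
    e = np.exp(x - np.max(x))                    # subtract max for numerical stability
    return e / np.sum(e)                         # outputs sum to 1 (class probabilities)

z = np.array([-2.0, 0.0, 1.5])
print(sigmoid(z), tanh(z), relu(z), softmax(z), sep="\n")
```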

The Least Mean-Squares (LMS) algorithm is a widely used adaptive filter
technique in neural networks, signal processing, and control systems.
Developed by Bernard Widrow and Ted Hoff in 1960, the LMS algorithm
is a stochastic gradient descent method that iteratively updates filter
coefficients to minimize the mean square error between the desired and
actual signals. This article provides a detailed technical overview of the
LMS algorithm, its applications, and its significance in neural networks.

Introduction to Least Mean-Squares (LMS) Algorithm


The Least Mean Squares (LMS) method is an adaptive algorithm widely
used for finding the coefficients of a filter that will minimize the mean
square error between the desired signal and the actual signal. It is mainly
utilized in training procedures based on gradient descent, where the
network approximates a target function by iteratively adjusting its weights
with respect to the error between predicted and actual outputs.
Neural networks are composed of simple input/output units called
neurons. The input and output units in a neural network are
interconnected, and each connection has an associated weight. It can be
used for both classification and regression. In this article, we will discuss
the least mean-square algorithm and how to construct a neural network
based on the LMS algorithm.
Key Concepts:
 Adaptive Filtering: Adaptive filters adjust their coefficients
based on the input signal. The LMS algorithm is an example of
an adaptive filter.
 Mean Square Error (MSE): This is the criterion the LMS
algorithm aims to minimize. MSE is the expectation of the square
of the error signal.
 Error Signal (e(n)): The difference between the desired signal
(d(n)) and the output of the filter (y(n)). e(n) = d(n) – x^T(n)w(n)
 Filter Coefficients (w(n)): The parameters of the filter that are
updated iteratively to minimize the MSE.
Mathematical Foundation of LMS algorithm
LMS algorithm is based on using the instantaneous value of the cost
function \varepsilon(w), which can be expressed in terms of the error signal
as:
\varepsilon(w) = \frac{1}{2}e^2(n)
where,
 e(n) is the error calculated by the difference between desired
and target output values.
 e^2(n) is used here because we are using squared error
and \frac{1}{2} is used for simplifying the calculation purpose.
Error e(n) = d(n) – x^T(n)w(n)
where
 d(n)-desired output value
 x^T(n)– transpose of input vector x
 w(n) – weight vector
So, cost function can be written as:
\varepsilon (w) = \frac{1}{2}(d(n)-x^T(n)w(n))^2
Differentiating the cost function \varepsilon(w) with respect to the weight vector
w(n) gives the instantaneous gradient estimate:
\frac{\partial \varepsilon(w)}{\partial w(n)} = -x(n)e(n)
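As a quick sanity check of this gradient (with made-up values for x(n), w(n) and d(n)), the sketch below compares the analytic expression -x(n)e(n) against a finite-difference approximation.

```python
import numpy as np

# Numerical check that the gradient of 0.5*(d - x^T w)^2 w.r.t. w equals -x*e.
x = np.array([0.3, -1.2, 0.7])     # input vector x(n)
w = np.array([0.1, 0.4, -0.2])     # current weights w(n)
d = 0.5                            # desired output d(n)

cost = lambda w: 0.5 * (d - x @ w) ** 2
e = d - x @ w                      # error signal e(n)
analytic = -x * e                  # gradient from the derivation above

eps = 1e-6
numeric = np.array([
    (cost(w + eps * np.eye(3)[i]) - cost(w - eps * np.eye(3)[i])) / (2 * eps)
    for i in range(3)
])
print(np.allclose(analytic, numeric))   # True: finite differences agree
```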

LMS Algorithm in Steepest Descent Method


Steepest descent method is a general optimization technique used to
find the minimum of a function. It iteratively updates the parameters in
the direction of the negative gradient of cost function.
In this method, weight updating works as:
w(n+1) = w(n) – \eta g(n) where \eta is the learning rate parameter and
g(n) is the gradient evaluated at vector point w(n)
Convergence Consideration of LMS algorithm
Convergence refers to the algorithm’s ability to reach a steady state
where the error signal becomes minimal. For the LMS algorithm,
the convergence behavior depends on the input vector x(n)
and the learning rate parameter value \eta.
Conditions for Convergence in Mean-Square:
LMS algorithm is convergent in mean-square when \eta satisfies the
following conditions:
 0<\eta < \frac{2}{\lambda_{max}}
 \lambda_{max} is the largest eigenvalue of correlation
matrix R_x
 0<\eta<\frac{2}{tr[R_x]} where tr[R_x] is the trace of correlation
matrix R_x
Stability of the LMS Algorithm
Stability refers to the algorithm’s ability to produce bounded outputs
over time. For the LMS algorithm, stability is closely related to the
learning rate parameter value \eta.
For stability , \eta must satisfy the following condition:
0<\eta<\frac{1}{\lambda_{max}}
This condition ensures that the weight updates do not diverge and
remain bounded.
Stability also involves a trade-off between convergence speed and
steady-state error: a larger \eta leads to faster convergence but a higher
steady-state error, while a smaller \eta results in slower convergence but a
lower steady-state error.
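The bounds above can be estimated directly from data. The following sketch uses a synthetic input signal to form the sample correlation matrix R_x and compute the corresponding learning-rate limits; in practice R_x would be estimated from the actual inputs.

```python
import numpy as np

# Estimating learning-rate bounds for LMS from a (synthetic) input signal.
rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 4))          # 1000 input vectors of length 4

R_x = (X.T @ X) / len(X)                # sample correlation matrix R_x
lam_max = np.max(np.linalg.eigvalsh(R_x))

eta_mean_square = 2.0 / np.trace(R_x)   # 0 < eta < 2 / tr[R_x]
eta_stability = 1.0 / lam_max           # 0 < eta < 1 / lambda_max (condition above)

print(lam_max, eta_mean_square, eta_stability)
```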
Workings of Least Mean-Squares Algorithm in
Neural Networks
1. Initialization: Set the initial filter coefficients to zero and define other
necessary parameters such as learning rate parameter \eta and number
of iterations.
2. Iteration (for each time step n):
1. Compute the Filter Output: y(n) = w^T(n).x(n) : This is the
output of the filter for the input signal x(n) using the current
filter coefficients w(n).
2. Compute the Error Signal: e(n) = d(n)-y(n) : The error signal is
the difference between the desired signal d(n) and the actual
output y(n) of the filter.
3. Update the Filter Coefficients: w(n+1) = w(n) + \eta x(n)
e(n) : The filter coefficients are updated using the error signal,
the step-size parameter, and the input signal. This update rule is
derived from the gradient descent optimization method, aiming
to minimize the mean squared error.
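Putting the three steps together, here is a minimal LMS sketch that identifies an invented four-tap "unknown system" from noisy measurements; the system coefficients, noise level, and learning rate are arbitrary choices for the demo.

```python
import numpy as np

# Minimal LMS adaptive filter following the three steps above.
rng = np.random.default_rng(3)
n_taps, n_samples, eta = 4, 2000, 0.01
w_true = np.array([0.6, -0.3, 0.2, 0.1])       # system the filter should identify

x_stream = rng.normal(size=n_samples)           # input signal
w = np.zeros(n_taps)                            # 1. initialise coefficients to zero

for n in range(n_taps, n_samples):              # 2. iterate over time steps
    x_n = x_stream[n - n_taps:n][::-1]          # current input vector x(n)
    d_n = w_true @ x_n + 0.01 * rng.normal()    # desired signal (with a little noise)

    y_n = w @ x_n                               # 2.1 filter output y(n) = w^T(n) x(n)
    e_n = d_n - y_n                             # 2.2 error signal e(n) = d(n) - y(n)
    w = w + eta * x_n * e_n                     # 2.3 update w(n+1) = w(n) + eta x(n) e(n)

print(np.round(w, 3))                           # should end up close to w_true
```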
Let’s discuss the signal flow graph. The diagram is shown below:

Signal flow graph


Explanation of the Signal Flow Graph:
 The signal \eta x(n) d(n) represents the scaled product of the
input signal and the desired signal.
 The summing junction calculates the error signal e(n) by
subtracting the filter output x^T(n)w(n) from the desired signal d(n).
 The product \eta x(n) x^T(n) is used to update the filter
coefficients w(n+1).
 The delay element z^{-1} represents the update of the filter
coefficients for the next time step.
Important points about LMS algorithm:
1. The feedback loop around the weight vector w'(n) acts as a low-pass
filter: it allows low-frequency components of the error signal to pass
while attenuating high-frequency ones.
2. The average time constant of this filtering action is inversely
proportional to \eta.
3. In the steepest-descent algorithm, w(n) follows a well-defined
trajectory, but in the LMS algorithm, w'(n) follows a random path.
4. The LMS algorithm is therefore also known as the “stochastic
gradient algorithm”.
