
Notes of ANN

What is Artificial Neural Network?

The term "Artificial Neural Network" is derived from biological neural networks,
which form the structure of the human brain. Just as the human brain has neurons
interconnected with one another, an artificial neural network has neurons
interconnected with one another in the various layers of the network. These
neurons are known as nodes.

The given figure illustrates the typical diagram of a biological neural network.

A typical artificial neural network looks something like the given figure.

Dendrites from the biological neural network represent inputs in artificial neural
networks, the cell nucleus represents nodes, synapses represent weights, and the
axon represents the output.

Relationship between Biological neural network and artificial neural network:

Biological Neural Network    Artificial Neural Network
Dendrites                    Inputs
Cell nucleus                 Nodes
Synapse                      Weights
Axon                         Output

An Artificial Neural Network is a system in the field of Artificial Intelligence
that attempts to mimic the network of neurons that makes up the human brain, so
that computers have an option to understand things and make decisions in a
human-like manner. An artificial neural network is designed by programming
computers to behave like interconnected brain cells.

There are around 100 billion (10^11) neurons in the human brain. Each neuron is
connected to somewhere between 1,000 and 100,000 others. In the human brain, data
is stored in a distributed manner, and we can extract more than one piece of this
data from memory in parallel when necessary. We can say that the human brain is
made up of incredibly powerful parallel processors.

We can understand the artificial neural network with an example: consider a
digital logic gate that takes an input and gives an output. An "OR" gate takes
two inputs. If one or both of the inputs are "On," the output is "On." If both
of the inputs are "Off," the output is "Off." Here the output depends only on the
input. Our brain does not work this way: the relationship between outputs and
inputs keeps changing, because the neurons in our brain are "learning."

The architecture of an artificial neural network:


To understand the architecture of an artificial neural network, we first have to
understand what a neural network consists of. A neural network consists of a
large number of artificial neurons, termed units, arranged in a sequence of
layers. Let us look at the various types of layers available in an artificial
neural network.

Artificial Neural Network primarily consists of three layers:

Input Layer:

As the name suggests, it accepts inputs in several different formats provided by the
programmer.

Hidden Layer:

The hidden layer sits between the input and output layers. It performs all the
calculations needed to find hidden features and patterns.

Output Layer:

The input goes through a series of transformations using the hidden layer, which
finally results in output that is conveyed using this layer.

The artificial neural network takes the inputs, computes the weighted sum of
these inputs, and includes a bias. This computation is represented in the form of
a transfer function.

The weighted total is then passed as an input to an activation function to
produce the output. Activation functions decide whether a node should fire or
not. Only the nodes that fire make it to the output layer. There are distinct
activation functions available, chosen according to the sort of task we are
performing.
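
As a minimal sketch of this computation (the function names and numbers are illustrative, not from the notes), a single node can be written in Python as:

    # A single artificial node: weighted sum of the inputs plus a bias
    # (the transfer function), passed through a step activation function.
    def step(z, threshold=0.0):
        return 1 if z > threshold else 0   # "fire" only above the threshold

    def node_output(inputs, weights, bias):
        z = sum(x * w for x, w in zip(inputs, weights)) + bias
        return step(z)

    print(node_output([1, 0, 1], [0.5, -0.2, 0.8], bias=-1.0))  # -> 1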

Advantages of Artificial Neural Network (ANN)

Parallel processing capability:

Artificial neural networks can perform more than one task simultaneously.

Storing data on the entire network:

Unlike traditional programming, where data is stored in a database, in an ANN the
data is stored across the whole network. The disappearance of a few pieces of
data in one place does not prevent the network from working.

Capability to work with incomplete knowledge:

After training, an ANN may produce output even with incomplete data. The loss of
performance depends on how important the missing data is.

Having a memory distribution:

For an ANN to be able to adapt, it is important to choose representative examples
and to train the network toward the desired output by showing it these examples.
The success of the network is directly proportional to the chosen instances; if
the problem cannot be shown to the network in all its aspects, the network can
produce false output.

Having fault tolerance:

Corruption of one or more cells of an ANN does not prevent it from generating
output; this feature makes the network fault-tolerant.

Disadvantages of Artificial Neural Network:

Assurance of proper network structure:

There is no particular guideline for determining the structure of an artificial
neural network. An appropriate network structure is achieved through experience
and trial and error.

Unrecognized behavior of the network:

This is the most significant issue with ANNs. When an ANN produces a solution, it
does not provide insight into why and how that solution was reached. This
decreases trust in the network.

Hardware dependence:

Artificial neural networks need processors with parallel processing power, in
accordance with their structure. Therefore, realizing a network depends on
suitable hardware.

Difficulty of showing the issue to the network:

ANNs can work only with numerical data. Problems must be converted into numerical
values before being introduced to the ANN. The chosen representation mechanism
directly impacts the performance of the network, and it relies on the user's
abilities.

The duration of the network is unknown:

Training reduces the network's error to a specific value, but this value does not
give us optimal results, and we cannot know in advance how long training will
take.

Artificial neural networks, which entered the scientific world in the mid-20th
century, are developing exponentially. Above, we have reviewed the advantages of
artificial neural networks and the issues encountered in the course of their use.
It should not be overlooked that the disadvantages of ANNs, a flourishing branch
of science, are being eliminated one by one, while their advantages grow day by
day; artificial neural networks are progressively becoming an irreplaceable part
of our lives.

How do artificial neural networks work?

An Artificial Neural Network can best be represented as a weighted directed
graph, where the artificial neurons form the nodes and the connections between
neuron outputs and neuron inputs can be viewed as directed edges with weights.
The network receives its input signal from an external source in the form of a
pattern or image expressed as a vector. These inputs are then mathematically
denoted by the notation x(n) for each of the n inputs.

Afterward, each input is multiplied by its corresponding weight (these weights
are the details the artificial neural network uses to solve a specific problem).
In general terms, the weights represent the strength of the interconnections
between neurons inside the network. All the weighted inputs are summed inside the
computing unit.

If the weighted sum is zero, a bias is added to make the output non-zero, or to
scale up the system's response. The bias can be viewed as an extra input fixed at
1 with its own weight. The total of the weighted inputs can lie anywhere in the
range from 0 to positive infinity. To keep the response within the limits of the
desired value, a certain maximum value is benchmarked, and the total of the
weighted inputs is passed through the activation function.
The activation function refers to the set of transfer functions used to achieve
the desired output. There are different kinds of activation functions, primarily
either linear or non-linear sets of functions. Some of the commonly used
activation functions are the binary, linear, and tan hyperbolic sigmoidal
activation functions. Let us take a look at each of them in detail:

Binary:

In a binary activation function, the output is either a 1 or a 0. To accomplish
this, a threshold value is set up. If the net weighted input of the neuron is
greater than the threshold, the activation function returns 1; otherwise it
returns 0.

Sigmoidal Hyperbolic:

The sigmoidal hyperbolic function is generally seen as an "S"-shaped curve. Here
the tan hyperbolic function is used to approximate the output from the actual net
input. The function is defined as:

F(x) = 1 / (1 + exp(−λx))

where λ is the steepness parameter.
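
As an illustrative sketch (assuming the conventions above), the binary and sigmoidal activation functions can be written in Python as:

    import math

    def binary_activation(net_input, threshold=0.0):
        # Returns 1 if the net weighted input exceeds the threshold, else 0.
        return 1 if net_input > threshold else 0

    def sigmoid(x, steepness=1.0):
        # F(x) = 1 / (1 + exp(-lambda * x)); larger steepness gives a sharper "S".
        return 1.0 / (1.0 + math.exp(-steepness * x))

    print(binary_activation(0.7))    # -> 1
    print(round(sigmoid(0.7), 4))    # -> 0.6682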

Network Topology

A network topology is the arrangement of a network along with its nodes and
connecting lines. According to the topology, ANN can be classified as the
following kinds −

Feedforward Network

It is a non-recurrent network having processing units/nodes arranged in layers,
where all the nodes in a layer are connected to the nodes of the previous layer.
The connections carry different weights. There is no feedback loop, meaning the
signal can flow in only one direction, from input to output. It may be divided
into the following two types −
 Single layer feedforward network − The concept is of feedforward ANN
having only one weighted layer. In other words, we can say the input layer
is fully connected to the output layer.

 Multilayer feedforward network − The concept is of feedforward ANN
having more than one weighted layer. As this network has one or more
layers between the input and the output layer, these are called hidden layers.

Feedback Network

As the name suggests, a feedback network has feedback paths, which means the
signal can flow in both directions using loops. This makes it a non-linear dynamic
system, which changes continuously until it reaches a state of equilibrium. It may
be divided into the following types −
 Recurrent networks − They are feedback networks with closed loops.
Following are the two types of recurrent networks.

 Fully recurrent network − It is the simplest neural network architecture
because all nodes are connected to all other nodes and each node works as
both input and output.

 Jordan network − It is a closed loop network in which the output will go to
the input again as feedback as shown in the following diagram.

Characteristics of Artificial Neural Network


 It is a neurally inspired mathematical model.
 It contains a huge number of interconnected processing elements, called neurons,
to do all the operations.
 The information stored in the network is basically the weighted linkage of the
neurons.
 The input signals arrive at the processing elements through connections and
connecting weights.
 It has the ability to learn, recall, and generalize from the given data by
suitable assignment and adjustment of weights.
 The collective behavior of the neurons describes its computational power; no
single neuron carries specific information.

Application of Neural Network

1. Every new technology needs data from the previous one; these data are analyzed
so that the pros and cons can be studied correctly. Neural networks make this
kind of analysis possible.

2. Neural networks are suitable for research on animal behavior, predator/prey
relationships, and population cycles.

3. Proper valuation of property, buildings, automobiles, machinery, etc. becomes
easier with the help of neural networks.

4. Neural networks can be used in betting on horse races, sporting events, and,
most importantly, the stock market.

5. They can be used to predict the appropriate judgement for a crime, using a
large dataset of crime details as input and the resulting sentences as output.

6. By analyzing data and determining which records are faulty (files diverging
from their peers), data mining, cleaning, and validation can be achieved with
neural networks.

7. Neural networks can be used to detect targets from the echo patterns obtained
from sonar, radar, seismic, and magnetic instruments.

8. They can be used efficiently in employee hiring, so that a company can hire
the right employee based on the skills the employee has and what their future
productivity is likely to be.

9. They have wide application in medical research.

10. They can be used for fraud detection regarding credit cards, insurance, or
taxes by analyzing past records.

Difference between Biological Neurons and Artificial Neurons

Major components − Biological: axons, dendrites, synapses. Artificial: nodes,
inputs, outputs, weights, bias.

Structure and signal flow − Biological: information from other neurons, in the
form of electrical impulses, enters the dendrites at connection points called
synapses; the information flows from the dendrites to the cell body, where it is
processed; the output signal, a train of impulses, is then sent down the axon to
the synapses of other neurons. Artificial: the arrangement and connections of the
neurons make up the network, which has three layers; the first layer, called the
input layer, is the only layer exposed to external signals; the input layer
transmits signals to the neurons in the next layer, called a hidden layer; the
hidden layer extracts relevant features or patterns from the received signals;
the features or patterns considered important are then directed to the output
layer, the final layer of the network.

Where information is stored − Biological: a synapse is able to increase or
decrease the strength of a connection; this is where information is stored.
Artificial: the artificial signals can be changed by weights, in a manner similar
to the physical changes that occur in the synapses.

Scale − Biological: approximately 10^11 neurons. Artificial: 10^2 to 10^4 neurons
with current technology.

Neurons

Biological neurons (also called nerve cells), or simply neurons, are the
fundamental units of the brain and nervous system: the cells responsible for
receiving sensory input from the external world via dendrites, processing it, and
giving the output through axons.

A biological Neuron

Cell body (soma): The body of the neuron contains the nucleus and carries out the
biochemical transformations necessary to the life of the neuron.

Dendrites: Each neuron has fine, hair-like tubular structures (extensions) around it.
They branch out into a tree around the cell body. They accept incoming signals.

Axon: It is a long, thin, tubular structure that works like a transmission line.

Synapse: Neurons are connected to one another in a complex spatial arrangement.
When the axon reaches its final destination, it branches again in what is called
terminal arborization. At the end of the axon are highly complex and specialized
structures called synapses. The connection between two neurons takes place at
these synapses.

Dendrites receive input through the synapses of other neurons. The soma processes
these incoming signals over time and converts that processed value into an output,
which is sent out to other neurons through the axon and the synapses.

A single layer neural network is called a Perceptron. It gives a single output.

Perceptron

In the above figure, for one single observation, x0, x1, x2, x3, ..., x(n)
represent the various inputs (independent variables) to the network. Each of
these inputs is multiplied by a connection weight, or synapse. The weights are
represented as w0, w1, w2, w3, ..., w(n). A weight shows the strength of a
particular connection.

b is a bias value. A bias value allows you to shift the activation function up or
down.

In the simplest case, these products are summed, fed to a transfer function
(activation function) to generate a result, and this result is sent as output.

Mathematically, x1·w1 + x2·w2 + x3·w3 + ... + xn·wn = ∑ xi·wi

The activation function is then applied: 𝜙(∑ xi·wi)
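
A minimal sketch of this forward computation in Python (the step activation and the sample numbers are illustrative choices):

    # Perceptron forward pass: weighted sum plus bias, then activation phi.
    def perceptron_output(x, w, b, phi):
        z = sum(xi * wi for xi, wi in zip(x, w)) + b   # sum(xi * wi) + bias
        return phi(z)

    phi = lambda z: 1 if z >= 0 else 0                 # a simple step activation
    print(perceptron_output([1.0, 0.5], [0.4, -0.6], b=0.1, phi=phi))  # -> 1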

Multi-layer ANN

A fully connected multi-layer neural network is called a Multilayer Perceptron
(MLP).

It has 3 layers including one hidden layer. If it has more than 1 hidden layer, it is
called a deep ANN.

An MLP is a typical example of a feedforward artificial neural network.


In this figure, the ith activation unit in the lth layer is denoted as ai(l).
The number of layers and the number of neurons are referred to as
hyperparameters of a neural network, and these need tuning. Cross-validation
techniques must be used to find ideal values for these.

The weight adjustment during training is done via backpropagation. Deeper neural
networks are better at processing data; however, deeper layers can lead to the
vanishing gradient problem. Special algorithms are required to address this
issue.
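
To make the layered structure concrete, here is a small hedged sketch of an MLP forward pass using NumPy (the layer sizes and random weights are arbitrary illustrations, not a trained network):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    x = np.array([0.2, 0.9, 0.4])        # input vector (3 features)
    W1 = rng.normal(size=(4, 3))         # input -> hidden weights (4 hidden units)
    b1 = np.zeros(4)
    W2 = rng.normal(size=(2, 4))         # hidden -> output weights (2 outputs)
    b2 = np.zeros(2)

    a_hidden = sigmoid(W1 @ x + b1)      # hidden-layer activations a(h)
    a_out = sigmoid(W2 @ a_hidden + b2)  # output-layer activations a(out)
    print(a_out)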

Notations

In the representation below:

 ai(in) refers to the ith value in the input layer

 ai(h) refers to the ith unit in the hidden layer

 ai(out) refers to the ith unit in the output layer

 a0(in) is simply the bias unit and is equal to 1; it will have the corresponding
weight w0

 The weight coefficient from layer l to layer l+1 is represented by wk,j(l)

A simplified view of the multilayer is presented here.


This image shows a fully connected three-layer neural network with 3 input
neurons and 3 output neurons. A bias term is added to the input vector.
A computational model of a neuron
In logistic regression, we composed a linear model z(x) with the logistic
function g(z) to form our predictor. This linear model was a combination of
feature inputs xi and weights wi:

z(x) = w1x1 + w2x2 + w3x3 + w4x4 + b = wᵀx + b

The first layer contains a node for each value in our input feature vector. These
values are scaled by their corresponding weight wi and added together along with
a bias term b. The bias term allows us to build linear models that aren't fixed
at the origin. The following image provides an example of why this is important.
Notice how we can provide a much better decision boundary for logistic regression
when our linear model isn't fixed at the origin.

Let's try to visualize that.

The input nodes in our network visualization are all connected to a single output
node, which consists of a linear combination of all of the inputs. Each
connection between nodes contains a parameter, w, which is what we'll tune to
form an optimal model (tuning these parameters will be covered in a later post).
The final output is the functional composition g(z(x)). When we pass the linear
combination of inputs through the logistic (also known as sigmoid) function, the
neural network community refers to this as activation.
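
A short hedged sketch of this composition g(z(x)) in Python, showing how the bias b shifts the decision boundary away from the origin (the weights and points are made up for illustration):

    import math

    def g(z):
        # Logistic (sigmoid) function.
        return 1.0 / (1.0 + math.exp(-z))

    def predict(x, w, b):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b   # z(x) = w^T x + b
        return g(z)                                    # activation g(z(x))

    w = [1.0, 1.0]
    print(predict([0.0, 0.0], w, b=0.0))   # 0.5: boundary passes through the origin
    print(predict([0.0, 0.0], w, b=-2.0))  # ~0.12: the bias shifts the boundary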

Perceptron Learning Rule

This rule is an error-correcting, supervised learning algorithm for single-layer
feedforward networks with a linear activation function, introduced by Rosenblatt.
Basic Concept − Being supervised in nature, the algorithm calculates the error by
comparing the desired/target output with the actual output. If a difference is
found, a change must be made to the weights of the connections.
Mathematical Formulation − To explain its mathematical formulation, suppose we
have a finite set of input vectors x(n), along with their desired/target output
vectors t(n), where n = 1 to N.
Now the output y can be calculated, as explained earlier, on the basis of the net
input, with the activation function applied over that net input −

y = f(y_in) = { 1 if y_in > θ
                0 if y_in ≤ θ

where θ is the threshold.
The updating of weight can be done in the following two cases −
Case I − when t ≠ y, then
w(new)=w(old)+tx

Case II − when t = y, then no change in weight.
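
A minimal sketch of one update under this rule (the names are illustrative):

    # Perceptron rule: when the target t differs from the output y,
    # add t*x to the weights; otherwise leave them unchanged.
    def perceptron_update(w, x, t, y):
        if t != y:
            return [wi + t * xi for wi, xi in zip(w, x)]  # w(new) = w(old) + t*x
        return w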
Delta Learning Rule (Widrow−Hoff Rule)

Introduced by Bernard Widrow and Marcian Hoff, and also called the Least Mean
Square (LMS) method, this rule minimizes the error over all training patterns. It
is a kind of supervised learning algorithm with a continuous activation function.
Basic Concept − The basis of this rule is the gradient-descent approach, which
continues until the error is minimized. The delta rule updates the synaptic
weights so as to minimize the difference between the net input to the output unit
and the target value.
Mathematical Formulation − To update the synaptic weights, delta rule is given
by
Δwi = α·xi·ej

Here Δwi = the weight change for the ith pattern;

α = the positive, constant learning rate;

xi = the input value from the pre-synaptic neuron;

ej = (t − y_in), the difference between the desired/target output and the actual
output y_in.
The above delta rule is for a single output unit only.
The updating of weight can be done in the following two cases −
Case-I − when t ≠ y, then
w(new)=w(old)+Δw

Case-II − when t = y, then no change in weight.
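
A hedged sketch of a single delta-rule update in Python (the learning rate value is an arbitrary illustration):

    # Delta (Widrow-Hoff / LMS) rule: delta_w_i = alpha * x_i * e,
    # where e = (t - y_in) is the error for this pattern.
    def delta_update(w, x, t, y_in, alpha=0.1):
        e = t - y_in
        return [wi + alpha * xi * e for wi, xi in zip(w, x)]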

Adjustments of Weights or Learning

Learning, in an artificial neural network, is the method of modifying the weights
of the connections between the neurons of a specified network. Learning in an ANN
can be classified into three categories, namely supervised learning, unsupervised
learning, and reinforcement learning.

Supervised Learning

As the name suggests, this type of learning is done under the supervision of a
teacher. This learning process is dependent.

During the training of ANN under supervised learning, the input vector is
presented to the network, which will give an output vector. This output vector is
compared with the desired output vector. An error signal is generated, if there is a
difference between the actual output and the desired output vector. On the basis of
this error signal, the weights are adjusted until the actual output is matched with
the desired output.

Unsupervised Learning

As the name suggests, this type of learning is done without the supervision of a
teacher. This learning process is independent.
During the training of ANN under unsupervised learning, the input vectors of
similar type are combined to form clusters. When a new input pattern is applied,
then the neural network gives an output response indicating the class to which the
input pattern belongs.
There is no feedback from the environment as to what should be the desired
output and if it is correct or incorrect. Hence, in this type of learning, the network
itself must discover the patterns and features from the input data, and the relation
for the input data over the output.

Reinforcement Learning

As the name suggests, this type of learning is used to reinforce or strengthen
the network based on critic information. This learning process is similar to
supervised learning, but much less information may be available.
During the training of network under reinforcement learning, the network receives
some feedback from the environment. This makes it somewhat similar to
supervised learning. However, the feedback obtained here is evaluative not
instructive, which means there is no teacher as in supervised learning. After
receiving the feedback, the network performs adjustments of the weights to get
better critic information in future.

Activation Functions

An activation function may be defined as the extra force or effort applied over
the input to obtain an exact output. In an ANN, activation functions are applied
over the net input to produce the output. The following are some activation
functions of interest −

Linear Activation Function

It is also called the identity function, as it performs no transformation on the
input. It can be defined as −

F(x) = x

Sigmoid Activation Function

It is of two types, as follows −
 Binary sigmoidal function − This activation function performs input
editing between 0 and 1. It is positive in nature. It is always bounded, which
means its output cannot be less than 0 or more than 1. It is also strictly
increasing, which means the greater the input, the higher the output. It can be
defined as

F(x) = sigm(x) = 1 / (1 + exp(−x))


 Bipolar sigmoidal function − This activation function performs input
editing between −1 and 1. It can be positive or negative in nature. It is
always bounded, which means its output cannot be less than −1 or more than 1. It
is also strictly increasing, like the sigmoid function. It can be defined as

F(x) = bsigm(x) = 2 / (1 + exp(−x)) − 1 = (1 − exp(−x)) / (1 + exp(−x))

Perceptron
Developed by Frank Rosenblatt using the McCulloch and Pitts model, the perceptron
is the basic operational unit of artificial neural networks. It employs a
supervised learning rule and is able to classify data into two classes.
Operational characteristics of the perceptron: it consists of a single neuron
with an arbitrary number of inputs along with adjustable weights, but the output
of the neuron is 1 or 0 depending upon the threshold. It also has a bias whose
weight is always 1. The following figure gives a schematic representation of the
perceptron.

Perceptron thus has the following three basic elements −

 Links − It has a set of connection links which carry weights, including a bias
that always has weight 1.
 Adder − It adds the inputs after they are multiplied by their respective weights.
 Activation function − It limits the output of the neuron. The most basic
activation function is a Heaviside step function, which has two possible outputs:
it returns 1 if the input is positive, and 0 for any negative input.

Training Algorithm

Perceptron network can be trained for single output unit as well as multiple output
units.

Training Algorithm for Single Output Unit

Step 1 − Initialize the following to start the training −

 Weights
 Bias
 Learning rate α
For easy calculation and simplicity, the weights and bias may be set equal to 0
and the learning rate set equal to 1.
Step 2 − Continue step 3-8 when the stopping condition is not true.
Step 3 − Continue step 4-6 for every training vector x.
Step 4 − Activate each input unit as follows −
xi = si (i = 1 to n)
Step 5 − Now obtain the net input with the following relation −

y_in = b + ∑i=1 to n xi·wi

Here ‘b’ is bias and ‘n’ is the total number of input neurons.


Step 6 − Apply the following activation function to obtain the final output.
f(y_in) = { 1   if y_in > θ
            0   if −θ ≤ y_in ≤ θ
           −1   if y_in < −θ }
Step 7 − Adjust the weight and bias as follows −
Case 1 − if y ≠ t then,

wi(new) = wi(old) + α·t·xi
b(new) = b(old) + α·t
Case 2 − if y = t then,
wi(new)=wi(old)
b(new)=b(old)
Here ‘y’ is the actual output and ‘t’ is the desired/target output.
Step 8 − Test for the stopping condition, which would happen when there is no change
in weight.
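
The steps above can be collected into a short hedged training sketch in Python (the threshold θ, the AND-function data, and the epoch limit are illustrative choices):

    # Perceptron training for a single output unit, following Steps 1-8.
    def activation(y_in, theta=0.2):
        if y_in > theta:
            return 1
        if y_in < -theta:
            return -1
        return 0

    def train(samples, targets, alpha=1.0, max_epochs=100):
        n = len(samples[0])
        w, b = [0.0] * n, 0.0                      # Step 1: zero weights and bias
        for _ in range(max_epochs):                # Step 2: repeat until no change
            changed = False
            for x, t in zip(samples, targets):     # Steps 3-4: each training vector
                y_in = b + sum(xi * wi for xi, wi in zip(x, w))  # Step 5
                y = activation(y_in)               # Step 6
                if y != t:                         # Step 7, Case 1
                    w = [wi + alpha * t * xi for xi, wi in zip(x, w)]
                    b += alpha * t
                    changed = True
            if not changed:                        # Step 8: stopping condition
                break
        return w, b

    # Learning the AND function with bipolar inputs and targets.
    print(train([[1, 1], [1, -1], [-1, 1], [-1, -1]], [1, -1, -1, -1]))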

Training Algorithm for Multiple Output Units

The following diagram is the architecture of perceptron for multiple output classes.

Step 1 − Initialize the following to start the training −

 Weights
 Bias
 Learning rate α
For easy calculation and simplicity, the weights and bias may be set equal to 0
and the learning rate set equal to 1.
Step 2 − Continue step 3-8 when the stopping condition is not true.
Step 3 − Continue step 4-6 for every training vector x.
Step 4 − Activate each input unit as follows −

xi = si (i = 1 to n)
Step 5 − Obtain the net input with the following relation −

y_inj = bj + ∑i=1 to n xi·wij

Here ‘b’ is bias and ‘n’ is the total number of input neurons.


Step 6 − Apply the following activation function to obtain the final output for
each output unit j = 1 to m −
f(y_inj) = { 1   if y_inj > θ
             0   if −θ ≤ y_inj ≤ θ
            −1   if y_inj < −θ }

Step 7 − Adjust the weight and bias for i = 1 to n and j = 1 to m as follows −
Case 1 − if yj ≠ tj then,
wij(new) = wij(old) + α·tj·xi
bj(new) = bj(old) + α·tj
Case 2 − if yj = tj then,
wij(new)=wij(old)
bj(new)=bj(old)
Here ‘y’ is the actual output and ‘t’ is the desired/target output.
Step 8 − Test for the stopping condition, which will happen when there is no change in
weight.

ANN Classification and Prediction Overview

ANN Classification, in GeneLinker™, is the process of learning to separate


samples into different classes by finding common features between samples of
known classes. For example, a set of samples may be taken from biopsies of two
different tumor types, and their gene expression levels measured. GeneLinker™
can use this data to learn to distinguish the two tumor types so that later,
GeneLinker™ can diagnose the tumor types of new biopsies. Because making
predictions on unknown samples is often used as a means of testing the ANN
classifier, we use the terms training samples and test samples to distinguish
between the samples whose classes GeneLinker™ knows (training) and the samples
whose classes GeneLinker™ will predict (test).

Types of Learning
ANN Classification is an example of Supervised Learning. Known class labels
help indicate whether the system is performing correctly or not. This information
can be used to indicate a desired response, validate the accuracy of the system, or
be used to help the system learn to behave correctly. The known class labels can be
thought of as supervising the learning process; the term is not meant to imply that
you have some sort of interventionist role.
Clustering is an example of Unsupervised Learning where the class labels are
not presented to the system that is trying to discover the natural classes in a dataset.
Clustering often fails to find known classes because the distinction between the
classes can be obscured by the large number of features (genes) which are
uncorrelated with the classes. A step in ANN classification involves identifying
genes which are intimately connected to the known classes. This is called feature
selection or feature extraction. Feature selection and ANN classification together
have a use even when prediction of unknown samples is not necessary: they can
be used to identify key genes which are involved in whatever processes distinguish
the classes.
Manual Feature Selection
Manual feature selection is useful if you already have some hypothesis about
which genes are key to a process. You can test that hypothesis by:
i. constructing a gene list of those genes,

ii. running an ANN classifier using those genes as features, and

iii. displaying a plot which shows whether the data can be successfully


classified.

Feature Selection Using the SLAM™ Technology
Genes that are frequently observed in associations are often good features
for classification with artificial neural networks. In GeneLinker™, ANN
classification is done using a committee of artificial neural networks (ANNs).
ANNs are highly adaptable learning machines which can detect non-linear
relationships between the features and the sample classes. A committee of ANNs is
used because an individual ANN may not be robust. That is, it may not make good
predictions on new data (test data) despite excellent performance on the training
data. Such a neural network is referred to as being overtrained.
Each ANN (component neural network or learner) is by default trained on a
different 90% of the training data and then validated on the remaining 10%. (These
fractions can be set differently in the Create ANN Classifier dialog by varying the
number of component neural networks.) This technique mitigates the risk of
overtraining at the level of the individual component neural network.

Unit 2
The Hopfield neural network was invented by Dr. John J. Hopfield in 1982. It
consists of a single layer which contains one or more fully connected recurrent
neurons.
The Hopfield network is commonly used for auto-association and optimization
tasks.

Discrete Hopfield Network

A Hopfield network operates in a discrete fashion; in other words, its input and
output patterns are discrete vectors, which can be either binary (0, 1) or
bipolar (+1, −1) in nature. The network has symmetrical weights with no
self-connections, i.e., wij = wji and wii = 0.

Architecture

Following are some important points to keep in mind about discrete Hopfield
network −
 This model consists of neurons with one inverting and one non-inverting
output.
 The output of each neuron should be the input of other neurons but not the
input of self.
 Weight/connection strength is represented by wij.
 Connections can be excitatory as well as inhibitory. It would be excitatory,
if the output of the neuron is same as the input, otherwise inhibitory.
 Weights should be symmetrical, i.e. wij = wji

The output from Y1 going to Y2, Yi and Yn have the
weights w12, w1i and w1n respectively. Similarly, other arcs have the weights on
them.

Training Algorithm

During training of a discrete Hopfield network, the weights are updated. The
input vectors can be binary or bipolar, and in both cases the weight updates can
be done with the following relations −
Case 1 − Binary input patterns
For a set of binary patterns s(p), p = 1 to P, where
s(p) = (s1(p), s2(p), ..., si(p), ..., sn(p)), the weight matrix is given by

wij = ∑p=1 to P [2si(p) − 1][2sj(p) − 1]   for i ≠ j

Case 2 − Bipolar input patterns
For a set of bipolar patterns s(p), p = 1 to P, where
s(p) = (s1(p), s2(p), ..., si(p), ..., sn(p)), the weight matrix is given by

wij = ∑p=1 to P si(p)·sj(p)   for i ≠ j
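
A hedged sketch of building this weight matrix in Python with NumPy (bipolar patterns assumed, following Case 2):

    import numpy as np

    def hopfield_weights(patterns):
        # patterns: array of shape (P, n) with bipolar entries (+1 / -1).
        # w_ij = sum over p of s_i(p) * s_j(p), with w_ii = 0 (no self-connections).
        S = np.asarray(patterns)
        W = S.T @ S
        np.fill_diagonal(W, 0)
        return W

    W = hopfield_weights([[1, -1, 1, -1], [1, 1, -1, -1]])
    print(W)   # symmetric: w_ij == w_ji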

Testing Algorithm

Step 1 − Initialize the weights, which are obtained from the training algorithm
using the Hebbian principle.
Step 2 − Perform steps 3-9 while the activations of the network have not
converged.
Step 3 − For each input vector X, perform steps 4-8.
Step 4 − Make the initial activation of the network equal to the external input
vector X as follows −
yi = xi for i = 1 to n
Step 5 − For each unit Yi, perform steps 6-9.
Step 6 − Calculate the net input of the network as follows −
y_ini = xi + ∑j yj·wji
Step 7 − Apply the following activation over the net input to calculate the
output −

yi = { 1    if y_ini > θi
       yi   if y_ini = θi
       0    if y_ini < θi

Here θi is the threshold.


Step 8 − Broadcast this output yi to all other units.
Step 9 − Test the network for convergence.
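
A hedged sketch of this recall procedure in Python, updating one unit at a time until the state stops changing (binary 0/1 states and a zero threshold are assumed for illustration):

    import numpy as np

    def hopfield_recall(W, x, theta=0.0, max_sweeps=100):
        y = np.array(x, dtype=float)               # Step 4: y_i = x_i
        for _ in range(max_sweeps):                # Step 2: until converged
            changed = False
            for i in range(len(y)):                # Step 5: each unit Y_i
                y_in = x[i] + float(np.dot(y, W[:, i]))   # Step 6
                if y_in > theta:                   # Step 7: activation
                    new_y = 1.0
                elif y_in < theta:
                    new_y = 0.0
                else:
                    new_y = y[i]
                if new_y != y[i]:
                    y[i] = new_y                   # Step 8: broadcast the output
                    changed = True
            if not changed:                        # Step 9: convergence test
                break
        return y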

Energy Function Evaluation

An energy function is defined as a function that is bounded and non-increasing
with respect to the state of the system.
The energy function Ef, also called a Lyapunov function, determines the stability
of the discrete Hopfield network, and is characterized as follows −

Condition − In a stable network, whenever the state of a node changes, the energy
function will decrease.

Suppose node i changes state from yi(k) to yi(k+1); then the energy change ΔEf is
given by the relation below. The change in energy relies on the fact that only
one unit can update its activation at a time.
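
The formulas themselves appeared as images in the original notes; the standard forms for the discrete Hopfield network, written here in LaTeX as a reconstruction, are:

    E_f = -\frac{1}{2}\sum_{i=1}^{n}\sum_{\substack{j=1\\ j\neq i}}^{n} y_i\, y_j\, w_{ij}
          \;-\; \sum_{i=1}^{n} x_i\, y_i \;+\; \sum_{i=1}^{n} \theta_i\, y_i

and, when node i changes state from y_i(k) to y_i(k+1),

    \Delta E_f = -\Bigl(\sum_{j\neq i} y_j\, w_{ij} + x_i - \theta_i\Bigr)\,
                 \bigl(y_i^{(k+1)} - y_i^{(k)}\bigr)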

Continuous Hopfield Network

In comparison with the discrete Hopfield network, the continuous network has time
as a continuous variable. It is also used in auto-association and optimization
problems such as the travelling salesman problem.
Model − The model or architecture can be built up by adding electrical components
such as amplifiers, which can map the input voltage to the output voltage over a
sigmoid activation function.

Back Propagation Neural Networks

A Back Propagation Network (BPN) is a multilayer neural network consisting of an
input layer, at least one hidden layer, and an output layer. As its name
suggests, back-propagation takes place in this network: the error, which is
calculated at the output layer by comparing the target output and the actual
output, is propagated back toward the input layer.

Architecture

As shown in the diagram, the architecture of a BPN has three interconnected
layers with weights on them. The hidden layer and the output layer also have a
bias, whose weight is always 1, attached to them. As is clear from the diagram,
the working of a BPN has two phases: one phase sends the signal from the input
layer to the output layer, and the other phase back-propagates the error from the
output layer to the input layer.

Training Algorithm

For training, the BPN will use the binary sigmoid activation function. The
training of a BPN has the following three phases.
 Phase 1 − Feed Forward Phase
 Phase 2 − Back Propagation of error
 Phase 3 − Updating of weights
All these phases are combined in the algorithm as follows.
Step 1 − Initialize the following to start the training −
 Weights
 Learning rate α
For easy calculation and simplicity, take some small random values.
Step 2 − Continue step 3-11 when the stopping condition is not true.

Step 3 − Continue step 4-10 for every training pair.

Phase 1

Step 4 − Each input unit receives input signal xi and sends it to the hidden unit for
all i = 1 to n
Step 5 − Calculate the net input at the hidden unit using the following relation −

Q_inj = b0j + ∑i=1 to n xi·vij

Here b0j is the bias on the hidden unit and vij is the weight on the j unit of
the hidden layer coming from the i unit of the input layer.
Now calculate the net output by applying the following activation function:
Qj = f(Q_inj)
Send these output signals of the hidden layer units to the output layer units.
Step 6 − Calculate the net input at the output layer unit using the following
relation −

y_ink = b0k + ∑j=1 to p Qj·wjk

Here b0k is the bias on the output unit and wjk is the weight on the k unit of
the output layer coming from the j unit of the hidden layer.
Calculate the net output by applying the following activation function:
yk = f(y_ink)

Phase 2

Step 7 − Compute the error-correcting term, in correspondence with the target
pattern received at each output unit, as follows −
δk = (tk − yk)·f′(y_ink)
On this basis, update the weight and bias as follows −

Δwjk = α·δk·Qj
Δb0k = α·δk
Then, send δk back to the hidden layer.

Step 8 − Now each hidden unit sums its delta inputs from the output units:

δ_inj = ∑k=1 to m δk·wjk

The error term can then be calculated as follows −

δj = δ_inj·f′(Q_inj)
On this basis, update the weight and bias as follows −
Δvij = α·δj·xi
Δb0j = α·δj

Phase 3

Step 9 − Each output unit (yk, k = 1 to m) updates its weight and bias as follows −
wjk(new) = wjk(old) + Δwjk
b0k(new) = b0k(old) + Δb0k
Step 10 − Each hidden unit (Qj, j = 1 to p) updates its weight and bias as follows −
vij(new) = vij(old) + Δvij
b0j(new) = b0j(old) + Δb0j
Step 11 − Check for the stopping condition, which may be either the number of
epochs reached or the target output matching the actual output.
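
A compact hedged sketch of one iteration of these three phases in Python with NumPy (the network sizes, data, and learning rate are illustrative; the binary sigmoid is used as stated above, so f′(z) = f(z)·(1 − f(z))):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(1)
    x = np.array([0.5, 0.1]); t = np.array([1.0])              # one training pair
    V = rng.normal(scale=0.5, size=(3, 2)); b0j = np.zeros(3)  # input -> hidden (v_ij)
    W = rng.normal(scale=0.5, size=(1, 3)); b0k = np.zeros(1)  # hidden -> output (w_jk)
    alpha = 0.5

    # Phase 1: feed forward (Steps 4-6)
    Q = sigmoid(V @ x + b0j)                  # hidden outputs Q_j
    y = sigmoid(W @ Q + b0k)                  # network output y_k

    # Phase 2: back propagation of error (Steps 7-8)
    delta_k = (t - y) * y * (1 - y)           # delta_k = (t_k - y_k) f'(y_in_k)
    delta_j = (W.T @ delta_k) * Q * (1 - Q)   # delta_j = (sum_k delta_k w_jk) f'(Q_in_j)

    # Phase 3: updating of weights (Steps 9-10)
    W += alpha * np.outer(delta_k, Q); b0k += alpha * delta_k
    V += alpha * np.outer(delta_j, x); b0j += alpha * delta_j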

Generalized Delta Learning Rule


The delta rule works only for the output layer. The generalized delta rule, also
called the back-propagation rule, is a way of creating the desired values for the
hidden layer.

Mathematical Formulation

For the activation function yk = f(y_ink), the net input at the output layer and
at the hidden layer can be given by

y_ink = ∑j zj·wjk   and   z_inj = ∑i xi·vij

Now the error which has to be minimized is

E = (1/2)·∑k [tk − yk]²

By using the chain rule, we have
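
(The derivation appeared only as images in the original notes; the following LaTeX reconstruction, consistent with the definitions above, gives the standard result.)

    \frac{\partial E}{\partial w_{jk}}
      = \frac{\partial E}{\partial y_k}\,
        \frac{\partial y_k}{\partial y_{in_k}}\,
        \frac{\partial y_{in_k}}{\partial w_{jk}}
      = -(t_k - y_k)\, f'(y_{in_k})\, z_j = -\delta_k\, z_j,
    \qquad \Delta w_{jk} = \alpha\, \delta_k\, z_j

and, for the hidden-layer weights,

    \frac{\partial E}{\partial v_{ij}}
      = -\Bigl(\sum_k \delta_k\, w_{jk}\Bigr) f'(z_{in_j})\, x_i = -\delta_j\, x_i,
    \qquad \Delta v_{ij} = \alpha\, \delta_j\, x_i.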

