Notes on ANN
The term "Artificial Neural Network" is derived from Biological neural networks
that develop the structure of a human brain. Similar to the human brain that has
neurons interconnected to one another, artificial neural networks also have neurons
that are interconnected to one another in various layers of the networks. These
neurons are known as nodes.
The given figure illustrates the typical diagram of a biological neural network.
The typical artificial neural network looks something like the given figure.
Dendrites from Biological Neural Network represent inputs in Artificial Neural
Networks, cell nucleus represents Nodes, synapse represents Weights, and Axon
represents Output.
Dendrites → Inputs
Synapse → Weights
Axon → Output
There are around 100 billion neurons in the human brain. Each neuron forms
somewhere in the range of 1,000 to 100,000 connection points. In the human
brain, data is stored in a distributed manner, and we can extract more than one
piece of this data from memory in parallel when necessary. We can say that the
human brain is an incredibly powerful parallel processor.
Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the
programmer.
Hidden Layer:
The hidden layer sits between the input and output layers. It performs all the
calculations needed to find hidden features and patterns.
Output Layer:
The input goes through a series of transformations using the hidden layer, which
finally results in output that is conveyed using this layer.
The artificial neural network takes input, computes the weighted sum of the
inputs, and includes a bias. This computation is represented in the form of a
transfer function.
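As a minimal sketch of this computation (the function names here are illustrative, not from the source), the weighted sum plus bias and the transfer function might look like this in Python:

def neuron_output(inputs, weights, bias, transfer):
    # Weighted sum of the inputs plus the bias, passed through
    # the transfer (activation) function.
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return transfer(net)

# A simple step transfer function: fire (1) when the net input is positive.
step = lambda net: 1 if net > 0 else 0
print(neuron_output([0.5, 0.3], weights=[0.4, 0.7], bias=-0.2, transfer=step))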
Parallel processing capability: Artificial neural networks can perform more
than one task simultaneously.
Storing data on the entire network: Data is stored on the whole network, not in
a database. The disappearance of a few pieces of data in one place doesn't
prevent the network from working.
Capability to work with incomplete knowledge: After training, an ANN may
produce output even with incomplete data. The loss of performance here depends
on the importance of the missing data.
Having fault tolerance:
Corruption of one or more cells of the ANN does not prevent it from generating
output, and this feature makes the network fault-tolerant.
Unrecognized behavior of the network:
This is the most significant issue of ANN. When an ANN produces a solution, it
does not provide insight into why and how, which decreases trust in the
network.
Hardware dependence:
Artificial neural networks require processors with parallel processing power,
in accordance with their structure. The realization of the network is therefore
equipment-dependent.
Difficulty of showing the problem to the network:
ANNs can work only with numerical data. Problems must be translated into
numerical values before being introduced to the ANN. The representation
mechanism chosen here directly impacts the performance of the network, and it
depends on the user's ability.
The duration of the network is unknown:
The network is trained down to a specific value of the error, and this value
does not guarantee optimum results.
Artificial neural networks, a science that stepped into the world in the
mid-20th century, are developing rapidly. Above, we have reviewed the
advantages of artificial neural networks and the issues encountered in the
course of their use. It should not be overlooked that the disadvantages of
ANNs, a flourishing branch of science, are being eliminated one by one, while
their advantages grow day by day. This means that artificial neural networks
will progressively become an irreplaceable part of our lives.
If the weighted sum is equal to zero, then a bias is added to make the output
non-zero, or otherwise to scale up the system's response. The bias can be
thought of as an extra input fixed at 1 with its own weight. The total of the
weighted inputs can lie anywhere in the range from 0 to positive infinity.
Here, to keep the response within the limits of the desired value, a certain
maximum value is benchmarked, and the total of the weighted inputs is passed
through the activation function.
The activation function refers to the set of transfer functions used to achieve
the desired output. There are different kinds of activation functions, but they
are primarily either linear or non-linear sets of functions. Some of the
commonly used activation functions are the binary, linear, and tan-hyperbolic
sigmoidal activation functions. Let us take a look at each of them in detail:
Binary:
The output of a binary activation function is either 1 or 0. A threshold value
is set; if the net input is greater than the threshold, the output is 1,
otherwise it is 0.
Sigmoidal Hyperbolic:
The sigmoidal hyperbola function is generally seen as an "S"-shaped curve. Here
the tan-hyperbolic function is used to approximate output from the actual net
input. The function is defined as:
f(x) = tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
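These functions might be sketched in Python as follows (a minimal illustration using the standard definitions of the step, binary sigmoid, and tanh functions; the function names are illustrative):

import math

def binary_step(x, theta=0.0):
    # Binary activation: output 1 if the input exceeds the threshold.
    return 1 if x > theta else 0

def binary_sigmoid(x):
    # Binary sigmoid: squashes any input into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def tanh_sigmoid(x):
    # Tan-hyperbolic sigmoid: an "S"-shaped curve in the range (-1, 1).
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))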
Network Topology
A network topology is the arrangement of a network along with its nodes and
connecting lines. According to the topology, ANN can be classified as the
following kinds −
Feedforward Network
Multilayer feedforward network − A feedforward ANN having more than one
weighted layer. Because this network has one or more layers between the input
and the output layer, these are called hidden layers.
Feedback Network
As the name suggests, a feedback network has feedback paths, which means the
signal can flow in both directions using loops. This makes it a non-linear dynamic
system, which changes continuously until it reaches a state of equilibrium. It may
be divided into the following types −
Recurrent networks − They are feedback networks with closed loops.
Following are the two types of recurrent networks.
Fully recurrent network − It is the simplest neural network architecture
because all nodes are connected to all other nodes and each node works as
both input and output.
Jordan network − It is a closed loop network in which the output will go to
the input again as feedback as shown in the following diagram.
1. Every new technology needs assistance from previous ones, i.e., data from
earlier systems; these data are analyzed so that the pros and cons can be
studied correctly. All of this is possible only through the help of neural
networks.
4. Neural networks can be used in betting on horse races, sporting events, and
most importantly in the stock market.
5. They can be used to predict the correct judgement for any crime by using a
large set of crime details as input and the resulting sentences as output.
6. By analyzing data and determining which of the data has any fault (files
diverging from peers), called data mining, cleaning and validation can be
achieved through neural networks.
7. Neural networks can be used to predict targets with the help of echo
patterns obtained from sonar, radar, seismic, and magnetic instruments.
8. They can be used efficiently in employee hiring, so that a company can hire
the right employee depending upon the skills the employee has and what their
future productivity should be.
BIOLOGICAL NEURONS: Information from other neurons, in the form of electrical
impulses, enters the dendrites at connection points called synapses. The
information flows from the dendrites to the cell body, where it is processed.
The output signal, a train of impulses, is then sent down the axon to the
synapses of other neurons. A synapse can increase or decrease the strength of
the connection; this is where information is stored.
ARTIFICIAL NEURONS: The input layer receives signals and transmits them to the
neurons in the next layer, which is called a hidden layer. The hidden layer
extracts relevant features or patterns from the received signals. Those
features or patterns that are considered important are then directed to the
output layer, which is the final layer of the network. The artificial signals
are changed by weights, in a manner similar to the physical changes that occur
in the synapses.
Neurons
Biological neurons (also called nerve cells), or simply neurons, are the
fundamental units of the brain and nervous system: the cells responsible for
receiving sensory input from the external world via dendrites, processing it,
and giving the output through axons.
A biological Neuron
Cell body (Soma): The body of the neuron cell contains the nucleus and carries
out the biochemical transformations necessary to the life of neurons.
Dendrites: Each neuron has fine, hair-like tubular structures (extensions) around it.
They branch out into a tree around the cell body. They accept incoming signals.
Axon: It is a long, thin, tubular structure that works like a transmission line.
Dendrites receive input through the synapses of other neurons. The soma processes
these incoming signals over time and converts that processed value into an output,
which is sent out to other neurons through the axon and the synapses.
Perceptron
In the above figure, for one single observation, x0, x1, x2, x3 ... x(n)
represent various inputs (independent variables) to the network. Each of these
inputs is multiplied by a connection weight, or synapse. The weights are
represented as w0, w1, w2, w3 ... w(n). Weight shows the strength of a
particular node.
b is a bias value. A bias value allows you to shift the activation function up or
down.
In the simplest case, these products are summed, fed to a transfer function
(activation function) to generate a result, and this result is sent as output.
Multi-layer ANN
A fully connected multi-layer neural network is called a Multilayer Perceptron
(MLP).
It has 3 layers including one hidden layer. If it has more than 1 hidden layer, it is
called a deep ANN.
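A minimal sketch of the forward pass of such an MLP, assuming sigmoid activations and NumPy (the names V, W, and mlp_forward are illustrative, not from the source):

import numpy as np

def mlp_forward(x, V, b_hidden, W, b_out):
    # Forward pass of a 3-layer MLP: input -> one hidden layer -> output.
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    hidden = sigmoid(V @ x + b_hidden)   # hidden-layer activations
    return sigmoid(W @ hidden + b_out)   # output-layer activations

# Example: 3 inputs, 4 hidden units, 2 outputs, with random parameters.
rng = np.random.default_rng(0)
y = mlp_forward(rng.random(3),
                rng.random((4, 3)), rng.random(4),   # input-to-hidden
                rng.random((2, 4)), rng.random(2))   # hidden-to-output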
Notations
In the representation below:
a0(in) is simply the bias unit and is equal to 1; it will have the
corresponding weight w0
z(x) = w1x1 + w2x2 + w3x3 + w4x4 + b = wᵀx + b
The first layer contains a node for each value in our input feature vector.
These values are scaled by their corresponding weight, wi, and added together
along with a bias term, b. The bias term allows us to build linear models that
aren't fixed at the origin. The following image provides an example of why this
is important. Notice how we can provide a much better decision boundary for
logistic regression when our linear model isn't fixed at the origin.
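A tiny numeric illustration of the same point (weights chosen arbitrarily): without a bias, z(x) = wᵀx is always 0 at the origin, so the decision boundary z = 0 is pinned to the origin; a bias shifts it away.

# Without a bias, z(x) = w.x is 0 at the origin, so the decision
# boundary z = 0 must pass through the origin; a bias b shifts it.
w = [2.0, -1.0]

z_no_bias = lambda x: w[0] * x[0] + w[1] * x[1]
z_with_bias = lambda x, b=3.0: w[0] * x[0] + w[1] * x[1] + b

print(z_no_bias([0.0, 0.0]))    # 0.0 -> boundary pinned at the origin
print(z_with_bias([0.0, 0.0]))  # 3.0 -> boundary shifted away from it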
Let's try to visualize that.
The input nodes in our network visualization are all connected to a single
output node, which consists of a linear combination of all of the inputs. Each
connection between nodes contains a parameter, w, which is what we'll tune to
form an optimal model (tuning these parameters will be covered in a later
post). The final output is a functional composition, g(z(x)). When we pass the
linear combination of inputs through the logistic (also known as sigmoid)
function, the neural network community refers to this as activation.
This rule is an error-correcting supervised learning algorithm for single-layer
feedforward networks with a linear activation function, introduced by
Rosenblatt.
Basic Concept − As being supervised in nature, to calculate the error, there would
be a comparison between the desired/target output and the actual output. If there is
any difference found, then a change must be made to the weights of connection.
Mathematical Formulation − To explain its mathematical formulation, suppose we
have 'n' finite input vectors, x(n), along with their desired/target output
vectors t(n), where n = 1 to N.
Now the output 'y' can be calculated, as explained earlier, on the basis of the
net input, and the activation function applied over that net input can be
expressed as follows −
y = f(yin) = 1, if yin > θ
y = f(yin) = 0, if yin ⩽ θ
where θ is the threshold.
The updating of weight can be done in the following two cases −
Case I − when t ≠ y, then
w(new) = w(old) + tx
Case II − when t = y, then there is no change in weight.
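A minimal sketch of this update rule in Python (the helper name is illustrative):

def perceptron_rule_update(w, x, t, y):
    # Rosenblatt's rule: change the weights only when the target t
    # and the actual output y differ (Case I); otherwise leave them.
    if t != y:
        w = [wi + t * xi for wi, xi in zip(w, x)]
    return w

# Example: target 1, output 0, so each weight moves by +x_i.
print(perceptron_rule_update([0.2, -0.4], [1.0, 0.5], t=1, y=0))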
Introduced by Bernard Widrow and Marcian Hoff, and also called the Least Mean
Square (LMS) method, this rule minimizes the error over all training patterns.
It is a kind of supervised learning algorithm with a continuous activation
function.
Basic Concept − The base of this rule is the gradient-descent approach, which
continues until the error is minimized. The delta rule updates the synaptic
weights so as to minimize the difference between the net input to the output
unit and the target value.
Mathematical Formulation − To update the synaptic weights, delta rule is given
by
Δwi = α · xi · ej
where ej = (t − yin) is the difference between the desired/target output and
the actual net input yin.
The above delta rule is for a single output unit only.
The updating of weight can be done in the following two cases −
Case-I − when t ≠ y, then
w(new) = w(old) + Δw
Case-II − when t = y, then there is no change in weight.
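A minimal sketch of the delta rule update in Python (names illustrative):

def delta_rule_update(w, x, t, y_in, alpha=0.1):
    # Widrow-Hoff (LMS) rule: delta_w_i = alpha * x_i * (t - y_in).
    e = t - y_in                  # error between target and net input
    return [wi + alpha * xi * e for wi, xi in zip(w, x)]

# Example: net input 0.3 against target 1.0 nudges weights upward.
print(delta_rule_update([0.2, -0.4], [1.0, 0.5], t=1.0, y_in=0.3))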
Supervised Learning
As the name suggests, this type of learning is done under the supervision of a
teacher. The learning process is dependent on that supervision.
During the training of ANN under supervised learning, the input vector is
presented to the network, which will give an output vector. This output vector is
compared with the desired output vector. An error signal is generated, if there is a
difference between the actual output and the desired output vector. On the basis of
this error signal, the weights are adjusted until the actual output is matched with
the desired output.
Unsupervised Learning
As the name suggests, this type of learning is done without the supervision of a
teacher. This learning process is independent.
During the training of ANN under unsupervised learning, the input vectors of
similar type are combined to form clusters. When a new input pattern is applied,
then the neural network gives an output response indicating the class to which the
input pattern belongs.
There is no feedback from the environment as to what should be the desired
output and if it is correct or incorrect. Hence, in this type of learning, the network
itself must discover the patterns and features from the input data, and the relation
for the input data over the output.
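As one illustration of such unsupervised clustering, a simple winner-take-all (competitive learning) update might look like the sketch below; this particular scheme is a common convention, not something specified in the source.

import numpy as np

def competitive_update(weights, x, alpha=0.1):
    # Winner-take-all step: the unit whose weight vector is closest
    # to the input wins, and only its weights move toward the input.
    distances = np.linalg.norm(weights - x, axis=1)
    winner = int(np.argmin(distances))
    weights[winner] += alpha * (x - weights[winner])
    return winner

# Example: two cluster units, repeatedly shown 2-D inputs.
rng = np.random.default_rng(1)
weights = rng.random((2, 2))
for x in rng.random((100, 2)):
    competitive_update(weights, x)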
Reinforcement Learning
As the name suggests, this type of learning is used to reinforce or strengthen
the network based on some critic information. This learning process is similar
to supervised learning, but we may have much less information.
During the training of network under reinforcement learning, the network receives
some feedback from the environment. This makes it somewhat similar to
supervised learning. However, the feedback obtained here is evaluative not
instructive, which means there is no teacher as in supervised learning. After
receiving the feedback, the network performs adjustments of the weights to get
better critic information in future.
Activation Functions
An activation function may be defined as the extra force or effort applied over
the input to obtain an exact output. In ANNs, we can also apply activation
functions over the input to get the exact output. Following are some activation
functions of interest −
The sigmoid activation function is of two types, as follows −
Binary sigmoidal function − This activation function squashes the input
between 0 and 1. It is positive in nature and always bounded, which means its
output cannot be less than 0 or more than 1. It is also strictly increasing in
nature, which means the higher the input, the higher the output. It can be
defined as
F(x) = 1 / (1 + e^(−x))
Bipolar sigmoidal function − This activation function squashes the input
between −1 and 1, and can be defined as
F(x) = (1 − e^(−x)) / (1 + e^(−x))
Perceptron
Developed by Frank Rosenblatt by using McCulloch and Pitts model, perceptron is the
basic operational unit of artificial neural networks. It employs supervised learning rule
and is able to classify the data into two classes.
Operational characteristics of the perceptron: It consists of a single neuron
with an arbitrary number of inputs along with adjustable weights, but the
output of the neuron is 1 or 0 depending upon the threshold. It also includes a
bias whose weight is always 1. The following figure gives a schematic
representation of the perceptron.
Links − It has a set of connection links, each of which carries a weight,
including a bias link whose weight is always 1.
Adder − It adds the inputs after they are multiplied with their respective
weights.
Activation function − It limits the output of neuron. The most basic activation function is a
Heaviside step function that has two possible outputs. This function returns 1, if the input is
positive, and 0 for any negative input.
Training Algorithm
Perceptron network can be trained for single output unit as well as multiple output
units.
Step 1 − Initialize the following to start the training −
Weights
Bias
Learning rate α
For easy calculation and simplicity, weights and bias are set equal to 0 and
the learning rate is set equal to 1.
Step 2 − Continue step 3-8 when the stopping condition is not true.
Step 3 − Continue step 4-6 for every training vector x.
Step 4 − Activate each input unit as follows −
xi = si (i = 1 to n)
Step 5 − Now obtain the net input with the following relation −
yin = b + Σi xi wi
Step 6 − Apply the activation function to obtain the final output −
y = f(yin) = 1, if yin > θ; 0, otherwise
Step 7 − Adjust the weight and bias as follows −
Case 1 − if y ≠ t then,
wi(new) = wi(old) + αtxi
b(new) = b(old) + αt
Case 2 − if y = t then,
wi(new)=wi(old)
b(new)=b(old)
Here ‘y’ is the actual output and ‘t’ is the desired/target output.
Step 8 − Test for the stopping condition, which would happen when there is no change
in weight.
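A minimal Python sketch of steps 1-8 for a single output unit. One assumption differs from the text above: bipolar targets and outputs (±1) are used here, a common convention for this rule, so that the update w + αtx can both increase and decrease weights; with 0/1 targets the rule as written cannot correct false positives.

import numpy as np

def train_perceptron(X, T, alpha=1.0, theta=0.0, max_epochs=100):
    # Steps 1-8 for a single output unit: weights and bias start at 0;
    # training stops when a full epoch produces no weight change.
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        changed = False
        for x, t in zip(X, T):
            y_in = b + x @ w                 # Step 5: net input
            y = 1 if y_in > theta else -1    # Step 6: activation (bipolar)
            if y != t:                       # Step 7, Case 1
                w += alpha * t * x
                b += alpha * t
                changed = True
        if not changed:                      # Step 8: stopping condition
            break
    return w, b

# Example: learn the AND function with bipolar targets.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X, T)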
The following diagram is the architecture of perceptron for multiple output classes.
Step 1 − Initialize the following to start the training −
Weights
Bias
Learning rate α
For easy calculation and simplicity, weights and bias are set equal to 0 and
the learning rate is set equal to 1.
Step 2 − Continue step 3-8 when the stopping condition is not true.
Step 3 − Continue step 4-6 for every training vector x.
Step 4 − Activate each input unit as follows −
xi = si (i = 1 to n)
Step 5 − Obtain the net input with the following relation −
yinj = bj + Σi xi wij
Step 6 − Apply the activation function to obtain the final output for each
output unit j = 1 to m −
yj = f(yinj) = 1, if yinj > θ; 0, otherwise
Step 7 − Adjust the weight and bias for x = 1 to n and j = 1 to m as follows −
Case 1 − if yj ≠ tj then,
wij(new) = wij(old) + αtjxi
bj(new) = bj(old) + αtj
Case 2 − if yj = tj then,
wij(new)=wij(old)
bj(new)=bj(old)
Here ‘y’ is the actual output and ‘t’ is the desired/target output.
Step 8 − Test for the stopping condition, which will happen when there is no change in
weight.
Types of Learning
ANN Classification is an example of Supervised Learning. Known class labels
help indicate whether the system is performing correctly or not. This information
can be used to indicate a desired response, validate the accuracy of the system, or
be used to help the system learn to behave correctly. The known class labels can be
thought of as supervising the learning process; the term is not meant to imply that
you have some sort of interventionist role.
Clustering is an example of Unsupervised Learning where the class labels are
not presented to the system that is trying to discover the natural classes in a dataset.
Clustering often fails to find known classes because the distinction between the
classes can be obscured by the large number of features (genes) which are
uncorrelated with the classes. A step in ANN classification involves identifying
genes which are intimately connected to the known classes. This is called feature
selection or feature extraction. Feature selection and ANN classification together
have a use even when prediction of unknown samples is not necessary: They can
be used to identify key genes which are involved in whatever processes distinguish
the classes.
Manual Feature Selection
Manual feature selection is useful if you already have some hypothesis about
which genes are key to a process. You can test that hypothesis by:
i. constructing a gene list of those genes,
Feature Selection Using the SLAM™ Technology
Genes that are frequently observed in associations are often good features
for classification with artificial neural networks. In GeneLinker™, ANN
classification is done using a committee of artificial neural networks (ANNs).
ANNs are highly adaptable learning machines which can detect non-linear
relationships between the features and the sample classes. A committee of ANNs is
used because an individual ANN may not be robust. That is, it may not make good
predictions on new data (test data) despite excellent performance on the training
data. Such a neural network is referred to as being overtrained.
Each ANN (component neural network or learner) is by default trained on a
different 90% of the training data and then validated on the remaining 10%. (These
fractions can be set differently in the Create ANN Classifier dialog by varying the
number of component neural networks.) This technique mitigates the risk of
overtraining at the level of the individual component neural network.
Unit 2
The Hopfield neural network was invented by Dr. John J. Hopfield in 1982. It consists
of a single layer which contains one or more fully connected recurrent neurons.
The Hopfield network is commonly used for auto-association and optimization
tasks.
Architecture
Following are some important points to keep in mind about discrete Hopfield
network −
This model consists of neurons with one inverting and one non-inverting
output.
The output of each neuron should be the input of other neurons but not the
input of self.
Weight/connection strength is represented by wij.
Connections can be excitatory as well as inhibitory. A connection is
excitatory if the output of the neuron is the same as the input; otherwise it
is inhibitory.
Weights should be symmetrical, i.e. wij = wji
The output from Y1 going to Y2, Yi and Yn have the
weights w12, w1i and w1n respectively. Similarly, other arcs have the weights on
them.
Training Algorithm
During training, the weights of the discrete Hopfield network are obtained
from the stored patterns using the Hebbian principle: for bipolar input
patterns s(p), wij = Σp si(p) sj(p) for i ≠ j, and wii = 0.
Testing Algorithm
Step 1 − Initialize the weights, which are obtained from training algorithm by
using Hebbian principle.
Step 2 − Perform steps 3-9, as long as the activations of the network are not
consolidated.
Step 3 − For each input vector X, perform steps 4-8.
Step 4 − Make initial activation of the network equal to the external input
vector X as follows −
yi = xi for i = 1 to n
Step 5 − For each unit Yi, perform steps 6-9.
Step 6 − Calculate the net input of the network as follows −
yini = xi + Σj yj wji
Step 7 − Apply the activation as follows over the net input to calculate the
output −
yi = 1, if yini > θi; yi (unchanged), if yini = θi; 0, if yini < θi
The energy function of the discrete Hopfield network is defined as
Ef = −(1/2) Σi Σj yi yj wij − Σi xi yi + Σi θi yi
Condition − In a stable network, whenever the state of a node changes, the
above energy function will decrease.
Suppose node i has changed state from yi(k) to yi(k+1); then the energy change
ΔEf is given by the following relation
ΔEf = −(Σj yj wij + xi − θi)(yi(k+1) − yi(k))
The change in energy relies on the fact that only one unit can update its
activation at a time.
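A minimal sketch of a discrete Hopfield network with Hebbian training and asynchronous updates, assuming bipolar (±1) patterns; the function names are illustrative:

import numpy as np

def hopfield_train(patterns):
    # Hebbian training: w_ij = sum over patterns of s_i * s_j,
    # with symmetric weights and zero self-connections.
    W = patterns.T @ patterns
    np.fill_diagonal(W, 0)
    return W

def hopfield_recall(W, x, theta=0.0, sweeps=5, seed=0):
    # Asynchronous recall: update one randomly chosen unit at a time,
    # using the net input y_in_i = x_i + sum_j y_j * w_ji.
    y = x.copy()
    rng = np.random.default_rng(seed)
    for _ in range(sweeps * len(y)):
        i = rng.integers(len(y))
        y_in = x[i] + W[i] @ y
        if y_in > theta:
            y[i] = 1
        elif y_in < theta:
            y[i] = -1
        # if y_in equals theta, the state is left unchanged
    return y

# Store one bipolar pattern and recall it from a noisy copy.
p = np.array([1, -1, 1, -1, 1, -1])
W = hopfield_train(p[None, :])
noisy = p.copy()
noisy[0] = -noisy[0]              # flip one bit
print(hopfield_recall(W, noisy))  # converges back toward p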
Architecture
As shown in the diagram, the architecture of BPN has three interconnected layers
having weights on them. The hidden layer as well as the output layer also has bias,
whose weight is always 1, on them. As is clear from the diagram, the working of
BPN is in two phases. One phase sends the signal from the input layer to the
output layer, and the other phase back propagates the error from the output layer
to the input layer.
Training Algorithm
For training, BPN will use binary sigmoid activation function. The training of
BPN will have the following three phases.
Phase 1 − Feed Forward Phase
Phase 2 − Back Propagation of error
Phase 3 − Updating of weights
All these steps are described in the algorithm as follows
Step 1 − Initialize the following to start the training −
Weights
Learning rate α
For easy calculation and simplicity, take some small random values.
Step 2 − Continue step 3-11 when the stopping condition is not true.
Phase 1
Step 4 − Each input unit receives input signal xi and sends it to the hidden unit for
all i = 1 to n
Step 5 − Calculate the net input at the hidden unit using the following
relation −
Qinj = b0j + Σi xi vij (j = 1 to p)
Here b0j is the bias on the hidden unit, and vij is the weight on the j unit of
the hidden layer coming from the i unit of the input layer.
Now calculate the net output by applying the following activation function
Qj = f(Qinj)
Send these output signals of the hidden layer units to the output layer units.
Step 6 − Calculate the net input at the output layer unit using the following
relation −
yink = b0k + Σj Qj wjk (k = 1 to m)
Here b0k is the bias on the output unit, and wjk is the weight on the k unit of
the output layer coming from the j unit of the hidden layer.
Calculate the net output by applying the following activation function
yk = f(yink)
Phase 2
Step 7 − Compute the error correcting term, in correspondence with the target pattern
received at each output unit, as follows −
δk=(tk−yk)f′(yink)
On this basis, update the weight and bias as follows −
Δwjk = α δk Qj
Δb0k = α δk
Then, send δk back to the hidden layer.
Step 8 − Now each hidden unit sums its delta inputs from the output units,
δinj = Σk δk wjk
and computes its own error term
δj = δinj f′(Qinj)
On this basis, the weight and bias corrections are
Δvij = α δj xi
Δb0j = α δj
Phase 3
Step 9 − Each output unit (yk, k = 1 to m) updates the weight and bias as
follows −
wjk(new) = wjk(old) + Δwjk
b0k(new) = b0k(old) + Δb0k
Step 10 − Each hidden unit (zj, j = 1 to p) updates the weight and bias as
follows −
vij(new) = vij(old) + Δvij
b0j(new) = b0j(old) + Δb0j
Step 11 − Check for the stopping condition, which may be either the number of epochs
reached or the target output matches the actual output.
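A compact NumPy sketch of the three phases (feed forward, back propagation of error, weight update) for a BPN with binary sigmoid activations; initialization choices, hyperparameters, and function names here are illustrative assumptions:

import numpy as np

def sigmoid(z):
    # Binary sigmoid activation, f(z) = 1 / (1 + e^(-z)).
    return 1.0 / (1.0 + np.exp(-z))

def train_bpn(X, T, n_hidden=4, alpha=0.5, epochs=5000, seed=0):
    # V, b_j: input-to-hidden weights and biases.
    # W, b_k: hidden-to-output weights and biases.
    rng = np.random.default_rng(seed)
    V = rng.uniform(-0.5, 0.5, (X.shape[1], n_hidden))
    b_j = rng.uniform(-0.5, 0.5, n_hidden)
    W = rng.uniform(-0.5, 0.5, (n_hidden, T.shape[1]))
    b_k = rng.uniform(-0.5, 0.5, T.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, T):
            # Phase 1: feed forward.
            q = sigmoid(x @ V + b_j)          # hidden activations Q_j
            y = sigmoid(q @ W + b_k)          # output activations y_k
            # Phase 2: back propagation of error.
            # For the binary sigmoid, f'(z) = f(z) * (1 - f(z)).
            delta_k = (t - y) * y * (1 - y)
            delta_j = (delta_k @ W.T) * q * (1 - q)
            # Phase 3: updating of weights.
            W += alpha * np.outer(q, delta_k)
            b_k += alpha * delta_k
            V += alpha * np.outer(x, delta_j)
            b_j += alpha * delta_j
    return V, b_j, W, b_k

# Toy usage: learn XOR.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [0.]])
V, b_j, W, b_k = train_bpn(X, T)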
Mathematical Formulation
For the activation function yk = f(yink), the net inputs on the hidden layer
and on the output layer can be given by
yink = Σj zj wjk
and yinj = Σi xi vij