PPT-II NNFL
• In the late 1950s, Frank Rosenblatt introduced a network composed of units that were enhanced versions of the McCulloch-Pitts Threshold Logic Unit (TLU) model.
• The only efficient learning element at that time was for single-layered networks.
• Net input: yin = Σi xi wi + b  (i = 1 to n)
• Activation:
  o(x) =  1  if yin > θ
          0  if −θ ≤ yin ≤ θ
         −1  if yin < −θ
Perceptron Learning Rule
wi = wi + Δwi
Δwi = η (t − o) xi
where
• t = c(x) is the target value
• o is the perceptron output
• η is a small constant (e.g. 0.1) called the learning rate
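As an illustration, a minimal NumPy sketch of one application of this rule (the function names and the bias handling below are assumptions, not part of the slides):

import numpy as np

def perceptron_output(w, b, x, theta=0.0):
    # Threshold activation from above: returns 1, 0, or -1
    y_in = np.dot(w, x) + b
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0

def perceptron_update(w, b, x, t, eta=0.1):
    # One step of the rule: w_i <- w_i + eta * (t - o) * x_i
    o = perceptron_output(w, b, x)
    w = w + eta * (t - o) * np.asarray(x, dtype=float)
    b = b + eta * (t - o)   # bias treated as a weight on a constant input of 1 (assumption)
    return w, b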
Perceptron training flowchart (a training-loop sketch follows below):
• Start: initialize weights and bias.
• For each training pair s:t
  • Activate the input units: xi = si
  • Compute the output with the activation function:
    y =  1  if yin > θ
         0  if −θ ≤ yin ≤ θ
        −1  if yin < −θ
  • Weight updation (ΔW = αtx):
    if y = t, no weight updation
    if y ≠ t, Wnew = Wold + αtxi and bnew = bold + αt (bias update)
• Repeat until the stopping condition is satisfied.
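A minimal NumPy sketch of this training loop using the update rule from the flowchart; the AND data, alpha, theta, and the name train_perceptron are illustrative assumptions:

import numpy as np

def activation(y_in, theta):
    # 1 if y_in > theta, 0 if -theta <= y_in <= theta, -1 if y_in < -theta
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0

def train_perceptron(samples, targets, alpha=0.1, theta=0.2, epochs=50):
    w = np.zeros(samples.shape[1])       # initialize weights
    b = 0.0                              # initialize bias
    for _ in range(epochs):
        changed = False
        for x, t in zip(samples, targets):
            y = activation(np.dot(w, x) + b, theta)   # xi = si, then y = f(yin)
            if y != t:                   # update only when y differs from the target
                w = w + alpha * t * x    # Wnew = Wold + alpha * t * x
                b = b + alpha * t        # bnew = bold + alpha * t
                changed = True
        if not changed:                  # stopping condition: no weight change in an epoch
            break
    return w, b

# Bipolar AND function data (illustrative)
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
T = np.array([1, -1, -1, -1])
w, b = train_perceptron(X, T)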
LEARNING ALGORITHM
➢ Epoch : Presentation of the entire training set to the neural
network.
➢ Error: The error value is the amount by which the value output by
the network differs from the target value. For example, if we
required the network to output 0 and it outputs 1, then Error = -1.
➢ Target Value, T : When we are training a network, we present it not only with the input but also with the value we require the network to produce. For example, if we present the network with [1,1] for the AND function, the training value will be 1.
yinj = Σi xi wij + bj,  for j = 1 to m
yj = f(yinj) =  1  if yinj > θ
                0  if −θ ≤ yinj ≤ θ
               −1  if yinj < −θ
ADAPTIVE LINEAR NEURON (ADALINE)
In 1959, Bernard Widrow and Marcian Hoff of Stanford developed
models they called ADALINE (Adaptive Linear Neuron) and MADALINE
(Multilayer ADALINE). These models were named for their use of
Multiple ADAptive LINear Elements. MADALINE was the first neural network to be applied to a real-world problem: an adaptive filter that eliminates echoes on phone lines.
ADALINE MODEL
ADALINE LEARNING RULE
The ADALINE network uses the Delta Learning Rule. This rule is also called the Widrow-Hoff Learning Rule or the Least Mean Square (LMS) Rule. The delta rule for adjusting the weights is given as (i = 1 to n):
wi(new) = wi(old) + α (t − yin) xi
b(new) = b(old) + α (t − yin)
USING ADALINE NETWORKS
➢ Initialize
• Assign random weights to all links
➢ Training
• Feed in known inputs in random sequence
• Simulate the network
• Compute the error between the target and the output (error function)
• Adjust weights (learning function)
• Repeat until total error < ε
➢ Thinking
• Simulate the network
• The network will respond to any input
• Does not guarantee a correct solution, even for trained inputs
Algorithm Steps:
1. Initialize weights and bias (with small random values), set the learning rate α, etc.
2. While the stopping condition is false, perform steps 3-7.
3. For each input training pair s:t, do steps 4-6.
4. Set activations of the input units: xi = si for i = 1 to n.
5. Compute the net input to the output unit: yin = Σi xi wi + b.
6. Update weights and bias: wi(new) = wi(old) + α(t − yin)xi, b(new) = b(old) + α(t − yin).
7. Test for the stopping condition (e.g., the weight changes become sufficiently small, or the maximum number of iterations is reached).
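A minimal NumPy sketch of these steps (the AND data, learning rate, and tolerance below are illustrative assumptions):

import numpy as np

def train_adaline(samples, targets, alpha=0.1, tol=1e-3, max_epochs=100):
    rng = np.random.default_rng(0)
    w = rng.uniform(-0.5, 0.5, size=samples.shape[1])   # step 1: small random weights
    b = rng.uniform(-0.5, 0.5)
    for _ in range(max_epochs):                  # step 2: repeat while stopping condition is false
        max_change = 0.0
        for x, t in zip(samples, targets):       # steps 3-4: each training pair, xi = si
            y_in = np.dot(w, x) + b              # step 5: net input
            dw = alpha * (t - y_in) * x          # step 6: delta rule
            db = alpha * (t - y_in)
            w, b = w + dw, b + db
            max_change = max(max_change, np.max(np.abs(dw)), abs(db))
        if max_change < tol:                     # step 7: stop when weight changes are small
            break
    return w, b

# Bipolar AND function data (illustrative)
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
T = np.array([1, -1, -1, -1])
w, b = train_adaline(X, T)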
MADALINE NETWORK
MADALINE is a Multilayer Adaptive Linear Element. MADALINE was the first neural network to be applied to a real-world problem, and it is used in several adaptive filtering processes.
Refer to the textbook for the training algorithm of MADALINE and for testing the network on non-linear problems (e.g., the XOR function).
ASSOCIATIVE MEMORY NETWORKS
PATTERN ASSOCIATION
➢ Associating patterns which are
• similar,
• contrary,
• in close proximity (spatial),
• in close succession (temporal).
➢ Associative recall
W(p) = sᵀ(p) t(p) = (s1, s2, …, sn)ᵀ (t1, …, tm)

         s1t1 … s1tm       w11 … w1m
       =   …  …   …    =    …  …   …
         snt1 … sntm       wn1 … wnm

and
       P
W  =   Σ   sᵀ(p) t(p)
      p=1
HETERO-ASSOCIATIVE MEMORY NETWORK
• Binary pattern pairs s:t with |s| = 4 and |t| = 2.
• Total weighted input to output unit j:  y_inj = Σi xi wij
• Activation function (threshold):  yj = 1 if y_inj > 0,  yj = 0 if y_inj ≤ 0
• Weights are computed by the Hebbian rule (sum of outer products of all training pairs):
  wij = Σp=1..P si(p) tj(p)
• Training samples:
s(p) t(p)
p=1 (1 0 0 0) (1, 0)
p=2 (1 1 0 0) (1, 0)
p=3 (0 0 0 1) (0, 1)
p=4 (0 0 1 1) (0, 1)
COMPUTING THE WEIGHTS
(each outer product is written row by row, rows separated by semicolons)

sᵀ(1) t(1) = (1 0 0 0)ᵀ (1 0) = [1 0; 0 0; 0 0; 0 0]
sᵀ(2) t(2) = (1 1 0 0)ᵀ (1 0) = [1 0; 1 0; 0 0; 0 0]
sᵀ(3) t(3) = (0 0 0 1)ᵀ (0 1) = [0 0; 0 0; 0 0; 0 1]
sᵀ(4) t(4) = (0 0 1 1)ᵀ (0 1) = [0 0; 0 0; 0 1; 0 1]

        2 0
W   =   1 0
        0 1
        0 2
TEST/ RECALL THE NETWORK
x = (1 0 0 0):   x W = (2 0)  →  y1 = 1, y2 = 0
x = (0 1 1 0):   x W = (1 1)  →  y1 = 1, y2 = 1
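The same weight computation and recall in a short NumPy sketch (a reconstruction of the example above, not code from the slides):

import numpy as np

# Training pairs s(p) : t(p) from the example
S = np.array([[1, 0, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 1]])
T = np.array([[1, 0],
              [1, 0],
              [0, 1],
              [0, 1]])

W = S.T @ T                        # Hebbian rule: sum of outer products
print(W)                           # [[2 0] [1 0] [0 1] [0 2]]

def recall(x):
    y_in = x @ W                   # total weighted input to the output units
    return (y_in > 0).astype(int)  # threshold activation

print(recall(np.array([1, 0, 0, 0])))   # [1 0]
print(recall(np.array([0, 1, 1, 0])))   # [1 1]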
AUTO-ASSOCIATIVE MEMORY NETWORK
Storing a single bipolar pattern s = (1 1 1 −1) gives W = sᵀ s:

        1  1  1 −1
W   =   1  1  1 −1
        1  1  1 −1
       −1 −1 −1  1

training pattern   (1 1 1 −1) W = (4 4 4 −4)   → (1 1 1 −1)
noisy pattern      (−1 1 1 −1) W = (2 2 2 −2)  → (1 1 1 −1)
missing info       (0 0 1 −1) W = (2 2 2 −2)   → (1 1 1 −1)
more noisy         (−1 −1 1 −1) W = (0 0 0 0)  not recognized
AUTO-ASSOCIATIVE MEMORY NETWORK – DIAGONAL ELEMENTS
• Diagonal elements will dominate the computation when multiple patterns (P of them) are stored.
• When P is large, W is close to an identity matrix. This causes output = input, which may not be any stored pattern; the pattern-correction power is lost.
• Replace the diagonal elements by zero.
        0  1  1 −1
W0  =   1  0  1 −1
        1  1  0 −1
       −1 −1 −1  0

(1 1 1 −1) W0 = (3 3 3 −3)    → (1 1 1 −1)
(−1 1 1 −1) W0 = (3 1 1 −1)   → (1 1 1 −1)
(0 0 1 −1) W0 = (2 2 1 −1)    → (1 1 1 −1)
(−1 −1 1 −1) W0 = (1 1 −1 1)  → wrong
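A short NumPy sketch of this auto-associative recall with the diagonal zeroed (a reconstruction of the example; np.sign is used as the bipolar threshold, with 0 read as "undecided"):

import numpy as np

s = np.array([1, 1, 1, -1])
W = np.outer(s, s)                 # W = s^T s
W0 = W - np.diag(np.diag(W))       # replace diagonal elements by zero

def recall(x):
    return np.sign(x @ W0)         # bipolar threshold output

print(recall(np.array([1, 1, 1, -1])))    # [ 1  1  1 -1]
print(recall(np.array([-1, 1, 1, -1])))   # [ 1  1  1 -1]
print(recall(np.array([0, 0, 1, -1])))    # [ 1  1  1 -1]
print(recall(np.array([-1, -1, 1, -1])))  # [ 1  1 -1  1]  -> wrong pattern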
STORAGE CAPACITY
• Number of patterns that can be correctly stored & recalled by a
network.
• More patterns can be stored if they are not similar to each
other (e.g., orthogonal).
• Non-orthogonal: storing the two patterns (1 −1 −1 1) and (1 1 −1 1) gives

          0  0 −2  2
  W0  =   0  0  0  0
         −2  0  0 −2
          2  0 −2  0

  Recall: (1 −1 −1 1) W0 → (1 0 −1 1); the pattern is not stored correctly.
• Orthogonal: storing the three patterns (1 1 −1 −1), (−1 1 1 −1), and (−1 1 −1 1) gives

          0 −1 −1 −1
  W0  =  −1  0 −1 −1
         −1 −1  0 −1
         −1 −1 −1  0

  All three patterns can be correctly recalled.
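A quick NumPy check of this claim (illustrative; it verifies that the three patterns are mutually orthogonal and that each is recalled correctly from W0):

import numpy as np

patterns = np.array([[ 1, 1, -1, -1],
                     [-1, 1,  1, -1],
                     [-1, 1, -1,  1]])

print(patterns @ patterns.T)       # off-diagonal entries are 0, i.e. the patterns are orthogonal

W = patterns.T @ patterns          # sum of outer products
W0 = W - np.diag(np.diag(W))       # zero the diagonal

for s in patterns:
    print(np.sign(s @ W0))         # each stored pattern is recalled correctly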
BIDIRECTIONAL ASSOCIATIVE MEMORY (BAM) NETWORK
Architecture:
• Two layers of non-linear units: X-layer and Y-layer.
• Units: discrete threshold or continuous sigmoid (patterns can be either binary or bipolar).
Weights:
          P
Wn×m  =   Σ   sᵀ(p) t(p)    (Hebbian / outer product)
         p=1

Symmetric: wij = wji
Convert binary patterns to bipolar when constructing W.
RECALL OF BAM NETWORK
Bidirectional, either by X (to recall Y) or by Y (to recall X).
Recurrent:
y(t) = [ f(y_in1(t)), …, f(y_inm(t)) ]
                   n
where  y_inj(t) =  Σ  wij xi(t − 1)
                  i=1
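A minimal NumPy sketch of bidirectional recall under these equations (the bipolar training pairs and function names below are illustrative assumptions):

import numpy as np

# Illustrative bipolar pairs: s(p) has n = 4 components, t(p) has m = 2
S = np.array([[ 1,  1, -1, -1],
              [-1, -1,  1,  1]])
T = np.array([[ 1, -1],
              [-1,  1]])

W = S.T @ T                        # W = sum_p s^T(p) t(p)

def x_to_y(x):                     # recall Y from X
    return np.sign(x @ W)

def y_to_x(y):                     # recall X from Y (uses W^T)
    return np.sign(y @ W.T)

y = x_to_y(np.array([1, 1, -1, -1]))   # -> [ 1 -1], i.e. t(1)
x = y_to_x(y)                          # -> [ 1  1 -1 -1], i.e. s(1)
print(y, x)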
HOPFIELD NETWORK
• Weights: wij = wji,  wii = 0
• Storage capacity:
  Hopfield's observation:  P ≤ 0.15 n,  i.e.  P/n ≤ 0.15
  Theoretical analysis:    P ≤ n / (2 log2 n),  i.e.  P/n ≤ 1 / (2 log2 n)
• Internal activation ui, with  dui(t)/dt = Σj=1..n wij xj(t) + θi = neti(t)
Computation: all units change their output (states) at the same time,
based on states of all others.
• Compute the net input:  neti(t) = Σj=1..n wij xj(t) + θi
Example: x = (1, 1, 1, −1); the output units are threshold units.

        0  1  1 −1
W   =   1  0  1 −1
        1  1  0 −1
       −1 −1 −1  0
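A small NumPy sketch of synchronous recall for this example (illustrative; θi is taken as 0 and np.sign is used as the threshold):

import numpy as np

W = np.array([[ 0,  1,  1, -1],
              [ 1,  0,  1, -1],
              [ 1,  1,  0, -1],
              [-1, -1, -1,  0]])

def recall(x, steps=5):
    x = x.copy()
    for _ in range(steps):
        net = W @ x                      # net_i = sum_j w_ij x_j  (theta_i = 0 assumed)
        x_new = np.sign(net)             # threshold output units
        if np.array_equal(x_new, x):     # stable state reached
            return x
        x = x_new
    return x

print(recall(np.array([1, 1, 1, -1])))    # stays at ( 1  1  1 -1)
print(recall(np.array([-1, 1, 1, -1])))   # converges to ( 1  1  1 -1)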
• Auto-associative Network
• Hetero-associative Network
• Hopfield Nets