
Perceptron

• In the late 1950s, Frank Rosenblatt introduced a network composed of units that were an enhanced version of the McCulloch-Pitts Threshold Logic Unit (TLU) model.

• Rosenblatt's model of a neuron, the perceptron, was the result of a merger between two concepts from the 1940s: the McCulloch-Pitts model of an artificial neuron and the Hebbian learning rule for adjusting weights. In addition to the variable weight values, the perceptron model added an extra input that represents bias. Thus, the modified equation is now as follows:

  y_in = Σ_{i=1}^{n} w_i x_i + b

• The only efficient learning element at that time was for single-layered networks.

• Today, the term is used as a synonym for a single-layer feed-forward network.


Perceptron

• Linear threshold unit (LTU)

[Figure: inputs x_1, …, x_n with weights w_1, …, w_n and a bias input (value 1, weight b) feed a summing unit, followed by a threshold activation that produces the output o]

  y_in = Σ_{i=1}^{n} w_i x_i + b

           { 1   if y_in > θ
  o(x) =   { 0   if -θ ≤ y_in ≤ θ
           { -1  if y_in < -θ
Perceptron Learning Rule

  w_i = w_i + Δw_i
  Δw_i = α t x_i

where t = c(x) is the target value, o is the perceptron output, and α is a small constant (e.g. 0.1) called the learning rate.

• If the output is correct (t = o), the weights w_i are not changed.

• If the output is incorrect (t ≠ o), the weights w_i are changed such that the output of the perceptron for the new weights is closer to t.

• The algorithm converges to the correct classification
  • if the training data is linearly separable
  • and α is sufficiently small.
Flow chart of Perceptron training algorithm

[Flowchart: Start → Initialize weights → for each training pair s:t → activate the input units, x_i = s_i → compute the output response y_in = b + Σ x_i w_i → apply the activation function
  y = 1 if y_in > θ,  0 if -θ ≤ y_in ≤ θ,  -1 if y_in < -θ
→ weight updating (ΔW = α t x): if y = t, no weight update; if y ≠ t, W_new = W_old + α t x and b_new = b_old + α t → repeat until the stopping condition is met → Stop]
LEARNING ALGORITHM
➢ Epoch : Presentation of the entire training set to the neural
network.

➢ In the case of the AND function, an epoch consists of four sets of


inputs being presented to the network (i.e. [0,0], [0,1], [1,0],
[1,1]).

➢ Error: The error value is the amount by which the value output by
the network differs from the target value. For example, if we
required the network to output 0 and it outputs 1, then Error = -1.
➢ Target Value, T : When we are training a network we not only
present it with the input but also with a value that we require the
network to produce. For example, if we present the network with
[1,1] for the AND function, the training value will be 1.

➢ Output , O : The output value from the neuron.

➢ Ij : Inputs being presented to the neuron.

➢ Wj : Weight from input neuron (Ij) to the output neuron.

➢ LR : The learning rate. This dictates how quickly the network converges. It is set by experimentation; a typical value is 0.1.
TRAINING ALGORITHM
➢ Adjust neural network weights to map inputs to outputs.
➢ Use a set of sample patterns where the desired output (given the
inputs presented) is known.
➢ The purpose is to learn to
• Recognize features which are common to good and bad
exemplars
Steps:
1. Initialize weights and biases, set learning rate, etc.
2. When stopping condition is false, perform 3-7
3. For each input training pair, do steps 4-6
4. Set activation for input units, xi = si for i=1 to n
5. Compute the activation output of each output unit

   y_in_j = Σ_i w_ij x_i + b_j   for j = 1 to m

                       { 1   if y_in_j > θ
   y_j = f(y_in_j) =   { 0   if -θ ≤ y_in_j ≤ θ
                       { -1  if y_in_j < -θ

6. The weights and bias are updated for j = 1 to m and i = 1 to n:

   If y_j ≠ t_j and x_i ≠ 0, then
     w_ij(new) = w_ij(old) + α t_j x_i
     b_j(new) = b_j(old) + α t_j
   Else (y_j = t_j)
     w_ij(new) = w_ij(old)
     b_j(new) = b_j(old)

7. Test for stopping condition.


AND Function Using a Perceptron Network
(bipolar inputs and targets, α = 1)

 Inputs          Net   Out  Tgt   Weight changes     Weights
 X1   X2   b     Yin    Y    T    ΔW1  ΔW2   Δb     W1   W2    b
                                                    (0)  (0)  (0)
  1    1   1      0     0    1      1    1    1      1    1    1
  1   -1   1      1     1   -1     -1    1   -1      0    2    0
 -1    1   1      2     1   -1      1   -1   -1      1    1   -1
 -1   -1   1     -3    -1   -1      0    0    0      1    1   -1
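The epoch traced in the table can be reproduced with a short NumPy sketch. This is a minimal illustration, not the textbook's code; the helper `activate`, the choice θ = 0, and the stopping rule (stop once an epoch makes no weight change) are assumptions made here.

```python
import numpy as np

# Bipolar AND training set, as in the table above
X = np.array([[ 1,  1],
              [ 1, -1],
              [-1,  1],
              [-1, -1]])
T = np.array([1, -1, -1, -1])

alpha, theta = 1.0, 0.0   # learning rate and threshold (theta = 0 assumed)
w = np.zeros(2)           # weights W1, W2
b = 0.0                   # bias

def activate(y_in):
    """Three-level threshold activation used in the slides."""
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0

for epoch in range(10):
    changed = False
    for x, t in zip(X, T):
        y_in = b + np.dot(w, x)     # net input
        y = activate(y_in)
        if y != t:                  # update weights only on misclassification
            w = w + alpha * t * x
            b = b + alpha * t
            changed = True
    if not changed:                 # a full epoch with no updates: training is done
        break

print(w, b)                         # expected: [1. 1.] -1.0, matching the last table row
```

The first epoch reproduces the four rows of the table; a second epoch makes no changes, so training stops.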
ADAPTIVE LINEAR NEURON (ADALINE)
In 1959, Bernard Widrow and Marcian Hoff of Stanford developed
models they called ADALINE (Adaptive Linear Neuron) and MADALINE
(Multilayer ADALINE). These models were named for their use of
Multiple ADAptive LINear Elements. MADALINE was the first neural
network to be applied to a real world problem. It is an adaptive filter
which eliminates echoes on phone lines.
ADALINE MODEL
ADALINE LEARNING RULE

The Adaline network uses the Delta Learning Rule. This rule is also called the Widrow-Hoff Learning Rule or the Least Mean Square (LMS) Rule. The delta rule for adjusting the weights is given as (i = 1 to n):

  w_i(new) = w_i(old) + α (t - y_in) x_i
  b(new) = b(old) + α (t - y_in)
USING ADALINE NETWORKS

➢ Initialize
  • Assign random weights to all links

➢ Training
  • Feed in known inputs in random sequence
  • Simulate the network
  • Compute the error between the target and the output (error function)
  • Adjust the weights (learning function)
  • Repeat until the total error < ε

➢ Thinking
  • Simulate the network
  • The network will respond to any input
  • A correct solution is not guaranteed, even for trained inputs
Algorithm Steps:
1. Initialize weights and biases (with small random values), set the learning rate, etc.
2. While the stopping condition is false, perform steps 3-7.
3. For each input training pair, do steps 4-6.
4. Set activation for the input units, x_i = s_i for i = 1 to n.
5. Compute the net input to the output unit:

   y_in = Σ_i w_i x_i + b

   then apply the activation function to obtain the output y:

   y = f(y_in) =  1  if y_in ≥ 0
                 -1  if y_in < 0

6. The weights and bias are updated for i = 1 to n:

   w_i(new) = w_i(old) + α (t - y_in) x_i
   b(new) = b(old) + α (t - y_in)

7. Test for the stopping condition (e.g. the weight changes become sufficiently small, or a maximum number of iterations is reached).
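A minimal sketch of steps 1-7 above (assumptions made here: NumPy, the bipolar AND data from the perceptron example reused as a small training set, and a stopping test based on the change in total squared error between epochs):

```python
import numpy as np

# Bipolar AND data, reused here as a small illustrative training set
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
T = np.array([1, -1, -1, -1], dtype=float)

rng = np.random.default_rng(0)
w = rng.uniform(-0.1, 0.1, size=2)   # step 1: small random weights
b = rng.uniform(-0.1, 0.1)
alpha = 0.1
prev_error = np.inf

for epoch in range(100):             # step 2: loop until the stopping condition holds
    total_error = 0.0
    for x, t in zip(X, T):           # steps 3-4: one training pair at a time
        y_in = b + np.dot(w, x)      # step 5: linear net input
        err = t - y_in               # the delta rule uses the net input, not the thresholded output
        w += alpha * err * x         # step 6: weight and bias updates
        b += alpha * err
        total_error += err ** 2
    if abs(prev_error - total_error) < 1e-6:   # step 7: the error has stopped changing
        break
    prev_error = total_error

y = np.where(b + X @ w >= 0, 1, -1)    # bipolar activation applied only at recall time
print(np.round(w, 3), round(b, 3), y)  # w, b settle near the least-squares fit; y matches T
```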
MADALINE NETWORK
MADALINE is a Multilayer Adaptive Linear Element network. MADALINE was the first neural network to be applied to a real-world problem. It is used in several adaptive filtering processes.

Refer to the textbook for the training algorithm of MADALINE and for testing the network on non-linearly separable problems (e.g. the XOR function).
ASSOCIATIVE MEMORY
NETWORKS
PATTERN ASSOCIATION
➢ Associating patterns which are

• similar,
• contrary,
• in close proximity (spatial),
• in close succession (temporal).

➢ Associative recall

• evoke associated patterns,


• recall a pattern by part of it,
• evoke/recall with incomplete/noisy patterns.
ASSOCIATIVE MEMORY (AM) NETWORK
➢ Two types of associations exist. For two patterns s and t

• hetero-association (s != t): relating two different patterns (s –


input, t – target).
• auto-association (s = t): relating parts of a pattern with other
parts.

➢ Architectures of NN associative memory:

  • single layer (with or without an input layer),
  • two layers (for bidirectional association)

➢ Learning algorithms for AM:


• Hebbian learning rule and its variations,
• gradient descent.
ASSOCIATIVE MEMORY NETWORK
➢ WORKING PROCESS

• Recall a stored pattern by a noisy input pattern.

• Using the weights that capture the association.

• Stored patterns are viewed as “attractors”, each has its


“attraction basin”.

• This type of NN is often called an “associative memory” (recall is by association, not by explicit indexing/addressing).
TRAINING ALGORITHM FOR ASSOCIATIVE
MEMORY NETWORK
➢ Network structure: single layer

• one output layer of non-linear units and one input layer.

[Figure: input units x_1, …, x_n receive the training input s_1, …, s_n; they are connected through weights w_11, …, w_nm to output units y_1, …, y_m with targets t_1, …, t_m]

➢ Goal of learning:

  • to obtain a set of weights w_ij from a set of training pattern pairs {s:t} such that when s is applied to the input layer, t is computed at the output layer,
  • for all training pairs s:t, t_j = f(s^T w_j) for all j.
HEBB RULE FOR PATTERN ASSOCIATION
➢ Algorithm (bipolar or binary patterns):

  • For each training sample s:t:  Δw_ij = s_i · t_j

  • w_ij increases if both s_i and t_j are ON (binary) or have the same sign (bipolar).

  • Starting from w_ij = 0, after updates for all P training patterns:

      w_ij = Σ_{p=1}^{P} s_i(p) t_j(p),   W = {w_ij}

  • Instead of obtaining W by iterative updates, it can be computed from the training set by calculating the outer product of s and t.
OUTER PRODUCT FOR PATTERN ASSOCIATION

Let s and t be row vectors. Then, for a particular training pair s:t,

  W(p) = s^T(p) · t(p) =

  [ s_1 ]                 [ s_1 t_1 … s_1 t_m ]   [ w_11 … w_1m ]
  [  ⋮  ] (t_1, …, t_m) = [         ⋮         ] = [       ⋮      ]
  [ s_n ]                 [ s_n t_1 … s_n t_m ]   [ w_n1 … w_nm ]

and

  W = Σ_{p=1}^{P} s^T(p) · t(p)
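A quick check, assuming NumPy and an arbitrary illustrative pattern pair, that accumulating the element-wise updates Δw_ij = s_i t_j gives the same W as the outer product:

```python
import numpy as np

s = np.array([1, -1, 1])       # an arbitrary bipolar input pattern (illustrative only)
t = np.array([-1, 1])          # an arbitrary bipolar target pattern

# Element-by-element Hebbian updates, starting from w_ij = 0
W_iter = np.zeros((len(s), len(t)))
for i in range(len(s)):
    for j in range(len(t)):
        W_iter[i, j] += s[i] * t[j]

# The same matrix in one step, via the outer product
W_outer = np.outer(s, t)
print(np.array_equal(W_iter, W_outer))   # True
```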
HETERO-ASSOCIATIVE MEMORY NETWORK
• Binary pattern pairs s:t with |s| = 4 and |t| = 2.
• Total weighted input to the output units:  y_in_j = Σ_i x_i w_ij
• Activation function: threshold
    y_j = 1 if y_in_j > 0
          0 if y_in_j ≤ 0
• Weights are computed by the Hebbian rule (sum of the outer products of all training pairs):
    W = Σ_{p=1}^{P} s^T(p) t(p)
• Training samples:

        s(p)          t(p)
  p=1   (1 0 0 0)     (1 0)
  p=2   (1 1 0 0)     (1 0)
  p=3   (0 0 0 1)     (0 1)
  p=4   (0 0 1 1)     (0 1)
COMPUTING THE WEIGHTS

  s^T(1) · t(1) = (1 0 0 0)^T (1 0) =
      [1 0]
      [0 0]
      [0 0]
      [0 0]

  s^T(2) · t(2) = (1 1 0 0)^T (1 0) =
      [1 0]
      [1 0]
      [0 0]
      [0 0]

  s^T(3) · t(3) = (0 0 0 1)^T (0 1) =
      [0 0]
      [0 0]
      [0 0]
      [0 1]

  s^T(4) · t(4) = (0 0 1 1)^T (0 1) =
      [0 0]
      [0 0]
      [0 1]
      [0 1]

  W = s^T(1)t(1) + s^T(2)t(2) + s^T(3)t(3) + s^T(4)t(4) =
      [2 0]
      [1 0]
      [0 1]
      [0 2]
TEST/ RECALL THE NETWORK
x = (1 0 0 0):
  (1 0 0 0) W = (2 0)  →  y_1 = 1, y_2 = 0    (class (1, 0))

x = (0 1 1 0):
  (0 1 1 0) W = (1 1)  →  y_1 = 1, y_2 = 1    (not sufficiently similar to any class)

x = (0 1 0 0):  similar to s(1) = (1 0 0 0) and s(2) = (1 1 0 0), which belong to class (1, 0)
  (0 1 0 0) W = (1 0)  →  y_1 = 1, y_2 = 0

Training classes:  (1 0 0 0), (1 1 0 0) → class (1, 0);   (0 0 0 1), (0 0 1 1) → class (0, 1)
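The weight matrix and the three recall cases above can be checked with a short NumPy sketch (the helper name `recall` and its threshold argument are illustrative choices):

```python
import numpy as np

# Training pairs from the example above (binary patterns)
S = np.array([[1, 0, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 1]])
T = np.array([[1, 0],
              [1, 0],
              [0, 1],
              [0, 1]])

# Hebbian weights: sum of the outer products s^T(p) t(p)
W = sum(np.outer(s, t) for s, t in zip(S, T))
print(W)                                # [[2 0] [1 0] [0 1] [0 2]]

def recall(x, theta=0):
    """Binary threshold recall: y_j = 1 if y_in_j > theta, else 0."""
    return (x @ W > theta).astype(int)

print(recall(np.array([1, 0, 0, 0])))   # [1 0] -> class (1, 0)
print(recall(np.array([0, 1, 0, 0])))   # [1 0] -> recalled as the (1, 0) class
print(recall(np.array([0, 1, 1, 0])))   # [1 1] -> not clearly in either class
```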
AUTO-ASSOCIATIVE MEMORY NETWORK
• Same as hetero-associative nets, except t(p) = s(p).
• Used to recall a pattern from its noisy or incomplete version (pattern completion / pattern recovery).
• Example: a single pattern s = (1, 1, 1, -1) is stored (weights computed by the Hebbian / outer product rule):

  W = [ 1  1  1 -1]
      [ 1  1  1 -1]
      [ 1  1  1 -1]
      [-1 -1 -1  1]

  training pattern:  (1 1 1 -1) W = (4 4 4 -4)   → (1 1 1 -1)
  noisy pattern:     (-1 1 1 -1) W = (2 2 2 -2)  → (1 1 1 -1)
  missing info:      (0 0 1 -1) W = (2 2 2 -2)   → (1 1 1 -1)
  more noisy:        (-1 -1 1 -1) W = (0 0 0 0)     not recognized
AUTO-ASSOCIATIVE MEMORY NETWORK –
DIAGONAL ELEMENTS
• Diagonal elements will dominate the computation when multiple patterns are stored (the diagonal entries equal P).
• When P is large, W is close to an identity matrix. This causes output = input, which may not be any stored pattern. The pattern-correction power is lost.
• Remedy: replace the diagonal elements by zero.

  W_0 = [ 0  1  1 -1]
        [ 1  0  1 -1]
        [ 1  1  0 -1]
        [-1 -1 -1  0]

  (1 1 1 -1) W_0 = (3 3 3 -3)    → (1 1 1 -1)
  (-1 1 1 -1) W_0 = (3 1 1 -1)   → (1 1 1 -1)
  (0 0 1 -1) W_0 = (2 2 1 -1)    → (1 1 1 -1)
  (-1 -1 1 -1) W_0 = (1 1 -1 1)  → wrong
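A minimal sketch of this auto-associative example with the zero-diagonal correction (assumes NumPy; np.sign, with sign(0) = 0, stands in for the three-level threshold):

```python
import numpy as np

s = np.array([1, 1, 1, -1])
W = np.outer(s, s)              # Hebbian/outer-product weights for the single stored pattern
W0 = W.copy()
np.fill_diagonal(W0, 0)         # zero the diagonal to curb the identity-like self-term

def recall(x, W):
    return np.sign(x @ W).astype(int)   # sign(0) = 0, like the three-level threshold

tests = [[ 1,  1, 1, -1],       # training pattern
         [-1,  1, 1, -1],       # noisy pattern
         [ 0,  0, 1, -1],       # missing information
         [-1, -1, 1, -1]]       # more noisy
for x in tests:
    print(x, '->', recall(np.array(x), W0))
# The first three recover (1, 1, 1, -1); the last yields (1, 1, -1, 1), i.e. a wrong recall.
```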
STORAGE CAPACITY
• The number of patterns that can be correctly stored and recalled by a network.
• More patterns can be stored if they are not similar to each other (e.g., orthogonal).

• Non-orthogonal: storing (1 -1 -1 1) and (1 1 -1 1) gives

  W_0 = [ 0  0 -2  2]
        [ 0  0  0  0]
        [-2  0  0 -2]
        [ 2  0 -2  0]

  (1 -1 -1 1) W_0 = (4 0 -4 4) → (1 0 -1 1): the pattern is not stored correctly.

• Orthogonal: storing (1 1 -1 -1), (-1 1 1 -1) and (-1 1 -1 1) gives

  W_0 = [ 0 -1 -1 -1]
        [-1  0 -1 -1]
        [-1 -1  0 -1]
        [-1 -1 -1  0]

  All three patterns can be correctly recalled.
BIDIRECTIONAL ASSOCIATIVE MEMORY (BAM)
NETWORK
Architecture:
• Two layers of non-linear units: X-layer, Y-layer.
• Units: discrete threshold or continuous sigmoid (patterns can be either binary or bipolar).

Weights (Hebbian / outer product):

  W_{n×m} = Σ_{p=1}^{P} s^T(p) · t(p)

Symmetric connections: the same weight w_ij is used in both directions (the Y-to-X pass uses W^T).
Convert binary patterns to bipolar when constructing W.
RECALL OF BAM NETWORK
Bidirectional, either by X (to recall Y) or by Y (to recall X).
Recurrent:

  y(t) = [ f(y_in_1(t)), …, f(y_in_m(t)) ],   where  y_in_j(t) = Σ_{i=1}^{n} w_ij x_i(t-1)

  x(t+1) = [ f(x_in_1(t+1)), …, f(x_in_n(t+1)) ],   where  x_in_i(t+1) = Σ_{j=1}^{m} w_ij y_j(t)

Update can be either asynchronous (as in hetero-associative memory)


or synchronous (change all Y units at one time, then all X units the
next time).
ACTIVATION FUNCTIONS IN BAM
The activation function depends on whether the input-target vector pairs used are binary or bipolar.

Activation function for the Y-layer:
  • With binary input vectors:   y_j = 1 if y_in_j > 0;  y_j (unchanged) if y_in_j = 0;  0 if y_in_j < 0
  • With bipolar input vectors:  y_j = 1 if y_in_j > θ_j;  y_j (unchanged) if y_in_j = θ_j;  -1 if y_in_j < θ_j

Activation function for the X-layer:
  • With binary input vectors:   x_i = 1 if x_in_i > 0;  x_i (unchanged) if x_in_i = 0;  0 if x_in_i < 0
  • With bipolar input vectors:  x_i = 1 if x_in_i > θ_i;  x_i (unchanged) if x_in_i = θ_i;  -1 if x_in_i < θ_i
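A minimal BAM recall sketch under the bipolar convention described above (the two pattern pairs are made up for illustration; a unit whose net input is zero keeps its previous value, as the activation rules require):

```python
import numpy as np

# Two illustrative bipolar pattern pairs (not taken from the slides)
S = np.array([[ 1, -1,  1, -1],
              [-1, -1,  1,  1]])
T = np.array([[ 1, -1],
              [-1,  1]])

W = sum(np.outer(s, t) for s, t in zip(S, T))   # Hebbian/outer-product weights, n x m

def bam_step(prev, net):
    """Bipolar activation: +1 above zero, -1 below zero, keep the previous value at zero."""
    out = prev.copy()
    out[net > 0] = 1
    out[net < 0] = -1
    return out

# Recall in both directions, starting from a stored X-layer pattern
x = S[0].copy()
y = np.zeros(2, dtype=int)
for _ in range(5):              # iterate X -> Y -> X until the states settle
    y = bam_step(y, x @ W)      # Y-layer update uses W
    x = bam_step(x, y @ W.T)    # X-layer update uses the transpose of W
print(x, y)                     # expected: S[0] and T[0]
```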


DISCRETE HOPFIELD NETWORK (DHN)
➢ A single-layer network
  • each node acts as both an input and an output unit.

➢ More than an associative memory: the Discrete Hopfield Network can be used in several applications,
  • for example combinatorial optimization.

➢ Different forms: discrete & continuous.

➢ Major contribution of John Hopfield to NN:


• Treating a network as a dynamic system.
• Introduction of energy function into Neural Network Research.
ARCHITECTURE OF DHN
➢ Architecture

• Single-layer (units serve as both input and output):


✓ nodes are threshold units (binary or bipolar).
✓ weights: fully connected, symmetric, and zero diagonal.

  w_ij = w_ji,   w_ii = 0

x_i are external inputs, which may be transient or permanent.
STORAGE CAPACITY OF DHN
P: the maximum number of random patterns of dimension n that can be stored in a DHN of n nodes.

  Hopfield's observation:   P ≈ 0.15 n,   i.e.  P/n ≈ 0.15

  Theoretical analysis:     P ≈ n / (2 log₂ n),   i.e.  P/n ≈ 1 / (2 log₂ n)

P/n decreases because a larger n leads to more interference between stored patterns.

Some work modifies the Hopfield memory to increase its capacity to close to n; there, W is trained rather than computed by the Hebbian rule.
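For a sense of scale, both estimates can be evaluated for a few values of n (a small illustrative calculation, not from the slides):

```python
import math

for n in (100, 1000, 10000):
    hopfield = 0.15 * n                    # Hopfield's empirical estimate
    theoretical = n / (2 * math.log2(n))   # theoretical estimate
    print(n, round(hopfield), round(theoretical, 1))
# Theoretical P is about 7.5, 50.2, 376.3, so P/n falls from about 0.075 to about 0.038.
```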
CONTINUOUS HOPFIELD NET
➢ Architecture

  • Continuous node output, and continuous time.
  • Fully connected with symmetric weights:  w_ij = w_ji,  w_ii = 0.
  • Internal activation u_i, with

      du_i(t)/dt = Σ_{j=1}^{n} w_ij x_j(t) + θ_i = net_i(t)

  • Output (state):  x_i(t) = f(u_i(t))

    where f is a sigmoid function ensuring binary/bipolar output, e.g. for bipolar output the hyperbolic tangent:

      f(x) = tanh(x) = (e^x - e^{-x}) / (e^x + e^{-x})
CONTINUOUS HOPFIELD NET

Computation: all units change their outputs (states) at the same time, based on the states of all the others.

• Compute the net input:

    net_i(t) = Σ_{j=1}^{n} w_ij x_j(t) + θ_i

• Compute the internal activation u_i(t) by a first-order Taylor expansion:

    u_i(t + δ) = ∫ net_i(t) dt ≈ u_i(t) + (du_i(t)/dt) · δ + … = u_i(t) + net_i · δ

• Compute the output:

    x_i(t) = f(u_i(t))
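A sketch of this synchronous update loop (the network size, the random symmetric weights, and the step size δ are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, delta = 4, 0.01                 # small network, small time step

W = rng.standard_normal((n, n))
W = (W + W.T) / 2                  # symmetric weights
np.fill_diagonal(W, 0)             # zero diagonal
theta = np.zeros(n)                # external inputs / biases

u = rng.standard_normal(n)         # internal activations u_i
x = np.tanh(u)                     # outputs x_i = f(u_i), bipolar via tanh

for _ in range(1000):
    net = W @ x + theta            # net_i(t) = sum_j w_ij x_j(t) + theta_i
    u = u + delta * net            # u_i(t + delta) ~= u_i(t) + net_i * delta
    x = np.tanh(u)                 # new outputs

print(np.round(x, 3))              # the states typically saturate toward a stable +/-1 pattern
```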
ITERATIVE ASSOCIATIVE MEMORY NETWORK
Example

  x = (1, 1, 1, -1),    W = [ 0  1  1 -1]      (output units are threshold units)
                            [ 1  0  1 -1]
                            [ 1  1  0 -1]
                            [-1 -1 -1  0]

An incomplete recall input:  x' = (1, 0, 0, 0)

  W x'  = (0, 1, 1, -1) = x''
  W x'' = (3, 2, 2, -2) → (1, 1, 1, -1) = x

In general: using current output as input of the next iteration

x(0) = initial recall input


x(I) = S(Wx(I-1)), I = 1, 2, ……
until x(N) = x(K) for some K < N
Dynamic System: State vector x(I)
If K = N-1, x(N) is a stable state (fixed point)

f(Wx(N)) = f(Wx(N-1)) = x(N)

If x(K) is one of the stored pattern, then x(K) is called a genuine


memory

Otherwise, x(K) is a spurious memory (caused by cross-


talk/interference between genuine memories)

Each fixed point (genuine or spurious memory) is an attractor


(with different attraction basin)
If K ≠ N-1, the network is in a limit cycle:

  it will repeat x(K), x(K+1), …, x(N) = x(K) as the iteration continues.

The iteration will eventually stop (a state repeats) because the total number of distinct states is finite (3^n) when threshold units are used. If the patterns are continuous, the system may continue to evolve forever (chaos) if no such K exists.
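The iteration x(I) = S(W x(I-1)) applied to the example at the start of this section can be sketched as follows (assumes NumPy; the loop stops when a previously visited state reappears):

```python
import numpy as np

W = np.array([[ 0,  1,  1, -1],
              [ 1,  0,  1, -1],
              [ 1,  1,  0, -1],
              [-1, -1, -1,  0]])

def S(v):
    return np.sign(v).astype(int)       # threshold output units, sign(0) = 0

x = np.array([1, 0, 0, 0])              # incomplete recall input x'
seen = [tuple(x)]
while True:
    x = S(W @ x)                        # x(I) = S(W x(I-1))
    if tuple(x) in seen:                # a repeated state: fixed point or limit cycle
        break
    seen.append(tuple(x))

print(seen[1:], '->', tuple(x))
# Iterates: (0, 1, 1, -1), then (1, 1, 1, -1), which repeats -- a fixed point equal to the stored pattern.
```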
SUMMARY
This chapter discussed the various associative networks:

•Autoassociative Network

•Hetero-associative Network

•Bidirectional Associative Memory Network

•Hopfield Nets

•Iterative Associative Nets
