
POLYTECHNIC UNIVERSITY

Department of Computer and Information Science

Iterative Autoassociative Net: Bidirectional Associative Memory

K. Ming Leung

Abstract: Iterative associative neural networks are introduced. In particular, the Bidirectional Associative Memory net is discussed.


Copyright © 2000 mleung@poly.edu


Last Revision Date: March 21, 2006
Table of Contents
1. Motivation
2. Bidirectional Associative Memory (BAM)
2.1. BAM to Associate Letters with Simple Bipolar codes
2.2. BAM with Noisy Input
2.3. BAM with Biases
3. Remarks
3.1. Hamming Distance and Correlation Encoding
3.2. Erasing a Stored Association
3.3. Updating Strategies
3.4. Convergence of a BAM and a Lyapunov Function
3.5. Storage Capacity

1. Motivation
After storing a given pattern, an autoassociative net may still not respond to an input signal immediately with a stored pattern, but the response may be close enough to a stored pattern to suggest using it as input to the net again. In a sense, we are connecting the output units back to the input units. This results in a recurrent autoassociative net.
To illustrate the basic idea behind such a recurrent net, we consider the following simple example.
Suppose we want to store the single pattern s = (1, 1, 1, −1). The Hebb rule gives the weight matrix
$$
W = s^T s = \begin{bmatrix} 1 & 1 & 1 & -1 \\ 1 & 1 & 1 & -1 \\ 1 & 1 & 1 & -1 \\ -1 & -1 & -1 & 1 \end{bmatrix}
\quad\text{or}\quad
W = \begin{bmatrix} 0 & 1 & 1 & -1 \\ 1 & 0 & 1 & -1 \\ 1 & 1 & 0 & -1 \\ -1 & -1 & -1 & 0 \end{bmatrix}
$$
if the NN has no self connections.
We first consider the case where the NN has no self connections.
The original pattern is of course stored successfully by the rule. Now
suppose we input a pattern with 3 missing values, like x = (1, 0, 0, 0). We find
$$
y_{in} = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} 0 & 1 & 1 & -1 \\ 1 & 0 & 1 & -1 \\ 1 & 1 & 0 & -1 \\ -1 & -1 & -1 & 0 \end{bmatrix}
= \begin{bmatrix} 0 & 1 & 1 & -1 \end{bmatrix},
$$
and so y = (0, 1, 1, −1), which is close to but not the same as the stored pattern (1, 1, 1, −1). Then we feed this output pattern into the NN as input. This gives
$$
y_{in} = \begin{bmatrix} 0 & 1 & 1 & -1 \end{bmatrix}
\begin{bmatrix} 0 & 1 & 1 & -1 \\ 1 & 0 & 1 & -1 \\ 1 & 1 & 0 & -1 \\ -1 & -1 & -1 & 0 \end{bmatrix}
= \begin{bmatrix} 3 & 2 & 2 & -2 \end{bmatrix},
$$
and so y = (1, 1, 1, −1), which is precisely the stored pattern.
On the other hand, if the NN is allowed to have self connections, it is easy to see that the inputs (1, 0, 0, 0), (0, 1, 0, 0), and (0, 0, 1, 0) all produce the correct output immediately. However, the input (0, 0, 0, 1) converges to the negative of the stored pattern, (−1, −1, −1, 1), in one step. So the NN behaves better if self connections are eliminated.
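The recall procedure just described is easy to reproduce numerically. The following is a minimal sketch (not part of the original notes) in Python/NumPy; the helper names hebb_weights and iterative_recall are our own, and zero net input is mapped to 0 here, exactly as in the worked example above.

```python
import numpy as np

def hebb_weights(s, self_connections=False):
    """Hebb-rule weight matrix for a single stored bipolar pattern s."""
    W = np.outer(s, s)
    if not self_connections:
        np.fill_diagonal(W, 0)   # remove self connections, as recommended above
    return W

def iterative_recall(x, W, max_iter=10):
    """Feed the output back in as input until the activation stops changing."""
    x = np.asarray(x)
    for _ in range(max_iter):
        y = np.sign(x @ W)       # sign of the net input; zero stays zero
        if np.array_equal(y, x):
            break
        x = y
    return x

s = np.array([1, 1, 1, -1])
W = hebb_weights(s)                         # the zero-diagonal matrix above
print(iterative_recall([1, 0, 0, 0], W))    # [ 1  1  1 -1] after two passes
```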

2. Bidirectional Associative Memory (BAM)


Bidirectional associative memory (BAM), developed by Kosko in 1988
and 1992, has the following features:
1. Its architecture consists of two layers: an X layer containing N
neurons and a Y layer containing M neurons.
2. The transfer function is taken to be
$$
f_{BAM}(y_{in,j}) = \begin{cases} 1 & \text{if } y_{in,j} > 0, \\ y_j \;(\text{previous value}) & \text{if } y_{in,j} = 0, \\ -1 & \text{if } y_{in,j} < 0. \end{cases}
$$
Notice that if the net input into a neuron is exactly 0, then its activation is kept the same as its previous value.


3. A set of input and target patterns s(q) : t(q), q = 1, 2, . . . , Q, is given, where the input vectors s(q) have N components,
$$
s^{(q)} = \begin{bmatrix} s^{(q)}_1 & s^{(q)}_2 & \dots & s^{(q)}_N \end{bmatrix},
$$
and their targets t(q) have M components,
$$
t^{(q)} = \begin{bmatrix} t^{(q)}_1 & t^{(q)}_2 & \dots & t^{(q)}_M \end{bmatrix}.
$$
4. These patterns are stored using the Hebb rule to give an N × M weight matrix W, so that
$$
w_{ij} = \sum_{q=1}^{Q} s^{(q)}_i t^{(q)}_j.
$$
Biases are not used at all.


5. Inputs can be applied to either layer.
6. Signals are sent back and forth between the 2 layers until all neurons reach equilibrium (all activations remain the same for 2 successive steps).

7. The Hebb rule gives a weight matrix W whose elements are
$$
w_{ij} = \sum_{q=1}^{Q} s^{(q)}_i t^{(q)}_j.
$$
W is used as the weight matrix for signals sent from the X layer to the Y layer, and W^T is used as the weight matrix for signals sent from the Y layer to the X layer. (A short code sketch of this rule is given right after this list.)
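As a concrete illustration of items 2, 4, and 7 above, here is a brief sketch (ours, not the textbook's) assuming the stored patterns are stacked as the rows of NumPy arrays S (Q × N) and T (Q × M):

```python
import numpy as np

def bam_weights(S, T):
    """Hebb rule: w_ij = sum_q s_i^(q) t_j^(q), i.e. W = S^T T, an N x M matrix."""
    return S.T @ T

def f_bam(net, previous):
    """BAM transfer function: +1/-1 according to the sign of the net input;
    the previous activation is kept wherever the net input is exactly 0."""
    return np.where(net > 0, 1, np.where(net < 0, -1, previous))
```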
The BAM algorithm is:


1 Use the Hebb rule to store a set of Q associated vector pairs.
2 For a given input test vector x, do steps a - c.
   a Set the activations of the X layer to the input test pattern.
   b Set the activations of the Y layer to 0.
   c While the activations have not converged, do steps i - iii.
      i Update the Y-layer activations:
$$
y_{in,j} = \sum_{i=1}^{N} x_i w_{ij}, \qquad y_j = f_{BAM}(y_{in,j}).
$$
      ii Update the X-layer activations:
$$
x_{in,i} = \sum_{j=1}^{M} w_{ij} y_j, \qquad x_i = f_{BAM}(x_{in,i}).
$$
      iii Test for convergence of the activations in both layers.


The above algorithm assumes that the test pattern is input into
the X-layer. We can also input a test pattern into the Y-layer. In
that case we need to exchange the roles of the X- and the Y-layers.
If we adopt matrix notation and represent the input vector as a row vector (a 1 × N matrix), then the net input to the Y-layer is written as y_in = xW, since W is an N × M matrix. Similarly, the net input to the X-layer is written as x_in = yW^T.
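Putting the pieces together, a sketch of the iteration in this matrix form might look as follows (continuing the previous code sketch; bam_recall is our own name):

```python
def bam_recall(x0, W, max_iter=100):
    """Iterate x -> y -> x until both layers stop changing.
    The test pattern enters at the X layer; the Y layer starts at 0 (step 2b)."""
    x = np.asarray(x0)
    y = np.zeros(W.shape[1], dtype=int)
    for _ in range(max_iter):
        y_new = f_bam(x @ W, y)         # y_in = x W
        x_new = f_bam(y_new @ W.T, x)   # x_in = y W^T
        if np.array_equal(x_new, x) and np.array_equal(y_new, y):
            break
        x, y = x_new, y_new
    return x, y
```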

2.1. BAM to Associate Letters with Simple Bipolar Codes

We want to use a BAM to associate the letters A and C with simple bipolar codes:
.#.
#.#
###
#.#
#.#
 
with t(1) = (−1, 1), and

.##
#..
#..
#..
.##
 
with t(2) = (1, 1). For each of the 2 letters, we replace "." with −1 and "#" with 1, and concatenate the bits row-wise to form a bipolar row vector. Thus we have
$$
s^{(1)} = \begin{bmatrix} -1 & 1 & -1 & 1 & -1 & 1 & 1 & 1 & 1 & 1 & -1 & 1 & 1 & -1 & 1 \end{bmatrix},
$$
$$
s^{(2)} = \begin{bmatrix} -1 & 1 & 1 & 1 & -1 & -1 & 1 & -1 & -1 & 1 & -1 & -1 & -1 & 1 & 1 \end{bmatrix}.
$$
Notice that these 2 vectors have a squared magnitude of N = 15 and are as uncorrelated as they can be, since s(1) · s(2) = 1 and N is odd. The Hebb rule gives the weight matrix
$$
W = s^{(1)T} \begin{bmatrix} -1 & 1 \end{bmatrix} + s^{(2)T} \begin{bmatrix} 1 & 1 \end{bmatrix}
= \begin{bmatrix} s^{(2)T} - s^{(1)T} & s^{(2)T} + s^{(1)T} \end{bmatrix}.
$$

In general, given 2 bipolar vectors s(1) and s(2), s(2) + s(1) is a non-bipolar vector whose elements are 0 where the corresponding elements of s(1) and s(2) differ, and twice the value in s(2) where they are the same. Similarly, s(2) − s(1) is a non-bipolar vector whose elements are 0 where the corresponding elements of s(1) and s(2) are the same, and twice the value in s(2) where they differ.


Explicitly, we have
$$
W = \begin{bmatrix}
0 & -2 \\
0 & 2 \\
2 & 0 \\
0 & 2 \\
0 & -2 \\
-2 & 0 \\
0 & 2 \\
-2 & 0 \\
-2 & 0 \\
0 & 2 \\
0 & -2 \\
-2 & 0 \\
-2 & 0 \\
2 & 0 \\
0 & 2
\end{bmatrix}.
$$
First we check to see if the NN gives the correct output at the
Y-layer if pattern A or C is input to the X-layer. We find that with

x = s(1),
$$
y_{in} = s^{(1)} W
= \begin{bmatrix} s^{(1)}(s^{(2)T} - s^{(1)T}) & s^{(1)}(s^{(2)T} + s^{(1)T}) \end{bmatrix}
= \begin{bmatrix} 1 - 15 & 15 + 1 \end{bmatrix}
= \begin{bmatrix} -14 & 16 \end{bmatrix}
\;\Rightarrow\; y = \begin{bmatrix} -1 & 1 \end{bmatrix},
$$
and with x = s(2),
$$
y_{in} = s^{(2)} W
= \begin{bmatrix} s^{(2)}(s^{(2)T} - s^{(1)T}) & s^{(2)}(s^{(2)T} + s^{(1)T}) \end{bmatrix}
= \begin{bmatrix} 15 - 1 & 1 + 15 \end{bmatrix}
= \begin{bmatrix} 14 & 16 \end{bmatrix}
\;\Rightarrow\; y = \begin{bmatrix} 1 & 1 \end{bmatrix}.
$$
These are the correct results at the Y-layer. Notice that the above results are independent of the actual initial activations in the Y-layer, since none of the net input signals happens to be 0.
In the opposite direction, we want to input the targets at the Y-layer and see if the correct responses are obtained at the X-layer. We find that with y = t(1),
$$
x_{in} = t^{(1)} W^T
= \begin{bmatrix} -1 & 1 \end{bmatrix}
\begin{bmatrix} s^{(2)} - s^{(1)} \\ s^{(2)} + s^{(1)} \end{bmatrix}
= -s^{(2)} + s^{(1)} + s^{(1)} + s^{(2)} = 2 s^{(1)}
\;\Rightarrow\; x = s^{(1)},
$$

and with y = t(2),
$$
x_{in} = t^{(2)} W^T
= \begin{bmatrix} 1 & 1 \end{bmatrix}
\begin{bmatrix} s^{(2)} - s^{(1)} \\ s^{(2)} + s^{(1)} \end{bmatrix}
= s^{(2)} - s^{(1)} + s^{(1)} + s^{(2)} = 2 s^{(2)}
\;\Rightarrow\; x = s^{(2)}.
$$
These are the correct activations at the X-layer. Notice that the above results are independent of the actual initial activations in the X-layer, since none of the net input signals happens to be 0.
In either of the cases above, iteration converges after just one back
and forth passage of signals.
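The whole worked example of this subsection can be checked in a few lines (again our own verification sketch, continuing the code above):

```python
def bipolar(rows):
    """Convert '#'/'.' bitmap rows into a bipolar row vector."""
    return np.array([1 if c == '#' else -1 for c in ''.join(rows)])

A = bipolar([".#.", "#.#", "###", "#.#", "#.#"])   # s(1)
C = bipolar([".##", "#..", "#..", "#..", ".##"])   # s(2)
S = np.vstack([A, C])
T = np.array([[-1, 1],     # t(1)
              [ 1, 1]])    # t(2)

W = bam_weights(S, T)      # the 15 x 2 matrix displayed above
print(A @ W)               # [-14  16]  ->  y = (-1, 1)
print(C @ W)               # [ 14  16]  ->  y = ( 1, 1)
print(np.array_equal(np.sign(T[0] @ W.T), A))   # True: t(1) recalls s(1)
print(np.array_equal(np.sign(T[1] @ W.T), C))   # True: t(2) recalls s(2)
```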

2.2. BAM with Noisy Input


Next let us test the BAM with noisy input. Suppose the activation on Y is (0, 1), and nothing is known about the activation on X, so we put
$$
x = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}.
$$


We find that
$$
x_{in} = \begin{bmatrix} 0 & 1 \end{bmatrix} W^T
= \begin{bmatrix} 0 & 1 \end{bmatrix}
\begin{bmatrix} s^{(2)} - s^{(1)} \\ s^{(2)} + s^{(1)} \end{bmatrix}
= s^{(2)} + s^{(1)}.
$$
From what we said about the sum of any 2 bipolar vectors, we see that the activation on X is
$$
x = f_{BAM}(x_{in}) = \tfrac{1}{2}\left( s^{(2)} + s^{(1)} \right).
$$
This signal has to be sent back to the Y-layer to get
$$
y_{in} = \tfrac{1}{2}\left( s^{(2)} + s^{(1)} \right)
\begin{bmatrix} s^{(2)T} - s^{(1)T} & s^{(2)T} + s^{(1)T} \end{bmatrix}
= \tfrac{1}{2}\begin{bmatrix} 0 & s^{(2)} s^{(2)T} + s^{(1)} s^{(1)T} + 2\, s^{(2)} s^{(1)T} \end{bmatrix}
= \begin{bmatrix} 0 & N + 1 \end{bmatrix}.
$$
This gives
$$
y = \begin{bmatrix} 0 & 1 \end{bmatrix},
$$
which is exactly the input on Y that we started with. So the BAM has converged, but sadly to a spurious stable state. This is certainly not a surprising result at all, since with knowledge of only the second


element of y, there is no way to expect the NN (or humans) to tell which of the 2 characters we are referring to.
So far we have followed the textbook closely. However, there is a very troubling aspect of this NN: even with knowledge of the first element of y, it still cannot tell which of the 2 characters we are referring to, whereas we humans would have no problem with that at all. To see this, we repeat the above calculation, this time with the activation on Y given by (1, 0). We find that
$$
x_{in} = \begin{bmatrix} 1 & 0 \end{bmatrix} W^T
= \begin{bmatrix} 1 & 0 \end{bmatrix}
\begin{bmatrix} s^{(2)} - s^{(1)} \\ s^{(2)} + s^{(1)} \end{bmatrix}
= s^{(2)} - s^{(1)}.
$$
From what we said about the difference of any 2 bipolar vectors, we see that the activation on X is
$$
x = f_{BAM}(x_{in}) = \tfrac{1}{2}\left( s^{(2)} - s^{(1)} \right).
$$


This signal has to be sent back to the Y-layer to get
$$
y_{in} = \tfrac{1}{2}\left( s^{(2)} - s^{(1)} \right)
\begin{bmatrix} s^{(2)T} - s^{(1)T} & s^{(2)T} + s^{(1)T} \end{bmatrix}
= \tfrac{1}{2}\begin{bmatrix} s^{(2)} s^{(2)T} + s^{(1)} s^{(1)T} - 2\, s^{(2)} s^{(1)T} & 0 \end{bmatrix}
= \begin{bmatrix} N - 1 & 0 \end{bmatrix}.
$$
This gives
$$
y = \begin{bmatrix} 1 & 0 \end{bmatrix},
$$
which is exactly the input on Y that we started with. So with this input, the BAM has again converged, but sadly to a spurious stable state. However, given that the first element of y is 1, even without any information on the second element we should know that we are referring to the character 'C' and not 'A'. There is something about this NN that is not satisfactory. The main problem is that biases were not used; it turns out that they are important for its proper operation.
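Both spurious fixed points described in this subsection can be reproduced with the earlier sketch; the only change needed is to let the test pattern enter at the Y layer (a sketch of ours, with f_bam and W as defined in the previous code):

```python
def bam_recall_from_y(y0, W, max_iter=100):
    """Same bidirectional iteration, but the test pattern enters at the Y layer."""
    y = np.asarray(y0)
    x = np.zeros(W.shape[0], dtype=int)
    for _ in range(max_iter):
        x_new = f_bam(y @ W.T, x)
        y_new = f_bam(x_new @ W, y)
        if np.array_equal(x_new, x) and np.array_equal(y_new, y):
            break
        x, y = x_new, y_new
    return x, y

print(bam_recall_from_y([0, 1], W)[1])   # [0 1] : the spurious state above
print(bam_recall_from_y([1, 0], W)[1])   # [1 0] : spurious, although 'C' is implied
```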


2.3. BAM with Biases

As before, the Hebb rule gives a weight matrix W whose elements are
$$
w_{ij} = \sum_{q=1}^{Q} s^{(q)}_i t^{(q)}_j, \qquad i = 1, 2, \dots, N, \quad j = 1, 2, \dots, M.
$$
W is used as the weight matrix for signals sent from the X layer to the Y layer, and W^T is used as the weight matrix for signals sent from the Y layer to the X layer. The biases for the Y-layer are given by
$$
b^{(Y)}_j = \sum_{q=1}^{Q} t^{(q)}_j, \qquad j = 1, 2, \dots, M.
$$
Since signals are also sent back to the X-layer, there are biases for the X-layer as well. These are given by
$$
b^{(X)}_i = \sum_{q=1}^{Q} s^{(q)}_i, \qquad i = 1, 2, \dots, N.
$$
The algorithm for the BAM with biases is:


1 Use the Hebb rule to store a set of Q associated vector pairs.
2 For a given input test vector x, do steps a - c.
   a Set the activations of the X layer to the input test pattern.
   b Set the activations of the Y layer to 0.
   c While the activations have not converged, do steps i - iii.
      i Update the Y-layer activations:
$$
y_{in,j} = \sum_{i=1}^{N} x_i w_{ij} + b^{(Y)}_j, \qquad y_j = f_{BAM}(y_{in,j}).
$$
      ii Update the X-layer activations:
$$
x_{in,i} = \sum_{j=1}^{M} w_{ij} y_j + b^{(X)}_i, \qquad x_i = f_{BAM}(x_{in,i}).
$$
      iii Test for convergence of the activations in both layers.

Again, the above algorithm assumes that the test pattern is input into the X-layer. We can also input a test pattern into the Y-layer; in that case we need to exchange the roles of the X- and Y-layers.
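The change relative to the earlier unbiased sketch is small: compute the two bias vectors from the stored patterns and add them to the net inputs. The following is our own illustration (f_bam, S, and T as in the previous sketches):

```python
def bam_with_biases(S, T):
    """Hebb-rule weights plus the two bias vectors."""
    W = S.T @ T
    b_X = S.sum(axis=0)    # b_i^(X) = sum_q s_i^(q)
    b_Y = T.sum(axis=0)    # b_j^(Y) = sum_q t_j^(q)
    return W, b_X, b_Y

def biased_recall_from_y(y0, W, b_X, b_Y, x0=None, max_iter=100):
    """Bidirectional iteration with biases; the test pattern enters at the Y layer."""
    y = np.asarray(y0)
    x = np.zeros(W.shape[0], dtype=int) if x0 is None else np.asarray(x0)
    for _ in range(max_iter):
        x_new = f_bam(y @ W.T + b_X, x)
        y_new = f_bam(x_new @ W + b_Y, y)
        if np.array_equal(x_new, x) and np.array_equal(y_new, y):
            break
        x, y = x_new, y_new
    return x, y
```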
We now revisit the character association example, but this time with biases. We find that
$$
b^{(Y)} = \begin{bmatrix} 0 & 2 \end{bmatrix}, \qquad b^{(X)} = s^{(2)} + s^{(1)}.
$$

Again we first check to see if the NN gives the correct output at the Y-layer if pattern A or C is input to the X-layer. We find that with x = s(1),
$$
y_{in} = s^{(1)} W + b^{(Y)}
= \begin{bmatrix} -14 & 16 \end{bmatrix} + \begin{bmatrix} 0 & 2 \end{bmatrix}
= \begin{bmatrix} -14 & 18 \end{bmatrix},
$$
and therefore y = (−1, 1); and with x = s(2),
$$
y_{in} = s^{(2)} W + b^{(Y)}
= \begin{bmatrix} 14 & 16 \end{bmatrix} + \begin{bmatrix} 0 & 2 \end{bmatrix}
= \begin{bmatrix} 14 & 18 \end{bmatrix},
$$
and therefore y = (1, 1). These are the intended associations. Notice that the above results are independent of the actual initial activations in the Y-layer, since none of the net input signals happens to be 0.


In the opposite direction, we want to input the targets at the Y-layer and see if the correct responses are obtained at the X-layer. We find that with y = t(1),
$$
x_{in} = t^{(1)} W^T + b^{(X)}
= \begin{bmatrix} -1 & 1 \end{bmatrix}
\begin{bmatrix} s^{(2)} - s^{(1)} \\ s^{(2)} + s^{(1)} \end{bmatrix}
+ s^{(2)} + s^{(1)}
= 3 s^{(1)} + s^{(2)},
$$
and so x = s(1). Similarly, with y = t(2),
$$
x_{in} = t^{(2)} W^T + b^{(X)}
= \begin{bmatrix} 1 & 1 \end{bmatrix}
\begin{bmatrix} s^{(2)} - s^{(1)} \\ s^{(2)} + s^{(1)} \end{bmatrix}
+ s^{(2)} + s^{(1)}
= 3 s^{(2)} + s^{(1)},
$$
and so x = s(2). These are the correct activations at the X-layer. Notice that the above results are independent of the actual initial activations in the X-layer, since none of the net input signals happens to be 0.
In either of the cases above, the iteration converges after just one back-and-forth passage of signals.


Next, we test the BAM with noisy input at the Y-layer with activation (0, 1), and no knowledge about the activations on X, so they are set to 0. We find that
$$
x_{in} = \begin{bmatrix} 0 & 1 \end{bmatrix} W^T + b^{(X)}
= \begin{bmatrix} 0 & 1 \end{bmatrix}
\begin{bmatrix} s^{(2)} - s^{(1)} \\ s^{(2)} + s^{(1)} \end{bmatrix}
+ s^{(2)} + s^{(1)}
= 2\left( s^{(2)} + s^{(1)} \right).
$$
The activation on X is therefore given by
$$
x = f_{BAM}(x_{in}) = \tfrac{1}{2}\left( s^{(2)} + s^{(1)} \right).
$$
This signal has to be sent back to the Y-layer to get
$$
y_{in} = \tfrac{1}{2}\left( s^{(2)} + s^{(1)} \right)
\begin{bmatrix} s^{(2)T} - s^{(1)T} & s^{(2)T} + s^{(1)T} \end{bmatrix} + b^{(Y)}
= \tfrac{1}{2}\begin{bmatrix} 0 & s^{(2)} s^{(2)T} + s^{(1)} s^{(1)T} + 2\, s^{(2)} s^{(1)T} \end{bmatrix} + \begin{bmatrix} 0 & 2 \end{bmatrix}
= \begin{bmatrix} 0 & N + 3 \end{bmatrix}.
$$
This gives
$$
y = \begin{bmatrix} 0 & 1 \end{bmatrix},
$$

which is exactly the input on Y that we started with. So the BAM has converged, but sadly to a spurious stable state. This is certainly not a surprising result at all, since with knowledge of only the second element of y there is no way to expect the NN (or humans) to tell which of the 2 characters we are referring to.
However, if we start off with appropriate partial knowledge of the activation at the X-layer, the intended association can be made. For example, with noisy input at the Y-layer with activation (0, 1), and an initial activation at the X-layer
$$
x = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix},
$$
the output at the X-layer is then given by
$$
x = \begin{bmatrix} -1 & 1 & 0 & 1 & -1 & 1 & 1 & 0 & 0 & 1 & -1 & 0 & 0 & 0 & 1 \end{bmatrix},
$$
which is neither s(1) nor s(2). However, with this as the input to the X-layer, the output at the Y-layer is (−1, 1). With this as the input to the Y-layer, the output at the X-layer is then s(1). This in turn produces at the Y-layer the output (−1, 1). The iteration clearly converges to s(1) at the X-layer and to t(1) = (−1, 1) at the Y-layer.

Are the results reasonable? From the initial input at the Y-layer we cannot tell whether we are referring to an 'A' or a 'C'. However, the input vectors s(1) and s(2) differ in the 3rd, 6th, 8th, 9th, 12th, 13th, and 14th positions. The only other piece of information given is that the 6th neuron at the X-layer is initially "on", which strongly suggests that it is the s(1) : t(1) association that we want, and this is indeed what we obtain.
Next we repeat the above calculation with the activation on Y given by (1, 0). This is the case where the previous BAM without biases failed to produce a reasonable output. Now we find that
$$
x_{in} = \begin{bmatrix} 1 & 0 \end{bmatrix} W^T + b^{(X)}
= \begin{bmatrix} 1 & 0 \end{bmatrix}
\begin{bmatrix} s^{(2)} - s^{(1)} \\ s^{(2)} + s^{(1)} \end{bmatrix}
+ s^{(2)} + s^{(1)}
= 2 s^{(2)}.
$$
We see that the activation on X is
$$
x = f_{BAM}(x_{in}) = s^{(2)}.
$$
This signal has to be sent back to the Y-layer. However, for this input at the X-layer, we already know that the activation at the Y-layer is
$$
y = \begin{bmatrix} 1 & 1 \end{bmatrix},
$$

which is exactly t(2). So the BAM has converged, and this time to the correct result: given that the first element of y is 1, even without any information on the second element we should know that we are referring to the character 'C' and not 'A'. Our new BAM with biases performs well, thus showing the importance of including biases in the net.
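Both cases just discussed are easy to confirm with the biased sketch above (assuming the letter patterns A and C and the helpers defined earlier):

```python
W, b_X, b_Y = bam_with_biases(S, T)

# Ambiguous Y input (0, 1), but the 6th X neuron is known to be "on".
x0 = np.zeros(15, dtype=int)
x0[5] = 1                                  # 6th position (0-based index 5)
x, y = biased_recall_from_y([0, 1], W, b_X, b_Y, x0=x0)
print(np.array_equal(x, A), y)             # True [-1  1] : the s(1):t(1) association

# Y input (1, 0) and no X information: the biased BAM now recovers 'C'.
x, y = biased_recall_from_y([1, 0], W, b_X, b_Y)
print(np.array_equal(x, C), y)             # True [1 1]
```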

3. Remarks
3.1. Hamming Distance and Correlation Encoding
For any 2 binary or bipolar vectors, x(1) and x(2) , each of length
N , the Hamming distance between the vectors is defined to be the
number of corresponding bits that are different between the vectors,
and it is denoted by H(x(1) , x(2) ). It is related to the dot-product
between these vectors. Let us define A(x(1) , x(2) ) to be the number of
corresponding bits in x(1) and x(2) that agree with each other. Then
it is clear that for bipolar vectors
x(1) · x(2) = A(x(1) , x(2) ) − H(x(1) , x(2) ),


because the corresponding bits are either the same, and so they con-
tribute 1 to the dot-product, or they are different and so contribute
a −1. A corresponding pair of bits must either agree or disagree, and
so
N = A(x(1) , x(2) ) + H(x(1) , x(2) ).
Eliminating A(x(1), x(2)) from these 2 expressions and solving for H(x(1), x(2)) gives the normalized Hamming distance (our textbook calls it the averaged Hamming distance)
$$
\frac{1}{N}\, H(x^{(1)}, x^{(2)}) = \frac{1}{2}\left( 1 - \frac{1}{N}\, x^{(1)} \cdot x^{(2)} \right).
$$
The dot-product of 2 bipolar vectors ranges from −N for 2 anti-parallel vectors to N for 2 identical vectors. The dot-product is 0 if the 2 vectors are orthogonal to each other. The normalized Hamming distance between 2 bipolar vectors ranges from 0 for 2 identical vectors to 1 for 2 anti-parallel vectors. It is equal to one half if the 2 vectors are orthogonal to each other.
In our example above, since s(1) · s(2) = 1, and t(1) · t(2) = 0, we


have
$$
\frac{1}{N}\, H(s^{(1)}, s^{(2)}) = \frac{1}{2}\left( 1 - \frac{1}{15} \right) = \frac{7}{15},
\qquad
\frac{1}{M}\, H(t^{(1)}, t^{(2)}) = \frac{1}{2}\left( 1 - 0 \right) = \frac{1}{2}.
$$
These 2 normalized Hamming distances do not differ much from each
other.
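The relation between the dot product and the Hamming distance is a one-line check (a small sketch of ours, reusing the letter vectors A and C from the earlier example):

```python
def normalized_hamming(u, v):
    """Fraction of positions in which two equal-length bipolar vectors differ."""
    return np.mean(u != v)

print(normalized_hamming(A, C))            # 0.4666... = 7/15
print(0.5 * (1 - A @ C / len(A)))          # same value, via the dot product
```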
There is the idea of "correlation encoding", where a pair of input patterns separated by a small Hamming distance is mapped to a pair of output patterns that are also separated by a small Hamming distance, while a pair of input patterns separated by a large Hamming distance is mapped to a pair of output patterns that are also largely separated (dissimilar).

3.2. Erasing a Stored Association


The complement of a bipolar vector x, denoted by x^c, is the vector formed from x by flipping all of its bits. It is clear that storing the pattern pair s^c : t^c is the same as storing the pair s : t, since the Hebb rule gives exactly the same weight matrix. To erase a given association s : t, we need to store the pair s^c : t or s : t^c.
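In code, storing the complement pair adds the term −s^T t to the weights and cancels the original contribution exactly (our own illustration, again using the letter example):

```python
W_two = np.outer(A, T[0]) + np.outer(C, T[1])        # store s(1):t(1) and s(2):t(2)
W_erased = W_two + np.outer(-A, T[0])                # additionally store s(1)^c : t(1)
print(np.array_equal(W_erased, np.outer(C, T[1])))   # True: only s(2):t(2) remains
```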

3.3. Updating Strategies


Several strategies may be used to update the activations in a NN. The algorithm used here for the BAM uses a synchronous updating procedure, where all neurons within a layer update their activations simultaneously. Updating may also be asynchronous (only one neuron updates its activation at each step of the iteration) or subset asynchronous (a group of neurons updates all of its members' activations at each stage).

3.4. Convergence of a BAM and a Lyapunov Function

The convergence of a BAM can be proved by introducing an energy, or Lyapunov, function. A Lyapunov function must be a scalar function that is bounded below, and its value cannot increase with any step of the BAM algorithm. An appropriate choice for a Lyapunov function is the average signal energy for a forward and a backward pass:
$$
L = -\frac{1}{2}\left( x W y^T + y W^T x^T \right).
$$
Since x W y^T and y W^T x^T are scalars that are the transposes of each other, they must be identical. Thus we have
$$
L = -x W y^T = -\sum_{i=1}^{N} \sum_{j=1}^{M} x_i w_{ij} y_j.
$$

For binary or bipolar neurons, L is clearly bounded below:
$$
L \ge -\sum_{i=1}^{N} \sum_{j=1}^{M} |w_{ij}|.
$$

It can be proved that this Lyapunov function decreases as the net iterates, and therefore the iteration converges, for either synchronous or subgroup updates.
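One can watch L along the iteration. The sketch below (ours) records L = −x W y^T before each full back-and-forth pass of the unbiased BAM, starting from a noisy copy of the letter A; for this example −Σ|w_ij| = −30, and that bound is reached at the stored pair.

```python
def lyapunov(x, W, y):
    """Average signal energy L = -x W y^T for the current activations."""
    return -(x @ W @ y)

x = A.copy()
x[[0, 2]] *= -1                   # a noisy copy of the letter A
y = np.zeros(2, dtype=int)
for _ in range(3):
    print(lyapunov(x, W, y))      # 0, -30, -30 : never increases
    y = f_bam(x @ W, y)
    x = f_bam(y @ W.T, x)
print(lyapunov(x, W, y))          # -30, equal to -sum|w_ij| for this example
```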


3.5. Storage Capacity


It can be shown that the upper bound on the memory capacity of the present BAM is min(N, M). However, there is good evidence, based on a combination of heuristics and exhaustive search, that the storage capacity can be extended to min(2^N, 2^M) if an appropriate nonzero threshold value is chosen for each neuron.


References
[1] See Chapter 3 in Laurene Fausett, "Fundamentals of Neural Networks: Architectures, Algorithms, and Applications", Prentice Hall, 1994.

