Iterative Net BAM
1. Motivation
After storing a given pattern, an autoassociative net may still not be able to respond to an input signal immediately with a stored pattern; however, the response may be close enough to a stored pattern to suggest using it as input to the net again. In effect, we are connecting the output units back to the input units. This results in a recurrent autoassociative net.
To illustrate the basic idea behind such a recurrent net, we consider
the following simple example.
Suppose we want to store a single pattern: $\begin{bmatrix} 1 & 1 & 1 & -1 \end{bmatrix}$.
Hebb rule gives the weight matrix
$$
W = \begin{bmatrix}
1 & 1 & 1 & -1 \\
1 & 1 & 1 & -1 \\
1 & 1 & 1 & -1 \\
-1 & -1 & -1 & 1
\end{bmatrix}
\qquad \text{or} \qquad
W = \begin{bmatrix}
0 & 1 & 1 & -1 \\
1 & 0 & 1 & -1 \\
1 & 1 & 0 & -1 \\
-1 & -1 & -1 & 0
\end{bmatrix}
$$
if the NN has no self connections.
We first consider the case where the NN has no self connections.
The original pattern is of course stored successfully by the rule. Now
suppose we input a pattern with 3 missing values, like $\begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}$. We find
$$
y_{in} = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix}
0 & 1 & 1 & -1 \\
1 & 0 & 1 & -1 \\
1 & 1 & 0 & -1 \\
-1 & -1 & -1 & 0
\end{bmatrix}
= \begin{bmatrix} 0 & 1 & 1 & -1 \end{bmatrix},
$$
and so $y = \begin{bmatrix} 0 & 1 & 1 & -1 \end{bmatrix}$, which is close to but not the same as the stored pattern $\begin{bmatrix} 1 & 1 & 1 & -1 \end{bmatrix}$. Then we feed this output pattern into the NN as input. This gives
$$
y_{in} = \begin{bmatrix} 0 & 1 & 1 & -1 \end{bmatrix}
\begin{bmatrix}
0 & 1 & 1 & -1 \\
1 & 0 & 1 & -1 \\
1 & 1 & 0 & -1 \\
-1 & -1 & -1 & 0
\end{bmatrix}
= \begin{bmatrix} 3 & 2 & 2 & -2 \end{bmatrix},
$$
and so $y = \begin{bmatrix} 1 & 1 & 1 & -1 \end{bmatrix}$, which is precisely the stored pattern.
On the other hand, if the NN is allowed to have self connections, it is easy to see that the inputs $\begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}$, $\begin{bmatrix} 0 & 1 & 0 & 0 \end{bmatrix}$, and $\begin{bmatrix} 0 & 0 & 1 & 0 \end{bmatrix}$ all produce the same correct output immediately.
However, the input $\begin{bmatrix} 0 & 0 & 0 & 1 \end{bmatrix}$ converges to the negative of the stored pattern, $\begin{bmatrix} -1 & -1 & -1 & 1 \end{bmatrix}$, in one step. So the NN behaves better if self connections are eliminated.
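The recall procedure of this example can be summarized in a short Python sketch (a minimal sketch of ours, assuming a sign activation so that a unit with zero net input is left at 0, which reproduces the arithmetic above):

```python
import numpy as np

# Hebb-rule weight matrix for the stored pattern [1, 1, 1, -1],
# with the diagonal zeroed so that the net has no self connections.
s = np.array([1, 1, 1, -1])
W = np.outer(s, s)
np.fill_diagonal(W, 0)

def recall(x, steps=10):
    """Feed the output back in as input until the activations stop changing."""
    y = np.array(x)
    for _ in range(steps):
        y_new = np.sign(y @ W)        # sign(0) = 0: a zero net input leaves the unit at 0
        if np.array_equal(y_new, y):
            break
        y = y_new
    return y

print(recall([1, 0, 0, 0]))           # [ 1  1  1 -1]: the stored pattern, reached in two updates
```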
2. Bidirectional Associative Memory (BAM)
Notice that if the net input into a neuron is exactly 0, then its activation is kept the same as its previous value.
The above algorithm assumes that the test pattern is input into the X-layer. We can also input a test pattern into the Y-layer. In that case we need to exchange the roles of the X- and the Y-layers.
If we adopt matrix notation and represent all input vectors as row vectors ($1 \times N$ matrices), then the net input into the Y-layer is written as $y_{in} = x W$, since $W$ is an $N \times M$ matrix. On the other hand, the net input into the X-layer is written as $x_{in} = y W^{T}$.
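In code (a sketch under these conventions, where $f_{BAM}$ is the bipolar step activation that keeps a unit's previous value when its net input is exactly 0), one full bidirectional pass looks like this:

```python
import numpy as np

def f_bam(net, previous):
    # +1 for positive net input, -1 for negative; a zero net input keeps the previous activation
    return np.where(net > 0, 1, np.where(net < 0, -1, previous))

def bam_step(x, y, W):
    """One full BAM iteration: X-layer -> Y-layer, then Y-layer -> X-layer."""
    y = f_bam(x @ W, y)        # y_in = x W
    x = f_bam(y @ W.T, x)      # x_in = y W^T
    return x, y
```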
The letter 'C' is represented by the pattern
.##
#..
#..
#..
.##
with $t^{(2)} = \begin{bmatrix} 1 & 1 \end{bmatrix}$. For each of the 2 letters, we replace "." with $-1$ and "#" with $1$, and concatenate the bits row-wise to form a bipolar row vector. Thus we have
$$s^{(1)} = \begin{bmatrix} -1 & 1 & -1 & 1 & -1 & 1 & 1 & 1 & 1 & 1 & -1 & 1 & 1 & -1 & 1 \end{bmatrix},$$
$$s^{(2)} = \begin{bmatrix} -1 & 1 & 1 & 1 & -1 & -1 & 1 & -1 & -1 & 1 & -1 & -1 & -1 & 1 & 1 \end{bmatrix}.$$
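A small helper that performs this encoding (a sketch; the 'A' grid below is reconstructed from $s^{(1)}$, and the target $t^{(1)} = \begin{bmatrix} -1 & 1 \end{bmatrix}$ is the one used later in the text):

```python
import numpy as np

def encode(rows):
    """Concatenate the rows of a dot/hash grid into a bipolar row vector."""
    return np.array([1 if c == "#" else -1 for row in rows for c in row])

# 'A' (reconstructed from s(1)) and 'C' as 5x3 grids of '.' and '#'
A = [".#.", "#.#", "###", "#.#", "#.#"]
C = [".##", "#..", "#..", "#..", ".##"]

s1, s2 = encode(A), encode(C)
t1, t2 = np.array([-1, 1]), np.array([1, 1])
```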
In general, given 2 bipolar vectors $s^{(1)}$ and $s^{(2)}$, $s^{(2)} + s^{(1)}$ is a non-bipolar vector whose elements are 0 where the corresponding elements of $s^{(1)}$ and $s^{(2)}$ differ, and twice the value in $s^{(2)}$ where they are the same. Similarly, $s^{(2)} - s^{(1)}$ is a non-bipolar vector whose elements are 0 where the corresponding elements of $s^{(1)}$ and $s^{(2)}$ are the same, and twice the value in $s^{(2)}$ where they differ.
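For example, the first three components of the two letter vectors are $\begin{bmatrix} -1 & 1 & -1 \end{bmatrix}$ for $s^{(1)}$ and $\begin{bmatrix} -1 & 1 & 1 \end{bmatrix}$ for $s^{(2)}$, giving $\begin{bmatrix} -2 & 2 & 0 \end{bmatrix}$ for the first three components of $s^{(2)} + s^{(1)}$ and $\begin{bmatrix} 0 & 0 & 2 \end{bmatrix}$ for those of $s^{(2)} - s^{(1)}$.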
Explicitly we have
$$
W = \begin{bmatrix}
0 & -2 \\
0 & 2 \\
2 & 0 \\
0 & 2 \\
0 & -2 \\
-2 & 0 \\
0 & 2 \\
-2 & 0 \\
-2 & 0 \\
0 & 2 \\
0 & -2 \\
-2 & 0 \\
-2 & 0 \\
2 & 0 \\
0 & 2
\end{bmatrix}
$$
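The two columns of this matrix are $s^{(2)} - s^{(1)}$ and $s^{(2)} + s^{(1)}$, i.e. $W$ equals the outer-product (Hebb) sum $s^{(1)T} t^{(1)} + s^{(2)T} t^{(2)}$ with $t^{(1)} = \begin{bmatrix} -1 & 1 \end{bmatrix}$, the target used later in the text. A sketch that builds $W$ this way and checks the Y-layer recall:

```python
import numpy as np

# letter patterns and targets (from the example above; t(1) = [-1, 1])
s1 = np.array([-1, 1, -1,  1, -1, 1,  1, 1, 1,  1, -1, 1,  1, -1, 1])    # 'A'
s2 = np.array([-1, 1, 1,  1, -1, -1,  1, -1, -1,  1, -1, -1,  -1, 1, 1])  # 'C'
t1, t2 = np.array([-1, 1]), np.array([1, 1])

# BAM Hebb rule: sum of outer products of the training pairs
W = np.outer(s1, t1) + np.outer(s2, t2)

# the two columns of W are s(2) - s(1) and s(2) + s(1)
assert np.array_equal(W[:, 0], s2 - s1) and np.array_equal(W[:, 1], s2 + s1)

print(s1 @ W)   # [-14  16]  ->  y = [-1  1] = t(1)
print(s2 @ W)   # [ 14  16]  ->  y = [ 1  1] = t(2)
```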
First we check to see if the NN gives the correct output at the
Y-layer if pattern A or C is input to the X-layer. We find that with
$x = s^{(1)}$
$$
y_{in} = s^{(1)} W
= \begin{bmatrix} s^{(1)}\bigl(s^{(2)T} - s^{(1)T}\bigr) & s^{(1)}\bigl(s^{(2)T} + s^{(1)T}\bigr) \end{bmatrix}
= \begin{bmatrix} 1 - 15 & 15 + 1 \end{bmatrix}
= \begin{bmatrix} -14 & 16 \end{bmatrix}
\;\Rightarrow\; y = \begin{bmatrix} -1 & 1 \end{bmatrix},
$$
and with $x = s^{(2)}$
$$
y_{in} = s^{(2)} W
= \begin{bmatrix} s^{(2)}\bigl(s^{(2)T} - s^{(1)T}\bigr) & s^{(2)}\bigl(s^{(2)T} + s^{(1)T}\bigr) \end{bmatrix}
= \begin{bmatrix} 15 - 1 & 1 + 15 \end{bmatrix}
= \begin{bmatrix} 14 & 16 \end{bmatrix}
\;\Rightarrow\; y = \begin{bmatrix} 1 & 1 \end{bmatrix}.
$$
These are the correct results at the Y-layer. Notice that the above results are independent of the actual initial activations in the Y-layer, since none of the net input signals happens to be 0.
In the opposite direction, we want to input the targets at the Y-layer and see if the correct responses are obtained at the X-layer. We find that with $y = t^{(1)}$
$$
x_{in} = t^{(1)} W^{T}
= \begin{bmatrix} -1 & 1 \end{bmatrix}
\begin{bmatrix} s^{(2)} - s^{(1)} \\ s^{(2)} + s^{(1)} \end{bmatrix}
= -s^{(2)} + s^{(1)} + s^{(1)} + s^{(2)} = 2\,s^{(1)}
\;\Rightarrow\; x = s^{(1)},
$$
and similarly with $y = t^{(2)}$ we obtain $x_{in} = 2\,s^{(2)}$ and so $x = s^{(2)}$.
Next we test the BAM with noisy input at the Y-layer with activation $\begin{bmatrix} 0 & 1 \end{bmatrix}$, and no knowledge about the activations on X, so they are set to 0. We find that
$$
x_{in} = \begin{bmatrix} 0 & 1 \end{bmatrix} W^{T}
= \begin{bmatrix} 0 & 1 \end{bmatrix}
\begin{bmatrix} s^{(2)} - s^{(1)} \\ s^{(2)} + s^{(1)} \end{bmatrix}
= s^{(2)} + s^{(1)}.
$$
From what we said about the sum of any 2 bipolar vectors, we see that the activation on X is
$$
x = f_{BAM}(x_{in}) = \tfrac{1}{2}\bigl(s^{(2)} + s^{(1)}\bigr).
$$
This signal has to be sent back to the Y-layer to get
$$
y_{in} = \tfrac{1}{2}\bigl(s^{(2)} + s^{(1)}\bigr)
\begin{bmatrix} s^{(2)T} - s^{(1)T} & s^{(2)T} + s^{(1)T} \end{bmatrix}
= \tfrac{1}{2}\begin{bmatrix} 0 & s^{(2)} s^{(2)T} + s^{(1)} s^{(1)T} + 2\, s^{(2)} s^{(1)T} \end{bmatrix}
= \begin{bmatrix} 0 & N + 1 \end{bmatrix}.
$$
This gives
$$
y = \begin{bmatrix} 0 & 1 \end{bmatrix},
$$
which is exactly the input on Y that we started with. So the BAM has converged, but sadly to a spurious stable state. This is certainly not a surprising result at all, since with knowledge of only the second element of the target, which equals 1 for both $t^{(1)}$ and $t^{(2)}$, there is no way to tell whether the 'A' or the 'C' association is intended. A similar calculation shows that the noisy input $\begin{bmatrix} 1 & 0 \end{bmatrix}$ at the Y-layer also fails to converge to a stored pattern.
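This spurious convergence can be reproduced with the following sketch (ours; the pattern vectors and the outer-product weight matrix are as above):

```python
import numpy as np

s1 = np.array([-1, 1, -1,  1, -1, 1,  1, 1, 1,  1, -1, 1,  1, -1, 1])    # 'A'
s2 = np.array([-1, 1, 1,  1, -1, -1,  1, -1, -1,  1, -1, -1,  -1, 1, 1])  # 'C'
t1, t2 = np.array([-1, 1]), np.array([1, 1])
W = np.outer(s1, t1) + np.outer(s2, t2)

def f_bam(net, prev):
    # a zero net input keeps the previous activation
    return np.where(net > 0, 1, np.where(net < 0, -1, prev))

# noisy input [0, 1] on the Y-layer, nothing known about X
y, x = np.array([0, 1]), np.zeros(15, dtype=int)
for _ in range(5):
    x = f_bam(y @ W.T, x)        # Y-layer -> X-layer
    y = f_bam(x @ W, y)          # X-layer -> Y-layer

print(y, np.array_equal(x, (s1 + s2) // 2))   # [0 1] True: the spurious state (s(1)+s(2))/2
```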
Again the above algorithm assumes that the test pattern is input
into the X-layer. We can also input a test pattern into the Y-layer.
In that case we need to exchange the roles of the X- and the Y-layers.
We now re-visit the character association example but this time
with biases. We find that
$$b^{(Y)} = \begin{bmatrix} 0 & 2 \end{bmatrix}, \qquad b^{(X)} = s^{(2)} + s^{(1)}.$$
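If the biases are taken to be the sums of the training targets and of the training inputs respectively (an assumption on our part), both values follow immediately:

```python
import numpy as np

s1 = np.array([-1, 1, -1,  1, -1, 1,  1, 1, 1,  1, -1, 1,  1, -1, 1])
s2 = np.array([-1, 1, 1,  1, -1, -1,  1, -1, -1,  1, -1, -1,  -1, 1, 1])
t1, t2 = np.array([-1, 1]), np.array([1, 1])

b_Y = t1 + t2        # assumed rule: sum of targets  -> [0 2]
b_X = s1 + s2        # assumed rule: sum of inputs   -> s(2) + s(1)
```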
Next we test the BAM with noisy input at the Y-layer with activation $\begin{bmatrix} 0 & 1 \end{bmatrix}$, and no knowledge about the activations on X, and so they are set to 0. We find that
$$
x_{in} = \begin{bmatrix} 0 & 1 \end{bmatrix} W^{T} + b^{(X)}
= \begin{bmatrix} 0 & 1 \end{bmatrix}
\begin{bmatrix} s^{(2)} - s^{(1)} \\ s^{(2)} + s^{(1)} \end{bmatrix}
+ s^{(2)} + s^{(1)}
= 2\bigl(s^{(2)} + s^{(1)}\bigr).
$$
The activation on X is therefore given by
$$
x = f_{BAM}(x_{in}) = \tfrac{1}{2}\bigl(s^{(2)} + s^{(1)}\bigr).
$$
This signal has to be sent back to the Y-layer to get
$$
y_{in} = \tfrac{1}{2}\bigl(s^{(2)} + s^{(1)}\bigr)
\begin{bmatrix} s^{(2)T} - s^{(1)T} & s^{(2)T} + s^{(1)T} \end{bmatrix} + b^{(Y)}
= \tfrac{1}{2}\begin{bmatrix} 0 & s^{(2)} s^{(2)T} + s^{(1)} s^{(1)T} + 2\, s^{(2)} s^{(1)T} \end{bmatrix}
+ \begin{bmatrix} 0 & 2 \end{bmatrix}
= \begin{bmatrix} 0 & N + 3 \end{bmatrix}.
$$
This gives
$$
y = \begin{bmatrix} 0 & 1 \end{bmatrix}.
$$
Are the results reasonable? From the initial input at the Y-layer we cannot tell whether one is referring to an 'A' or a 'C'. However, the input vectors $s^{(1)}$ and $s^{(2)}$ differ in the 3rd, 6th, 8th, 9th, 12th, 13th and 14th positions. The only other information given is that the 6th neuron at the X-layer is initially "on", which strongly suggests that it is the $s^{(1)} : t^{(1)}$ association that we want, and this is indeed what we obtain.
Next we repeat the above calculation with the activation on Y given by $\begin{bmatrix} 1 & 0 \end{bmatrix}$. This is the one where the previous BAM without biases failed to produce a reasonable output. Now we find that
$$
x_{in} = \begin{bmatrix} 1 & 0 \end{bmatrix} W^{T} + b^{(X)}
= \begin{bmatrix} 1 & 0 \end{bmatrix}
\begin{bmatrix} s^{(2)} - s^{(1)} \\ s^{(2)} + s^{(1)} \end{bmatrix}
+ s^{(2)} + s^{(1)}
= 2\,s^{(2)}.
$$
We see that the activation on X is
$$
x = f_{BAM}(x_{in}) = s^{(2)}.
$$
This signal has to be sent back to the Y-layer. However, for this input at the X-layer, we already know that the activation at the Y-layer is
$$
y = \begin{bmatrix} 1 & 1 \end{bmatrix},
$$
which is exactly $t^{(2)}$. So the BAM has converged, and this time to the correct result: given that the first element of $y$ is 1, even without any information on the second element we should know that we are referring to character 'C' and not 'A'. Our new BAM with biases performs well, thus showing the importance of including biases in the net.
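Both noisy tests with biases can be reproduced with the sketch below (again assuming the bias rule $b^{(X)} = s^{(1)} + s^{(2)}$, $b^{(Y)} = t^{(1)} + t^{(2)}$ noted earlier):

```python
import numpy as np

s1 = np.array([-1, 1, -1,  1, -1, 1,  1, 1, 1,  1, -1, 1,  1, -1, 1])    # 'A'
s2 = np.array([-1, 1, 1,  1, -1, -1,  1, -1, -1,  1, -1, -1,  -1, 1, 1])  # 'C'
t1, t2 = np.array([-1, 1]), np.array([1, 1])

W = np.outer(s1, t1) + np.outer(s2, t2)
b_X, b_Y = s1 + s2, t1 + t2              # assumed bias rule: sums of the training vectors

def f_bam(net, prev):
    # zero net input keeps the previous activation
    return np.where(net > 0, 1, np.where(net < 0, -1, prev))

for y0 in ([0, 1], [1, 0]):              # the two noisy Y-layer inputs from the text
    y, x = np.array(y0), np.zeros(15, dtype=int)
    for _ in range(5):                   # iterate until the BAM settles
        x = f_bam(y @ W.T + b_X, x)      # Y-layer -> X-layer
        y = f_bam(x @ W + b_Y, y)        # X-layer -> Y-layer
    print(y0, "->", y, "x == s(2):", np.array_equal(x, s2))
# [0, 1] -> [0 1] x == s(2): False   (mixture state, as in the text)
# [1, 0] -> [1 1] x == s(2): True    (the 'C' : t(2) association)
```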
3. Remarks
3.1. Hamming Distance and Correlation Encoding
For any 2 binary or bipolar vectors, $x^{(1)}$ and $x^{(2)}$, each of length $N$, the Hamming distance between the vectors is defined to be the number of corresponding bits that are different between the vectors, and it is denoted by $H(x^{(1)}, x^{(2)})$. It is related to the dot product between these vectors. Let us define $A(x^{(1)}, x^{(2)})$ to be the number of corresponding bits in $x^{(1)}$ and $x^{(2)}$ that agree with each other. Then it is clear that for bipolar vectors
$$x^{(1)} \cdot x^{(2)} = A(x^{(1)}, x^{(2)}) - H(x^{(1)}, x^{(2)}),$$
because the corresponding bits are either the same, and so contribute 1 to the dot product, or they are different, and so contribute $-1$. A corresponding pair of bits must either agree or disagree, and so
$$N = A(x^{(1)}, x^{(2)}) + H(x^{(1)}, x^{(2)}).$$
Eliminating $A(x^{(1)}, x^{(2)})$ from these 2 expressions and solving for $H(x^{(1)}, x^{(2)})$ gives the normalized Hamming distance (our textbook calls it the averaged Hamming distance)
$$\frac{1}{N} H(x^{(1)}, x^{(2)}) = \frac{1}{2}\left(1 - \frac{1}{N}\, x^{(1)} \cdot x^{(2)}\right).$$
The dot product of 2 bipolar vectors ranges from $-N$ for 2 anti-parallel vectors to $N$ for 2 identical vectors. The dot product is 0 if the 2 vectors are orthogonal to each other. The normalized Hamming distance between 2 bipolar vectors ranges from 0 for 2 identical vectors to 1 for 2 anti-parallel vectors. It is equal to one half if the 2 vectors are orthogonal to each other.
In our example above, since $s^{(1)} \cdot s^{(2)} = 1$ and $t^{(1)} \cdot t^{(2)} = 0$, we have
$$\frac{1}{N} H(s^{(1)}, s^{(2)}) = \frac{1}{2}\left(1 - \frac{1}{15}\right) = \frac{7}{15},
\qquad
\frac{1}{N} H(t^{(1)}, t^{(2)}) = \frac{1}{2}.$$
These 2 normalized Hamming distances do not differ much from each
other.
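A quick check of these numbers in code (a small sketch; the vectors are the letter patterns and targets from the BAM example):

```python
import numpy as np

def normalized_hamming(u, v):
    """Normalized Hamming distance between two bipolar vectors of length N."""
    return 0.5 * (1 - (u @ v) / len(u))

s1 = np.array([-1, 1, -1,  1, -1, 1,  1, 1, 1,  1, -1, 1,  1, -1, 1])
s2 = np.array([-1, 1, 1,  1, -1, -1,  1, -1, -1,  1, -1, -1,  -1, 1, 1])
t1, t2 = np.array([-1, 1]), np.array([1, 1])

print(normalized_hamming(s1, s2))   # 0.4666... = 7/15, since s(1) . s(2) = 1
print(normalized_hamming(t1, t2))   # 0.5, since t(1) . t(2) = 0
```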
There is the idea of "correlation encoding", where a pair of input patterns that are separated by a small Hamming distance are mapped to a pair of output patterns that are also separated by a small distance (similar), while a pair of input patterns that are separated by a large Hamming distance are mapped to a pair of output patterns that are also largely separated (dissimilar).
References
[1] See Chapter 3 in Laurene Fausett, "Fundamentals of Neural Networks: Architectures, Algorithms, and Applications", Prentice Hall, 1994.