Associative Memory Networks
Associative memory
• Associative memory is defined as the ability to learn and remember the
relationship between unrelated items. For example, remembering someone's name or
someone or the aroma of a particular perfume.
• Associative memory deals specifically with the relationship between different
objects or concepts. A normal associative memory task involves testing
participants on their recall of pairs of unrelated items, such as face-name pairs.
• Associative memories are neural networks (NNs) for modeling the learning
and retrieval of memories in the brain. The retrieved memory and its query are
typically represented by binary, bipolar, or real vectors describing patterns of
neural activity.
Pattern Association
• Learning is the process of forming associations between related patterns.
• The patterns we associate together may be of the same type or of different
types.
• Each association is an input-output vector pair, s:t.
• If each vector t is the same as the vector s with which it is associated, then the
net is called an autoassociative memory.
• If the t's are different from the s's, the net is called a heteroassociative
memory.
• In each of these cases, the net not only learns the specific pattern pairs that
were used for training, but also is able to recall the desired response pattern
when given an input stimulus that is similar, but not identical, to the training
input.
Training Algorithms for Pattern Association
• Two algorithms
– Hebb rule
– Outer Product rule
• Hebb rule
– Used for finding the weights of an associative
memory neural net.
– Training vector pairs are denoted by s:t
• Algorithm
– Step 0: Initialize the weights to 0:
  w_ij = 0 (i = 1 to n, j = 1 to m)
– Step 1: For each training input–target output vector pair s:t, perform Steps 2-4.
– Step 2: Activate the input layer for the training input:
  x_i = s_i (for i = 1 to n)
– Step 3: Activate the output layer for the target output:
  y_j = t_j (for j = 1 to m)
– Step 4: Adjust the weights:
  w_ij(new) = w_ij(old) + x_i·y_j (for i = 1 to n, j = 1 to m)
– Used with patterns that can be represented as either binary or bipolar vectors.
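A minimal NumPy sketch of Steps 0-4 above; the single training pair used at the end is illustrative, not taken from the slides:

```python
import numpy as np

def hebb_train(pairs, n, m):
    """Hebb rule for an associative memory: accumulate x_i * y_j for each pair s:t."""
    W = np.zeros((n, m))                # Step 0: w_ij = 0
    for s, t in pairs:                  # Step 1: for each training pair s:t
        x = np.asarray(s, dtype=float)  # Step 2: x_i = s_i
        y = np.asarray(t, dtype=float)  # Step 3: y_j = t_j
        W += np.outer(x, y)             # Step 4: w_ij(new) = w_ij(old) + x_i * y_j
    return W

# Illustrative bipolar pair (not from the slides)
print(hebb_train([((1, -1, 1, -1), (1, -1))], n=4, m=2))
```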
Training Algorithms for Pattern Association
• Similar to Hebbian learning for classification.
• Algorithm (bipolar or binary patterns):
• For each training pair s:t: w_ij = s_i·t_j
• w_ij increases if both s_i and t_j are ON (binary) or have the same sign (bipolar).
• Summed over all P training pairs:
  w_ij = Σ_{p=1}^{P} s_i(p)·t_j(p),   W = {w_ij}
• The computation involves three nested loops over p, i, j (the order of p is irrelevant):
  p = 1 to P   /* for every training pair */
  i = 1 to n   /* for every row in W */
  j = 1 to m   /* for every element j in row i */
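The same weights can be accumulated with the three explicit loops listed above; a minimal sketch, assuming the training pairs are given as plain Python sequences (names illustrative):

```python
import numpy as np

def outer_product_rule(pairs, n, m):
    """w_ij = sum over p of s_i(p) * t_j(p); the order of p is irrelevant."""
    W = np.zeros((n, m))
    for s, t in pairs:              # p = 1 to P: every training pair
        for i in range(n):          # every row in W
            for j in range(m):      # every element j in row i
                W[i, j] += s[i] * t[j]
    return W

# Illustrative bipolar pair (not from the slides)
print(outer_product_rule([((1, -1, 1, -1), (1, -1))], n=4, m=2))
```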
Delta rule
• In its original form, the delta rule assumed that the activation function for the
output unit was the identity function.
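For reference, the delta rule update in its standard Widrow–Hoff form (stated here as background; the slides do not reproduce the formula) is

\[
\Delta w_{ij} = \alpha\,(t_j - y_j)\,x_i, \qquad y_j = \sum_{i} x_i\, w_{ij},
\]

where α is the learning rate and the output activation is the identity function, as noted above.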
Contd..
• Testing Algorithm
• An auto-associative network can be used to determine whether a given vector is a 'known' or 'unknown' vector.
• A net is said to recognize a "known" vector if it produces a pattern of activation on the output that is the same as one of the stored vectors.
• The testing procedure is as follows:
• Step 0: Set the weights obtained from Hebb's rule.
• Step 1: For each testing input vector, perform Steps 2 to 4.
• Step 2: Set the activations of the input units equal to the input vector.
• Step 3: Calculate the net input for each output unit j = 1 to n: y_in_j = Σ_i x_i·w_ij.
• Step 4: Calculate the output by applying the activation function over the net input.
Auto-associative memory
• For an auto-associative net, the training input and target output vectors are
identical.
• The process of training is often called storing the vectors, which may be
binary or bipolar.
• The performance of the net is judged by its ability to reproduce a stored
pattern from noisy input; performance is, in general, better for bipolar vectors
than for binary vectors.
Auto-associative memory
• Same as hetero-associative nets, except t(p) = s(p).
• Used to recall a pattern by its noisy or incomplete version.
(pattern completion/pattern recovery)
• A single pattern s = (1, 1, 1, -1) is stored (weights are computed by Hebbian
rule – outer product)
  W = [  1   1   1  -1 ]
      [  1   1   1  -1 ]
      [  1   1   1  -1 ]
      [ -1  -1  -1   1 ]
• As before, the differences take one of two forms: "mistakes" in the data
or "missing" data.
• The only "mistakes" we consider are changes from +1 to -1 or vice versa.
• We use the term "missing" data to refer to a component that has the value 0, rather than either +1 or -1.
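A short sketch of this single-pattern case: s = (1, 1, 1, -1) is stored with the outer-product rule and then recalled from inputs with a "mistake" or with "missing" (zero) entries; the sign threshold used for recall is an illustrative choice:

```python
import numpy as np

s = np.array([1, 1, 1, -1])
W = np.outer(s, s)                       # Hebbian (outer-product) storage of s

def recall(x):
    """One feed-forward pass: net input x.W followed by a sign threshold."""
    return np.where(x @ W >= 0, 1, -1)   # illustrative threshold choice

print(recall(np.array([1, 1, 1, -1])))   # stored pattern  -> [ 1  1  1 -1]
print(recall(np.array([-1, 1, 1, -1])))  # one "mistake"   -> [ 1  1  1 -1]
print(recall(np.array([1, 0, 1, -1])))   # one "missing" 0 -> [ 1  1  1 -1]
```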
Iterative Auto-associative memory
• In some cases the net does not respond to an input signal immediately with a stored target pattern, but the response may be close enough to a stored pattern to be fed back into the net as a new input.
• Testing a recurrent auto-associative net: stored vector with second, third and
fourth components set to zero.
• The weight matrix to store the vector (1, 1, 1, -1) is the outer-product matrix shown earlier with its diagonal set to zero:
  W = [  0   1   1  -1 ]
      [  1   0   1  -1 ]
      [  1   1   0  -1 ]
      [ -1  -1  -1   0 ]
Iterative Auto-associative memory
• The vector (1,0,0,0) is an example of a vector formed from the stored
vector with three "missing" components (three zero entries).
• The performance of the net for this vector is given next.
• Input vector (1, 0, 0, 0):
• (1, 0, 0, 0).W = (0, 1, 1, -1)  >>  iterate
• (0, 1, 1, -1).W = (3, 2, 2, -2)  >>  (1, 1, 1, -1).
• Thus, for the input vector (1, 0, 0, 0), the net produces the "known" vector (1, 1, 1, -1) as its response in two iterations.
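A sketch of the trace above, assuming the diagonal of the outer-product matrix is zeroed (which is what the arithmetic on this slide implies) and a sign activation that leaves a zero net input at zero:

```python
import numpy as np

s = np.array([1, 1, 1, -1])
W = np.outer(s, s)
np.fill_diagonal(W, 0)            # zero self-connections, as the trace implies

x = np.array([1, 0, 0, 0])        # stored vector with three "missing" components
for _ in range(10):
    nxt = np.sign(x @ W)          # feed the response back as the next input
    if np.array_equal(nxt, x):    # stop once the response no longer changes
        break
    x = nxt
print(x)                          # [ 1  1  1 -1], reached after two updates as above
```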
Hetero-Associative Memory Network
• Theory
• The training input and target
output vectors are different.
• The weights are determined by the Hebb rule or the delta rule.
• The input layer has 'n' units and the output layer has 'm' units, with weighted interconnections between the input and output layers.
• Architecture
• The architecture is shown in the figure.
Contd..
Testing Algorithm
• Step 0: Initialize the weights from the training algorithm.
• Step 1: Perform Steps 2-4 for each input vector presented.
• Step 2: Set the activations of the input units equal to the current input vector x_i.
• Step 3: Calculate the net input to the output units:
  y_in_j = Σ_i x_i·w_ij
• Step 4: Determine the activation of the output units over the net input (j = 1 to m).
• The output vector y obtained gives the pattern associated with the input vector x.
• If the responses are binary, then the activation function is
  y_j = 1 if y_in_j > 0;  y_j = 0 if y_in_j ≤ 0
Hetero-associative Memory
Associative memory neural networks are nets in which the weights
are determined in such a way that the net can store a set of P
pattern associations.
• Each association is a pair of vectors (s(p), t(p)), with p = 1, 2, . .
. , P.
• Each vector s(p) is an n-tuple (has n components), and each t(p)
is an m-tuple.
• The weights may be found using the Hebb rule or the delta rule.
Hetero-associative Memory
Example of hetero-associative memory
• Activation function (for binary target output):
  y_j = 1 if y_in_j > 0
  y_j = 0 if y_in_j ≤ 0
• Weights are computed by the Hebbian rule (sum of the outer products of all training pairs):
  W = Σ_{p=1}^{P} s^T(p)·t(p)
• Training samples:
         s(p)          t(p)
  p=1   (1 0 0 0)     (1 0)
  p=2   (1 1 0 0)     (1 0)
  p=3   (0 0 0 1)     (0 1)
  p=4   (0 0 1 1)     (0 1)
Example of hetero-associative memory
s^T(1)·t(1) = [1 0 0 0]^T (1 0) = [ 1 0 ]      s^T(2)·t(2) = [1 1 0 0]^T (1 0) = [ 1 0 ]
                                  [ 0 0 ]                                        [ 1 0 ]
                                  [ 0 0 ]                                        [ 0 0 ]
                                  [ 0 0 ]                                        [ 0 0 ]

s^T(3)·t(3) = [0 0 0 1]^T (0 1) = [ 0 0 ]      s^T(4)·t(4) = [0 0 1 1]^T (0 1) = [ 0 0 ]
                                  [ 0 0 ]                                        [ 0 0 ]
                                  [ 0 0 ]                                        [ 0 1 ]
                                  [ 0 1 ]                                        [ 0 1 ]

Computing the weights (sum of the four outer products):

  W = [ 2 0 ]
      [ 1 0 ]
      [ 0 1 ]
      [ 0 2 ]
Example of hetero-associative memory
Recall:

x = (1 0 0 0) and x = (0 1 0 0) (similar to s(1) and s(2)):
  (1 0 0 0)·W = (2 0)  →  y1 = 1, y2 = 0
  (0 1 0 0)·W = (1 0)  →  y1 = 1, y2 = 0

x = (0 1 1 0):
  (0 1 1 0)·W = (1 1)  →  y1 = 1, y2 = 1, which is not one of the stored target patterns (this input resembles training inputs from both classes).
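The same example in a short NumPy sketch; the training pairs and the resulting weight matrix are the ones shown above, and the thresholding follows the binary activation defined earlier:

```python
import numpy as np

# Training pairs from the example above
S = np.array([[1, 0, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 1]])
T = np.array([[1, 0],
              [1, 0],
              [0, 1],
              [0, 1]])

W = S.T @ T                          # sum of outer products s^T(p) t(p)
print(W)                             # [[2 0] [1 0] [0 1] [0 2]]

def recall(x):
    """y_j = 1 if the net input x.W is positive, else 0."""
    return (x @ W > 0).astype(int)

print(recall(np.array([1, 0, 0, 0])))   # [1 0]
print(recall(np.array([0, 1, 0, 0])))   # [1 0]
print(recall(np.array([0, 1, 1, 0])))   # [1 1] -- not a stored target
```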
Discrete Bidirectional Associative Memory (BAM)
• Here the weights are found as the sum of the outer products of the bipolar form of the training vector pairs.
• The activation function is defined with a nonzero threshold.
• Determination of weights
• The input vector is denoted by s(p) and the output vector by t(p):
• Input  ➔ s(p) = (s_1(p), ..., s_i(p), ..., s_n(p))
• Output ➔ t(p) = (t_1(p), ..., t_j(p), ..., t_m(p))
• The weight matrix is determined using the Hebb rule.
• If the input vectors are binary, then the weight matrix is W = {w_ij}, where
  w_ij = Σ_p [2·s_i(p) − 1]·[2·t_j(p) − 1]
  – The weight matrix will be in bipolar form, whether the input vectors are binary or bipolar.
Contd..
• Activation function for the BAM
• The activation function is based on whether the input–target vector pairs used are binary or bipolar.
• The activation function for the Y layer with binary input vectors is
  y_j = 1 if y_in_j > θ_j;  y_j (unchanged) if y_in_j = θ_j;  0 if y_in_j < θ_j
• The activation function for the X layer with binary input vectors is
  x_i = 1 if x_in_i > θ_i;  x_i (unchanged) if x_in_i = θ_i;  0 if x_in_i < θ_i
• If the threshold value is equal to the net input, then the previous output value calculated is left as the activation of that unit. Signals are sent only from one layer to the other at a time, not in both directions simultaneously.
Testing Algorithm for Discrete BAM
• The testing algorithm is used to test noisy patterns entering the network.
• The testing algorithm for the net is as follows:
• Step 0: Initialize the weights to store the P vectors. Also initialize all activations to zero.
• Step 1: Perform Steps 2-6 for each testing input.
• Step 2: Set the activations of the X layer to the current input pattern, i.e., present the input pattern x to the X layer and the input pattern y to the Y layer (the memory is bidirectional).
• Step 3: Perform Steps 4-6 while the activations have not converged.
• Step 4: Update the activations of the units in the Y layer. Calculate the net input.
Continuous BAM
• It uses the logistic sigmoid function as the activation function for all units.
• This may be the binary sigmoid or the bipolar sigmoid.
• A bipolar sigmoid function with high gain converges to a bipolar vector state, so the net acts like a discrete BAM (DBAM).
• If the input vectors are binary, s(p) and t(p), the weights are determined using the formula
  w_ij = Σ_p [2·s_i(p) − 1]·[2·t_j(p) − 1]
• If a binary logistic function is used, then the activation function is
  f(y_in_j) = 1 / (1 + e^(−y_in_j))
Analysis of Hamming Distance, Energy Function and Storage Capacity
• Hamming distance: the number of mismatched components of two given bipolar/binary vectors.
• Denoted by H[X, X'].
• Average Hamming distance = (1/n)·H[X, X'].
• Example:
  X = [1 0 1 0 1 1 0]
  Y = [1 1 1 1 0 0 1]
  HD = 5, average = 5/7.
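A one-line check of this example (sketch, using NumPy):

```python
import numpy as np

X = np.array([1, 0, 1, 0, 1, 1, 0])
Y = np.array([1, 1, 1, 1, 0, 0, 1])

hd = int(np.sum(X != Y))      # number of mismatched components
print(hd, hd / len(X))        # 5 and 5/7 (average Hamming distance)
```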
• Stability is determined by a Lyapunov (energy) function.
• A Lyapunov function must always be bounded and decreasing.
• Change in energy
• Memory capacity: min(m, n), where "n" is the number of units in the X layer and "m" is the number of units in the Y layer.
Bidirectional Associative Memory (BAM)
• First proposed by Bart Kosko.
• Heteroassociative network.
• It associates patterns from one set, set A, to patterns from another set, set B, and vice versa.
• It generalizes and can produce correct outputs despite corrupted or incomplete inputs.
• It consists of two fully interconnected layers of processing elements.
• There can also be a feedback link connecting each node to itself.
Bidirectional Associative Memory (BAM)
• The BAM maps an n-dimensional input vector X_n into an m-dimensional output vector Y_m.
In this case, the BAM input layer must have six neurons and the output layer three neurons.
Bidirectional Associative Memory (BAM)
For instance
Bidirectional Associative Memory (BAM)
Then, we confirm that the BAM recalls X_m when presented with Y_m. That is,
For instance
Bidirectional Associative Memory (BAM)
• Step 3: Retrieval: Present an unknown vector (probe) X to the BAM and
retrieve a stored association. The probe may present a corrupted or incomplete
version of a pattern from set A (or from set B) stored in the BAM. That is,
• Repeat the iteration until equilibrium, when input and output vectors remain unchanged
with further iterations. The input and output patterns will then represent an associated
pair.
The BAM is unconditionally stable (Kosko, 1992). This means that any set of
associations can be learned without risk of instability. This important quality
arises from the BAM using the transpose relationship between weight matrices in
forward and backward directions.
Let us now return to our example. Suppose we use vector X as a probe. It represents a single error compared with the pattern X_1 from set A:
This probe applied as the BAM input produces the output vector Y1 from set B.
The vector Y1 is then used as input to retrieve the vector X1 from set A. Thus,
the BAM is indeed capable of error correction.
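A minimal BAM sketch of the weight construction and bidirectional retrieval loop described above. The two stored bipolar pairs here are illustrative (the slides' own X and Y patterns are not reproduced), and the activation keeps the previous output when the net input is zero:

```python
import numpy as np

# Illustrative bipolar associations (assumed, not from the slides)
A = np.array([[ 1, -1,  1, -1,  1, -1],
              [ 1,  1, -1, -1,  1,  1]])
B = np.array([[ 1, -1,  1],
              [-1,  1, -1]])

W = A.T @ B                               # forward weights; backward pass uses W.T

def activate(net, prev):
    out = np.sign(net)
    return np.where(out == 0, prev, out)  # keep previous output when net input is 0

def bam_recall(x, W, iters=10):
    y = np.zeros(W.shape[1])
    for _ in range(iters):
        y_new = activate(x @ W, y)        # X layer -> Y layer
        x_new = activate(y_new @ W.T, x)  # Y layer -> X layer (transpose weights)
        if np.array_equal(x_new, x) and np.array_equal(y_new, y):
            break                         # equilibrium: associated pair found
        x, y = x_new, y_new
    return x, y

probe = np.array([1, -1, 1, -1, 1, 1])    # one flipped bit relative to A[0]
print(bam_recall(probe, W))               # recovers the pair (A[0], B[0])
```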
Hopfield Networks
• John Hopfield developed the model in 1982.
• Hopfield promoted the construction of the first analog VLSI network chip.
• Two types of network
– Discrete Hopfield network
– Continuous Hopfield network
• Discrete Hopfield network
– Hopfield network is an auto associative fully interconnected single-layer feedback network.
– Symmetrically weighted network.
– When operated in a discrete fashion it is called a discrete Hopfield network, and its single-layer feedback architecture makes it a recurrent network.
– Two input types, i.e., binary and bipolar.
– Use of bipolar inputs makes the analysis easier.
– No self-connections: w_ij = w_ji; w_ii = 0.
– Only one unit updates its activation at a time.
– An input pattern is applied to the network and the network's output is initialized accordingly.
– The initializing pattern is then removed, and the initialized output becomes the new updated input through the feedback connections.
– The process continues until no new, updated responses are produced and the network reaches its equilibrium.
– The asynchronous updating allows an energy (Lyapunov) function to be defined for the net.
– The energy function proves that the net will converge to a stable set of activations.
Architecture of Discrete Hopfield Net
• Processing elements with two outputs: inverting and non-inverting.
• The output from each processing element is fed back to the inputs of the other processing elements, but not to itself.
• The connections are resistive.
• There are no negative resistors, so excitatory connections use positive (non-inverted) inputs and inhibitory connections use inverted inputs.
• Excitatory: output same as input.
• Inhibitory: output different from input.
• The weight is positive if both units are on.
• If the connection strength is negative, then one of the units is off.
• Weights are symmetric: w_ij = w_ji.
Training Algorithm of Discrete Hopfield Net
• For storing a set of binary patterns s(p), p = 1 to P, the weight matrix W = {w_ij} is given by
  w_ij = Σ_{p=1}^{P} [2·s_i(p) − 1]·[2·s_j(p) − 1]  for i ≠ j, with w_ii = 0.
Testing Algorithm
• Initial weights are obtained from the training algorithm. Steps:
• Step 0: Initialize the weights to store the patterns (weights obtained from the training algorithm using the Hebb rule).
• Step 1: While the activations have not converged, perform Steps 2-8.
• Step 2: Perform Steps 3-7 for each input vector x.
• Step 3: Set the initial activations equal to the external input vector x: y_i = x_i (i = 1 to n).
• Step 4: Perform Steps 5-7 for each unit Y_i.
• Step 5: Calculate the net input of the network: y_in_i = x_i + Σ_j y_j·w_ji.
• Step 6: Apply the activation function over the net input to calculate the output.
• Only a single neural unit is allowed to update
its output.
• The next update is carried out on a randomly chosen node, using the already updated outputs.
• This is an asynchronous stochastic recursion.
• If the input vector is unknown, the resulting activation vector may converge to a pattern that is not one of the stored patterns; such a pattern is called a spurious stable state.
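A sketch of the testing procedure and asynchronous update described above, assuming bipolar units, a single illustrative stored pattern, and a Hebbian weight matrix with zero diagonal:

```python
import numpy as np

rng = np.random.default_rng(0)

def hopfield_weights(patterns):
    """Sum of outer products of the stored bipolar patterns, with w_ii = 0."""
    W = sum(np.outer(p, p) for p in patterns).astype(float)
    np.fill_diagonal(W, 0)
    return W

def recall(W, x, sweeps=20):
    """Asynchronous update: one randomly chosen unit at a time,
    net input y_in_i = x_i + sum_j y_j * w_ji, until nothing changes."""
    y = x.copy()
    for _ in range(sweeps):
        changed = False
        for i in rng.permutation(len(y)):
            net = x[i] + W[:, i] @ y
            if net != 0:
                new = 1 if net > 0 else -1
                if new != y[i]:
                    y[i], changed = new, True
        if not changed:              # equilibrium: no unit changed in a full sweep
            break
    return y

stored = np.array([[1, 1, 1, -1, -1, 1]])    # illustrative pattern
W = hopfield_weights(stored)
probe = np.array([1, -1, 1, -1, -1, 1])      # stored pattern with one flipped bit
print(recall(W, probe))                      # recovers [ 1  1  1 -1 -1  1]
```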
Analysis of Energy Function and Storage Capacity on Discrete Hopfield Net
• Energy function: a function that is bounded and is a non-increasing function of the state of the system.
• Lyapunov function: determines the stability property.
• Energy function:
• The network is stable if the energy function decreases whenever the state of any node changes.
• Assume node i changes its state.
• A positive definite function E_f(y) can be found such that
  – E_f(y) is continuous for all components y_i for i = 1 to n;
  – dE_f[y(t)]/dt < 0, which indicates that the energy function is decreasing with time.
• Storage capacity C ≈ 0.15·n
  – n is the number of neurons in the net
• C ≈ n / (2·log₂ n)
Continuous Hopfield Network
• A discrete Hopfield net can be converted to a continuous one if time is treated as a continuous variable.
• Used in associative memory problems and the travelling salesman problem.
• Nodes have a continuous, graded output.
• Energy decreases continuously with time.
• It can be realized as an electronic circuit using non-linear amplifiers and resistors.
• Used in building Hopfield networks with VLSI technology.
Hardware Model
• The continuous network built up of electrical components is shown in the figure.
• The model has n amplifiers, each mapping its input voltage u_i into an output voltage y_i through an activation function a(u_i).
• The activation function used can be a sigmoid function.
• λ is the gain parameter.
• The continuous model becomes discrete as λ → ∞.
• Each amplifier has an input capacitance c_i and an input conductance g_ri.
• External signal x_i.
• The external signal supplies a constant current.
Contd..
• Apply Kirchhoff's current law (KCL): the total current entering a junction is equal to that leaving the same junction.
• The equation obtained from KCL describes the time evolution of the system.
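A common textbook form of this node equation, stated here as background (the slide's own equation is not reproduced) and using the symbols introduced above, is

\[
c_i \frac{du_i}{dt} \;=\; \sum_{j=1}^{n} w_{ij}\, y_j \;-\; g_{ri}\, u_i \;+\; x_i, \qquad y_j = a(\lambda u_j),
\]

i.e., the capacitive current balances the weighted feedback currents, the leakage through the input conductance, and the constant external current.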
Iterative Auto-Associative Networks
Auto-Associator with Threshold Unit
• If a threshold unit is set, then a threshold function is used as the activation function.
• Training algorithm
• Steps:
• Step 0: Initialize the weights.
• Step 1: Perform Steps 2-5 for each testing vector.
• Step 2: Set the activations of X.
• Step 3: Perform Steps 4 and 5.
• Step 4: Update the activations.