
Associative Memory Neural Networks

Associative memory
• Associative memory is defined as the ability to learn and remember the
relationship between unrelated items; for example, remembering the name of
a person or the aroma of a particular perfume.
• Associative memory deals specifically with the relationship between different
objects or concepts. A normal associative memory task involves testing
participants on their recall of pairs of unrelated items, such as face-name pairs.
• Associative memories are neural networks (NNs) for modeling the learning
and retrieval of memories in the brain. The retrieved memory and its query are
typically represented by binary, bipolar, or real vectors describing patterns of
neural activity.
Pattern Association
• Learning is the process of forming associations between related patterns.
• The patterns we associate together may be of the same type or of different
types.
• Each association is an input-output vector pair, s:t.
• If each vector t is the same as the vector s with which it is associated, then the
net is called an autoassociative memory.
• If the t's are different from the s's, the net is called a heteroassociative
memory.
• In each of these cases, the net not only learns the specific pattern pairs that
were used for training, but also is able to recall the desired response pattern
when given an input stimulus that is similar, but not identical, to the training
input.
Training Algorithms for Pattern Association
• Two algorithms
– Hebb rule
– Outer Product rule
• Hebb rule
– Used for finding the weights of an associative
memory neural net.
– Training vector pairs are denoted by s:t
• Algorithm
– Step 0: Initialize the weights to 0:
wij = 0 (i = 1 to n, j = 1 to m)
– Step 1: For each training input–target output vector pair
s:t, perform steps 2-4.
– Step 2: Activate the input layer for the training input.
• xi = si (for i = 1 to n)
– Step 3: Activate the output layer for the target
output.
• yj = tj (for j = 1 to m)
– Step 4: Adjust the weights.
– wij(new) = wij(old) + xi yj (for i = 1 to n, j = 1 to m)
– Used with patterns that can be represented as
either binary or bipolar vectors. A code sketch of this procedure is given below.
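As a quick illustration, the stepwise procedure above can be written in a few lines of NumPy. This is a minimal sketch, not part of the original algorithm statement; the function name hebb_rule and the example data layout are assumptions.

import numpy as np

def hebb_rule(pairs):
    """Hebb-rule training for a pattern-association net.

    `pairs` is a list of (s, t) training vectors (binary or bipolar).
    """
    n, m = len(pairs[0][0]), len(pairs[0][1])
    W = np.zeros((n, m))                  # Step 0: initialize weights to 0
    for s, t in pairs:                    # Step 1: loop over training pairs s:t
        x = np.asarray(s, dtype=float)    # Step 2: activate input layer,  x_i = s_i
        y = np.asarray(t, dtype=float)    # Step 3: activate output layer, y_j = t_j
        W += np.outer(x, y)               # Step 4: w_ij(new) = w_ij(old) + x_i * y_j
    return W

# Example (bipolar): associate s = (1, -1, 1) with t = (1, -1)
W = hebb_rule([([1, -1, 1], [1, -1])])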

Training Algorithms for Pattern Association
• Similar to Hebbian learning for classification.
• Algorithm: (bipolar or binary patterns)
• For each training sample s:t: Δwij = si · tj
• wij increases if both si and tj are ON (binary) or have the same sign (bipolar).
• Summed over all training pairs: wij = Σ (p = 1 to P) si(p) tj(p), and W = {wij}.
• Instead of obtaining W by iterative updates, it can be computed from the
training set by calculating the outer products of s and t.
Outer product
• Outer product: let s and t be row vectors. Then, for a particular training pair s:t,

W(p) = sT(p) · t(p) = [ s1 ]                     [ s1t1 ... s1tm ]   [ w11 ... w1m ]
                      [ s2 ]  [ t1, ..., tm ]  = [ s2t1 ... s2tm ] = [ ...         ]
                      [ ...]                     [ ...           ]   [ ...         ]
                      [ sn ]                     [ snt1 ... sntm ]   [ wn1 ... wnm ]

• Accumulating W over all training pairs involves three nested loops over p, i, j (the order of p is irrelevant):
p = 1 to P /* for every training pair */
i = 1 to n /* for every row in W */
j = 1 to m /* for every element j in row i */
A short demonstration that the loop form and the outer-product form agree is given below.
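The sketch below builds W once with the three nested loops and once as a sum of outer products; the two bipolar training pairs are made-up example data.

import numpy as np

S = np.array([[ 1, -1,  1], [ 1,  1, -1]])   # rows are s(p), n = 3
T = np.array([[ 1, -1], [-1,  1]])           # rows are t(p), m = 2

# Triple-loop form: for every pair p, every row i, every column j
W_loop = np.zeros((3, 2))
for p in range(len(S)):
    for i in range(3):
        for j in range(2):
            W_loop[i, j] += S[p, i] * T[p, j]

# Outer-product form: W = sum over p of s(p)^T t(p)
W_outer = sum(np.outer(S[p], T[p]) for p in range(len(S)))

assert np.array_equal(W_loop, W_outer)       # both give the same weight matrix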
Delta rule
• In its original form, the delta rule assumed that the activation function for the
output unit was the identity function.
• A simple extension allows for the use of any differentiable activation
function; we shall call this the extended delta rule.
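For one output layer, the extended delta rule changes each weight by Δwij = α (tj − yj) f′(y_inj) xi; with the identity activation this reduces to the original delta rule. A minimal sketch follows, where alpha, f and f_prime are illustrative placeholders for the learning rate, the activation function and its derivative.

import numpy as np

def extended_delta_update(W, x, t, alpha, f, f_prime):
    """One extended-delta-rule update for a single-layer association net."""
    y_in = x @ W                    # net input to each output unit
    y = f(y_in)                     # actual output
    # delta w_ij = alpha * (t_j - y_j) * f'(y_in_j) * x_i
    W += alpha * np.outer(x, (t - y) * f_prime(y_in))
    return W

# With the identity activation the update is the original delta rule:
identity, one = (lambda z: z), (lambda z: np.ones_like(z))
W = extended_delta_update(np.zeros((3, 2)), np.array([1., -1., 1.]),
                          np.array([1., 0.]), 0.1, identity, one)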
Autoassociative Memory Network
• Theory
• The training input and target output vectors
are the same.
• Determining the weights is called storing
the vectors.
• The diagonal weights are set to zero,
giving an autoassociative net with no self-
connections.
• This increases the net's ability to generalize.
• Architecture
• The figure gives the architecture of the
autoassociative memory network.
• The input and target output vectors are the same.
• The input layer has n units and the output
layer has n units.
• The input and output are connected
through weighted connections.
Contd.
• Training Algorithm
– Step 0: Initialize the weights to 0:
• wij = 0 (i = 1 to n, j = 1 to n)
– Step 1: For each vector that
has to be stored, perform
steps 2-4.
– Step 2: Activate the input
layer for the training input.
• xi = si (for i = 1 to n)
– Step 3: Activate the output
layer for the target output.
• yj = sj (for j = 1 to n)
– Step 4: Adjust the weights.
– wij(new) = wij(old) + xi yj
– The weights can also be found
directly as the sum of outer products, W = Σ (p = 1 to P) sT(p) s(p).

Contd.
• Testing Algorithm
• An autoassociative network can be used to determine whether a given vector is a 'known' or an
'unknown' vector.
• The net is said to recognize a 'known' vector if it produces a pattern of activation on the output units
that is the same as one of the stored vectors.
• The testing procedure is as follows:
• Step 0: Set the weights obtained from Hebb's rule.
• Step 1: For each testing input vector, perform steps 2 to 4.
• Step 2: Set the activations of the input units equal to the input vector.
• Step 3: Calculate the net input for each output unit, j = 1 to n:
y_inj = Σi xi wij
• Step 4: Calculate the output by applying the activation function over the net
input: yj = f(y_inj)

Auto-associative memory

• For an auto-associative net, the training input and target output vectors are
identical.
• The process of training is often called storing the vectors, which may be
binary or bipolar.
• The performance of the net is judged by its ability to reproduce a stored
pattern from noisy input; performance is, in general, better for bipolar vectors
than for binary vectors.
Auto-associative memory
• Same as hetero-associative nets, except t(p) = s(p).
• Used to recall a pattern by its noisy or incomplete version.
(pattern completion/pattern recovery)
• A single pattern s = (1, 1, 1, -1) is stored (weights are computed by Hebbian
rule – outer product)

1 1 1 − 1
1 1 1 − 1
W = 
1 1 1 − 1
− 1 −1 − 1 1 

training pat. (111 − 1) W = (4 4 4 − 4) → (111 − 1)


noisy pat (− 111 − 1) W = (2 2 2 − 2) → (111 − 1)
missing info (0 0 1 − 1) W = (2 2 2 − 2) → (111 − 1)
more noisy (− 1 − 11 − 1) W = (0 0 0 0) not recognized
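The storage and the four recall cases above can be checked numerically; a small NumPy sketch (using np.sign as the bipolar threshold, with an all-zero result standing for "not recognized") follows.

import numpy as np

s = np.array([1, 1, 1, -1])
W = np.outer(s, s)                            # Hebbian (outer-product) storage

def recall(x):
    return np.sign(x @ W)                     # sign activation

print(recall(np.array([ 1,  1, 1, -1])))      # training pattern -> [ 1  1  1 -1]
print(recall(np.array([-1,  1, 1, -1])))      # one mistake      -> [ 1  1  1 -1]
print(recall(np.array([ 0,  0, 1, -1])))      # missing entries  -> [ 1  1  1 -1]
print(recall(np.array([-1, -1, 1, -1])))      # too noisy        -> [ 0  0  0  0]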
Auto-associative memory
• The preceding process of using the net can be written more succinctly in
vector–matrix form: multiply the input vector by W and apply the activation
function to the result.
• As before, the differences take one of two forms: "mistakes" in the data
or "missing" data.
• The only "mistakes" we consider are changes from +1 to -1 or vice
versa.
• We use the term "missing" data to refer to a component that has the
value 0, rather than either +1 or -1.
Iterative Auto-associative memory

• In some cases the net does not respond to an input signal immediately with a
stored target pattern, but the response may be similar enough to a stored pattern
to be used as a new input.
• Testing a recurrent autoassociative net: the stored vector with its second, third and
fourth components set to zero.
• The weight matrix (with zero diagonal, i.e. no self-connections) used to store the
vector (1, 1, 1, -1) is

W = [  0  1  1 -1 ]
    [  1  0  1 -1 ]
    [  1  1  0 -1 ]
    [ -1 -1 -1  0 ]
Iterative Auto-associative memory
• The vector (1,0,0,0) is an example of a vector formed from the stored
vector with three "missing" components (three zero entries).
• The performance of the net for this vector is given next.
• Input vector (1, 0, 0, 0):
(1, 0, 0, 0)·W = (0, 1, 1, -1)  → iterate
(0, 1, 1, -1)·W = (3, 2, 2, -2) → (1, 1, 1, -1)
• Thus, for the input vector (1, 0, 0, 0), the net produces the "known" vector
(1, 1, 1, -1) as its response in two iterations.
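A short sketch of the same iterative recall, with the zero-diagonal weight matrix and each response fed back as the next input (NumPy, sign activation assumed):

import numpy as np

s = np.array([1, 1, 1, -1])
W = np.outer(s, s)
np.fill_diagonal(W, 0)                    # no self-connections

x = np.array([1, 0, 0, 0])                # three components "missing"
for step in range(1, 6):
    y = np.sign(x @ W)                    # apply the net and the sign activation
    print(step, y)
    if np.array_equal(y, x):              # stop once the response stops changing
        break
    x = y                                 # feed the response back as new input
# step 1: [0 1 1 -1], step 2: [1 1 1 -1], step 3: unchanged -> converged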
Heteroassociative Memory Network
• Theory
• The training input and target
output vectors are different.
• The weights are determined
by the Hebb rule or the delta rule.
• The input layer has 'n' units, the
output layer has 'm' units, and
there are weighted
interconnections between the
input and output layers.
• Architecture
• The architecture is given in
the figure.

Contd..
Testing Algorithm
• Step 0: Initialize the weights from the training algorithm.
• Step 1: Perform steps 2-4 for each input vector presented.
• Step 2: Set the activations of the input units equal to the current input vector x.
• Step 3: Calculate the net input to the output units:
y_inj = Σi xi wij
• Step 4: Determine the activation of the output units over the net input
(j = 1 to m).
• The output vector y obtained gives the pattern associated
with the input vector x.
• If the responses are binary, the activation function is
yj = 1 if y_inj > 0, and yj = 0 if y_inj ≤ 0.

Hetero-associative Memory
Associative memory neural networks are nets in which the weights
are determined in such a way that the net can store a set of P
pattern associations.
• Each association is a pair of vectors (s(p), t(p)), with p = 1, 2, . .
. , P.
• Each vector s(p) is an n-tuple (has n components), and each t(p)
is an m-tuple.
• The weights may be found using the Hebb rule or the delta rule.
Hetero-associative Memory
Example of hetero-associative memory

• Binary pattern pairs s:t with |s| = 4 and |t| = 2.


• Total weighted input to output units: y_inj = Σi xi wij
• Activation function: threshold
yj = 1 if y_inj > 0
yj = 0 if y_inj ≤ 0
• Weights are computed by the Hebbian rule (sum of outer
products of all training pairs):
W = Σ (p = 1 to P) sT(p) t(p)
• Training samples:
s(p) t(p)
p=1 (1 0 0 0) (1, 0)
p=2 (1 1 0 0) (1, 0)
p=3 (0 0 0 1) (0, 1)
p=4 (0 0 1 1) (0, 1)
Example of hetero-associative memory

1  1 0 1  1 0
       
s T (1)  t (1) =  (1 0 ) = 
0 0 0 s (2)  t (2) =
T  1  (1 0) =  1 0
 0 0 0  
0 0 0
 0 0 0 0 0 
   0    

 0 0 0 0 0 0
       
s T (3)  t (3) =  (0 1) = 
0 0 0
 0 0 0 s (4)  t (4) =
T  0  (0 1) =  0 0
1  0  
1 0 1
1  1  0
  
   1 
2 0
 
1 0
W =
1
0
Computing the weights
 
0 2 

Example of hetero-associative memory
Recall:

x = (1 0 0 0) (same as s(1)):
(1 0 0 0)·W = (2 0)  →  y1 = 1, y2 = 0

x = (0 1 0 0) (similar to s(1) and s(2)):
(0 1 0 0)·W = (1 0)  →  y1 = 1, y2 = 0

x = (0 1 1 0):
(0 1 1 0)·W = (1 1)  →  y1 = 1, y2 = 1

(1 0 0 0) and (1 1 0 0) belong to class (1, 0); (0 0 0 1) and (0 0 1 1) belong to
class (0, 1). The probe (0 1 1 0) is not sufficiently similar to either class, so its
output matches neither target. The delta rule would give the same or
similar results.
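The whole example can be reproduced with a few lines of NumPy: build W as the sum of the four outer products, then recall with the threshold activation. This is only a sketch of the worked example above.

import numpy as np

S = np.array([[1, 0, 0, 0], [1, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 1]])
T = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])

W = sum(np.outer(S[p], T[p]) for p in range(len(S)))   # W = sum_p s(p)^T t(p)
print(W)                                               # [[2 0] [1 0] [0 1] [0 2]]

def recall(x):
    y_in = x @ W                                       # y_in_j = sum_i x_i w_ij
    return (y_in > 0).astype(int)                      # threshold activation

print(recall(np.array([1, 0, 0, 0])))                  # [1 0] -> class (1, 0)
print(recall(np.array([0, 1, 0, 0])))                  # [1 0] -> similar to s(1), s(2)
print(recall(np.array([0, 1, 1, 0])))                  # [1 1] -> matches neither class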
Bidirectional Associative Memory (BAM)
• Theory
• Developed by Kosko in the year
1988.
• Performs forward and backward search.
• Encodes binary/bipolar patterns using the
Hebbian learning rule.
• Two types
– Discrete BAM
– Continuous BAM
• Architecture
• Weights are bidirectional
• X layer has ‘n’ input units
• Y layer has ‘m’ output units.
• Weight matrix from X to Y is W and
from Y to X is WT.

Discrete Bidirectional Associative Memory
• Here the weights are found as the sum of outer products of the pattern pairs in bipolar form.
• The activation function is defined with a nonzero threshold.
• Determination of weights
• The input vectors are denoted by s(p) and the output vectors by t(p). Then the
weight matrix is denoted by W.
• Input ➔ s(p) = (s1(p), ..., si(p), ..., sn(p))
• Output ➔ t(p) = (t1(p), ..., tj(p), ..., tm(p))
• The weight matrix is determined using the Hebb rule.
• If the input vectors are binary, then the weight matrix W = {wij} is given by
wij = Σp [2si(p) − 1][2tj(p) − 1]
– If the input vectors are bipolar, the weight matrix W = {wij} is given by
wij = Σp si(p) tj(p)
– In either case the weight matrix is in bipolar form, whether the input vectors are
binary or bipolar.

Contd.
• Activation Function for BAM
• The activation function depends on whether the input–target vector pairs used are binary or bipolar.
• The activation function for the Y layer with binary input vectors is
yj = 1 if y_inj > 0;  yj (unchanged) if y_inj = 0;  0 if y_inj < 0
• With bipolar input vectors it is
yj = 1 if y_inj > θj;  yj (unchanged) if y_inj = θj;  -1 if y_inj < θj
• The activation function for the X layer with binary input vectors is
xi = 1 if x_ini > 0;  xi (unchanged) if x_ini = 0;  0 if x_ini < 0
• With bipolar input vectors it is
xi = 1 if x_ini > θi;  xi (unchanged) if x_ini = θi;  -1 if x_ini < θi
• If the threshold value is equal to the net input, then the previous output value
calculated is left as the activation of that unit. Signals are sent only from
one layer to the other at a time, not in both directions simultaneously.
Testing Algorithm for Discrete BAM
• Test the noisy patterns entering the network.
• The testing algorithm for the net is as follows:
• Step 0: Initialize the weights to store P vectors. Also initialize all the activations to zero.
• Step 1: Perform steps 2-6 for each testing input.
• Step 2: Set the activations of the X layer to the current input pattern, i.e. present the input
pattern x to the X layer and present the input pattern y to the Y layer. It is a bidirectional memory.
• Step 3: Perform steps 4-6 while the activations have not converged.
• Step 4: Update the activations of the units in the Y layer. Calculate the net input:
y_inj = Σi xi wij
– Applying the activation function, we get yj = f(y_inj).
– Send this signal to the X layer.
• Step 5: Update the activations of the units in the X layer.
– Calculate the net input:
x_ini = Σj yj wij
– Applying the activation function over the net input:
• xi = f(x_ini)
• Send this signal to the Y layer.
• Step 6: Test for convergence of the net. Convergence occurs if the activation vectors x and y
reach equilibrium. If this occurs then stop, else continue. A sketch of this recall loop appears below.
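A rough sketch of this bidirectional recall loop (bipolar patterns assumed, threshold 0, and the previous activation kept when the net input is exactly zero, as required above):

import numpy as np

def bam_recall(W, x, y, max_iters=100):
    """Bidirectional recall sketch for a discrete BAM (W maps the X layer to the Y layer)."""
    x, y = np.array(x, dtype=float), np.array(y, dtype=float)
    for _ in range(max_iters):
        y_in = x @ W                                          # Step 4: net input to Y layer
        y_new = np.where(y_in > 0, 1.0, np.where(y_in < 0, -1.0, y))
        x_in = y_new @ W.T                                    # Step 5: net input to X layer
        x_new = np.where(x_in > 0, 1.0, np.where(x_in < 0, -1.0, x))
        if np.array_equal(x_new, x) and np.array_equal(y_new, y):
            return x_new, y_new                               # Step 6: equilibrium reached
        x, y = x_new, y_new
    return x, y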

Continuous BAM
• It uses the logistic sigmoid function as the activation function for all units.
• It may be a binary sigmoid or a bipolar sigmoid.
• A bipolar sigmoid function with high gain converges to a vector state, and the net then acts
like a discrete BAM.
• If the input vectors are binary, s(p), t(p), the weights are determined using
the formula
• wij = Σp [2si(p) − 1][2tj(p) − 1]
• If a binary logistic function is used, then the activation function is
f(y_inj) = 1 / (1 + e^(−y_inj))
• If the activation function is the bipolar logistic function, then
f(y_inj) = (1 − e^(−y_inj)) / (1 + e^(−y_inj))
• The net input calculated with the bias included is
y_inj = bj + Σi xi wij
Analysis of Hamming Distance, Energy Function and Storage Capacity
• Hamming distance: the number of mismatched
components of two given bipolar/binary
vectors.
• Denoted by H[X, X'].
• Average Hamming distance = (1/n) H[X, X'].
• Example: X = [1 0 1 0 1 1 0]
• X' = [1 1 1 1 0 0 1]
• H[X, X'] = 5
• Average Hamming distance = 5/7. (A small code check follows below.)
• Stability is determined by a Lyapunov
(energy) function.
• A Lyapunov function must always be bounded
and decreasing.
• Change in energy
• Memory capacity: min(m, n), where
"n" is the number of units in the X layer
and "m" is the number of units in the
Y layer.
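The Hamming-distance example above is easy to verify with a small sketch:

import numpy as np

def hamming_distance(a, b):
    """Number of mismatched components of two binary/bipolar vectors."""
    return int(np.sum(np.asarray(a) != np.asarray(b)))

X  = [1, 0, 1, 0, 1, 1, 0]
X2 = [1, 1, 1, 1, 0, 0, 1]
hd = hamming_distance(X, X2)
print(hd, hd / len(X))          # 5 mismatches, average Hamming distance 5/7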

Bidirectional Associative Memory (BAM)
• First proposed by Bart Kosko
• Heteroassociative network
• It associates patterns from one set, set A, to patterns from another set,
set B, and vice versa
• Generalize and also produce correct outputs despite corrupted or
incomplete inputs
• Consists of two fully interconnected layers of processing elements
• There can also be a feedback link connecting each
node to itself.
Bidirectional Associative Memory (BAM)
• The BAM maps an n-dimensional input vector 𝑋𝑛 into the m-
dimensional output vector 𝑌𝑚 .

A BAM network (Each node may also be connected to itself)


Bidirectional Associative Memory (BAM)
• How does the BAM work?

BAM operation: (a) forward direction; (b) backward direction


• The input vector 𝑿(𝒑) is applied to the transpose of weight matrix 𝑾𝑻 to
produce an output vector 𝒀(𝒑)
• Then, the output vector 𝒀(𝒑) is applied to the weight matrix 𝑾 to produce a new input vector 𝑿(𝒑 + 𝟏).
• This process is repeated until the input and output vectors become unchanged (reach a stable state).
Bidirectional Associative Memory (BAM)
Basic idea behind the BAM
• Store pattern pairs so that when n-dimensional vector X from set A
is presented as input, the BAM recalls m-dimensional vector Y
from set B, but when Y is presented as input, the BAM recalls X.
Bidirectional Associative Memory (BAM)
The BAM training algorithm
Step 1: Storage The BAM is required to store M pairs of patterns. For example, we
may wish to store four pairs:

In this case, the BAM input layer must have six neurons and the output layer three neurons.
Bidirectional Associative Memory (BAM)

The weight matrix is determined as the sum of the outer products of all the stored pattern pairs:

W = Σ (m = 1 to M) Xm YmT

where M is the number of pairs to be stored.
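In code, the storage step is just this sum of outer products. The sketch below uses made-up bipolar vectors with six input and three output components, matching the layer sizes mentioned above; the actual pairs from the example are not reproduced here.

import numpy as np

def bam_weights(X_pairs, Y_pairs):
    """W = sum over the M stored pairs of X_m Y_m^T (bipolar vectors)."""
    return sum(np.outer(x, y) for x, y in zip(X_pairs, Y_pairs))

# Hypothetical bipolar pairs (not the ones from the textbook example)
X_pairs = [np.array([1, 1, 1, -1, -1, -1]), np.array([-1, -1, -1, 1, 1, 1])]
Y_pairs = [np.array([1, -1, 1]),            np.array([-1, 1, -1])]
W = bam_weights(X_pairs, Y_pairs)            # 6 x 3 weight matrix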
Bidirectional Associative Memory (BAM)
Step 2: Testing The BAM should be able to receive any vector from set A
and retrieve the associated vector from set B, and receive any vector from set
B and retrieve the associated vector from set A. Thus, first we need to
confirm that the BAM is able to recall 𝑌𝑚 when presented with 𝑋𝑚 . That is,
Ym = sign(WT Xm), for m = 1, 2, ..., M.

For instance
Bidirectional Associative Memory (BAM)
Then, we confirm that the BAM recalls 𝑋𝑚 when presented with 𝑌𝑚 . That is,
Xm = sign(W Ym), for m = 1, 2, ..., M.

For instance
Bidirectional Associative Memory (BAM)
• Step 3: Retrieval: Present an unknown vector (probe) X to the BAM and
retrieve a stored association. The probe may be a corrupted or incomplete
version of a pattern from set A (or from set B) stored in the BAM.

• Repeat the iteration until equilibrium, when input and output vectors remain unchanged
with further iterations. The input and output patterns will then represent an associated
pair.
The BAM is unconditionally stable (Kosko, 1992). This means that any set of
associations can be learned without risk of instability. This important quality
arises from the BAM using the transpose relationship between weight matrices in
forward and backward directions.
Let us now return to our example. Suppose we use vector X as a probe. It
represents a single error compared with the pattern 𝑋1 from set A:

This probe applied as the BAM input produces the output vector Y1 from set B.
The vector Y1 is then used as input to retrieve the vector X1 from set A. Thus,
the BAM is indeed capable of error correction.
Hopfield Networks
• John Hopfield developed the model in 1982.
• Hopfield promoted the construction of the first analog VLSI neural network chip.
• Two types of network
– Discrete Hopfield network
– Continuous Hopfield network
• Discrete Hopfield network
– Hopfield network is an auto associative fully interconnected single-layer feedback network.
– Symmetrically weighted network.
– When operated in a discrete fashion it is called a discrete Hopfield network, and its
single-layer feedback architecture makes it recurrent.
– Two types of input are used: binary and bipolar.
– The use of bipolar inputs makes the analysis easier.
– No self-connections: Wii = 0, with symmetric weights Wij = Wji.
– Only one unit updates its activation at a time.
– An input pattern is applied to the network and the network's output is initialized accordingly.
– The initializing pattern is then removed, and the initialized output becomes the new input through the
feedback connections.
– The process continues until no new, updated responses are produced and the network reaches its
equilibrium.
– Because the updating is asynchronous, an energy (Lyapunov) function can be defined for the net.
– The energy function proves that the net will converge to a stable set of activations.

Architecture of Discrete Hopfield Net
• Processing elements have two
outputs: inverting and non-inverting.
• Outputs from each processing
element are fed back to the inputs of
the other processing elements, but not
to itself.
• The connections are resistive.
• There are no negative resistors, so excitatory
connections use positive inputs and
inhibitory connections use inverted inputs.
• Excitatory: the output is the same as the input.
• Inhibitory: the output differs from the input.
• A weight is positive if both units are
on.
• If a connection strength is negative, then
one of the units is off.
• Weights are symmetric: Wij = Wji.

Training Algorithm of Discrete Hopfield Net
• For storing a set of binary patterns s(p), p = 1 to P, the weight
matrix W is given as
wij = Σ (p = 1 to P) [2si(p) − 1][2sj(p) − 1]  for i ≠ j
• For storing bipolar patterns, the weight matrix
W is given as
wij = Σ (p = 1 to P) si(p) sj(p)  for i ≠ j
• No self-connections, i.e. wii = 0.

Testing Algorithm
• The initial weights are obtained from the training algorithm. Steps:
• 0: Initialize the weights to store the patterns (weights obtained from the training algorithm using
the Hebb rule).
• 1: While the activations have not converged, perform steps 2-8.
• 2: Perform steps 3-7 for each input vector x.
• 3: Make the initial activations equal to the external input vector x: yi = xi (i = 1 to n).
• 4: Perform steps 5-7 for each unit Yi.
• 5: Calculate the net input of the network:
y_ini = xi + Σj yj wji
• 6: Apply the activation function over the net input to calculate the output:
yi = 1 if y_ini > θi;  yi (unchanged) if y_ini = θi;  0 if y_ini < θi
• 7: Feed back the obtained output yi to all other units.
• 8: Test the network for convergence. (A code sketch of this asynchronous recall appears below.)
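A minimal sketch of this asynchronous recall in NumPy follows (binary activations, threshold theta; the function name and the random update order via a permutation are implementation choices, not part of the algorithm statement above):

import numpy as np

def hopfield_recall(W, x, theta=0.0, max_sweeps=50, seed=None):
    """Asynchronous recall for a discrete (binary) Hopfield net.

    W: symmetric weight matrix with zero diagonal; x: external input pattern.
    """
    rng = np.random.default_rng(seed)
    y = x.astype(float).copy()                 # Step 3: y_i = x_i
    for _ in range(max_sweeps):
        y_old = y.copy()
        for i in rng.permutation(len(y)):      # Step 4: visit units in random order
            y_in = x[i] + y @ W[:, i]          # Step 5: y_in_i = x_i + sum_j y_j w_ji
            if y_in > theta:                   # Step 6: threshold activation
                y[i] = 1.0
            elif y_in < theta:
                y[i] = 0.0
            # if y_in == theta, keep the previous activation (Step 7: feedback is immediate)
        if np.array_equal(y, y_old):           # Step 8: converged
            break
    return y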

• Only a single neural unit is allowed to update
its output at a time.
• The next update is carried out on a randomly chosen
node and uses the already updated outputs.
• This is an asynchronous stochastic recursion.
• If the input vector is unknown, the activation
vectors may converge to a pattern that is not one of the
stored patterns; such a pattern is
called a spurious stable state.
Analysis of Energy Function and Storage Capacity of the Discrete Hopfield Net
• Energy function: a function that is bounded and is a non-increasing
function of the state of the system.
• Lyapunov function: determines the stability property.
• Energy function:
Ef = −(1/2) Σi Σj yi yj wij − Σi xi yi + Σi θi yi
• The network is stable if the energy function decreases
whenever the state of any node changes.
• Assume that node i changes its state.
• A positive definite function Ef(y) can be found such that
– Ef(y) is continuous for all components yi, for i = 1 to n
– dEf[y(t)]/dt < 0, which indicates that the energy function is decreasing
with time.
– Storage capacity: C ≈ 0.15 n
– n is the number of neurons in the net
– Also C ≈ n / (2 log2 n)
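As a sketch, the energy (Lyapunov) function and the capacity rules of thumb above can be written as:

import numpy as np

def hopfield_energy(W, y, x, theta=0.0):
    """E = -1/2 sum_ij y_i y_j w_ij - sum_i x_i y_i + sum_i theta_i y_i.
    Each asynchronous update keeps E the same or lowers it, so the net
    settles into a (local) energy minimum."""
    return -0.5 * (y @ W @ y) - x @ y + np.sum(theta * y)

def capacity_estimates(n):
    """Rule-of-thumb storage capacities for n neurons: C ~ 0.15 n and C ~ n / (2 log2 n)."""
    return 0.15 * n, n / (2 * np.log2(n))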

Continuous Hopfield Network
• The discrete network can be converted to a continuous one if time
is treated as a continuous variable.
• Used in associative memory problems and in the
travelling salesman problem.
• Nodes have a continuous, graded output.
• The energy decreases continuously with time.
• It can be realized as an electronic circuit that uses non-linear
amplifiers and resistors.
• Used in building Hopfield networks with VLSI technology.

Hardware Model
• The continuous network built up of
electrical components is shown in the
figure.
• The model has n amplifiers, each mapping its
input voltage ui into an output
voltage yi through an activation function
a(ui).
• The activation function used can be a
sigmoid function.
• λ is the gain parameter.
• The continuous model becomes the discrete one when
λ → ∞.
• Each amplifier has an input capacitance ci and an input
conductance gri.
• The external signal xi
supplies a constant
current.

Contd.
• Apply Kirchhoff's current law: the total current entering a junction
is equal to that leaving the same
junction.
• The equation obtained from KCL describes the
time evolution of the system.

Iterative Autoassociative Networks

• The net may not respond to the input signal immediately with
the stored target pattern.
• The response may, however, be close enough to a stored pattern
to use the first response as input to the net again.
• An iterative autoassociative network recovers the
original stored vector when presented with a test
vector close to it.
• Such nets are also called recurrent autoassociative networks.
Linear Autoassociative Memory (LAM)
• Proposed by James Anderson, 1977.
• Based on the Hebbian rule.
• Linear algebra is used for analysing the
performance of the net.
• Each stored vector is an eigenvector of the weight matrix;
its eigenvalue is the number of times the vector was
presented.
• When the input vector is X, the output response is
XW, where W is the weight matrix.
• If Ap denotes the stored input patterns, then Ai AjT = 0 for i ≠ j,
i.e. the stored patterns are mutually orthogonal.
Brain-in-the-Box Network
• An activity pattern inside the box receives
positive feedback on certain components,
which forces it outward.
• When it hits the walls, it moves to a corner of
the box, where it remains.
• The box represents the saturation limit of each state.
• Activations are restricted to lie between -1 and +1.
• Self-connections exist.
Training Algorithm
• Steps:
• 0: Initialize the weights. Initialize the learning rates α and β.
• 1: Perform steps 2 to 6 for each training input vector.
• 2: The initial activation is made equal to the input vector:
yi = xi
• 3: Perform steps 4 and 5.
• 4: Calculate the net input.
• 5: Calculate the output by applying the activation function.
• 6: Update the weights:
• wij(new) = wij(old) + β yi yj

Autoassociator with Threshold Unit
• If a threshold unit is set, then a threshold function is used
as the activation function.
• Training algorithm
• Steps:
• 0: Initialize the weights.
• 1: Perform steps 2-5 for each testing vector.
• 2: Set the activations of X.
• 3: Perform steps 4 and 5.
• 4: Update the activations of all units.
• 5: Test for the stopping condition.


Temporal Associative Memory Network
• Stores a sequence of patterns as dynamic
transitions.
• Temporal patterns, and an associative memory
with this capacity, constitute a temporal associative
memory network.
• The weight matrix associates each pattern in the sequence with the next one.
• A BAM for temporal patterns can be modified so that
both layers X and Y are described by identical
weight matrices W. Recall is based on applying the
activation function f(·) to the net input.
