Neural associative memories (NAM) are neural network models consisting of neuron-like and synapse-like elements. At any given point in time the state of the neural network is given by the vector of neural activities, called the activity pattern. Neurons update their activity values based on the inputs they receive (over the synapses). In the simplest neural network models the input-output function of a neuron is the identity function or a threshold operation. Note that in the latter case the neural activity state is binary: active or inactive. Information to be processed by the neural network is represented by activity patterns (for instance, the representation of a tree can be an activity pattern in which the active neurons form a tree's picture). Thus, activity patterns are the representations of the elements processed in the network. A representation is called sparse if the ratio between active and inactive neurons is small.
The synapses in a neural network are the links between neurons, or between neurons and fibers carrying external input. A synapse transmits presynaptic activity to the postsynaptic site. In the simplest neural network models each synapse has a weight value, and the postsynaptic signal is simply the product of presynaptic activity and weight. The input of a neuron is then the sum of the postsynaptic signals of all synapses converging on the neuron. Thus, information is processed in a neural network by the spread of activity. How activity spreads, and thereby which algorithm is implemented in the network, depends on how the synaptic structure, i.e., the matrix of synaptic weights, is shaped by learning. In neural associative memories, learning provides the storage of a (large) set of activity patterns, the memory patterns.
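As a minimal illustration of this kind of activity spread (the weights, activity pattern, and threshold below are made-up values, not taken from the text), a single update step can be written as a weighted sum of presynaptic activities followed by a threshold operation:

```python
import numpy as np

# Hypothetical example: 4 presynaptic neurons feeding 3 postsynaptic neurons.
x = np.array([1, 0, 1, 0])              # binary activity pattern (1 = active, 0 = inactive)
W = np.array([[0.2, 0.0, 0.5],          # W[i, j] = weight of the synapse from neuron i to neuron j
              [0.1, 0.4, 0.0],
              [0.3, 0.0, 0.6],
              [0.0, 0.2, 0.1]])

net_input = x @ W                       # postsynaptic input = sum of (presynaptic activity * weight)
output = (net_input > 0.4).astype(int)  # threshold neurons: active if the input exceeds the threshold
print(net_input, output)                # [0.5 0.  1.1] [1 0 1]
```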
Different memory functions are defined by the way learned patterns can be selectively accessed by an input pattern. The function of pattern recognition is to classify input patterns into two classes, familiar patterns and the rest. Pattern association describes the function of associating certain input patterns with certain memory patterns, i.e., each stored memory consists of a pair of an input pattern and a desired output pattern.
The advantage of neural associative memories over other pattern storage algorithms such as lookup tables or hash codes is that memory access can be fault tolerant with respect to variations of the input pattern. For pattern association this means that an output pattern is produced not only for the exact input pattern presented during learning, but for the whole set of input patterns that are closest to it (with respect to a metric like the Hamming distance). Fault tolerance is key in the function of content-addressed pattern retrieval or pattern completion. Here the pattern pairs used during learning consist of two identical patterns, which is also called autoassociative learning. In this case, if the input patterns are noisy versions of the learned patterns, the pattern association can be seen as pattern completion, i.e., the noisy version is replaced by the noiseless version of the pattern.
Associative Memories
Description
A content-addressable memory is a type of memory that allows for the recall of data
based on the degree of similarity between the input pattern and the patterns stored
in memory. It refers to a memory organization in which the memory is accessed by
its content as opposed to an explicit address like in the traditional computer memory
system. Therefore, this type of memory allows the recall of information based on
partial knowledge of its contents.
Suppose we are given a memory that stores the names of several people, as shown in the figure below. If the given memory is content-addressable, using the erroneous string "Crhistpher Columbos" as the key is sufficient to retrieve the correct name "Christopher Columbus." In this sense, this type of memory is robust and fault tolerant, since it exhibits some form of error-correction capability.
A content-addressable memory in action
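As a toy illustration of the idea (independent of any particular neural model): the stored list of names below is hypothetical, apart from the Columbus entry taken from the example above, and difflib's string similarity simply stands in for whatever similarity measure the memory uses.

```python
import difflib

# Hypothetical stored "memory" of names; only the Columbus entry comes from the example above.
stored_names = ["Christopher Columbus", "Albert Einstein", "Marie Curie", "Isaac Newton"]

# Content-addressable access: retrieve by similarity to the key, not by an explicit address.
key = "Crhistpher Columbos"
best = difflib.get_close_matches(key, stored_names, n=1, cutoff=0.0)[0]
print(best)   # -> "Christopher Columbus"
```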
Artificial neural networks can be used as associative memories. One of the simplest artificial neural associative memories is the linear associator. The Hopfield model and the bidirectional associative memory (BAM) are other popular artificial neural network models used as associative memories.
Linear Associator
The linear associator is one of the simplest and earliest studied associative memory models. Below is the network architecture of the linear associator.
Linear Associator
In the linear associator, an associated pattern pair (Xk, Yk) is stored by Hebbian learning, i.e., each connection weight receives the contribution

(wij)k = (xi)k (yj)k

where (xi)k represents the ith component of pattern Xk and (yj)k represents the jth component of pattern Yk, for i = 1, 2, ..., m and j = 1, 2, ..., n. Constructing the connection weight matrix W is then accomplished by summing up the individual correlation matrices, i.e.,

W = W1 + W2 + ... + Wp

where Wk = XkT Yk is the correlation matrix of the kth pattern pair and p is the number of stored pairs.
After encoding or memorization, the network can be used for retrieval. The process
of retrieving a stored pattern given an input pattern is called decoding. Given a
stimulus input pattern X, decoding or recollection is accomplished by computing the
net input to the output units using

inputj = sum over i of (xi wij)

where inputj stands for the weighted sum of the inputs, i.e., the activation value of node j, for j = 1, 2, ..., n. The output of those units is then determined using the bipolar output function

yj = +1 if inputj >= qj, and yj = -1 if inputj < qj

where qj is the threshold value of output neuron j. From the foregoing discussion, it can be seen that the output units behave like the linear threshold units (McCulloch and Pitts, 1943) and the perceptrons (Rosenblatt, 1958), which compute a weighted sum of the input and produce a -1 or +1 depending on whether the weighted sum is below or above a certain threshold value.
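A short sketch of the encoding and decoding steps just described; the bipolar pattern pairs are made-up examples and the helper names encode and decode are ours, not from the text:

```python
import numpy as np

def encode(X, Y):
    """Hebbian encoding: W is the sum of the correlation (outer-product) matrices Xk^T Yk."""
    W = np.zeros((X.shape[1], Y.shape[1]))
    for xk, yk in zip(X, Y):
        W += np.outer(xk, yk)           # (wij)k = (xi)k (yj)k, summed over k
    return W

def decode(x, W, theta=0.0):
    """Bipolar output: +1 where the weighted sum reaches the threshold, -1 elsewhere."""
    net = x @ W                         # inputj = sum_i xi wij
    return np.where(net >= theta, 1, -1)

# Made-up bipolar pattern pairs (m = 4 input units, n = 2 output units).
X = np.array([[ 1, -1,  1, -1],
              [ 1,  1, -1, -1]])
Y = np.array([[ 1, -1],
              [-1,  1]])
W = encode(X, Y)
print(decode(X[0], W))                  # reproduces Y[0]: the stored inputs are orthogonal
```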
However, the input pattern may contain errors and noise, or may be an incomplete version of some previously encoded pattern. Nevertheless, when presented with such a corrupted input pattern, the network will retrieve the stored pattern that is closest to the actual input pattern. The linear associator (and associative memories in general) is therefore robust and fault tolerant, i.e., the presence of noise or errors results only in a gradual decrease rather than a total degradation of the performance of the network. This robustness and fault tolerance are byproducts of having a large number of processing elements perform highly parallel and distributed computations.
It is known that using Hebb's learning rule to build the connection weight matrix of an associative memory yields a rather low memory capacity. Because of this limitation of Hebb's learning rule, several modifications and variants have been proposed to maximize the memory capacity. Some examples can be found in (Hassoun, 1993; Hertz, 1991; Patterson, 1996; Ritter, 1992).
Perfect retrieval is possible if the stored input patterns are mutually orthogonal, i.e., if

Xa XbT = 0 whenever a ≠ b

for a = 1, 2, ..., p and b = 1, 2, ..., p. If the stored input patterns are not mutually orthogonal, imperfect retrieval can happen due to crosstalk among the patterns.
However, even if the input patterns are not mutually orthogonal, accurate retrieval
can still be realized if the crosstalk term is small. The degree of correlation among
the input patterns sets the limit on the memory capacity and content-addressability
of an associative memory.
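The signal and crosstalk terms can be made explicit with a small check (the specific patterns and the autoassociative choice Y = X are assumptions for illustration): when recalling with a stored input Xa, the net input splits into a signal term (Xa . Xa) Ya and a crosstalk term summing (Xa . Xb) Yb over b ≠ a, and the latter vanishes for mutually orthogonal inputs.

```python
import numpy as np

X = np.array([[ 1,  1, -1, -1],
              [ 1, -1,  1, -1],
              [ 1, -1, -1,  1]])        # made-up, mutually orthogonal bipolar patterns
Y = X.copy()                            # autoassociative storage, for simplicity
W = sum(np.outer(xk, yk) for xk, yk in zip(X, Y))

for a in range(len(X)):
    signal    = (X[a] @ X[a]) * Y[a]                                    # (Xa . Xa) Ya
    crosstalk = sum((X[a] @ X[b]) * Y[b] for b in range(len(X)) if b != a)
    print(np.array_equal(np.sign(signal + crosstalk), Y[a]), crosstalk)  # True, zero crosstalk
```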
Another implementation of associative memory that differs significantly from the one discussed so far is the recurrent type of network, in which the outputs of the units are fed back to the input units. Recurrent networks produce their output through recursive computations until they become stable. The popular Hopfield model is a network of this type.
Hopfield Model
The Hopfield model was proposed by John Hopfield of the California Institute of
Technology during the early 1980s. The publication of his work in 1982 significantly
contributed to the renewed interest in research in artificial neural networks. He
showed how an ensemble of simple processing units can have fairly complex
collective computational abilities and behavior.
The dynamics of the Hopfield model is different from that of the linear associator
model in that it computes its output recursively in time until the system becomes
stable. Below is a Hopfield model with six units, where each node is connected to
every other node in the network.
Hopfield model
Unlike the linear associator model, which consists of two layers of processing units, one serving as the input layer and the other as the output layer, the Hopfield model consists of a single layer of processing elements in which each unit is connected to every other unit in the network other than itself. The connection weight matrix W of
this type of network is square and symmetric, i.e., wij = wji for i, j = 1, 2, ..., m.
Each unit has an extra external input Ii. This extra input leads to a modification in
the computation of the net input to the units:

inputj = sum over i of (xi wij) + Ij

for j = 1, 2, ..., m.
Unlike the linear associator, the units in the Hopfield model act as both input and
output units. But just like the linear associator, a single associated pattern pair is
stored by computing the weight matrix as follows:
Wk = XkTYk
where Yk = Xk
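A sketch of this encoding step (the bipolar patterns are made up; zeroing the diagonal reflects the statement above that no unit is connected to itself):

```python
import numpy as np

def hopfield_weights(patterns):
    """Hebbian encoding for the Hopfield model: W = sum_k Xk^T Xk, with the diagonal
    zeroed because no unit is connected to itself."""
    m = patterns.shape[1]
    W = np.zeros((m, m))
    for x in patterns:
        W += np.outer(x, x)
    np.fill_diagonal(W, 0)
    return W                              # square and symmetric: wij = wji

# Made-up bipolar patterns for a 6-unit network, as in the figure above.
patterns = np.array([[ 1, -1,  1, -1,  1, -1],
                     [ 1,  1, -1, -1,  1,  1]])
W = hopfield_weights(patterns)
print(np.allclose(W, W.T))                # True: the weight matrix is symmetric
```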
After encoding, the network can be used for decoding. Decoding in the Hopfield
model is achieved by a collective and recursive relaxation search for a stored pattern
given an initial stimulus pattern. Given an input pattern X, decoding is accomplished
by computing the net input to the units and determining the output of those units
using the output function to produce the pattern X'. The pattern X' is then fed back
to the units as an input pattern to produce the pattern X''. The pattern X'' is again
fed back to the units to produce the pattern X'''. The process is repeated until the
network stabilizes on a stored pattern where further computations do not change the
output of the units.
If the input pattern X is an incomplete pattern or if it contains some distortions, the
stored pattern to which the network stabilizes is typically one that is most similar to
X without the distortions. This feature is called pattern completion and is very useful
in many image processing applications.
During decoding, there are several schemes that can be used to update the outputs of the units. The updating schemes are synchronous (also termed parallel in some of the literature), asynchronous (or sequential), or a combination of the two (hybrid). In the synchronous updating scheme, the outputs of the units are updated as a group before being fed back to the network. In the asynchronous updating scheme, on the other hand, the outputs of the units are updated in some order (e.g. random or sequential) and the output is fed back to the network after each unit update. In the hybrid synchronous-asynchronous updating scheme, subgroups of units are updated synchronously while the units within each subgroup are updated asynchronously. The choice of the updating scheme has an effect on the convergence of the network.
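A sketch of recursive decoding with asynchronous, random-order updates, reusing patterns and W from the encoding sketch above; the tie rule of keeping the current state when the net input equals the threshold anticipates the modified bipolar output function discussed below:

```python
import numpy as np

def hopfield_decode(x, W, I=None, theta=0.0, max_sweeps=100, rng=None):
    """Asynchronous decoding: update one unit at a time, in random order, and feed the
    result back until a whole sweep leaves the state unchanged."""
    rng = np.random.default_rng() if rng is None else rng
    x = x.astype(float).copy()
    I = np.zeros(len(x)) if I is None else I
    for _ in range(max_sweeps):
        changed = False
        for j in rng.permutation(len(x)):
            net = W[:, j] @ x + I[j]                 # inputj = sum_i xi wij + Ij
            new = 1.0 if net > theta else (-1.0 if net < theta else x[j])
            if new != x[j]:
                x[j], changed = new, True
        if not changed:                              # stable: further updates change nothing
            break
    return x

# Flip one bit of a stored pattern and let the network complete it (patterns, W from above).
noisy = patterns[0].copy()
noisy[2] = -noisy[2]
print(hopfield_decode(noisy, W))                     # relaxes back to patterns[0]
```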
Hopfield (1982) demonstrated that the maximum number of patterns that can be stored in a Hopfield model of m nodes before the error in the retrieved patterns becomes severe is around 0.15m. The memory capacity of the Hopfield model can be increased, as shown by Andrecut.
In spite of this seemingly limited memory capacity of the Hopfield model, several
applications can be listed (Chaudhuri, Dai, deMenezes, Garcia, Poli, Soper, Suh,
Tsai). Discussions on the application of the Hopfield model to combinatorial
optimization problems can be found in (Siu, Suh).
Various widely available texts and technical papers by Hassoun, Hertz et al., Hopfield, McEliece, Ritter, and Volk can be consulted for a mathematical discussion of the memory capacity of the Hopfield model.
In the discrete Hopfield model, the units use a slightly modified bipolar output function in which the state of a unit, i.e., its output, remains the same if its net input is exactly equal to the threshold value:

xj' = +1 if inputj > qj,  xj' = xj if inputj = qj,  xj' = -1 if inputj < qj.

The dynamics of the discrete Hopfield model are characterized by the energy function

E = -1/2 sum over i and j of (wij xi xj) - sum over j of (Ij xj) + sum over j of (qj xj)

which is bounded from below, since the states xj only take the values -1 and +1 while the weights, external inputs and thresholds are fixed, and which never increases under asynchronous updates. Since the energy is bounded from below and non-increasing, the network eventually converges to a local minimum of E, ideally one corresponding to a stored pattern Xk, k = 1, 2, ..., p.
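A small numerical check of this energy argument, continuing the sketches above (patterns, W and noisy come from the earlier blocks; zero thresholds and zero external inputs are simplifying assumptions):

```python
import numpy as np

def energy(x, W, I=None, theta=0.0):
    """E = -1/2 sum_ij wij xi xj - sum_j Ij xj + sum_j qj xj (discrete Hopfield energy)."""
    I = np.zeros(len(x)) if I is None else I
    return -0.5 * x @ W @ x - I @ x + np.sum(theta * x)

# The corrupted pattern sits at a higher energy than the stored pattern the network relaxes to.
print(energy(noisy.astype(float), W), energy(patterns[0].astype(float), W))   # -> -4.0 -12.0
```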
The continuous Hopfield model is just a generalization of the discrete case. Here,
the units use a continuous output function such as the sigmoid or hyperbolic tangent
function. In the continuous Hopfield model, each unit has an associated capacitor Ci and resistance ri that model the capacitance and resistance of a real neuron's cell membrane, respectively. Thus the state equation of each unit is now

Ci dui/dt = sum over j of (wij xj) - ui / ri + Ii

where ui is the internal state of unit i and xi = g(ui) is its output, with g the continuous output function. Just like in the discrete case, there is an energy function characterizing the continuous Hopfield model. The energy function due to Hopfield (1984) is given by

E = -1/2 sum over i and j of (wij xi xj) + sum over i of (1/ri) (integral from 0 to xi of g^-1(x) dx) - sum over i of (Ii xi).
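A minimal Euler-integration sketch of these continuous dynamics, reusing W and the corrupted pattern noisy from the discrete sketches; the capacitances, resistances, time step and choice of tanh as the output function are illustrative assumptions, not Hopfield's original parameters:

```python
import numpy as np

def run_continuous(W, u0, C=1.0, r=1.0, I=None, dt=0.01, steps=2000):
    """Euler integration of  Ci dui/dt = sum_j wij xj - ui/ri + Ii,  with xi = tanh(ui).
    C and r are taken equal for all units here, purely for simplicity."""
    u = u0.astype(float).copy()
    I = np.zeros(len(u)) if I is None else I
    for _ in range(steps):
        x = np.tanh(u)                   # continuous output function
        du = (W @ x - u / r + I) / C     # state equation of each unit
        u += dt * du
    return np.tanh(u)

# Start near the corrupted pattern from the discrete sketches; the continuous network
# relaxes to an output whose signs ideally match the stored pattern patterns[0].
print(np.sign(run_continuous(W, 0.1 * noisy.astype(float))))
```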