
Beyond Disorder: Unveiling Cooperativeness in Multidirectional Associative Memories


arXiv:2503.04454v1 [cond-mat.dis-nn] 6 Mar 2025

Andrea Alessandrelli,(c,d) Adriano Barra,(b,e) Andrea Ladiana,(a) Andrea Lepre,(a) Federico Ricci-Tersenghi (g,h,i)

(a) Dipartimento di Matematica e Fisica, Università del Salento, Lecce, Italy.
(b) Istituto Nazionale d'Alta Matematica, GNFM, Roma, Italy.
(c) Dipartimento di Informatica, Università di Pisa, Pisa, Italy.
(d) Istituto Nazionale di Fisica Nucleare, Sezione di Lecce, Italy.
(e) Dipartimento di Scienze di Base Applicate all'Ingegneria, Sapienza Università di Roma, Rome, Italy.
(g) Dipartimento di Fisica, Sapienza Università di Roma, Roma, Italy.
(h) CNR-Nanotec, Rome unit, 00185 Roma, Italy.
(i) Istituto Nazionale di Fisica Nucleare, Sezione di Roma1, Italy.

Abstract: By leveraging tools from the statistical mechanics of complex systems, in these short notes we extend the architecture of a neural network for hetero-associative memory, the three-directional associative memory (TAM), to explore supervised and unsupervised learning protocols. In particular, by providing entropically heterogeneous datasets to its various layers, we predict and quantify a new emergent phenomenon, which we term layer "cooperativeness", whereby the interplay of dataset entropies across the network's layers enhances their retrieval capabilities beyond those they would have without reciprocal influence. Naively, we would expect layers trained on less informative datasets to develop smaller retrieval regions than layers exposed to more informative ones: this does not happen; instead, all the retrieval regions settle to the same amplitude, allowing for optimal retrieval performance globally. This cooperative dynamics marks a significant advancement in understanding emergent computational capabilities within disordered systems.
Contents

1 Introduction
2 The Supervised and Unsupervised Hebbian protocols
3 Conclusion

1 Introduction
John Hopfield's legacy, recently recognized with the Nobel Prize in Physics, continues to inspire the study of associative memories. His groundbreaking work established the foundation for understanding how networks of simple units can give rise to complex, emergent behaviors. Thanks to modern variations, Hopfield networks are experiencing a resurgence of interest [1–8].
Building on the layered associative Hebbian network architecture introduced for pattern recognition and disentanglement tasks [9, 10], this paper extends the exploration to new learning paradigms, pushing the boundaries of what these associative networks may accomplish.
While earlier work demonstrated how such networks could autonomously extract fundamental components from composite inputs (like identifying individual notes from musical chords), here we delve deeper into the dynamics that govern these emergent capabilities. This dual framework enables a comprehensive examination of how varying levels of data structure and supervision influence the network's performance, especially under noisy or corrupted conditions [11, 12].
The most striking finding of our study is the emergence of a phenomenon we term "cooperativeness". A detailed examination of the network phase diagrams, parameterized by dataset entropy values across distinct layers, reveals that the retrieval performance of each layer is not merely a reflection of its corresponding dataset's quality. Instead, it is governed by the collective interplay of the datasets' entropy distributions across all layers. Interestingly, heterogeneous entropy levels diminish the retrieval performance of layers trained on higher-quality datasets while enhancing the performance of those associated with noisier ones. This dynamics results in a balanced retrieval capacity across the network, a synergistic interaction absent in classical associative networks where layers operate independently.
Notably, our approach leverages tools from the statistical mechanics of complex systems [13–18], allowing us to rigorously predict and quantify this cooperative phenomenon through mathematically robust frameworks. This cooperative dynamics represents a significant advancement in understanding emergent intelligence in disordered systems.

2 The Supervised and Unsupervised Hebbian protocols


We consider a neural network composed of three different families of binary neurons, hereafter indicated by σ ≡ {σ_i^A}, with A = 1, 2, 3 and i = 1, ..., N_A, which interact in pairs via generalized Hebbian couplings (vide infra) and whose goal lies in reconstructing the information encoded in a triplet of K binary archetypes {ξ_µ^A}, with A = 1, 2, 3 and µ = 1, ..., K, respectively of length N_1, N_2 and N_3. However, such archetypes are not provided directly to the network, hence the latter has to infer them by experiencing solely their noisy or corrupted versions.

Figure 1: Schematic representation of the neural network described in (2.3) for the case (N_1, N_2, N_3) = (4, 3, 2).

In particular, we assume that for each archetype, labeled by the pair (µ, A), M examples η_µ^{a,A}, with a = 1, ..., M, are available, which are corrupted versions of the archetypes, such that for each A = 1, 2, 3 and i = 1, ..., N_A we have

P(\eta^{a,A}_{i,\mu} \,|\, \xi^{A}_{i,\mu}) = \frac{1+r_A}{2}\,\delta(\eta^{a,A}_{i,\mu} - \xi^{A}_{i,\mu}) + \frac{1-r_A}{2}\,\delta(\eta^{a,A}_{i,\mu} + \xi^{A}_{i,\mu}),     (2.1)

where r_A ∈ [0, 1] rules the quality of the dataset, i.e., for r_A = 1 the example matches the archetype perfectly, while for r_A = 0 it is totally random. To quantify the information content of the dataset it is useful to introduce the variables

\rho_A = \frac{1-r_A^2}{M\,r_A^2}, \qquad \rho_{AB} = \frac{1-r_A^2\,r_B^2}{M\,r_A^2\,r_B^2}, \qquad \text{with } A, B \in \{1,2,3\},     (2.2)

that we shall refer to as the dataset entropies, as deepened in [19, 20]. We observe that both ρ_A and ρ_{AB} approach zero either when the examples perfectly match the archetypes (i.e., r_A, r_B → 1), when the number of examples becomes infinite (i.e., M → ∞), or under both conditions simultaneously.
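To make the data-generation process concrete, the following minimal Python sketch (ours, not part of the paper) draws M noisy examples of a binary archetype according to (2.1) and evaluates the dataset entropy ρ_A of (2.2); the function names and the toy sizes are illustrative assumptions.

import numpy as np

def generate_examples(xi, r, M, rng=None):
    """Draw M noisy copies of the ±1 archetype xi following Eq. (2.1):
    each entry is kept with probability (1 + r)/2 and sign-flipped otherwise."""
    rng = np.random.default_rng() if rng is None else rng
    flips = rng.random((M, xi.size)) > (1 + r) / 2   # True -> flip that entry
    return np.where(flips, -xi, xi)

def dataset_entropy(r, M):
    """Dataset entropy rho_A of Eq. (2.2)."""
    return (1.0 - r**2) / (M * r**2)

# toy usage: one archetype of length 10, quality r = 0.8, M = 50 examples
rng = np.random.default_rng(0)
xi = rng.choice([-1, 1], size=10)
eta = generate_examples(xi, r=0.8, M=50, rng=rng)
print(eta.shape, dataset_entropy(r=0.8, M=50))   # (50, 10) and rho ~ 0.011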
The information related to the archetypes is encoded in the synaptic matrix, as outlined by the
following cost function (or Hamiltonian):

H^g_N(\sigma|J) = -\frac{1}{2}\sum_{A\neq B}^{3}\ \sum_{i,j=1}^{N_A,N_B} g_{AB}\, J^{AB}_{ij}\, \sigma^A_i \sigma^B_j,     (2.3)

where g ∈ R^{3×3} represents the strength of the interactions between different layers. The network
architecture is sketched in Fig. 1. In order to let the network deal with examples rather than patterns,
in this paper we inspect the following two variations of the above coupling matrix:
(J^{(\mathrm{unsup})})^{AB}_{ij} = \frac{1}{r_A r_B \sqrt{N_A N_B (1+\rho_A)(1+\rho_B)}} \sum_{\mu=1}^{K} \frac{1}{M}\sum_{a=1}^{M} \eta^{a,A}_{i,\mu}\, \eta^{a,B}_{j,\mu},     (2.4)

(J^{(\mathrm{sup})})^{AB}_{ij} = \sqrt{\frac{(1+\rho_A)(1+\rho_B)}{N_A N_B}} \sum_{\mu=1}^{K} \left(\frac{1}{M}\sum_{a=1}^{M} \eta^{a,A}_{i,\mu}\right) \left(\frac{1}{M}\sum_{b=1}^{M} \eta^{b,B}_{j,\mu}\right).     (2.5)

In the first case, there is no external teacher who knows the labels and can organize the examples
based on archetypes, as occurs in the second scenario. This distinction is why the two formulations
are associated with unsupervised and supervised learning protocols, respectively [11, 12, 19, 20].
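As a complementary illustration (again ours, with illustrative array shapes and names), the sketch below builds both coupling blocks between two layers from the example tensors, following (2.4) and (2.5) as reconstructed above: the unsupervised rule correlates examples sharing the same index a before averaging, while the supervised rule first averages the examples of each archetype and then correlates the averages.

import numpy as np

def hebbian_blocks(eta_A, eta_B, r_A, r_B, rho_A, rho_B):
    """Coupling block J^{AB} between layers A and B.

    eta_A: examples for layer A, shape (K, M, N_A); eta_B: shape (K, M, N_B).
    Returns (J_unsup, J_sup), both of shape (N_A, N_B), per Eqs. (2.4)-(2.5)."""
    K, M, N_A = eta_A.shape
    N_B = eta_B.shape[2]
    # unsupervised rule (2.4): correlate examples with the same label a, then average over a
    J_unsup = np.einsum('kai,kaj->ij', eta_A, eta_B) / M
    J_unsup /= r_A * r_B * np.sqrt(N_A * N_B * (1 + rho_A) * (1 + rho_B))
    # supervised rule (2.5): average the examples of each archetype first, then correlate
    bar_A = eta_A.mean(axis=1)               # shape (K, N_A)
    bar_B = eta_B.mean(axis=1)               # shape (K, N_B)
    J_sup = np.sqrt((1 + rho_A) * (1 + rho_B) / (N_A * N_B)) * (bar_A.T @ bar_B)
    return J_unsup, J_sup

The averaging within each archetype class in the supervised branch is precisely what requires the teacher's labels mentioned above.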

Figure 2: Phase diagrams of the TAM network in the supervised setting, in the noise versus storage plane at α = θ = 1. The analysis includes different inter-layer interaction strengths and various values of the dataset entropy ρ, as indicated in the legend. Each solid line depicts the phase transition of the whole network, splitting the working region (bottom left), where archetypes are learned and thus their retrieval and generalization are allowed, from the blackout region (top right), where spin-glass effects prevail, for a specific value of the dataset entropy, i.e., ρ1 = ρ2 = ρ3 = ρ. The retrieval region is determined by the conditions |m^1_{ξ_1}|, |m^2_{ξ_1}|, |m^3_{ξ_1}| > 0: these inequalities are all satisfied simultaneously in the region below the solid line, while above it all magnetizations vanish. The influence of ρ is clearly visible: as ρ increases, the retrieval region progressively shrinks in all diagrams. For ρ = 0, we recover the results of the standard Kosko's BAM case [21] (first panel) and the novel ones pertaining to the TAM [9] (second and third panels). In the insets of each plot: MC simulations at zero fast noise (β^{-1} = 0) with a symmetric network (N_1 = N_2 = N_3 = 1000), showing the evolution of the Mattis magnetizations m_{ξ_1} across the layers as a function of the network load γ for different ρ. The simulations agree with the theoretical predictions, correctly depicting the maximum load beyond which the network stops functioning.

As calculations will be performed in the thermodynamic limit, where N_1, N_2, N_3 → ∞, it is important to highlight that the sizes of the three layers (and consequently the lengths of the corresponding examples), as well as the number of samples in each dataset, can differ from one another, meaning N_1 ≠ N_2 ≠ N_3 and M_1 ≠ M_2 ≠ M_3. Furthermore, despite these differences, the number of archetypes remains constant across all layers, denoted by K for each. Moreover, in order to ensure a meaningful asymptotic (thermodynamic) behavior, the ratio between the number of patterns and their respective lengths must remain finite. To achieve this, we impose the following conditions on K, N_1, N_2 and N_3:
\lim_{N_1,N_3\to\infty} \sqrt{\frac{N_1}{N_3}} = \alpha, \qquad \lim_{N_1,N_2\to\infty} \sqrt{\frac{N_1}{N_2}} = \theta, \qquad \lim_{N_1,K\to\infty} \frac{K}{N_1} = \gamma,     (2.6)

where α, θ, γ ∈ R+ are control parameters. The parameter γ characterizes the storage capacity of the
network, and our focus will be on the high-storage regime, where γ > 0.
Pivotal for a statistical mechanical analysis is the study of the quenched free energy in the thermodynamic limit, defined as

A^g_{\alpha,\theta,\gamma}(\beta) = \lim_{N_1,N_2,N_3\to\infty} \frac{1}{L}\, \mathbb{E}\, \ln \sum_{\{\sigma^1,\sigma^2,\sigma^3\}} \exp\left(-\beta H^g_N(\sigma|J)\right),     (2.7)

where E averages over the J distributions, L = \frac{1}{3}\left(\frac{1}{\sqrt{N_1 N_2}} + \frac{1}{\sqrt{N_1 N_3}} + \frac{1}{\sqrt{N_2 N_3}}\right) and β ∈ R⁺ tunes the fast noise in the network, such that for β → 0⁺ the network's dynamics is a pure random walk in the neural configuration space (any configuration is equally likely to occur), while for β → +∞ its dynamics steepest-descends toward the minima of the Hamiltonian (2.3).
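For readers who want to reproduce the zero-fast-noise Monte Carlo runs mentioned in the insets of Fig. 2, here is a bare-bones sketch (ours; the data layout and the sequential update schedule are assumptions) of the β → +∞ dynamics, which aligns each spin with its local field and thereby descends the Hamiltonian (2.3).

import numpy as np

def zero_noise_dynamics(J, g, sigma, sweeps=50, rng=None):
    """Asynchronous beta -> infinity dynamics for the Hamiltonian (2.3).

    J[(A, B)]: coupling block of shape (N_A, N_B) for each ordered pair A != B
               (with J[(B, A)] equal to the transpose of J[(A, B)]);
    g:         3x3 symmetric array of inter-layer strengths;
    sigma:     list of three ±1 numpy vectors, one per layer (updated in place)."""
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(sweeps):
        for A in range(3):
            for i in rng.permutation(sigma[A].size):
                # local field on spin i of layer A, produced by the other two layers
                h = sum(g[A, B] * (J[(A, B)][i] @ sigma[B]) for B in range(3) if B != A)
                sigma[A][i] = 1 if h >= 0 else -1
    return sigma

Monitoring the layer overlaps with the archetypes along such sweeps, at increasing load γ = K/N_1, reproduces the kind of curves shown in the insets of Fig. 2.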
More precisely, our aim is to find an expression of A^g_{α,θ,γ}(β) in terms of a suitable set of macroscopic observables able to capture the global behavior of the system: these order parameters are the K archetype (ground truth) Mattis magnetizations that assess the quality of the network's retrieval, defined as

m^A_{\xi^A_\mu} = \frac{1}{N_A}\sum_{i=1}^{N_A} \xi^A_{i,\mu}\, \sigma^A_i,     (2.8)

such that m^A_{\xi^A_\mu} = 1 accounts for a perfect retrieval of the archetype ξ^A_µ by layer A, while its absence corresponds to m^A_{\xi^A_\mu} = 0.
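This order parameter is straightforward to measure in a simulation; a one-line helper (ours) for Eq. (2.8):

import numpy as np

def mattis_magnetization(xi, sigma):
    """Mattis overlap (2.8): 1 for perfect retrieval of xi, ~0 for no retrieval."""
    return float(np.mean(xi * sigma))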
The application of Guerra's interpolation method [22] allows us to derive an explicit expression for the quenched free energy in the thermodynamic limit, in terms of the control parameters (β, γ, α, θ and g) and the order parameters, under the assumption of replica symmetry. This assumption implies that, in the thermodynamic limit, the observables defined in (2.8) exhibit negligible fluctuations around their means. Once the quenched free energy is expressed in terms of the control and order parameters, we can proceed to extremize it with respect to the order parameters. This process results in a set of self-consistency equations, whose solutions describe the behavior of the order parameters as functions of the control ones. By analyzing these solutions, we can construct the phase diagram, identifying regions in the control-parameter space where the network successfully learns the archetypes from the examples, retrieves them, and is thus capable of generalization.
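The explicit self-consistency equations are not reported in this excerpt; purely as an illustration of the numerical procedure just described, a damped fixed-point iteration such as the following is the standard way to solve them and to trace out phase boundaries (the update map F is a hypothetical placeholder for the actual equations).

import numpy as np

def solve_self_consistency(F, m0, damping=0.5, tol=1e-10, max_iter=10_000):
    """Damped fixed-point iteration m <- (1 - damping) * m + damping * F(m).

    F stands in for the self-consistency map obtained by extremizing the
    replica-symmetric free energy; m collects the layers' Mattis magnetizations."""
    m = np.asarray(m0, dtype=float)
    for _ in range(max_iter):
        m_next = (1 - damping) * m + damping * np.asarray(F(m), dtype=float)
        if np.max(np.abs(m_next - m)) < tol:
            break
        m = m_next
    return m

Scanning the control parameters and recording where the non-trivial fixed point with non-zero magnetizations disappears yields phase boundaries like the solid lines of Fig. 2.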
Focusing specifically on the retrieval of the pattern triplet labeled by µ = 1 (without loss of generality), we can extract an explicit expression for the self-consistency equations governing the order parameters under the large-dataset assumption (i.e., M ≫ 1), which allows us to construct the model's phase diagrams in both the supervised and unsupervised scenarios.
Focusing on the supervised protocol (i.e., assuming the network has the coupling (2.5)), we first investigate the case of datasets sharing the same entropy (i.e., ρ1 = ρ2 = ρ3 = ρ over all the layers): results are summarized by the phase diagrams, for different inter-layer activation strengths g, presented in Fig. 2.¹ As shown in Fig. 2, increasing ρ leads to a systematic reduction of the retrieval region, i.e., the domain in which the network can successfully reconstruct patterns from examples. As expected, for ρ = 0 the results collapse to those of the standard Hebbian-like TAM scenario [9]. The main reward of this analysis is the determination of the thresholds for learning, namely the critical values of ρ beyond which a non-zero retrieval region can no longer be sustained: through (2.2), this allows us to predict, a priori, the relationship between dataset quality (r_A) and dataset size (M), providing crucial insights for optimizing learning processes.
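As an illustrative reading of this trade-off (the numbers are ours): inverting (2.2) at a fixed target entropy ρ* gives M = (1 − r_A²)/(ρ* r_A²); taking ρ* = 0.2, a dataset of quality r_A = 0.8 requires only M ≈ 3 examples per archetype, whereas r_A = 0.4 requires M ≈ 26, quantifying how a loss in quality must be compensated by a larger dataset.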
Then, by keeping the network symmetric in terms of both sizes and activation strengths (i.e., α = θ, g12 = g13 = g23 = 1), we deepened the analysis for datasets characterized by different entropies: results of this investigation are shown in Fig. 3. A detailed examination of the resulting phase diagrams, parameterized by the entropy values across the distinct layers, reveals an intriguing phenomenon: the retrieval region pertaining to each layer does not merely reflect the entropy of its corresponding dataset but is instead governed by the collective interplay of entropy distributions across all the layers. Indeed, when the network has to handle entropically heterogeneous datasets, a redistribution effect spontaneously appears: the retrieval region pertaining to the layer associated with the most informative dataset shrinks, while those of the layers working with messy datasets enlarge, benefiting from their mutual interaction.
¹ For the unsupervised counterpart, outcomes are qualitatively similar, the only difference being in the definition of the dataset entropy (where ρ_A is replaced by ρ_{AB}).

Figure 3: Phase diagrams in the symmetric case (α = θ = 1, (g12, g13, g23) = (1, 1, 1)) highlighting the cooperative behavior among layers. Retrieval regions are shown for layers σ¹, σ², and σ³ under different dataset entropy values: (top) zero-entropy datasets (ρ1 = ρ2 = ρ3 = 0); (middle) heterogeneous-entropy datasets (ρ2 = ρ3 = 0.2, ρ1 = 0); (bottom) homogeneous-entropy datasets (ρ1 = ρ2 = ρ3 = 0.2). The comparison highlights that the amplitude of the retrieval regions depends on the interplay between dataset entropies across layers: in the middle row, the first layer's retrieval region shrinks (orange curve), even though its noiseless inputs would allow it to reach the dashed blue line, thereby allowing the noisy layers to expand theirs (from the dashed green to the orange boundary), an effect impossible without inter-layer interactions.

This results in a more balanced retrieval performance across the network's layers. This
emergent effect highlights the layers' intrinsic cooperative nature, wherein the presence of a low-entropy training set can enhance the performance of the layers exposed to noisier ones, fostering a form of mutual reinforcement.
We propose the term “cooperativeness” to describe this emergent property of the network, a feature
that inherently arises from the reciprocal influence among the layers. This cooperative behavior is not
merely a byproduct of parameter tuning but an intrinsic characteristic of the multipartite network
structure, which only becomes evident through a comprehensive analytical treatment of the system’s
self-consistency equations and whose presence can be crucial when dealing with dirty or small datasets.

3 Conclusion
Our work focuses on Hebbian information processing by a hetero-associative model able to cope with three sources of information simultaneously (i.e., the TAM network). In our setting, rather than directly providing the network with the original archetypes (i.e., the patterns), we expose it to examples, corrupted versions of them, thereby assessing its ability to learn and generalize from incomplete or noisy data.
Through a statistical mechanics analysis, we obtained the phase diagrams of the network: these highlight how the amplitude of the retrieval region is affected by the entropy of the experienced datasets. Our results emphasize that successful pattern retrieval depends critically on both the quality and the quantity of the examples provided, much like how human learning benefits from both clear instruction and repeated exposure.
The most noteworthy finding of this study is the cooperative behavior emerging among the layers of the network. A detailed examination of the phase diagrams, further corroborated by extensive numerical simulations, has revealed that layers associated with more informative datasets actively assist those provided with less informative ones, enlarging the latter's retrieval regions by sacrificing part of their own. This effect arises because the lower the entropy of a dataset, the larger the retrieval region of the corresponding layer; the stronger layer, benefiting from a higher-quality dataset, can thus partially reduce its own retrieval region to the overall advantage of the system. This trade-off results in an optimal redistribution of learning and retrieval capacity across the network, fostering a form of mutual reinforcement that is absent in classical associative memory models.
This phenomenon is particularly striking because it has no direct counterpart in the existing literature: our findings suggest that cooperativity may play a crucial and previously overlooked role in multidirectional associative models, potentially offering new insights into both artificial and biological memory systems and their applications.

References
[1] D. Krotov, A new frontier for Hopfield networks, Nature Reviews Physics 5.7, 366–367, (2023).
[2] D. Krotov, J. Hopfield, Dense associative memory for pattern recognition, Adv. Neur. Inf. Proc. Sys. 29,
(2016).
[3] D. Krotov, J. Hopfield, Large associative memory problem in neurobiology and machine learning,
arXiv:2008.06996, (2020).
[4] H. Ramsauer, et al, Hopfield networks is all you need, arXiv:2008.02217, (2020).
[5] A. Barra, G. Catania, A. Decelle, B. Seoane, Thermodynamics of bidirectional associative memories, J.
Phys. A: Math. Theor. 56.(20), 205005, (2023).
[6] M. Demircigil, et al., On a model of associative memory with huge storage capacity, J. Stat. Phys. 168,
288, (2017).
[7] C. Lucibello, M. Mezard, Exponential capacity of dense associative memories, Phys. Rev. Lett. 132.7, 077301, (2024).
[8] E. Agliari, et al., Machine learning and statistical physics: theory, inspiration, application, J. Phys. A:
special issue, (2020).
[9] E. Agliari, et al., Generalized hetero-associative neural networks, J. Stat. Mech. 2025.1, 013302, (2025).
[10] E. Agliari, et al., Networks of neural networks: more is different, arXiv:2501.16789, (2025).

[11] E. Agliari, et al., Dense Hebbian neural networks: a replica symmetric picture of supervised learning,
Physica A 626, 129076, (2023).
[12] E. Agliari, et al., Dense Hebbian neural networks: a replica symmetric picture of unsupervised learning, Physica A 627, 129143, (2023).
[13] M. Mezard, G. Parisi, M.A. Virasoro, Spin glass theory and beyond: An Introduction to the Replica
Method and Its Applications, World Scientific Publishing Company, (1987).
[14] M. Talagrand, Spin glasses: a challenge for mathematicians, Springer Press, (2003).
[15] E. Agliari, et al., Neural networks with a redundant representation: detecting the undetectable, Phys.
Rev. Lett. 124.2, 028301, (2020).
[16] S. Franz, et al., Exact solutions for diluted spin glasses and optimization problems, Phys. Rev. Lett.
87.12, 127209, (2001).
[17] E. Agliari, et al., A transport equation approach for deep neural networks with quenched random weights,
J. Phys. A: Math. Theor. 54.(50), 505004, (2021).
[18] L. Albanese, et al., Replica symmetry breaking in dense Hebbian neural networks, J. Stat. Phys. 189.2,
24, (2022).
[19] F. Alemanno, et al., Supervised Hebbian learning, Europhys. Letts. 141.(1), 11001, (2023).
[20] E. Agliari, et al., Hebbian dreaming for small datasets, Neural Networks 173, 106174, (2024).
[21] B. Kosko, Bidirectional associative memories, IEEE Trans. Sys. Man. Cyb. 18.(1), 49, (1988).
[22] F. Guerra, Sum rules for the free energy in the mean field spin glass model, Fields Inst. Comm. 30.11,
(2001).
