Deep Learning 2.0: Artificial Neurons That Matter - Reject Correlation, Embrace Orthogonality
Taha Bouhsine
MLNomads
Agadir, Morocco
yat@mlnomads.com
interdependent data relationships without distorting the geometric topology of the data. By eliminating the need for activation functions, NMNs simplify network architecture, making it possible to interpret weight dependencies directly without sacrificing the model's capacity to learn complex patterns.

Our work also introduces the Neural-Matter State (NMS) Plots, a new framework for visualizing and interpreting weight distributions within NMNs. This approach attempts, for the first time, to provide a framework for exploring the "black box" of neural networks, offering insights into the organization and significance of learned weights. The NMS Plot reveals not only how individual weights contribute to model predictions but also how weight clusters influence factors like overfitting and feature uniqueness. This advance holds significant implications for enhancing the interpretability and trustworthiness of AI systems as they are increasingly applied to critical decision-making tasks.

Additionally, we introduce the ⵟ-regularizer, inspired by the SimO loss proposed by Bouhsine et al. [1]. This regularizer addresses neural collapse, where neurons in the model become too close together in the neuron space. By promoting orthogonality among neurons, the ⵟ-regularizer optimizes their spatial arrangement so that they are positioned further apart from one another and are not linearly dependent on each other.

Our contributions are as follows:

2. Theoretical Foundation

2.1. Multi-Layer Perceptron: Theoretical Analysis

2.1.1 Fundamental Components

Consider an input vector x ∈ R^n entering an MLP layer. Each neuron is characterized by a weight vector w ∈ R^n and a bias term b ∈ R. The transformation process consists of two primary operations:

1. Affine Transformation: The neuron computes a weighted sum followed by a bias addition:

z = w^T x + b = \sum_{i=1}^{n} w_i x_i + b    (1)

2. Non-linear Activation: The affine result undergoes a non-linear transformation via a function f : R → R:

y = f(z) = f(w^T x + b)    (2)

For a layer containing m neurons, we express the operation in matrix notation. Let W ∈ R^{m×n} denote the weight matrix, where each row corresponds to a neuron's weight vector, and let b ∈ R^m represent the bias vector. For a batch of k input samples X ∈ R^{k×n}, the layer output Y ∈ R^{k×m} is computed as:

Y = f(XW^T + b)    (3)

In traditional MLP architectures, vector similarity is computed between the input vector x and each weight vector w_i using the dot product. For neuron i:

z_i = x · w_i = \sum_{j=1}^{n} x_j w_{ij}    (4)
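To make the layer operation concrete, the following minimal numpy sketch computes Equations (1)-(3) for a small batch. The shapes, the random values, and the choice of ReLU for f are illustrative assumptions, not part of the original formulation.

import numpy as np

# Minimal sketch of Equation (3): Y = f(X W^T + b), with ReLU standing in for f.
k, n, m = 4, 3, 2                 # batch size, input dimension, number of neurons
rng = np.random.default_rng(0)

X = rng.normal(size=(k, n))       # batch of k input samples, X in R^{k x n}
W = rng.normal(size=(m, n))       # one weight vector per row, W in R^{m x n}
b = rng.normal(size=(m,))         # bias vector, b in R^m

Z = X @ W.T + b                   # affine transformation, Equation (1) in matrix form
Y = np.maximum(Z, 0.0)            # non-linear activation f, Equations (2)-(3)
print(Y.shape)                    # (k, m)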
2.1.2 Architectural Limitations

Linear Similarity Constraints: The dot product's linearity in Euclidean space limits expressiveness. While ReLU introduces non-linearity, information is lost through the elimination of negative values.

Activation Range Issues: Unbounded dot-product outputs (−∞ to +∞) can lead to neuron dominance. For example, when a · w_1 = 100 versus a · w_2 = 0.1, ReLU preserves the magnitude difference, potentially overshadowing subtle patterns.

Topological Structure Preservation: The interleaving of linear and non-linear operations can obscure topological relationships within the embedding space, complicating interpretability and the preservation of structural information.

Dropout serves as a primary mitigation strategy by randomly zeroing activations:

a_dropout = m ⊙ a    (10)

where m is a binary mask and ⊙ denotes element-wise multiplication. For example:

a_original = [0, 1.5, 0, 122.1]    (11)
a_dropout = [0, 1.5, 0, 0]    (12)

However, dropout remains a probabilistic solution that does not address the fundamental limitations of the dot product, especially during inference. These limitations motivate the exploration of alternative similarity measures that operate in non-Euclidean spaces and capture both similarity and orthogonality in unified operations.

2.2. Bouhsine's Products (ⵟ-product and ⵟ-product)

2.2.1 Definition

The Bouhsine products [1] introduce two distinct similarity measures between vectors in R^n: the ⵟ-product and the ⵟ-product, defined for two vectors e_1, e_2 ∈ R^n as follows:

ⵟ-product (yat):

e_1 ⵟ e_2 = (e_1 · e_2)^2 / ||e_2 − e_1||^2    (13)

ⵟ-product (posi-yat):

e_1 ⵟ e_2 = ||e_2 − e_1||^2 / (e_1 · e_2)^2    (14)

Here, · denotes the standard dot product and || · || the Euclidean norm.

The Bouhsine products are pseudo-metric (for ⵟ) and semi-metric (for ⵟ) (Theorem 8.1), satisfying closure and commutativity while lacking other conventional algebraic properties such as associativity, distributivity, and the existence of an identity element. This unique structure enables the Bouhsine products to capture aspects of vector relationships that standard similarity metrics like the dot product may overlook.

2.2.2 Limitations of the Dot Product and Advantages of the ⵟ-Product

Traditional similarity measures, such as the dot product, often fall short in capturing comprehensive vector relationships, especially when vectors share the same direction but vary in magnitude. This limitation becomes particularly apparent when interpreting neuron weights geometrically, where each weight vector represents a specific reference in the embedding space.

Consider a set of neuron weight vectors (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (8, 8), and (9, 9), which are all parallel and thus point in the same direction. For a new point (6, 6), cosine similarity yields identical results for all of these vectors, since it emphasizes direction while ignoring magnitude, and the dot product simply grows with the magnitude of each weight vector. Both outcomes are misleading: intuitively, the point (6, 6) is closest to (5, 5) in both direction and magnitude. By disregarding relative proximity, the dot product and cosine similarity fail to differentiate how close (6, 6) is to the individual weight vectors.

The ⵟ-product addresses this limitation by incorporating both magnitude and distance information. Unlike the dot product, which is primarily based on angular alignment, the ⵟ-product is designed to account for both directional alignment and the relative distances between vectors. When the ⵟ-product is used to compare (6, 6) with each neuron vector, it correctly identifies (5, 5) as the closest match, offering a nuanced understanding that aligns with the intuitive notion of similarity.

Figure 1 visually demonstrates the differences between the dot product and the ⵟ-product. Plot (b) shows the dot product's tendency to favor larger vector magnitudes. In contrast, the ⵟ-product plot (c) effectively differentiates between vectors based on both magnitude and spatial proximity, underscoring its advantage in applications that require a holistic similarity measure.

3. Methods

We propose a novel approach to the Multilayer Perceptron (MLP) layer operation, replacing the traditional dot product with a custom product we call the ⵟ-product.
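Before detailing the layer, a minimal numpy sketch of the two products from Equations (13) and (14), applied to the (6, 6) example of Section 2.2.2, may help. The epsilon guard and the helper names are assumptions of this sketch; the full plotting code appears in the appendix (Code 8.3).

import numpy as np

def yat(e1, e2, eps=1e-6):
    # ⵟ-product (yat), Equation (13): squared dot product over squared distance.
    return np.dot(e1, e2) ** 2 / (np.linalg.norm(e2 - e1) ** 2 + eps)

def posi_yat(e1, e2, eps=1e-6):
    # ⵟ-product (posi-yat), Equation (14): squared distance over squared dot product.
    return np.linalg.norm(e2 - e1) ** 2 / (np.dot(e1, e2) ** 2 + eps)

neurons = np.array([[1, 1], [2, 2], [3, 3], [4, 4], [5, 5], [8, 8], [9, 9]])
point = np.array([6, 6])

yat_scores = [yat(point, w) for w in neurons]
# The largest yat score is obtained for (5, 5), the weight vector closest to (6, 6).
print(neurons[int(np.argmax(yat_scores))])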
Figure 1. Comparison of similarity measurements for the test point (6, 6) with neuron weight vectors using (b) the dot product and (c) the ⵟ-product (Code 8.3); panel (a) shows the neurons and the test point in 2D space. In (b), the dot product scales with vector magnitude, often exaggerating similarity based on size alone. In (c), the ⵟ-product more accurately reflects relative magnitude and distance, correctly identifying (5, 5) as the closest match to (6, 6).

3.1. ⵟ-Neuron

In our proposed neuron, we replace the traditional dot product in the MLP layer with the ⵟ-product. For a single ⵟ-neuron (Code 8.3) with weight vector w ∈ R^n and input vector x ∈ R^n, the output y is computed as:

y = ⵙ ⵟ(w, x) + b

with ⵙ ∈ R the scale factor, equal to:

ⵙ = (n / log(1 + n))^α

The full output of the neuron is:

y = (n / log(1 + n))^α · (w · x)^2 / (||x − w||^2 + ε) + b

A ⵟ-neuron therefore produces a high output when there is a high ⵟ-score between it and the feature vector it is trying to attract.

We can express the operation in matrix form for a layer with m neurons and a batch of k input samples. Let W ∈ R^{m×n} be the weight matrix, where each row corresponds to a neuron's weight vector, and X ∈ R^{k×n} be the input matrix. The layer's output Y ∈ R^{k×m} is computed as:

Y_ij = ⵙ · ⵟ(W_i, X_j) + b_i

Y_ij = (n / log(1 + n))^α · (W_i · X_j)^2 / (||X_j − W_i||^2 + ε) + b_i

where W ∈ R^{m×n} is the weight matrix, X ∈ R^{k×n} is the input matrix, b ∈ R^m is the bias vector, and ⵙ ∈ R is the scale.

3.3. ⵟ-Regularization

We propose a ⵟ-regularizer based on the ⵟ-product intra-similarity minimization used by Bouhsine et al. for Anchor-Free Contrastive Learning (AFCL) [1]. This regularizer minimizes the ⵟ-similarity score between weight vectors, encouraging intra-orthogonality.

3.4. ⵟ-ViT

Our ⵟ-ViT retains the original design but uses ⵟ-neurons instead, omitting any activation functions after the layer.

3.4.1 ⵟ-MHA

3.4.2 Random Token Masking

x'_i = M_i · [MASK] + (1 − M_i) · x_i
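The masking rule above can be written as a single element-wise operation. The following short JAX sketch illustrates it; the mask rate, the [MASK] embedding, and the tensor shapes are assumptions of this sketch, since they are not specified in the text.

import jax
import jax.numpy as jnp

def random_token_masking(key, tokens, mask_embedding, mask_rate=0.15):
    # tokens: (num_tokens, dim); mask_embedding: (dim,) learnable [MASK] vector.
    # Draw a binary mask M_i ~ Bernoulli(mask_rate) for each token.
    m = jax.random.bernoulli(key, p=mask_rate, shape=(tokens.shape[0], 1))
    # x'_i = M_i * [MASK] + (1 - M_i) * x_i
    return m * mask_embedding + (1.0 - m) * tokens

key = jax.random.PRNGKey(0)
tokens = jnp.ones((8, 4))          # 8 tokens of dimension 4 (illustrative)
mask_embedding = jnp.zeros((4,))   # placeholder [MASK] embedding
masked = random_token_masking(key, tokens, mask_embedding)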
Model (dataset: size^2 / #cls / steps) | CIFAR10 [13] (32^2/10/390) | CIFAR100 [13] (32^2/100) | Caltech101 [14] (96^2/102/23) | Oxford Flowers [22] (224^2/102/15) | STL10 [3] (96^2/10/78)

Traditional Neuron with GeLU Activation Function
MLP     | (/4) 25.31% | (/4) 10.09% | (/8) 14.68% | 1.42%  | 23.22%
ViT-t   | (/4) 72.91% | (/4) 36.93% | (/8) 30.21% | 31.22% | 49.73%

Activation-Free ⵟ-Neuron
ⵟ-MLP   | (/4) 47.36% | (/4) 25.14% | (/8) 24.15% | 20.20% | 42.25%
ⵟ-ViT-t | (/4) 74.22% | (/4) 40.75% | (/8) 34.31% | 31.42% | 51.95%

Table 1. Comparison of test accuracy across multiple image classification datasets for architectures trained from scratch with traditional neurons and activation-free ⵟ-neurons. Results highlight the improved or comparable performance of ⵟ-neuron models, underscoring their effectiveness in both MLP and Vision Transformer (ViT) configurations.
Table 1 compares traditional and ⵟ-neuron models in both MLP and Vision Transformer (ViT) configurations. Across all tasks, models incorporating ⵟ-neurons demonstrate improved or comparable accuracy over traditional neuron models trained for 200 epochs, underscoring the potential of ⵟ-neurons to deliver higher performance with fewer architectural complexities.

For CIFAR-10, the traditional ViT-t architecture achieves a test accuracy of 72.91%, while the ⵟ-ViT-t model outperforms it slightly at 74.22%, highlighting the effectiveness of ⵟ-neurons in enhancing generalization without additional activation functions. On CIFAR-100, ⵟ-neuron models similarly outperform their traditional counterparts, with the ⵟ-ViT-t model achieving 40.75% accuracy compared to 36.93% for the traditional ViT-t. These consistent improvements reflect the robustness of the ⵟ-neuron approach across datasets of varying complexity.

In MLP configurations, the advantage of ⵟ-neurons is even more pronounced. The ⵟ-MLP model achieves 47.36% accuracy on CIFAR-10 and 25.14% on CIFAR-100, compared to the traditional MLP's 25.31% on CIFAR-10 and 10.09% on CIFAR-100. These results illustrate that ⵟ-neurons not only eliminate the need for explicit activation functions but also improve overall performance in simpler architectures, making them an efficient alternative for network design in resource-constrained settings.

On other datasets such as Caltech101 and STL10, ⵟ-ViT-t models maintain their performance advantage, with respective accuracies of 34.31% and 51.95%, outperforming the traditional ViT-t's scores of 30.21% and 49.73%. The performance gains observed in ⵟ-neuron models across these datasets reinforce the generalizability and scalability of ⵟ-neuron-based architectures.

5. Discussion and Analysis of ⵟ-Neuron Performance

The results highlight the capability of ⵟ-neurons to achieve high accuracy without relying on traditional activation functions. The ⵟ-product introduces implicit non-linearity, capturing complex data patterns in a manner that preserves more information than standard activation-based approaches. This novel non-linear processing method represents a shift in how neural networks interpret data, reducing the information loss typically introduced by activation functions (Table 1).

The ⵟ-neuron architecture enhances interpretability by embedding non-linearity directly within the ⵟ-product, thus preserving essential geometric relationships. This design enables a more intuitive understanding of neuron interactions and allows for straightforward visual representations of their behavior.

Additionally, ⵟ-neurons circumvent several stability challenges seen with traditional activation functions. Standard non-linearities can lead to gradient saturation and "dead" neurons [24]: saturating functions often result in vanishing gradients, while non-saturating ones can cause exploding gradients or neuron death, requiring additional techniques such as batch normalization and gradient clipping. The ⵟ-product avoids these issues by maintaining a non-saturating internal non-linearity, offering stable training dynamics and reducing the dependency on additional regularization techniques.

5.1. Artificial Neurons that Matter

The ⵟ-product draws intriguing analogies with physical laws, such as the inverse-square law, hinting at a potential new paradigm for understanding neural networks. Here, neurons are not merely linear separators but operate as geometric constructs within a non-linear manifold. This interpretation reimagines neural network layers as inherently physical, aligning them more closely with natural principles. Such a view opens pathways to network architectures that are both adaptable and more naturally aligned with fundamental scientific laws. Furthermore, including a learnable scale parameter mitigates the risk of weight explosion: without this parameter, weights grow excessively, leading to instability.
(a) Healthy (b) Healthy (c) Overfit (d) Overfit

Figure 6. Analysis of the neurons in the third layer of an MLP head trained on CIFAR-10, using a Neural-Matter State (NMS) plot: (a, b) show a well-fitting, healthy distribution of neuron specialization, while (c, d) show overfitting, which appears as a collapsed distribution.
I would also like to thank Dr. Andrew Ng for creating the Deep Learning course that introduced me to this field; without his efforts to democratize access to knowledge, this work would not have been possible. Additionally, I want to express my appreciation to all the communities I have been part of, especially the MLNomads, Google Developers, and MLCollective communities.

References

[1] Taha Bouhsine, Imad El Aaroussi, Atik Faysal, and Wang. SimO loss: Anchor-free contrastive loss for fine-grained supervised contrastive learning. In Submitted to The Thirteenth International Conference on Learning Representations, 2024. Under review.
[2] Michael M. Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst. Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine, 34(4):18–42, 2017.
[3] Adam Coates, Andrew Ng, and Honglak Lee. An analysis of single-layer networks in unsupervised feature learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 215–223. JMLR Workshop and Conference Proceedings, 2011.
[4] Charles-Augustin de Coulomb. Premier mémoire sur l'électricité et le magnétisme. Histoire de l'Académie Royale des Sciences, pages 1–31, 1785. In French.
[5] Finale Doshi-Velez and Been Kim. Towards a rigorous science of interpretable machine learning. arXiv: Machine Learning, 2017.
[6] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale, 2021.
[7] Carl Friedrich Gauss. Allgemeine Lehrsätze in Beziehung auf die im verkehrten Verhältniss des Quadrats der Entfernung wirkenden Anziehungs- und Abstossungskräfte. Dietrich, Göttingen, 1835.
[8] Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners, 2021.
[9] Dan Hendrycks and Kevin Gimpel. Gaussian error linear units (GELUs), 2023.
[10] Kurt Hornik, Maxwell B. Stinchcombe, and Halbert L. White. Multilayer feedforward networks are universal approximators. Neural Networks, 2:359–366, 1989.
[11] Roger David Joseph. Contributions to perceptron theory. Cornell University, 1961.
[12] Johannes Kepler. Ad vitellionem paralipomena, quibus astronomiae pars optica traditur. 1604. Johannes Kepler: Gesammelte Werke, ed. Walther von Dyck and Max Caspar, München, 1939.
[13] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.
[14] Fei-Fei Li, Marco Andreeto, Marc'Aurelio Ranzato, and Pietro Perona. Caltech 101, 2022.
[15] Zachary C. Lipton. The mythos of model interpretability: In machine learning, the concept of interpretability is both important and slippery. Queue, 16(3):31–57, 2018.
[16] Hanxiao Liu, Zihang Dai, David R. So, and Quoc V. Le. Pay attention to MLPs, 2021. arXiv:2105.08050 [cs].
[17] Siegrid Löwel and Wolf Singer. Selection of intrinsic horizontal connections in the visual cortex by correlated neuronal activity. Science, 255(5041):209–212, 1992.
[18] Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5:115–133, 1943.
[19] Fraser Mince, Dzung Dinh, Jonas Kgomo, Neil Thompson, and Sara Hooker. The grand illusion: The myth of software portability and implications for ML progress, 2023.
[20] Vinod Nair and Geoffrey E. Hinton. Rectified linear units improve restricted Boltzmann machines. In International Conference on Machine Learning, 2010.
[21] Isaac Newton. Philosophiæ Naturalis Principia Mathematica. S. Pepys, London, 1687.
[22] Maria-Elena Nilsback and Andrew Zisserman. Automated flower classification over a large number of classes. In 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing, pages 722–729. IEEE, 2008.
[23] J. Orbach. Principles of neurodynamics: Perceptrons and the theory of brain mechanisms. Archives of General Psychiatry, 7(3):218–219, 1962.
[24] Mohammad Pezeshki, Sékou-Oumar Kaba, Yoshua Bengio, Aaron Courville, Doina Precup, and Guillaume Lajoie. Gradient starvation: A learning proclivity in neural networks, 2021.
[25] Frank Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386, 1958.
[26] Juergen Schmidhuber. Annotated history of modern AI and deep learning. arXiv preprint arXiv:2212.11279, 2022.
[27] Andreas Steiner, Alexander Kolesnikov, Xiaohua Zhai, Ross Wightman, Jakob Uszkoreit, and Lucas Beyer. How to train your ViT? Data, augmentation, and regularization in vision transformers, 2022.
[28] Stephen M. Stigler. Gauss and the invention of least squares. The Annals of Statistics, pages 465–474, 1981.
[29] Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, and Alexey Dosovitskiy. MLP-Mixer: An all-MLP architecture for vision. arXiv preprint arXiv:2105.01601, 2021.
Appendix
8.1. Geometric Topology
In geometric topology, various types of spaces are used to study notions of distance and convergence. Each type of space has a specific set of properties defined by a function known as a metric or a generalization thereof. We describe below metric spaces, semi-metric spaces, and pseudo-metric spaces, emphasizing the differences in their definitions. A metric satisfies non-negativity, the identity of indiscernibles, symmetry, and the triangle inequality; a pseudo-metric relaxes the identity of indiscernibles (distinct points may lie at distance zero), while a semi-metric drops the triangle inequality.

Theorem 8.1. Let

ⵟ(e_i, e_j) = o_ij / d^2_ij = (e_i · e_j)^2 / |e_i − e_j|^2

ⵟ(e_i, e_j) = d^2_ij / o_ij = |e_i − e_j|^2 / (e_i · e_j)^2

for e_i, e_j ∈ R^n \ {0} with e_i ≠ e_j, where d^2_ij = |e_i − e_j|^2 is the squared Euclidean distance and o_ij = (e_i · e_j)^2 is the squared dot product.

Then (R^n, ⵟ) is semi-metric, while (R^n, ⵟ) is pseudo-metric.
Proof. We structure this proof into four parts:
1. Preliminary observations and domain analysis
2. Proof of common properties for both measures
3. Proof that ⵟ is a pseudo-metric
4. Proof that ⵟ is a semi-metric
Part I: Preliminary Observations
Before proving the metric properties, we must establish the domain where these measures are well-defined:
1. For non-zero vectors e_i, e_j:
• d^2_ij = 0 ⟺ e_i = e_j
• o_ij = 0 ⟺ e_i ⊥ e_j (vectors are orthogonal)
2. Domain restrictions:
• ⵟ is defined when d^2_ij ≠ 0 (distinct vectors)
• ⵟ is defined when o_ij ≠ 0 (non-orthogonal vectors)
Part II: Common Properties
Both measures satisfy the following properties:
1. Non-negativity: Since both d^2_ij and o_ij are squared quantities, d^2_ij ≥ 0 and o_ij ≥ 0. Therefore:

ⵟ(e_i, e_j) ≥ 0 and ⵟ(e_i, e_j) ≥ 0
2. Identity of Indiscernibles: For both measures, we examine this bidirectionally.

(⇒) If e_i = e_j:
• d^2_ij = 0
• o_ij = |e_i|^4 > 0 (for non-zero vectors)
Therefore, ⵟ(e_i, e_j) = 0 and ⵟ(e_i, e_j) = 0.

(⇐) If ⵟ(e_i, e_j) = 0 or ⵟ(e_i, e_j) = 0:
• For ⵟ: o_ij / d^2_ij = 0 ⟹ o_ij = 0 (since d_ij ≠ 0 for distinct vectors)
• For ⵟ: d^2_ij / o_ij = 0 ⟹ d^2_ij = 0 (since o_ij ≠ 0 in the domain)
Hence the identity of indiscernibles holds for ⵟ but not for ⵟ.
3. Symmetry: Symmetry follows from the symmetry of the dot product and the Euclidean distance:

ⵟ(e_i, e_j) = (e_i · e_j)^2 / |e_i − e_j|^2 = (e_j · e_i)^2 / |e_j − e_i|^2 = ⵟ(e_j, e_i)
To analyze the ⵟ-product as a function of the angle θ between e_1 and e_2, write the squared dot product as

(e_1 · e_2)^2 = (||e_1|| ||e_2|| cos θ)^2 = ||e_1||^2 ||e_2||^2 cos^2 θ.

The squared Euclidean distance between e_1 and e_2 is:

||e_2 − e_1||^2 = ||e_1||^2 + ||e_2||^2 − 2 ||e_1|| ||e_2|| cos θ.

Substituting these expressions into the formula for e_1 ⵟ e_2, with A = ||e_1|| and B = ||e_2||, gives

f(θ) = e_1 ⵟ e_2 = A^2 B^2 cos^2 θ / (A^2 + B^2 − 2AB cos θ).

Differentiating f with respect to θ, each term in the numerator of the derivative has a factor of A^2 B^2 sin θ, which we can factor out.
Finally, to examine the triangle inequality, consider the counterexample

e_1 = (1, 0), e_2 = (0, 1), e_3 = (1, 1)

for which

ⵟ(e_1, e_2) = 0
ⵟ(e_2, e_3) = 1/1 = 1
ⵟ(e_1, e_3) = 2/1 = 2

which means:

ⵟ(e_1, e_3) ≥ ⵟ(e_1, e_2) + ⵟ(e_2, e_3)

Hence the triangle inequality does not hold for ⵟ.

Therefore, we conclude that (R^n, ⵟ) is pseudo-metric and (R^n, ⵟ) is semi-metric.
8.3. Code

# Plot the neurons and the test point in the 2D space
import numpy as np
import matplotlib.pyplot as plt

# Define the neurons as vectors
neurons = np.array([[1, 1], [2, 2], [3, 3], [4, 4], [5, 5], [7, 7], [8, 8]])
test_point = np.array([6, 6])

plt.figure(figsize=(6, 6))

# Plot neurons as red dots
for i, neuron in enumerate(neurons):
    plt.plot(neuron[0], neuron[1], 'ro')
    plt.text(neuron[0] + 0.1, neuron[1], f'Neuron {i+1}', fontsize=12)

# Plot the test point as a blue dot
plt.plot(test_point[0], test_point[1], 'bo')
plt.text(test_point[0] + 0.1, test_point[1], 'Test point', fontsize=12)

# Set up the plot limits and labels (limits widened so every neuron is visible)
plt.xlim(0, 9)
plt.ylim(0, 9)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Neurons and Test Point in 2D Space')
plt.grid(True)
plt.gca().set_aspect('equal', adjustable='box')
plt.show()
# Dot product function (used for the similarity comparison in Figure 1)
def cosine_similarity(v1, v2):
    dot_product = np.dot(v1, v2)
    return dot_product

# Yat-product function
def yat_product(v1, v2, epsilon=1e-6):
    dot_product_squared = np.dot(v1, v2) ** 2
    distance_squared = np.linalg.norm(v2 - v1) ** 2
    return dot_product_squared / (distance_squared + epsilon)

# Calculate cosine similarities and yat-products for every neuron
cosine_similarities = [cosine_similarity(test_point, neuron) for neuron in neurons]
yat_products = [yat_product(test_point, neuron) for neuron in neurons]
neuron_ids = range(1, len(neurons) + 1)

# Plot the cosine similarities
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.plot(neuron_ids, cosine_similarities, marker='o', color='blue', label='Cosine Similarity')
plt.title("Cosine Similarity with (6, 6)")
plt.xlabel("Neuron")
plt.ylabel("Cosine Similarity")
plt.grid(True)
plt.xticks(list(neuron_ids))

# Plot the yat-products
plt.subplot(1, 2, 2)
plt.plot(neuron_ids, yat_products, marker='o', color='red', label='Yat-product')
plt.title("Yat-product with (6, 6)")
plt.xlabel("Neuron")
plt.ylabel("Yat-product")
plt.grid(True)
plt.xticks(list(neuron_ids))

plt.tight_layout()
plt.show()
import numpy as np

def yat_neuron(X, w, b):
    # Squared dot product between each input row and the weight vector
    dot_squared = np.dot(X, w) ** 2
    # Squared Euclidean distance between each input row and the weight vector
    distance_squared = np.sum((w - X) ** 2, axis=1)
    # Avoid division by zero by adding a small epsilon
    epsilon = 1e-6
    return dot_squared / (distance_squared + epsilon) + b
import numpy as np
from scipy.optimize import minimize

np.random.seed(42)  # For reproducibility

w = np.random.randn(2)   # Random weights for a 2D input
b = np.random.randn()    # Random bias

# XOR dataset: inputs and corresponding outputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # Inputs
y = np.array([0, 1, 1, 0])                      # Expected outputs for XOR

# Outputs of the untrained yat-neuron
outputs = yat_neuron(X, w, b)

# Print results
print(w)
print(b)
print(outputs)

# Initial parameters: [w1, w2, b]
initial_params = np.append(w, b)

# The optimization step is not shown in the extracted listing; a least-squares fit
# of the yat-neuron outputs to the XOR targets via scipy.optimize.minimize is
# assumed here as a plausible reconstruction.
def loss(params):
    return np.mean((yat_neuron(X, params[:2], params[2]) - y) ** 2)

result = minimize(loss, initial_params)

# Extract optimized weights and bias
optimized_params = result.x
optimized_weights = optimized_params[:2]
optimized_bias = optimized_params[2]
optimized_outputs = yat_neuron(X, optimized_weights, optimized_bias)

print('###############')
print(optimized_weights)
print(optimized_bias)
print(optimized_outputs)
import matplotlib.pyplot as plt

# Grid over the input space (the grid construction is missing from the extracted
# listing; a standard meshgrid reconstruction is used here)
xx, yy = np.meshgrid(np.linspace(-0.5, 1.5, 200), np.linspace(-0.5, 1.5, 200))
grid_points = np.c_[xx.ravel(), yy.ravel()]

# Positive and negative XOR points, and the optimized parameters from above
X_pos, X_neg = X[y == 1], X[y == 0]
weights, bias = optimized_weights, optimized_bias

# Compute the neuron output for each point in the grid
Z = yat_neuron(grid_points, weights, bias)
Z = Z.reshape(xx.shape)
print(Z)
print(xx)

# Plot the decision boundary and XOR points
plt.figure(figsize=(6, 6))
plt.contourf(xx, yy, Z > 0.5, alpha=0.5, cmap='coolwarm')  # Decision boundary
plt.scatter(X_pos[:, 0], X_pos[:, 1], color='red', label='1', edgecolors='k')
plt.scatter(X_neg[:, 0], X_neg[:, 1], color='blue', label='0', edgecolors='k')
plt.title("XOR Problem: Decision Boundary (ⵟ-Neuron)")
plt.xlabel('x1')
plt.ylabel('x2')
plt.legend()
plt.grid(True)
plt.show()
from typing import Any, Optional

import jax.numpy as jnp
import jax.lax as lax
from flax import linen as nn
from flax.linen import Module, compact
from flax.linen.initializers import zeros_init

# Type alias for the optional custom dot_general callable (kept as Any here).
DotGeneralT = Any

class YatDense(Module):
    """A dense layer built on the ⵟ-product instead of the dot product.

    Attributes:
      features: the number of output features.
      use_bias: whether to add a bias to the output (default: True).
      dtype: the dtype of the computation.
      param_dtype: the dtype passed to parameter initializers (default: float32).
      precision: numerical precision of the computation.
      kernel_init: initializer function for the weight matrix.
      bias_init: initializer function for the bias.
      epsilon: small constant added to the squared distance to avoid division by zero.
    """
    features: int
    use_bias: bool = True
    dtype: Optional[Any] = None
    param_dtype: Any = jnp.float32
    precision: Any = None
    kernel_init: Any = nn.initializers.orthogonal()
    bias_init: Any = zeros_init()
    # Initialize alpha to 1.0
    alpha_init: Any = lambda key, shape, dtype: jnp.ones(shape, dtype)
    epsilon: float = 1e-6
    dot_general: Optional[DotGeneralT] = None
    dot_general_cls: Any = None
    return_weights: bool = False

    @compact
    def __call__(self, inputs: Any) -> Any:
        """Applies the ⵟ-product transformation to the inputs.

        Args:
          inputs: The nd-array to be transformed.
        Returns:
          The transformed input.
        """
        kernel = self.param(
            'kernel',
            self.kernel_init,
            (self.features, jnp.shape(inputs)[-1]),
            self.param_dtype,
        )
        alpha = self.param(
            'alpha',
            self.alpha_init,
            (1,),  # Single scalar parameter
            self.param_dtype,
        )
        if self.use_bias:
            bias = self.param(
                'bias', self.bias_init, (self.features,), self.param_dtype
            )
        else:
            bias = None

        # Dot product between the inputs and each neuron's weight vector
        y = lax.dot_general(
            inputs,
            jnp.transpose(kernel),
            (((inputs.ndim - 1,), (0,)), ((), ())),
            precision=self.precision,
        )
        # Squared Euclidean distances: ||x - w||^2 = ||x||^2 + ||w||^2 - 2 x.w
        inputs_squared_sum = jnp.sum(inputs ** 2, axis=-1, keepdims=True)
        kernel_squared_sum = jnp.sum(kernel ** 2, axis=-1)
        distances = inputs_squared_sum + kernel_squared_sum - 2 * y
        # Element-wise ⵟ-product: squared dot product over squared distance
        y = y ** 2 / (distances + self.epsilon)
        # Learnable scale factor
        scale = (jnp.sqrt(self.features) / jnp.log(1 + self.features)) ** alpha
        y = y * scale
        if bias is not None:
            y += jnp.reshape(bias, (1,) * (y.ndim - 1) + (-1,))
        if self.return_weights:
            return y, kernel
        return y
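A brief usage sketch for the YatDense module defined above; the input shape, feature count, and RNG seed are illustrative assumptions.

import jax
import jax.numpy as jnp

# Instantiate the ⵟ-dense layer with 8 output features
layer = YatDense(features=8)

# Initialize parameters and apply the layer to a dummy batch
x = jnp.ones((4, 16))                         # batch of 4 inputs of dimension 16
params = layer.init(jax.random.PRNGKey(0), x)
y = layer.apply(params, x)
print(y.shape)                                # (4, 8)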