
nature machine intelligence

Article https://doi.org/10.1038/s42256-022-00556-7

Closed-form continuous-time neural networks

Ramin Hasani1,5, Mathias Lechner1,2,5, Alexander Amini1, Lucas Liebenwein1, Aaron Ray1, Max Tschaikowski3, Gerald Teschl4 & Daniela Rus1

Received: 23 March 2022
Accepted: 5 October 2022
Published online: 15 November 2022

Continuous-time neural networks are a class of machine learning systems
that can tackle representation learning on spatiotemporal decision-making
tasks. These models are typically represented by continuous differential
equations. However, their expressive power when they are deployed on
computers is bottlenecked by numerical differential equation solvers.
This limitation has notably slowed down the scaling and understanding of
numerous natural physical phenomena such as the dynamics of nervous
systems. Ideally, we would circumvent this bottleneck by solving the given
dynamical system in closed form. This is known to be intractable in general.
Here, we show that it is possible to closely approximate the interaction
between neurons and synapses—the building blocks of natural and artificial
neural networks—constructed by liquid time-constant networks efficiently
in closed form. To this end, we compute a tightly bounded approximation
of the solution of an integral appearing in liquid time-constant dynamics
that has had no known closed-form solution so far. This closed-form
solution impacts the design of continuous-time and continuous-depth
neural models. For instance, since time appears explicitly in closed
form, the formulation relaxes the need for complex numerical solvers.
Consequently, we obtain models that are between one and five orders of
magnitude faster in training and inference compared with differential
equation-based counterparts. More importantly, in contrast to ordinary
differential equation-based continuous networks, closed-form networks
can scale remarkably well compared with other deep learning instances.
Lastly, as these models are derived from liquid networks, they show good
performance in time-series modelling compared with advanced recurrent
neural network models.

Continuous neural network architectures built by ordinary differential equations (ODEs)2 are expressive models useful in modelling data with complex dynamics. These models transform the depth dimension of static neural networks and the time dimension of recurrent neural networks (RNNs) into a continuous vector field, enabling parameter sharing, adaptive computations and function approximation for non-uniformly sampled data.

These continuous-depth (time) models have shown promise in density estimation applications3–6, as well as modelling sequential and irregularly sampled data1,7–9.

1Massachusetts Institute of Technology, Cambridge, MA, USA. 2Institute of Science and Technology Austria, Klosterneuburg, Austria. 3Aalborg University, Aalborg, Denmark. 4University of Vienna, Vienna, Austria. 5These authors contributed equally: Ramin Hasani, Mathias Lechner. e-mail: rhasani@mit.edu


Fig. 1 | Neural and synapse dynamics. A postsynaptic neuron receives the stimuli I(t) through a nonlinear conductance-based synapse model. Here, S(t) stands for the synaptic current. The dynamics of the membrane potential of this postsynaptic neuron are given by the DE presented in the middle. This equation is a fundamental building block of LTC networks1, for which there is no known closed-form expression. Here, we provide an approximate solution for this equation which shows the interaction of nonlinear synapses with postsynaptic neurons in closed form. The LTC DE instance shown in the figure is dx(t)/dt = −x(t)/τ + S(t), with synaptic current S(t) = f(I(t)) (A − x(t)), and we solve it in closed form as x(t) = (x(0) − A) e^(−[1/τ + f(I(t))] t) f(−I(t)) + A, where x(t) is the postsynaptic neuron's potential, A is the synaptic reversal potential, f(·) is the synaptic release nonlinearity and τ is the postsynaptic neuron's time constant.
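To make the relationship in Fig. 1 concrete, the following is a minimal NumPy sketch (an illustration added here, not the reference implementation; the sinusoidal input and the values of τ, A and x(0) are arbitrary assumptions) that integrates the single-neuron LTC DE with forward Euler and evaluates the closed-form approximation at the same time points.

```python
import numpy as np

# Single-neuron LTC dynamics from Fig. 1:
#   dx/dt = -x/tau + f(I(t)) * (A - x),   with f a bounded sigmoid nonlinearity.
# Closed-form approximation:
#   x(t) ~= (x0 - A) * exp(-(1/tau + f(I(t))) * t) * f(-I(t)) + A

tau, A, x0 = 1.0, 2.0, 0.0                    # illustrative parameters (assumed)
f = lambda u: 1.0 / (1.0 + np.exp(-u))        # positive, bounded, increasing
I = lambda t: np.sin(2.0 * np.pi * t)         # arbitrary continuous input

T, dt = 3.0, 1e-3
ts = np.arange(0.0, T, dt)

# Forward-Euler rollout of the ODE (what a numerical solver would do).
x_ode = np.empty_like(ts)
x = x0
for i, t in enumerate(ts):
    x_ode[i] = x
    x += dt * (-x / tau + f(I(t)) * (A - x))

# Solver-free, closed-form evaluation at the same time points.
x_cf = (x0 - A) * np.exp(-(1.0 / tau + f(I(ts))) * ts) * f(-I(ts)) + A

print("max |ODE - closed form| =", np.abs(x_ode - x_cf).max())
```

The size of the gap between the two trajectories is exactly what the tightness results in this article quantify.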

While ODE-based neural networks with careful memory and gradient propagation design9 perform competitively with advanced discretized recurrent models on relatively small benchmarks, their training and inference are slow owing to the use of advanced numerical differential equation (DE) solvers10. This becomes even more troublesome as the complexity of the data, task and state space increases (that is, requiring more precision)11, for instance, in open-world problems such as medical data processing, self-driving cars, financial time-series and physics simulations.

The research community has developed solutions for resolving this computational overhead and for facilitating the training of neural ODEs, for instance by relaxing the stiffness of a flow by state augmentation techniques4,12, reformulating the forward pass as a root-finding problem13, using regularization schemes14–16 or improving the inference time of the network17.

Here, we derive a closed-form continuous-depth model that has the modelling capabilities of ODE-based models but does not require any solver to model data (Fig. 1).

Intuitively, in this work, we replace the integration (that is, solution) of a nonlinear DE describing the interaction of a neuron with its input nonlinear synaptic connections, with their corresponding nonlinear operators. This could be achieved in principle using functional Taylor expansions (in the spirit of the Volterra series)18. However, in the particular case of liquid time-constant (LTC) networks, we can leverage a closed-form expression for the system's response to input. This allows one to evaluate the system's response to exogenous input (I) and recurrent inputs from hidden states (x) as a function of time. One way of looking at this is to regard the closed-form solution as the application of a nonlinear forward operator to the inputs of each hidden state or neuron in the network, where the outputs of one neuron constitute the inputs for others. Effectively, this rests on approximating a conductance-based model with a neural mass model, of the kind used in the dynamic causal modelling of real neuronal networks19.

The proposed continuous neural networks yield considerably faster training and inference speeds while being as expressive as their ODE-based counterparts. We provide a derivation for the approximate closed-form solution to a class of continuous neural networks that explicitly models time. We demonstrate how this transformation can be formulated into a novel neural model and scaled to create flexible, performant and fast neural architectures on challenging sequential datasets.

Deriving an approximate closed-form solution for neural interactions
Two neurons interact with each other through synapses as shown in Fig. 1. There are three principal mechanisms for information propagation in natural brains that are abstracted away in the current building blocks of deep learning systems: (1) neural dynamics are typically continuous processes described by DEs (see the dynamics of x(t) in Fig. 1), (2) synaptic release is much more than scalar weights, involving a nonlinear transmission of neurotransmitters, the probability of activation of receptors and the concentration of available neurotransmitters, among other nonlinearities (see S(t) in Fig. 1) and (3) the propagation of information between neurons is induced by feedback and memory apparatuses (see how I(t) stimulates x(t) through a nonlinear synapse S(t), which also has a multiplicative difference of potential to the postsynaptic neuron accounting for a negative feedback mechanism).

One could read I(t) as a mixture of exogenous input to the (neural) network and presynaptic inputs from other neurons that result in a depolarization x(t). This depolarization is mediated by the current S(t) that depends upon depolarization and a reversal threshold A. LTC networks1, which are expressive continuous-depth models obtained by a bilinear approximation20 of a neural ODE formulation2, are designed on the basis of these mechanisms. Correspondingly, we take their ODE semantics and approximate a closed-form solution for the scalar case of a postsynaptic neuron receiving an input stimulus from a presynaptic source through a nonlinear synapse.

To this end, we apply the theory of linear ODEs21 to analytically solve the dynamics of an LTC DE as shown in Fig. 1. We then simplify the solution to the point where there is one integral left to solve. This integral compartment, ∫_0^t f(I(s)) ds, in which f is a positive, continuous, monotonically increasing and bounded nonlinearity, is challenging to solve in closed form since it has dependencies on an input signal I(s) that is arbitrarily defined (such as real-world sensory readouts). To approach this problem, we discretize I(s) into piecewise constant segments and obtain the discrete approximation of the integral in terms of the sum of piecewise constant compartments over intervals. This piecewise constant approximation inspired us to introduce an approximate closed-form solution for the integral ∫_0^t f(I(s)) ds that is provably tight when the integral appears as the exponent of an exponential decay, which is the case for LTCs. We theoretically justify how this closed-form solution represents LTCs' ODE semantics and is as expressive (Fig. 1).

Explicit time dependence
We then dissect the properties of the obtained closed-form solution and design a new class of neural network models we call closed-form continuous-depth networks (CfC). CfCs have an explicit time dependence in their formulation that does not require a numerical ODE solver to obtain their temporal rollouts. Thus, they maximize the trade-off between accuracy and efficiency of solvers. Formally, this property corresponds to obtaining lower time complexity for models without numerical instabilities and errors, as illustrated in Table 1 (left). For example, Table 1 (left) shows that the complexity of a pth-order numerical ODE solver is O(Kp), where K is the number of ODE steps, while a CfC system (which has explicit time dependence) requires O(K̃), where K̃ is the number of exogenous input time steps, which are typically one to three orders of magnitude smaller than K. Moreover, the approximation error of a pth-order numerical ODE solver scales with O(ϵ^(p+1)), whereas CfCs are closed-form continuous-time systems, thus the notion of approximation error becomes irrelevant to them.
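As a concrete illustration of this complexity argument (a sketch added here, not from the paper; the step counts are arbitrary but representative assumptions), the closed-form state of the single neuron in Fig. 1 can be queried in one vectorized pass over the K̃ observed time stamps, whereas a fixed-step pth-order solver marches through K small steps, each costing roughly p evaluations of the vector field.

```python
import numpy as np

f = lambda u: 1.0 / (1.0 + np.exp(-u))       # bounded nonlinearity
I = lambda t: np.sin(t)                       # arbitrary input signal
tau, A, x0 = 1.0, 2.0, 0.0                    # illustrative parameters (assumed)

# K_tilde irregularly spaced observation times: one closed-form evaluation each.
t_obs = np.sort(np.random.default_rng(0).uniform(0.0, 10.0, 64))
x_closed_form = (x0 - A) * np.exp(-(1.0 / tau + f(I(t_obs))) * t_obs) * f(-I(t_obs)) + A

# A fixed-step solver covering the same horizon needs K = T/epsilon steps,
# and a p-th order method performs about p vector-field evaluations per step: O(Kp).
epsilon, p = 1e-3, 4
K = int(10.0 / epsilon)
print("closed form:", len(t_obs), "evaluations")        # O(K_tilde) = 64
print("p-th order solver: ~", K * p, "evaluations")      # O(Kp) = 40000
```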


Table 1 | Computational complexity of models

Left: time complexity
Method | Complexity | Local error
pth-order solver | O(Kp) | O(ϵ^(p+1))
Adaptive-step solver | — | O(ϵ̃^(p+1))
Euler hypersolver | O(K) | O(δϵ^2)
pth-order hypersolver | O(Kp) | O(δϵ^(p+1))
CfC (current work) | O(K̃) | Not relevant

Right: sequence and time-step prediction complexity
Model | Sequence prediction | Time-step prediction
RNN | O(nk) | O(k)
ODE-RNN | O(nkp) | O(kp)
Transformer | O(n^2 k) | O(nk)
CfC | O(nk) | O(k)

Left: the time complexity of the process to compute K solver steps. ϵ is the step size, ϵ̃ is the maximum step size and δ ≪ 1. K̃ is the number of time steps for CfCs, corresponding to the input time steps, which is typically one to three orders of magnitude smaller than K. The left portion is reproduced with permission from ref. 17. Right: sequence and time-step prediction complexity. n is the sequence length, k is the number of hidden units and p is the order of the ODE solver.

This explicit time dependence allows CfCs to perform computations at least one order of magnitude faster in terms of training and inference time compared with their ODE-based counterparts, without loss of accuracy.

Sequence and time-step prediction efficiency
In sequence modelling tasks, one can perform predictions based on an entire sequence of observations, or perform auto-regressive modelling where the model predicts the next time-step output given the current time-step input. Table 1 (right) depicts the time complexity of different neural network instances at inference, for a given sequence of length n and a neural network of k hidden units. We observe that the complexity of ODE-based networks and Transformer modules is at least an order of magnitude higher than that of discrete RNNs and CfCs in both sequence prediction and auto-regressive modelling (time-step prediction) frameworks.

This is desirable because not only do CfCs establish a continuous flow similar to ODE models1 to achieve better expressivity in representation learning but they do so with the efficiency of discrete RNN models.

CfCs: flexible deep models for sequential tasks
Additionally, CfCs are equipped with novel time-dependent gating mechanisms that explicitly control their memory. CfCs are as expressive as their ODE-based peers and can be supplied with mixed memory architectures9 to avoid gradient issues in sequential data processing applications with long-range dependences. Beyond accuracy and performance metrics, our results indicate that, when considering accuracy per compute time, CfCs exhibit over 150-fold improvements over their ODE-based counterparts. We perform a diverse set of advanced time-series modelling experiments and present the performance and speed gain achievable by using CfCs in tasks with long-term dependences, irregular data and modelling physical dynamics, among others.

Deriving a closed-form solution
In this section, we derive an approximate closed-form solution for LTC networks, an expressive subclass of time-continuous models. We discuss how the scalar closed-form expression derived from a small LTC system can inspire the design of CfC models. In this regard, we define the LTC semantics. We then state the main theorem that computes a closed-form approximation of a given LTC system for the scalar case. To prove the theorem, we first find the integral solution of the given LTC ODE system. We then compute a closed-form analytical solution for the integral solution for the case of piecewise constant inputs. Afterward, we generalize the closed-form solution of the piecewise constant inputs to the case of arbitrary inputs with our novel approximation and finally provide sharpness results (that is, measure the rate and accuracy of an approximation error) for the derived solution.

The hidden state of an LTC network is determined by the solution of the following initial value problem (IVP)1:

dx/dt = −[wτ + f(x, I, θ)] ⊙ x(t) + A ⊙ f(x, I, θ),   (1)

where at a time step t, x^(D×1)(t) defines the hidden state of an LTC layer with D cells, and I^(m×1)(t) is an exogenous input to the system with m features. Here, wτ^(D×1) is a time-constant parameter vector, A^(D×1) is a bias vector, f is a neural network parametrized by θ and ⊙ is the Hadamard product. The dependence of f(·) on x(t) denotes the possibility of having recurrent connections.

The full proof of theorem 1 is given in Methods. The theorem formally demonstrates that the approximated closed-form solution for the given LTC system is given by equation (2) and that this approximation is tightly bounded, with bounds given in the proof.

In the following, we show an illustrative example of this tightness result in practice. To do this, we first present an instantiation of LTC networks and their approximate closed-form expressions. Extended Data Fig. 1 shows a liquid network with two neurons and five synaptic connections. The network receives an input signal I(t). Extended Data Fig. 1 further derives the DE expression for the network along with its closed-form approximate solution. In general, it is possible to compile an LTC network into its closed-form expression as illustrated in Extended Data Fig. 1. This compilation can be performed using Algorithm 1 provided in Methods.

Theorem 1
Given an LTC system determined by the IVP in equation (1), constructed by one cell, receiving a single-dimensional time-series exogenous input I(t) with no self-connections, the following expression is an approximation of its closed-form solution:

x(t) ≈ (x0 − A) e^(−[wτ + f(I(t), θ)] t) f(−I(t), θ) + A.   (2)

Tightness of the closed-form solution in practice
Figure 2 shows an LTC-based network trained for autonomous driving22. The figure further illustrates how closely the proposed solution fits the actual dynamics exhibited by a single-neuron ODE given the same parametrization. The details of this experiment are given in Methods.

We next show how to design a novel neural network instance inspired by this closed-form solution that has well-behaved gradient properties and approximation capabilities.

Designing CfC models from the solution
Leveraging the scalar closed-form solution expressed by equation (2), we can now distil this model into a neural network model that can be trained at scale.

Fig. 2 | Tightness of the closed-form solution in practice. We approximate a closed-form solution for LTC networks1 while largely preserving the trajectories of their equivalent ODE systems. We develop our solution into CfC models that are at least 100-fold faster than neural ODEs at both training and inference on complex time-series prediction tasks. The figure depicts an LTC module (perception module, input stream and outputs), the dynamics of each node, dx/dt = −(wτ + f(x, I)) x(t) + A f(x, I), with inputs I(t), neuron state x(t), nonlinearity f(·) and parameters wτ and A, and the output neuron dynamics over time (s) of the ODE LTC against its closed-form solution (CfC), x(t) = (x(0) − A) e^(−[wτ + f(x, I)] t) f(−x, −I) + A.

The solution provides a grounded theoretical basis for solving scalar continuous-time dynamics, and it is important to translate this theory into a practical neural network model which can be integrated into larger representation learning systems equipped with gradient descent optimizers. Doing so requires careful attention to potential gradient and expressivity issues that can arise during optimization, which we will outline in this section.

Formally, the hidden states, x(t)^(D×1) with D hidden units at each time step t, can be obtained explicitly as

x(t) = B ⊙ e^(−[wτ + f(x, I; θ)] t) ⊙ f(−x, −I; θ) + A,   (3)

where B^(D) collapses (x0 − A) of equation (2) into a parameter vector, A^(D) and wτ^(D) are the system's parameter vectors, I(t)^(m×1) is an m-dimensional input at each time step t, f is a neural network parametrized by θ = {W_Ix^(m×D), W_xx^(D×D), b_x^(D)} and ⊙ is the Hadamard (element-wise) product. While the neural network presented in equation (3) can be proven to be a universal approximator as it is an approximation of an ODE system1,2, in its current form, it has trainability issues which we point out and resolve shortly.

Resolving the gradient issues
The exponential term in equation (3) drives the system's first part (exponentially fast) to 0 and the entire hidden state to A. This issue becomes more apparent when there are recurrent connections and causes vanishing gradient factors when trained by gradient descent23. To reduce this effect, we replace the exponential decay term with a reversed sigmoidal nonlinearity σ(·). This nonlinearity is approximately 1 at t = 0 and approaches 0 in the limit t → ∞. However, unlike exponential decay, its transition happens much more smoothly, yielding a better condition on the loss surface.

Replacing biases by learnable instances
Next, we consider the bias parameter B to be part of the trainable parameters of the neural network f(−x, −I; θ) and choose to use a new network instance instead of f (presented in the exponential decay factor). We also replace A with another neural network instance, h(·), to enhance the flexibility of the model. To obtain a more general network architecture, we allow the nonlinearity f(−x, −I; θ) present in equation (3) to have both shared (backbone) and independent (g(·)) network compartments.

Gating balance
The time-decaying sigmoidal term can play a gating role if we additionally multiply h(·) with (1 − σ(·)). This way, the time-decaying sigmoid function stands for a gating mechanism that interpolates between the two limits of t → −∞ and t → ∞ of the ODE trajectory.

Backbone
Instead of learning all three neural network instances f, g and h separately, we have them share the first few layers in the form of a backbone that branches out into these three functions. As a result, the backbone allows our model to learn shared representations, thereby speeding up and stabilizing the learning process. More importantly, this architectural prior enables two simultaneous benefits: (1) through the shared backbone, a coupling between the time constant of the system and its state nonlinearity is established that exploits causal representation learning evident in a liquid neural network1,24; (2) through separate head network layers, the system has the ability to explore temporal and structural dependences independently of each other.

These modifications result in the CfC neural network model:

x(t) = σ(−f(x, I; θ_f) t) ⊙ g(x, I; θ_g) + [1 − σ(−f(x, I; θ_f) t)] ⊙ h(x, I; θ_h),   (4)

where the two terms are balanced by the time-continuous gating σ(·) and 1 − σ(·), respectively. The CfC architecture is illustrated in Extended Data Fig. 2. The neural network instances could be selected arbitrarily. The time complexity of the algorithm is equivalent to that of discretized recurrent networks25, being at least one order of magnitude faster than ODE-based networks.

The procedure to account for the explicit time dependence
CfCs are continuous-depth models that can set their temporal behaviour based on the task under test. For time-variant datasets (for example, irregularly sampled time series, event-based data and sparse data), the t for each incoming sample is set based on its time stamp or order. For sequential applications where the time of the occurrence of a sample does not matter, t is sampled as many times as the batch length, with equidistant intervals within two hyperparameters a and b.

Experiments with CfCs
We now assess the performance of CfCs in a series of sequential data processing tasks compared with advanced, recurrent models. We first approach solving conventional sequential data modelling tasks (for example, bit-stream prediction, sentiment analysis on text data, medical time-series prediction, human activity recognition, sequential image processing and robot kinematics modelling), and compare CfC variants with an extensive set of advanced RNN baselines. We then evaluate how CfCs compare with LTC-based neural circuit policies (NCPs)22 in real-world autonomous lane-keeping tasks.

CfC network variants
To evaluate the proposed modifications we applied to the closed-form solution network described by equation (3), we test four variants of the CfC architecture: (1) the closed-form solution network (Cf-S) obtained by equation (3); (2) the CfC without the second gating mechanism (CfC-noGate), a variant that does not have the 1 − σ instance shown in Extended Data Fig. 2; (3) the CfC model (CfC) expressed by equation (4); and (4) the CfC wrapped inside a mixed memory architecture (that is, where the CfC defines the memory state of an RNN, for instance, a long short-term memory (LSTM)), a variant we call CfC-mmRNN.
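The CfC update in equation (4) maps directly onto a small recurrent cell. Below is a minimal PyTorch sketch of such a cell, written for this text rather than taken from the authors' repository; the backbone width, the tanh activation and the way the elapsed time t is supplied are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn

class CfCCell(nn.Module):
    """Sketch of the closed-form continuous-depth (CfC) update of equation (4):
    x(t) = sigmoid(-f(x,I)*t) * g(x,I) + (1 - sigmoid(-f(x,I)*t)) * h(x,I)."""

    def __init__(self, input_size: int, hidden_size: int, backbone_units: int = 64):
        super().__init__()
        # Shared backbone over the concatenated input and hidden state.
        self.backbone = nn.Sequential(
            nn.Linear(input_size + hidden_size, backbone_units), nn.Tanh()
        )
        # Three heads branching off the backbone: f (time constant), g and h.
        self.f = nn.Linear(backbone_units, hidden_size)
        self.g = nn.Linear(backbone_units, hidden_size)
        self.h = nn.Linear(backbone_units, hidden_size)

    def forward(self, I: torch.Tensor, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # I: (batch, input_size), x: (batch, hidden_size), t: (batch, 1) elapsed time.
        z = self.backbone(torch.cat([I, x], dim=-1))
        gate = torch.sigmoid(-self.f(z) * t)          # time-decaying gate sigma(-f * t)
        return gate * self.g(z) + (1.0 - gate) * self.h(z)

# Usage: roll the cell over an irregularly sampled sequence.
cell = CfCCell(input_size=3, hidden_size=8)
x = torch.zeros(1, 8)
for I_t, dt in [(torch.randn(1, 3), torch.tensor([[0.1]])),
                (torch.randn(1, 3), torch.tensor([[0.7]]))]:
    x = cell(I_t, x, dt)
print(x.shape)  # torch.Size([1, 8])
```

Rolling this cell over a sequence gives the vanilla CfC; using it as the memory state of an LSTM-style wrapper corresponds to the CfC-mmRNN variant described above.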


Each of these four proposed variants leverages our proposed solution and thus is at least one order of magnitude faster than continuous-time ODE models.

To investigate their representation learning power, in the following we extensively evaluate CfCs on a series of sequence modelling tasks. The objective is to test the effectiveness of the CfCs in learning spatiotemporal dynamics, compared with a wide range of advanced models.

Baselines
We compare CfCs with a diverse set of advanced algorithms developed for sequence modelling by both discretized and continuous mechanisms. These baselines are given in full in Methods.

Human activity recognition
The human activity dataset contains 6,554 sequences of humans demonstrating activities such as walking, lying and sitting. The input space is formed of 561-dimensional inertial sensor measurements per time step, recorded from the user's smartphone26, categorized into six groups of activities (per time step) as output.

We set up our dataset split (training, validation and test) to carefully reflect the modifications made by Rubanova et al.7 on this task. The results of this experiment are reported in Table 2. We observe that not only do the CfC variants Cf-S, CfC-noGate and CfC-mmRNN outperform other models with a high margin, but they do so with a speed-up of more than 8,752% over the best-performing ODE-based instance (Latent-ODE-ODE). The reason for such a large speed difference is the complexity of the dataset dynamics, which causes the ODE solvers of ODE-based models such as Latent-ODE-ODE to compute many steps upon stiff dynamics. This issue does not exist for closed-form models as they do not use any ODE solver to account for dynamics. The hyperparameter details of this experiment are provided in Extended Data Fig. 3.

Table 2 | Human activity recognition, per time-step classification
Model | Accuracy (%) | Time per epoch (min)
†RNN-Impute7 | 79.50 ± 0.8 | 0.38
†RNN-Δt7 | 79.50 ± 0.8 | 0.45
†RNN-Decay7 | 80.00 ± 1.0 | 0.39
†GRU-D51 | 80.60 ± 0.7 | 0.15
†RNN-VAE7 | 34.30 ± 4.0 | 2.63
†Latent-ODE-RNN7 | 83.50 ± 1.0 | 7.71
†ODE-RNN7 | 82.90 ± 1.6 | 3.15
†Latent-ODE-ODE7 | 84.60 ± 1.3 | 8.49
Cf-S (current work) | 87.04 ± 0.47 | 0.097
CfC-noGate (current work) | 85.57 ± 0.34 | 0.093
CfC (current work) | 84.87 ± 0.42 | 0.084
CfC-mmRNN (current work) | 85.97 ± 0.25 | 0.128
Numbers represent mean ± s.d. (n = 5). The performance of the models marked by † is reported from ref. 7. Bold values indicate the highest accuracy and best time per epoch (min).

Physical dynamics modelling
The Walker2D dataset consists of kinematic simulations of the MuJoCo physics engine27 (see Methods for more details). As shown in Table 3, CfCs outperform the other baselines by a large margin, supporting their strong capability to model irregularly sampled physical dynamics with missing phases. It is worth mentioning that, on this task, CfCs even outperform transformers by a considerable, 18% margin. The hyperparameter details of this experiment are provided in Extended Data Fig. 3.

Event-based sequential image processing
We next assess the performance of CfCs on a challenging sequential image processing task. This task is generated from the sequential modified National Institute of Standards and Technology (MNIST) dataset following the steps described in Methods. The hyperparameter details of this experiment are provided in Extended Data Fig. 4.

Table 4 summarizes the results on this event-based sequence classification task. We observe that models such as ODE-RNN, CT-RNN, GRU-ODE and LSTMs struggle to learn a good representation of the input data and therefore show poor performance. In contrast, RNNs endowed with explicit memory, such as bi-directional RNNs, GRU-D, Lipschitz RNN, coRNN, CT-LSTM and ODE-LSTM, perform well on this task. All CfC variants perform well and establish the state of the art on this task, with CfC-mmRNN achieving 98.09% and CfC-noGate achieving 96.99% accuracy in classifying irregularly sampled sequences. It is worth mentioning that they do so around 200–400% faster than ODE-based models such as GRU-ODE and ODE-RNN.

Regularly and irregularly sampled bit-stream XOR
The bit-stream XOR dataset9 considers the classification of bit streams by implementing an XOR function in time. That is, each item in the sequence contributes equally to the correct output. The details are given in Methods.

Extended Data Fig. 5 compares the performance of many RNN baselines. Many architectures, such as Augmented LSTM, CT-GRU, GRU-D, ODE-LSTM, coRNN and Lipschitz RNN, and all variants of CfC, can successfully solve the task with 100% accuracy when the bit-stream samples are equidistant from each other. However, when the bit-stream samples arrive at non-uniform distances, only architectures that are immune to the vanishing gradient in irregularly sampled data can solve the task. These include GRU-D, ODE-LSTM, CfC and CfC-mmRNNs. ODE-based RNNs cannot solve the event-based encoding tasks regardless of their choice of solvers, as they have vanishing/exploding gradient issues9. The hyperparameter details of this experiment are provided in Extended Data Fig. 4.

PhysioNet Challenge
The PhysioNet Challenge 2012 dataset considers the prediction of the mortality of 8,000 patients admitted to the intensive care unit. The features represent time series of medical measurements taken during the first 48 h after admission. The data are irregularly sampled in time and over features, that is, only a subset of the 37 possible features is given at each time point. We perform the same test–train split and preprocessing as in ref. 7, and report the area under the curve (AUC) on the test set as a metric in Extended Data Fig. 6. We observe that CfCs perform competitively with other baselines while training 160 times faster than ODE-RNN and 220 times faster than continuous latent models. CfCs are also, on average, three times faster than advanced discretized gated recurrent models. The hyperparameter details of this experiment are provided in Extended Data Fig. 7.

Sentiment analysis using IMDB
The Internet Movie Database (IMDB) sentiment analysis dataset28 consists of 25,000 training and 25,000 test sentences (see Methods for more details). Extended Data Fig. 8 shows how CfCs equipped with mixed memory instances outperform advanced RNN benchmarks. The hyperparameter details of this experiment are provided in Extended Data Fig. 7.

Performance of CfCs in autonomous driving
In this experiment, our objective is to evaluate how robustly CfCs learn to perform autonomous navigation in comparison with their ODE-based counterparts, LTC networks.

Table 3 | Per time-step regression
Model | Mean squared error (MSE) | Time per epoch (min)
†ODE-RNN7 | 1.904 ± 0.061 | 0.79
†CT-RNN48 | 1.198 ± 0.004 | 0.91
†Augmented LSTM44 | 1.065 ± 0.006 | 0.10
†CT-GRU49 | 1.172 ± 0.011 | 0.18
†RNN-Decay7 | 1.406 ± 0.005 | 0.16
†Bi-directional RNN53 | 1.071 ± 0.009 | 0.39
†GRU-D51 | 1.090 ± 0.034 | 0.11
†PhasedLSTM52 | 1.063 ± 0.010 | 0.25
†GRU-ODE7 | 1.051 ± 0.018 | 0.56
†CT-LSTM50 | 1.014 ± 0.014 | 0.31
†ODE-LSTM9 | 0.883 ± 0.014 | 0.29
coRNN57 | 3.241 ± 0.215 | 0.18
Lipschitz RNN58 | 1.781 ± 0.013 | 0.17
LTC1 | 0.662 ± 0.013 | 0.78
Transformer36 | 0.761 ± 0.032 | 0.80
Cf-S (current work) | 0.948 ± 0.009 | 0.12
CfC-noGate (current work) | 0.650 ± 0.008 | 0.21
CfC (current work) | 0.643 ± 0.006 | 0.08
CfC-mmRNN (current work) | 0.617 ± 0.006 | 0.34
Modelling the physical dynamics of a walker agent in simulation. Numbers represent mean ± s.d. (n = 5). The performance of the models marked by † is reported from ref. 9. Bold values indicate the lowest error and best time per epoch (min).

Table 4 | Event-based sequence classification on irregularly sampled sequential MNIST
Model | Accuracy (%) | Time per epoch (min)
ODE-RNN7 | 72.41 ± 1.69 | 14.57
CT-RNN48 | 72.05 ± 0.71 | 17.30
Augmented LSTM44 | 82.10 ± 4.36 | 2.48
CT-GRU49 | 87.51 ± 1.57 | 3.81
RNN-Decay7 | 88.93 ± 4.06 | 3.64
Bi-directional RNN7 | 94.43 ± 0.23 | 8.097
GRU-D51 | 95.44 ± 0.34 | 3.42
PhasedLSTM52 | 86.79 ± 1.57 | 5.69
GRU-ODE7 | 80.95 ± 1.52 | 6.76
CT-LSTM50 | 94.84 ± 0.17 | 3.84
coRNN57 | 94.44 ± 0.24 | 3.90
Lipschitz RNN58 | 95.92 ± 0.16 | 3.86
ODE-LSTM9 | 95.73 ± 0.24 | 6.35
Cf-S (current work) | 95.23 ± 0.16 | 2.73
CfC-noGate (current work) | 96.99 ± 0.30 | 3.36
CfC (current work) | 95.42 ± 0.21 | 3.62
CfC-mmRNN (current work) | 98.09 ± 0.18 | 5.50
Test accuracy shown as mean ± s.d. (n = 5). Bold values indicate the highest accuracy and best time per epoch (min).

The task is to map incoming high-dimensional pixel observations to steering curvature commands. The details of this experiment are given in Methods.

We observe that CfCs, similar to NCPs, demonstrate a consistent attention pattern in each subtask while maintaining their attention profile under heavy noise, as depicted in Extended Data Fig. 10c. This is while the attention profile of other networks such as CNNs and LSTMs is hindered by added input noise (Extended Data Fig. 10c).

This experiment empirically validates that CfCs possess similar robustness properties to their ODE counterparts, that is, LTC-based networks. Moreover, similar to NCPs, CfCs are parameter efficient. They performed the end-to-end autonomous lane-keeping task with around 4,000 trainable parameters in their RNN component (Extended Data Fig. 9).

Scope, discussion and conclusions
We introduce a closed-form continuous-time neural model built from an approximate closed-form solution of LTC networks that possesses the strong modelling capabilities of ODE-based networks while being notably faster, more accurate and stable. These closed-form continuous-time models achieve this by explicit time-dependent gating mechanisms and by having an LTC modulated by neural networks. A discussion of related research on continuous-time models is given in Methods.

For large-scale time-series prediction tasks, and where closed-loop performance matters24, CfCs can bring great value. This is because they capture the flexible, causal and continuous-time nature of ODE-based networks, such as LTC networks, while being more efficient. A discussion on how to use different variants of CfCs is provided in Methods. On the other hand, implicit ODE- and partial differential equation-based models17,29–31 can be beneficial in solving continuously defined physics problems and control tasks. Moreover, for generative modelling, continuous normalizing flows built by ODEs are the suitable choice of model as they ensure invertibility, unlike CfCs2. This is because DEs guarantee invertibility (that is, under uniqueness conditions6, one can run them backwards in time). CfCs only approximate ODEs and therefore no longer necessarily form a bijection32.

What are the limitations of CfCs?
CfCs might exhibit vanishing gradient problems. To avoid this, for tasks that require long-term dependences, it is better to use them together with mixed memory networks9 (as in the CfC variant CfC-mmRNN) or with proper parametrization of their transition matrices33,34. Moreover, we speculate that inferring causality from ODE-based networks might be more straightforward than from a closed-form solution24. It would also be beneficial to assess whether verifying a continuous neural flow35 is more tractable by using an ODE representation of the system or its closed form.

For problems such as language modelling, where a large amount of sequential data and substantial computational resources are available, transformers36 and their variants are great choices of models. CfCs could bring value when: (1) data have limitations and irregularities (for example, medical data, financial time series, robotics37 and closed-loop control, and multi-agent autonomous systems in supervised and reinforcement learning schemes38), (2) the training and inference efficiency of a model is important (for example, embedded applications39–41) and (3) interpretability matters42.

Ethics statement
All authors acknowledge the Global Research Code on the development, implementation and communication of this research. For the purpose of transparency, we have included this statement on inclusion and ethics. This work cites a comprehensive list of research from around the world on related topics.

Methods
Proof of theorem 1
Proof. In the single-dimensional case, the IVP in equation (1) becomes linear in x as follows:

d/dt x(t) = −[wτ + f(I(t))] · x(t) + A f(I(t)).   (5)

Therefore, we can use the theory of linear ODEs to obtain an integral closed-form solution (section 1.10 in ref. 21) consisting of two nested integrals. The inner integral can be eliminated by means of integration by substitution43. The remaining integral expression can then be solved in the case of piecewise constant inputs and approximated in the case of general inputs. The three steps of the proof are outlined below.

Integral closed-form solution of LTC
We consider the ODE semantics of a single neuron that receives some arbitrary continuous input signal I and has a positive, bounded, continuous and monotonically increasing nonlinearity f:

d/dt x(t) = −[wτ + f(I(t))] · x(t) + A · [wτ + f(I(t))].

Assumption. We assumed a second constant value wτ in the above representation of a single LTC neuron. This is done to introduce symmetry in the structure of the ODE, yielding a simpler expression for the solution. The inclusion of this second constant may appear to profoundly alter the dynamics. However, as shown below, numerical experiments suggest that this simplifying assumption has a marginal effect on the ability to approximate LTC cell dynamics.

Using the variation of constants formula (section 1.10 in ref. 21), we obtain after some simplifications:

x(t) = (x(0) − A) e^(−wτ t − ∫_0^t f(I(s)) ds) + A.   (6)

Analytical LTC solution for piecewise constant inputs
The derivation of a useful closed-form expression of x requires us to solve the integral expression ∫_0^t f(I(s)) ds for any t ≥ 0. Fortunately, the integral ∫_0^t f(I(s)) ds enjoys a simple closed-form expression for piecewise constant inputs I. Specifically, assume that we are given a sequence of time points

0 = τ_0 < τ_1 < τ_2 < … < τ_{n−1} < τ_n = ∞,

such that τ_1, …, τ_{n−1} ∈ ℝ and I(t) = γ_i for all t ∈ [τ_i; τ_{i+1}) with 0 ≤ i ≤ n − 1. Then, it holds that

∫_0^t f(I(s)) ds = f(γ_k)(t − τ_k) + Σ_{i=0}^{k−1} f(γ_i)(τ_{i+1} − τ_i),   (7)

when τ_k ≤ t < τ_{k+1} for some 0 ≤ k ≤ n − 1 (as usual, one defines Σ_{i=0}^{−1} := 0). With this, we have

x(t) = (x(0) − A) e^(−wτ t) e^(−f(γ_k)(t − τ_k) − Σ_{i=0}^{k−1} f(γ_i)(τ_{i+1} − τ_i)) + A,   (8)

when τ_k ≤ t < τ_{k+1} for some 0 ≤ k ≤ n − 1. While any continuous input can be approximated arbitrarily well by a piecewise constant input43, a tight approximation may require a large number of discretization points τ_1, …, τ_n. We address this next.

Analytical LTC approximation for general inputs
Inspired by equations (7) and (8), the next result provides an analytical approximation of x(t).

Lemma 1
For any Lipschitz continuous, positive, monotonically increasing and bounded f and continuous input signal I(t), we approximate x(t) in equation (6) as

x̃(t) = (x(0) − A) e^(−[wτ t + f(I(t)) t]) f(−I(t)) + A.   (9)

Then, |x(t) − x̃(t)| ≤ |x(0) − A| e^(−wτ t) for all t ≥ 0. Writing c = x(0) − A for convenience, we can obtain the following sharpness results, additionally:
1. For any t ≥ 0, we have sup { (1/c)(x(t) − x̃(t)) | I : [0; t] → ℝ } = e^(−wτ t).
2. For any t ≥ 0, we have inf { (1/c)(x(t) − x̃(t)) | I : [0; t] → ℝ } = e^(−wτ t) (e^(−t) − 1).

Above, the supremum and infimum are meant to be taken across all continuous input signals. These statements settle the question about the worst-case errors of the approximation. The first statement implies, in particular, that our bound is sharp.

The full proof is given in the next section. Lemma 1 demonstrates that the integral solution we obtained and shown in equation (6) is tightly close to the approximate closed-form solution we proposed in equation (9). Note that, as wτ is positively defined, the derived bound between equations (6) and (9) ensures an exponentially decaying error as time goes by. Therefore, we have the statement of the theorem. □

Proof of lemma 1
We start by noting that

x(t) − x̃(t) = c e^(−wτ t) [e^(−∫_0^t f(I(s)) ds) − e^(−f(I(t)) t) f(−I(t))].

Since 0 ≤ f ≤ 1, we conclude that e^(−∫_0^t f(I(s)) ds) ∈ [0; 1] and e^(−f(I(t)) t) f(−I(t)) ∈ [0; 1]. This shows that |x(t) − x̃(t)| ≤ |c| e^(−wτ t). To see the sharpness results, pick some arbitrarily small ε > 0 and a sufficiently large C > 0 such that f(−C) ≤ ε and 1 − ε ≤ f(C). With this, for any 0 < δ < t, we consider the piecewise constant input signal I such that I(s) = −C for s ∈ [0; t − δ] and I(s) = C for s ∈ (t − δ; t]. Then, it can be noted that

e^(−∫_0^t f(I(s)) ds) − e^(−f(I(t)) t) f(−I(t)) ≥ e^(−εt − δ·1) − e^(−(1−ε)·t) ε → 1, when ε, δ → 0.

Statement 1 follows by noting that there exists a family of continuous signals I_n : [0; t] → ℝ such that |I_n(·)| ≤ C for all n ≥ 1 and I_n → I pointwise as n → ∞. This is because

lim_{n→∞} | ∫_0^t f(I(s)) ds − ∫_0^t f(I_n(s)) ds | ≤ lim_{n→∞} ∫_0^t | f(I(s)) − f(I_n(s)) | ds ≤ lim_{n→∞} L ∫_0^t | I(s) − I_n(s) | ds = 0,

where L is the Lipschitz constant of f, and the last identity is due to the dominated convergence theorem43. To see statement 2, we first note that the negation of the signal −I provides us with

e^(−∫_0^t f(−I(s)) ds) − e^(−f(−I(t)) t) f(I(t)) ≤ e^(−(1−ε)(t−δ) − δ·0) − e^(−ε·t) (1 − ε) → e^(−t) − 1,

if ε, δ → 0. The fact that the left-hand side of the last inequality must be at least e^(−t) − 1 follows by observing that e^(−t) ≤ e^(−∫_0^t f(I′(s)) ds) and e^(−f(I″(t)) t) f(−I″(t)) ≤ 1 for any I′, I″ : [0; t] → ℝ. □

Compiling LTC architectures into their closed-form equivalent
In general, it is possible to compile the architecture of an LTC network into its closed-form version. This compilation allows us to speed up the training and inference time of ODE-based networks as the closed-form variant does not require complex ODE solvers to compute outputs. Algorithm 1 provides the instructions on how to transfer the architecture of an LTC network into its closed-form variant. Here, WAdj corresponds to the adjacency matrix that maps exogenous inputs to hidden states and the coupling among hidden states.
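To make the compilation concrete, the following NumPy sketch (added for illustration; the parameter values in the usage example are arbitrary assumptions) evaluates the per-synapse closed-form update that Algorithm 1 below accumulates for one postsynaptic neuron, assuming the sigmoidal synapse parametrization (σ_ij, μ_ij, A_ij) used there.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compiled_neuron_state(t, x_pre, x0, tau_i, sigma_i, mu_i, A_i):
    """Closed-form state of neuron i at times t, accumulated over its synapses j,
    following the per-synapse update of Algorithm 1:
      x_i(t) += (x0 - A_ij) * exp(-t * (1/tau_i + sigmoid(sigma_ij*(x_pre_ij - mu_ij))))
                * sigmoid(-sigma_ij*(x_pre_ij - mu_ij)) + A_ij
    x_pre: presynaptic signals gathered via W_Adj for each synapse j, shape (J,).
    sigma_i, mu_i, A_i: per-synapse parameters, shape (J,). t: scalar or array."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    x_hat = np.zeros_like(t)
    for j in range(len(x_pre)):
        gate = sigma_i[j] * (x_pre[j] - mu_i[j])
        x_hat = x_hat + (x0 - A_i[j]) * np.exp(-t * (1.0 / tau_i + sigmoid(gate))) \
                        * sigmoid(-gate) + A_i[j]
    return x_hat

# Toy usage with two synapses and arbitrary (assumed) parameters.
t = np.linspace(0.0, 2.0, 5)
print(compiled_neuron_state(t, x_pre=np.array([0.3, -1.2]), x0=0.0, tau_i=1.0,
                            sigma_i=np.array([2.0, 1.0]), mu_i=np.array([0.0, 0.5]),
                            A_i=np.array([1.0, -1.0])))
```

Applying the same accumulation to every neuron listed in WAdj yields the compiled network state x̂(t).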


This adjacency matrix can have an arbitrary sparsity (that is, there is no need to use a directed acyclic graph for the coupling between neurons).

Algorithm 1. Translate the architecture of an LTC network into its closed-form variant
Inputs: LTC inputs I^(N×T)(t), the activity x^(H×T)(t) and initial states x^(H×1)(0) of LTC neurons, and the adjacency matrix for synapses WAdj^[(N+H)×(N+H)]
  LTC ODE solver with step of Δt
  time-instance vectors of inputs, t_I(t)^(1×T)
  time-instance of LTC neurons t_x(t)   ∇ time might be sampled irregularly
  LTC neuron parameter τ^(H×1)
  LTC network synaptic parameters {σ^(N×H), μ^(N×H), A^(N×H)}
Outputs: LTC closed-form approximation of hidden state neurons, x̂^(N×T)(t)
x_pre(t) = WAdj × [I_0…I_N, x_0…x_H]   ∇ all presynaptic signals to nodes
for ith neuron in neurons 1 to H do
  for j in synapses to ith neuron do
    x̂_i += (x_0 − A_ij) e^(−t_x(t) ⊙ (1/τ_i + 1/(1 + e^(−σ_ij (x_pre_ij − μ_ij))))) ⊙ 1/(1 + e^(σ_ij (x_pre_ij − μ_ij))) + A_ij
  end for
end for
return x̂(t)

Experimental details of the tightness experiment
We took a trained NCP22, which consists of a perception module and an LTC-based network1 that possesses 19 neurons and 253 synapses. The network was trained to steer a self-driving vehicle autonomously. We used recorded real-world test runs of the vehicle for a lane-keeping task governed by this network. The records included the inputs, outputs and all the LTC neurons' activities and parameters. To perform a numerical evaluation of our theory and determine whether our proposed closed-form solution for LTC neurons is good enough in practice as well, we inserted the parameters of individual neurons and synapses of the DEs into the closed-form solution (similar to the representations shown in Extended Data Fig. 1b,c) and emulated the structure of the ODE-based LTC networks. We then visualized the output neuron's dynamics of the ODE (in blue) and of the closed-form solution (in red). As illustrated in Fig. 2, we observed that the behaviour of the ODE is captured by the closed-form solution with a mean squared error of 0.006. This experiment provides numerical evidence for the tightness results presented in our theory. Hence, the closed-form solution captures the main properties of liquid networks in approximating dynamics.

Baseline models
The example baseline models considered include some variations of classical auto-regressive RNNs, such as an RNN with concatenated Δt (RNN-Δt), a recurrent model with moving average on missing values (RNN-impute), RNN-Decay7, LSTMs44 and gated recurrent units (GRUs)45. We also report results for a variety of encoder–decoder ODE-RNN-based models, such as RNN-VAE and latent variable models with RNNs and with ODEs, all from ref. 7.

Furthermore, we include models such as interpolation prediction networks (IP-Net)46, set functions for time series (SeFT)47, CT-RNN48, CT-GRU49, CT-LSTM50, GRU-D51, PhasedLSTM52 and bi-directional RNNs53. Finally, we benchmarked CfCs against competitive recent RNN architectures with the premise of tackling long-term dependences, such as Legendre memory units54, high-order polynomial projection operators (HiPPO)55, orthogonal recurrent models (expRNNs)56, mixed memory RNNs such as ODE-LSTMs9, coupled oscillatory RNNs (coRNN)57 and Lipschitz RNN58.

Experimental details for the Walker2D dataset
This task is designed based on the Walker2d-v2 OpenAI Gym59 environment using data from four different stochastic policies. The objective is to predict the physics state in the next time step. The training and testing sequences are provided at irregularly sampled intervals. We report the squared error on the test set as a metric.

Description of the event-based MNIST experiment
We first sequentialize each image by transforming each 28 × 28 image into a long series of length 784. The objective is to predict the class corresponding to each image from the long input sequence. Advanced sequence modelling frameworks such as coRNN57, Lipschitz RNN58 and mixed memory ODE-LSTM9 can solve this task very well, with accuracy of up to 99.0%. However, we aim to make the task even more challenging by sparsifying the input vectors with event-like irregularly sampled mechanisms. To this end, in each vector input (that is, flattened image), we transform each consecutive occurrence of values into one event. For instance, within the long binary vector of an image, the sequence 1, 1, 1, 1 is transformed to (1, t = 4) (ref. 9). This way, sequences of length 784 are condensed into event-based irregularly sampled sequences of length 256 that are far more challenging to handle than equidistant input signals. A recurrent model now has to learn to memorize input information of length 256 while keeping track of the time lags between the events.

Description of the event-based XOR encoding experiment
The bit streams are provided in densely sampled and event-based sampled formats. The densely sampled version simply represents an incoming bit as an input event. The event-based sampled version transmits only bit changes to the network, that is, multiple equal bits are packed into a single input event. Consequently, the densely sampled variant is a regular sequence classification problem, whereas the event-based encoding variant represents an irregularly sampled sequence classification problem.

Experimental details of the IMDB dataset experiment
Each sentence corresponds to either positive or negative sentiment. We tokenize the sentences in a word-by-word fashion with a vocabulary consisting of the 20,000 words occurring most frequently in the dataset. We map each token to a vector using a trainable word embedding. The word embedding is initialized randomly. No pretraining of the network or word embedding is performed.

Setting of the driving experiment
It has been shown that models based on LTC networks are more robust when trained on offline demonstrations and tested online in closed loop with their environments, in many end-to-end robot control tasks such as mobile robots60, autonomous ground vehicles22 and autonomous aerial vehicles24,61. This robustness in decision-making (that is, their flexibility in learning and executing the task from demonstrations despite environmental or observational disturbances and distributional shifts) originates from their model semantics, which formally reduces to dynamic causal models20,24. Intuitively, LTC-based networks learn to extract a good representation of the task they are given (for example, their attention maps indicate that they have learned to focus on the road, with more attention to the road's horizon) and maintain this understanding under heavy distribution shifts. An example is illustrated in Extended Data Fig. 10.

In this experiment, we aim to investigate whether CfC models and their variants, such as CfC-mmRNN, possess this robustness characteristic (maintaining their attention map under distribution shifts and added noise), similar to their ODE counterparts (LTC-based networks called NCPs22).

We start by training neural network architectures that possess a convolutional head stacked with the choice of RNN. The RNN


compartment of the networks is replaced by LSTM networks, NCPs22, Description of hyperparameters


Cf-S, CfC-NoGate and CfC-mmRNN. We also trained a fully convolu- The hyperparameters used in our experimental results are as follows:
tional neural network for the sake of proper comparison. Our train- • clipnorm: the gradient clipping norm (that is, the global norm
ing pipeline followed an imitation learning approach with paired clipping threshold)
pixel-control data from a 30 Hz BlackFly PGE-23S3C red–green–blue • optimizer: the weight update preconditioner (for example,
camera, collected by a human expert driver across a variety of rural driv- Adam, Stochastic Gradient Descent with momentum, etc.)
ing environments, including different times of day, weather conditions • batch_size: the number of samples used to compute the
and seasons of the year. The original 3 h data set was further augmented gradients
to include off-orientation recovery data using a privileged controller62 • hidden size: the number of RNN units
and a data-driven view synthesizer63. The privileged controller enabled • epochs: the number of passes over the training dataset
the training of all networks using guided policy learning64. After train- • base_lr: the initial learning rate
ing, all networks were transferred on-board our full-scale autonomous • decay_lr: the factor by which the learning rate is multiplied after
vehicle (Lexus RX450H, retrofitted with drive-by-wire capability). The each epoch
vehicle was consistently started at the centre of the lane, initialized with • backbone_activation: the activation function of the backbone
each trained model and run to completion at the end of the road. If the layers
model exited the bounds of the lane, a human safety driver intervened • backbone_dr: the dropout rate of the backbone layers
and restarted the model from the centre of the road at the intervention • forget_bias: the forget gate bias (for mmRNN and LSTM)
location. All models were tested with and without noise added to the • backbone_units: the number of hidden units per backbone layer
sensory inputs to evaluate robustness. • backbone_layers: the number of backbone layers
The testing environment consisted of 1 km of private test road with • weight_decay: the L2 weight regularization factor
unlabelled lane markers, and we observed that all trained networks • τdata: the constant factor by which the elapsed time input is multi-
were able to successfully complete the lane-keeping task at a constant plied (default value 1)
velocity of 30 km h−1. Extended Data Fig. 10 provides an insight into how • init: the gain of the Xavier uniform distribution for the weight
these networks reach driving decisions. To this end, we computed the initialization (default value 1)
attention of each network while driving by using the VisualBackProp
algorithm65.
Related works on continuous-time models
Continuous-time models. Machine learning, control theory and dynamical systems merge at models with continuous-time dynamics60,66–69. In a seminal work, Chen et al.2,7 revived the class of continuous-time neural networks48,70, with neural ODEs. These continuous-depth models give rise to vector field representations and a set of functions that were not possible to generate before with discrete neural networks. These capabilities enabled flexible density estimation3–5,71,72 as well as performant modelling of sequential and irregularly sampled data1,7–9,58. In this paper, we showed how to relax the need for an ODE solver to realize an expressive continuous-time neural network model for challenging time-series problems.

Improving neural ODEs. ODE-based neural networks are as good as their ODE solvers. As the complexity or the dimensionality of the modelling task increases, ODE-based networks demand a more advanced solver that largely impacts their efficiency17, stability13,15,73–75 and performance1. A large body of research has studied how to reduce the computational overhead of these solvers, for example, by designing hypersolvers17, deploying augmentation methods4,12, pruning6 or regularizing the continuous flows14–16. To enhance the performance of an ODE-based model, especially in time-series modelling tasks76, solutions for stabilizing their gradient propagation have been provided9,58,77. In this work, we showed that CfCs improve the scalability, efficiency and performance of continuous-depth neural models.

Which CfC variants to choose in different applications
Our extensive experimental results demonstrate that the different CfC variants, namely Cf-S, CfC-noGate, vanilla CfC and CfC-mmRNN, achieve results that are comparable to each other, while the variant that comes out on top depends on the nature of the data set. We suggest using CfC in most cases where the sequence length is up to a couple of hundred steps. To capture longer-range dependences, we recommend CfC-mmRNN. The Cf-S variant is effective when we aim to obtain the fastest inference time. CfC-noGate could be tested as a hyperparameter when using the vanilla CfC as the primary choice of model.
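To make this guidance concrete, here is a small helper that encodes the rules of thumb above. The variant names are those used in the paper; the numeric threshold standing in for "a couple of hundred steps" and the idea of wrapping the choice in a function are our own illustrative assumptions, not part of the released CfC code.

```python
def choose_cfc_variant(seq_len: int, latency_critical: bool = False) -> str:
    """Heuristic CfC-variant choice following the guidance above.

    The 300-step cut-off is an assumed reading of 'a couple of hundred
    steps'; tune it for your task.
    """
    if latency_critical:
        return "Cf-S"        # fastest inference time
    if seq_len > 300:
        return "CfC-mmRNN"   # better suited to longer-range dependences
    return "CfC"             # default; CfC-noGate is worth trying as a hyperparameter


print(choose_cfc_variant(seq_len=64))                         # -> CfC
print(choose_cfc_variant(seq_len=1000))                       # -> CfC-mmRNN
print(choose_cfc_variant(seq_len=64, latency_critical=True))  # -> Cf-S
```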
Data availability
All data and materials used in the analysis are openly available at https://github.com/raminmh/CfC under an Apache 2.0 license for the purposes of reproducing and extending the analysis.

Code availability
All code and materials used in the analysis are openly available at https://github.com/raminmh/CfC under an Apache 2.0 license for the purposes of reproducing and extending the analysis (https://doi.org/10.5281/zenodo.7135472).

References
1. Hasani, R., Lechner, M., Amini, A., Rus, D. & Grosu, R. Liquid time-constant networks. In Proc. of AAAI Conference on Artificial Intelligence 35(9), 7657–7666 (AAAI, 2021).
2. Chen, T. Q., Rubanova, Y., Bettencourt, J. & Duvenaud, D. K. Neural ordinary differential equations. In Proc. of Advances in Neural Information Processing Systems (Eds. Bengio, S. et al.) 6571–6583 (NeurIPS, 2018).
3. Grathwohl, W., Chen, R. T., Bettencourt, J., Sutskever, I. & Duvenaud, D. Ffjord: free-form continuous dynamics for scalable reversible generative models. In International Conference on Learning Representations (2018). https://openreview.net/forum?id=rJxgknCcK7
4. Dupont, E., Doucet, A. & Teh, Y. W. Augmented neural ODEs. In Proc. of Advances in Neural Information Processing Systems (Eds. Wallach, H. et al.) 3134–3144 (NeurIPS, 2019).
5. Yang, G. et al. Pointflow: 3D point cloud generation with continuous normalizing flows. In Proc. of the IEEE/CVF International Conference on Computer Vision 4541–4550 (IEEE, 2019).
6. Liebenwein, L., Hasani, R., Amini, A. & Daniela, R. Sparse flows: pruning continuous-depth models. In Proc. of Advances in Neural Information Processing Systems (Eds. Ranzato, M. et al.) 22628–22642 (NeurIPS, 2021).
7. Rubanova, Y., Chen, R. T. & Duvenaud, D. Latent Neural ODEs for irregularly-sampled time series. In Proc. of Advances in Neural Information Processing Systems (Eds. Wallach, H. et al.) 32 (NeurIPS, 2019).

8. Gholami, A., Keutzer, K. & Biros, G. ANODE: unconditionally accurate memory-efficient gradients for neural ODEs. In Proceedings of the 28th International Joint Conference on Artificial Intelligence 730–736 (IJCAI, 2019).
9. Lechner, M. & Hasani, R. Learning long-term dependencies in irregularly-sampled time series. Preprint at https://arxiv.org/abs/2006.04418 (2020).
10. Prince, P. J. & Dormand, J. R. High order embedded Runge–Kutta formulae. J. Comput. Appl. Math. 7, 67–75 (1981).
11. Raissi, M., Perdikaris, P. & Karniadakis, G. E. Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019).
12. Massaroli, S., Poli, M., Park, J., Yamashita, A. & Asma, H. Dissecting neural ODEs. In Proc. of 33th Conference on Neural Information Processing Systems (Eds. Larochelle, H. et al.) (NeurIPS, 2020).
13. Bai, S., Kolter, J. Z. & Koltun, V. Deep equilibrium models. Adv. Neural Inform. Process. Syst. 32, 690–701 (2019).
14. Finlay, C., Jacobsen, J.-H., Nurbekyan, L. & Oberman, A. M. How to train your neural ODE: the world of Jacobian and kinetic regularization. In International Conference on Machine Learning (Eds. Daumé III, H. & Singh, A.) 3154–3164 (PMLR, 2020).
15. Massaroli, S. et al. Stable Neural Flows. Preprint at https://arxiv.org/abs/2003.08063 (2020).
16. Kidger, P., Chen, R. T. & Lyons, T. “Hey, that’s not an ODE”: Faster ODE Adjoints via Seminorms. In Proceedings of the 38th International Conference on Machine Learning (Eds. Meila, M. & Zhang, T.) 139 (PMLR, 2021).
17. Poli, M. et al. Hypersolvers: toward fast continuous-depth models. In Proc. of Advances in Neural Information Processing Systems (Eds. Larochelle, H.) 21105–21117 (NeurIPS, 2020).
18. Schumacher, J., Haslinger, R. & Pipa, G. Statistical modeling approach for detecting generalized synchronization. Phys. Rev. E 85, 056215 (2012).
19. Moran, R., Pinotsis, D. A. & Friston, K. Neural masses and fields in dynamic causal modeling. Front. Comput. Neurosci. 7, 57 (2013).
20. Friston, K. J., Harrison, L. & Penny, W. Dynamic causal modelling. Neuroimage 19, 1273–1302 (2003).
21. Perko, L. Differential Equations and Dynamical Systems (Springer-Verlag, 1991).
22. Lechner, M. et al. Neural circuit policies enabling auditable autonomy. Nat. Mach. Intell. 2, 642–652 (2020).
23. Hochreiter, S. Untersuchungen zu dynamischen neuronalen netzen. Diploma, Technische Universität München 91 (1991).
24. Vorbach, C., Hasani, R., Amini, A., Lechner, M. & Rus, D. Causal navigation by continuous-time neural networks. In Proc. of Advances in Neural Information Processing Systems (Eds. Ranzato, M. et al.) 12425–12440 (NeurIPS, 2021).
25. Hasani, R. et al. Response characterization for auditing cell dynamics in long short-term memory networks. In Proc. of 2019 International Joint Conference on Neural Networks 1–8 (IEEE, 2019).
26. Anguita, D., Ghio, A., Oneto, L., Parra Perez, X. & Reyes Ortiz, J. L. A public domain dataset for human activity recognition using smartphones. In Proc. of the 21st International European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning 437–442 (i6doc, 2013).
27. Todorov, E., Erez, T. & Tassa, Y. MuJoCo: a physics engine for model-based control. In Proc. of 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems 5026–5033 (IEEE, 2012).
28. Maas, A. et al. Learning word vectors for sentiment analysis. In Proc. of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies 142–150 (ACM, 2011).
29. Lu, L., Jin, P., Pang, G., Zhang, Z. & Karniadakis, G. E. Learning nonlinear operators via deeponet based on the universal approximation theorem of operators. Nat. Mach. Intell. 3, 218–229 (2021).
30. Karniadakis, G. E. et al. Physics-informed machine learning. Nat. Rev. Phys. 3, 422–440 (2021).
31. Wang, S., Wang, H. & Perdikaris, P. Learning the solution operator of parametric partial differential equations with physics-informed deeponets. Sci. Adv. 7, eabi8605 (2021).
32. Rezende, D. & Mohamed, S. Variational inference with normalizing flows. In Proc. of International Conference on Machine Learning (Eds. Bach, F. & Blei, D.) 1530–1538 (PMLR, 2015).
33. Gu, A., Goel, K. & Re, C. Efficiently modeling long sequences with structured state spaces. In Proc. of International Conference on Learning Representations (2022). https://openreview.net/forum?id=uYLFoz1vlAC
34. Hasani, R. et al. Liquid structural state-space models. Preprint at https://arxiv.org/abs/2209.12951 (2022).
35. Grunbacher, S. et al. On the verification of neural ODEs with stochastic guarantees. Proc. AAAI Conf. Artif. Intell. 35, 11525–11535 (2021).
36. Vaswani, A. et al. Attention is all you need. In Proc. of Advances in Neural Information Processing Systems (Eds. Guyon, I. et al.) 5998–6008 (NeurIPS, 2017).
37. Lechner, M., Hasani, R., Grosu, R., Rus, D. & Henzinger, T. A. Adversarial training is not ready for robot learning. In 2021 IEEE International Conference on Robotics and Automation (ICRA) 4140–4147 (IEEE, 2021).
38. Brunnbauer, A. et al. Latent imagination facilitates zero-shot transfer in autonomous racing. In 2022 International Conference on Robotics and Automation (ICRA) 7513–7520 (IEEE, 2021).
39. Hasani, R. M., Haerle, D. & Grosu, R. Efficient modeling of complex analog integrated circuits using neural networks. In Proc. of 12th Conference on Ph.D. Research in Microelectronics and Electronics 1–4 (IEEE, 2016).
40. Wang, G., Ledwoch, A., Hasani, R. M., Grosu, R. & Brintrup, A. A generative neural network model for the quality prediction of work in progress products. Appl. Soft Comput. 85, 105683 (2019).
41. DelPreto, J. et al. Plug-and-play supervisory control using muscle and brain signals for real-time gesture and error detection. Auton. Robots 44, 1303–1322 (2020).
42. Hasani, R. Interpretable Recurrent Neural Networks in Continuous-Time Control Environments. PhD dissertation, Technische Univ. Wien (2020).
43. Rudin, W. Principles of Mathematical Analysis, 3rd edn. (McGraw-Hill, 1976).
44. Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
45. Chung, J., Gulcehre, C., Cho, K. & Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. Preprint at https://arxiv.org/abs/1412.3555 (2014).
46. Shukla, S. N. & Marlin, B. Interpolation–prediction networks for irregularly sampled time series. In Proc. of International Conference on Learning Representations (2018). https://openreview.net/forum?id=r1efr3C9Ym
47. Horn, M., Moor, M., Bock, C., Rieck, B. & Borgwardt, K. Set functions for time series. In Proc. of International Conference on Machine Learning (Eds. Daumé III, H. & Singh, A.) 4353–4363 (PMLR, 2020).
48. Funahashi, K.-i & Nakamura, Y. Approximation of dynamical systems by continuous time recurrent neural networks. Neural Netw. 6, 801–806 (1993).
49. Mozer, M. C., Kazakov, D. & Lindsey, R. V. Discrete event, continuous time RNNs. Preprint at https://arxiv.org/abs/1710.04110 (2017).

50. Mei, H. & Eisner, J. The neural Hawkes process: a neurally self-modulating multivariate point process. In Proc. of 31st International Conference on Neural Information Processing Systems (Eds. Guyon, I. et al.) 6757–6767 (Curran Associates Inc., 2017).
51. Che, Z., Purushotham, S., Cho, K., Sontag, D. & Liu, Y. Recurrent neural networks for multivariate time series with missing values. Sci. Rep. 8, 1–12 (2018).
52. Neil, D., Pfeiffer, M. & Liu, S.-C. Phased LSTM: accelerating recurrent network training for long or event-based sequences. In Proc. of 30th International Conference on Neural Information Processing Systems (Eds. Lee, D. D. et al.) 3889–3897 (Curran Associates Inc., 2016).
53. Schuster, M. & Paliwal, K. K. Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 45, 2673–2681 (1997).
54. Voelker, A. R., Kajić, I. & Eliasmith, C. Legendre memory units: continuous-time representation in recurrent neural networks. In Proceedings of the 33rd International Conference on Neural Information Processing Systems (Eds. Wallach, H. et al.) 15570–15579 (ACM, 2019).
55. Gu, A., Dao, T., Ermon, S., Rudra, A. & Ré, C. Hippo: recurrent memory with optimal polynomial projections. In Proc. of Advances in Neural Information Processing Systems (Eds. Larochelle, H. et al.) 1474–1487 (NeurIPS, 2020).
56. Lezcano-Casado, M. & Martınez-Rubio, D. Cheap orthogonal constraints in neural networks: a simple parametrization of the orthogonal and unitary group. In Proc. of International Conference on Machine Learning (Eds. Chaudhuri, K. & Salakhutdinov, R.) 3794–3803 (PMLR, 2019).
57. Rusch, T. K. & Mishra, S. Coupled oscillatory recurrent neural network (coRNN): an accurate and (gradient) stable architecture for learning long time dependencies. In Proc. of International Conference on Learning Representations (2021). https://openreview.net/forum?id=F3s69XzWOia
58. Erichson, N. B., Azencot, O., Queiruga, A., Hodgkinson, L. & Mahoney, M. W. Lipschitz recurrent neural networks. In Proc. of International Conference on Learning Representations (2021). https://openreview.net/forum?id=-N7PBXqOUJZ
59. Brockman, G. et al. OpenAI gym. Preprint at https://arxiv.org/abs/1606.01540 (2016).
60. Lechner, M., Hasani, R., Zimmer, M., Henzinger, T. A. & Grosu, R. Designing worm-inspired neural networks for interpretable robotic control. In Proc. of International Conference on Robotics and Automation 87–94 (IEEE, 2019).
61. Tylkin, P. et al. Interpretable autonomous flight via compact visualizable neural circuit policies. IEEE Robot. Autom. Lett. 7, 3265–3272 (2022).
62. Amini, A. et al. Vista 2.0: An open, data-driven simulator for multimodal sensing and policy learning for autonomous vehicles. In 2022 International Conference on Robotics and Automation (ICRA) 2419–2426 (IEEE, 2022).
63. Amini, A. et al. Learning robust control policies for end-to-end autonomous driving from data-driven simulation. IEEE Robot. Autom. Lett. 5, 1143–1150 (2020).
64. Levine, S. & Koltun, V. Guided policy search. In Proc. of International Conference on Machine Learning (Eds. Dasgupta, S. & McAllester, D.) 1–9 (PMLR, 2013).
65. Bojarski, M. et al. VisualBackProp: efficient visualization of CNNs for autonomous driving. In Proc. of IEEE International Conference on Robotics and Automation 1–8 (IEEE, 2018).
66. Zhang, H., Wang, Z. & Liu, D. A comprehensive review of stability analysis of continuous-time recurrent neural networks. IEEE Trans. Neural Netw. Learn. Syst 25, 1229–1262 (2014).
67. Weinan, E. A proposal on machine learning via dynamical systems. Commun. Math. Stat. 5, 1–11 (2017).
68. Lu, Z., Pu, H., Wang, F., Hu, Z. & Wang, L. The expressive power of neural networks: a view from the width. In Proc. of Advances in Neural Information Processing Systems (Eds. Guyon, I. et al.) 30 (Curran Associates, Inc 2017).
69. Li, Q., Chen, L., Tai, C. et al. Maximum principle based algorithms for deep learning. J. Mach. Learn. Res. 18, 5998–6026 (2018).
70. Cohen, M. A. & Grossberg, S. Absolute stability of global pattern formation and parallel memory storage by competitive neural networks. IEEE Trans. Syst. Man Cybern. 5, 815–826 (1983).
71. Mathieu, E. & Nickel, M. Riemannian continuous normalizing flows. In Proc. of Advances in Neural Information Processing Systems Vol. 33 (eds Larochelle et al.) 2503–2515 (Curran Associates, Inc., 2020).
72. Hodgkinson, L., van der Heide, C., Roosta, F. & Mahoney, M. W. Stochastic normalizing flows. In Proc. of Advances in Neural Information Processing Systems (Eds. Larochelle, H. et al.) 5933–5944 (NeurIPS, 2020).
73. Haber, E., Lensink, K., Treister, E. & Ruthotto, L. IMEXnet a forward stable deep neural network. In Proc. of International Conference on Machine Learning (Eds. Chaudhuri, K. & Salakhutdinov, R.) 2525–2534 (PMLR, 2019).
74. Chang, B., Chen, M., Haber, E. & Chi, E. H. AntisymmetricRNN: a dynamical system view on recurrent neural networks. In International Conference on Learning Representations (2018). https://openreview.net/forum?id=ryxepo0cFX
75. Lechner, M., Hasani, R., Rus, D. & Grosu, R. Gershgorin loss stabilizes the recurrent neural network compartment of an end-to-end robot learning scheme. In Proc. of IEEE International Conference on Robotics and Automation 5446–5452 (IEEE, 2020).
76. Gleeson, P., Lung, D., Grosu, R., Hasani, R. & Larson, S. D. c302: a multiscale framework for modelling the nervous system of Caenorhabditis elegans. Philos. Trans. R. Soc. B 373, 20170379 (2018).
77. Li, X., Wong, T.-K. L., Chen, R. T. & Duvenaud, D. Scalable gradients for stochastic differential equations. In Proc. of International Conference on Artificial Intelligence and Statistics 3870–3882 (PMLR, 2020).
78. Shukla, S. N. & Marlin, B. M. Multi-time attention networks for irregularly sampled time series. In International Conference on Learning Representations (2020). https://openreview.net/forum?id=4c0J6lwQ4_
79. Xiong, Y. et al. Nyströmformer: a Nyström-based algorithm for approximating self-attention. In Proceedings of the AAAI Conference on Artificial Intelligence Vol. 35, No. 16, pp. 14138–14148 (AAAI, 2021).

Acknowledgements
This research was supported in part by the AI2050 program at Schmidt Futures (grant G-22-63172), the Boeing Company, and the United States Air Force Research Laboratory and the United States Air Force Artificial Intelligence Accelerator and was accomplished under cooperative agreement number FA8750-19-2-1000. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the United States Air Force or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes, notwithstanding any copyright notation herein. This work was further supported by The Boeing Company and Office of Naval Research grant N00014-18-1-2830. M.T. is supported by the Poul Due Jensen Foundation, grant 883901. M.L. was supported in part by the Austrian Science Fund under grant Z211-N23 (Wittgenstein Award). A.A. was supported by the National Science Foundation Graduate Research Fellowship Program. We thank T.-H. Wang, P. Kao, M. Chahine, W. Xiao, X. Li, L. Yin and Y. Ben for useful suggestions and for testing of CfC models to confirm the results across other domains.

Author contributions
R.H. and M.L. conceptualized, proved theory, designed, performed research and analysed data. A.A. contributed to designing research, data curation, research implementation, new analytical tools and analysed data. L.L. and A.R. contributed to the refinement of the theory and research implementation. M.T. and G.T. proved theory and analysed correctness. D.R. helped with the design of the research, and guided and supervised the work. All authors wrote the paper.

Competing interests
The authors declare no competing interest.

Additional information
Extended data is available for this paper at https://doi.org/10.1038/s42256-022-00556-7.

Correspondence and requests for materials should be addressed to Ramin Hasani.

Peer review information Nature Machine Intelligence thanks Karl Friston and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Reprints and permissions information is available at www.nature.com/reprints.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2022, corrected publication 2022

Extended Data Fig. 1 | Instantiation of LTCs in ODE and closed-form representations. a) A sample LTC network with two nodes and five synapses. b) the ODE
representation of this two-neuron system. c) the approximate closed-form representation of the network.

Extended Data Fig. 2 | Closed-form Continuous-depth neural architecture. A backbone neural network layer delivers the input signals into three head networks g, f
and h. f acts as a liquid time-constant for the sigmoidal time-gates of the network. g and h construct the nonlinearities of the overall CfC network.
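A minimal PyTorch sketch of one way to read this figure: a shared backbone feeds the three head networks g, f and h, and f drives a sigmoidal time gate that interpolates between the g and h nonlinearities. Layer sizes, activation choices and the exact form of the gate are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class CfCLikeCell(nn.Module):
    """Illustrative closed-form-style cell: backbone plus heads g, f and h.

    The gate sigma(-f(z) * t) blends g against h, with f playing the role of
    a liquid time constant for the sigmoidal time gate (see figure caption).
    """

    def __init__(self, in_dim: int, hidden_dim: int):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim + hidden_dim, hidden_dim), nn.Tanh())
        self.g = nn.Linear(hidden_dim, hidden_dim)
        self.f = nn.Linear(hidden_dim, hidden_dim)
        self.h = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, x, state, t):
        # x: (batch, in_dim), state: (batch, hidden_dim), t: (batch, 1) elapsed time
        z = self.backbone(torch.cat([x, state], dim=-1))
        gate = torch.sigmoid(-self.f(z) * t)  # sigmoidal time gate driven by head f
        return gate * torch.tanh(self.g(z)) + (1.0 - gate) * torch.tanh(self.h(z))

cell = CfCLikeCell(in_dim=4, hidden_dim=16)
state = torch.zeros(2, 16)
out = cell(torch.randn(2, 4), state, torch.full((2, 1), 0.1))  # one update step
print(out.shape)  # torch.Size([2, 16])
```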

Extended Data Fig. 3 | Hyperparameters for Human activity and Walker. List of hyperparameters used to obtain results in Human activity and Walker2D
Experiments.

Extended Data Fig. 4 | Hyperparameters for ET-sMNIST and Bit-stream XOR. List of hyperparameters used to obtain results in Event-based MNIST and Bit-stream
XOR Experiments.

Extended Data Fig. 5 | Bit-stream XOR sequence classification. The performance values (accuracy %) for all baseline models are reproduced from9. Numbers present
mean ± standard deviations, (n=5). Note: The performance of models marked by † are reported from9. Bold declares highest accuracy and best time per epoch (min).

Extended Data Fig. 6 | PhysioNet. AUC stands for area under curve. Numbers present mean ± standard deviations, (n=5). Note: The performance of the models marked by † are reported from7 and the ones with * from78. Bold declares highest AUC score and best time per epoch (min).

Extended Data Fig. 7 | Hyperparameters for Physionet and IMDB. List of hyperparameters used to obtain results in Physionet and IMDB sentiment classification
experiments.

Extended Data Fig. 8 | Results on the IMDB datasets. The experiment is performed without any pretraining or pretrained word-embeddings. Thus, we excluded advanced attention-based models78,79 such as Transformers36 and RNN structures that use pretraining. Numbers present mean ± standard deviations, (n=5). Note: The performance of the models marked by † are reported from55, and * are reported from57. The n/a standard deviation denotes that the original report of these experiments did not provide the statistics of their analysis. Bold declares highest accuracy and best time per epoch (min).

Extended Data Fig. 9 | Lane-keeping models’ parameter count. CfC and NCP networks perform lane-keeping in unseen scenarios with a compact representation.

Extended Data Fig. 10 | Attention Profile of networks. Trained networks receive unseen inputs (first column in each tab) and generate acceleration and steering commands. We use the Visual-Backprop algorithm65 to compute the saliency maps of the convolutional part of each network. a) results for networks tested on data collected in summer. b) results for networks tested on data collected in winter. c) results for inputs corrupted by a zero-mean Gaussian noise with variance, σ² = 0.35.
