Integrating Mamba and Transformer For Long-Short Range Time Series Forecasting
Xiongxiao Xu1, Yueqing Liang1, Baixiang Huang1, Zhiling Lan2, Kai Shu1
1 Department of Computer Science, Illinois Institute of Technology, Chicago, IL, USA
2 Department of Computer Science, University of Illinois Chicago, Chicago, IL, USA
{xxu85,yliang40,bhuang15}@hawk.iit.edu,zlan@uic.edu,kshu@iit.edu
ABSTRACT
Time series forecasting is an important problem and plays a key role in a variety of applications including weather forecasting, stock market, and scientific simulations. Although Transformers have proven effective in capturing dependencies, the quadratic complexity of the attention mechanism prevents their further adoption in long-range time series forecasting, thus limiting them to short-range dependencies. Recent progress on State Space Models (SSMs) has shown impressive performance in modeling long-range dependencies due to their subquadratic complexity. Mamba, as a representative SSM, enjoys linear-time complexity and has achieved strong scalability on tasks that require scaling to long sequences, such as language, audio, and genomics. In this paper, we propose to leverage a hybrid framework, Mambaformer, that internally combines Mamba for long-range dependencies and Transformer for short-range dependencies for long-short range forecasting. To the best of our knowledge, this is the first paper to combine the Mamba and Transformer architectures for time series data. We investigate possible hybrid architectures that combine the Mamba layer and the attention layer for long-short range time series forecasting. The comparative study shows that the Mambaformer family can outperform Mamba and Transformer on the long-short range time series forecasting problem. The code is available at https://github.com/XiongxiaoXu/Mambaformer-in-Time-Series.

KEYWORDS
Mamba, Transformer, Time Series Forecasting
1 INTRODUCTION
Time series forecasting is an important problem and has been widely used in many real-world scenarios, including weather forecasting [1], the stock market [27], and scientific simulations [38]. For example, in scientific simulation, researchers are interested in building a surrogate model on top of a machine learning model to forecast the behavior of a supercomputer across timescales, thereby accelerating simulations and bypassing billions or even trillions of events [9].

Deep learning models, especially Transformer-based models, have achieved progress in time series forecasting. Benefiting from the attention mechanism, Transformers can effectively depict pairwise dependencies in time series data. However, recent research [41] has questioned the validity of Transformer-based forecasters by comparing them with a linear model. Although the effectiveness of Transformer-based models has been confirmed in later work [20, 22], the quadratic complexity of the attention mechanism is still computationally challenging. When inferring the next token, the Transformer has to find relationships with all past tokens in the sequence, which, albeit effective, is costly for long sequences.

An emerging body of research suggests that State Space Models (SSMs) [11–14, 33] have shown promising progress in sequence modeling. As a representative SSM, Mamba achieves performance comparable to the Transformer in language modeling while enjoying linear-time complexity. On the performance side, Mamba introduces a selective mechanism to remember relevant information and filter out irrelevant information indefinitely. On the computation side, Mamba implements a hardware-aware algorithm for parallel training like a CNN and can be regarded as an RNN for linear-time inference. Considering the above two advantages, Mamba is exceptional in handling long-range time series data.

Recent findings show that SSMs and Transformers are complementary for language modeling [10, 17, 23]. We are interested in whether this observation also holds for time series data. In this work, we propose to leverage a hybrid architecture, Mambaformer [23], that internally integrates the strengths of Transformer and Mamba for long-short range time series forecasting. The comparative experiments demonstrate that the Mambaformer family can integrate the advantages of Mamba and Transformer, thus facilitating time series forecasting. To summarize, our contributions are as follows:
• We are the first to explore the potential of integrating Mamba and Transformer in time series.
• We propose to adopt a hybrid architecture, Mambaformer, to capture long-short range dependencies in time series.
• We conduct a comparative study to demonstrate the superiority of the Mambaformer family compared with Mamba and Transformer in long-short range time series forecasting.

2 RELATED WORK
2.1 Time Series Forecasting
Time series forecasting research has a long history. Earlier researchers leveraged statistical and traditional machine learning methods, such as ARIMA [3], simple neural networks [6], and support vector machines [15], to forecast road traffic. However, these approaches are relatively weak due to their oversimplified assumptions and limited representation capabilities. Although more expressive deep learning models, including RNNs [8] and LSTMs [40], have been utilized to model time series data, they suffer from the gradient vanishing problem [30] when dealing with long-range sequences. Inspired by the success of Transformer [31] models on text data, a variety of Transformer variants [16, 19, 20, 22, 35, 36, 42, 43] have proven effective on time series data. For example, the latest iTransformer [20], which simply applies the attention and feed-forward network on the inverted dimensions, achieves SOTA performance.
Additionally, recent work [2, 24, 28, 34] based on SSMs proposes to leverage Mamba for time series forecasting. For instance, TimeMachine [2] utilizes four Mamba blocks to capture long-range dependencies in multivariate time series data. Different from the previous work, our paper makes the first attempt to combine Transformer and Mamba for time series forecasting.

2.2 State Space Models and Mamba
State Space Models (SSMs) [11–14, 33] emerge as a promising class of architectures for sequence modeling. S4 is a structured SSM where the specialized HiPPO [12] structure is imposed on the matrix A to capture long-range dependencies. Building upon S4, Mamba [11] designs a selective mechanism to filter out irrelevant information and a hardware-aware algorithm for efficient implementation. Benefiting from these designs, Mamba has achieved impressive performance across modalities such as language, audio, and genomics while requiring only linear complexity in the sequence length, making it a potential alternative to the Transformer. Benefiting from its modeling capability and scalability, Mamba has recently shown significant progress in various communities, such as computer vision [29, 44], medicine [21, 37], graphs [4, 32], and recommendation [18, 39]. A noteworthy line of research combines the Transformer and Mamba for the purpose of language modeling [10, 17, 23]. A comparative study [23] shows that Mambaformer is effective in in-context learning tasks. Jamba [17] is the first production-grade attention-SSM hybrid model, with 12B active and 52B total available parameters, and shows desirable performance for long contexts. We are interested in whether the observation is consistent in time series data and propose to adapt Mambaformer for time series forecasting.

Figure 1: The overview of the Mambaformer. (From inputs to outputs, the model consists of an embedding layer with token encoding and temporal encoding, a Mamba pre-processing block, L× Mambaformer layers combining masked multi-head attention and a Mamba block, each followed by Add & Norm, and a forecasting layer; each Mamba block comprises linear projections, a convolution, σ-gated activations, and an SSM.)

In the long-short range time series forecasting problem, given historical time series samples with a look-back window ℒ = (x_1, x_2, ..., x_L) of length L, where each x_t ∈ R^M at time step t has M variates, we aim to forecast F future values ℱ = (x_{L+1}, x_{L+2}, ..., x_{L+F}) of length F. Besides, the associated temporal context information (c_1, c_2, ..., c_L) with dimension C is assumed to be known [16], e.g., day-of-the-week and hour-of-the-day. Note that this work is under the rolling forecasting setting [42], where upon the completion of a forecast for ℱ, the look-back window ℒ moves forward F steps towards the future so that the model can make the next forecast.
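To make the rolling forecasting setting concrete, below is a minimal NumPy sketch of how look-back/target windows could be generated. It is not the paper's data pipeline; the `make_rolling_windows` helper and its defaults (which mirror the L = 196 look-back and a 96-step horizon used in the comparative study) are illustrative assumptions.

```python
import numpy as np

def make_rolling_windows(series: np.ndarray, L: int = 196, F: int = 96):
    """Yield (look-back, target) pairs under the rolling forecasting setting.

    series: array of shape (T, M) -- T time steps, M variates.
    L: look-back window length; F: forecasting length (horizon).
    After each forecast, the look-back window rolls forward by F steps.
    """
    T = series.shape[0]
    start = 0
    while start + L + F <= T:
        lookback = series[start : start + L]        # (L, M) model input
        target = series[start + L : start + L + F]  # (F, M) values to forecast
        yield lookback, target
        start += F                                  # roll forward by the horizon

# Toy usage: a multivariate series with M = 7 variates.
toy = np.random.randn(5000, 7)
for x_window, y_window in make_rolling_windows(toy):
    pass  # x_window -> model -> prediction, evaluated against y_window
```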
An SSM maps an input signal x(t) to an output y(t) through a latent state h(t) ∈ R^N:

h′(t) = A h(t) + B x(t),   y(t) = C h(t)   (1)

where A ∈ R^{N×N}, B ∈ R^{N×1}, and C ∈ R^{1×N} are learnable matrices. The SSM can be discretized from a continuous signal into discrete sequences by a step size Δ. The discretized version is as follows:

h_t = Ā h_{t−1} + B̄ x_t,   y_t = C h_t   (2)

where the discrete parameters (Ā, B̄) can be obtained from the continuous parameters (Δ, A, B) through a discretization rule, such as the zero-order hold (ZOH) rule Ā = exp(ΔA), B̄ = (ΔA)^{−1}(exp(ΔA) − I) · ΔB. After discretization, the model can be computed in two ways: either as a linear recurrence for inference, as shown in Equation 2, or as a global convolution for training, as in Equation 3:

K̄ = (C B̄, C Ā B̄, ..., C Ā^k B̄, ...),   y = x ∗ K̄   (3)

where K̄ is a convolutional kernel.
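The duality between Equations 2 and 3 can be illustrated with a minimal NumPy sketch for a single-input, single-output SSM. This is not Mamba itself: the selective (input-dependent) parameters and the hardware-aware parallel scan are omitted, and the helper names (`discretize_zoh`, `ssm_recurrence`, `ssm_convolution`) are ours.

```python
import numpy as np
from scipy.linalg import expm

def discretize_zoh(A, B, delta):
    """Zero-order hold (ZOH): A_bar = exp(dA), B_bar = (dA)^{-1}(exp(dA) - I) dB."""
    dA = delta * A
    A_bar = expm(dA)
    B_bar = np.linalg.inv(dA) @ (A_bar - np.eye(A.shape[0])) @ (delta * B)
    return A_bar, B_bar

def ssm_recurrence(A_bar, B_bar, C, x):
    """Linear recurrence (Equation 2): h_t = A_bar h_{t-1} + B_bar x_t, y_t = C h_t."""
    h = np.zeros((A_bar.shape[0], 1))
    ys = []
    for x_t in x:
        h = A_bar @ h + B_bar * x_t
        ys.append((C @ h).item())
    return np.array(ys)

def ssm_convolution(A_bar, B_bar, C, x):
    """Global convolution (Equation 3): y = x * K_bar with K_bar_k = C A_bar^k B_bar."""
    T = len(x)
    K = np.array([(C @ np.linalg.matrix_power(A_bar, k) @ B_bar).item() for k in range(T)])
    return np.convolve(x, K)[:T]  # causal convolution: y_t = sum_{k<=t} K_k x_{t-k}

# The two views agree, which is what allows CNN-style parallel training
# and RNN-style linear-time inference with the same parameters.
rng = np.random.default_rng(0)
N, T = 4, 32
A = -np.eye(N) + 0.1 * rng.standard_normal((N, N))
B = rng.standard_normal((N, 1))
C = rng.standard_normal((1, N))
A_bar, B_bar = discretize_zoh(A, B, delta=0.1)
x = rng.standard_normal(T)
assert np.allclose(ssm_recurrence(A_bar, B_bar, C, x),
                   ssm_convolution(A_bar, B_bar, C, x))
```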
Figure 2: The structures of the Mambaformer family, Mamba, and Transformer: (a) Mambaformer, (b) Attention-Mamba Hybrid, (c) Mamba-Attention Hybrid, (d) Mamba, (e) Transformer. For illustration, we ignore the residual connections and layer normalization associated with the Mamba layer, attention layer, and feed-forward layer in the figure.
Table 2: Multivariate time series forecasting results of the comparative study. The values are averaged over multiple forecasting lengths F ∈ {96, 192, 336, 720}, where 96 and 192 correspond to short-range forecasting, and 336 and 720 correspond to long-range forecasting. The length of the look-back window is fixed at L = 196. The best results are in bold and the second best results are underlined.

• Mamba-Attention Hybrid adopts a Mamba-Attention layer where a Mamba block is followed by an attention layer, without positional encoding.

The other models in Figure 2 are as follows:
• Mamba leverages two Mamba blocks as a layer.
• Transformer is a decoder-only Transformer architecture.
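To make these hybrid layer structures concrete, here is a minimal PyTorch-style sketch of a Mambaformer-like layer that applies masked self-attention followed by a Mamba block, each with Add & Norm as in Figure 1. It is a sketch under stated assumptions rather than the authors' implementation: `MambaBlockStub` merely stands in for a real Mamba block (e.g., the `Mamba` module from the `mamba_ssm` package), and the module names, dimensions, and wiring are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MambaBlockStub(nn.Module):
    """Stand-in for a real Mamba block (e.g., mamba_ssm.Mamba); a gated causal
    convolution is used here only so the sketch runs without Mamba's CUDA kernels."""
    def __init__(self, d_model: int):
        super().__init__()
        self.in_proj = nn.Linear(d_model, 2 * d_model)
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=3, padding=2, groups=d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                              # x: (batch, seq_len, d_model)
        z, gate = self.in_proj(x).chunk(2, dim=-1)
        z = self.conv(z.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)  # causal conv
        return self.out_proj(F.silu(z) * F.silu(gate))  # placeholder for SSM + gating

class MambaformerLayer(nn.Module):
    """One hybrid layer: masked self-attention (short-range dependencies) followed
    by a Mamba block (long-range dependencies), each with residual Add & Norm."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mamba = MambaBlockStub(d_model)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                              # x: (batch, seq_len, d_model)
        L = x.size(1)
        causal = torch.triu(torch.ones(L, L, dtype=torch.bool, device=x.device), diagonal=1)
        attn_out, _ = self.attn(x, x, x, attn_mask=causal)
        x = self.norm1(x + attn_out)                   # Add & Norm after attention
        x = self.norm2(x + self.mamba(x))              # Add & Norm after Mamba block
        return x

# Toy usage: a 196-step look-back window embedded into d_model = 64.
x = torch.randn(2, 196, 64)
print(MambaformerLayer(d_model=64)(x).shape)           # torch.Size([2, 196, 64])
```

Stacking L such layers on top of an embedding layer (token plus temporal encoding) and a Mamba pre-processing block, and adding a forecasting head, yields the overall structure sketched in Figure 1.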