TRAMS: Training-free Memory Selection for Long-range Language Modeling

Haofei Yu, Cunxiang Wang, Yue Zhang, Wei Bi


Abstract
The Transformer architecture is crucial for numerous AI models, but it still faces challenges in long-range language modeling. Although several Transformer variants have been designed to tackle long-range dependencies, existing methods such as Transformer-XL are plagued by a high percentage of ineffective memories. In this study, we present a plug-and-play strategy, known as TRAining-free Memory Selection (TRAMS), that selects tokens participating in the attention calculation based on a simple metric. This strategy allows us to keep tokens that are likely to have a high attention score with the current queries and ignore the others. We have tested our approach on a word-level benchmark (WikiText-103) and a character-level benchmark (enwik8), and the results show an improvement without any additional training or added parameters.
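
The abstract describes selecting memory tokens with a query-independent score before attention. Below is a minimal, hedged sketch of that idea in PyTorch; the function names and the use of the key-vector norm as the selection metric are illustrative assumptions and not necessarily the exact metric used in the paper.

```python
import torch
import torch.nn.functional as F

def select_memory(mem_k, mem_v, m):
    """Keep the m cached memory tokens with the largest key norm.

    The key norm is a stand-in query-independent importance score;
    the paper's actual metric may differ (assumption for illustration).
    """
    scores = mem_k.norm(dim=-1)                      # (mem_len,)
    top = scores.topk(min(m, scores.numel())).indices
    return mem_k[top], mem_v[top]

def attention_with_selected_memory(q, k, v, mem_k, mem_v, m):
    """Scaled dot-product attention over the current context plus a
    selected subset of cached memory; no extra training or parameters.
    (Causal masking omitted for brevity.)"""
    sel_k, sel_v = select_memory(mem_k, mem_v, m)
    keys = torch.cat([sel_k, k], dim=0)              # (m + ctx_len, d)
    vals = torch.cat([sel_v, v], dim=0)
    attn = F.softmax(q @ keys.T / keys.size(-1) ** 0.5, dim=-1)
    return attn @ vals                               # (ctx_len, d)

# Toy usage: a 4-token context attends over 16 cached memory tokens, keeping 8.
d = 64
q, k, v = (torch.randn(4, d) for _ in range(3))
mem_k, mem_v = torch.randn(16, d), torch.randn(16, d)
out = attention_with_selected_memory(q, k, v, mem_k, mem_v, m=8)
print(out.shape)  # torch.Size([4, 64])
```
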
Anthology ID:
2023.findings-emnlp.331
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
4966–4972
URL:
https://aclanthology.org/2023.findings-emnlp.331/
DOI:
10.18653/v1/2023.findings-emnlp.331
Cite (ACL):
Haofei Yu, Cunxiang Wang, Yue Zhang, and Wei Bi. 2023. TRAMS: Training-free Memory Selection for Long-range Language Modeling. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 4966–4972, Singapore. Association for Computational Linguistics.
Cite (Informal):
TRAMS: Training-free Memory Selection for Long-range Language Modeling (Yu et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-emnlp.331.pdf
