

LSTM: A Search Space Odyssey


Klaus Greff, Rupesh K. Srivastava, Jan Koutník, Bas R. Steunebrink, Jürgen Schmidhuber

Abstract—Several variants of the Long Short-Term Memory (LSTM) architecture for recurrent neural networks have been proposed since its inception in 1995. In recent years, these networks have become the state-of-the-art models for a variety of machine learning problems. This has led to a renewed interest in understanding the role and utility of various computational components of typical LSTM variants. In this paper, we present the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling. The hyperparameters of all LSTM variants for each task were optimized separately using random search, and their importance was assessed using the powerful fANOVA framework. In total, we summarize the results of 5400 experimental runs (≈ 15 years of CPU time), which makes our study the largest of its kind on LSTM networks. Our results show that none of the variants can improve upon the standard LSTM architecture significantly, and demonstrate the forget gate and the output activation function to be its most critical components. We further observe that the studied hyperparameters are virtually independent and derive guidelines for their efficient adjustment.

Index Terms—Recurrent neural networks, Long Short-Term Memory, LSTM, sequence learning, random search, fANOVA.

arXiv:1503.04069v2 [cs.NE] 4 Oct 2017. © 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. Manuscript received May 15, 2015; revised March 17, 2016; accepted June 9, 2016. Date of publication July 8, 2016; date of current version June 20, 2016. DOI: 10.1109/TNNLS.2016.2582924. This research was supported by the Swiss National Science Foundation grants "Theory and Practice of Reinforcement Learning 2" (#138219) and "Advanced Reinforcement Learning" (#156682), and by EU projects "NASCENCE" (FP7-ICT-317662), "NeuralDynamics" (FP7-ICT-270247), and "WAY" (FP7-ICT-288551). K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber are with the Istituto Dalle Molle di studi sull'Intelligenza Artificiale (IDSIA), the Scuola universitaria professionale della Svizzera italiana (SUPSI), and the Università della Svizzera italiana (USI). Author e-mail addresses: {klaus, rupesh, hkou, bas, juergen}@idsia.ch

I. INTRODUCTION

Recurrent neural networks with Long Short-Term Memory (which we will concisely refer to as LSTMs) have emerged as an effective and scalable model for several learning problems related to sequential data. Earlier methods for attacking these problems have either been tailored towards a specific problem or did not scale to long time dependencies. LSTMs, on the other hand, are both general and effective at capturing long-term temporal dependencies. They do not suffer from the optimization hurdles that plague simple recurrent networks (SRNs) [1, 2] and have been used to advance the state of the art for many difficult problems. This includes handwriting recognition [3–5] and generation [6], language modeling [7] and translation [8], acoustic modeling of speech [9], speech synthesis [10], protein secondary structure prediction [11], and analysis of audio [12] and video data [13], among others.

The central idea behind the LSTM architecture is a memory cell, which can maintain its state over time, and non-linear gating units, which regulate the information flow into and out of the cell. Most modern studies incorporate many improvements that have been made to the LSTM architecture since its original formulation [14, 15]. However, LSTMs are now applied to many learning problems which differ significantly in scale and nature from the problems that these improvements were initially tested on. A systematic study of the utility of the various computational components which comprise LSTMs (see Figure 1) was missing. This paper fills that gap and systematically addresses the open question of improving the LSTM architecture.

We evaluate the most popular LSTM architecture (vanilla LSTM; Section II) and eight different variants thereof on three benchmark problems: acoustic modeling, handwriting recognition, and polyphonic music modeling. Each variant differs from the vanilla LSTM by a single change. This allows us to isolate the effect of each of these changes on the performance of the architecture. Random search [16–18] is used to find the best-performing hyperparameters for each variant on each problem, enabling a reliable comparison of the performance of the different variants. We also provide insights gained about hyperparameters and their interaction using fANOVA [19].

II. VANILLA LSTM

The LSTM setup most commonly used in the literature was originally described by Graves and Schmidhuber [20]. We refer to it as vanilla LSTM and use it as a reference for comparison of all the variants. The vanilla LSTM incorporates changes by Gers et al. [21] and Gers and Schmidhuber [22] into the original LSTM [15] and uses full gradient training. Section III provides descriptions of these major LSTM changes.

A schematic of the vanilla LSTM block can be seen in Figure 1. It features three gates (input, forget, and output), the block input, a single cell (the Constant Error Carousel), an output activation function, and peephole connections.¹ The output of the block is recurrently connected back to the block input and to all of the gates.

¹ Some studies omit peephole connections, described in Section III-B.
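As a reading aid for the block description above, the forward pass of such a block can be written out. The following is a minimal sketch assuming the standard vanilla-LSTM-with-peepholes formulation, not a reproduction of the paper's own equations: x^t denotes the input at time t, y^t the block output, c^t the cell state, W, R, p, and b the input, recurrent, peephole, and bias parameters, σ the gate nonlinearity, and g and h the input and output activation functions.

\begin{align*}
\mathbf{z}^t &= g(\mathbf{W}_z \mathbf{x}^t + \mathbf{R}_z \mathbf{y}^{t-1} + \mathbf{b}_z) && \text{block input} \\
\mathbf{i}^t &= \sigma(\mathbf{W}_i \mathbf{x}^t + \mathbf{R}_i \mathbf{y}^{t-1} + \mathbf{p}_i \odot \mathbf{c}^{t-1} + \mathbf{b}_i) && \text{input gate} \\
\mathbf{f}^t &= \sigma(\mathbf{W}_f \mathbf{x}^t + \mathbf{R}_f \mathbf{y}^{t-1} + \mathbf{p}_f \odot \mathbf{c}^{t-1} + \mathbf{b}_f) && \text{forget gate} \\
\mathbf{c}^t &= \mathbf{z}^t \odot \mathbf{i}^t + \mathbf{c}^{t-1} \odot \mathbf{f}^t && \text{cell (Constant Error Carousel)} \\
\mathbf{o}^t &= \sigma(\mathbf{W}_o \mathbf{x}^t + \mathbf{R}_o \mathbf{y}^{t-1} + \mathbf{p}_o \odot \mathbf{c}^{t} + \mathbf{b}_o) && \text{output gate} \\
\mathbf{y}^t &= h(\mathbf{c}^t) \odot \mathbf{o}^t && \text{block output}
\end{align*}

In this sketch the input and forget gates peek at the previous cell state c^{t-1} while the output gate peeks at the updated state c^t, and the appearance of y^{t-1} in every equation reflects the statement above that the block output is recurrently connected back to the block input and all of the gates.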
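Looking back at the evaluation protocol in the Introduction (an independent random search over hyperparameters for each variant on each task, with the resulting trials analysed by fANOVA), the search loop itself is straightforward to picture. The sketch below is illustrative only: the hyperparameter names, ranges, trial count, and the train_and_evaluate callback are placeholders and do not reflect the search space actually used in the study.

import random

# Hypothetical search space -- the study's actual hyperparameters and
# ranges are defined later in the paper, not here.
def sample_hyperparameters():
    return {
        "hidden_size": random.choice([32, 64, 128, 256]),
        "learning_rate": 10 ** random.uniform(-6, -2),  # log-uniform draw
        "momentum": random.uniform(0.0, 0.99),
        "input_noise_std": random.uniform(0.0, 1.0),
    }

def random_search(train_and_evaluate, num_trials=200):
    """Draw independent hyperparameter samples, train once per sample,
    and keep every (error, hyperparameters) pair for later analysis."""
    trials = []
    for _ in range(num_trials):
        params = sample_hyperparameters()
        error = train_and_evaluate(params)  # validation error; lower is better
        trials.append((error, params))
    best = min(trials, key=lambda t: t[0])
    # All trials, not only the best one, would feed an fANOVA-style
    # analysis of hyperparameter importance.
    return best, trials

Because each draw is independent, the same loop can be run separately for every variant-task pair, which is what makes a like-for-like comparison of the variants possible.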
