Research article
DOI: 10.1145/3319619.3326843

Global structure of policy search spaces for reinforcement learning

Published: 13 July 2019

Abstract

Reinforcement learning is gaining prominence in the machine learning community. Its roots go back over three decades in areas such as cybernetics and psychology, but it has more recently been applied widely in robotics, game playing and control systems. There are many approaches to reinforcement learning, most of which are based on the Markov decision process model. The goal of reinforcement learning is to learn the best strategy (referred to as a policy) for an agent interacting with its environment in order to reach a specified goal. Recently, evolutionary computation has been shown to benefit reinforcement learning in some limited scenarios. Many studies have shown that the performance of evolutionary computation algorithms is influenced by the structure of the fitness landscape of the problem being optimised. In this paper we investigate the global structure of the policy search spaces of simple reinforcement learning problems. The aim is to highlight structural characteristics that could influence the performance of evolutionary algorithms in a reinforcement learning context. Results indicate that the problems we investigated are characterised by enormous plateaus that form unimodal structures, resulting in a kind of needle-in-a-haystack global structure.
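To make the plateau structure concrete, the sketch below enumerates every deterministic policy of a tiny gridworld and histograms their fitness values. It is a minimal illustration written for this page, not code from the paper: the 3x3 layout, the goal reward, and the ten-step episode cap are all assumptions chosen for the example.

```python
import itertools
from collections import Counter

# Illustrative 3x3 gridworld: layout, reward scheme and episode cap are
# assumptions made for this sketch, not the benchmarks studied in the paper.
SIZE = 3
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
STATES = [(r, c) for r in range(SIZE) for c in range(SIZE)]
START, GOAL = (0, 0), (2, 2)
STEP_CAP = 10                                   # episode length limit

def step(state, action):
    """Deterministic transition; moves off the grid leave the state unchanged."""
    r, c = state[0] + action[0], state[1] + action[1]
    return (r, c) if 0 <= r < SIZE and 0 <= c < SIZE else state

def fitness(policy):
    """Episodic return: STEP_CAP - t if the goal is reached at step t, else 0."""
    state = START
    for t in range(STEP_CAP + 1):
        if state == GOAL:
            return STEP_CAP - t                 # fewer steps -> higher fitness
        state = step(state, ACTIONS[policy[STATES.index(state)]])
    return 0                                    # goal never reached: the plateau

# Exhaustively enumerate all 4^9 = 262,144 deterministic policies
# (one action index per state) and histogram their fitness values.
counts = Counter(fitness(p) for p in itertools.product(range(4), repeat=len(STATES)))

total = 4 ** len(STATES)
for value in sorted(counts):
    print(f"fitness {value:2d}: {counts[value]:7d} policies "
          f"({100 * counts[value] / total:5.2f}%)")
```

Because a policy's fitness depends only on the states its trajectory actually visits, policies that differ only in unvisited states are fitness-neutral. On this toy task the vast majority of the 262,144 policies collapse onto the zero-fitness plateau, while only a small fraction attain the maximum of 6, a miniature version of the needle-in-a-haystack global structure described above.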


Cited By

  • (2021) A Survey of Advances in Landscape Analysis for Optimisation. Algorithms 14(2), 40. Online publication date: 28-Jan-2021. DOI: 10.3390/a14020040
  • (2020) Fitness Landscape Features and Reward Shaping in Reinforcement Learning Policy Spaces. In Parallel Problem Solving from Nature - PPSN XVI, 500-514. Online publication date: 2-Sep-2020. DOI: 10.1007/978-3-030-58115-2_35

Published In

GECCO '19: Proceedings of the Genetic and Evolutionary Computation Conference Companion
July 2019
2161 pages
ISBN: 9781450367486
DOI: 10.1145/3319619

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. fitness landscapes
  2. local optima networks
  3. reinforcement learning

Conference

GECCO '19: Genetic and Evolutionary Computation Conference
July 13-17, 2019
Prague, Czech Republic

Acceptance Rates

Overall Acceptance Rate 1,669 of 4,410 submissions, 38%
