Research article
DOI: 10.1145/3319619.3326843

Global structure of policy search spaces for reinforcement learning

Published: 13 July 2019

Abstract

Reinforcement learning is gaining prominence in the machine learning community. Its roots go back over three decades in areas such as cybernetics and psychology, but it has more recently been applied widely in robotics, game playing and control systems. There are many approaches to reinforcement learning, most of which are based on the Markov decision process model. The goal of reinforcement learning is to learn the best strategy (referred to as a policy) for an agent interacting with its environment in order to reach a specified goal. Recently, evolutionary computation has been shown to benefit reinforcement learning in some limited scenarios. Many studies have shown that the performance of evolutionary computation algorithms is influenced by the structure of the fitness landscape of the problem being optimised. In this paper we investigate the global structure of the policy search spaces of simple reinforcement learning problems. The aim is to highlight structural characteristics that could influence the performance of evolutionary algorithms in a reinforcement learning context. Results indicate that the problems we investigated are characterised by enormous plateaus that form unimodal structures, resulting in a kind of needle-in-a-haystack global structure.
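To make the plateau structure concrete, the sketch below enumerates every deterministic policy of a tiny gridworld and histograms their fitness values. It is a minimal illustration written for this page, not code from the paper: the 3x3 layout, the goal reward, and the ten-step episode cap are all assumptions chosen for the example.

```python
import itertools
from collections import Counter

# Illustrative 3x3 gridworld: layout, reward scheme and episode cap are
# assumptions made for this sketch, not the benchmarks studied in the paper.
SIZE = 3
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
STATES = [(r, c) for r in range(SIZE) for c in range(SIZE)]
START, GOAL = (0, 0), (2, 2)
STEP_CAP = 10                                   # episode length limit

def step(state, action):
    """Deterministic transition; moves off the grid leave the state unchanged."""
    r, c = state[0] + action[0], state[1] + action[1]
    return (r, c) if 0 <= r < SIZE and 0 <= c < SIZE else state

def fitness(policy):
    """Episodic return: STEP_CAP - t if the goal is reached at step t, else 0."""
    state = START
    for t in range(STEP_CAP + 1):
        if state == GOAL:
            return STEP_CAP - t                 # fewer steps -> higher fitness
        state = step(state, ACTIONS[policy[STATES.index(state)]])
    return 0                                    # goal never reached: the plateau

# Exhaustively enumerate all 4^9 = 262,144 deterministic policies
# (one action index per state) and histogram their fitness values.
counts = Counter(fitness(p) for p in itertools.product(range(4), repeat=len(STATES)))

total = 4 ** len(STATES)
for value in sorted(counts):
    print(f"fitness {value:2d}: {counts[value]:7d} policies "
          f"({100 * counts[value] / total:5.2f}%)")
```

Because a policy's fitness depends only on the states its trajectory actually visits, policies that differ only in unvisited states are fitness-neutral. On this toy task the vast majority of the 262,144 policies collapse onto the zero-fitness plateau, while only a small fraction attain the maximum of 6, a miniature version of the needle-in-a-haystack global structure described above.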


Cited By

  • (2021) A Survey of Advances in Landscape Analysis for Optimisation. Algorithms 14(2), 40. Online publication date: 28-Jan-2021. DOI: 10.3390/a14020040
  • (2020) Fitness Landscape Features and Reward Shaping in Reinforcement Learning Policy Spaces. In Parallel Problem Solving from Nature - PPSN XVI, 500-514. Online publication date: 2-Sep-2020. DOI: 10.1007/978-3-030-58115-2_35

Published In

GECCO '19: Proceedings of the Genetic and Evolutionary Computation Conference Companion
July 2019
2161 pages
ISBN: 9781450367486
DOI: 10.1145/3319619

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. fitness landscapes
  2. local optima networks
  3. reinforcement learning

Conference

GECCO '19: Genetic and Evolutionary Computation Conference
July 13-17, 2019
Prague, Czech Republic

Acceptance Rates

Overall Acceptance Rate 1,669 of 4,410 submissions, 38%
