Faculty of Engineering and Information Technology (FEIT), School of Computer Science, Australian Artificial Intelligence Institute,
University of Technology Sydney, Sydney, NSW, Australia
In formation control, a robot (or an agent) learns to arrange itself in a particular spatial alignment. However, in a few scenarios, it is also vital to learn temporal alignment along with spatial alignment. An effective control system encompasses flexibility, precision, and timeliness. Existing reinforcement learning algorithms excel at learning to select an action given a state. However, executing an optimal action at an appropriate time remains challenging. Building a reinforcement learning agent that can learn an optimal time to act along with an optimal action can address this challenge. Neural networks in which timing relies on dynamic changes in the activity of a population of neurons have been shown to be a more effective representation of time. In this work, we trained a reinforcement learning agent to create its own representation of time using a neural network with a population of recurrently connected nonlinear firing rate neurons. Trained using a reward-based recursive least squares algorithm, the agent learned to produce a neural trajectory that peaks at the "time-to-act"; thus, it learns "when" to act.
“when” to act. A few control system applications also require the agent to temporally
scale its action. We trained the agent so that it could temporally scale its action for different speed inputs. Furthermore, given one state, the agent could learn to plan multiple future actions, that is, multiple times to act, without needing to observe a new state.

Keywords: reinforcement learning, recurrent neural network, time perception, formation control, temporal scaling

Edited by: Qin Wang, Yangzhou University, China
Reviewed by: Peng Liu, North University of China, China; Tianhong Liu, Yangzhou University, China
*Correspondence: Chin-Teng Lin, chin-teng.lin@uts.edu.au
Specialty section: This article was submitted to Nonlinear Control, a section of the journal Frontiers in Control Engineering
Received: 08 June 2021; Accepted: 12 July 2021; Published: 06 August 2021
Citation: Akella A and Lin C-T (2021) Time and Action Co-Training in Reinforcement Learning Agents. Front. Control. Eng. 2:722092. doi: 10.3389/fcteg.2021.722092

1 INTRODUCTION

A powerful formation control system requires continuously monitoring the current state, comparing the performance, and deciding whether to take necessary actions. This process not only needs to understand the system's state and optimal actions but also needs to learn the appropriate time to perform an action. Deep reinforcement learning algorithms, which have achieved remarkable success in the fields of robotics, games, and board games, have also been shown to perform well in adaptive control system problems Li et al. (2019); Oh et al. (2015); Xue et al. (2013). However, the challenge of learning the precise time to act has not been directly addressed.

The ability to measure time from the start of a state change and use it accordingly is an essential part of applications such as adaptive control systems. In general, the environment is encoded in four dimensions: the three dimensions of space and the dimension of time. The representation of time affects the decision-making process along with the spatial aspects of the environment Klapproth (2008). However, in the field of reinforcement learning (RL), the essential role of time is not explicitly acknowledged, and existing RL research mainly focuses on the spatial dimensions. The lack of a time sense might not be an issue when considering a simple behavioral task, but many tasks in control
FIGURE 2 | Proposed reinforcement learning architecture. (A) State input is received by the agent over an episode with a length of 3,600 ms. The agent contains an RNN (B) and a deep Q-network (C). The RNN receives a continuous input signal with state values for 20 ms and zeros for the remaining time. The state values shown here are s1 = 1.0, s2 = 1.5, s3 = 2.0, s4 = 2.5, and s5 = 3. The weights W^In (the orange connections) are initialized randomly and held constant throughout the experiment. The weights W^Rec and W^Out (the blue connections) are initialized randomly and trained over the episodes. The DQN, with one input and four output nodes, receives the state value as its input and outputs the Q-value for each circle.
action given a specific state. The RNN and DQN are co-trained to learn the time to act and the action. The RNN was trained using a reward-based recursive least squares algorithm, and the DQN was trained using the Bellman equation. The results of a series of task-switching scenarios show that the agent learned to produce a neural trajectory, reflecting its own sense of time, that peaked at the correct time-to-act. Furthermore, the agent was able to temporally scale its time-to-act more quickly or more slowly according to the input speed. We also compared the performance of the proposed architecture with DNN models such as the LSTM, which can implicitly represent time. We observed that for tasks involving precisely timed action, neural network models such as the population clock model perform better than the LSTM.

This article first presents the task-switching scenario and describes the proposed architecture and training methodology used in the work. Section 3 presents the performance of the trained RL agent on six different experiments. In Section 4, we present the performance of the LSTM in comparison with the proposed model. Finally, Section 5 presents an extensive discussion about the learned time representation with respect to prior electrophysiology studies.

2 METHODS

2.1 Task-Switching Scenario
In the scenario, there are n different circles, and the agent must learn to click on each circle within a specific time interval and in a specific order. This task involves learning to decide which circle to click and when that circle should be clicked. Figure 1 shows an example scenario with four circles. Circle 1 must be clicked at some point between 800 and 900 ms. Similarly, circles 2, 3, and 4 must be clicked at 1,500–1,600, 2,300–2,400, and 3,300–3,400 ms, respectively. If the agent clicks the correct circle in the correct time period, it receives a positive reward. If it clicks a circle at the incorrect time, it receives a negative reward (refer to Table 1 for the exact reward values). Each circle becomes inactive once its time interval has passed. For example, circle 1 in Figure 1 becomes inactive at 901 ms, meaning that the agent cannot click it after 900 ms and receives a reward of 0 if it attempts to click the inactive circle. Each circle can only be clicked once during an episode.

The same scenario was modified to conduct the experiments described in the following sections.
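For concreteness, the base reward scheme described above can be sketched as a minimal helper function. The time windows follow the four-circle example; the reward magnitudes, the helper name, and the handling of repeated clicks are illustrative assumptions rather than the paper's exact settings (the exact values are listed in Table 1, which is not reproduced here):

```python
# Hypothetical sketch of the task-switching reward scheme described in Section 2.1.
# Time windows follow the four-circle example; reward magnitudes are assumed placeholders.
TIME_WINDOWS_MS = {1: (800, 900), 2: (1500, 1600), 3: (2300, 2400), 4: (3300, 3400)}

def reward_for_click(circle: int, t_ms: float, clicked: set) -> float:
    """Return the reward for clicking `circle` at time `t_ms` (one click per circle)."""
    start, end = TIME_WINDOWS_MS[circle]
    if circle in clicked:
        return 0.0        # each circle can only be clicked once (assumed to yield 0)
    if t_ms > end:
        return 0.0        # circle is inactive once its window has passed
    clicked.add(circle)
    if start <= t_ms <= end:
        return +1.0       # assumed positive reward (exact value in Table 1)
    return -1.0           # assumed negative reward for clicking at the wrong time

# Example: clicking circle 1 at 850 ms is rewarded; a second click on it yields 0.
clicked = set()
print(reward_for_click(1, 850, clicked))   # +1.0
print(reward_for_click(1, 950, clicked))   # 0.0 (already clicked)
```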
FIGURE 3 | Trained activity of four different scenarios. Each scenario contains different times to act. Each colored bar represents the time-to-act interval. The
orange line in each figure represents the threshold (0.5).
FIGURE 4 | RNN with speed as the input and state input (A) to the RNN (B).
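The "what to act" side of the agent relies on standard Q-learning, summarized by Eqs 1–5 that follow (and later by the DQN update in Eqs 9–10). As a hedged illustration, the sketch below uses a tabular Q-function as a stand-in for the paper's fully connected deep Q-network; the state and action sizes and the learning parameters are assumed values:

```python
import numpy as np

# Illustrative tabular Q-learning update (Eqs 3-5); the paper itself uses a fully
# connected deep Q-network, so this table is only a stand-in for exposition.
n_states, n_actions = 5, 4            # e.g., five state inputs, four circles (assumed)
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99              # learning rate and discount factor c (assumed values)

def q_update(s, a, r, s_next):
    """One Q-learning step: move Q(s, a) toward the Bellman target r + c * max_a' Q(s', a')."""
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

def greedy_action(s):
    """Eq 5: pick the action with the highest estimated Q-value in state s."""
    return int(np.argmax(Q[s]))

q_update(s=0, a=1, r=1.0, s_next=1)
print(greedy_action(0))               # action 1 after the single positive update
```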
R_t = \sum_{t=1}^{\infty} c^t r_t    (1)

where c ∈ [0, 1] is the discount factor that determines the importance of the immediate reward and the future reward. If c = 0, the agent will learn to choose actions that produce an immediate reward. If c = 1, the agent will evaluate its actions based on the sum of all its future rewards. To learn the sequence of actions that leads to the maximum discounted sum of future rewards, an agent estimates optimal values for all possible actions in a given state. These estimated values are defined by the expected sum of future rewards under a given policy π.

Q_\pi(s, a) = E_\pi\{ R_t \mid s_t = s, a_t = a \}    (2)

where E_π is the expectation under the policy π, and Q_π(s, a) is the expected sum of discounted rewards when the action a is chosen by the agent in the state s under a policy π. Q-learning Watkins and Dayan (1992) is a widely used reinforcement learning algorithm that enables the agent to update its Q_π(s, a) estimate iteratively by using the following formula:

Q_\pi(s_t, a_t) \leftarrow Q_\pi(s_t, a_t) + \alpha \left[ r_t + c \max_a Q_\pi(s_{t+\delta t}, a) - Q_\pi(s_t, a_t) \right]    (3)

where α is the learning rate, and Q_π(s_{t+1}, a) is the future value estimate. By iteratively updating the Q values based on the agent's experience, the Q function converges to the optimal Q function, which satisfies the following Bellman optimality equation:

Q_\pi^*(s, a) = E\left[ r_t + c \max_{a'} Q_\pi^*(s', a') \right]    (4)

where π* is the optimal policy. Action a can be determined as follows:

a = \arg\max_a Q^*(s, a)    (5)

When the state space and the action space are discrete and finite, the Q function can be a table that contains all possible state-action values. However, when the state and action spaces are large or continuous, a neural network is commonly used as a Q-function approximator Mnih et al. (2015); Lillicrap et al. (2015). In this work, we model a reinforcement learning agent which uses a fully connected DNN as a Q-function approximator to select one of the four circles.

2.2.2 Recurrent Neural Network
In this study, we used the population clock model for training the RL agent to learn the representation of time. In previous studies, this model has been shown to robustly learn and generate simple-to-complex temporal patterns Laje and Buonomano (2013); Hardy et al. (2018). The population clock model (i.e., the RNN) contains a pool of recurrently connected nonlinear firing rate neurons with random initial weights, as shown at the top of Figure 2. To achieve "time-to-act" and temporal scaling of timing behavior, we trained the weights of both recurrent neurons and output neurons. The network we used in this study contained 300 recurrent neurons, as indicated by the blue neurons inside the green circle, plus one input and one output neuron. The dynamics of the network Sompolinsky et al. (1988) are governed by Eqs 6–8. Learning performance was similar with a larger number of neurons and started to decline when 200 neurons were used.
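As a rough illustration of how such firing-rate dynamics (Eqs 6–8 below) can be simulated, the following sketch uses simple Euler integration. N = 300, the 300-neuron pool, the 0.5 threshold, and g = 1.6 follow the text; τ, Δt, the connection probability, the input pulse, and the output-weight scale are assumed values in the spirit of the cited population clock literature:

```python
import numpy as np

# Sketch of the population clock RNN (Eqs 6-8) integrated with the Euler method.
# N and g follow the text; tau, dt, p, the input pulse, and W_out scaling are assumed.
N, n_in = 300, 1
tau, dt, g, p = 50.0, 1.0, 1.6, 0.2          # ms, ms, network gain, connection probability
rng = np.random.default_rng(0)

mask = rng.random((N, N)) < p                # sparse recurrent connectivity
W_rec = rng.normal(0.0, g / np.sqrt(p * N), (N, N)) * mask
W_in = rng.normal(0.0, 1.0, (N, n_in))       # fixed input weights
W_out = rng.normal(0.0, 1.0 / np.sqrt(N), (1, N))

x = np.zeros(N)                              # neuron states, initially zero
act_times = []                               # time points where output crosses the 0.5 threshold
for t in range(3600):                        # one 3,600 ms episode
    y = np.array([1.0]) if t < 20 else np.zeros(1)   # 20 ms state pulse, then silence
    fr = np.tanh(x)                                  # Eq 8: firing rates
    x += (dt / tau) * (-x + W_rec @ fr + W_in @ y)   # Eq 6: Euler step of the dynamics
    z = float(W_out @ fr)                            # Eq 7: output activity
    if z > 0.5:
        act_times.append(t)   # before training, any threshold crossings are arbitrary
```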
FIGURE 5 | RNN activity with a training speed of 1 and test speeds of 0.01, 0.8, and 1.3. The colored bars indicate the expected time-to-act intervals.
\tau \frac{dx_i}{dt} = -x_i(t) + \sum_{j=1}^{N} W_{ij}^{Rec} fr_j(t) + \sum_{j=1}^{I} W_{ij}^{In} y_j(t)    (6)

z = \sum_{j=1}^{N} W_j^{Out} r_j    (7)

fr_i = \tanh(x_i)    (8)

Given a network that contains N recurrent neurons, fr_i represents the firing rate of the ith (i = 1, 2, ..., N) recurrent neuron. W^Rec, which is an N×N weight matrix, defines the connectivity of the recurrent neurons; it is initialized randomly from a normal distribution with a mean of 0 and a standard deviation of g/\sqrt{pN}, where g represents the gain of the network. Each input neuron is connected to every recurrent neuron in the network with W^In, which is an N×1 input weight matrix. W^In is initialized randomly from a normal distribution with a mean of 0 and a standard deviation of 1 and is fixed during training. Similarly, every recurrent neuron is connected to each output neuron with W^Out, which is a 1×N output weight matrix. In this study, we trained W^Rec and W^Out using a reward-based recursive least squares method. The variable y represents the activity level of the input neurons (states), and z represents the output. x_i(t) represents the state of the ith recurrent neuron, which is initially zero, and τ is the neuron time constant.

Initially, due to the high gain caused by W^Rec (when g = 1.6), the network produces chaotic dynamics, which in theory can encode time for a long time Hardy et al. (2018). In practice, the recurrent weights need to be tuned to reduce this chaos and locally stabilize the output activity. The parameters, such as the connection probability, Δt, g (the gain of the network), and τ, were chosen based on existing population clock model research Buonomano and Maass (2009); Laje and Buonomano (2013). In this work, we trained both the recurrent and output weights using a reward-based recursive least squares algorithm. During an episode, the agent chooses to act when the output activity exceeds a threshold (in this study, 0.5). We experimented with other threshold values between 0.4 and 1, but each produced
FIGURE 6 | Multiple times to act. The state input (A) and the output activity (B), which peaks at three different intervals after state s1 and at one interval after state s2. The colored bars indicate the correct time-to-act.
similar results to 0.5. If the activity never exceeds the threshold, then the agent chooses a random time point to act. This ensures that the agent tries different time points and acts before it learns the temporal nature of the task.

As illustrated in Figure 2 (left side), a sequence of state inputs is given to the agent during an episode lasting 3,600 ms, where each state is a 20-ms input signal for the RNN and a single value for the DQN. The agent receives state s1 at 0 ms. At this point, all circles are active. At 900 ms, the first circle turns inactive, and the agent receives state s2. In other words, the agent only receives the next state after the previous state has changed. In this case, the changes are caused by a circle turning inactive due to the time constraints preset in the task. The final state, s5, is a terminal state in which all the circles are inactive. Note that each action given by the Q network is only executed at the time points defined by the RNN.

2.3 Time and Action Co-Training in Reinforcement Learning Agent
At the start of an episode, the agent explores the environment by selecting random circles to click. At the end of the episode, the agent collects a set of experience tuples (s_t, a_t, r_{t+δt}, s_{t+δt}) that are used to train the DQN and the RNN.

2.3.1 DQN
The parameters θ of the Q network are iteratively updated using Eqs 9, 10 for action a_t taken in state s_t, which results in reward r_{t+δt}.

\theta_{t+1} = \theta_t + \alpha \left[ y - Q(s_t, a_t; \theta_t) \right] \nabla_{\theta_t} Q(s_t, a_t; \theta_t)    (9)

y = r_{t+1} + c \max_a Q(s_{t+1}, a; \theta_t)    (10)

2.3.2 Recurrent Neural Network
In the RNN, both the recurrent weights and the output weights were updated at every Δt = 10 ms using the collected experiences. The recursive least squares (RLS) algorithm Åström and Wittenmark (2013) is a basic recursive application of the least squares algorithm. Given an input signal x_1, x_2, ..., x_n and a set of desired responses y_1, y_2, ..., y_n, RLS updates the parameters W^Rec and W^Out to minimize the mean difference between the desired and the actual output of the RNN (which is the firing rate fr_i of the recurrent neuron). In the proposed architecture, we generate the desired response of a recurrent neuron by adding the reward to the firing rate fr_i(t) of neuron i at time t, such that the desired firing rate decreases at time t if r_t < 0 and increases if r_t > 0. The desired response of the output neuron was generated by adding the reward to the output activity z, as defined in Eq 7.

The error e_i^{rec}(t) of a recurrent neuron is computed using Eq 12, where fr_i(t) is the firing rate of neuron i at time t, and r_t is the reward received at time t. The desired signal fr_i(t) + reward(t) is clipped between R_min and R_max due to the high variance of the firing rate. The update of the parameters W^Rec is dictated by Eq 11, where W_{ij}^{Rec} is the recurrent weight between the ith neuron and the jth neuron. The exact values of Z_min, Z_max, R_min, and R_max are shown in Table 1. Z_min and Z_max act as clamping values for the desired output activity. So, in this study, the value of Z_max was chosen to be close to the positive threshold (+0.5), and the value of Z_min was chosen to be close to the negative threshold (−0.5). The parameter Δt was set based on existing population clock model research Buonomano and Maass (2009); Laje and Buonomano (2013).

In this study, we trained only a subset of recurrent neurons, which were randomly selected at the start of training. SubRec is the subset of randomly selected neurons from the population. For the experiments in this study, we selected 30% of the recurrent
FIGURE 7 | Results of the skip state test. The top figures show the state input (left) and the corresponding RNN output (right), where all states are present in the input. The bottom figures show the state input with the fourth state skipped (left), which results in subdued output activity from 3,200 to 3,300 ms (right).
neurons for training. The square matrix P^i governs the learning rate of the recurrent neuron i and is updated at every Δt using Eq 13.

W_{ij}^{Rec}(t) = W_{ij}^{Rec}(t - \Delta t) - e_i^{rec}(t) \sum_{k \in SubRec} P_{jk}^{i}(t)\, fr_k(t)    (11)

e_i^{rec}(t) = fr_i(t) - \max\left( R_{min}, \min\left( fr_i(t) + r_t, R_{max} \right) \right)    (12)

P^{i}(t) = P^{i}(t - \Delta t) - \frac{P^{i}(t - \Delta t)\, fr(t)\, fr'(t)\, P^{i}(t - \Delta t)}{1 + fr'(t)\, P^{i}(t - \Delta t)\, fr(t)}    (13)

The output weights W_{ij}^{Out} (the weight between recurrent neuron j and output neuron i) are also updated in a similar way; the error is calculated using Eq 14 as follows:

e_j^{out}(t) = z(t) - \max\left( Z_{min}, \min\left( z(t) + reward(t), Z_{max} \right) \right)    (14)

3 EXPERIMENTS

3.1 Different Scenarios
To understand the proficiency of this model, we trained and tested the agent on multiple different scenarios with different time intervals and different numbers of circles. We observed that the agent learned to produce a neural trajectory that peaked at the time-to-act intervals with near-perfect accuracy. Figure 3 demonstrates the learned neural trajectory for a few of the scenarios we trained. The colored bars in Figure 3 indicate the correct time-to-act intervals.

The proposed RNN training method exhibited some notable behavioral features, such as the following: 1) the agent learned to subdue its activity as soon as it observed a new state, analogous to restarting a clock, and 2) depending on the observed state, the
FIGURE 8 | RNN output when trained on a scenario with 20 circles. The colored bars indicate the expected time-to-act.
agent learned to ramp its activity to peak at the time-to-act. We also observed that the agent could learn to do the same without training the recurrent weights (i.e., by only training the output weights W^Out). However, by training a percentage of the recurrent neurons, we observed that the agent could learn to produce the desired activity in relatively fewer episodes of training.

3.2 Temporal Scaling
It is interesting how humans can execute their actions, such as speaking, writing, or playing music, at different speeds. Temporal scaling is another feature we observed in our proposed method. A few studies have explored temporal scaling in humans Diedrichsen et al. (2007); Collier and Wright (1995), particularly the study by Hardy et al. (2018), which modeled temporal scaling using an RNN and a supervised learning method. Their approach involved training the recurrent neurons using a second RNN that generates a target output for each of the recurrent neurons in the population. Unfortunately, this approach is not feasible with an online learning algorithm such as reinforcement learning. So, to explore the possibility of temporal scaling with our method, we trained the model using an additional speed input (shown in Figure 4), using the same approach as is outlined in Eqs 11, 12, 14. In this set-up, the RNN receives both a state input and a speed input. The speed input is a constant value given only when there is a state input; for the rest of the time, the speed input is zero. We trained the model only with one speed (speed = 1) and tested it at three different speeds: speed = 1.3, speed = 0.01, and speed = 0.8. Figure 5 shows the results. We observed that the shift in click time with respect to speed could be defined using Eq 15. We used a similar procedure to that described in Section 2.3.2 to train for temporal scaling.

click time = click time + speed / default speed + 200    (15)

3.3 Learning to Plan Multiple Future Times-to-Act
One of the inherent properties of an RNN is that it can produce multiple peaks at different time points, even with only one input at the start of the trial. Results of the study by Hardy et al. (2018) showed that the output of the RNN (trained using supervised learning) peaked at multiple time points given a single input of 250 ms at the start of the trial. To understand whether an agent could learn to plan such multiple future times-to-act given one state using the proposed training, we trained an agent on a slightly modified task-switching scenario. Here, the agent needed to click on the first circle at three different time intervals, 400–500 ms, 1,000–1,100 ms, and 1,700–1,800 ms, and on the second circle at 2,300–2,400 ms. The first circle was set to deactivate at 1,801 ms.
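Both this experiment and the temporal-scaling experiment above reuse the reward-based recursive least squares rule of Section 2.3.2. A minimal sketch of that update (Eqs 11–13) for a single trained recurrent neuron is given below; the clamp values, the initial P matrix, the subset size, and the variable names are assumptions rather than the paper's exact settings (which are listed in Table 1):

```python
import numpy as np

# Sketch of the reward-based RLS update (Eqs 11-13) for one trained recurrent neuron i.
# R_MIN/R_MAX and the initial P scaling are assumed; fr_sub is the firing-rate vector
# of the randomly chosen trained subset (SubRec, here 30% of 300 neurons).
R_MIN, R_MAX = -1.0, 1.0            # assumed clamp values for the desired firing rate
n_sub = 90                          # 30% of 300 recurrent neurons
P_i = np.eye(n_sub)                 # learning-rate matrix P^i, one per trained neuron
w_i = np.zeros(n_sub)               # recurrent weights W_ij^Rec onto neuron i (j in SubRec)

def rls_step(w_i, P_i, fr_i, fr_sub, r_t):
    """One update at time t: clamp the reward-shifted target, then apply Eqs 13 and 11."""
    target = np.clip(fr_i + r_t, R_MIN, R_MAX)           # desired firing rate of neuron i
    e_i = fr_i - target                                   # Eq 12: error w.r.t. the target
    Pf = P_i @ fr_sub
    P_i = P_i - np.outer(Pf, Pf) / (1.0 + fr_sub @ Pf)    # Eq 13: update of P^i
    w_i = w_i - e_i * (P_i @ fr_sub)                      # Eq 11: weight update over SubRec
    return w_i, P_i

# Example step with an arbitrary firing rate and a positive reward.
fr_sub = np.random.default_rng(1).normal(size=n_sub)
w_i, P_i = rls_step(w_i, P_i, fr_i=0.3, fr_sub=fr_sub, r_t=0.5)
```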
FIGURE 9 | RNN output when trained on a scenario with two circles, where the first circle must be clicked after 2,000 ms. The colored bars indicate the expected
time-to-act.
At the first state s1, the agent learned to produce a neural trajectory that peaked at three intervals; following state s2, the trajectory peaked at 2,300–2,400 ms, as shown in Figure 6.

3.4 Skip State Test
As seen in experiment 3, the multiple peaks (multiple times-to-act) that the agent was producing could be based on this inherent property of the RNN. In reinforcement learning, however, the peak at the time-to-act should be truly dependent on each input state and also leverage the temporal properties of the RNN. Hence, to evaluate whether the learned network was truly dependent on the state, we tested it by skipping one of the input states. As Figure 7 shows, when the agent did not receive a state at 2,400 ms, it did not choose to act during the 3,200–3,300 ms interval, proving that the learned time-to-act is truly state dependent.

3.5 Task Switching With 20 Tasks
To investigate the scalability of the proposed method to a relatively large state space, we trained and tested the model in
FIGURE 10 | Left shows the pendulum scenario. The pendulum rod (the black line) is 1 m long, and the blob (blue dot) weighs 1 kg. Right shows the trained RNN activity.
FIGURE 11 | Input difference between the RNN and the LSTM network.
a scenario consisting of 20 circles with 20 different times-to-act. Figure 8 demonstrates that the agent could indeed still learn the time-to-act with near-perfect accuracy.

3.6 Memory Task
From the above experiments, the agent was able to learn and employ its time representation in multiple ways. However, we are also interested in knowing how long an agent can remember a given input. To investigate this, we delayed the time-to-act until 2,000 ms after the offset of the input and trained the agent. The trained agent remembered a state seen at 0–20 ms until 2,000 ms (see Figure 9), as indicated by the peak in the output activity. We also trained the agent to remember a state for 3,000 ms. With the current number of recurrent neurons (i.e., 300 neurons), the agent was not able to remember for 3,000 ms from the offset of an input.

3.7 Shooting a Moving Target
Similar to the task-switching experiment, we trained the RL agent to learn "when to act" in a different scenario. In this scenario, the agent is rewarded for shooting a moving target. The target is the blob of a moving damped pendulum. The length of the pendulum is 1 m, and the weight of the blob is 1 kg. We trained the DQN to select the direction of shooting and the RNN to learn the exact time to release the trigger. The agent was rewarded positively for hitting the blob within an error of 0.1 m and negatively if it missed the target. The learned activity is shown in Figure 10; the left shows the motion of the pendulum, and the right shows the learned RNN activity. The threshold in this experiment was 0.05, and the agent was able to hit the blob 5 times in 3,000 ms. Although it is still not clear why the agent's activity did not peak from 0 to 1,500 ms, the agent showed better performance after 1,500 ms.

4 COMPARISON WITH LONG SHORT-TERM MEMORY (LSTM) NETWORK

A recent study by Deverett et al. (2019) investigated the interval timing abilities of a reinforcement learning agent. In the study, an
FIGURE 12 | Output activity of the trained LSTM network for a task-switching scenario containing four circles, with time-to-act intervals shown in colored bars.
RL agent was trained to reproduce a given temporal interval. However, the time representation in that study was in the form of movement (or velocity) control. In other words, the agent had to move from one point to a goal point within the same interval as presented at the start of the experiment. The agent, which used an LSTM network in the study by Deverett et al. (2019), performed the task with near-perfect accuracy, indicating the ability to learn temporal properties using LSTM networks. Following these findings, our study endeavors to understand whether an agent can learn a direct representation of time (instead of an indirect representation of time, such as velocity or acceleration) using an LSTM.

To investigate this direction, we trained an RL agent with only one LSTM network as its DQN (no RNN was used in this test) on the same task-switching scenario. The input sequence for an RNN works in terms of dt (as shown in Eq 6), whereas the input for an LSTM works in terms of sequence length, as shown in Figure 11. For example, an input signal with a length of 3,000 ms can be given 1 ms at a time to an RNN, whereas for an LSTM, the same input should be divided into segments of a fixed length to effectively capture the temporal properties of the input. We used an LSTM with 100 input nodes and gave an input signal of 100 ms to the network, followed by the next 100 ms. Indeed, the sequence length can be smaller than 100 ms. In our experiments, we trained the agent with different sequence lengths (50, 100, 200, and 300 ms), and the agent showed better performance for 300 ms (results for 50, 100, and 200 ms are given in the Appendix). The architecture of the LSTM we used contained one LSTM layer with 256 hidden units, 300 input nodes, and two linear layers with 100 nodes each. The output size of the network was 300, which resulted in an activity of n points for a given input signal of n ms. The hidden states of the LSTM network were carried over throughout the episode.

The trained activity of the LSTM network is shown in Figure 12 (bottom), where the light blue region shows the output activity of the network. Figure 12 shows the output activity of the LSTM network and the correct time-to-act intervals (colored bars) for clicking each circle. The LSTM network did learn to exceed the threshold, indicating when to act, at a few time-to-act intervals. However, the network learned a periodic pattern, meaning that for every 300 ms, the network learned to produce similar activity.

5 DISCUSSION

In this study, we trained a reinforcement learning agent to learn "when to act" using an RNN and "what to act" using a DQN. We introduced a reward-based recursive least squares algorithm to train the RNN. By disentangling the process of learning the temporal and spatial aspects of action into independent tasks, we intend to understand explicit time representation in an RL agent. Through this strategy, the agent learned to create its own representation of time. Our experiments, which employed a peak-interval style, show that the agent could learn to produce a neural trajectory that peaked at the time-to-act with near-perfect accuracy. We also observed several other intriguing behaviors.

• The agent learned to subdue its activity immediately after observing a new state. We interpreted this as the agent restarting its clock.
• The agent was able to temporally scale its actions in our proposed learning method. Even though we trained the agent with a single speed value (speed = 1), it learned to temporally scale its action to speeds that were both lower (speed = 0.01) and higher (speed = 1.3) than the trained speed. Notably, the agent was not able to scale its actions beyond speed 1.3.
• We observed that neural networks such as the LSTM might not be able to learn an explicit representation of time when compared with population clock models. Deverett et al. (2019) showed that an RL agent can scale its actions (increase or decrease the velocity) using an LSTM network. However, when we trained the LSTM network to learn a direct representation of time, it learned periodic activity.
• In this research study, we trained an RL agent in an environment similar to task switching: shooting a moving target. The target in our experiment is the blob of a damped pendulum with a length of 1 m and a mass of 1 kg. The agent was able to shoot the fast-moving blob by learning to shoot at a few near-accurate time points.

DATA AVAILABILITY STATEMENT

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.

ACKNOWLEDGMENTS

This work was supported in part by the Australian Research Council (ARC) under discovery grants DP180100656 and DP210101093. Research was also sponsored in part by the Australia Defence Innovation Hub under Contract No. P18-650825, the US Office of Naval Research Global under Cooperative Agreement Number ONRG-NICOP-N62909-19-1-2058, and the AFOSR–DST Australian Autonomy Initiative agreement ID10134. We also thank the NSW Defence Innovation Network and the NSW State Government of Australia for financial support in part of this research through grants DINPP2019 S1-03/09 and PP21-22.03.02.
REFERENCES

Åström, K. J., and Wittenmark, B. (2013). Computer-Controlled Systems: Theory and Design. Englewood Cliffs, NJ: Courier Corporation.
Bakker, B. (2002). "Reinforcement Learning with Long Short-Term Memory," in Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic, Vancouver, Canada, 1475–1482.
Buonomano, D. V., and Laje, R. (2011). "Population Clocks," in Space, Time and Number in the Brain (Elsevier), 71–85. doi:10.1016/b978-0-12-385948-8.00006-2
Buonomano, D. V., and Maass, W. (2009). State-Dependent Computations: Spatiotemporal Processing in Cortical Networks. Nat. Rev. Neurosci. 10, 113–125. doi:10.1038/nrn2558
Carrara, N., Leurent, E., Laroche, R., Urvoy, T., Maillard, O. A., and Pietquin, O. (2019). "Budgeted Reinforcement Learning in Continuous State Space," in Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, Vancouver, BC, December 8–14, 2019, 9295–9305.
Chung, J., Gulcehre, C., Cho, K., and Bengio, Y. (2014). Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. NIPS 2014 Workshop on Deep Learning, Quebec, Canada, December 2014. Preprint arXiv:1412.3555.
Collier, G. L., and Wright, C. E. (1995). Temporal Rescaling of Simple and Complex Ratios in Rhythmic Tapping. J. Exp. Psychol. Hum. Perception Perform. 21, 602–627. doi:10.1037/0096-1523.21.3.602
Deverett, B., Faulkner, R., Fortunato, M., Wayne, G., and Leibo, J. Z. (2019). "Interval Timing in Deep Reinforcement Learning Agents," in 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada, 6689–6698.
Diedrichsen, J., Criscimagna-Hemminger, S. E., and Shadmehr, R. (2007). Dissociating Timing and Coordination as Functions of the Cerebellum. J. Neurosci. 27, 6291–6301. doi:10.1523/jneurosci.0061-07.2007
Doya, K. (2000). Reinforcement Learning in Continuous Time and Space. Neural Comput. 12, 219–245. doi:10.1162/089976600300015961
Durstewitz, D. (2003). Self-Organizing Neural Integrator Predicts Interval Times through Climbing Activity. J. Neurosci. 23, 5342–5353. doi:10.1523/jneurosci.23-12-05342.2003
Hardy, N. F., Goudar, V., Romero-Sosa, J. L., and Buonomano, D. V. (2018). A Model of Temporal Scaling Correctly Predicts that Motor Timing Improves with Speed. Nat. Commun. 9, 4732–4814. doi:10.1038/s41467-018-07161-6
Hochreiter, S., and Schmidhuber, J. (1997). Long Short-Term Memory. Neural Comput. 9, 1735–1780. doi:10.1162/neco.1997.9.8.1735
Klapproth, F. (2008). Time and Decision Making in Humans. Cogn. Affective, Behav. Neurosci. 8, 509–524. doi:10.3758/cabn.8.4.509
Laje, R., and Buonomano, D. V. (2013). Robust Timing and Motor Patterns by Taming Chaos in Recurrent Neural Networks. Nat. Neurosci. 16, 925–933. doi:10.1038/nn.3405
Li, D., Ge, S. S., He, W., Ma, G., and Xie, L. (2019). Multilayer Formation Control of Multi-Agent Systems. Automatica 109, 108558. doi:10.1016/j.automatica.2019.108558
Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., et al. (2015). Continuous Control with Deep Reinforcement Learning. 4th International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, May 2–4, 2016. Preprint arXiv:1509.02971.
Matell, M. S., Meck, W. H., and Nicolelis, M. A. L. (2003). Interval Timing and the Encoding of Signal Duration by Ensembles of Cortical and Striatal Neurons. Behav. Neurosci. 117, 760–773. doi:10.1037/0735-7044.117.4.760
Miall, C. (1989). The Storage of Time Intervals Using Oscillating Neurons. Neural Comput. 1, 359–371. doi:10.1162/neco.1989.1.3.359
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., et al. (2013). Playing Atari with Deep Reinforcement Learning. Preprint arXiv:1312.5602.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., et al. (2015). Human-Level Control through Deep Reinforcement Learning. Nature 518, 529–533. doi:10.1038/nature14236
Oh, K.-K., Park, M.-C., and Ahn, H.-S. (2015). A Survey of Multi-Agent Formation Control. Automatica 53, 424–440. doi:10.1016/j.automatica.2014.10.022
Petter, E. A., Gershman, S. J., and Meck, W. H. (2018). Integrating Models of Interval Timing and Reinforcement Learning. Trends Cogn. Sci. 22, 911–922. doi:10.1016/j.tics.2018.08.004
Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., et al. (2017). Mastering the Game of Go without Human Knowledge. Nature 550, 354. doi:10.1038/nature24270
Simen, P., Balci, F., deSouza, L., Cohen, J. D., and Holmes, P. (2011). A Model of Interval Timing by Neural Integration. J. Neurosci. 31, 9238–9253. doi:10.1523/jneurosci.3121-10.2011
Sompolinsky, H., Crisanti, A., and Sommers, H.-J. (1988). Chaos in Random Neural Networks. Phys. Rev. Lett. 61, 259. doi:10.1103/physrevlett.61.259
Tallec, C., Blier, L., and Ollivier, Y. (2019). Making Deep Q-Learning Methods Robust to Time Discretization. International Conference on Machine Learning (ICML), Long Beach. Preprint arXiv:1901.09732.
Vinyals, O., Babuschkin, I., Czarnecki, W. M., Mathieu, M., Dudzik, A., Chung, J., et al. (2019). Grandmaster Level in StarCraft II Using Multi-Agent Reinforcement Learning. Nature 575, 350–354.
Watkins, C. J., and Dayan, P. (1992). Q-Learning. Machine Learn. 8, 279–292. doi:10.1023/a:1022676722315
Xue, D., Yao, J., Wang, J., Guo, Y., and Han, X. (2013). Formation Control of Multi-Agent Systems with Stochastic Switching Topology and Time-Varying Communication Delays. IET Control Theor. Appl. 7, 1689–1698. doi:10.1049/iet-cta.2011.0325

Conflict of Interest: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's Note: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Copyright © 2021 Akella and Lin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.