Overview of This Book

[Figure 1: The map of this book. Chapters 1-3 (Basic Concepts; Bellman Equation; Bellman Optimality Equation) are the fundamental tools. Chapters 4-10 (Value Iteration & Policy Iteration; Monte Carlo Methods; Stochastic Approximation; Temporal-Difference Methods; Value Function Methods; Policy Gradient Methods; Actor-Critic Methods) are the algorithms/methods, progressing from model-based to model-free, from tabular representations to function representations, and from value-based to policy-based.]

Before we start the journey, it is important to look at the “map” of the book shown in Figure 1. This book contains ten chapters, which can be classified into two parts: the first part is about basic tools, and the second part is about algorithms. The ten chapters are highly correlated. In general, it is necessary to study the earlier chapters before the later ones.
Next, please follow me on a quick tour through the ten chapters. Two aspects of each chapter will be covered: the first is the contents introduced in the chapter, and the second is its relationships with the preceding and subsequent chapters. A heads-up before you read this overview: its purpose is to give you an impression of the contents and structure of this book, so it is all right if you encounter many concepts that you do not yet understand. Hopefully, you can make a study plan that is suitable for you after reading this overview.

- Chapter 1 introduces basic concepts such as states, actions, rewards, returns, and
policies, which are widely used in the subsequent chapters. These concepts are first
introduced based on a grid world example, where a robot aims to reach a prespecified
target. Then, the concepts are introduced in a more formal manner based on the
framework of Markov decision processes.
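For concreteness, here is a minimal sketch of how such a grid world might be encoded. The grid size, reward values, and names below are illustrative assumptions, not the book's own specification.

    import itertools

    # A minimal grid-world sketch (size, rewards, and names are assumptions).
    GRID = 3                                      # a 3x3 grid world
    STATES = list(itertools.product(range(GRID), range(GRID)))
    ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    TARGET = (2, 2)                               # the prespecified target cell

    def step(state, action):
        """Deterministic transition: move within the grid, else stay in place."""
        dr, dc = ACTIONS[action]
        r, c = state[0] + dr, state[1] + dc
        next_state = (r, c) if 0 <= r < GRID and 0 <= c < GRID else state
        reward = 1.0 if next_state == TARGET else 0.0   # signals reaching the target
        return next_state, reward
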
- Chapter 2 introduces two key elements. The first is a key concept, and the second is
a key tool. The key concept is the state value, which is defined as the expected return
that an agent can obtain when starting from a state if it follows a given policy. The
greater the state value is, the better the corresponding policy is. Thus, state values
can be used to evaluate whether a policy is good or not.
The key tool is the Bellman equation, which can be used to analyze state values. In
a nutshell, the Bellman equation describes the relationship between the values of all
states. By solving the Bellman equation, we can obtain the state values. Such a
process is called policy evaluation, which is a fundamental concept in reinforcement
learning. Finally, this chapter introduces the concept of action values.
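For concreteness, in standard MDP notation (stated here as a general fact, not quoted from the chapter), the Bellman equation reads

    v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \bigl[ r + \gamma \, v_\pi(s') \bigr] \quad \text{for all } s,

or, stacking all states into vectors, v_\pi = r_\pi + \gamma P_\pi v_\pi. For a finite MDP this is a linear system, and computing its solution, e.g., v_\pi = (I - \gamma P_\pi)^{-1} r_\pi, is precisely what policy evaluation amounts to.
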
- Chapter 3 also introduces two key elements. The first is a key concept, and the second is a key tool. The key concept is the optimal policy. An optimal policy has the greatest state values among all policies. The key tool is the Bellman optimality
equation. As its name suggests, the Bellman optimality equation is a special Bellman
equation.
Here is a fundamental question: what is the ultimate goal of reinforcement learn-
ing? The answer is to obtain optimal policies. The Bellman optimality equation is
important because it can be used to obtain optimal policies. We will see that the
Bellman optimality equation is elegant and can help us thoroughly understand many
fundamental problems.
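In the same notation, the Bellman optimality equation replaces the fixed policy with a maximization, which elementwise becomes a maximization over actions:

    v(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a) \bigl[ r + \gamma \, v(s') \bigr] \quad \text{for all } s.

Its unique solution is the optimal state value, and a policy that acts greedily with respect to it is optimal.
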

The first three chapters constitute the first part of this book. This part lays the
necessary foundations for the subsequent chapters. Starting in Chapter 4, the book
introduces algorithms for learning optimal policies.

- Chapter 4 introduces three algorithms: value iteration, policy iteration, and truncated
policy iteration. The three algorithms have close relationships with each other. First,
the value iteration algorithm is exactly the algorithm introduced in Chapter 3 for
solving the Bellman optimality equation. Second, the policy iteration algorithm is
an extension of the value iteration algorithm. It is also the foundation for Monte
Carlo (MC) algorithms introduced in Chapter 5. Third, the truncated policy iteration
algorithm is a unified version that includes the value iteration and policy iteration
algorithms as special cases.

The three algorithms share the same structure. That is, every iteration has two steps.
One step is to update the value, and the other step is to update the policy. The idea
of the interaction between value and policy updates widely exists in reinforcement
learning algorithms. This idea is also known as generalized policy iteration. In ad-
dition, the algorithms introduced in this chapter are actually dynamic programming
algorithms, which require system models. By contrast, all the algorithms introduced
in the subsequent chapters do not require models. It is important to understand the contents of this chapter well before proceeding to the subsequent ones.
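To make the two-step structure concrete, below is a minimal value iteration sketch for a finite MDP with a known model. The array shapes, tolerance, and discount factor are illustrative assumptions, not the book's pseudocode.

    import numpy as np

    def value_iteration(P, R, gamma=0.9, tol=1e-8):
        """Value iteration on a finite MDP with a known model.
        P: transition probabilities, shape (A, S, S); R: rewards, shape (A, S)."""
        v = np.zeros(P.shape[1])
        while True:
            # Value update: q(s,a) = r(s,a) + gamma * sum_s' p(s'|s,a) * v(s')
            q = R + gamma * (P @ v)          # shape (A, S)
            v_new = q.max(axis=0)            # greedy over actions
            if np.max(np.abs(v_new - v)) < tol:
                break
            v = v_new
        # Policy update: act greedily with respect to the converged values.
        return v_new, q.argmax(axis=0)

Policy iteration would instead fully evaluate the current policy in the value step; truncated policy iteration stops that evaluation after a fixed number of sweeps, recovering the two algorithms above as extreme cases.
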
- Starting in Chapter 5, we introduce model-free reinforcement learning algorithms that do not require system models. Since this is the first time model-free algorithms appear in this book, we must first fill a knowledge gap: how can we find optimal policies without models? The philosophy is simple. If we do not have a model, we must have
some data. If we do not have data, we must have a model. If we have neither, then we
can do nothing. The “data” in reinforcement learning refer to the experience samples
generated when the agent interacts with the environment.
This chapter introduces three algorithms based on MC estimation that can learn
optimal policies from experience samples. The first and simplest algorithm is MC
Basic, which can be readily obtained by extending the policy iteration algorithm
introduced in Chapter 4. Understanding the MC Basic algorithm is important for
grasping the fundamental idea of MC-based reinforcement learning. By extending
this algorithm, we further introduce two more complicated but more efficient MC-
based algorithms. The fundamental trade-off between exploration and exploitation is
also elaborated in this chapter.
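As a sketch of the underlying idea (not the book's MC Basic pseudocode), action values can be estimated by averaging sampled returns. Here, sample_episode is an assumed callback that rolls out one episode under the policy being evaluated.

    from collections import defaultdict

    def mc_action_values(sample_episode, num_episodes, gamma=0.9):
        """Every-visit Monte Carlo estimation of action values.
        sample_episode() returns a list of (state, action, reward) tuples."""
        total = defaultdict(float)
        count = defaultdict(int)
        for _ in range(num_episodes):
            g = 0.0
            # Walk the episode backwards so g accumulates the discounted return.
            for state, action, reward in reversed(sample_episode()):
                g = reward + gamma * g
                total[(state, action)] += g
                count[(state, action)] += 1
        return {sa: total[sa] / count[sa] for sa in total}
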

Up to this point, the reader may have noticed that the contents of these chapters are
highly correlated. For example, if we want to study the MC algorithms (Chapter 5), we
must first understand the policy iteration algorithm (Chapter 4). To study the policy
iteration algorithm, we must first know the value iteration algorithm (Chapter 4). To
comprehend the value iteration algorithm, we first need to understand the Bellman opti-
mality equation (Chapter 3). To understand the Bellman optimality equation, we need
to study the Bellman equation (Chapter 2) first. Therefore, it is highly recommended to
study the chapters one by one. Otherwise, it may be difficult to understand the contents
in the later chapters.

- There is a knowledge gap when we move from Chapter 5 to Chapter 7: the algorithms
in Chapter 7 are incremental, but the algorithms in Chapter 5 are non-incremental.
Chapter 6 is designed to fill this knowledge gap by introducing the stochastic ap-
proximation theory. Stochastic approximation refers to a broad class of stochastic
iterative algorithms for solving root-finding or optimization problems. The classic
Robbins-Monro and stochastic gradient descent algorithms are special stochastic ap-
proximation algorithms. Although this chapter does not introduce any reinforcement learning algorithms, it is important because it lays the necessary foundations for studying Chapter 7.
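To give a flavor of what is coming (an illustrative example; the function names and constants are assumptions), the Robbins-Monro idea can be seen in the simplest root-finding problem g(w) = w - E[X] = 0, whose iterate converges to the mean of the samples:

    import random

    def rm_mean_estimate(sample, num_iters=100000):
        """Robbins-Monro iteration w <- w - a_k * g_tilde(w), where the noisy
        observation of g(w) = w - E[X] is w - x_k for a fresh sample x_k."""
        w = 0.0
        for k in range(1, num_iters + 1):
            a_k = 1.0 / k             # step sizes satisfying the RM conditions
            w -= a_k * (w - sample())
        return w

    # The estimate approaches the true mean 5.0 as num_iters grows.
    print(rm_mean_estimate(lambda: random.gauss(5.0, 2.0)))
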
- Chapter 7 introduces the classic temporal-difference (TD) algorithms. With the prepa-
ration in Chapter 6, I believe the reader will not be surprised when seeing the TD
algorithms. From a mathematical point of view, TD algorithms can be viewed as
stochastic approximation algorithms for solving the Bellman or Bellman optimality
equations. Like Monte Carlo learning, TD learning is also model-free, but it has some
advantages due to its incremental form. For example, it can learn in an online manner:
it can update the value estimate every time an experience sample is received. This
chapter introduces quite a few TD algorithms such as Sarsa and Q-learning. The
important concepts of on-policy and off-policy are also introduced.
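For a taste of the incremental form (a sketch; the dictionary representation and parameter values are assumptions), one tabular Q-learning step looks like this:

    def q_learning_step(q, transition, actions, alpha=0.1, gamma=0.9):
        """One Q-learning update; q maps (state, action) pairs to value estimates."""
        state, action, reward, next_state = transition
        # The TD target uses the greedy action in the next state (off-policy).
        best_next = max(q.get((next_state, a), 0.0) for a in actions)
        td_error = reward + gamma * best_next - q.get((state, action), 0.0)
        q[(state, action)] = q.get((state, action), 0.0) + alpha * td_error

Sarsa differs only in the target: it uses the action actually taken in the next state rather than the greedy one, which is what makes it on-policy.
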
- Chapter 8 introduces the value function approximation method. In fact, this chap-
ter continues to introduce TD algorithms, but it uses a different way to represent
state/action values. In the preceding chapters, state/action values are represented by
tables. The tabular method is straightforward to understand, but it is inefficient for
handling large state or action spaces. To solve this problem, we can employ the value
function approximation method. The key to understanding this method is to under-
stand the three steps in its optimization formulation. The first step is to select an
objective function for defining optimal policies. The second step is to derive the gradi-
ent of the objective function. The third step is to apply a gradient-based algorithm to
solve the optimization problem. This method is important because it has become the
standard technique for representing values. It is also where artificial neural networks are incorporated into reinforcement learning as function approximators.
The famous deep Q-learning algorithm is also introduced in this chapter.
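In symbols (standard notation for state values, not a quotation from the chapter), the three steps for an approximator \hat{v}(s, w) with parameter w might look like:

    J(w) = \mathbb{E}\bigl[ (v_\pi(S) - \hat{v}(S, w))^2 \bigr]    % 1) objective
    \nabla_w J(w) = -2\, \mathbb{E}\bigl[ (v_\pi(S) - \hat{v}(S, w))\, \nabla_w \hat{v}(S, w) \bigr]    % 2) gradient
    w \leftarrow w + \alpha \bigl( r + \gamma \hat{v}(s', w) - \hat{v}(s, w) \bigr) \nabla_w \hat{v}(s, w)    % 3) stochastic update

where the unknown v_\pi(s) in the stochastic update is replaced by the TD target r + \gamma \hat{v}(s', w); this substitution is exactly how TD learning with function approximation arises.
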
- Chapter 9 introduces the policy gradient method, which is the foundation of many
modern reinforcement learning algorithms. The policy gradient method is policy-based.
It is a large step forward in this book because all the methods in the previous chapters
are value-based. The basic idea of the policy gradient method is simple: it selects
an appropriate scalar metric and then optimizes it via a gradient-ascent algorithm.
Chapter 9 has an intimate relationship with Chapter 8 because they both rely on the
idea of function approximation. The policy gradient method has numerous advantages: for example, it handles large state/action spaces more efficiently, has stronger generalization abilities, and uses samples more efficiently.
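In its standard form (general notation, not quoted from the chapter), the method parameterizes the policy as \pi(a \mid s, \theta), chooses a scalar metric J(\theta) such as the average state value, and ascends its gradient:

    \nabla_\theta J(\theta) = \mathbb{E}\bigl[ \nabla_\theta \ln \pi(A \mid S, \theta)\, q_\pi(S, A) \bigr],
    \theta \leftarrow \theta + \alpha\, \nabla_\theta \ln \pi(a_t \mid s_t, \theta)\, q_t,

where q_t is an estimate of q_\pi(s_t, a_t); estimating it from Monte Carlo returns yields the well-known REINFORCE algorithm.
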
- Chapter 10 introduces actor-critic methods. From one point of view, actor-critic refers
to a structure that incorporates both policy-based and value-based methods. From
another point of view, actor-critic methods are not new since they still fall into the
scope of the policy gradient method. Specifically, they can be obtained by extending
the policy gradient algorithm introduced in Chapter 9. It is necessary for the reader
to properly understand the contents in Chapters 8 and 9 before studying Chapter 10.
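Schematically (one standard form, stated in general notation), an actor-critic pairs two updates driven by the TD error \delta_t: a critic that learns values and an actor that performs the policy gradient step.

    \delta_t = r_{t+1} + \gamma\, \hat{v}(s_{t+1}, w) - \hat{v}(s_t, w)
    w \leftarrow w + \alpha_w\, \delta_t\, \nabla_w \hat{v}(s_t, w)    % critic (value-based)
    \theta \leftarrow \theta + \alpha_\theta\, \delta_t\, \nabla_\theta \ln \pi(a_t \mid s_t, \theta)    % actor (policy-based)

The critic is the value-based half and the actor is the policy-based half, which is precisely the sense in which the structure incorporates both methods.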
