Chapter 08 - Markov Decision Processes: Examples
(adapted from https://inst.eecs.berkeley.edu/~cs188/fa18)
MDPs: Micro-Blackjack
In Micro-Blackjack you repeatedly draw cards; each draw is worth 2, 3, or 4 with probability 1/3. While your total is at most 5 you may either Draw again or Stop. Stopping ends the game and pays your current total as utility; if a draw pushes the total above 5, the game ends (Done) with utility 0.
MDPs: Micro-Blackjack
a) What is the transition function and the reward function for this MDP?
Note:
• States s: 0, 2, 3, 4, 5, Done
• Actions a: Draw, Stop
• Transition function T(s, a, s') = P(s' | s, a): the probability that action a taken in state s leads to s'
• Reward function R(s, a, s'): the utility received on that transition
• Discount: none (γ = 1)
MDPs: Micro-Blackjack
a) What is the transition function and the reward function for this MDP?
• T(s, Stop, Done) = 1 for every state s ≤ 5
• T(0, Draw, s') = 1/3 for s' ∈ {2, 3, 4}
• T(2, Draw, s') = 1/3 for s' ∈ {4, 5, Done} (totals 2+2, 2+3, 2+4; a total of 6 busts)
• T(3, Draw, 5) = 1/3 (total 3+2)
• T(3, Draw, Done) = 2/3 (totals 3+3 and 3+4 both bust)
• T(4, Draw, Done) = 1
• T(5, Draw, Done) = 1
• Reward function R(s, a, s'):
o R(s, Stop, Done) = s for s ≤ 5
o R(s, a, s') = 0 otherwise
(A minimal encoding of this model is sketched below.)
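The following Python sketch is not part of the original exercise; all names (STATES, transitions, reward, GAMMA) are illustrative assumptions. It encodes the states, transition function, and reward function above as plain helpers, assuming Done is the only terminal state.

```python
# Hypothetical encoding of the Micro-Blackjack MDP from part (a).
STATES = [0, 2, 3, 4, 5, "Done"]
ACTIONS = ["Draw", "Stop"]
GAMMA = 1.0  # no discount

def transitions(s, a):
    """Return a list of (next_state, probability) pairs, i.e. T(s, a, s')."""
    if s == "Done":
        return []                       # no transitions out of the terminal state
    if a == "Stop":
        return [("Done", 1.0)]          # stopping always ends the game
    # Drawing adds 2, 3, or 4 with probability 1/3 each; totals above 5 bust.
    return [((s + card) if s + card <= 5 else "Done", 1.0 / 3.0)
            for card in (2, 3, 4)]

def reward(s, a, s_next):
    """R(s, Stop, Done) = s for s <= 5; zero otherwise."""
    return float(s) if (a == "Stop" and s != "Done") else 0.0
```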
MDPs: Micro-Blackjack
b) Fill in the following table of value iteration values for the first 4 iterations.
States 0 2 3 4 5
V0
V1
V2
V3
V4
MDPs: Micro-Blackjack
b) Fill in the following table of value iteration values for the first 4 iterations.
• V0(s) = 0 for all states s
• Value iteration: for V1, ..., V4, compute the value of each action (Draw, Stop) and take the max
o Vk(Done) = 0 for every k, since Done is a terminal state
o V1(s, Draw) = Σs' T(s, Draw, s') [0 + V0(s')] = 0 for every s, because V0 = 0
o V1(s=0, Stop) = 1 * [0 + 0] = 0
o V1(s=2, Stop) = 1 * [2 + 0] = 2
o V1(s=3, Stop) = 1 * [3 + 0] = 3
o V1(s=4, Stop) = 1 * [4 + 0] = 4
o V1(s=5, Stop) = 1 * [5 + 0] = 5
(recall: T(s, Stop, Done) = 1 and R(s, Stop, Done) = s for s ≤ 5; R(s, a, s') = 0 otherwise)
A small Q-value helper illustrating this backup is sketched below.
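As a sketch only (reusing the hypothetical transitions, reward, and GAMMA helpers from the earlier block), the one-step backup used on this slide can be written as a small Q-value function:

```python
def q_value(s, a, V):
    """Q(s, a) = sum over s' of T(s, a, s') * [R(s, a, s') + GAMMA * V(s')]."""
    return sum(p * (reward(s, a, s_next) + GAMMA * V[s_next])
               for s_next, p in transitions(s, a))

# V1 from V0 = 0 everywhere: Stop yields s, Draw yields 0.
V0 = {s: 0.0 for s in STATES}
print(q_value(2, "Stop", V0))   # 2.0
print(q_value(2, "Draw", V0))   # 0.0
```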
MDPs: Micro-Blackjack
b) Fill in the following table of value iteration values for the first 4 iterations.
States        0       2       3       4       5
V0            0       0       0       0       0
  Draw, Stop  0, 0    0, 2    0, 3    0, 4    0, 5
V1            0       2       3       4       5
V2
V3
V4
(Each "Draw, Stop" row lists the two action values computed from the V row above it; the V row below is their maximum.)
MDPs: Micro-Blackjack
b) Fill in the following table of value iteration values for the first 4 iterations.
• Value iteration:
• V2: compute the value of each action (Draw, Stop) from V1 and take the max
o V2(s, Draw) = Σs' T(s, Draw, s') [R(s, Draw, s') + 1 * V1(s')]
o V2(s, Draw) = Σs' T(s, Draw, s') V1(s')   (since R(s, Draw, s') = 0)
o V2(s=0, Draw) = 1/3*2 + 1/3*3 + 1/3*4 = 3
o V2(s=2, Draw) = 1/3*4 + 1/3*5 + 1/3*0 = 3
o V2(s=3, Draw) = 1/3*5 + 2/3*0 = 5/3
o V2(s=4, Draw) = 0
o V2(s=5, Draw) = 0
(The Stop values are unchanged: V2(s, Stop) = s. A value-iteration loop reproducing the full table is sketched below.)
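Continuing the same sketch (with the hypothetical STATES, ACTIONS, and q_value helpers from above), a short value-iteration loop reproduces the V1 through V4 rows of the table:

```python
def value_iteration_step(V):
    """One Bellman backup: V_{k+1}(s) = max_a Q(s, a); the terminal state stays 0."""
    return {s: (0.0 if s == "Done" else max(q_value(s, a, V) for a in ACTIONS))
            for s in STATES}

V = {s: 0.0 for s in STATES}            # V0
for k in range(1, 5):                    # V1 .. V4
    V = value_iteration_step(V)
    print(k, [round(V[s], 3) for s in [0, 2, 3, 4, 5]])
# The last line should match V4 = [10/3, 3, 3, 4, 5].
```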
MDPs: Micro-Blackjack
b) Fill in the following table of value iteration values for the first 4 iterations.
States        0       2       3        4       5
V0            0       0       0        0       0
  Draw, Stop  0, 0    0, 2    0, 3     0, 4    0, 5
V1            0       2       3        4       5
  Draw, Stop  3, 0    3, 2    5/3, 3   0, 4    0, 5
V2            3       3       3        4       5
V3
V4
MDPs: Micro-Blackjack
b) Fill in the following table of value iteration values for the first 4 iterations.
• Value iteration:
• V3: compute the value of each action (Draw, Stop) from V2 and take the max
o V3(s, Draw) = Σs' T(s, Draw, s') [R(s, Draw, s') + 1 * V2(s')]
o V3(s, Draw) = Σs' T(s, Draw, s') V2(s')   (since R(s, Draw, s') = 0)
o V3(s=0, Draw) = 1/3*3 + 1/3*3 + 1/3*4 = 10/3
o V3(s=2, Draw) = 1/3*4 + 1/3*5 + 1/3*0 = 3
o V3(s=3, Draw) = 1/3*5 + 2/3*0 = 5/3
o V3(s=4, Draw) = 0
o V3(s=5, Draw) = 0
MDPs: Micro-Blackjack
b) Fill in the following table of value iteration values for the first 4 iterations.
States        0        2       3        4       5
V0            0        0       0        0       0
  Draw, Stop  0, 0     0, 2    0, 3     0, 4    0, 5
V1            0        2       3        4       5
  Draw, Stop  3, 0     3, 2    5/3, 3   0, 4    0, 5
V2            3        3       3        4       5
  Draw, Stop  10/3, 0  3, 2    5/3, 3   0, 4    0, 5
V3            10/3     3       3        4       5
V4
MDPs: Micro-Blackjack
b) Fill in the following table of value iteration values for the first 4 iterations.
• Value iteration:
• V4: compute the value of each action (Draw, Stop) from V3 and take the max
o V4(s, Draw) = Σs' T(s, Draw, s') [R(s, Draw, s') + 1 * V3(s')]
o V4(s, Draw) = Σs' T(s, Draw, s') V3(s')   (since R(s, Draw, s') = 0)
o V4(s=0, Draw) = 1/3*3 + 1/3*3 + 1/3*4 = 10/3
o V4(s=2, Draw) = 1/3*4 + 1/3*5 + 1/3*0 = 3
o V4(s=3, Draw) = 1/3*5 + 2/3*0 = 5/3
o V4(s=4, Draw) = 0
o V4(s=5, Draw) = 0
MDPs: Micro-Blackjack
b) Fill in the following table of value iteration values for the first 4 iterations.
States        0        2       3        4       5
V0            0        0       0        0       0
  Draw, Stop  0, 0     0, 2    0, 3     0, 4    0, 5
V1            0        2       3        4       5
  Draw, Stop  3, 0     3, 2    5/3, 3   0, 4    0, 5
V2            3        3       3        4       5
  Draw, Stop  10/3, 0  3, 2    5/3, 3   0, 4    0, 5
V3            10/3     3       3        4       5
  Draw, Stop  10/3, 0  3, 2    5/3, 3   0, 4    0, 5
V4            10/3     3       3        4       5
MDPs: Micro-Blackjack
b) Fill in the following table of value iteration values for the first 4 iterations.
States 0 2 3 4 5
V0 0 0 0 0 0
V1 0 2 3 4 5
V2 3 3 3 4 5
V3 10/3 3 3 4 5
V4 10/3 3 3 4 5
MDPs: Micro-Blackjack
c) You should have noticed that value iteration converged above. What
is the optimal policy for the MDP?
States   0   2   3   4   5
π*
MDPs: Micro-Blackjack
c) You should have noticed that value iteration converged above. What
is the optimal policy for the MDP?
• The optimal policy π*(s) selects the optimal action from each state s
• Q*(s, a) = expected utility of taking action a from state s and acting optimally thereafter → the optimal action is the one with the highest Q(s, a)
States   0           2          3           4          5
         Draw  Stop  Draw Stop  Draw  Stop  Draw Stop  Draw Stop
V1       0     0     0    2     0     3     0    4     0    5
V2       3     0     3    2     5/3   3     0    4     0    5
V3       10/3  0     3    2     5/3   3     0    4     0    5
V4       10/3  0     3    2     5/3   3     0    4     0    5
At convergence (V4), Draw has the higher Q-value at states 0 and 2, and Stop wins at states 3, 4, and 5 (see also the extraction sketch below).
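As a small illustration (again reusing the hypothetical q_value helper and the dictionary V produced by the value-iteration sketch above), the optimal policy can be read off by taking the argmax of the Q-values in each state:

```python
# Extract pi*(s) = argmax_a Q(s, a) from the converged values V.
pi_star = {s: max(ACTIONS, key=lambda a: q_value(s, a, V))
           for s in [0, 2, 3, 4, 5]}
print(pi_star)   # expected: {0: 'Draw', 2: 'Draw', 3: 'Stop', 4: 'Stop', 5: 'Stop'}
```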
MDPs: Micro-Blackjack
c) You should have noticed that value iteration converged above. What
is the optimal policy for the MDP?
States   0      2      3      4      5
π*       Draw   Draw   Stop   Stop   Stop
MDPs: Micro-Blackjack
d) Perform one iteration of policy iteration for one step of this MDP,
starting from the fixed policy below:
States   0      2      3      4      5
πi       Draw   Stop   Draw   Stop   Draw
Vπi
πi+1
MDPs: Micro-Blackjack
d) Perform one iteration of policy iteration for one step of this MDP,
starting from the fixed policy below:
• Utilities for a fixed policy: evaluate Vπ by following π instead of maximizing over actions:
o Vπ(s) = Σs' T(s, π(s), s') [R(s, π(s), s') + 1 * Vπ(s')]
MDPs: Micro-Blackjack
d) Perform one iteration of policy iteration for one step of this MDP,
starting from the fixed policy below:
• Utilities for the fixed policy πi:
o Vπ(5) = 0 (Draw from 5 always busts)
o Vπ(4) = 4 (Stop)
o Vπ(3) = 1/3 * Vπ(5) = 0 (Draw)
o Vπ(2) = 2 (Stop)
o Vπ(0) = 1/3*Vπ(2) + 1/3*Vπ(3) + 1/3*Vπ(4) = 1/3*2 + 0 + 1/3*4 = 2 (Draw)
States   0      2      3      4      5
πi       Draw   Stop   Draw   Stop   Draw
Vπi      2      2      0      4      0
πi+1
(A policy-evaluation sketch follows below.)
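A possible policy-evaluation sketch for the fixed policy πi, reusing the hypothetical helpers from the earlier blocks; since every policy reaches Done within a few steps, repeatedly applying the fixed-policy backup converges quickly:

```python
# Evaluate the fixed policy pi_i from part (d): V_pi(s) = Q(s, pi_i(s)) under V_pi.
pi_i = {0: "Draw", 2: "Stop", 3: "Draw", 4: "Stop", 5: "Draw"}

V_pi = {s: 0.0 for s in STATES}
for _ in range(10):                      # a few sweeps suffice for this small MDP
    V_pi = {s: (0.0 if s == "Done" else q_value(s, pi_i[s], V_pi)) for s in STATES}
print([round(V_pi[s], 6) for s in [0, 2, 3, 4, 5]])   # expected: [2.0, 2.0, 0.0, 4.0, 0.0]
```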
MDPs: Micro-Blackjack
d) Perform one iteration of policy iteration for one step of this MDP,
starting from the fixed policy below:
• Computing actions from Q-values (policy improvement), using Vπ from the previous slide (D = Draw, S = Stop):
• s = 5: Q(5, D) = 0; Q(5, S) = 5 → Qmax(s=5) = 5 → a = Stop
• s = 4: Q(4, D) = 0; Q(4, S) = 4 → Qmax(s=4) = 4 → a = Stop
• s = 3: Q(3, D) = T(3, D, 5)*Vπ(5) + T(3, D, Done)*Vπ(Done) = 0 + 0 = 0
  Q(3, S) = 3 → Qmax(s=3) = 3 → a = Stop
• s = 2: Q(2, D) = T(2, D, 4)*Vπ(4) + T(2, D, 5)*Vπ(5) + T(2, D, Done)*Vπ(Done)
  Q(2, D) = 1/3*4 + 1/3*0 + 1/3*0 = 4/3
  Q(2, S) = 2 → Qmax(s=2) = 2 → a = Stop
• s = 0: Q(0, D) = T(0, D, 2)*Vπ(2) + T(0, D, 3)*Vπ(3) + T(0, D, 4)*Vπ(4)
  Q(0, D) = 1/3*2 + 1/3*0 + 1/3*4 = 2
  Q(0, S) = 0 → Qmax(s=0) = 2 → a = Draw
(A one-step policy-improvement sketch follows below.)
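And a one-step policy-improvement sketch (hypothetical names, reusing q_value and the V_pi computed above) that produces πi+1:

```python
# Policy improvement: act greedily with respect to V_pi.
pi_next = {s: max(ACTIONS, key=lambda a: q_value(s, a, V_pi))
           for s in [0, 2, 3, 4, 5]}
print(pi_next)   # expected: {0: 'Draw', 2: 'Stop', 3: 'Stop', 4: 'Stop', 5: 'Stop'}
```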
MDPs: Micro-Blackjack
d) Perform one iteration of policy iteration for one step of this MDP,
starting from the fixed policy below:
• Computing Actions from Q-Values:
States   0      2      3      4      5
πi       Draw   Stop   Draw   Stop   Draw
Vπi      2      2      0      4      0
πi+1     Draw   Stop   Stop   Stop   Stop