
Reinforcement Learning
Introduction
Example:

• Any problem where decision-making is sequential and the goal is long-term, such as:
• 8-puzzle
• Chess
• Tic-tac-toe
• Any state-action problem
Elements of RL
Contd.
Agent-environment interface
Types of Reinforcement Learning

• There are mainly two types of reinforcement learning:
• Positive Reinforcement
• Negative Reinforcement
Positive Reinforcement:

• Positive reinforcement means adding a stimulus to increase the likelihood that the expected behavior occurs again. It has a positive impact on the agent's behavior and increases the strength of that behavior.
• This type of reinforcement can sustain changes for a long time, but too much positive reinforcement may lead to an overload of states, which can diminish the results.
Negative Reinforcement:

• Negative reinforcement is the opposite of positive reinforcement: it increases the likelihood that a specific behavior occurs again by removing or avoiding a negative condition.
• Depending on the situation and behavior, it can be more effective than positive reinforcement, but it tends to reinforce only the minimum required behavior.
How Does Reinforcement Learning Work?
• To understand how RL works, we need to consider two main things:
• Environment: It can be anything, such as a room, a maze, a football ground, etc.
• Agent: An intelligent agent, such as an AI robot.
• Let's take the example of a maze environment that the agent needs to explore. Consider the image below:
Contd.

• In the image above, the agent starts at the very first block of the maze. The maze contains an S6 block, which is a wall, an S8 block, which is a fire pit, and an S4 block, which holds a diamond.
• The agent cannot cross the S6 block, as it is a solid wall. If the agent reaches the S4 block, it gets a +1 reward; if it reaches the fire pit, it gets a -1 reward. It can take four actions: move up, move down, move left, and move right.
• The agent can take any path to reach the final point, but it needs to do so in as few steps as possible. Suppose the agent follows the path S9-S5-S1-S2-S3; it will then receive the +1 reward.
• The agent will try to remember the preceding steps it took to reach the final block. To memorize the steps, it assigns a value of 1 to each previous step. Consider the following step:
• Now the agent has stored the previous steps by assigning the value 1 to each previous block. But what will the agent do if it starts from a block that has a block of value 1 on both sides? Consider the diagram below:
Contd.

• It is difficult for the agent to decide whether to go up or down, because each block has the same value. So the above approach is not suitable for reaching the destination. To solve this problem, we will use the Bellman equation, which is the main concept behind reinforcement learning.
The Bellman Equation
• The Bellman equation was introduced by the mathematician Richard Ernest Bellman in 1953, and hence it is called the Bellman equation. It is associated with dynamic programming and is used to calculate the value of a decision problem at a certain point by including the values of previous states.
• It is a way of calculating value functions in dynamic programming, and it leads to modern reinforcement learning.
• The key elements used in the Bellman equation are:
• The action performed by the agent, "a"
• The state reached by performing the action, "s"
• The reward/feedback obtained for each good and bad action, "R"
• The discount factor, gamma "γ"
• The Bellman equation can be written as:
• V(s) = max[R(s, a) + γV(s')], where s' is the next state and the maximum is taken over the available actions a.
• We take the maximum over all actions because the agent always tries to find the optimal solution.
• Now, using the Bellman equation, we will find the value of each state of the given environment. We start from the block that is next to the target block.
• For the 1st block:
• V(s3) = max[R(s, a) + γV(s')], where V(s') = 0 because there is no further state to move to.
• V(s3) = max[R(s, a)] => V(s3) = max[1] => V(s3) = 1.
• For the 2nd block:
• V(s2) = max[R(s, a) + γV(s')], where γ = 0.9 (say), V(s') = 1, and R(s, a) = 0, because there is no reward at this state.
• V(s2) = max[0.9(1)] => V(s2) = max[0.9] => V(s2) = 0.9
• For the 3rd block:
• V(s1) = max[R(s, a) + γV(s')], where γ = 0.9, V(s') = 0.9, and R(s, a) = 0, because there is no reward at this state either.
• V(s1) = max[0.9(0.9)] => V(s1) = max[0.81] => V(s1) = 0.81
• For the 4th block:
• V(s5) = max[R(s, a) + γV(s')], where γ = 0.9, V(s') = 0.81, and R(s, a) = 0, because there is no reward at this state either.
• V(s5) = max[0.9(0.81)] => V(s5) = max[0.729] => V(s5) ≈ 0.73
• For the 5th block:
• V(s9) = max[R(s, a) + γV(s')], where γ = 0.9, V(s') = 0.73, and R(s, a) = 0, because there is no reward at this state either.
• V(s9) = max[0.9(0.73)] => V(s9) = max[0.657] => V(s9) ≈ 0.66
• Now we move on to the 6th block, where the agent may change its route because it always tries to find the optimal path. So let's now consider the block next to the fire pit.
• The agent has three options to move: if it moves towards the blue box, it will bump into the wall, and if it moves into the fire pit, it will get the -1 reward. But since we are considering only positive rewards, it will move upwards only. The values of the remaining blocks are calculated using the same formula. Consider the image below; a small code sketch of the same calculation follows.
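As a concrete check of the calculation above, here is a minimal Python sketch that backs the values up along the chosen path. The rewards and γ = 0.9 come from the example; the dictionary representation and variable names are my own.

# Back up state values along the path S3 -> S2 -> S1 -> S5 -> S9 using
# V(s) = max[R(s, a) + gamma * V(s')] with gamma = 0.9, as in the slides.
gamma = 0.9

# R(s, a) for the best action from each state: moving from S3 into the
# diamond block S4 pays +1; every other move pays 0.
reward = {"s3": 1.0, "s2": 0.0, "s1": 0.0, "s5": 0.0, "s9": 0.0}

V = {}
V["s3"] = reward["s3"]                      # 1.0 (no further state to move to)
V["s2"] = reward["s2"] + gamma * V["s3"]    # 0.9
V["s1"] = reward["s1"] + gamma * V["s2"]    # 0.81
V["s5"] = reward["s5"] + gamma * V["s1"]    # 0.729 (~0.73)
V["s9"] = reward["s9"] + gamma * V["s5"]    # 0.6561 (~0.66)

for s in ["s3", "s2", "s1", "s5", "s9"]:
    print(s, round(V[s], 2))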
Q-learning
• Q-learning is an off-policy RL algorithm used for temporal-difference learning. Temporal-difference learning methods compare temporally successive predictions.
• It learns the value function Q(s, a), which tells how good it is to take action "a" in a particular state "s".
• Q-learning is a popular model-free reinforcement learning algorithm based on the Bellman equation.
• The main objective of Q-learning is to learn a policy that tells the agent which actions to take, and under which circumstances, in order to maximize the reward.
• The Q in Q-learning stands for quality: it specifies the quality of an action taken by the agent.
• The goal of the agent in Q-learning is to maximize the value of Q.
• The value of Q can be derived from the Bellman equation.
Algorithm
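The algorithm figure from the slides is not reproduced here. As a minimal sketch, tabular Q-learning can be written as below; the env.reset()/env.step() interface, the epsilon-greedy exploration, and the hyperparameter values are assumptions of this sketch, not part of the original slides.

import random

def q_learning(env, n_states, n_actions,
               episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning. `env` is assumed to expose reset() -> state
    and step(action) -> (next_state, reward, done)."""
    # Initialize the Q-table with zeros: one row per state, one column per action.
    Q = [[0.0] * n_actions for _ in range(n_states)]

    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection: explore with probability epsilon.
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])

            s_next, r, done = env.step(a)

            # Temporal-difference update derived from the Bellman equation.
            best_next = max(Q[s_next])
            Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
            s = s_next
    return Q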
Example

• The value of Q can be derived from the Bellman equation. Consider the Bellman equation given below:
• V(s) = max[R(s, a) + γ Σ P(s, a, s') V(s')], where the sum runs over the possible next states s'.
• In this equation we have various components: the reward R, the discount factor (γ), the transition probabilities P, and the next states s'.
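To make the expectation term concrete, here is a minimal Python sketch of a one-step backup for a single action; the transition probabilities and next-state values are invented for illustration and are not from the slides.

# One-step Bellman backup for a single action with stochastic outcomes.
# The probabilities and next-state values below are illustrative only.
gamma = 0.9
reward = 0.0

# (P(s' | s, a), V(s')) for each possible next state s'.
transitions = [(0.8, 1.0), (0.1, 0.0), (0.1, -1.0)]

expected_next_value = sum(p * v for p, v in transitions)   # 0.8*1.0 + 0.1*0.0 + 0.1*(-1.0) = 0.7
action_value = reward + gamma * expected_next_value        # 0.63
print(action_value)   # the state value V(s) is the max of this quantity over all actions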
Contd.
• No Q-values are given yet, so first consider the image below:
• In the image above, we can see an agent that has three value options: V(s1), V(s2), and V(s3). Because this is an MDP (Markov Decision Process), the agent only cares about the current state and the future state. The agent can go in any direction (up, left, or right), so it needs to decide where to go in order to follow the optimal path. Here the agent moves on a probability basis and changes its state accordingly. But if we want specific moves, we need to work in terms of Q-values.
Contd.
• Q represents the quality of the actions at each state. So instead of using a single value per state, we use a state-action pair, i.e., Q(s, a). The Q-value specifies which action is more lucrative than the others, and the agent takes its next move according to the best Q-value. The Bellman equation can be used to derive the Q-value.
• When performing an action, the agent receives a reward R(s, a) and ends up in a certain state, so the Q-value equation is:
• Q(s, a) = R(s, a) + γ Σ P(s, a, s') max Q(s', a'), where the sum runs over the possible next states s' and the max is over the next actions a'.
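As a quick numeric illustration (the numbers are made up, not from the slides): with R(s, a) = 0, γ = 0.9, a deterministic transition to s', and max Q(s', a') = 0.81, the equation gives Q(s, a) = 0 + 0.9 × 0.81 = 0.729, so the agent would prefer this action over any alternative with a lower Q-value.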
Q-table

• A Q-table (or matrix) is created while performing Q-learning. The table is indexed by state-action pairs [s, a], and its values are initialized to zero. After each action, the table is updated and the Q-values are stored in it.
• The RL agent uses this Q-table as a reference to select the best action based on the Q-values.
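A minimal sketch of such a table, assuming nine maze blocks S1-S9 and the four moves from the example; the NumPy representation, the indices, and the learning rate are choices made for this sketch and are not given in the slides.

import numpy as np

# Nine states (S1..S9) and four actions are assumed for illustration.
n_states, n_actions = 9, 4
actions = ["up", "down", "left", "right"]

# Q-table initialized to zero: one row per state, one column per action.
Q = np.zeros((n_states, n_actions))

# After each step the corresponding entry is updated with the TD rule,
# e.g. moving "up" from S9 (index 8) to S5 (index 4) with no reward:
alpha, gamma = 0.1, 0.9
s, a, r, s_next = 8, 0, 0.0, 4
Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

# To act, the agent looks up the row for its current state and takes
# the action with the highest Q-value.
best_action = actions[int(Q[s].argmax())]
print(best_action)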
