
REINFORCEMENT LEARNING

Assignment-1

Name: Class: AIML-A Roll No:

1. Explain reinforcement learning in detail.


A. Reinforcement Learning: Reinforcement Learning is a feedback-based Machine Learning technique in which an agent learns to behave in an environment by performing actions and seeing the results of those actions. For each good action, the agent gets positive feedback, and for each bad action, the agent gets negative feedback or a penalty.
• In Reinforcement Learning, the agent learns automatically using feedback, without any labeled data, unlike supervised learning.
• Since there is no labeled data, the agent is bound to learn from its experience only.
• RL solves a specific type of problem where decision making is sequential and the goal is long-term, such as game-playing, robotics, etc.
• The agent interacts with the environment and explores it by itself. The primary goal of an agent in reinforcement learning is to improve its performance by collecting the maximum positive reward.
Example: Suppose there is an AI agent present within a maze environment, and its goal is to find the diamond. The agent interacts with the environment by performing some actions, and based on those actions, the state of the agent changes, and it also receives a reward or penalty as feedback.
The agent keeps doing these three things (take an action, change state or remain in the same state, and get feedback), and by doing so it learns and explores the environment.
The agent learns which actions lead to positive feedback or rewards and which actions lead to negative feedback or a penalty. As a positive reward, the agent gets a positive point, and as a penalty, it gets a negative point.
Terms used in Reinforcement Learning:
Agent: An entity that can perceive/explore the environment and act upon it.
Environment: The situation in which an agent is present or by which it is surrounded. In RL, we assume a stochastic environment, which means it is random in nature.
Action: Actions are the moves taken by an agent within the environment.
State: A state is the situation returned by the environment after each action taken by the agent.
Reward: Feedback returned to the agent from the environment to evaluate the agent's action.
Policy: A policy is the strategy applied by the agent to decide the next action based on the current state.
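To make these terms concrete, the following short Python sketch (added here for illustration; it is not part of the original answer) shows the basic agent-environment interaction loop. The MazeEnvironment class, its reset() and step() methods, and the random policy are hypothetical names used only for this example.

import random

class MazeEnvironment:
    # A toy stand-in for the maze; a real maze would move the agent on a grid.
    def __init__(self):
        self.state = "S9"      # assumed start block
        self.goal = "S4"       # assumed diamond block

    def reset(self):
        self.state = "S9"
        return self.state

    def step(self, action):
        # Placeholder transition: pick the next block at random.
        next_state = random.choice(["S1", "S2", "S3", "S4", "S5", "S9"])
        reward = 1 if next_state == self.goal else 0
        done = next_state == self.goal
        self.state = next_state
        return next_state, reward, done

env = MazeEnvironment()
state = env.reset()
done = False
while not done:
    action = random.choice(["up", "down", "left", "right"])   # a random policy
    state, reward, done = env.step(action)                    # feedback from the environment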
2. How does reinforcement learning work?
A: To understand the working process of RL, we need to consider two main things:
• Environment: It can be anything such as a room, a maze, a football ground, etc.
• Agent: An intelligent agent such as an AI robot.

Example:
The agent starts at the very first block of the maze. The maze consists of an S6 block, which is a wall, an S8 block, which is a fire pit, and an S4 block, which holds the diamond.
1. The agent cannot cross the S6 block, as it is a solid
wall.
2. If the agent reaches the S4 block, it gets a +1 reward; if it reaches the fire pit, it gets a -1 reward.
3. It can take four actions: move up, move down,
move left, and move right.
The agent can take any path to reach the final point, but it needs to do so in as few steps as possible. Suppose the agent follows the path S9-S5-S1-S2-S3; it will then get the +1 reward point.
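For reference, the reward structure just described can be written down as a small Python sketch (added for illustration; the block names follow the description above):

# Rewards for entering each special block; every other block gives 0.
rewards = {"S4": +1,    # diamond block
           "S8": -1}    # fire pit
blocked = {"S6"}        # solid wall, cannot be crossed
actions = ["up", "down", "left", "right"]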
The agent will try to remember the preceding steps it has taken to reach the final step. To memorize the steps, it assigns the value 1 to each previous step. Now the agent has successfully stored the previous steps by assigning the value 1 to each of those blocks. But what will the agent do if it starts moving from a block that has a block with value 1 on both sides?
Consider that situation: it will be difficult for the agent to decide whether it should go up or down, since each block has the same value. So, the above approach is not suitable for the agent to reach the destination. To solve this problem, we will use the Bellman equation, which is the main concept behind reinforcement learning.
The Bellman Equation:
The Bellman equation was introduced by the mathematician Richard Ernest Bellman in the year 1953, and hence it is called the Bellman equation. It is associated with dynamic programming and is used to calculate the values of a decision problem at a certain point by including the values of previous states.
It is a way of calculating the value functions in dynamic programming, an idea that leads to modern reinforcement learning.

The key elements used in the Bellman equation are:
• The action performed by the agent is referred to as "a".
• The state that occurs by performing the action is "s".
• The reward/feedback obtained for each good and bad action is "R".
• The discount factor is gamma, "γ".

The Bellman equation can be written as:

V(s) = max_a [R(s,a) + γV(s')]

Where
V(s) = the value calculated at a particular point.
R(s,a) = the reward obtained at state s by performing action a.
γ = the discount factor.
V(s') = the value of the state s' reached by that action (computed in the previous step of the backward calculation).
The maximum is taken over all actions a available in state s.
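As a quick illustration, the following Python sketch (added here; not part of the original answer) performs one Bellman backup, assuming we already know, for each candidate action, its immediate reward and the value of the state it leads to:

# One Bellman backup: V(s) = max over actions a of [R(s,a) + gamma * V(s')]
def bellman_backup(candidates, gamma=0.9):
    # candidates: list of (reward, next_state_value) pairs, one per action
    return max(reward + gamma * next_value for reward, next_value in candidates)

# Example: from s2 the useful move leads to s3 (reward 0, V(s3) = 1)
print(bellman_backup([(0, 1.0)]))   # prints 0.9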

So now, using the Bellman equation, we will find the value at each state of the given environment. We will start from the block which is next to the target block.
For the 1st block: V(s3) = max [R(s,a) + γV(s')], here V(s') = 0 because there is no further state to move to. V(s3) = max[R(s,a)] => V(s3) = max[1] => V(s3) = 1.
For the 2nd block: V(s2) = max [R(s,a) + γV(s')], here γ = 0.9 (say), V(s') = 1, and R(s,a) = 0, because there is no reward at this state. V(s2) = max[0.9(1)] => V(s2) = max[0.9] => V(s2) = 0.9.
For the 3rd block: V(s1) = max [R(s,a) + γV(s')], here γ = 0.9, V(s') = 0.9, and R(s,a) = 0, because there is no reward at this state either. V(s1) = max[0.9(0.9)] => V(s1) = max[0.81] => V(s1) = 0.81.
For the 4th block: V(s5) = max [R(s,a) + γV(s')], here γ = 0.9, V(s') = 0.81, and R(s,a) = 0, because there is no reward at this state either. V(s5) = max[0.9(0.81)] => V(s5) = max[0.729] => V(s5) ≈ 0.73.
For the 5th block: V(s9) = max [R(s,a) + γV(s')], here γ = 0.9, V(s') = 0.73, and R(s,a) = 0, because there is no reward at this state either. V(s9) = max[0.9(0.73)] => V(s9) = max[0.657] => V(s9) ≈ 0.66.
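The same backward calculation can be reproduced in a few lines of Python (a sketch added for illustration, assuming the path s9 → s5 → s1 → s2 → s3 and γ = 0.9 used above):

gamma = 0.9
backward_path = ["s3", "s2", "s1", "s5", "s9"]   # order of the backward calculation
values = {}
previous_value = 0.0                             # no further state beyond the diamond
for state in backward_path:
    reward = 1 if state == "s3" else 0           # only the move out of s3 reaches the diamond
    values[state] = reward + gamma * previous_value
    previous_value = values[state]

print({s: round(v, 4) for s, v in values.items()})
# {'s3': 1.0, 's2': 0.9, 's1': 0.81, 's5': 0.729, 's9': 0.6561}
# 0.729 and 0.6561 round to the 0.73 and 0.66 quoted above.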

Now we will move further to the 6th block, and here the agent may change its route because it always tries to find the optimal path. So now, let's consider the block next to the fire pit.
From there the agent has three options to move: if it moves towards the blue box (the solid wall), it will feel a bump; if it moves into the fire pit, it will get the -1 reward. But since we are considering only positive rewards, it will move upwards only. The complete block values can be calculated using this same formula, as the sketch below illustrates.
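Here is a short Python value-iteration sketch (added for illustration; not part of the original answer). It assumes the 3 × 4 grid layout implied by the path s9-s5-s1-s2-s3: s1–s4 on the top row, s5–s8 in the middle row, s9–s12 on the bottom row, with the diamond at s4, the wall at s6, and the fire pit at s8.

# Value iteration over the assumed 3 x 4 maze grid.
gamma = 0.9
rows, cols = 3, 4
wall, diamond, fire = (1, 1), (0, 3), (1, 3)          # s6, s4, s8 as (row, col)
cells = [(r, c) for r in range(rows) for c in range(cols)]
V = {cell: 0.0 for cell in cells}
moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]            # up, down, left, right

for _ in range(50):                                   # sweep until the values settle
    for cell in cells:
        if cell in (wall, diamond, fire):
            continue                                  # wall and terminal blocks keep value 0
        backups = []
        for dr, dc in moves:
            nxt = (cell[0] + dr, cell[1] + dc)
            if nxt not in V or nxt == wall:
                continue                              # cannot leave the grid or cross the wall
            reward = 1 if nxt == diamond else (-1 if nxt == fire else 0)
            backups.append(reward + gamma * V[nxt])   # R(s,a) + gamma * V(s')
        V[cell] = max(backups)

print(round(V[(2, 0)], 2))                            # value of s9 -> 0.66, as computed above

Every block then ends up with the value of the best route to the diamond, so the agent can simply follow increasing values to reach the goal.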
