
Reinforcement Learning

Unit 3

Dr. Prabhakaran
Assistant Professor, Department of Computer Application
Course Objectives
1. Understand RL task formulation.
2. Understand tabular solution methods.
3. Identify function approximation solutions.
4. Devise policy gradient methods, from the basic algorithm (REINFORCE) towards advanced topics.
5. Understand model-based reinforcement learning.
Course Structure
 INTRODUCTION TO RL & MARKOV DECISION PROCESS
 MODEL-FREE PREDICTION & MODEL-FREE CONTROL
 VALUE FUNCTION APPROXIMATION & POLICY GRADIENT METHODS
 INTEGRATING PLANNING WITH LEARNING & HIERARCHICAL RL
 DEEP RL & MULTI-AGENT RL
INTRODUCTION TO RL & MARKOV DECISION PROCESS
 The RL Problem: Markov Process
 Markov Reward Process
 Markov Decision Process and Bellman Equations
 Partially Observable MDPs
 Policy Evaluation
 Value Iteration, Policy Iteration
 DP Extensions and Convergence using Contraction Mapping
What is learning?
 Learning takes place as a result of interaction
between an agent and the world.
 Percepts received by an agent should be used
not only for acting, but also for improving the
agent’s ability to behave optimally in the
future to achieve the goal.
Reinforcement Learning
RL – Applications
 Video Gaming
 User Interaction
 Control
 Finance
 Technology
RL – Applications and Scope
 Artificial neural networks – car sales prediction
 Deep neural networks – classification
 Prophet time series – crime rate
 Prophet time series – tomato / crop yield
 LeNet deep network – traffic sign classification
 NLP – email spam filters
 NLP – reviews
 User-based collaborative filtering – recommendation
Taxonomy of AI
 Model-Free vs. Model-Based
 Value-Based vs. Policy-Based
 Off-Policy vs. On-Policy
Learning Comparison
 Supervised learning:
A situation in which sample (input, output) pairs of the function to be learned can be perceived.
 Unsupervised learning:
Hidden patterns in the data can be found using an unsupervised learning model.
 Reinforcement learning:
When the agent acts on its environment, it receives some evaluation of its action (reinforcement), but is not told which action is the correct one to achieve its goal.
RL model
 Each percept (e) is enough to determine the state (the state is accessible).
 The agent can decompose the reward component from a percept.
 The agent's task: to find an optimal policy, mapping states to actions, that maximizes a long-run measure of the reinforcement.
 Think of reinforcement as reward.
 This can be modelled as an MDP!
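In symbols (a standard formulation, not shown on the slide), the agent seeks the policy that maximizes the expected discounted return:

```latex
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} R_{t+1}\right]
```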
Markov Decision Process
Elements of a control task:
 State (St)
 Action (At)
 Reward (Rt)
 Agent
 Environment
Markov Decision Process
- Templates
- MDP: a discrete-time stochastic control process
Markov Decision Process
The Markov Property
If the process meets this property, it is known as a Markov process: the next state depends only on the current state, not on the full history.
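Formally (standard notation, reconstructed since the slide's equation did not survive extraction):

```latex
P(S_{t+1} \mid S_t) = P(S_{t+1} \mid S_1, S_2, \ldots, S_t)
```

That is, the current state captures all relevant information from the history.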
Markov Decision Process
 Finite vs. infinite MDPs
 Episodic vs. continuing tasks
Markov Decision Process
Trajectory vs. Episode
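In the usual notation (an assumption, since the slide body was an image), a trajectory is any sequence of states, actions, and rewards,

```latex
\tau = S_0, A_0, R_1, S_1, A_1, R_2, S_2, \ldots
```

while an episode is a trajectory that runs from a start state to a terminal state S_T.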
Markov Decision Process
Rewards and Returns
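The return is the cumulative reward collected from time t onward; for an episodic task ending at time T (standard definition, reconstructed here):

```latex
G_t = R_{t+1} + R_{t+2} + \cdots + R_T
```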
Markov Decision Process
Discount Factor
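With a discount factor γ between 0 and 1, the discounted return weights immediate rewards more heavily than distant ones (standard definition):

```latex
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^{2} R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}
```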
Markov Decision Process
Policy
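A policy maps each state to a probability distribution over actions (standard definition):

```latex
\pi(a \mid s) = P(A_t = a \mid S_t = s)
```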
Markov Decision Process
State Values
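The state-value function of a policy π is the expected return from state s when following π thereafter:

```latex
v_{\pi}(s) = \mathbb{E}_{\pi}\left[G_t \mid S_t = s\right]
```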
Markov Decision Process
Bellman Equation
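The Bellman equation expresses v_π recursively in terms of successor states (standard form, reconstructed since the slide's equation was lost):

```latex
v_{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\left[r + \gamma \, v_{\pi}(s')\right]
```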
Solving MDP
Dynamic programming solutions: policy evaluation, policy iteration, and value iteration.
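As a concrete illustration, below is a minimal tabular value-iteration sketch in Python. The transition format (a dict mapping (state, action) pairs to lists of (probability, next_state, reward) tuples) is an assumption made for illustration, not the lecture's own code.

```python
def value_iteration(states, actions, transitions, gamma=0.9, theta=1e-8):
    """Tabular value iteration via repeated Bellman optimality backups.

    transitions: dict mapping (state, action) -> list of
                 (probability, next_state, reward) tuples (assumed format).
    Returns the optimal value function V and a greedy policy.
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman optimality backup: best expected one-step lookahead
            q = [sum(p * (r + gamma * V[s2])
                     for p, s2, r in transitions[(s, a)])
                 for a in actions]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:  # stop once the value function has converged
            break
    # Extract a deterministic greedy policy from V
    policy = {
        s: max(actions,
               key=lambda a: sum(p * (r + gamma * V[s2])
                                 for p, s2, r in transitions[(s, a)]))
        for s in states
    }
    return V, policy
```

Policy iteration instead alternates full policy evaluation with greedy policy improvement; both converge to the optimal value function, as shown via the contraction-mapping argument listed in the course topics.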
MDP – Bellman Optimality Equations
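In standard notation, the Bellman optimality equations read (reconstructed, since the slide content did not survive extraction):

```latex
v_{*}(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a)\left[r + \gamma \, v_{*}(s')\right], \qquad
q_{*}(s, a) = \sum_{s', r} p(s', r \mid s, a)\left[r + \gamma \max_{a'} q_{*}(s', a')\right]
```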
MODEL-FREE PREDICTION
Model-free reinforcement learning is a category of reinforcement learning algorithms that do not require a model of the environment to operate. Model-free algorithms learn directly from experience (trial and error) and use the feedback they receive to update their internal policies or value functions.
MODEL-FREE PREDICTION
Model-free prediction means estimating the value function of a given policy without a model of the environment. The simplest method is Monte Carlo learning.
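Concretely, Monte Carlo learning estimates a state's value as the average of the returns observed after visiting it (standard formulation):

```latex
V(s) \approx \frac{1}{N(s)} \sum_{i=1}^{N(s)} G_{i}
```

where N(s) counts the visits to s and G_i is the return that followed the i-th visit.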
MODEL-FREE PREDICTION
The main benefit of the model-free
approach is its computational efficiency.
Due to a cheap computational demand,
a model-free algorithm can usually
support a representation larger than
that of a model-based algorithm.
'Model-Free' Reinforcement Learning Algorithms
Monte Carlo Methods
1. MC Prediction
2. MC Estimation of Action Value
3. MC Control
4. MC Control without Exploring Starts
5. Off-Policy Prediction via Importance Sampling (see the ratio below)
6. Incremental Implementations
7. Off-Policy MC Control
8. Discounting-aware Importance Sampling
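For off-policy prediction (item 5 above), the importance-sampling ratio reweights returns generated by a behaviour policy b so that they estimate values under the target policy π (standard definition):

```latex
\rho_{t:T-1} = \prod_{k=t}^{T-1} \frac{\pi(A_k \mid S_k)}{b(A_k \mid S_k)}
```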
Monte Carlo Methods
- Estimating value functions
- Discovering optimal policies
# No complete knowledge of the environment is required
# Learning from experience only
# Actual experience requires no prior knowledge of the environment's dynamics
# Although a model may be used, it need only generate sample transitions
# Value estimates are obtained by averaging sample returns
- To ensure that well-defined returns are available, MC methods are defined for episodic tasks, i.e., experience is divided into episodes
- Updates are incremental episode-by-episode, not step-by-step
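The episode-by-episode update can be carried out incrementally, without storing all past returns (standard incremental-mean form):

```latex
V(S_t) \leftarrow V(S_t) + \frac{1}{N(S_t)}\left[G_t - V(S_t)\right]
```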
Monte Carlo Prediction
1. First-visit MC prediction
2. Every-visit MC prediction
 Function approximation and eligibility traces
First-Visit MC Prediction
Example: Blackjack…
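Below is a minimal first-visit MC prediction sketch in Python. The generate_episode(policy) helper, assumed to return one complete episode as a list of (state, action, reward) tuples, is hypothetical and used only for illustration; with a Blackjack environment it would play one full hand under the fixed policy.

```python
from collections import defaultdict

def first_visit_mc_prediction(policy, generate_episode, num_episodes, gamma=1.0):
    """Estimate V(s) for a given policy by averaging first-visit returns.

    generate_episode(policy) is assumed (hypothetically) to return a list
    of (state, action, reward) tuples for one complete episode.
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    V = defaultdict(float)
    for _ in range(num_episodes):
        episode = generate_episode(policy)
        G = 0.0
        # Walk the episode backwards, accumulating the discounted return
        for t in range(len(episode) - 1, -1, -1):
            state, _, reward = episode[t]
            G = gamma * G + reward
            # First-visit check: update only on the earliest visit to state
            if state not in (s for s, _, _ in episode[:t]):
                returns_count[state] += 1
                returns_sum[state] += G
                V[state] = returns_sum[state] / returns_count[state]
    return V
```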
MC Estimation of Action Values
- Estimate q*
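The action-value function is the expected return after taking action a in state s and following π thereafter; q* is its optimal counterpart:

```latex
q_{\pi}(s, a) = \mathbb{E}_{\pi}\left[G_t \mid S_t = s, A_t = a\right], \qquad q_{*}(s, a) = \max_{\pi} q_{\pi}(s, a)
```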
MC – Control
- Approximating optimal policies
- Generalized Policy Iteration (GPI): alternating Policy Evaluation and Policy Improvement
Monte Carlo ES (Exploring Starts)
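In GPI, the improvement step makes the policy greedy with respect to the current action-value estimates (standard form):

```latex
\pi(s) \leftarrow \arg\max_{a} q(s, a)
```

Monte Carlo ES ensures that every state-action pair is visited by starting each episode from a randomly chosen state-action pair, so that the averages in the evaluation step are well defined.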
