ML - Unit-3 - Reinforcement Learning
Unit- 3
Dr. Prabhakaran
Assistant Professor, Department of Computer Application
1. Understand RL task formulation.
2. Understand Tabular based solutions.
3. Identify Function approximation solutions.
4. Devise Policy gradient from basic (REINFORCE)
towards advanced topics.
5. Understand Model-based reinforcement
learning.
INTRODUCTION TO RL & MARKOV DECISION
PROCESS
MODEL-FREE PREDICTION & MODEL-FREE
CONTROL
VALUE FUNCTION APPROXIMATION & POLICY
GRADIENT METHODS
INTEGRATING PLANNING WITH LEARNING &
HIERARCHICAL RL
DEEP RL & MULTI-AGENT RL
INTRODUCTION TO RL & MARKOV DECISION
PROCESS
User Interaction
Control
Finance
Technology
RL – Applications and Scope
Artificial Neural Networks – car sales prediction
Deep Neural Networks – classification
Prophet time series – crime rate
Prophet time series – tomato / crops
LeNet deep network – traffic sign classification
NLP – email spam filters
NLP – reviews
User-based collaborative filtering – recommendation
Taxonomy of AI
Model-Free
Model Based
Value Based
Policy Based
Off Policy
On Policy
Learning Comparison
Supervised learning:
A situation in which sample (input, output) pairs of the function to be learned
can be perceived.
Unsupervised learning
Hidden patterns in the data can be found using the unsupervised learning model.
Reinforcement Learning
When the agent acts on its environment, it receives some evaluation of its
action (the reinforcement), but it is not told which action is the correct one
to achieve its goal.
Learning Comparison
RL model
Each percept (e) is enough to determine the state (the
state is accessible)
The agent can decompose the reward component
from a percept.
The agent's task is to find an optimal policy, mapping
states to actions, that maximizes a long-run measure of
the reinforcement
Think of the reinforcement as a reward
This can be modelled as an MDP!
Markov Decision Process
Control Tasks
State (St)
Action (At)
Rewards (Rt)
Agent
Environment
Markov Decision Process
- Templates
- MDP
Discrete-time stochastic control process
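For reference, and following the standard textbook notation (the tuple is not spelled out on the slide itself), an MDP can be written as

M = (S, A, P, R, \gamma)

where S is the set of states, A the set of actions, P(s' \mid s, a) the state-transition probabilities, R(s, a) the reward function, and \gamma the discount factor with 0 \le \gamma \le 1.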
Markov Decision Process
Finite vs Infinite
Markov Decision Process
Episodic vs Continuing
Markov Decision Process
Trajectory vs Episode
Markov Decision Process
Rewards and Returns
Markov Decision Process
Discount Factor
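Using the standard definitions, the return G_t is the discounted sum of future rewards:

G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}, \qquad 0 \le \gamma \le 1

A discount factor close to 0 makes the agent myopic; a value close to 1 makes it weigh future rewards almost as strongly as immediate ones.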
Markov Decision Process
Policy
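In the usual notation, a policy maps states to (a distribution over) actions:

\pi(a \mid s) = P(A_t = a \mid S_t = s)   (stochastic policy)
a = \pi(s)   (deterministic policy)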
Markov Decision Process
State Values
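The state-value function of a policy \pi is the expected return when starting in state s and following \pi thereafter:

v_\pi(s) = E_\pi[ G_t \mid S_t = s ] = E_\pi[ \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \mid S_t = s ]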
Markov Decision Process
Bellman Equation
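In its standard form, the Bellman expectation equation writes v_\pi recursively in terms of its successor states:

v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) [ r + \gamma v_\pi(s') ]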
Solving MDP
MDP – Bellman Optimality Equations
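The Bellman optimality equations replace the expectation over the policy with a maximum over actions:

v_*(s) = \max_a \sum_{s', r} p(s', r \mid s, a) [ r + \gamma v_*(s') ]
q_*(s, a) = \sum_{s', r} p(s', r \mid s, a) [ r + \gamma \max_{a'} q_*(s', a') ]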
MODEL-FREE PREDICTION
Model-free reinforcement learning is a category of reinforcement learning algorithms that do not require a model of the environment to operate. Model-free algorithms learn directly from experience or trial-and-error and use the feedback they receive to update their internal policies or value functions.
MODEL-FREE PREDICTION
Model-free prediction is the problem of estimating the value function of a given policy without a model of the environment.
The simplest method is Monte Carlo learning.
MODEL-FREE PREDICTION
The main benefit of the model-free approach is its computational efficiency. Because of this low computational demand, a model-free algorithm can usually support a larger representation than a model-based algorithm.
'Model-Free' reinforcement learning algorithms
Monte Carlo Methods
1. MC Prediction
2. MC Estimation of Action Value
3. MC Control
4. MC Control without Exploring Starts
5. Off-Policy Prediction via Importance Sampling
6. Incremental Implementations
7. Off-Policy MC Control
8. Discounting-aware importance sampling
Monte Carlo Methods
- Estimating value functions
- Discovering optimal policies
# Do not assume complete knowledge of the environment; they require only experience.
# Learning from actual experience needs no prior knowledge of the environment's dynamics.
# Although a model may still be required, it only has to generate sample transitions, not the complete probability distributions.
# Value estimates are obtained by averaging the sample returns (see the sketch after this list).
- To ensure that well-defined returns are available, they are defined for episodic tasks, i.e. experience is divided into episodes.
- Incremental episode-by-episode, not step-by-step.
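To make "averaging the sample returns" concrete, below is a minimal sketch of first-visit Monte Carlo prediction in Python. It assumes a hypothetical generate_episode() callable that runs the policy being evaluated for one episode and returns a list of (state, reward) pairs; it is an illustration of the idea, not a reference implementation.

from collections import defaultdict

def mc_prediction(generate_episode, num_episodes, gamma=0.9):
    # Running totals used to average the sample returns per state.
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    V = defaultdict(float)  # value estimate for each visited state

    for _ in range(num_episodes):
        # One episode = [(state, reward), ...], where reward is the reward
        # received after leaving that state (R_{t+1} in the notation above).
        episode = generate_episode()
        G = 0.0
        # Walk the episode backwards, accumulating the discounted return.
        for t in reversed(range(len(episode))):
            state, reward = episode[t]
            G = gamma * G + reward
            # First-visit MC: only use the return from the first occurrence
            # of the state in this episode.
            if state not in (s for s, _ in episode[:t]):
                returns_sum[state] += G
                returns_count[state] += 1
                V[state] = returns_sum[state] / returns_count[state]
    return V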
Monte Carlo Prediction
1. Prediction One
2. Prediction Two