Model-Based Reinforcement Learning
CS 285
Instructor: Sergey Levine
UC Berkeley
Today’s Lecture
1. Basics of model-based RL: learn a model, use model for control
• Why does the naïve approach not work?
• The effect of distributional shift in model-based RL
2. Uncertainty in model-based RL
3. Model-based RL with complex observations
4. Next time: policy learning with model-based RL
• Goals:
• Understand how to build model-based RL algorithms
• Understand the important considerations for model-based RL
• Understand the tradeoffs between different model class choices
Why learn the model?
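If we learn the dynamics f(s_t, a_t) ≈ s_{t+1}, we can use it for control by planning through it, for example with random-shooting MPC. Below is a minimal sketch under that assumption; the names dynamics_model, reward_fn, and all hyperparameters are illustrative, not the course's reference implementation.

import numpy as np

def plan_random_shooting(dynamics_model, reward_fn, s0,
                         horizon=15, n_candidates=1000, action_dim=2):
    """Pick the first action of the best random action sequence, evaluated by
    rolling each candidate out through the learned dynamics model."""
    best_return, best_action = -np.inf, None
    for _ in range(n_candidates):
        actions = np.random.uniform(-1.0, 1.0, size=(horizon, action_dim))
        s, total_reward = s0, 0.0
        for a in actions:
            total_reward += reward_fn(s, a)
            s = dynamics_model(s, a)  # learned model: s_{t+1} ≈ f(s_t, a_t)
        if total_reward > best_return:
            best_return, best_action = total_reward, actions[0]
    # execute only this first action, then replan (MPC) to mitigate distribution shift
    return best_action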
Does it work? Yes!
[Video: pure model-based training reaches the behavior in about 10 minutes of real time, versus about 10 days for model-free training]
How to train?
Main idea: need to generate “independent” datasets (e.g., by resampling the data with replacement) to get “independent” models
Bootstrap ensembles in deep learning: this basically works, and in practice resampling is often unnecessary, since SGD and random initialization already make the models sufficiently independent
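A minimal sketch of training such a bootstrap ensemble of dynamics models; the MLP architecture, dataset format, and hyperparameters are illustrative assumptions.

import numpy as np
import torch
import torch.nn as nn

class DynamicsNet(nn.Module):
    """Small MLP predicting s_{t+1} from (s_t, a_t); architecture is illustrative."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def train_bootstrap_ensemble(states, actions, next_states,
                             n_models=5, epochs=50, lr=1e-3):
    """Train N models, each on a dataset resampled with replacement."""
    n = states.shape[0]
    models = []
    for _ in range(n_models):
        idx = np.random.randint(0, n, size=n)  # bootstrap resample of the dataset
        s = torch.as_tensor(states[idx], dtype=torch.float32)
        a = torch.as_tensor(actions[idx], dtype=torch.float32)
        s_next = torch.as_tensor(next_states[idx], dtype=torch.float32)
        model = DynamicsNet(states.shape[1], actions.shape[1])
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(epochs):
            loss = nn.functional.mse_loss(model(s, a), s_next)
            opt.zero_grad()
            loss.backward()
            opt.step()
        models.append(model)
    return models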
The resulting ensemble acts as a distribution over deterministic models.
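To plan under this distribution over deterministic models, each candidate action sequence can be scored by its average return across ensemble members; a sketch below, reusing the hypothetical models list and reward_fn from the snippets above.

import torch

def evaluate_candidate(models, reward_fn, s0, actions):
    """Average return over ensemble members:
    J = (1/N) * sum_i sum_t r(s_{t,i}, a_t), with s_{t+1,i} = f_i(s_{t,i}, a_t)."""
    total = 0.0
    for model in models:  # each member is one deterministic hypothesis for the dynamics
        s = torch.as_tensor(s0, dtype=torch.float32)
        for a in actions:
            a = torch.as_tensor(a, dtype=torch.float32)
            total += float(reward_fn(s, a))
            with torch.no_grad():
                s = model(s, a)  # propagate the state through this member's dynamics
    return total / len(models)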
More recent example: PDDM
Deep Dynamics Models for Learning Dexterous Manipulation. Nagabandi et al. 2019
Further readings
• Deisenroth et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search.
Recent papers:
• Nagabandi et al. Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning.
• Chua et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models.
• Feinberg et al. Model-Based Value Expansion for Efficient Model-Free Reinforcement Learning.
• Buckman et al. Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion.
Model-Based RL with Images
What about complex observations?
observation model p(o_t | s_t)
dynamics model p(s_{t+1} | s_t, a_t)
reward model p(r_t | s_t, a_t)
How to train?
standard (fully observed) model: maximize the likelihood of the observed transitions directly
latent space model: the states are latent, so we also need an approximate posterior (“encoder”) q_ψ over them
choices for the encoder:
• full smoothing posterior q_ψ(s_t, s_{t+1} | o_{1:T}, a_{1:T}): + most accurate, – most complicated
• single-step encoder q_ψ(s_t | o_t): + simplest, – least accurate
we’ll talk about the single-step encoder for now
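For reference, a sketch of the two training objectives in the course's usual notation (φ for model parameters, ψ for encoder parameters); the reward model term is omitted for brevity.

\max_{\phi} \; \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T} \log p_{\phi}(s_{t+1,i} \mid s_{t,i}, a_{t,i}) \qquad \text{(fully observed)}

\max_{\phi,\psi} \; \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T} \mathbb{E}_{(s_{t,i},\, s_{t+1,i}) \sim q_{\psi}} \big[ \log p_{\phi}(s_{t+1,i} \mid s_{t,i}, a_{t,i}) + \log p_{\phi}(o_{t,i} \mid s_{t,i}) \big] \qquad \text{(latent space)}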
We will discuss variational inference in more detail next week!
Model-based RL with latent space models
simple special case: deterministic encoder s_t = g_ψ(o_t), so the whole model can be trained end-to-end with backpropagation
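A minimal sketch of such a latent space model with a deterministic encoder, trained end-to-end; an MLP on flattened observations is used only for brevity (a convolutional encoder would be more natural for images), and all names and shapes are illustrative.

import torch
import torch.nn as nn

class LatentSpaceModel(nn.Module):
    """Deterministic encoder s_t = g_psi(o_t), latent dynamics model, and observation decoder."""
    def __init__(self, obs_dim, action_dim, latent_dim=32, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(   # g_psi: o_t -> s_t
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, latent_dim))
        self.dynamics = nn.Sequential(  # predicts s_{t+1} from (s_t, a_t)
            nn.Linear(latent_dim + action_dim, hidden), nn.ReLU(), nn.Linear(hidden, latent_dim))
        self.decoder = nn.Sequential(   # reconstructs o_t from s_t
            nn.Linear(latent_dim, hidden), nn.ReLU(), nn.Linear(hidden, obs_dim))

    def loss(self, o_t, a_t, o_next):
        s_t = self.encoder(o_t)
        s_next = self.encoder(o_next)
        pred_next = self.dynamics(torch.cat([s_t, a_t], dim=-1))
        dyn_loss = nn.functional.mse_loss(pred_next, s_next)       # latent dynamics term
        rec_loss = nn.functional.mse_loss(self.decoder(s_t), o_t)  # observation reconstruction term
        return dyn_loss + rec_loss

# usage sketch (shapes illustrative):
# model = LatentSpaceModel(obs_dim=64 * 64 * 3, action_dim=4)
# opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# loss = model.loss(o_t, a_t, o_next); opt.zero_grad(); loss.backward(); opt.step()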
[Figure: task execution with model-based RL from images, where the task is specified by a designated pixel and a goal pixel]