
Model-Based Reinforcement Learning

CS 285
Instructor: Sergey Levine
UC Berkeley
Today’s Lecture
1. Basics of model-based RL: learn a model, use model for control
• Why does naïve approach not work?
• The effect of distributional shift in model-based RL
2. Uncertainty in model-based RL
3. Model-based RL with complex observations
4. Next time: policy learning with model-based RL
• Goals:
• Understand how to build model-based RL algorithms
• Understand the important considerations for model-based RL
• Understand the tradeoffs between different model class choices
Why learn the model?
Does it work? Yes!

• Essentially how system identification works in classical robotics


• Some care should be taken to design a good base policy
• Particularly effective if we can hand-engineer a dynamics representation using our knowledge of physics, and fit just a few parameters
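
As a point of reference, the naive recipe implied here (run a base policy to collect data, fit a dynamics model by supervised regression, then plan through the model) can be sketched as below. This is a hedged illustration, not code from the lecture: env is assumed to follow a gym-style interface, and base_policy, fit_dynamics, and plan_actions are hypothetical helpers.

def naive_model_based_rl(env, base_policy, fit_dynamics, plan_actions, n_rollouts=20):
    """Naive model-based RL: collect data once, fit a model, plan through it."""
    # 1. Run the base policy and record (s, a, s') transitions.
    transitions = []
    for _ in range(n_rollouts):
        s, done = env.reset(), False
        while not done:
            a = base_policy(s)
            s_next, _, done, _ = env.step(a)
            transitions.append((s, a, s_next))
            s = s_next

    # 2. Fit the dynamics model with supervised learning on the transitions.
    model = fit_dynamics(transitions)

    # 3. Plan through the learned model and execute the plan.
    s = env.reset()
    for a in plan_actions(model, s):
        s, _, done, _ = env.step(a)
        if done:
            break
    return model

This is essentially the system-identification recipe above; the next slide shows why it breaks down when the model class is expressive.
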
Does it work? No!

go right to get higher!

• Distribution mismatch problem becomes exacerbated as we use more expressive model classes
Can we do better?
What if we make a mistake?
Can we do better?
every N steps

This will be on HW4!


How to replan?
every N steps

• The more you replan, the less perfect each individual plan needs to be
• Can use shorter horizons
• Even random sampling can often work well here!
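
A minimal sketch of the replanning loop with random sampling (random-shooting MPC). model and reward are assumed helpers: model(s, a) returns the learned model's next-state prediction and reward(s, a) returns a scalar reward; neither is code from the lecture.

import numpy as np

def random_shooting_mpc(model, reward, s0, action_dim, horizon=15, n_candidates=1000):
    """Return the first action of the best random action sequence under the model."""
    best_return, best_first_action = -np.inf, None
    for _ in range(n_candidates):
        # Sample a random action sequence (here uniform in [-1, 1]).
        actions = np.random.uniform(-1.0, 1.0, size=(horizon, action_dim))
        s, total = s0, 0.0
        for a in actions:
            total += reward(s, a)
            s = model(s, a)  # roll the candidate sequence forward in the model
        if total > best_return:
            best_return, best_first_action = total, actions[0]
    return best_first_action

In the MPC loop, only this first action is executed, the real next state is observed, and the whole optimization is repeated; the model itself is refit every N steps with the newly collected data.
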
Uncertainty in Model-Based RL
A performance gap in model-based RL

pure model-based (about 10 minutes real time) vs. model-free training (about 10 days…)

Nagabandi, Kahn, Fearing, L. ICRA 2018


Why the performance gap?

need to not overfit here… …but still have high capacity over here
Why the performance gap?
every N steps

very tempting to go here…


How can uncertainty estimation help?

expected reward under high-variance prediction is very low, even though mean is the same!
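
A tiny numeric illustration of this point, using a hypothetical reward that is peaked at the predicted mean: as the prediction variance grows, the expected reward collapses even though the mean prediction never changes.

import numpy as np

reward = lambda s: np.exp(-s ** 2)        # hypothetical reward, peaked at s = 0
rng = np.random.default_rng(0)

for std in (0.1, 2.0):
    samples = rng.normal(loc=0.0, scale=std, size=100_000)
    # same mean (0), different variance: expected reward drops from ~0.99 to ~0.33
    print(std, reward(samples).mean())
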
Intuition behind uncertainty-aware RL
every N steps

only take actions for which we think we’ll get high reward in expectation (w.r.t. uncertain dynamics)

This avoids “exploiting” the model

The model will then adapt and get better


There are a few caveats…

Need to explore to get better

Expected value is not the same as pessimistic value

Expected value is not the same as optimistic value

…but expected value is often a good start


Uncertainty-Aware Neural Net Models
How can we have uncertainty-aware models?
Idea 1: use output entropy

why is this not enough?

Two types of uncertainty:


aleatoric or statistical uncertainty

epistemic or model uncertainty

“the model is certain about the data, but we are not certain about the model”
what is the variance here?
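
For concreteness, “output entropy” here can mean a network that predicts a mean and variance for the next state, as in the PyTorch sketch below (the class name and architecture are illustrative, not from the lecture). Its predicted variance captures aleatoric noise in the data, but a single network trained this way can be confidently wrong far from its training data, which is exactly the epistemic uncertainty it fails to represent.

import torch
import torch.nn as nn

class GaussianDynamics(nn.Module):
    """Predicts p(s' | s, a) = N(mu, sigma^2); its entropy reflects aleatoric noise only."""

    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * state_dim),  # outputs mean and log-variance
        )

    def forward(self, s, a):
        mu, log_var = self.net(torch.cat([s, a], dim=-1)).chunk(2, dim=-1)
        return mu, log_var

    def nll(self, s, a, s_next):
        # Gaussian negative log-likelihood of the observed next state (up to a constant).
        mu, log_var = self(s, a)
        return 0.5 * (log_var + (s_next - mu) ** 2 / log_var.exp()).sum(-1).mean()
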
How can we have uncertainty-aware models?
Idea 2: estimate model uncertainty
“the model is certain about the data, but we are not certain about the model”

the entropy of this tells us the model uncertainty!
Quick overview of Bayesian neural networks

expected weight
uncertainty about the weight
For more, see:
Blundell et al., Weight Uncertainty in Neural Networks
Gal et al., Concrete Dropout

We’ll learn more about variational inference later!
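
One cheap member of this family, in the spirit of the Gal et al. reference, is Monte Carlo dropout: keep dropout active at test time and treat several stochastic forward passes as approximate posterior samples over the weights. A hedged sketch, assuming model maps (s, a) to a predicted next-state tensor and contains nn.Dropout layers:

import torch

def mc_dropout_predict(model, s, a, n_samples=20):
    """Estimate epistemic uncertainty by sampling dropout masks at test time."""
    model.train()  # keeps dropout active; no gradient update is performed
    with torch.no_grad():
        preds = torch.stack([model(s, a) for _ in range(n_samples)])
    # Spread across samples reflects uncertainty about the weights, not the data.
    return preds.mean(0), preds.var(0)
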


Bootstrap ensembles
Train multiple models and see if they agree!

How to train?
Main idea: need to generate “independent” datasets to get “independent” models
Bootstrap ensembles in deep learning
This basically works

Very crude approximation, because the number of models is usually small (< 10)

Resampling with replacement is usually unnecessary, because SGD and random initialization usually make the models sufficiently independent
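
A minimal sketch of training such an ensemble, following the simplification above: every model sees the same dataset, and only random initialization plus SGD noise decorrelates them. GaussianDynamics is the illustrative model class from the earlier sketch, and the mini-batch loader interface is assumed.

import torch

def train_ensemble(loader, state_dim, action_dim, n_models=5, epochs=50, lr=1e-3):
    """Train N dynamics models independently; their disagreement estimates epistemic uncertainty."""
    models = [GaussianDynamics(state_dim, action_dim) for _ in range(n_models)]
    opts = [torch.optim.Adam(m.parameters(), lr=lr) for m in models]

    for _ in range(epochs):
        for s, a, s_next in loader:             # mini-batches of transitions
            for model, opt in zip(models, opts):
                loss = model.nll(s, a, s_next)  # each model gets its own gradient step
                opt.zero_grad()
                loss.backward()
                opt.step()
    return models
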
Planning with Uncertainty, Examples
How to plan with uncertainty

distribution over deterministic models

Other options: moment matching, more complex posterior estimation with BNNs, etc.
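
Concretely, a candidate action sequence can be scored by rolling it out under each model in the ensemble and averaging the resulting returns, which approximates the expected return under model uncertainty. A hedged sketch that reuses the assumed reward(s, a) helper and treats each ensemble member as a deterministic function of (s, a):

import numpy as np

def plan_score_under_ensemble(models, reward, s0, actions):
    """Average the return of one action sequence over an ensemble of dynamics models."""
    returns = []
    for model in models:
        s, total = s0, 0.0
        for a in actions:        # roll out the same plan in each model separately
            total += reward(s, a)
            s = model(s, a)
        returns.append(total)
    return np.mean(returns)

Plugging this scorer into the earlier random-shooting MPC loop, in place of a single-model rollout, discourages plans that look good under only one of the models.
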
Example: model-based RL with ensembles

exceeds performance of model-free after 40k steps (about 10 minutes of real time)

before vs. after
More recent example: PDDM

Deep Dynamics Models for Learning Dexterous Manipulation. Nagabandi et al. 2019
Further readings
• Deisenroth et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search.
Recent papers:
• Nagabandi et al. Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning.
• Chua et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models.
• Feinberg et al. Model-Based Value Expansion for Efficient Model-Free Reinforcement Learning.
• Buckman et al. Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion.
Model-Based RL with Images
What about complex observations?

What is hard about this?


• High dimensionality
• Redundancy
• Partial observability
high-dimensional but not dynamic vs. low-dimensional but dynamic
State space (latent space) models

observation model p(o_t | s_t)
dynamics model p(s_{t+1} | s_t, a_t)
reward model p(r_t | s_t, a_t)

How to train?
standard (fully observed) model:

latent space model:
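
In standard notation (a reconstruction of the usual form of these objectives, not a verbatim transcription of the slide), the fully observed model is fit by maximum likelihood on observed transitions, while the latent space model takes the log-likelihoods in expectation under a posterior over the latent states and adds an observation-model term:

% Fully observed model: maximum likelihood on observed (s_t, a_t, s_{t+1}) tuples
\max_{\phi} \; \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T}
    \log p_\phi\!\left(s_{t+1,i} \mid s_{t,i}, a_{t,i}\right)

% Latent space model: latent states are unobserved, so the log-likelihoods are
% taken in expectation under (an approximation to) the posterior over latents,
% and an observation-model term is added
\max_{\phi} \; \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T}
    \mathbb{E}_{(s_t, s_{t+1}) \sim p(s_t, s_{t+1} \mid o_{1:T,i}, a_{1:T,i})}
    \left[ \log p_\phi\!\left(s_{t+1,i} \mid s_{t,i}, a_{t,i}\right)
         + \log p_\phi\!\left(o_{t,i} \mid s_{t,i}\right) \right]

The expectation over the latent states is intractable to compute exactly, which is why the next slides introduce an approximate posterior (the “encoder”).
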


Model-based RL with latent space models

“encoder”

• full smoothing posterior: + most accurate, - most complicated
• single-step encoder: + simplest, - least accurate (we’ll talk about this one for now)

We will discuss variational inference in more detail next week!
Model-based RL with latent space models

deterministic encoder

Everything is differentiable, can train with backprop
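
A hedged sketch of what “everything is differentiable” looks like in practice for the deterministic-encoder variant: encode both observations, predict the next latent and the reward, and backpropagate one combined loss through encoder, dynamics, decoder, and reward model. All module names and the use of squared-error losses are illustrative choices, not the lecture's exact formulation.

import torch
import torch.nn.functional as F

def latent_model_loss(encoder, dynamics, decoder, reward_model, o, a, o_next, r):
    """Joint loss for a deterministic-encoder latent space model.

    encoder(o) -> s            deterministic latent state
    dynamics(s, a) -> s_next   latent dynamics model
    decoder(s) -> o_hat        observation (reconstruction) model
    reward_model(s, a) -> r_hat
    """
    s = encoder(o)
    s_next_pred = dynamics(s, a)
    s_next_target = encoder(o_next)

    recon_loss = F.mse_loss(decoder(s), o)              # image reconstruction term
    dyn_loss = F.mse_loss(s_next_pred, s_next_target)   # latent dynamics term
    reward_loss = F.mse_loss(reward_model(s, a), r)     # reward model term
    return recon_loss + dyn_loss + reward_loss          # backprop through everything
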


Model-based RL with latent space models

latent space dynamics, image reconstruction, reward model

Many practical methods use a stochastic encoder to model uncertainty
Model-based RL with latent space models
every N steps
Learn directly in observation space

Finn, L. Deep Visual Foresight for Planning Robot Motion. ICRA 2017.
Ebert, Finn, Lee, L. Self-Supervised Visual Planning with Temporal Skip Connections. CoRL 2017.
Use predictions to complete tasks

Designated pixel, goal pixel, task execution
