Lecture 2 Summary
• RL is more important than ever. AI research pursues two main directions: reaching human-level intelligence and surpassing it to achieve superhuman intelligence. The first goal has already been reached in many areas, but the real breakthroughs come when AI exceeds human limits, and RL is one of the main ways to get there. AlphaGo is a prime example: it not only matched human performance but went far beyond it. The challenge now is to keep pushing forward and unlock even greater possibilities.
• Data-driven AI vs. RL: Foundation models are heavily data-driven, focusing on cleaning and optimizing datasets. In contrast, RL aims for creative behaviors, which requires optimization at inference time to adapt and go beyond the patterns present in static data.
• RL in Recent Advancements: RL has played a crucial role in recent AI breakthroughs, particularly in RLHF (Reinforcement Learning from Human Feedback), where a reward model is used to refine LLMs. In the simple case of a single state and a one-step action, this reduces to a bandit problem. Moreover, in large-scale reasoning-oriented RL, rewards from rule-based verifiers help prevent reward hacking, leading to more reliable learning and significant progress in AI.
• History: Before 2013, RL had not fully flourished because policy representation relied on classical ML methods. Without the expressive power of deep learning, these methods struggled to model complex functions, limiting RL to simpler problems with straightforward policies, such as PID controllers.
• A Markov Decision Process (MDP) is a simple mathematical model for defining a task. It assumes an environment that provides a state, which the agent processes and maps to an action. The action is then executed in the environment, yielding a new state and a reward; the loop sketched below illustrates this interaction.
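As a minimal sketch of this interaction loop (the GridWorld environment, random_policy, and the 10-step horizon below are illustrative assumptions, not part of the lecture):

```python
import random

# Hypothetical toy setup: the environment provides a state, the agent maps it
# to an action, and the environment returns a new state and a reward.

class GridWorld:
    """Toy environment: move right along a row of cells until reaching the goal."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action: 0 = stay, 1 = move right
        self.state = min(self.state + action, self.n_states - 1)
        reward = 1.0 if self.state == self.n_states - 1 else 0.0
        done = self.state == self.n_states - 1
        return self.state, reward, done


def random_policy(state):
    """Placeholder for the agent's state -> action mapping."""
    return random.choice([0, 1])


env = GridWorld()
state = env.reset()
total_reward = 0.0
for t in range(10):                          # finite horizon of 10 steps
    action = random_policy(state)            # agent maps state to action
    state, reward, done = env.step(action)   # environment returns new state and reward
    total_reward += reward
    if done:
        break
print("Return collected in one episode:", total_reward)
```

Each iteration mirrors the bullet above: the environment provides a state, the policy maps it to an action, and the environment returns a new state and a reward.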
• Goal: An MDP is a mathematical framework for modeling decision-making. Given the MDP components, our goal is to design a parameterized policy πθ that maps states to actions:
πθ : S → A
The horizon is the number of time steps in the decision-making process, which can be finite or infinite.
By optimizing πθ, we aim to learn a policy that maximizes long-term reward, as written out below.
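Written out in standard notation (the symbols H, r, s_t, and a_t are assumptions consistent with the bullet above, not taken verbatim from the lecture), the objective for a finite horizon H is

\[
\max_{\theta}\; J(\theta) \;=\; \mathbb{E}\!\left[\sum_{t=0}^{H-1} r(s_t, a_t)\right], \qquad a_t = \pi_\theta(s_t),
\]

where the expectation is taken over the environment's (possibly stochastic) transitions.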
• Sometimes the policy is stochastic, in which case there are two sources of randomness and two nested expected-value calculations: one over the environment's stochasticity (the transition dynamics) and another over the policy itself.
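As a sketch in standard notation (the trajectory τ, dynamics P, and initial-state distribution p(s_0) are assumed symbols, not spelled out in the lecture), the two sources of randomness appear as separate factors of the trajectory distribution:

\[
J(\theta) \;=\; \mathbb{E}_{\tau \sim p_\theta(\tau)}\!\left[\sum_{t=0}^{H-1} r(s_t, a_t)\right],
\qquad
p_\theta(\tau) \;=\; p(s_0)\prod_{t=0}^{H-1} \pi_\theta(a_t \mid s_t)\, P(s_{t+1} \mid s_t, a_t),
\]

so the expectation over the policy πθ and the expectation over the dynamics P are combined into a single expectation over trajectories.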