Esraa Khaled
Reinforcement Learning
in Neural
Supervised by: Dr. Taymoor
By: Esraa Khaled
Types of ML
Reinforcement Learning
Reinforcement learning (RL) is a type of machine learning in which an
agent learns to make decisions by taking actions in an environment so
as to maximize cumulative reward.
Key Concepts in RL
Agent:
The learner that interacts with the environment.
Environment:
Everything the agent interacts with. The environment
provides feedback to the agent based on its actions.
State (s):
A representation of the current situation of the agent in the
environment.
Action (a):
One of the choices available to the agent in a given state. The set
of possible actions can be discrete or continuous.
Reward (r):
A scalar feedback signal received after the agent takes an
action in a particular state. It indicates the immediate benefit of
that action, guiding the agent's learning.
Policy (π):
A strategy used by an agent to determine its actions based on
the current state.
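The concepts above can be tied together in a short Python sketch of the agent-environment loop. Everything named here is an assumption made for illustration: `CorridorEnv` is a hypothetical toy environment (five positions, goal at the right end), the reward values are arbitrary, and `random_policy` stands in for whatever policy the agent actually uses.

```python
import random

class CorridorEnv:
    """Hypothetical toy environment: positions 0..4, goal at position 4."""

    def __init__(self):
        self.goal = 4
        self.state = 0

    def reset(self):
        # Start every episode at the left end of the corridor.
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clamped to [0, goal].
        self.state = max(0, min(self.goal, self.state + action))
        done = self.state == self.goal
        # Assumed reward scheme: +1 on reaching the goal, -0.1 for every other step.
        reward = 1.0 if done else -0.1
        return self.state, reward, done

def random_policy(state, rng):
    # A policy maps the current state to an action; this one ignores the state.
    return rng.choice([-1, 1])

def run_episode(env, policy, rng, max_steps=100):
    state = env.reset()
    total_reward = 0.0  # cumulative reward collected in this episode
    for _ in range(max_steps):
        action = policy(state, rng)             # agent chooses an action
        state, reward, done = env.step(action)  # environment gives feedback
        total_reward += reward
        if done:
            break
    return total_reward

rng = random.Random(0)
episode_return = run_episode(CorridorEnv(), random_policy, rng)
print(episode_return)
```

A learning algorithm would replace `random_policy` with a policy that improves as rewards are observed.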
Applications:
Self-driving cars
Game playing
Robotics
Q-learning Algorithm
Q-learning is a model-free reinforcement learning algorithm that
finds an optimal action-selection policy for any finite
Markov decision process (MDP), given sufficient exploration.
It helps an agent learn to maximize the total reward over time
through repeated interactions with the environment, even when
the model of that environment is not known.
How Does Q-Learning Work?
1- Learning and Updating Q-values: The algorithm maintains a table of
Q-values for each state-action pair. These Q-values represent the
expected cumulative reward of taking a given action in a given state and
following the optimal policy afterward. The Q-values are initialized
(commonly to zero) and are then updated iteratively using the experiences
gathered by the agent.
2- Q-value Update Rule: The Q-values are updated using the formula:
Q(s, a) ← Q(s, a) + α [ r + γ max_a' Q(s', a') − Q(s, a) ]
where α is the learning rate, γ is the discount factor, r is the reward
received, and s' is the next state.
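The two steps above can be sketched as a small tabular Q-learning loop in Python. This is a minimal illustration under stated assumptions, not a reference implementation: the five-cell corridor task, the reward values, the epsilon-greedy exploration scheme, and the hyperparameters `alpha`, `gamma`, and `epsilon` are all chosen here for the example.

```python
import random
from collections import defaultdict

ACTIONS = [-1, 1]  # assumed action set: move left / move right
GOAL = 4           # assumed goal position in a corridor of states 0..4

def step(state, action):
    # Hypothetical environment dynamics: clamp the position, reward the goal.
    next_state = max(0, min(GOAL, state + action))
    done = next_state == GOAL
    return next_state, (1.0 if done else -0.1), done

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q-table, every (state, action) entry starts at 0
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, else act greedily.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            # Standard Q-learning update:
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            td_target = reward + gamma * best_next
            q[(state, action)] += alpha * (td_target - q[(state, action)])
            state = next_state
    return q

q = q_learning()
# Greedy policy read off the learned table for each non-terminal state.
greedy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(greedy)
```

Note that the update uses only observed transitions `(state, action, reward, next_state)`; the agent never queries a model of the environment.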
Deep RL
Deep Reinforcement Learning (DRL) extends traditional RL by
using deep neural networks as function approximators, allowing
the agent to handle high-dimensional state spaces more effectively.
Deep Q Network
• A popular DRL algorithm that combines Q-Learning with deep learning. It
uses a neural network to approximate the Q-values, allowing the agent to
learn from high-dimensional observations.
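To make the idea of approximating Q-values with a network concrete, here is a minimal Python sketch of a forward pass and the temporal-difference target for one transition. It is not a full training loop: the two-layer network, its sizes (`OBS_DIM`, `HIDDEN`, `N_ACTIONS`), the random weights, and the example transition are all assumptions, and gradient updates, experience replay, and a separate target network are left out.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    # Small random weights and zero biases for one fully connected layer.
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(x, layer, activation=None):
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out

def q_network(observation, params):
    # Maps an observation vector to one estimated Q-value per action.
    hidden = dense(observation, params[0], activation=math.tanh)
    return dense(hidden, params[1])

rng = random.Random(0)
OBS_DIM, HIDDEN, N_ACTIONS = 8, 16, 3  # assumed sizes, for illustration only
params = [init_layer(OBS_DIM, HIDDEN, rng), init_layer(HIDDEN, N_ACTIONS, rng)]

# One hypothetical transition (observation, action, reward, next observation).
obs = [rng.random() for _ in range(OBS_DIM)]
next_obs = [rng.random() for _ in range(OBS_DIM)]
action, reward, done, gamma = 1, 0.5, False, 0.99

q_values = q_network(obs, params)
next_q = q_network(next_obs, params)

# Same target as in tabular Q-learning, but computed from network outputs.
td_target = reward if done else reward + gamma * max(next_q)
loss = (td_target - q_values[action]) ** 2  # squared TD error to be minimized
greedy_action = max(range(N_ACTIONS), key=lambda a: q_values[a])
print(q_values, td_target, loss, greedy_action)
```

In practice the network weights would be adjusted by gradient descent on this loss, typically with a deep learning library rather than hand-written layers.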
Policy Gradient
Policy gradient methods directly optimize the policy's
parameters by gradient ascent on the expected reward,
which makes them suitable for environments with
large or continuous action spaces.
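A minimal Python sketch of the idea, using a REINFORCE-style update on a three-armed bandit. The bandit task, its reward means, the softmax parameterization, the running-average baseline, and the learning rate are all assumptions made for this example; they are one simple way to instantiate a policy gradient, not the only one.

```python
import math
import random

def softmax(prefs):
    # Numerically stable softmax: turns preferences into action probabilities.
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_bandit(true_means, steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    prefs = [0.0] * len(true_means)  # policy parameters, one per action
    baseline = 0.0                   # running average reward (variance reduction)
    for t in range(1, steps + 1):
        probs = softmax(prefs)
        # Sample an action from the current stochastic policy.
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = true_means[action] + rng.gauss(0.0, 0.1)  # noisy reward
        baseline += (reward - baseline) / t
        advantage = reward - baseline
        # Gradient of log pi(action) w.r.t. prefs[k] is 1[k == action] - probs[k].
        for k in range(len(prefs)):
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            prefs[k] += lr * advantage * grad_log  # gradient ascent step
    return softmax(prefs)

final_probs = reinforce_bandit([0.2, 0.5, 1.0])
print(final_probs)
```

Unlike Q-learning, no value table is consulted when acting: the parameters in `prefs` define the policy directly, and each update shifts probability toward actions that returned more reward than the baseline.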
[Figure: training with policy gradient]
Questions:
• What is the definition of reinforcement learning?