In this case,
• Your cat is an agent that is exposed to the environment; in this case, the environment is your house. An example of a state could be your cat sitting, and you use a specific word to make the cat walk.
• Our agent reacts by performing an action, a transition from one “state” to another “state.”
• For example, your cat goes from sitting to walking.
• The reaction of an agent is an action, and the policy is a method of selecting an action given a state, in expectation of better outcomes.
• After the transition, the agent may get a reward or penalty in return; this loop is sketched below.
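This agent-environment loop can be sketched in a few lines of Python. The state names, actions, and reward values below are illustrative assumptions, not part of any standard library:

import random

actions = ["stay", "walk"]            # actions available to the agent

def policy(state):
    # A trivial stochastic policy: choose an action at random.
    return random.choice(actions)

def step(state, action):
    # Hypothetical environment dynamics: returns (next_state, reward).
    if action == "walk":
        return "walking", 1           # reward: the cat walked on cue
    return "sitting", -1              # penalty: the cat stayed put

state = "sitting"                     # initial state
for t in range(5):
    action = policy(state)                # agent selects an action
    state, reward = step(state, action)   # transition and feedback
    print(t, action, state, reward)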
Reinforcement Learning Algorithms
1. Value-Based:
• In a value-based reinforcement learning method, you try to maximize a value function V(s). In this method, the agent estimates the expected long-term return of the current state under policy π (see the sketch after this list).
2. Policy-based:
• In a policy-based RL method, you try to devise a policy such that the action performed in each state helps you gain the maximum reward in the future.
Two types of policy-based methods are:
Deterministic: For any state, the same action is produced by the policy π.
Stochastic: Every action has a certain probability, which is determined by the following equation: π(a | s) = P[At = a | St = s]
3.Model-Based:
• In this Reinforcement Learning method, you need to create a virtual
model for each environment. The agent learns to perform in that
specific environment.
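The three families can be contrasted in a short, purely illustrative Python sketch; every state, action, probability, and reward below is an assumption made for demonstration:

import random

# Value-based: a table estimating V(s), the long-term return of each state.
V = {"s0": 0.0, "s1": 0.0}

# Policy-based, deterministic: the same action is produced for a given state.
def deterministic_policy(state):
    return {"s0": "a1", "s1": "a0"}[state]

# Policy-based, stochastic: an action is drawn with probability pi(a|s).
def stochastic_policy(state):
    probs = {"a0": 0.3, "a1": 0.7}
    return random.choices(list(probs), weights=list(probs.values()))[0]

# Model-based: an explicit model of the environment mapping
# (state, action) to (next_state, reward), usable for planning.
model = {("s0", "a1"): ("s1", 1.0), ("s1", "a0"): ("s0", 0.0)}

print(deterministic_policy("s0"), stochastic_policy("s0"))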
Characteristics of Reinforcement Learning
• There is no supervisor, only a reward signal (a real number)
• Sequential decision making
• Time plays a crucial role in reinforcement learning problems
• Feedback is always delayed, not instantaneous
• Agent’s actions determine the subsequent data it receives
Types of Reinforcement Learning
Two types of reinforcement learning methods are:
Positive:
• Positive reinforcement is defined as an event that occurs because of specific behavior. It increases the strength and frequency of the behavior and positively impacts the actions taken by the agent.
• It maximizes performance and sustains change for a more extended period. However, too much reinforcement may lead to an over-optimized state, which can affect the results.
Negative:
• Negative reinforcement is defined as the strengthening of a behavior that occurs because a negative condition is stopped or avoided.
• It helps you to define the minimum standard of performance.
• However, the drawback of this method is that it provides only enough to meet the minimum behavior.
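The behavioral distinction can be illustrated with a toy Python sketch; the behavior name and numeric values are assumptions chosen only for demonstration:

strength = {"press_lever": 0.0}       # tendency to repeat a behavior

def positive_reinforcement(behavior, reward=1.0):
    # A pleasant stimulus is added after the behavior, strengthening it.
    strength[behavior] += reward

def negative_reinforcement(behavior, relief=1.0):
    # An unpleasant condition is stopped or avoided, which also
    # strengthens the behavior that removed it.
    strength[behavior] += relief

positive_reinforcement("press_lever")     # e.g. food is given
negative_reinforcement("press_lever")     # e.g. a loud noise stops
print(strength)                           # both increase the tendency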
Learning Models of Reinforcement
There are two important learning models in reinforcement learning:
1. Markov Decision Process
2. Q learning
• Consider a Markov decision process (MDP) where the agent can perceive a set S of distinct states of its environment and has a set A of actions that it can perform.
• At each discrete time step t, the agent senses the current state st, chooses
a current action at, and performs it.
• The environment responds by giving the agent a reward rt = r(st, at) and by producing the succeeding state st+1 = δ(st, at).
• Here the functions δ(st, at) and r(st, at) depend only on the current state and
action, and not on earlier states or actions.
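A minimal Python sketch of this MDP interface, assuming a made-up two-state environment (the states, actions, and reward values are illustrative, not from the text):

S = ["s0", "s1"]             # set S of distinct states
A = ["left", "right"]        # set A of actions

def delta(s, a):
    # Transition function: the succeeding state depends only on (s, a).
    return "s1" if a == "right" else "s0"

def r(s, a):
    # Reward function: also depends only on the current state and action.
    return 1.0 if (s, a) == ("s0", "right") else 0.0

# One time step: sense st, choose at, receive rt and st+1.
st, at = "s0", "right"
rt, st1 = r(st, at), delta(st, at)
print(rt, st1)               # 1.0 s1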
How shall we specify precisely which policy π we would like the agent to learn?
1. One approach is to require a policy that produces the greatest possible cumulative reward for the agent over time.
• To state this requirement more precisely, define the cumulative value Vπ(st) achieved by following an arbitrary policy π from an arbitrary initial state st as follows:
Vπ(st) = rt + γ rt+1 + γ^2 rt+2 + ... = Σ (i = 0 to ∞) γ^i rt+i
where 0 ≤ γ < 1 is a constant that determines the relative value of delayed versus immediate rewards.
• The quantity Vπ(st) is called the discounted cumulative reward achieved by policy π from initial state st (this and the alternative definitions below are sketched in code after the list).
2. Other definitions of total reward are the finite horizon reward, which sums the rewards over the next h steps without discounting, and the average reward, which considers the average reward per time step over the entire lifetime of the agent.
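These reward definitions can be made concrete with a short Python sketch; the reward sequence, the discount factor γ = 0.9, and the horizon h = 2 are illustrative assumptions:

def discounted_return(rewards, gamma=0.9):
    # V = rt + gamma*rt+1 + gamma^2*rt+2 + ...   (0 <= gamma < 1)
    return sum(gamma**i * ri for i, ri in enumerate(rewards))

def finite_horizon_return(rewards, h):
    # Undiscounted sum of rewards over the next h steps.
    return sum(rewards[:h])

def average_reward(rewards):
    # Average reward per time step over the agent's lifetime.
    return sum(rewards) / len(rewards)

rewards = [1.0, 0.0, 2.0, 1.0]
print(discounted_return(rewards))            # 1.0 + 0.0 + 1.62 + 0.729 = 3.349
print(finite_horizon_return(rewards, h=2))   # 1.0
print(average_reward(rewards))               # 1.0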