AI Unit 4
Planning
What is Planning?
• The task of coming up with a sequence of
actions that will achieve a goal is called
planning.
• Planning Environments
1. Classical Planning Environments
• Fully observable, deterministic, finite, static and
discrete.
2. Non-classical Planning Environments
• Partially observable or stochastic, requiring different
algorithms and agent designs.
Planning (Contd..)
• Difficulties of a problem-solving agent
– Performs irrelevant actions
• Has to explore all the states
• Ex: buying a book by its 10-digit ISBN gives 10^10 possible actions
– Finding a heuristic function
• The problem-solving agent lacks autonomy because it
depends on a human to supply a heuristic function for
each new problem.
– No problem decomposition
• All these problems are overcome by the planning agent by
representing the goal as a conjunction of subgoals.
PLANNING PROBLEM
The planning problem is essentially the question of how to reach the next state or the
goal state from the current state. It involves two things: 'how' and 'when'.
• The planning problem is defined with:
1. Domain model
2. Initial state
3. Goal state (next state)
The domain model defines the actions along with the objects. It is necessary
to specify the operators too that actually describe the action. Along with this,
information about actions and state constraints while acting should also be
given. This entirely formulates the domain model.
The initial state is the state where any action is yet to take place (the stage
when the exam schedule is put up!).
The final state or the goal state is the state which the plan is intended to
achieve.
Planning Problem
• Find a sequence of actions that achieves a given
goal from a given initial world state. Given:
– a set of operator descriptions,
– an initial state description, and
– a goal state description or predicate,
• compute a plan
– a sequence of operator instances,
– executing them in the initial state will change the
world to a state satisfying the goal-state
description.
• Goals - specified as a conjunction of subgoals
to be achieved
Simple Planning Agent
An agent interacts with real world via perception and
actions
Perception - sense the world and assess the situation
Actions - what the agent does in the domain.
Planning involves reasoning about actions that the agent
intends to carry out
This reasoning involves the representation of the world
that the agent has - representation of its actions.
Hard constraints - objectives that have to be achieved
completely for success
Soft constraints (preferences) - objectives to be achieved
as much as possible
Planning vs. Problem solving
• Planning agent is very similar to problem solving agent
– Constructs plans to achieve goals, then executes them
• Planning is more powerful because of the
representations and methods used
• Search - proceeds through plan space rather than state
space
• Sub-goals - planned independently, which reduces the
complexity of the planning problem
Planning Agents
• problem-solving agents are able to plan ahead - to
consider the consequences of sequences of actions -
before acting.
• knowledge- based agents can select actions based on
explicit, logical representations of the current state
and the effects of actions.
– This allows the agent to succeed in complex,
inaccessible environments that are too difficult for
a problem-solving agent
– Problem Solving Agents + Knowledge-based Agents
= Planning Agents
Simple Planning Agent
• A simple planning agent is very similar to problem-solving agents in that it
constructs plans that achieve its goals, and then executes them.
• The limitations of the problem- solving approach motivates the design of
planning systems.
To solve a planning problem using a state-space search approach we would
let the:
• initial state = initial situation
• goal-test predicate = goal state description
• successor function computed from the set of operators
• once a goal is found, solution plan is the sequence of operators in the path
from the start node to the goal node
Simple Planning Agent
• Planning can be viewed as a type of problem solving in which the
agent uses beliefs about actions and their consequences to search
for a solution over the more abstract space of plans, rather than
over the space of situations.
Algorithm of a simple planning agent:
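The algorithm slide itself is not reproduced here, so the following is a minimal, self-contained Python sketch of the sense-formulate-plan-execute loop described above; the toy plan_route planner and the goal value are illustrative assumptions, not a standard algorithm listing.

```python
# A minimal, self-contained sketch of the simple planning agent loop:
# sense the world, formulate a goal, construct a plan, then execute it
# one action per time step. plan_route is a deliberately trivial "planner".

def plan_route(state, goal):
    """Toy planner: the 'plan' is just the list of intermediate states to visit."""
    return list(range(state + 1, goal + 1))

def simple_planning_agent(percept, plan, goal=3):
    """Return (action, remaining_plan) for the current percept."""
    state = percept                          # assess the current situation
    if not plan:                             # no current plan: construct one
        plan = plan_route(state, goal)
    action = plan.pop(0) if plan else None   # next step of the plan, or a no-op
    return action, plan

# Usage: the agent is called once per time step with the current percept.
plan = []
action, plan = simple_planning_agent(0, plan)
print(action, plan)   # -> 1 [2, 3]
```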
State Representation
• A state is represented with a conjunction of positive
literals
• Using
– Logical propositions: Poor ∧ Unknown
– FOL literals: At(Plane1, OMA) ∧ At(Plane2, JFK)
• FOL literals must be ground and function-free
– Not allowed: At(x, y) or At(Father(Fred), Sydney)
• Closed World Assumption
– Whatever is not stated is assumed false
Goal Representation
• Goal is a partially specified state
• A state satisfies a goal if it contains all
the atoms of the goal and possibly others.
– Example: Rich ∧ Famous ∧ Miserable satisfies the
goal Rich ∧ Famous
Action Representation
• Action Schema
– Action name: Fly(WHI, LNK, OHA)
– Preconditions: At(WHI, LNK), Plane(WHI), Airport(LNK), Airport(OHA)
– Effects: ¬At(WHI, LNK), At(WHI, OHA)
• Example (schema with variables)
Action(Fly(p, from, to),
PRECOND: At(p, from) ∧ Plane(p) ∧ Airport(from) ∧ Airport(to)
EFFECT: ¬At(p, from) ∧ At(p, to))
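To make the schema concrete, here is a small sketch (in Python, with an assumed dictionary encoding that is not part of STRIPS itself) of the ground action Fly(WHI, LNK, OHA) and how its preconditions and effects are applied to a state represented as a set of ground literals.

```python
# Sketch: the ground action Fly(WHI, LNK, OHA) as data (preconditions,
# add list, delete list), applied to a state that is a set of ground literals.
# The dictionary encoding is an illustrative assumption, not part of STRIPS itself.

state = {"At(WHI, LNK)", "Plane(WHI)", "Airport(LNK)", "Airport(OHA)"}

fly_whi_lnk_oha = {
    "precond": {"At(WHI, LNK)", "Plane(WHI)", "Airport(LNK)", "Airport(OHA)"},
    "add":     {"At(WHI, OHA)"},
    "delete":  {"At(WHI, LNK)"},
}

def applicable(action, state):
    # All preconditions must hold in the current state (closed-world assumption).
    return action["precond"] <= state

def result(action, state):
    # STRIPS semantics: remove the delete list, then add the add list.
    return (state - action["delete"]) | action["add"]

if applicable(fly_whi_lnk_oha, state):
    print(result(fly_whi_lnk_oha, state))
    # e.g. {'At(WHI, OHA)', 'Plane(WHI)', 'Airport(LNK)', 'Airport(OHA)'} (set order varies)
```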
Languages for Planning Problems
• STRIPS
– Stanford Research Institute Problem Solver
– Historically important
• ADL
– Action Description Language
• PDDL
– Planning Domain Definition Language
Planning languages
• A planning language should be expressive.
• Three Languages
– 1. Stanford Research Institute Problem Solver (STRIPS)
– 2. Action Description Language (ADL)
– 3. Planning Domain Definition Language (PDDL)
1) Stanford Research Institute Problem Solver (STRIPS):
Makes use of first-order predicates.
STRIPS allows function-free literals.
Example of a robot: The example involves a robot, a cup of tea, a guest and two
rooms. We want the robot to get the tea and give it to the guest.
The planning with STRIPS is done as follows:
Let us begin with the STRIPS representation for this example.
Initial and final states are depicted in figure.
Example of a robot
Block World
• There are N blocks resting on a table in a specified sequence.
• The goal is to arrange them in a desired sequence.
• Available moves
– Put a block on the table
– Put a block on top of another block
• A state is represented using the sequence of blocks in the current position.
STRIPS
• STRIPS stands for "STanford Research Institute Problem Solver". It was the
planner used in Shakey, one of the first robots built using AI technology.
STRIPS is an action-centric representation: for each action, it specifies the
effect of that action.
Block World Problem
• Action List: (table with columns S.No, Action, Precondition, Effect not reproduced here)
• Goal State: (figure not reproduced here)
Block World Problem
Solution:
• Start State:
– On(A, table)
– On(B, table)
• Goal State:
– On(A, B)
• Stack(A, B) preconditions:
– Holding(A)
– Clear(B)
• Pickup(A) preconditions:
– Arm empty
– On(A, table)
– Clear(A)
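The two steps above (Pickup(A) followed by Stack(A, B)) can be traced with the same set-based STRIPS encoding; the predicate spellings below are illustrative assumptions.

```python
# Sketch: tracing Pickup(A) then Stack(A, B) with a set-based STRIPS encoding.

state = {"On(A, table)", "On(B, table)", "Clear(A)", "Clear(B)", "ArmEmpty"}

pickup_A = {
    "precond": {"ArmEmpty", "On(A, table)", "Clear(A)"},
    "add":     {"Holding(A)"},
    "delete":  {"ArmEmpty", "On(A, table)", "Clear(A)"},
}
stack_A_B = {
    "precond": {"Holding(A)", "Clear(B)"},
    "add":     {"On(A, B)", "ArmEmpty", "Clear(A)"},
    "delete":  {"Holding(A)", "Clear(B)"},
}

def result(action, state):
    # Apply STRIPS effects: delete list first, then add list.
    return (state - action["delete"]) | action["add"]

for step in (pickup_A, stack_A_B):
    assert step["precond"] <= state        # preconditions must hold before acting
    state = result(step, state)

print("On(A, B)" in state)                 # True: the goal On(A, B) is satisfied
```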
Means - Ends Analysis
• Search strategies either reason forward or backward.
• Mixed strategy - solve the major parts of the problem first, then
solve the smaller problems that arise when combining them
together.
• Such a technique is called "Means - Ends Analysis".
• Means-Ends Analysis is problem-solving techniques used in
Artificial intelligence for limiting search in AI programs.
• It is a mixture of Backward and forward search technique.
• The means -ends analysis process centers around finding the
difference between current state and goal state.
Means - Ends Analysis
How means-ends analysis Works:
• The means-ends analysis process can be applied recursively
for a problem. It is a strategy to control search in problem-
solving.
The following are the main steps that describe the working of the
MEA technique for solving a problem.
1. First, evaluate the difference between Initial State and final
State.
2. Select the various operators which can be applied for each
difference.
3. Apply the operator at each difference, which reduces the
difference between the current state and goal state.
MEA Algorithm
• Step 1: Compare CURRENT to GOAL, if there are no differences between
both then return Success and Exit.
• Step 2: Else, select the most significant difference and reduce it by doing
the following steps until the success or failure occurs.
a. Select a new operator O which is applicable for the current
difference, and if there is no such operator, then signal failure.
b. Attempt to apply operator O to CURRENT. Make a description of two
states.
i) O-Start, a state in which O's preconditions are satisfied.
ii) O-Result, the state that would result if O were applied in O-Start.
c. If
FIRST-PART ← MEA(CURRENT, O-Start)
and
LAST-PART ← MEA(O-Result, GOAL)
are successful, then signal Success and return the result of combining
FIRST-PART, O, and LAST-PART.
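A toy, self-contained sketch of this recursion is given below; states are sets of conditions and each operator is keyed by the single difference it reduces. The operator table is an illustrative assumption, not part of the MEA algorithm itself.

```python
# Toy MEA recursion: pick a difference, find an operator for it, recursively
# reach the operator's preconditions (O-Start), apply it, then recursively
# finish from O-Result to the goal.

OPERATORS = {
    "painted": {"precond": {"sanded"}, "add": {"painted"}},
    "sanded":  {"precond": set(),      "add": {"sanded"}},
}

def mea(current, goal):
    """Return a list of operator names transforming current into goal, or None."""
    differences = goal - current
    if not differences:
        return []                                   # Step 1: no difference -> success
    diff = sorted(differences)[0]                   # Step 2: pick a difference
    op = OPERATORS.get(diff)
    if op is None:
        return None                                 # no applicable operator -> failure
    first = mea(current, op["precond"])             # reach O-Start (preconditions)
    if first is None:
        return None
    o_result = current | op["precond"] | op["add"]  # simplified state after applying O
    last = mea(o_result, goal)                      # finish from O-Result to GOAL
    if last is None:
        return None
    return first + [diff] + last                    # FIRST-PART + O + LAST-PART

print(mea(set(), {"painted"}))   # ['sanded', 'painted']
```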
Example of Means-Ends Analysis:
• Apply MEA to reach the goal state (the initial and goal states are shown in a figure not reproduced here).
• Solution:
• To solve the above problem, first find the differences between the initial state
and the goal state; for each difference, generate a new state and
apply the operators. The operators we have for this problem are:
• Move
• Delete
• Expand
Example of Means-Ends Analysis (Contd..):
Step 1: Evaluate the initial state. (figure not reproduced here)
Step 3: Apply the Move operator. (figure not reproduced here)
NON-LINEAR PLANNING
• A plan that consists of sub-problems which are solved
simultaneously is called a non-linear plan.
• In goal stack planning there are some problems: achieving one
goal can have an impact on a goal that has already been achieved.
NON-LINEAR PLANNING (Contd..)
There is a concept of constraint posting that
comes with non-linear planning. The constraint
posting states that the plan can be built by
• 1. Addition of operators or suggesting
operators
• 2. Ordering them
• 3. Binding the variables to the operators
Conditional Planning
• Conditional planning has to work regardless of the outcome of
an action.
• The outcome of actions cannot be determined so the
environment is said to be nondeterministic.
• It’s a way to deal with uncertainty by checking what is actually
happening in the environment at predetermined points in the
plan. (Conditional Steps)
Example:
Check whether SFO airport (San Francisco International
Airport) is operational. If so, fly there; otherwise, fly to some
other place.
Conditional Planning
• Three kind of Environments
Fully Observable
• The agent always knows the current state
Partially Observable
• The agent knows only a certain amount about
the actual state. (much more common in real
world)
Unknown
• The agent knows nothing about the current
state
Conditional Planning in Fully
Observable Environments
• The agent uses conditional steps to check the state of the
environment and decide what to do next.
• Plan information is stored in a library. Ex: Action(Left), Clean ∨ Right
Reactive Planning
• Reactive planning is planning under uncertainty.
• Makes use of the if-then rules.
• The reactive planners are based on the concept that they should be able
to handle an unknown situation too. So, the reaction rules are used that
help them in doing so.
• Rule selection is based on priority and on a holding condition that
maximises the priority.
• The rule that is currently in execution is said to be active, the ones
whose conditions hold (possible competitors) are pre-active, and the
others are inactive.
• A B-tree structure is used in reactive planning, where the algorithm
selects the rule. Sometimes no rule can be selected; in such a case, the
behaviour depends on the algorithm implementation for rule selection.
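As a rough illustration of priority-based rule selection, the sketch below picks, among the rules whose conditions currently hold, the one with the highest priority; the rule set and state fields are assumptions made only for the example.

```python
# Sketch: if-then reaction rules with priorities. The holding rules are the
# "possible competitors"; the highest-priority one becomes the active rule.

RULES = [
    {"name": "recharge",   "priority": 3, "holds": lambda s: s["battery"] < 20},
    {"name": "avoid",      "priority": 2, "holds": lambda s: s["obstacle"]},
    {"name": "move_ahead", "priority": 1, "holds": lambda s: True},
]

def select_rule(state):
    holding = [r for r in RULES if r["holds"](state)]     # pre-active candidates
    if not holding:
        return None                                        # no rule can be selected
    return max(holding, key=lambda r: r["priority"])       # active rule

print(select_rule({"battery": 50, "obstacle": True})["name"])   # avoid
```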
LEARNING
Learning consists of various activities like
understanding, memorisation, knowledge
acquisition and inference.
Learning is a continuous process.
Learning from observation is required to
construct a meaningful classification of
observed objects and situations.
MACHINE LEARNING
ACTIVE LEARNING MECHANISM
Perceptual learning
– learning of new objects, categories and relations.
Episodic learning
– learning of events: what, where and when.
Procedural learning
– learning of actions and their sequences to accomplish a task.
Machine Learning
Machine learning is the building and exploration of methods for
programming computers so that they learn.
Machine learning Vs Human Learning
Machine Learning approach
Scope of machine learning
GOALS OF MACHINE LEARNING
• To produce learning algorithms with practical
value.
• Development and enhancement of computer
algorithms and models to meet the decision
making requirements in practical scenarios.
• To facilitate in building intelligent systems (IS)
that can be used in solving real time problems.
Challenges of Machine Learning
• Availability of limited learning data and
unknown perspectives.
• Acquiring accurate, compact and precise
knowledge.
• Require large working memory to store data.
• Focusing Too Much on Algorithms and Theories
• Monitoring and maintenance
Learning Concepts , methods and
models
Computational structure used in Machine
learning:
1. Functions.
4. Grammars
• Rote learning
• Learning from observations
• Supervised learning
• unsupervised learning
• semi-supervised learning
• ensemble learning
• Discovery based learning
• Learning by problem solving
Rote learning
•An agent is defined as a computational entity which is capable of perceiving the
environment and can act based on the situation.
•The agent is composed of a learning element, a performance element and a curiosity
element.
•Based on the coordination between these elements, the outcome of the agent's
behaviour is measured.
Inductive Learning
● Inductive learning involves learning generalized rules from specific examples (can
think of this as the “inverse” of deduction)
● Main task: given a set of examples, each classified as positive or negative, produce a
concept description that matches exactly the positive examples.
● The examples are coded in some representation language, e.g. they are coded by a
finite set of real-valued features.
● The concept description is in a certain language that is presumably a superset of
the language of possible example encodings.
● A “correct” concept description is one that classifies correctly ALL possible
examples, not just those given in the training set.
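As a small illustration of this setting, the sketch below generalizes from positive examples coded as real-valued feature vectors to the tightest axis-aligned box containing them, and checks it against the negatives; the box hypothesis language and the data points are assumptions made only for this example.

```python
# Inductive concept learning sketch: the "concept description" here is an
# axis-aligned box, the most specific generalization of the positive examples.

positives = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.2)]
negatives = [(4.0, 4.0), (0.0, 5.0)]

# Per-feature [min, max] over the positives.
box = [(min(x[i] for x in positives), max(x[i] for x in positives))
       for i in range(len(positives[0]))]

def classify(x):
    return all(lo <= x[i] <= hi for i, (lo, hi) in enumerate(box))

print(all(classify(p) for p in positives))   # True: covers all positive examples
print(any(classify(n) for n in negatives))   # False: rejects the negative examples
```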
Decision Tree Learning
5. Travel back from the output layer to the hidden layer to adjust the weights such that the error is
decreased.
Forward pass:
computes ‘functional signal’, feed forward
propagation of input pattern signals through
network
Backward pass phase:
computes ‘error signal’, propagates the error
backwards through network starting at output
units (where the error is the difference
between actual and desired output values)
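The forward pass / backward pass description above is the error-backpropagation procedure for neural networks; the following is a minimal NumPy sketch of one forward pass and one weight update for an assumed 2-2-1 sigmoid network (architecture and learning rate are illustrative, not taken from the slides).

```python
import numpy as np

# One forward pass and one backward pass for a tiny 2-2-1 sigmoid network.

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(2, 2)), rng.normal(size=(2, 1))
x = np.array([[0.5, -0.3]])        # one input pattern
t = np.array([[1.0]])              # desired output
lr = 0.5                           # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass: propagate the input signal through the network.
h = sigmoid(x @ W1)
y = sigmoid(h @ W2)

# Backward pass: propagate the error (actual - desired) back and adjust weights.
delta_out = (y - t) * y * (1 - y)             # error signal at the output units
delta_hid = (delta_out @ W2.T) * h * (1 - h)  # error signal at the hidden units
W2 -= lr * h.T @ delta_out
W1 -= lr * x.T @ delta_hid
```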
Support Vector Machines (SVM)
• Supervised learning methods for classification and regression.
• A relatively new class of successful learning methods.
Support Vectors
• Support Vectors are simply the co-ordinates of individual
observation. Support Vector Machine is a frontier which best
segregates the two classes (hyper-plane/ line).
• Support vectors are the data points that lie closest to the
decision surface (or hyperplane)
• They are the data points most difficult to classify
• They have direct bearing on the optimum location of the
decision surface
• We can show that the optimal hyperplane stems from the
function class with the lowest “capacity” (VC dimension).
• Support vectors are the data points nearest to the hyperplane,
the points of a data set that, if removed, would alter the position
of the dividing hyperplane. Because of this, they can be
considered the critical elements of a data set.
What is a hyperplane?
• As a simple example, for a classification task with
only two features, you can think of a hyperplane as a
line that linearly separates and classifies a set of
data.
• Intuitively, the further from the hyperplane our data
points lie, the more confident we are that they have
been correctly classified. We therefore want our data
points to be as far away from the hyperplane as
possible, while still being on the correct side of it.
• So when new testing data are added, whatever side
of the hyperplane it lands will decide the class that
we assign to it.
How do we find the right
hyperplane?
• How do we best segregate the two classes within the data?
• The distance between the hyperplane and the nearest data
point from either set is known as the margin. The goal is to
choose a hyperplane with the greatest possible margin
between the hyperplane and any point within the training
set, giving a greater chance of new data being classified
correctly. There will never be any data point inside the
margin.
But what happens when there is
no clear hyperplane?
• Data are rarely ever as clean as our simple example above. A dataset will
often look more like the jumbled balls below which represent a linearly
non separable dataset.
• In order to classify a dataset like the one above it’s necessary to move
away from a 2d view of the data to a 3d view. Explaining this is easiest
with another simplified example. Imagine that our two sets of colored
balls above are sitting on a sheet and this sheet is lifted suddenly,
launching the balls into the air. While the balls are up in the air, you use
the sheet to separate them. This ‘lifting’ of the balls represents the
mapping of data into a higher dimension. This is known as kernelling.
Because we are now in three dimensions, our
hyperplane can no longer be a line. It must now be a
plane as shown in the example above. The idea is
that the data will continue to be mapped into higher
and higher dimensions until a hyperplane can be
formed to segregate it.
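A brief sketch of this kernel idea, assuming scikit-learn is available: a ring-shaped, linearly non-separable dataset that a linear kernel handles poorly is separated almost perfectly once an RBF kernel implicitly maps it into a higher-dimensional space.

```python
# Kernel trick illustration on a linearly non-separable dataset.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)

print("linear kernel accuracy:", linear_svm.score(X, y))   # poor separation
print("rbf kernel accuracy:   ", rbf_svm.score(X, y))      # near-perfect
```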
How does it work? How can we
identify the right hyper-plane?
Identify the right hyperplane
(Scenario-1):
• Here, we have three hyperplanes (A, B and C). Now,
identify the right hyperplane to classify star and circle.
Identify the right hyperplane
(Scenario-2):
• Here, we have three hyperplanes (A, B and C)
and all are segregating the classes well. Now,
how can we identify the right hyperplane?
• Some of you may have selected the hyper-plane B as it has higher margin
compared to A. But, here is the catch, SVM selects the hyperplane which
classifies the classes accurately prior to maximizing margin. Here,
hyperplane B has a classification error and A has classified all correctly.
Therefore, the right hyperplane is A.
Can we classify two classes (Scenario-4)?
• We are unable to segregate the two classes using a straight line, as one of
the stars lies in the territory of the other (circle) class as an outlier.
• One star at other end is like an outlier for star class. SVM has a feature to
ignore outliers and find the hyperplane that has maximum margin. Hence,
we can say, SVM is robust to outliers.
Find the hyperplane to segregate
to classes (Scenario-5)
• In the scenario below, we can’t have linear hyperplane
between the two classes, so how does SVM classify these two
classes? Till now, we have only looked at the linear
hyperplane.
Scenario-5
• When we look at the hyperplane in original
input space it looks like a circle:
Two-Class Problem: Linearly Separable Case
• Many decision boundaries can separate these two classes (Class 1 and Class 2).
• Which one should we choose?
Example of Bad Decision Boundaries
(Figures not reproduced: examples of decision boundaries that separate Class 1 and Class 2 poorly.)
Good Decision Boundary: Margin Should Be Large
• The decision boundary should be as far away from the data of both classes as possible.
– We should maximize the margin, m = 2 / sqrt(w·w) = 2 / ||w||.
• Support vectors are the data points that the margin pushes up against.
• The maximum-margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM (called a linear SVM).
(Figure not reproduced: Class 1 and Class 2 separated by a maximum-margin boundary of width m.)
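A short sketch, again assuming scikit-learn, of fitting a maximum-margin linear SVM on a toy dataset and reading off the support vectors and the margin 2/||w|| discussed above (the data points are made up for illustration).

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 1], [1, 2],      # Class 1
              [4, 4], [5, 4], [4, 5]])     # Class 2
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C approximates a hard margin

w = clf.coef_[0]
print("support vectors:\n", clf.support_vectors_)
print("margin 2/||w|| =", 2 / np.linalg.norm(w))
```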
What is Reinforcement Learning?
• Learning from interaction with an environment to
achieve some long-term goal that is related to the
state of the environment
• The goal is defined by reward signal, which must
be maximised.
• Agent must be able to partially/fully sense the
environment state and take actions to influence
the environment state
• The state is typically described with a feature-
vector
Reinforcement learning
Exploration versus Exploitation
• We want a reinforcement learning agent to earn lots of
reward
• The agent must prefer past actions that have been
found to be effective at producing reward
• The agent must exploit what it already knows to obtain
reward
• The agent must select untested actions to discover
reward-producing actions
• The agent must explore actions to make better action
selections in the future
• Trade-off between exploration and exploitation
Reinforcement Learning Systems
• Reinforcement learning systems have 4 main
elements:
– Policy
– Reward signal
– Value function
– Optional model of the environment
Policy
• A policy is a mapping from the perceived
states of the environment to actions to be
taken when in those states
• A reinforcement learning agent uses a policy
to select actions given the current
environment state
On-policy versus Off-policy
• An on-policy agent learns only about the
policy that it is executing
• An off-policy agent learns about a policy or
policies different from the one that it is
executing
Reward Signal
• The reward signal defines the goal
• On each time step, the environment sends a
single number called the reward to the
reinforcement learning agent
• The agent’s objective is to maximise the total
reward that it receives over the long run
• The reward signal is used to alter the policy
Value Function (1)
• The reward signal indicates what is good in the
short run while the value function indicates what
is good in the long run
• The value of a state is the total amount of reward
an agent can expect to accumulate over the
future, starting in that state
• Compute the value using the states that are likely
to follow the current state and the rewards
available in those states
• Future rewards may be time-discounted with a
factor in the interval [0, 1]
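As a small worked example of the discounting mentioned above (the reward sequence is made up):

```python
# Discounted return: each future reward is weighted by gamma**t, gamma in [0, 1].

def discounted_return(rewards, gamma=0.9):
    return sum(gamma ** t * r for t, r in enumerate(rewards))

print(discounted_return([1, 0, 0, 10]))   # 1 + 0.9**3 * 10 ≈ 8.29
```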
Value Function (2)
• Use the values to make and evaluate decisions
• Action choices are made based on value
judgements
• Prefer actions that bring about states of highest
value instead of highest reward
• Rewards are given directly by the environment
• Values must continually be re-estimated from the
sequence of observations that an agent makes
over its lifetime
Model-free versus Model-based
• A model of the environment allows inferences to be made
about how the environment will behave
• Example: Given a state and an action to be taken while in
that state, the model could predict the next state and the
next reward
• Models are used for planning, which means deciding on a
course of action by considering possible future situations
before they are experienced
• Model-based methods use models and planning. Think of
this as modelling the dynamics p(s’ | s, a)
• Model-free methods learn exclusively from trial-and-error
(i.e. no modelling of the environment)
• This presentation focuses on model-free methods
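A minimal tabular Q-learning sketch is given below as one concrete example of a model-free, trial-and-error method; the corridor environment and the hyperparameters are illustrative assumptions.

```python
import random

# Tabular Q-learning on a toy 1-D corridor of 5 states; reaching the rightmost
# state earns reward 1. No model of the environment is learned.

N_STATES, ACTIONS = 5, [-1, +1]              # actions: move left / move right
alpha, gamma, epsilon = 0.5, 0.9, 0.1        # learning rate, discount, exploration
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + a))
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy: occasionally explore, otherwise exploit current Q
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        # Q-learning update from the experienced transition (s, a, r, s2)
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

greedy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES)]
print(greedy)   # +1 (move right) is learned for every non-terminal state
```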
Advantages of Reinforcement Learning
• It can solve higher-order and complex problems. Also, the solutions
obtained will be very accurate.
• The reason for its perfection is that it is very similar to the human learning
technique.
• Due to its learning ability, it can be used with neural networks. This can be
termed deep reinforcement learning.
• Since the model learns constantly, a mistake made earlier would be
unlikely to occur in the future.
• Various problem-solving models are possible to build using reinforcement
learning.
• When it comes to creating simulators, object detection in automatic cars,
robots, etc., reinforcement learning plays a great role in the models.
• The best part is that even when there is no training data, it can learn
from the experience it gains by interacting with the environment.
• For various problems, which might seem complex to us, it provides the
perfect models to tackle them.
Disadvantages of Reinforcement Learning
Ensemble Learning (Contd…)
Ensemble learning method is the one where multiple learners or
learning algorithms are trained.
• In most of the learning algorithms a single hypothesis drives
the learning.
• In the ensemble learning method, a whole collection or ensemble of
hypotheses is selected from the hypothesis space and their predictions
are combined.
• In this approach, the learners are referred to as base learners.
• The most commonly used ensemble learning methods are
1) Boosting
2) Bagging.
Ensemble Learning (Contd..)
Boosting:
• Boosting can probably be defined as the method for
generating accurate predictions by combining the rules that
are comparatively inaccurate.
• Boosting works on weighted training sets; the weight of a training
example reflects its importance.
Bagging:
• In bagging, the training data is resampled with replacement; this is
referred to as bootstrap sampling.
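A brief illustration of both methods, assuming scikit-learn is available; AdaBoost reweights the training examples while bagging trains each base learner on a bootstrap resample (both use their default decision-tree base learners here).

```python
# Boosting vs. bagging on a synthetic classification dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_informative=5, random_state=0)

boosting = AdaBoostClassifier(n_estimators=50, random_state=0)
bagging = BaggingClassifier(n_estimators=50, random_state=0)

print("boosting accuracy:", cross_val_score(boosting, X, y, cv=5).mean())
print("bagging accuracy: ", cross_val_score(bagging, X, y, cv=5).mean())
```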
Learning for decision making
• It is observed from the different learning mechanisms that the
capability to take decisions is increased.
• In the supervised and unsupervised methodologies, the decisions
taken are not sequential in nature.
• That is, if the system makes a mistake on one decision, this has
no bearing on the subsequent decisions.
• To cope with dynamic situations there is a need to
understand the perspective of decision making.
• Another aspect is environment and system learning, which also
needs to be considered during decision making. While taking
decisions, one specific learning approach may not be suitable.
• The learning approach depends on the decision scenario.
Distributed learning
• In distributed learning the task of learning is distributed.
• Need for distributed learning - Arises due to large data sets
and time constraints.
• More than one agent works on different parts of the data set.
Distributed learning algorithms run on each partition to get the
desired outcome, and the results are then combined.
• Efficiency of distributed learning is an important aspect to consider:
it is extremely important that the outcome of distributed learning
matches the one achieved in the absence of a distributed
environment.
• Multi-agent systems can be thought of as a subset of
distributed learning.
Speedup learning
• Speedup learning typically deals with speeding up
problem solving by effective use of problem-solving
experience. Hence, prior problem-solving
experience is an input to speedup learning.
• In this learning,
1) There is no interaction with the environment.
2) New problems cannot be solved.
So, speedup learning accelerates the problem-solving process
using experience and prior observations.
Speedup learning (Contd..)
Generalized learning
• Another dimension of speedup learning is generalized
learning; this is also known as explanation-based learning.
• There are a number of issues with explanation-based learning:
it is implemented and embedded in systems to solve real-life
problems and is suitable for a particular set of problems
where the sequential processes, once developed, can be used
again and again.
• But this is not the case in many real-life problems, where
dynamic change in the environment demands improvement of
the established scenarios, and there is even a need to keep on
learning based on new findings.