Cs-A-501 Ai - Ocw
INTELLIGENCE
Artificial intelligence (AI) has become a common term these days. Almost everyone, from every
domain, seems to be aware of it, and it is almost impossible to avoid its importance in our daily
lives. Its presence is visible in every part of our life: in the applications, especially mobile apps,
that we use on a daily basis; in the movies that the film industry makes; when we make
purchases on e-commerce sites; and when we avail ourselves of the facilities of medical systems.
Manufacturing an intelligent robot is possible today due to advanced applications of AI. Even
the latest mobile models come equipped with a separate processing unit, called an AI processor,
to manage the limited resources of a mobile device intelligently.
The history of AI is very rich, though it is one of the newest fields in science and engineering.
Its roots can be seen long before the development of modern digital computers. Its inception
arguably took place when Gottfried Leibniz and Blaise Pascal constructed mechanical
calculating machines, after which Charles Babbage introduced the first machine capable of
storing and manipulating symbols. However, the credit for the name “Artificial Intelligence”
goes to John McCarthy, who, along with Marvin Minsky and Claude Shannon, organized the
Dartmouth Conference in 1956. This conference was a major turning point that helped refuel
research on AI. It should be mentioned, though, that Alan Turing was the first to carry out
substantial research in the field now known as Artificial Intelligence or AI, although he
originally termed it “Machine Intelligence”.
WHAT IS AI?
It is important to understand that AI does not necessarily have anything to do with human
intelligence, even though some systems try to mimic it. Some definitions of AI focus on human
intelligence, while others focus on hard problems. Let us look at some of these definitions:
According to Herbert Simon “We call programs ‘intelligent’, if they exhibit behaviours
that would be regarded intelligent if they were exhibited by human beings”.
In the words of Avron Barr and Edward Feigenbaum - “Physicists ask what kind of place
this universe is and seek to characterize its behavior systematically. Biologists ask what it means
for a physical system to be living. We (in AI) wonder what kind of information-processing
system can ask such questions.”
Elaine Rich explained that – “AI is the study of techniques for solving exponentially
hard problems in polynomial time by exploiting knowledge about the problem domain.”
John Haugeland described research in this field as follows – “The fundamental goal of this
research is not merely to mimic intelligence or produce some clever fake. “AI” wants the
genuine article; machines with minds.”
Thinking Humanly: When a computer thinks as a human, it performs tasks that require
intelligence from a human to succeed, such as driving a car. To determine whether a program
thinks like a human, you must have some method of determining how humans think, which the
cognitive modeling approach defines. This model relies on three techniques: introspection,
psychological testing and brain imaging.
Acting Humanly: When a computer acts like a human, it best reflects the Turing test, in which
the computer succeeds when differentiation between the computer and a human isn’t possible.
Thinking Rationally: Studying how humans think using some standard enables the creation of
guidelines that describe typical human behaviors. A person is considered rational when
following these behaviors within certain levels of deviation. A computer that thinks rationally
relies on the recorded behaviors to create a guide as to how to interact with an environment
based on the data at hand. The goal of this approach is to solve problems logically, when
possible.
Acting Rationally: Studying how humans act in given situations under specific constraints
enables you to determine which techniques are both efficient and effective. A computer that acts
rationally relies on the recorded actions to interact with an environment based on conditions,
environmental factors, and existing data.
TURING TEST
Consider the following setting. There are two rooms, A and B. One of the rooms contains a
computer. The other contains a human. The interrogator is outside and does not know which one
is a computer. He can ask questions through a teletype and receives answers from both A and B.
The interrogator needs to identify whether A or B is the human. To pass the Turing test, the
machine has to fool the interrogator into believing that it is human. For more details on the
Turing test visit the site http://cogsci.ucsd.edu/~asaygin/tt/ttest.html
Generally, problems for which straightforward mathematical/logical algorithms are not readily
available, and which can be solved only by an intuitive approach, are called AI problems. The 4-
puzzle problem, for instance, is an ideal AI problem. There is no formal algorithm for its
realization, i.e., given a starting and a goal state, one cannot say, prior to execution of the tasks,
the sequence of steps required to reach the goal from the starting state. Such problems are called
ideal AI problems. The well known water-jug problem, the Travelling Salesperson Problem
(TSP), and the n-Queen problem are typical examples of the classical AI problems. Among the
non-classical AI problems, the diagnosis problems and the pattern classification problem need
special mention. For solving an AI problem, one may employ both AI and non-AI algorithms.
For better understanding, we can divide AI problems into two categories: common AI problems
and expert AI problems. Common AI problems include identifying people and objects,
communicating in natural languages, and moving smartly around on the road to avoid traffic and
other obstacles; these are the kinds of problems that can be solved through regular practice and
intelligence. Expert AI problems, on the other hand, include those that demand specialized
skills, such as solving difficult mathematical or logical problems, defining strategies or solutions
for games requiring complex logical thinking and reasoning power, and medical diagnosis.
What is interesting here is that computer systems can solve many sophisticated, expert-level AI
problems quite efficiently, although they often fail to solve regular or common problems that are
easily solvable with human intelligence, without applying any sophisticated AI techniques.
AI TECHNIQUES
The subject of AI spans a wide horizon. It deals with the various kinds of knowledge
representation schemes, different techniques of intelligent search, various methods for resolving
uncertainty of data and knowledge, different schemes for automated machine learning and many
others. Among the application areas of AI, we have Expert systems, Game-playing, and
Theorem-proving, Natural language processing, Image recognition, Robotics and many others.
The subject of AI has been enriched with a wide discipline of knowledge from Philosophy,
Psychology, Cognitive Science, Computer Science, Mathematics and Engineering. All the
research in AI makes one thing clear: intelligence requires knowledge. Most of the time
knowledge possesses some less desirable properties, including:
It is voluminous.
It is hard to characterize accurately.
It is constantly changing.
It differs from data by being organized in a way that corresponds to the ways it will be
used.
Two people play Tic Tac Toe with paper and pencil. One player is X and the other player is O.
Players take turns placing their X or O. If a player gets three of their marks on the board in a
row, column or one of the two diagonals, they win. When the board fills up with neither player
winning, the game ends in a draw.
From AI point of view, the problem of playing Tic-Tac-Toe will be formulated as follows:
The start state is all blank squares out of 9 squares. Player 1 can play in one square. As
the game proceeds, blank squares remain the choice, which can be marked by the players. The
data structure used to represent the board is a 9-element vector, with element position shown in
Fig. 1.1:
Any board position with three identical marks in a row, column, or diagonal would be declared a
win for the corresponding player.
The valid transitions of this problem are simply putting ‘1’ or ‘2’ in any element position
containing 0. In practice, all the valid moves are defined and stored, and a move is selected from
this store. In this game, the valid transition table will be a vector of 3^9 entries, each entry
having 9 elements.
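The vector representation and the win condition described above can be sketched in Python. This is a minimal illustration: the 0/1/2 encoding follows the text, while the function names and the example board are our own.

```python
# Tic-Tac-Toe board as a 9-element vector:
# 0 = blank, 1 = player 1, 2 = player 2.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]

def winner(board):
    """Return 1 or 2 if that player has a complete line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return None

def valid_moves(board):
    """A valid transition puts a mark in any position holding 0."""
    return [i for i, v in enumerate(board) if v == 0]

board = [1, 1, 1,
         2, 2, 0,
         0, 0, 0]
print(winner(board))       # 1
print(valid_moves(board))  # [5, 6, 7, 8]
```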
An agent is anything that can be viewed as perceiving its environment through sensors and
acting upon that environment through actuators. The current percept or a sequence of percepts
can influence the actions of an agent. We use the term percept to refer to the agent’s perceptual
inputs at any given instant. An agent’s percept sequence is the complete history of everything
the agent has ever perceived. In general, an agent’s choice of action at any given instant can
depend on the entire percept sequence observed to date, but not on anything it hasn’t perceived.
The agent can change the environment through actuators or effectors. An operation involving an
actuator is called an action.
We can also consider the agents as autonomous systems. They would be persistent, goal
oriented, pro-active and would sense the situations and surroundings. They would maintain the
social ability by communicating with the owners and the other agents. For an agent to act out its
decision, it must be embodied in some environment.
An agent can be looked upon as a system that implements a mapping from percept sequences to
actions. A performance measure has to be used in order to evaluate an agent. An autonomous
agent decides autonomously which action to take in the current situation to maximize progress
towards its goals. The agent function for an artificial agent will be implemented by an agent
program. It is important to keep these two ideas distinct. The agent function is an abstract
mathematical description; the agent program is a concrete implementation, running within some
physical system.
Some programs operate in entirely artificial environments confined to keyboard input,
databases, computer file systems and character output on a screen. The most famous artificial
environment is the Turing Test environment, in which real and artificial agents are tested on
equal ground. This is a very challenging environment, as it is highly difficult for a software
agent to perform as well as a human.
In spite of the difficulty of knowing exactly where the environment ends and the agent begins in
some cases, it is useful to be able to classify AI environments, because doing so helps predict
how difficult the task of the AI will be. Russell and Norvig (2009) introduce seven ways to
classify AI environments, which can be remembered with the mnemonic "D-SOAKED." They are:
Observability (full or partial): A fully observable environment is one in which the agent
has access to all information in the environment relevant to its task.
Agency (single or multiple): If there is at least one other agent in the environment, it is a
multi-agent environment. Other agents might be apathetic, cooperative, or competitive.
In each case, the job of the AI (and of the programmer making the AI) is easier if the first of the
two options is the best descriptor for the category; the AI faces a much more difficult task when
the second option applies.
STRUCTURE OF AGENTS
The job of AI is to design an agent program that implements the agent function — the mapping
from percepts to actions. We assume this program will run on some sort of computing device
with physical sensors and actuators; we call this the architecture.
AGENT PROGRAM
The agent program takes just the current percept as input because nothing more is available from
the environment; if the agent’s actions need to depend on the entire percept sequence, the agent
will have to remember the percepts.
AGENT ARCHITECTURE
Table-based agent. In a table-based agent the action is looked up from a table, based on
information about the agent’s percepts. A table is a simple way to specify a mapping from
percepts to actions. Alternatively, the mapping may be defined implicitly by a program, and it
may be implemented by a rule-based system, by a neural network or by a procedure. There are
several disadvantages to a table-based system: the table may become very large; learning a table
may take a very long time, especially if the table is large; and such systems usually have little
autonomy, as all actions are pre-determined.
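A minimal sketch of such a table-based agent, assuming an invented two-cell vacuum world for the percepts and actions (the table entries are purely illustrative):

```python
# A table-driven agent: the percept sequence so far is the key, and the
# action is looked up directly. A complete table would need one entry
# per possible percept sequence, which is why such tables grow so large.
ACTION_TABLE = {
    (("A", "dirty"),): "suck",
    (("A", "clean"),): "right",
    (("B", "dirty"),): "suck",
    (("B", "clean"),): "left",
}

percepts = []  # the agent's remembered percept sequence

def table_driven_agent(percept):
    percepts.append(percept)
    # Fall back to a default when the sequence is missing from the table.
    return ACTION_TABLE.get(tuple(percepts), "noop")

print(table_driven_agent(("A", "dirty")))  # suck
```

The lookup key is the whole sequence, not just the latest percept, which is exactly what makes the table explode combinatorially.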
These kinds of agents are also called reactive agents or stimulus-response agents. They have no
notion of history: the current state is as the sensors see it right now, and the action is based on
the current percepts only.
Percept-based agents are efficient, but they have no internal representation for reasoning or
inference. Besides this, these agents do not follow any strategic planning or learning, and they
are not good at handling multiple opposing goals.
The Subsumption Architecture is built in layers, each representing a different behaviour, and
higher layers can override lower ones. Each activity is modeled by a finite state machine. The
subsumption architecture can be illustrated by Brooks’ mobile robot example.
State-based Agent or model-based reflex agent. State based agents differ from percept based
agents in that such agents maintain some sort of state based on the percept sequence received so
far. The state is updated regularly based on what the agent senses, and the agent’s actions.
Keeping track of the state requires that the agent has knowledge about how the world evolves,
and how the agent’s actions affect the world.
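One possible sketch of a state-based (model-based reflex) agent, again in an invented vacuum-world setting; the state update here simply records the latest percept, standing in for a fuller model of how the world evolves:

```python
# A model-based reflex agent: internal state is maintained across
# percepts and updated before each action is chosen.
class ModelBasedAgent:
    def __init__(self):
        self.state = {}          # the agent's model of the world
        self.last_action = None

    def update_state(self, percept):
        # "How the world evolves" and "what my actions do" would be
        # folded in here; this sketch just records the latest reading.
        location, status = percept
        self.state[location] = status

    def act(self, percept):
        self.update_state(percept)
        location, status = percept
        if status == "dirty":
            action = "suck"
        else:
            action = "right" if location == "A" else "left"
        self.last_action = action
        return action

agent = ModelBasedAgent()
print(agent.act(("A", "dirty")))  # suck
print(agent.state)                # {'A': 'dirty'}
```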
Goal-based Agent. Knowing something about the current state of the environment is not always
enough to decide what to do. As well as a current state description, the agent needs some sort of
goal information that describes situations that are desirable. The agent program can combine this
with the model to choose actions that achieve the goal.
Although the goal-based agent appears less efficient, it is more flexible because the knowledge
that supports its decisions is represented explicitly and can be modified. For the reflex agent, on
the other hand, we would have to rewrite many condition–action rules. The goal-based agent’s
behavior can easily be changed to go to a different destination, simply by specifying that
destination as the goal. The reflex agent’s rules for when to turn and when to go straight will
work only for a single destination; they must all be replaced to go somewhere new.
Utility-based Agent. Goals alone are not enough to generate high-quality behavior in most
environments. Goals just provide a crude binary distinction between “happy” and “unhappy”
states. A more general performance measure should allow a comparison of different world states
according to exactly how happy they would make the agent. Because “happy” does not sound
very scientific, economists and computer scientists use the term utility instead. An agent’s utility
function is essentially an internalization of the performance measure. If the internal utility
function and the external performance measure are in agreement, then an agent that chooses
actions to maximize its utility will be rational according to the external performance measure. A
rational utility-based agent chooses the action that maximizes the expected utility of the action
outcomes.
Utility based agents provide a more general agent framework. In case that the agent has multiple
goals, this framework can accommodate different preferences for the different goals. Such
systems are characterized by a utility function that maps a state or a sequence of states to a real
valued utility. The agent acts so as to maximize expected utility.
Learning Agent. Learning allows an agent to operate in initially unknown environments. The
learning element modifies the performance element. Learning is required for true autonomy.
In his famous 1950 paper, Alan Turing proposed the method of building learning machines and
then teaching them. In many areas of AI, this is now the preferred method for creating state-of-the-art
systems. Learning has another advantage, as we noted earlier: it allows the agent to operate in
initially unknown environments and to become more competent than its initial knowledge alone
might allow.
A learning agent can be divided into four conceptual components, as shown in Figure 2.4. The
most important distinction is between the learning element, which is responsible for making
improvements, and the performance element, which is responsible for selecting external
actions. The learning element uses feedback from the critic on how the agent is doing and
determines how the performance element should be modified to do better in the future. The
design of the learning element depends very much on the design of the performance element.
The last component of the learning agent is the problem generator. It is responsible for
suggesting actions that will lead to new and informative experiences.
A problem can be defined formally by the following components:
The initial state that the agent starts in. For example, the initial state for our agent in
Romania might be described as In(Arad).
A description of the possible actions available to the agent. Given a particular state s,
ACTIONS(s) returns the set of actions that can be executed in s. We say that each of
these actions is applicable in s. For example, from the state In(Arad), the applicable
actions are {Go(Sibiu), Go(Timisoara), Go(Zerind)}.
A description of what each action does; the formal name for this is the transition model,
specified by a function RESULT(s, a) that returns the state that results from doing action
a in state s. We also use the term successor to refer to any state reachable from a given
state by a single action. For example, we have RESULT(In(Arad), Go(Zerind)) = In(Zerind).
The goal test, which determines whether a given state is a goal state.
A path cost function that assigns a numeric cost to each path.
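The components listed above can be sketched as a small Python class for the Romania route-finding problem. Only a fragment of the map is included, and the class layout (names, string encoding of actions) is our own:

```python
# A fragment of the Romania road map: city -> {neighbour: distance}.
ROADS = {
    "Arad":  {"Sibiu": 140, "Timisoara": 118, "Zerind": 75},
    "Sibiu": {"Arad": 140, "Fagaras": 99, "Rimnicu Vilcea": 80},
}

class RouteProblem:
    def __init__(self, initial, goal):
        self.initial = initial           # the initial state
        self.goal = goal

    def actions(self, s):
        """ACTIONS(s): the set of actions applicable in state s."""
        return [f"Go({city})" for city in ROADS.get(s, {})]

    def result(self, s, a):
        """RESULT(s, a): the transition model. 'Go(Sibiu)' -> 'Sibiu'."""
        return a[3:-1]

    def goal_test(self, s):
        """True when s is a goal state."""
        return s == self.goal

    def step_cost(self, s, a):
        """Cost of one step; a path cost is the sum of its step costs."""
        return ROADS[s][self.result(s, a)]

p = RouteProblem("Arad", "Bucharest")
print(p.actions("Arad"))  # ['Go(Sibiu)', 'Go(Timisoara)', 'Go(Zerind)']
```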
1. Define the problem precisely. This definition must include precise specifications of what
the initial situation(s) will be as well as what final situations constitute acceptable
solutions to the problem.
2. Analyze the problem. A few very important features can have an immense impact on the
appropriateness of various possible techniques for solving the problem.
3. Isolate and represent the task knowledge that is necessary to solve the problem.
4. Choose the best problem-solving technique(s) and apply it (them) to the particular
problem.
Searching is the universal technique of problem solving in AI. Problem solving requires two
prime considerations: first representation of the problem by an appropriately organized state
space and then testing the existence of a well-defined goal state in that space.
Together, the initial state, actions, and transition model implicitly define the state space of the
problem—the set of all states reachable from the initial state by any sequence of actions. The
state space forms a directed network or graph in which the nodes are states and the links
between nodes are actions. (The map of Romania shown in Figure 3.1 can be interpreted as a
state-space graph if we view each road as standing for two driving actions, one in each
direction.) A path in the state space is a sequence of states connected by a sequence of actions.
PRODUCTION SYSTEM
The production system is a model of computation that can be applied to implement search
algorithms and model human problem solving. Such problem solving knowledge can be packed
up in the form of little quanta called productions. A production is a rule consisting of a situation
recognition part and an action part. A production is a situation-action pair in which the left side
is a list of things to watch for and the right side is a list of things to do. When productions are
used in deductive systems, the situations that trigger productions are specified combinations of
facts. The actions are restricted to being assertions of new facts deduced directly from the
triggering combination. Such productions may be called premise-conclusion pairs rather than
situation-action pairs.
Expressiveness and intuitiveness: In the real world, situations often arise of the form “if this
happens, you will do that” or “if this is so, then this should happen”. Production rules
essentially tell us what to do in a given situation.
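A toy forward-chaining production system illustrating such situation-action pairs, where the actions assert new facts; the rules and facts are invented for the example:

```python
# Each production is a (situation, action) pair: the situation is a set
# of facts to watch for, the action asserts one new fact.
RULES = [
    ({"has_fur", "says_woof"}, "is_dog"),
    ({"is_dog"},               "is_mammal"),
]

def forward_chain(facts):
    """Fire productions until no new fact can be deduced."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for condition, conclusion in RULES:
            # A production fires when its situation part matches the facts.
            if condition <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(sorted(forward_chain({"has_fur", "says_woof"})))
# ['has_fur', 'is_dog', 'is_mammal', 'says_woof']
```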
PROBLEM CHARACTERISTICS
A problem may have different aspects of representation and explanation. In order to choose
the most appropriate method for a particular problem, it is necessary to analyze the problem
along several key dimensions. Some of the main key features of a problem are given below.
These characteristics of a problem are known as the seven problem characteristics, under
which the solution must take place.
Every search process can be viewed as traversal of a tree structure in which each node represents
a problem state and each arc represents a relationship between the states represented by the nodes
it connects. The search process must find a path or paths through the tree that connects an initial
state with one or more final states. The tree that must be searched could, in principle, be
constructed in its entirety from the rules that define allowable moves in the problem space. But,
in practice, most of it never is. It is too large and most of it need never be explored. Instead of
first building the tree explicitly and then searching it, most search programs represent the tree
implicitly in the rules and generate explicitly only those parts that they decide to explore.
Following are some of the issues that arise in all general purpose search techniques:
Direction in which to conduct the search. We can search forward through the state space
from the start state to goal state, or we can search backward from the goal.
Production systems typically spend most of their time looking for rules to apply, so it is
critical to have efficient procedures for matching rules against states.
How to represent each node of the search process (the knowledge representation
problem and the frame problem).
Intelligent agents are supposed to maximize their performance measure. Achieving this is
sometimes simplified if the agent can adopt a goal and aim at satisfying it. Goals help organize
behavior by limiting the objectives that the agent is trying to achieve and hence the actions it
needs to consider. Goal formulation, based on the current situation and the agent’s performance
measure, is the first step in problem solving, whereas problem formulation is the process of
deciding what actions and states to consider, given a goal. If the agent has three different paths
to reach its goal, it may not know which of its possible actions is best, because it does not yet
know enough about the state that results from taking each action. If the agent has no additional
information, i.e., if the environment is unknown, then it has no choice but to try one of the
actions at random. In such a situation, an agent with several immediate options of unknown
value can decide what to do by first examining future actions that eventually lead to states of
known value.
The process of looking for a sequence of actions that reaches the goal is called search. A search
algorithm takes a problem as input and returns a solution in the form of an action sequence.
Once a solution is found, the actions it recommends can be carried out. This is called the
execution phase. Thus, we have a simple “formulate, search, execute” design for the agent, as
shown in Figure 4.1.
After formulating a goal and a problem to solve, the agent calls a search procedure to solve it. It
then uses the solution to guide its actions, doing whatever the solution recommends as the next
thing to do—typically, the first action of the sequence—and then removing that step from the
sequence. Once the solution has been executed, the agent will formulate a new goal. It is to be
noted that while the agent is executing the solution sequence it ignores its percepts when
choosing an action, because it knows in advance what they will be. Control theorists call such a
design an open-loop system.
Uninformed search is also known as blind search. The term means that these strategies have no
additional information about states beyond that provided in the problem definition. All they can
do is generate successors and distinguish a goal state from a non-goal state. Intuitively, these
algorithms ignore where they are going until they find a goal and report success.
BREADTH FIRST SEARCH
Breadth first search is a general technique of traversing a graph. Breadth first search may use
more memory but will always find the shortest path first. In this type of search the state space is
represented in form of a tree. The solution is obtained by traversing through the tree. The nodes
of the tree represent the start value or starting state, various intermediate states and the final state.
In this search a queue data structure is used, and the traversal proceeds level by level. Breadth
first search expands nodes in order of their distance from the root. It is a path-finding algorithm
that will always find a solution if one exists, and the solution found first is always the shallowest
one. This completeness comes at a price: the search is very memory intensive, as each node in
the search tree is expanded breadthwise at each level.
Algorithm:
Note that in breadth first search the newly generated nodes are put at the back of fringe or the
OPEN list. What this implies is that the nodes will be expanded in a FIFO (First In First Out)
order. The node that enters OPEN earlier will be expanded earlier. This amounts to expanding
the shallowest nodes first.
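The FIFO behaviour described above can be sketched as a short breadth-first search. The graph below is an assumption reconstructed from the trace that follows (A's children are B and C, and so on), and a visited set is added here, whereas the trace in the text allows duplicate nodes on the fringe:

```python
from collections import deque

def bfs(graph, start, goal):
    """BFS: newly generated nodes go to the BACK of the fringe (FIFO)."""
    fringe = deque([[start]])      # the fringe holds whole paths
    visited = {start}
    while fringe:
        path = fringe.popleft()    # shallowest node comes out first
        node = path[-1]
        if node == goal:
            return path
        for child in graph.get(node, []):
            if child not in visited:
                visited.add(child)
                fringe.append(path + [child])
    return None

# Assumed graph, inferred from the FRINGE contents in the trace below.
graph = {"A": ["B", "C"], "B": ["D", "E"], "C": ["D", "G"], "D": ["C", "F"]}
print(bfs(graph, "A", "G"))  # ['A', 'C', 'G']
```

The returned path A-C-G matches the one reported at the end of the trace.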
BFS illustrated:
We will now consider the search space in Figure 1, and show how breadth first search works on
this graph.
Step 1: Initially fringe contains only one node corresponding to the source state A.
FRINGE: A
Step 2: A is removed from fringe. The node is expanded, and its children B and C are generated.
They are placed at the back of fringe.
FRINGE: B C
Step 3: Node B is removed from fringe and is expanded. Its children D, E are generated and put
at the back of fringe.
FRINGE: C D E
Step 4: Node C is removed from fringe and is expanded. Its children D and G are added to the
back of fringe.
FRINGE: D E D G
Steps 5–7: The nodes D, E, and the duplicate copy of D are removed from fringe in turn, and
the newly generated children are placed at the back, giving successively:
FRINGE: E D G C F
FRINGE: G C F B F
Step 8: G is selected for expansion. It is found to be a goal node. So the algorithm returns the
path A C G by following the parent pointers of the node corresponding to G. The algorithm
terminates.
We can easily see that it is complete—if the shallowest goal node is at some finite depth d,
breadth-first search will eventually find it after generating all shallower nodes (provided the
branching factor b is finite). The shallowest goal may not necessarily be the optimal one. If the
path cost is a non-decreasing function of the depth of the node then we can say that the breadth-
first search is optimal.
Imagine searching a uniform tree where every state has b successors. The root of the search tree
generates b nodes at the first level, each of which generates b more nodes, for a total of b^2 at the
second level. Each of these generates b more nodes, yielding b^3 nodes at the third level, and so
on. Now suppose that the solution is at depth d. In the worst case, it is the last node generated at
that level. Then the total number of nodes generated is
b + b^2 + b^3 + ··· + b^d = O(b^d) .
Advantage
Breadth-first search is complete and always finds the shallowest solution first. Because whole
levels of nodes are retained, it can check for duplicate nodes.
Disadvantage
Since each level of nodes is saved for creating the next one, it consumes a lot of memory. The
space requirement to store nodes is exponential, and the complexity depends on the number of
nodes.
DEPTH FIRST SEARCH
Depth first search is implemented recursively or with a LIFO stack data structure. It creates the
same set of nodes as the breadth-first method, only in a different order. As only the nodes on a
single path from root to leaf are stored at each point, the space requirement is linear: with
branching factor b and maximum depth m, the storage space is b·m.
Algorithm
The depth first search algorithm puts newly generated nodes in the front of OPEN. This results in
expanding the deepest node first. Thus the nodes in OPEN follow a LIFO order (Last In First
Out). OPEN is thus implemented using a stack data structure.
DFS illustrated:
Let us now run Depth First Search on the search space given in Figure 34, and trace its progress.
Step 1: Initially fringe contains only one node corresponding to the source state A.
FRINGE: A
Step 2: A is removed from fringe. A is expanded and its children B and C are put in front of
fringe.
FRINGE: B C
Step 3: Node B is removed from fringe, and its children D and E are pushed in front of fringe.
FRINGE: D E C
Step 4: Node D is removed from fringe, and its children C and F are pushed in front of fringe.
FRINGE: C F E C
Step 5: Node C is removed from fringe. Its child G is pushed in front of fringe.
FRINGE: G F E C
Step 6: Node G is expanded and found to be a goal node. The solution path A-B-D-C-G is
returned and the algorithm terminates.
This algorithm takes exponential time. If N is the maximum depth of a node in the search space,
in the worst case the algorithm will take time O(b^N). However, the space taken is linear in the
depth of the search tree, O(bN).
Disadvantage
This algorithm may not terminate and go on infinitely on one path. The solution to this issue is
to choose a cut-off depth. If the ideal cut-off is d, and if chosen cutoff is lesser than d, then this
algorithm may fail. If chosen cut-off is more than d, then execution time increases. Its
complexity depends on the number of paths. It cannot check duplicate nodes.
The embarrassing failure of depth-first search in infinite state spaces can be alleviated by
supplying depth-first search with a predetermined depth limit l. That is, nodes at depth l are
treated as if they have no successors. This approach is called depth-limited search. The depth
limit solves the infinite-path problem. Unfortunately, it also introduces an additional source of
incompleteness if we choose l < d, that is, if the shallowest goal is beyond the depth limit. (This
is likely when d is unknown.) Depth-limited search will also be nonoptimal if we choose l > d.
Its time complexity is O(b^l) and its space complexity is O(bl). Depth-first search can be viewed
as a special case of depth-limited search with l = ∞.
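A recursive sketch of depth-limited search, distinguishing "cutoff" (the limit was hit somewhere) from outright failure; the small graph is invented for illustration:

```python
def dls(graph, node, goal, limit, path=None):
    """Depth-limited search: nodes at the depth limit get no successors.

    Returns the path to the goal, 'cutoff' if the limit was reached,
    or None if the goal is unreachable within this subtree.
    """
    path = path or [node]
    if node == goal:
        return path
    if limit == 0:
        return "cutoff"            # treated as having no successors
    cutoff = False
    for child in graph.get(node, []):
        if child not in path:      # avoid cycles along the current path
            result = dls(graph, child, goal, limit - 1, path + [child])
            if result == "cutoff":
                cutoff = True
            elif result is not None:
                return result
    return "cutoff" if cutoff else None

graph = {"A": ["B", "C"], "B": ["D"], "C": ["G"]}
print(dls(graph, "A", "G", limit=1))  # cutoff  (l < d: goal beyond limit)
print(dls(graph, "A", "G", limit=2))  # ['A', 'C', 'G']
```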
Bidirectional search runs forward from the initial state and backward from the goal state until
the two meet at a common state. The path from the initial state is then concatenated with the
inverse of the path from the goal state. Each search covers only about half of the total path.
The idea behind bidirectional search is to run two simultaneous searches—one forward from the
initial state and the other backward from the goal—hoping that the two searches meet in the
middle (Figure 3.20). The motivation is that b^(d/2) + b^(d/2) is much less than b^d; in the figure,
the area of the two small circles is less than the area of one big circle centered on the start and
reaching to the goal.
Bidirectional search is implemented by replacing the goal test with a check to see whether the
frontiers of the two searches intersect; if they do, a solution has been found. It is important to
realize that the first such solution found may not be optimal, even if the two searches are both
breadth-first; some additional search is required to make sure there isn’t another short-cut across
the gap.
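A sketch of bidirectional breadth-first search that stops as soon as the two frontiers intersect. It assumes an undirected graph (given here as an invented example) and keeps, on each side, the path back to that side's root so the two halves can be concatenated:

```python
from collections import deque

def bidirectional_search(graph, start, goal):
    """Meet-in-the-middle BFS over an undirected graph."""
    if start == goal:
        return [start]
    fwd, bwd = {start: [start]}, {goal: [goal]}   # node -> path to root
    qf, qb = deque([start]), deque([goal])
    while qf and qb:
        # Expand one node on the forward frontier.
        node = qf.popleft()
        for nb in graph.get(node, []):
            if nb not in fwd:
                fwd[nb] = fwd[node] + [nb]
                if nb in bwd:                      # frontiers intersect
                    return fwd[nb] + bwd[nb][-2::-1]
                qf.append(nb)
        # Expand one node on the backward frontier.
        node = qb.popleft()
        for nb in graph.get(node, []):
            if nb not in bwd:
                bwd[nb] = bwd[node] + [nb]
                if nb in fwd:
                    return fwd[nb] + bwd[nb][-2::-1]
                qb.append(nb)
    return None

graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "E"],
         "D": ["B", "G"], "E": ["C", "G"], "G": ["D", "E"]}
print(bidirectional_search(graph, "A", "G"))  # ['A', 'B', 'D', 'G']
```

As the text notes, the first intersection found this way need not be optimal in general; a production version would keep searching briefly to rule out a shorter crossing.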
Fig. 4.2: A schematic view of a bidirectional search that is about to succeed when a branch from the start
node meets a branch from the goal node.
Figure 3.21 compares search strategies in terms of the four evaluation criteria set forth in Section
3.3.2. This comparison is for tree-search versions. For graph searches, the main differences are
that depth-first search is complete for finite state spaces and that the space and time complexities
are bounded by the size of the state space.
Start state:    Goal state:
2 8 3           1 2 3
1 6 4           8 _ 4
7 _ 5           7 6 5
We have to design a heuristic function for the above 8-puzzle problem. We design our heuristic
function as h(n) = the number of tiles out of place (with respect to the goal), and we try to
minimize h(n). The search space of the problem looks like:
[Figure: search tree for the 8-puzzle under this heuristic. The start state, with h(n)=4, generates
successors with h(n) = 5, 3, and 5. Repeatedly expanding the most promising state produces states
with h(n)=3, then h(n)=2 (its siblings scoring 3 and 4), then h(n)=1, and finally the goal state
with h(n)=0.]
Thus our heuristic leads us to the goal. Whenever there is a tie at any step, we can choose any one
of the tied states. For convenience, when a tie occurs here, the state that leads to the goal is
chosen, but students may try choosing a different one and check what happens. The design of the
heuristic function may vary from user to user, and a heuristic function may be a maximization
function or a minimization function.
Manhattan Distance Heuristic-- Another heuristic for the 8-puzzle problem is the Manhattan
distance heuristic. In this heuristic, the distance of a tile is the sum of its horizontal (X) and
vertical (Y) distances from its goal position. So the Manhattan distance heuristic for the start
node of the above problem is h(n)=1+1+0+0+0+1+0+2=5. Only tile 8 has Manhattan distance 2, because
its difference from the goal is 1 in the X position and 1 in the Y position, so the sum is 2. All
the rest of the tiles have Manhattan distance 0 or 1.
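Both heuristics can be sketched directly from the grids above (0 stands for the blank; the tuple encoding is an implementation choice, not from the text). For the start state the misplaced-tiles count is 4 and the Manhattan sum is 5: tiles 1, 2, and 6 each contribute 1 and tile 8 contributes 2.

```python
# Misplaced-tiles and Manhattan-distance heuristics for the 8-puzzle.
START = ((2, 8, 3),
         (1, 6, 4),
         (7, 0, 5))
GOAL  = ((1, 2, 3),
         (8, 0, 4),
         (7, 6, 5))

def misplaced(state, goal=GOAL):
    """h(n) = number of tiles (not counting the blank) out of place."""
    return sum(1
               for r in range(3) for c in range(3)
               if state[r][c] != 0 and state[r][c] != goal[r][c])

def manhattan(state, goal=GOAL):
    """h(n) = sum over tiles of |dx| + |dy| to the tile's goal square."""
    pos = {goal[r][c]: (r, c) for r in range(3) for c in range(3)}
    return sum(abs(r - pos[state[r][c]][0]) + abs(c - pos[state[r][c]][1])
               for r in range(3) for c in range(3) if state[r][c] != 0)
```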
In a forward reasoning problem, we move towards the goal state from a starting state. This class of
algorithms, when implemented with a heuristic function, is called heuristic search for OR graphs, or
the Best First Search algorithms.
In the best first search algorithm, we start with a promising node (one which has maximum or
minimum fitness value, according to how the problem is designed) and generate all its children. The
fitness of each of the children is then examined, and the most promising node among all the
unexpanded nodes is selected for expansion. The most promising node is then expanded and the
fitness of its offspring is measured. Again, among all the unexpanded nodes, the most promising
node is selected for expansion. This process continues until we reach the goal. The best first
search algorithm is stated below:
Begin
Step1: Identify possible starting states. Evaluate them using heuristic function (f) and put them in
a list L.
Step2: While L is not empty do
Begin
a) Choose the node n from L that has the minimum f value. If there is a tie, then select a
node randomly;
b) If n is goal state
Then return node n with its path from root node and
Exit;
Else
Remove n from list L, generate all children of n, and add them to list L.
End while;
END.
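The steps above can be sketched with a priority queue standing in for the list L (a minimal sketch; the number-line example at the bottom is an illustrative assumption):

```python
import heapq
from itertools import count

def best_first_search(starts, f, successors, is_goal):
    """Greedy best first search: repeatedly expand the unexpanded node
    with the most promising (minimum) heuristic value f."""
    tie = count()                          # breaks ties in the heap arbitrarily
    L = [(f(s), next(tie), [s]) for s in starts]
    heapq.heapify(L)
    seen = set(starts)
    while L:
        _, _, path = heapq.heappop(L)      # node n with minimum f value
        n = path[-1]
        if is_goal(n):
            return path                    # n together with its path from the root
        for child in successors(n):
            if child not in seen:
                seen.add(child)
                heapq.heappush(L, (f(child), next(tie), path + [child]))
    return None
```

For instance, searching the number line from 0 towards 7 with f(n) = |7 - n| expands straight towards the goal.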
Best first search is a class of algorithms, and one member of this class is the A* algorithm.
Two new definitions are added for discussing the A* algorithm.
Definition 5.1: A node is called open if the node has been generated and the heuristic function
has been applied over it but the node has not been expanded yet.
Definition 5.2: A node is called closed if it has been expanded for generating offspring.
In the A* algorithm, two cost functions are used for the evaluation of a node. One is a heuristic
cost and the other is a generation cost.
Heuristic cost: it measures the distance of the current node (x) from the goal node and is
denoted by h(x).
Generation cost: it measures the distance of the current node (x) from the starting node and is
denoted by g(x).
Total cost of a node (x) is denoted as f(x)=g(x)+h(x).
Now, g(x) can be measured easily, as it is the distance from the starting node; but the distance
from the goal, h(x), can only be measured through prediction, and the prediction is denoted by
h′(x). Consequently, the predicted total cost is denoted by f′(x), where
f′(x) = g(x) + h′(x).
[Figure: a best first search tree. The root x generates y, z, and w; further expansions generate
a, b, c, and d, with the heuristic values of the nodes shown in parentheses.]
Fig. 5.5: Best First Search
Fig. 5.5 shows the procedure of best first search. Initially there is a single node x, so it is
expanded. It generates three nodes y, z, and w. The heuristic function is applied to each node, and
the values are 2, 5, and 1 respectively. Since node w looks most promising, it is expanded next,
generating two nodes a and b, and the heuristic function is applied to them. Now another path,
through node y, looks promising, so y is expanded next, generating c and d, which are evaluated.
Now c and d look less promising than node a, which lies on another path, so a is expanded,
generating e and f. This process continues till the goal is found.
Procedure A*
Begin
Step I: Place the starting node n in open and measure its f′(n) = g(n) + h′(n);
Step II: Repeat (until open is empty)
Begin
Select from open the node n with minimum f′ value;
If n = goal then stop and return n along with the path of n from the starting node;
Else do
Begin
a) Remove n from open and place it under closed;
b) Generate the children of n;
c) If all children are new, then add them to open, calculate their f′, and record the path from
the root node through back pointers;
d) If a child is already in open or closed and the new path to it is cheaper, then update its g
and f′ and redirect its back pointer along the cheaper path;
End;
End Repeat;
END.
Let us consider
X= Amount of water in 3 gallon jug
Y= Amount of water in 5 gallon jug
P is an arbitrary node in the search space
We design the heuristic as follows:
h/(p)= 2, when 0<X<3 AND 0<Y<5
= 4, when 0<X<3 OR 0<Y<5
= 10, when i) X=0 AND Y=0
or ii) X=3 AND Y=5
= 8, when i) X=0 AND Y=5
or ii) X=3 AND Y=0
We assume g(p) = 0 for the root node, and g(p) = n if node p has n ancestors on the path from the
root. The discovery of the search space using the A* algorithm is illustrated below:
Step 1: root r = (0, 0), with g+h′ = 0+10.
Step 2: expanding r generates M = (3, 0) and N = (0, 5), each with g+h′ = 1+8.
Step 3: expanding M generates P = (0, 3) with g+h′ = 2+4 and Q = (3, 5) with g+h′ = 2+10.
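The whole search can be sketched in code. The heuristic below is the one defined above; the goal test assumes the usual version of this problem, measuring 4 gallons in the 5-gallon jug (the excerpt does not restate the goal), and every move is taken to cost 1:

```python
import heapq
from itertools import count

CAP_X, CAP_Y = 3, 5                       # jug capacities

def h(state):
    # The heuristic h'(p) defined above.
    x, y = state
    if 0 < x < CAP_X and 0 < y < CAP_Y:
        return 2
    if 0 < x < CAP_X or 0 < y < CAP_Y:
        return 4
    if state in ((0, 0), (CAP_X, CAP_Y)):
        return 10
    return 8                              # (0, 5) or (3, 0)

def successors(state):
    x, y = state
    tx = min(x, CAP_Y - y)                # amount pourable from X into Y
    ty = min(y, CAP_X - x)                # amount pourable from Y into X
    moves = {(CAP_X, y), (x, CAP_Y),      # fill a jug
             (0, y), (x, 0),              # empty a jug
             (x - tx, y + tx),            # pour X -> Y
             (x + ty, y - ty)}            # pour Y -> X
    moves.discard(state)
    return moves

def a_star(start, goal_test):
    tie = count()                         # tie-breaker for equal f' values
    open_list = [(h(start), next(tie), 0, [start])]
    best_g = {start: 0}
    while open_list:
        f, _, g, path = heapq.heappop(open_list)
        state = path[-1]
        if goal_test(state):
            return path
        for child in successors(state):
            g2 = g + 1                    # every move costs 1
            if g2 < best_g.get(child, float('inf')):
                best_g[child] = g2
                heapq.heappush(open_list,
                               (g2 + h(child), next(tie), g2, path + [child]))
    return None
```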
Behaviour of A* Algorithm
Underestimation
If we can guarantee that h′ never overestimates the actual cost from the current node to the goal,
then the A* algorithm is able to find an optimal path to a goal, if one exists.
[Figure: underestimation. From A, the apparently best branch is expanded, giving E with f′ = 2+3
(E is 3 moves away from the goal) and then F with f′ = 3+3; at this point another branch of A
becomes cheap enough to be reconsidered, and an optimal path is still found.]
Overestimation
[Figure: overestimation. From A, the apparently best branch B is expanded, giving E with f′ = 2+2,
then F with f′ = 3+1, and then G with f′ = 4+0.]
We expand B, which has the minimum f′ (g+h′) value. Next we expand node E, then node F, and we get
node G. The length of the solution path is 4. But if there is a direct path from node D to node G,
we will never find it.
We have overestimated h′(D), which makes D look so bad that we may settle for a worse path while
ignoring D.
Properties of Best first search Algorithms
a) Completeness: An algorithm is complete, if it is able to find a solution if it exists.
b) Admissibility: An algorithm is admissible, if it is able to find an optimal solution, if it
exists.
c) Dominance: An algorithm A1 is said to dominate another algorithm A2, if every node
expanded by A1 is also expanded by A2.
d) Optimality: An algorithm is optimal over a class of algorithms, if the algorithm
dominates all members of the class.
A* Search: properties
A* is admissible under the following conditions:
In the state space graph,
each node has a finite number of successors;
every arc in the graph has a cost greater than some ε > 0;
the heuristic function satisfies, for every node n,
h(n) <= h*(n).
A* is also complete under the above conditions.
A* is optimally efficient for a given heuristic: no other optimal
algorithm is guaranteed to expand fewer nodes than A* to find a solution.
A heuristic is consistent if:
h(n) <= cost(n, n′) + h(n′), where n′ is a successor of n. (This is discussed further under the
properties of heuristics below.)
If a heuristic h is consistent, the f values along a path will be non-decreasing.
[Figure: an inconsistent heuristic. Along the path root → n → n′ with cost(n, n′) = 1, we have
f(n) = g(n) + h(n) = 3 + 4 = 7 but f(n′) = g(n′) + h(n′) = 4 + 2 = 6, so the f value decreases.]
Proof of admissibility of A*
A* is admissible if it uses a monotone heuristic. A monotone heuristic is one for which, along any
path, the f-cost never decreases. Even if h is not monotone, we can make the f values monotonic
with the pathmax trick:
f(m) = max(f(n), g(m) + h(m)), where m is a successor of n.
Let G be the optimal goal state,
C* the optimal path cost, and
G1 a suboptimal goal state, so that g(G1) > C*.
Suppose A* selects G1 from OPEN for expansion. Let n be a node on OPEN that lies on an optimal path
to G. Then C* >= f(n).
As n was not chosen for expansion over G1, f(n) >= f(G1).
Since G1 is a goal state, f(G1) = g(G1).
Hence, C* >= g(G1).
This contradicts g(G1) > C*. Thus A* cannot select G1 for expansion before reaching the goal by an
optimal path.
Proof of completeness of A*
Since every arc cost is at least ε > 0, there are only finitely many nodes whose f value does not
exceed C*. A* expands nodes in non-decreasing order of f′, so if a solution exists it must
eventually select a goal node for expansion.
Properties of Heuristics
Dominance:
h2 is said to dominate h1 if h2(n) >= h1(n)
for every node n.
A* will expand fewer nodes on average using h2 than using h1.
Proof: Every node for which f(n) < C* will be expanded; that is, n is expanded whenever
h(n) < C* - g(n).
Since h2(n) >= h1(n), any node expanded using h2 will also be expanded using h1.
Admissible:
A heuristic function h is said to be admissible if
h(n) <= h*(n).
Monotonic:
A heuristic function is said to be monotonic if it satisfies
h(n) <= cost(n, n′) + h(n′) for all n, n′,
where n′ is a successor of n.
Every consistent heuristic is also admissible.
Proof: We have
h(n) <= k(n, n′) + h(n′) [h is consistent].
Taking n′ to be a goal node γ, we have
h(n) <= k(n, γ) + h(γ) = k(n, γ), since h(γ) = 0.
Along an optimal path from n to γ, k(n, γ) = h*(n), so
h(n) <= h*(n).
This is the condition for admissibility.
Fig. 5.10 describes an AND-OR graph. Here the goal is “to get a good job” and the terminals of
the graph describe the possible means to achieve the goal.
[Fig. 5.10: an AND-OR graph with the goal at the root and children B, C, and D.]
Step I: A goal node is given. Find the possible means by which goal can be achieved;
Step II: Estimate h’ values at the leaves and find the leaf with minimum h’;
Cost of the parent of the leaf = minimum( (minimum(cost of OR clause 1, cost of OR clause 2, …,
cost of OR clause n) + 1), ((cost of AND clause 1 + cost of AND clause 2 + … + cost of AND
clause n) + n) );
Children n with minimum h′(n) are chosen, and a pointer is attached to point from the
parent to its promising children;
Step III: One of the unexpanded OR clauses / the set of unexpanded AND clauses to which the
pointer points from its parent is now chosen for expansion, and the h′ values of the newly
generated children are calculated. Recalculate the f′ of the parent (or of the parent of the
parent) of the new children by propagating the h′ values up to the root through the least-cost
path. Thus, the pointers may be modified depending on the changed costs.
[Figure: successive steps of cost revision in an AND-OR graph. Step 1: A is estimated at (5).
Step 2: expanding A generates B, C, and D, and A is revised to (4). Step 3: expanding the chosen
branch generates E (4) and F (6). Step 4: after generating G and H and propagating the revised
costs upward, A is revised to (8).]
1. Important attributes
There are two attributes shown in the diagram, instance and isa. Since these attributes support
the property of inheritance, they are of prime importance.
i. What are the primitives and at what level should the knowledge be represented?
ii. What should be the number (small or large) of low-level primitives or high-level facts?
High-level facts may be insufficient to draw conclusions, while low-level primitives may
require a lot of storage.
For example, suppose that we are interested in the following fact:
John spotted Alex, which could be represented as Spotted(John, Alex).
The user can then add other facts, such as "Spotted(x, y) → Saw(x, y)".
While selecting and using the right structure, it is necessary to solve the following problems.
They include the process of how to:
Select an initial appropriate structure.
Fill the necessary details from the current situations.
Determine a better structure if the initially selected structure is not appropriate to fulfill
other conditions.
Find the solution if none of the available structures is appropriate.
Create and remember a new structure for the given condition.
There is no specific way to solve these problems, but some of the effective knowledge
representation techniques have the potential to solve them.
Connectives and the truth tables of compound propositions are given below:
Consider that 'p' and 'q' are two propositions. Then,
1. Negation (¬p) is true when p is false and false when p is true.
Truth table for negation:
p ¬p
0 1
1 0
2. Conjunction (p ∧ q) indicates that both p and q are true. Thus, p and q are called conjuncts.
Truth table for conjunction:
p q p∧q
0 0 0
0 1 0
1 0 0
1 1 1
3. Disjunction (p ∨ q) indicates that either p or q or both are true. Thus, p and
q are called disjuncts.
Truth table for disjunction:
p q p∨q
0 0 0
0 1 1
1 0 1
1 1 1
4. Implication (p → q) is false only when p is true and q is false.
Truth table for implication:
p q p→q
0 0 1
0 1 1
1 0 0
1 1 1
5. Biconditional (p ↔ q) is true when p and q have the same truth value.
Truth table for biconditional:
p q p↔q
0 0 1
0 1 0
1 0 0
1 1 1
PROBLEM DISCUSSION:
Q1. “If SRK plays hero’s part, then the movie will be hit, if the plot is not too melodramatic. If
SRK plays the hero’s part, the plot will not be too melodramatic.
Therefore, if SRK plays hero’s part, the movie will be a hit.”
Is it a valid argument?
SAMPLE ANSWER:
Let A: SRK plays the hero’s part;
B: the movie will be a hit;
C: the plot will not be too melodramatic.
Premise 1: A → (C → B)
Premise 2: A → C
Conclusion: A → B
Note: An argument is said to be valid, if the conclusion is true, whenever the premises are true.
A B C | C→B | A→(C→B) | A→C | A→B
0 0 0 |  1  |    1    |  1  |  1   √
0 0 1 |  0  |    1    |  1  |  1   √
0 1 0 |  1  |    1    |  1  |  1   √
0 1 1 |  1  |    1    |  1  |  1   √
1 0 0 |  1  |    1    |  0  |  0
1 0 1 |  0  |    0    |  1  |  0
1 1 0 |  1  |    1    |  0  |  1
1 1 1 |  1  |    1    |  1  |  1   √
Through the above table we can see that whenever the premises are true, the conclusion is true. So
the given argument is VALID.
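The same validity check can be automated by enumerating the truth table (a small sketch; the function names are mine):

```python
from itertools import product

def implies(p, q):
    """Material implication: p -> q is false only when p is true and q is false."""
    return (not p) or q

def is_valid(premises, conclusion, n_vars):
    """Valid iff the conclusion holds in every row where all premises hold."""
    for row in product([False, True], repeat=n_vars):
        if all(p(*row) for p in premises) and not conclusion(*row):
            return False
    return True

# A: SRK plays the hero's part; B: the movie will be a hit;
# C: the plot will not be too melodramatic.
premise1 = lambda a, b, c: implies(a, implies(c, b))    # A -> (C -> B)
premise2 = lambda a, b, c: implies(a, c)                # A -> C
conclusion = lambda a, b, c: implies(a, b)              # A -> B
```

Note that dropping premise 1 makes the argument invalid, which the checker also confirms.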
Constants: A, B, C, …
Functions: Size, Color, …
Variables: x, a, …
Terms: a constant, a variable, or Function(Term, …)
Quantifiers: ∀, ∃
Atomic sentences: True, False, Predicate(Term, …), Term = Term
Sentences: an atomic sentence, ¬Sentence, Sentence ∨ Sentence, Sentence ∧ Sentence,
Sentence ⇒ Sentence, Sentence ⇔ Sentence, Quantifier Variable, … Sentence
Semantics:
Man(Marcus) (8)
Man(Marcus) ∧ Try(Marcus, Caesar) → ¬Loy(Marcus, Caesar) (4)
∴ ¬Loy(Marcus, Caesar)
Hence it is proved.
Now, if we are asked to prove the same thing through the Resolution Refutation method, the
technique is different, and we must first know what resolution is.
In mathematical logic and automated theorem proving, resolution is a rule of inference leading to
a refutation theorem-proving technique for sentences in propositional logic and first-order logic.
where
all ai, bi and c are literals, and
the dividing line stands for "entails".
In symbols: from the two clauses a1 ∨ … ∨ an ∨ c and b1 ∨ … ∨ bm ∨ ¬c, resolution infers the
clause a1 ∨ … ∨ an ∨ b1 ∨ … ∨ bm.
The clause produced by the resolution rule is called the resolvent of the two input clauses. It is
the principle of consensus applied to clauses rather than terms.
When the two clauses contain more than one pair of complementary literals, the resolution rule
can be applied (independently) for each such pair; however, the result is always a tautology.
Modus ponens can be seen as a special case of resolution (of a one-literal clause and a two-literal
clause): resolving p with ¬p ∨ q yields q, and p → q is equivalent to ¬p ∨ q.
A resolution technique
When coupled with a complete search algorithm, the resolution rule yields a sound and complete
algorithm for deciding the satisfiability of a propositional formula, and, by extension,
the validity of a sentence under a set of axioms.
All sentences in the knowledge base and the negation of the sentence to be proved
(the conjecture) are conjunctively connected.
The resulting sentence is transformed into a conjunctive normal form with the conjuncts
viewed as elements in a set, S, of clauses.
For example, (A1 ∨ A2) ∧ (B1 ∨ B2 ∨ B3) ∧ (C1) gives rise to the
set S = {A1 ∨ A2, B1 ∨ B2 ∨ B3, C1}.
The resolution rule is applied to all possible pairs of clauses that contain complementary
literals. After each application of the resolution rule, the resulting sentence is simplified
by removing repeated literals. If the sentence contains complementary literals, it is
discarded (as a tautology). If not, and if it is not yet present in the clause set S, it is added
to S, and is considered for further resolution inferences.
If after applying a resolution rule the empty clause is derived, the original formula is
unsatisfiable (or contradictory), and hence it can be concluded that the initial
conjecture follows from the axioms.
If, on the other hand, the empty clause cannot be derived, and the resolution rule cannot
be applied to derive any more new clauses, the conjecture is not a theorem of the original
knowledge base.
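The loop above can be sketched for propositional clauses, with a clause represented as a set of literals and negation written as a leading '~' (an encoding choice of this sketch):

```python
from itertools import combinations

def resolve(c1, c2):
    """All resolvents of two clauses. A clause is a frozenset of literals;
    a literal is a string whose negation carries a leading '~'."""
    resolvents = set()
    for lit in c1:
        comp = lit[1:] if lit.startswith('~') else '~' + lit
        if comp in c2:
            resolvents.add(frozenset((c1 - {lit}) | (c2 - {comp})))
    return resolvents

def resolution_refutation(clauses):
    """True iff the clause set is unsatisfiable, i.e. the empty clause
    can be derived, so the negated conjecture contradicts the axioms."""
    S = {frozenset(c) for c in clauses}
    while True:
        new = set()
        for c1, c2 in combinations(S, 2):
            for r in resolve(c1, c2):
                if not r:                 # empty clause derived
                    return True
                new.add(r)
        if new <= S:                      # nothing new can be inferred
            return False
        S |= new
```

For example, `resolution_refutation([{'P'}, {'~P', 'Q'}, {'~Q'}])` derives the empty clause: Q follows from the knowledge base {P, P → Q}, so its negation ¬Q is contradictory.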
One instance of this algorithm is the original Davis–Putnam algorithm that was later refined into
the DPLL algorithm that removed the need for explicit representation of the resolvents.
This description of the resolution technique uses a set S as the underlying data-structure to
represent resolution derivations. Lists, Trees and Directed Acyclic Graphs are other possible and
common alternatives. Tree representations are more faithful to the fact that the resolution rule is
binary. Together with a sequent notation for clauses, a tree representation also makes it clear to
see how the resolution rule is related to a special case of the cut-rule, restricted to atomic cut-
formulas. However, tree representations are not as compact as set or list representations, because
they explicitly show redundant subderivations of clauses that are used more than once in the
derivation of the empty clause. Graph representations can be as compact in the number of clauses
as list representations and they also store structural information regarding which clauses were
resolved to derive each resolvent.
So, in a nutshell, resolution is a simple iterative process: at each step, two clauses
called the parent clauses are compared (i.e. resolved), yielding a new clause that is inferred
from them.
So, we must know how to convert a predicate logic formula into its equivalent clauses. There are a
few thumb rules to be followed sequentially while converting a predicate logic formula into a
clause. They are as follows:
1) Eliminate the implication sign (→) with the following identity:
a → b is equivalent to ¬a ∨ b.
Step 3: Negate whatever is to be proved, assume the negation to be true, and try to derive a
contradiction.
Step 4: Resolve accordingly.
Procedural knowledge:
- high efficiency
- low modifiability
- low cognitive adequacy (better for knowledge engineers)
- flow: from incipient to consequence
Declarative knowledge:
- higher level of abstraction
- good modifiability
- good cognitive matching (better for domain experts and end-users)
- flow: from consequence to incipient
Many of the operators and notations that are used in propositional logic can also be used in
probabilistic notation. For example, P(¬S) means "the probability that it is not sunny"; P(S ∧ R)
means "the probability that it is both sunny and rainy." P(A ∨ B), which means "the probability
that either A is true or B is true," is defined by the following rule: P(A ∨ B) = P(A) + P(B) - P(A
∧ B).
The notation P(B|A) can be read as "the probability of B, given A." This is known as conditional
probability: it is conditional on A. In other words, it states the probability that B is true,
given that we already know that A is true. P(B|A) is defined by the following rule:
P(B|A) = P(A ∧ B) / P(A).
Of course, this rule cannot be used in cases where P(A) = 0.
For example, let us suppose that the likelihood that it is both sunny and rainy at the same time is
0.01. Then we can calculate the probability that it is rainy, given that it is sunny, as
P(R|S) = P(S ∧ R) / P(S) = 0.01 / P(S).
Conditional probability, P(A|B), indicates the probability of event A given that we know event
B has occurred.
A Bayesian Network is a directed acyclic graph:
the directed links indicate dependencies that exist between nodes;
nodes represent propositions about events or the events themselves;
conditional probabilities quantify the strength of the dependencies.
Consider the following example:
The probability that my car won't start.
If my car won't start, then it is likely that
o The battery is flat, or
o The starting motor is broken.
In order to decide whether to fix the car myself or send it to the garage, I make the following
decisions:
If the headlights do not work, then the battery is likely to be flat, so I fix it myself.
If the starting motor is defective, then send the car to the garage.
If the battery and the starting motor have both failed, send the car to the garage.
The network to represent this is as follows:
P(B) is called the prior probability of B. P(B|A), as well as being called the conditional
probability, is also known as the posterior probability of B.
P(A ∧ B) = P(A|B)P(B)
Note that due to the commutativity of ∧ , we can also write
P(A ∧ B) = P(B|A)P(A)
Hence, we can deduce: P(B|A)P(A) = P(A|B)P(B)
This can then be rearranged to give Bayes’ theorem:
Bayes' theorem states:
P(Hi | E) = P(E | Hi) P(Hi) / Σk P(E | Hk) P(Hk)
This reads: given some evidence E, the probability that hypothesis Hi is true is equal to the
ratio of the probability that E will be observed given Hi, times the a priori probability of Hi,
to the sum, over the set of all hypotheses, of the probability of E given each hypothesis times
the probability of that hypothesis.
The set of all hypotheses must be mutually exclusive and exhaustive.
Thus, to diagnose an illness from medical evidence, we must know the prior probability of each
illness and also the probability of observing the symptoms given each illness, so that the
probability of an illness can be computed from the symptoms being observed.
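Bayes' theorem over a mutually exclusive and exhaustive set of hypotheses can be sketched as follows; the diagnosis numbers are invented purely for illustration:

```python
def posterior(priors, likelihoods):
    """Bayes' theorem over mutually exclusive, exhaustive hypotheses:
    P(Hi|E) = P(E|Hi) P(Hi) / sum_k P(E|Hk) P(Hk)."""
    total = sum(likelihoods[hyp] * priors[hyp] for hyp in priors)
    return {hyp: likelihoods[hyp] * priors[hyp] / total for hyp in priors}

# Invented illustration: prior probabilities of three conditions and the
# likelihood of observing a fever under each of them.
priors = {'flu': 0.1, 'cold': 0.3, 'healthy': 0.6}
likelihoods = {'flu': 0.9, 'cold': 0.4, 'healthy': 0.05}
post = posterior(priors, likelihoods)
```

The posteriors always sum to 1, since the denominator is the total probability of the evidence.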
Bayesian networks are also called Belief Networks or Probabilistic Inference Networks.
Thus, in classical set theory the membership function A(x) takes only the values 0 ('false') and
1 ('true'). Such sets are called crisp sets.
Fuzzy set theory is an extension of classical set theory where elements have varying degrees of
membership. A logic based on the two truth values, True and False, is sometimes inadequate
when describing human reasoning. Fuzzy logic uses the whole interval between
0 (false) and 1 (true) to describe human reasoning.
A student of height 1.79m would belong to both tall and not tall sets with a particular degree of
membership.
As the height increases the membership grade within the tall set would increase whilst the
membership grade within the not-tall set would decrease.
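A piecewise-linear membership function illustrates this; the breakpoints 1.5 m and 1.9 m are illustrative assumptions, not values from the text:

```python
def tall_membership(height, low=1.5, high=1.9):
    """Membership grade in the fuzzy set 'tall': 0 below `low` metres,
    1 above `high`, and linear in between (breakpoints are illustrative)."""
    if height <= low:
        return 0.0
    if height >= high:
        return 1.0
    return (height - low) / (high - low)

def not_tall_membership(height):
    """Fuzzy complement: grade in 'not tall' is 1 minus the grade in 'tall'."""
    return 1.0 - tall_membership(height)
```

A student of height 1.79 m then belongs to both sets with intermediate grades, and the 'tall' grade increases with height while the 'not tall' grade decreases.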
Natural Language Processing, usually shortened as NLP, is a branch of artificial intelligence that
deals with the interaction between computers and humans using natural language. Natural
language processing is a subfield of computer science and artificial intelligence that is
concerned with the computational processing of natural languages, emulating cognitive capabilities
without being committed to a true simulation of cognitive processes. It is a theoretically
motivated range of computational techniques for analyzing and representing naturally occurring
texts at one or more levels of linguistic analysis for the purpose of achieving human like
language processing for a range of tasks or applications. It is a computerized approach to
analyzing text that is based on both a set of theories and a set of technologies. NLP is a very
active area of research and development. Naturally occurring texts can be of any language, mode
and genre etc. The text can be oral or written. The only requirement is that they be in a language
used by humans to communicate to one another. Also, the text being analyzed should not be
specifically constructed for the purpose of analysis, but rather that the text is gathered from
actual usage.
The ultimate objective of NLP is to read, decipher, understand, and make sense of the human
languages in a manner that is valuable. Most NLP techniques rely on machine learning to derive
meaning from human languages.
The roots of NLP lie in a number of disciplines, such as computer and information sciences,
linguistics, mathematics, electrical and electronic engineering, artificial intelligence and
robotics, and psychology. Applications of NLP include a number of fields of study, such as machine
translation, natural language text processing, summarization, user interfaces, multilingual and
cross-language information retrieval (CLIR), speech recognition, artificial intelligence and
expert systems.
As natural language processing technology matures, it is increasingly being used to support other
computer applications. Such use naturally falls into two areas: one in which linguistic analysis
merely serves as an interface to the primary program, and a second in which natural
language considerations are central to the application. A natural language interface, for example,
translates a request into a formal database query language, and the program then proceeds as it
would without the use of natural language processing techniques. Some further application areas
include information retrieval and text categorization. In both applications, natural language
processing imposes a linguistic representation on each document being considered.
Natural language processing is usually decomposed into the following levels of analysis:
Lexical analysis
Syntactic analysis
Semantic analysis
Discourse Integration
Pragmatic Analysis
LEXICAL ANALYSIS
It involves identifying and analyzing the structure of words. The lexicon of a language is the
collection of words and phrases in that language. Lexical analysis divides the whole chunk of text
into paragraphs, sentences, and words.
SYNTACTIC PROCESSING
Processing a sentence syntactically involves determining the subject and predicate and the place
of nouns, verbs, pronouns, etc. Given the variety of ways to construct sentences in a natural
language, it is obvious that word order alone will not tell you much about these issues;
depending on word order alone would in any case be frustrated by the fact that sentences vary in
length and can contain multiple clauses.
In NLP, syntactic analysis is used to assess how the natural language aligns with the grammatical
rules. Computer algorithms are used to apply grammatical rules to a group of words and derive
meaning from them.
Lemmatization: It entails reducing the various inflected forms of a word into a single form for
easy analysis.
Word segmentation: It involves dividing a large piece of continuous text into distinct units.
Part-of-speech tagging: It involves identifying the part of speech for every word.
There are a number of algorithms researchers have developed for syntactic analysis, such as −
Context-Free Grammar
Top-Down Parser
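A naive top-down recogniser for a toy context-free grammar can be sketched as follows (the grammar and lexicon are invented for illustration):

```python
# A toy grammar and lexicon, invented for illustration.
GRAMMAR = {
    'S':  [['NP', 'VP']],
    'NP': [['Det', 'N'], ['Name']],
    'VP': [['V', 'NP'], ['V']],
}
LEXICON = {
    'Det':  {'the', 'a'},
    'N':    {'dog', 'cat'},
    'Name': {'john'},
    'V':    {'saw', 'sleeps'},
}

def parse(symbols, words):
    """True if the sequence of grammar symbols derives exactly `words`."""
    if not symbols:
        return not words                   # success only if all input is used up
    first, rest = symbols[0], symbols[1:]
    if first in LEXICON:                   # terminal (lexical) category
        return bool(words) and words[0] in LEXICON[first] and parse(rest, words[1:])
    return any(parse(rhs + rest, words)    # try each rule, top-down
               for rhs in GRAMMAR[first])

def accepts(sentence):
    return parse(['S'], sentence.lower().split())
```

The recogniser starts from the symbol S and rewrites it top-down until the rewrite matches the input words, which is the essence of top-down parsing with a context-free grammar.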
SEMANTIC ANALYSIS
Semantics refers to the meaning that is conveyed by a text. Semantic analysis is one of the
difficult aspects of Natural Language Processing that has not been fully resolved yet. It involves
applying computer algorithms to understand the meaning and interpretation of words and how
sentences are structured.
Named entity recognition (NER): It involves determining the parts of a text that can be
identified and categorized into preset groups. Examples of such groups include names of people
and names of places.
Word sense disambiguation: It involves giving meaning to a word based on the context.
Natural language generation: It involves using databases to derive semantic intentions and
convert them into human language.
DISCOURSE INTEGRATION
The meaning of any sentence depends upon the meaning of the sentence just before it; in
addition, it also contributes to the meaning of the immediately succeeding sentence.
While syntax and semantics work with sentence-length units, the discourse level of NLP works
with units of text longer than a sentence i.e. it does not interpret multi-sentence texts as just
concatenated sentences, each of which can be interpreted singly. Discourse focuses on the
properties of the text as a whole that convey meaning by making connections between
component sentences. Several types of discourse processing can occur at this level like anaphora
resolution and discourse/text structure recognition. Anaphora resolution is the replacing of words
such as pronouns which are semantically vacant with the appropriate entity to which they refer.
For example, newspaper articles can be deconstructed into discourse components such as: lead,
main story, previous events, evaluation etc. A discourse is a sequence of sentences. Discourse
has structure much like sentences do. Understanding discourse structure is extremely important
for dialog system.
PRAGMATIC PROCESSING
During this stage, what was said is re-interpreted to determine what was actually meant. It
involves deriving those aspects of language which require real-world knowledge.
This level is concerned with the purposeful use of language in situations and utilizes context over
and above the contents of the text for understanding. The goal is to explain how extra meaning is
read into texts without actually being encoded in them. This requires much world knowledge
including the understanding of intentions, plans and goals. Some NLP applications may utilize
knowledge bases and inferencing modules. Pragmatics is the study of how more gets
communicated than is said. A speech act in pragmatic processing carries an illocutionary force,
the communicative force of an utterance, resulting from the function associated with it.
The learning process is the basis of the knowledge acquisition process. Knowledge acquisition is
expanding the capabilities of a system or improving its performance at some specified task. So
we can say knowledge acquisition is the goal-oriented creation and refinement of knowledge.
The acquired knowledge may consist of various facts, rules, concepts, procedures, heuristics,
formulas, relationships or any other useful information. Knowledge can be acquired from various
sources, such as domains of interest, textbooks, technical papers, databases and reports. In terms
of increasing levels of abstraction, knowledge includes data, information and meta-knowledge.
Meta-knowledge includes the ability to evaluate the knowledge available, the additional
knowledge required, and what is implied by the present rules.
An agent is learning if it improves its performance on future tasks after making observations
about the world. Learning can range from the trivial, as exhibited by jotting down a phone
number, to the profound, as exhibited by Albert Einstein, who inferred a new theory of the
universe.
FORMS OF LEARNING
Based on how they represent knowledge, AI learning models can be classified into two main
types: inductive and deductive.
INDUCTIVE LEARNING
This type of AI learning model is based on inferring a general rule from datasets of input-output
pairs. Algorithms such as Knowledge-Based Inductive Learning (KBIL) are a great example of this
type of AI learning technique. KBIL focuses on finding inductive hypotheses on a dataset with
the help of background information.
DEDUCTIVE LEARNING
This type of AI learning technique starts with a series of rules and infers new rules that are more
efficient in the context of a specific AI algorithm. Explanation-Based Learning (EBL) and
Relevance-Based Learning (RBL) are examples of deductive techniques. EBL
extracts general rules from examples by "generalizing" the explanation. RBL focuses on
identifying attributes and deductive generalizations from simple examples.
UNSUPERVISED LEARNING
In unsupervised learning the agent learns patterns in the input even though no explicit feedback
is supplied. The most common unsupervised learning task is clustering: detecting potentially
useful clusters of input examples. For example, a taxi agent might gradually develop a concept of
“good traffic days” and “bad traffic days” without ever being given labeled examples of each by
a teacher.
SUPERVISED LEARNING
Supervised learning models use external feedback to learn functions that map inputs to
output observations. In those models the external environment acts as a "teacher" of the AI
algorithm.
In supervised learning the agent observes some example input–output pairs and learns a function
that maps from input to output.
SEMI-SUPERVISED LEARNING:
Semi-Supervised learning uses a set of curated, labeled data and tries to infer new
labels/attributes on new data sets. Semi-Supervised learning models are a solid middle ground
between supervised and unsupervised models.
In semi-supervised learning we are given a few labeled examples and must make what we can of
a large collection of unlabeled examples. Even the labels themselves may not be the oracular
truths that we hope for. Imagine that you are trying to build a system to guess a person’s age
from a photo. You gather some labeled examples by snapping pictures of people and asking their
age. That’s supervised learning. But in reality some of the people lied about their age. It’s not
just that there is random noise in the data; rather the inaccuracies are systematic, and to uncover
them is an unsupervised learning problem involving images, self-reported ages, and true
(unknown) ages. Thus, both noise and lack of labels create a continuum between supervised and
unsupervised learning.
REINFORCEMENT LEARNING
Reinforcement learning models use feedback in the form of rewards and punishments to
"reinforce" different types of knowledge. This type of learning technique is becoming really
popular in modern AI solutions.
DECISION TREE INDUCTION
Decision tree induction is one of the simplest and yet most successful forms of machine learning.
We first describe the representation, the hypothesis space, and then show how to learn a good
hypothesis.
Decision Tree. A decision tree represents a function that takes as input a vector of attribute
values and returns a “decision”—a single output value. The input and output values can be
discrete or continuous. For now we will concentrate on problems where the inputs have discrete
values and the output has exactly two possible values; this is Boolean classification, where each
example input will be classified as true (a positive example) or false (a negative example).
A decision tree reaches its decision by performing a sequence of tests. Each internal node in the
tree corresponds to a test of the value of one of the input attributes, Ai, and the branches from the
node are labeled with the possible values of the attribute, Ai = vik. Each leaf node in the tree
specifies a value to be returned by the function. The decision tree representation is natural for
humans; indeed, many “How To” manuals (e.g., for car repair) are written entirely as a single
decision tree stretching over hundreds of pages.
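A small hand-built Boolean decision tree of the kind described above can be represented as nested dictionaries; the attributes ("patrons", "hungry") and their values here are illustrative.

```python
# Each internal node tests one attribute; branches are labelled with attribute
# values; leaves hold the Boolean decision.

tree = {
    "attr": "patrons",
    "branches": {
        "none": False,                     # leaf: negative example
        "some": True,                      # leaf: positive example
        "full": {                          # internal node: a further test
            "attr": "hungry",
            "branches": {"yes": True, "no": False},
        },
    },
}

def classify(node, example):
    # A leaf is a bare Boolean; otherwise follow the branch for this value.
    if isinstance(node, bool):
        return node
    value = example[node["attr"]]
    return classify(node["branches"][value], example)

print(classify(tree, {"patrons": "full", "hungry": "yes"}))  # True
```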
EXPLANATION BASED LEARNING
Explanation based learning has the ability to learn from a single training instance. Instead of
taking many examples, explanation based learning emphasizes learning from a single, specific
example. For example, consider the game of Ludo. In Ludo there are generally four colors of
pieces, with four pieces per color, so at most four players are possible. Suppose the colors are
red, green, blue and yellow. Two players are considered for one side (say green and red) and the
other two are considered for the other side (say blue and yellow). A small cube (a die) marked
with the numbers one to six is circulated among the four players; one is the lowest number and
six is the highest, and all moves are governed by the number rolled. At any instance of play, a
player on the first side will try to attack a player on the second side and vice versa. In this way,
pieces may be attacked and removed one by one until finally one side wins.
Given a training example and a functional description, we want to build a general structural
description of a bucket. In practice, there are two reasons why explanation based learning is
important. An explanation based learning problem is typically specified by the following components:
1. Input Examples:
2. Domain Knowledge:
is(a, Deep) ∧ has_part(a, b) ∧ isa(b, Handle) → Liftable(a)
has_part(a, b) ∧ isa(b, Bottom) ∧ is(b, Flat) → Stable(a)
has_part(a, b) ∧ isa(b, Y) ∧ is(b, Upward-pointing) → Open-vessel(a)
3. Goal: Bucket
B is a bucket if B is liftable, stable and an open-vessel.
4. Description of Concept: These are expressed in purely structural forms like Deep, Flat,
Rounded etc.
The prior knowledge, Background, concerns the relevance of a set of features to the goal
predicate. This knowledge, together with the observations, allows the agent to infer a new,
general rule that explains the observations.
This kind of generalization is known as relevance-based learning, or RBL (although the name is
not standard). Notice that whereas RBL does make use of the content of the observations, it does
not produce hypotheses that go beyond the logical content of the background knowledge and the
observations. It is a deductive form of learning and cannot by itself account for the creation of
new knowledge starting from scratch.
NEURAL NETWORKS
A neural network consists of interconnected processing elements called neurons that work
together to produce an output function. The output of a neural network relies on the cooperation
of the individual neurons within the network. Well designed neural networks are trainable
systems that can often "learn" to solve complex problems from a set of exemplars and
generalize the "acquired knowledge" to solve unforeseen problems, i.e. they are self-adaptive
systems. The term neural network originally referred to a network of biological neurons; an
artificial neural network is a model of such a network built from artificial neurons.
Mathematically, let I = (I1, I2, ..., In) represent the set of inputs presented to the unit U. Each input
has an associated weight that represents the strength of that particular connection. Let W =
(W1, W2, ..., Wn) represent the weight vector corresponding to the input vector I. These weighted
inputs produce a net sum S at U given by
S = W1 I1 + W2 I2 + ... + Wn In = Σi Wi Ii
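The net sum of a single unit, the weighted sum of its inputs, is one line of code:

```python
# Net sum of a single neuron: multiply each input by its connection weight
# and add the results.

def net_sum(weights, inputs):
    return sum(w * i for w, i in zip(weights, inputs))

print(net_sum([0.5, -1.0, 2.0], [1.0, 2.0, 0.5]))  # 0.5 - 2.0 + 1.0 = -0.5
```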
Artificial neural networks are extremely powerful computational devices (universal
computers).
There are two Artificial Neural Network topologies: FeedForward and FeedBack. Also,
according to the number of layers, an ANN can be classified as a Single Layer Neural Network
or a Multi-layer Neural Network.
FeedForward ANN. In this ANN, the information flow is unidirectional. A unit sends
information to other units from which it does not receive any information; there are no feedback
loops. Feedforward networks are used in pattern generation, recognition and classification. They
have fixed inputs and outputs.
FeedBack ANN. Here, feedback loops are allowed. They are used in content addressable
memories.
Single Layer Neural Network. A single layer neural network consists of a set of units organized
in a layer. Each unit Ui receives a weighted input Im with weight Wmi. The figure shows a single
layer neural network with j inputs and n outputs.
Let I = (I1, I2, ..., Ij) be the input vector and let the activation function f be simply the identity,
so that the activation value of a unit is just its net sum. The j × n weight matrix is calculated
as follows.
Multi-layer Neural Network. A multilayer network has two or more layers of units, with the
output from one layer serving as input to the next. Generally a multilayer network contains three
kinds of layers: an input layer, an output layer and one or more hidden layers. Layers with no
external output connections are referred to as hidden layers. A multilayer neural network
structure is given in the figure.
Any multilayer system with fixed weights and a linear activation function is equivalent to a
single layer linear system. Consider, for example, a two layer system. The input vector to the
first layer is I; the first layer produces the output O = W1 · I and the second layer produces the
output O2 = W2 · O. Hence
O2 = W2 · (W1 · I) = (W2 · W1) · I
So a linear system with any number n of layers is equivalent to a single layer linear system
whose weight matrix is the product of the n intermediate weight matrices. A multilayer system
that is not linear can provide more computational capability than a single layer system. Generally
multilayer networks have proven to be far more powerful than single layer networks. Any
Boolean function can be implemented by such a network. At the output layer of a multilayer
neural network the output vector is compared to the expected output. If the difference is zero, no
changes are made to the weights of connections. If the difference is not zero, the error is
calculated and is propagated back through the network.
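The equivalence of a fixed-weight linear two-layer system to a single layer with weight matrix W2 · W1 can be checked numerically; the small matrices below are arbitrary examples.

```python
# Verify that W2 · (W1 · I) equals (W2 · W1) · I for linear layers.

def matvec(m, v):
    return [sum(r[j] * v[j] for j in range(len(v))) for r in m]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

W1 = [[1.0, 2.0], [0.0, 1.0]]
W2 = [[0.5, 1.0], [2.0, 0.0]]
I  = [3.0, 4.0]

two_layer = matvec(W2, matvec(W1, I))    # O2 = W2 · (W1 · I)
one_layer = matvec(matmul(W2, W1), I)    # O2 = (W2 · W1) · I
print(two_layer, one_layer)              # both [9.5, 22.0]
```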
The main objective in neural model development is to find an optimal set of weight parameters w
such that y = y(x, w) closely represents (approximates) the original problem behavior. This is
achieved through a process called training (that is, optimization in w-space). A set of training
data is presented to the neural network. The training data are pairs (xk, dk), k = 1, 2, ..., P, where
dk is the desired output of the neural model for input xk, and P is the total number of training
samples.
During training, the neural network performance is evaluated by computing the difference
between actual neural network outputs and desired outputs for all the training samples. The
difference, also known as the error, is commonly quantified by the sum of squared errors
E(w) = (1/2) Σk ‖y(xk, w) − dk‖², k = 1, ..., P.
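Assuming the common sum-of-squared-errors measure (with the conventional factor of 1/2), the training error over all samples can be computed as:

```python
# Training error: half the sum of squared differences between the network's
# actual outputs and the desired outputs over all training samples.

def training_error(outputs, desired):
    return 0.5 * sum((y - d) ** 2 for y, d in zip(outputs, desired))

print(training_error([0.9, 0.2, 0.8], [1.0, 0.0, 1.0]))  # ≈ 0.045
```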
GENETIC LEARNING
Genetic algorithms are based on the theory of natural selection and work on generating a set of
random solutions and making them compete in an area where only the fittest survive. Each
solution in the set is equivalent to a chromosome. Genetic algorithm learning methods are based
on models of natural adaption and evolution. These learning methods improve their performance
through processes which model population genetics and survival of the fittest. In the field of
genetics, a population is subjected to an environment which places demands on the members.
The members which adapt well are selected for mating and reproduction. Generally a genetic
algorithm uses three basic genetic operators like reproduction, crossover and mutation. These are
combined together to evolve a new population. Starting from a random set of solutions the
algorithm uses these operators and the fitness function to guide its search for the optimal
solution. The fitness function estimates how good the solution in question is and provides a
measure of its capability. The genetic operators mimic mechanisms drawn from biological
evolution. The main advantage of the genetic algorithm formulation is that fairly accurate
results may be obtained using a very simple algorithm. The genetic algorithm is a method of
finding a good answer to a problem, based on the feedback received from its repeated attempts at
a solution.
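A minimal sketch of the three operators named above (selection, crossover, mutation), applied to the classic "OneMax" toy problem: evolve a bit string with as many 1s as possible. The population size, generation count and mutation rate are arbitrary choices.

```python
# Tiny genetic algorithm: tournament selection, one-point crossover,
# occasional single-bit mutation.

import random

def fitness(chrom):
    return sum(chrom)                      # count of 1 bits

def evolve(length=20, pop_size=30, generations=60, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        new_pop = []
        for _ in range(pop_size):
            # Selection: two tournaments of two; the fitter survives.
            a, b = rng.sample(pop, 2)
            p1 = a if fitness(a) >= fitness(b) else b
            a, b = rng.sample(pop, 2)
            p2 = a if fitness(a) >= fitness(b) else b
            # Crossover: single cut point combining both parents.
            cut = rng.randrange(1, length)
            child = p1[:cut] + p2[cut:]
            # Mutation: occasionally flip one bit.
            if rng.random() < 0.2:
                child[rng.randrange(length)] ^= 1
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=fitness)

best = evolve()
print(fitness(best))   # typically close to the maximum of 20
```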
Genetic algorithms are inspired by nature and evolution, which is a genuinely appealing idea. It
is no surprise that artificial neural networks (NNs) are also modeled on biology: evolution is the
best general-purpose learning algorithm we have experienced, and the brain is the best
general-purpose problem solver we know. These are two very important pieces of our biological
existence, and also two rapidly growing fields of artificial intelligence and machine learning
study.
EXPERT SYSTEMS
The most important applied area of AI is the field of expert systems. An expert system (ES) is a
knowledge-based system that employs knowledge about its application domain and uses an
inference (reasoning) procedure to solve problems that would otherwise require human
competence or expertise. The power of expert systems stems primarily from the specific
knowledge about a narrow domain stored in the expert system's knowledge base.
Expert systems are assistants to decision makers and not substitutes for them. Expert systems do
not have human capabilities. They use a knowledge base of a particular domain and bring that
knowledge to bear on the facts of the particular situation at hand. The knowledge base of an ES
also contains heuristic knowledge - rules of thumb used by human experts who work in the
domain.
In other words, an expert system is a computer program that uses artificial-intelligence methods
to solve problems within a specialized domain that ordinarily requires human expertise. The first
expert system was developed in 1965 by Edward Feigenbaum and Joshua Lederberg of Stanford
University in California, U.S. Dendral, as their expert system was later known, was designed to
analyze chemical compounds. Expert systems now have commercial applications in fields as
diverse as medical diagnosis, petroleum engineering, and financial investing.
The knowledge base of an ES contains both factual and heuristic knowledge. Knowledge
representation is the method used to organize the knowledge in the knowledge base. Knowledge
bases must represent such notions as actions to be taken under given circumstances, causality, time,
dependencies, goals, and other higher-level concepts.
Several methods of knowledge representation can be drawn upon. Two of these methods include:
1. Frame-based systems - are employed for building very powerful ESs. A frame specifies the
attributes of a complex object and frames for various object types have specified relationships.
2. Production rules - are the most common method of knowledge representation used in
business. Rule-based expert systems are expert systems in which the knowledge is represented
by production rules.
A production rule, or simply a rule, consists of an IF part (a condition or premise) and a THEN
part (an action or conclusion). IF condition THEN action (conclusion). The explanation facility
explains how the system arrived at the recommendation. Depending on the tool used to
implement the expert system, the explanation may be either in a natural language or simply a
listing of rule numbers.
The inference engine directs the user interface to query the user for any information it needs for
further inferencing.
The facts of the given case are entered into the working memory, which acts as a blackboard,
accumulating the knowledge about the case at hand. The inference engine repeatedly applies the
rules to the working memory, adding new information (obtained from the rules conclusions) to it,
until a goal state is produced or confirmed.
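The cycle just described, repeatedly firing rules against working memory until nothing new can be added, can be sketched as a small forward-chaining loop; the facts and rules below are illustrative.

```python
# Forward chaining: rules are (conditions, conclusion) pairs. Any rule whose
# conditions are all in working memory fires, adding its conclusion, until
# no rule produces anything new.

rules = [
    ({"has_feathers"}, "is_bird"),
    ({"is_bird", "can_fly"}, "can_migrate"),
]

def forward_chain(facts, rules):
    memory = set(facts)                    # the working-memory "blackboard"
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= memory and conclusion not in memory:
                memory.add(conclusion)     # rule fires; record the new fact
                changed = True
    return memory

print(forward_chain({"has_feathers", "can_fly"}, rules))
```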
Forward-chaining systems are commonly used to solve more open-ended problems of a design or
planning nature, such as, for example, establishing the configuration of a complex product.
Backward chaining is best suited for applications in which the possible conclusions are limited in
number and well defined. Classification or diagnosis type systems, in which each of several
possible conclusions can be checked to see if it is supported by the data, are typical applications.
The most common form of architecture used in expert and other types of knowledge-based
systems is the production system, also called the rule-based system. This type of system uses
knowledge encoded in the form of production rules, i.e. if-then rules. Each rule has a conditional
part on the left hand side and a conclusion or action part on the right hand side. For example:
If: condition1 and condition2 and condition3
Then: take action4
Each rule represents a small chunk of knowledge relating to the given domain of expertise. When
the known facts support the conditions in the rule's left side, the conclusion or action part of the
rule is accepted as known. The rule-based architecture of an expert system consists of the domain
expert, knowledge engineer, inference engine, working memory, knowledge base, user interface,
explanation module, and external interfaces such as databases, spreadsheets and executable
programs, as shown in the figure.
The ES shell simplifies the process of creating a knowledge base. It is the shell that actually
processes the information entered by a user, relates it to the concepts contained in the knowledge
base, and provides an assessment or solution for a particular problem. Thus the ES shell provides
a layer between the user interface and the computer operating system to manage the input and
output of data. It also manipulates the information provided by the user in conjunction with the
knowledge base to arrive at a particular conclusion.
Expert system shells are the most common vehicle for the development of specific ESs. A shell
is an expert system without a knowledge base. A shell furnishes the ES developer with the
inference engine, user interface, and the explanation and knowledge acquisition facilities.
Domain-specific shells are actually incomplete specific expert systems, which require much less
effort in order to field an actual system.
KNOWLEDGE ACQUISITION
Knowledge acquisition is the process of adding new knowledge to a knowledge base and refining
or otherwise improving knowledge that was previously acquired. Acquisition is usually
associated with some purpose such as expanding the capabilities of a system or improving its
performance at some specified task. It is the goal-oriented creation and refinement of knowledge,
which may consist of facts, rules, concepts, procedures, heuristics, formulas, relationships,
statistics or other useful information.
The knowledge acquisition component allows the expert to enter their knowledge or expertise
into the expert system, and to refine it later as and when required. Historically, the knowledge
engineer played a major role in this process, but automated systems that allow the expert to
interact directly with the system are becoming increasingly common. The knowledge acquisition
process usually comprises three principal stages:
1. Knowledge elicitation is the interaction between the expert and the knowledge
engineer/program to elicit the expert knowledge in some systematic way.
The iterative nature of the knowledge acquisition process can be represented in the following
diagram (five stages):
Identification: break the problem into parts.
Conceptualisation: identify the concepts involved.
Formalisation: represent the knowledge.
Implementation: program the system.
Testing: check the validity of the knowledge.