Learning: Book: Artificial Intelligence, A Modern Approach (Russell & Norvig)
Learning
Forms of learning
• Any component of an agent can be improved by learning from
data. The improvements, and the techniques used to make them,
depend on four major factors:
– Which component is to be improved.
– What prior knowledge the agent already has.
– What representation is used for the data and the component.
– What feedback is available to learn from.
Agent components to be improved
– A direct mapping from conditions on the current state to actions
– A means to infer relevant properties of the world from the percept sequence
– Information about the way the world evolves and about the results of
possible actions the agent can take
– Utility information indicating the desirability of world states
– Action-value information indicating the desirability of actions
– Goals that describe classes of states whose achievement maximizes the
agent’s utility
Example:
• An agent training to become a taxi driver.
– Every time the instructor shouts “Brake!” the agent might learn a condition–action
rule for when to brake (component 1); the agent also learns every time the instructor
does not shout.
– By seeing many camera images that it is told contain buses, it can learn to recognize
them (2).
– By trying actions and observing the results—for example, braking hard on a wet
road—it can learn the effects of its actions (3).
– Then, when it receives no tip from passengers who have been thoroughly shaken up
during the trip, it can learn a useful component of its overall utility function (4).
Learning from Observations
• Supervised Learning – learn a function from a set of training examples, which are
pre-classified feature vectors.
– Data – instantiations of some or all of the random variables describing the domain;
they are evidence
– Hypotheses – probabilistic theories of how the domain works
feature vector (shape, color)    class
(square, red)                    I
(square, blue)                   I
(circle, red)                    II
(circle, blue)                   II
(triangle, red)                  I
(triangle, green)                I
(ellipse, blue)                  II
(ellipse, red)                   II

Unseen: (circle, green) ?   (triangle, blue) ?
Given a previously unseen feature vector, what is the rule that tells us
if it is in class I or class II?
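As an illustrative sketch (not part of the slides), one hypothesis consistent with this table classifies by shape alone; the tiny rule learner below simply memorizes the shape-to-class mapping seen in training and applies it to the unseen vectors.

```python
# Minimal sketch of supervised learning on the (shape, color) table above.
# Assumption (not stated in the slides): shape alone determines the class here.

training_data = [
    (("square",   "red"),   "I"),
    (("square",   "blue"),  "I"),
    (("circle",   "red"),   "II"),
    (("circle",   "blue"),  "II"),
    (("triangle", "red"),   "I"),
    (("triangle", "green"), "I"),
    (("ellipse",  "blue"),  "II"),
    (("ellipse",  "red"),   "II"),
]

# "Learn" a rule: record which class each shape was labeled with.
shape_to_class = {}
for (shape, color), label in training_data:
    shape_to_class[shape] = label

# Apply the learned rule to the unseen feature vectors.
for unseen in [("circle", "green"), ("triangle", "blue")]:
    shape, color = unseen
    print(unseen, "->", shape_to_class.get(shape, "unknown"))
# (circle, green) -> II and (triangle, blue) -> I under this hypothesis
```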
Learning from Observations
• Unsupervised Learning – No classes are given. The idea is to find
patterns in the data. This generally involves clustering.
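As a rough sketch of what clustering means (purely illustrative, not from the slides), a very small k-means style procedure groups 1-D points without any class labels:

```python
# Tiny k-means sketch on 1-D data: group points without any class labels.
# Illustrative only; real clustering would use a library and richer data.

points = [1.0, 1.2, 0.8, 5.1, 4.9, 5.3]
centers = [points[0], points[3]]           # naive initialization: two seed points

for _ in range(10):                        # a few refinement iterations
    clusters = [[], []]
    for p in points:                       # assign each point to the nearest center
        nearest = min(range(2), key=lambda i: abs(p - centers[i]))
        clusters[nearest].append(p)
    for i in range(2):                     # move each center to its cluster's mean
        if clusters[i]:
            centers[i] = sum(clusters[i]) / len(clusters[i])

print("centers:", centers)                 # roughly [1.0, 5.1]
print("clusters:", clusters)
```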
Bayesian Learning
• Bayesian learning calculates the probability of each hypothesis, given the data, and
makes predictions on that basis.
• That is, predictions are made by using all the hypotheses, weighted by their
probabilities, rather than by using just a single “best” hypothesis.
• The probability of each hypothesis is obtained by Bayes’ rule.
Bayes’ Rule
• This simple equation underlies most modern AI systems for probabilistic inference.

    P(h | X) = P(X | h) P(h) / P(X)

  (P(X) is often assumed constant and left out.)

• h is the hypothesis (such as the class).
• X is the feature vector to be classified.
• P(X | h) is the probability that this feature vector occurs, given that h is true (the likelihood).
• P(h) is the prior probability of hypothesis h.
• P(X) is the prior probability of the feature vector X.
Example
• Say that you have this (tiny) dataset that classifies animals into two classes:
cat and dog.
• For example: the probability of the example being a cat, given that hair color is black,
body length is 18 inches, height is 9.2, weight is 8.1 lb, …
• The conditional probability is, generically, P(class | feature set). In our example, classes =
{cat, dog} and feature set = {hair color, body length, height, weight, ear length, claws}.
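As an illustrative sketch (with made-up numbers, not from the slides): a naive Bayes classifier applies Bayes’ rule with the simplifying assumption that features are independent given the class, so P(class | features) is proportional to P(class) times the product of P(feature | class).

```python
# Hedged sketch of naive Bayes for P(class | features) on a cat/dog style problem.
# The priors and per-feature likelihoods below are invented for illustration only.

priors = {"cat": 0.5, "dog": 0.5}

# P(feature value | class), assumed conditionally independent given the class.
likelihoods = {
    "cat": {"hair=black": 0.40, "length<=20in": 0.90, "weight<=10lb": 0.85},
    "dog": {"hair=black": 0.30, "length<=20in": 0.35, "weight<=10lb": 0.30},
}

observed = ["hair=black", "length<=20in", "weight<=10lb"]

# Unnormalized posterior: P(class) * product of P(feature | class).
scores = {}
for cls in priors:
    score = priors[cls]
    for f in observed:
        score *= likelihoods[cls][f]
    scores[cls] = score

total = sum(scores.values())                  # P(X), the normalizer
posteriors = {cls: s / total for cls, s in scores.items()}
print(posteriors)                             # cat is far more probable here
```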
Choosing Hypothesis
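One standard criterion for this choice, following Bayes’ rule above, is to pick the maximum a posteriori (MAP) hypothesis; with a uniform prior over hypotheses this reduces to the maximum likelihood (ML) hypothesis. A sketch of the usual formulation:

```latex
% MAP and ML hypotheses (standard formulation; notation follows the Bayes' rule slide)
h_{MAP} = \arg\max_{h \in H} P(h \mid X) = \arg\max_{h \in H} P(X \mid h)\, P(h)

h_{ML}  = \arg\max_{h \in H} P(X \mid h)   % when the prior P(h) is uniform over H
```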
Cancer Test Example
• Does patient have cancer or not?
– A patient takes a lab test and the result comes back positive. The test returns a correct
positive result in only 98% of the cases in which the disease is actually present, and a
correct negative result in only 97% of the cases in which the disease is not present.
Furthermore, .008 of the entire population have this cancer.
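Plugging these numbers into Bayes’ rule gives a perhaps surprising answer: the posterior probability of cancer given a positive test is only about 21%. A small sketch of the arithmetic:

```python
# Bayes' rule on the cancer-test numbers above.
p_cancer = 0.008                      # prior: P(cancer)
p_pos_given_cancer = 0.98             # correct positive rate: P(+ | cancer)
p_pos_given_no_cancer = 1 - 0.97      # false-positive rate: P(+ | no cancer) = 0.03

# P(+) by the law of total probability.
p_pos = (p_pos_given_cancer * p_cancer
         + p_pos_given_no_cancer * (1 - p_cancer))

# Posterior P(cancer | +) = P(+ | cancer) P(cancer) / P(+).
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos
print(round(p_cancer_given_pos, 2))   # about 0.21
```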
Neural Networks
• Neural networks don’t have clear decision rules like decision trees, but they are highly
successful in many different applications (e.g. face detection).
• Knowledge is represented in numeric form.
Biological Neuron
• NET = X1W1 + X2W2 + ... + XnWn
• Out = f(NET)
  f(NET) = 0 if NET < T (the threshold), 1 otherwise
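A minimal sketch of this threshold unit in code (variable names and example values are assumed, not from the slides):

```python
# Threshold unit: weighted sum NET, then a step activation with threshold T.
def neuron_output(inputs, weights, threshold):
    net = sum(x * w for x, w in zip(inputs, weights))   # NET = x1*w1 + ... + xn*wn
    return 1 if net >= threshold else 0                 # f(NET): 0 below T, 1 otherwise

# Example: two inputs behaving like a logical AND when T = 1.5.
print(neuron_output([1, 1], [1.0, 1.0], 1.5))  # 1
print(neuron_output([1, 0], [1.0, 1.0], 1.5))  # 0
```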
Activation functions
• Activation functions are mathematical functions that determine the output of each
node (neuron) in a neural network.
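The slides do not list specific functions here; as an illustrative sketch, three commonly used activations:

```python
import math

# Three commonly used activation functions (illustrative, not an exhaustive list).
def step(x, threshold=0.0):           # hard threshold, as in the simple neuron above
    return 1.0 if x >= threshold else 0.0

def sigmoid(x):                        # squashes any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):                           # passes positives through, zeroes out negatives
    return max(0.0, x)

for x in (-2.0, 0.0, 2.0):
    print(x, step(x), round(sigmoid(x), 3), relu(x))
```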
Architectures of NN
What do we mean by the architecture of a NN?
• The way in which neurons are connected together, for example:
  – Feed-forward NN
  – Recurrent NN
  – Symmetrically connected NN
Feed-forward example
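As a rough sketch (the layer sizes and weights below are assumed for illustration), a tiny feed-forward pass in which each layer computes a weighted sum of its inputs followed by an activation, and information flows strictly from inputs to outputs:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights):
    # Each output neuron: sigmoid of its weighted sum of the inputs.
    return [sigmoid(sum(x * w for x, w in zip(inputs, row))) for row in weights]

# Illustrative 2-input -> 2-hidden -> 1-output network; weights are made up.
x = [0.5, -1.0]
w_hidden = [[0.8, -0.2],
            [0.4,  0.9]]
w_output = [[1.0, -1.5]]

hidden = layer(x, w_hidden)        # activations of the hidden layer
output = layer(hidden, w_output)   # final output, computed strictly forward
print(output)
```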
Perceptron
• The perceptron (or single-layer perceptron) is the simplest model of a
neuron that illustrates how a neural network works.
• The perceptron is a machine learning algorithm developed in 1957 by
Frank Rosenblatt and first implemented on the IBM 704.
How the Perceptron Works
• Example:
– The perceptron has three inputs x1, x2 and x3 and one output.
• Since the output of the perceptron can be either 0 or 1, this perceptron is an example
of a binary classifier.
The Formula
Let’s write out the formula that joins the inputs and the weights together to produce the output:
  weighted sum = w1x1 + w2x2 + w3x3
  Output = 1 if the weighted sum exceeds the threshold, 0 otherwise
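A minimal sketch of this computation (the inputs, weights, and threshold below are placeholders chosen only to show the call):

```python
# Perceptron forward pass: weighted sum of three inputs, then a 0/1 threshold.
def perceptron(x, w, threshold=0.0):
    weighted_sum = w[0] * x[0] + w[1] * x[1] + w[2] * x[2]
    return 1 if weighted_sum > threshold else 0

# Placeholder values: 0.5*1 + (-0.4)*0 + 0.3*1 = 0.8, which exceeds 0.6.
print(perceptron(x=[1, 0, 1], w=[0.5, -0.4, 0.3], threshold=0.6))  # 1
```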
END