12 Computational Learning Theory
Carla P. Gomes
CS4700
Computational Learning Theory
Inductive learning:
given the training set, a learning algorithm generates a hypothesis.
Run the hypothesis on the test set; the results say something about how good our
hypothesis is.
But how much do the results really tell us? Can we be certain about how well the
learning algorithm generalizes?
Example:
We want to use height to distinguish men from women, drawing people from the
same distribution for training and testing.
We can never be absolutely certain that we have correctly learned our target
(hidden) concept function. (E.g., there is a non-zero chance that, so far, we
have only seen a sequence of bad examples.)
We'll see that it is generally highly unlikely to see a long series of bad examples!
Aside: flipping a coin
Experimental data
Experimental Data Contd.
Underlying principle:
Notation:
– X: set of all possible examples
– D: distribution from which examples are drawn
– H: set of all possible hypotheses
– N: the number of examples in the training set
– f: the true function to be learned
Goal:
With probability at least 1 − δ, the learned hypothesis h is approximately correct,
i.e., error(h) ≤ ε.
In other words, the chance of a "bad" hypothesis (one with high error that is still
consistent with the examples) is small (i.e., less than δ).
Approximately Correct
A "bad" hypothesis h_b (one with error(h_b) > ε) agrees with a single randomly drawn
example with probability at most 1 − ε.
So for N examples, P(h_b agrees with all N examples) ≤ (1 − ε)^N.
Approximately Correct Hypothesis
P(H_bad contains a consistent hypothesis) ≤ |H_bad| (1 − ε)^N ≤ |H| (1 − ε)^N
Goal –
Bound the probability of learning a bad hypothesis below some small number δ,
i.e., require |H| (1 − ε)^N ≤ δ.
Note: the more accuracy one wants (smaller ε), and the more certainty one wants
(smaller δ), the more examples one needs (see the derivation sketch below).
– PAC learning also requires an efficient learning algorithm.
Theoretical results apply to fairly simple learning models (e.g., decision-list learning).
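A minimal derivation sketch of the sample-size bound that follows from requiring
|H| (1 − ε)^N ≤ δ (the algebra below is filled in here, not copied from the slides):

\[
  |H|\,(1-\varepsilon)^{N} \;\le\; |H|\,e^{-\varepsilon N},
  \qquad
  |H|\,e^{-\varepsilon N} \le \delta
  \;\Longleftrightarrow\;
  N \ge \frac{1}{\varepsilon}\Bigl(\ln\tfrac{1}{\delta} + \ln|H|\Bigr),
\]

so it suffices to take N ≥ (1/ε)(ln(1/δ) + ln|H|); smaller ε or smaller δ directly
increases the required N.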
PAC Learning
Two steps:
1 – Show that the sample complexity (number of examples needed) is polynomial in the
relevant parameters (1/ε, 1/δ, and the size of the problem, e.g., n).
2 – Show that there is an efficient (polynomial-time) learning algorithm.
Example: Boolean Functions
Consider H the set of all Boolean functions on n attributes: |H| = 2^(2^n).
N ≥ (1/ε)(ln(1/δ) + ln|H|) = O(2^n)
So the sample complexity grows as 2^n!
(the same as the number of all possible examples)
Not PAC-learnable!
So, any learning algorithm will do no better than a lookup table
if it merely returns a hypothesis that is consistent with all known
examples!
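For contrast, a small numeric sketch (not from the slides; the helper name and the
values ε = 0.1, δ = 0.05 are illustrative assumptions) of how the generic bound
N ≥ (1/ε)(ln(1/δ) + ln|H|) blows up when H is all Boolean functions on n attributes:

# Sketch: generic PAC sample bound for |H| = 2^(2^n)  (assumed illustration).
import math

def pac_sample_bound(ln_H, eps=0.1, delta=0.05):
    # N >= (1/eps) * (ln(1/delta) + ln|H|)
    return math.ceil((1.0 / eps) * (math.log(1.0 / delta) + ln_H))

for n in range(1, 6):
    ln_H = (2 ** n) * math.log(2)      # ln|H| when |H| = 2^(2^n)
    print(n, pac_sample_bound(ln_H))   # grows roughly like 2^n / eps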
Decision Lists
[Figure: a decision list with two tests — if a then Yes, else if b ∧ c then Yes, else No —
where a = Patrons(x,Some), b = Patrons(x,Full), c = Fri/Sat(x).]
Note: if we allow arbitrarily many literals per test, a decision list can express all Boolean functions.
[Figure: a longer decision list with tests (a), (b), (d), (e), (f), (h), (i) and outcomes
No, Yes, No, Yes, No, Yes, Yes, No, where a = Patrons(x,None), b = Patrons(x,Some),
d = Hungry(x); the remaining test definitions are not shown.]
Decision lists with limited expressiveness (k-DL) – at most k literals per test.
[Figure: the earlier decision list — if a then Yes, else if b ∧ c then Yes, else No —
is a 2-DL, since each test has at most 2 literals.]
For fixed k, the number of examples needed for PAC-learning a k-DL function is
polynomial in the number of attributes n.
1 – Sample complexity:
A conjunct (or test) can appear in the list with outcome Yes, with outcome No, or be
absent from the list.
So we have at most 3^|Conj(n,k)| different k-DL lists (ignoring order), where Conj(n,k)
is the set of conjunctions of at most k literals over n attributes, |Conj(n,k)| = O(n^k).
But the order of the tests (or conjuncts) in a list matters.
Taking the possible orderings into account still gives ln|H| = O(n^k log2(n^k)), so
N ≥ (1/ε)(ln(1/δ) + ln|H|)
N ≥ (1/ε)(ln(1/δ) + O(n^k log2(n^k)))
So, for fixed k, the number of examples needed for PAC-learning a k-DL function is
polynomial in the number of attributes n!
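A small numeric sketch (not from the slides; the helper names, ε = 0.1, δ = 0.05, and
the exact counting constants are illustrative assumptions) of how this bound grows for
fixed k:

# Sketch: evaluate the k-DL sample bound, using |Conj(n,k)| = sum_{i<=k} C(2n, i)
# (choose i of the 2n literals) and ln|H| <= |Conj| * (ln 3 + ln|Conj|).
import math

def num_conjuncts(n, k):
    # Number of conjunctions of at most k literals over n Boolean attributes.
    return sum(math.comb(2 * n, i) for i in range(k + 1))

def kdl_sample_bound(n, k, eps=0.1, delta=0.05):
    c = num_conjuncts(n, k)
    ln_H = c * (math.log(3) + math.log(c))   # <= ln(3^c * c!)  (order matters)
    return math.ceil((1.0 / eps) * (math.log(1.0 / delta) + ln_H))

for n in (5, 10, 20):
    print(n, kdl_sample_bound(n, k=2))       # grows polynomially in n for fixed k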
2 – Efficient learning algorithm – a k-DL decision list can be learned in polynomial time.
The greedy Decision-List-Learning algorithm (sketched in code below):
– repeatedly finds a test that agrees with some subset of the training set (all examples
matching the test have the same classification);
– adds that test to the decision list under construction and removes the corresponding
examples;
– uses the remaining examples for constructing the rest of the decision list, until there
are no examples left.
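A minimal sketch of this greedy procedure (an assumed implementation, not the slides'
code; the data representation and function names are hypothetical — examples are
(attribute-dict, label) pairs and a test is a conjunction of (attribute, value) literals):

# Greedy decision-list learning (assumed sketch).
from itertools import combinations

def matches(test, attrs):
    # A test is a tuple of (attribute, value) literals; all must hold.
    return all(attrs.get(a) == v for a, v in test)

def candidate_tests(examples, k):
    # All conjunctions of at most k (attribute, value) literals seen in the data.
    literals = sorted({(a, v) for attrs, _ in examples for a, v in attrs.items()},
                      key=repr)
    for size in range(1, k + 1):
        for test in combinations(literals, size):
            yield test

def decision_list_learning(examples, k=2):
    # Returns a list of (test, label) rules, or None if no consistent k-DL exists.
    rules = []
    remaining = list(examples)
    while remaining:
        for test in candidate_tests(remaining, k):
            covered = [(attrs, lbl) for attrs, lbl in remaining if matches(test, attrs)]
            labels = {lbl for _, lbl in covered}
            if covered and len(labels) == 1:          # test agrees with a uniform subset
                rules.append((test, labels.pop()))    # add rule, drop covered examples
                remaining = [e for e in remaining if not matches(test, e[0])]
                break
        else:
            return None                               # no consistent test found: failure
    return rules

The search order here is arbitrary, so the returned list need not be the shortest
consistent one; polynomial running time is what matters for the PAC argument.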
Decision-List-Learning Algorithm
Restaurant data.
Examples
Probably Approximately Correct (PAC) Learning (summary)
Discussion
Word of caution:
PAC learning results are worst-case complexity results.
Sample Complexity for Infinite Hypothesis
Spaces I: VC-Dimension
VC Dimension: Example
Sample Complexity for Infinite Hypothesis
Spaces I: VC-Dimension
VC Dimension: Example 2
Learning Rectangles
• Consider axis-parallel rectangles in the real plane.
• Can we PAC-learn this class?
(1) What is the VC dimension?
(2) Can we give an efficient algorithm?
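One natural candidate for (2), sketched below under assumptions not spelled out on the
slide (noise-free data; the function names are hypothetical): return the tightest
axis-parallel rectangle enclosing the positive examples.

# Sketch: "tightest fit" learner for axis-parallel rectangles (assumed, not from the slide).
def learn_rectangle(examples):
    # examples: list of ((x, y), label) with label True for positive points.
    pos = [p for p, label in examples if label]
    if not pos:
        return None                      # no positives: predict negative everywhere
    xs = [x for x, _ in pos]
    ys = [y for _, y in pos]
    return (min(xs), max(xs), min(ys), max(ys))

def predict(rect, point):
    if rect is None:
        return False
    x0, x1, y0, y1 = rect
    x, y = point
    return x0 <= x <= x1 and y0 <= y <= y1

With noise-free data, the tightest fit lies inside the target rectangle, so it classifies
all training points correctly and only errs on a thin band around the target; bounding
the probability of that band is the usual route to showing this class is PAC-learnable.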
The Mistake Bound Model of Learning
Optimal Mistake Bounds