
Computational Learning Theory

Slides by Carla P. Gomes and Nathalie Japkowicz

(Reading: R&N AIMA 3rd ed., Chapter 18.5)

Computational Learning Theory

Inductive learning:
given the training set, a learning algorithm generates a hypothesis.

Run the hypothesis on the test set. The results say something about how good our hypothesis is.

But how much do the results really tell us? Can we be certain about how well the learning algorithm generalizes?

To be certain, we would have to see all possible examples.


Insight: introduce probabilities to measure degree of
certainty and correctness (Valiant 1984).
Computational Learning Theory

Example:

We want to use height to distinguish men and women, drawing people from the same distribution for training and testing.

We can never be absolutely certain that we have correctly learned our target (hidden) concept function. (E.g., there is a non-zero chance that, so far, we have only seen a sequence of bad, unrepresentative examples.)

E.g., relatively tall women and relatively short men…

We’ll see that it’s generally highly unlikely to see a long series of bad examples!

Aside: flipping a coin

Experimental data

C program – simulation of flips of a fair coin:
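The slide shows output from a C program; as a stand-in, here is a minimal Python sketch (mine) of the same kind of experiment, with run sizes of my own choosing:

import random

# For each sample size, run many simulated "experiments" of fair-coin flips
# and report the largest deviation of the observed fraction of heads from 0.5.
def max_deviation(n_flips, n_runs=1000):
    worst = 0.0
    for _ in range(n_runs):
        heads = sum(random.random() < 0.5 for _ in range(n_flips))
        worst = max(worst, abs(heads / n_flips - 0.5))
    return worst

for n in (10, 100, 1000, 10000):
    print(n, max_deviation(n))    # large outliers become rare as n grows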

Experimental Data Contd.

With a sufficient number of flips (each set of flips is one estimate of the coin's bias), large outliers become quite rare.

The coin example is the key to computational learning theory!
Computational Learning Theory

Intersection of AI, statistics, and theory of computation.

Introduces Probably Approximately Correct (PAC) learning, which concerns efficient learning.

For our learning procedures we would like to prove that: with high probability, an (efficient) learning algorithm will find a hypothesis that is approximately identical to the hidden target concept.

Note the double “hedging” – probably and approximately.

Why do we need both levels of uncertainty (in general)?


Probably Approximately
Correct Learning

Underlying principle:

Seriously wrong hypotheses can be found out almost certainly (with high probability) using a “small” number of examples.

– Any hypothesis that is consistent with a sufficiently large set of training examples is unlikely to be seriously wrong: it must be probably approximately correct.

– Any (efficient) algorithm that returns hypotheses that are PAC is called a PAC-learning algorithm.
Probably Approximately
Correct Learning

How many examples are needed to guarantee correctness?

– Sample complexity (the number of examples needed to “guarantee” correctness) grows with the size of the hypothesis space.

– Stationarity assumption: training and test sets are drawn from the same distribution.

Notations

Notations:
– X: set of all possible examples
– D: distribution from which examples are drawn
– H: set of all possible hypotheses
– N: the number of examples in the training set
– f: the true function to be learned

Assume: the true function f is in H.

Error of a hypothesis h wrt f :

Probability that h differs from f on a randomly picked example:

error(h) = P(h(x) ≠ f(x)| x drawn from D)


Exactly what we are trying to measure with our test set.
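To make the definition concrete, here is a minimal Python sketch (my own illustration, not from the slides) of estimating error(h) on a held-out test set; the target f, the hypothesis h, and the distribution are invented for the height example above:

import random

# Hidden target concept and a learned hypothesis (both invented for illustration):
# classify "woman" from height using slightly different thresholds.
def f(height):        # true (hidden) target function
    return height < 168
def h(height):        # hypothesis produced by the learner
    return height < 170

# Draw a test set from the same distribution D as the training data
# (stationarity assumption) and estimate error(h) = P(h(x) != f(x)).
test_set = [random.gauss(172, 10) for _ in range(100000)]
print(sum(h(x) != f(x) for x in test_set) / len(test_set))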
Approximately Correct

A hypothesis h is approximately correct if:

error(h) ≤ ε,

where ε is a given threshold, a small constant

Goal:

Show that after seeing a small (polynomial) number of examples N, with high probability, all consistent hypotheses will be approximately correct.

I.e., the chance of a “bad” hypothesis (one with high error that is nevertheless consistent with the examples) is small (i.e., less than δ).

Approximately Correct

Approximately correct hypotheses lie inside the ε-ball around f; those hypotheses that are seriously wrong (h_bad ∈ H_bad) are outside the ε-ball:

error(h_bad) = P(h_bad(x) ≠ f(x) | x drawn from D) > ε.

Thus the probability that h_bad (a seriously wrong hypothesis) disagrees with one example is at least ε (definition of error), and the probability that h_bad agrees with one example is no more than (1 − ε).

So, for N independently drawn examples, P(h_bad agrees with all N examples) ≤ (1 − ε)^N.
Approximately Correct Hypothesis

The probability that H_bad contains at least one consistent hypothesis is bounded by the sum of the individual probabilities (a union bound):

P(H_bad contains a consistent hypothesis, i.e., one agreeing with all the examples) ≤ |H_bad| (1 − ε)^N ≤ |H| (1 − ε)^N

since each h_bad agrees with a single example with probability no more than (1 − ε).
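A quick empirical sanity check of the (1 − ε)^N factor (my own illustration, with arbitrary numbers): a single hypothesis whose true error is ε survives N random examples with probability about (1 − ε)^N.

import random

eps, N, trials = 0.1, 50, 100000
survived = 0
for _ in range(trials):
    # each independent example exposes the bad hypothesis with probability eps
    if all(random.random() > eps for _ in range(N)):
        survived += 1
print("empirical:", survived / trials, " bound:", (1 - eps) ** N)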

P(H_bad contains a consistent hypothesis) ≤ |H_bad| (1 − ε)^N ≤ |H| (1 − ε)^N

Goal: bound the probability of learning a bad hypothesis below some small number δ. (Equivalently: what is the probability P(H_good) of learning a good hypothesis? How large should N be?)

P(H_bad contains a consistent hypothesis) ≤ |H| (1 − ε)^N ≤ δ

Sample complexity: the number of examples needed to guarantee a PAC-learnable function class. Solving the inequality above for N (derivation: see blackboard; one standard step uses (1 − ε) ≤ e^(−ε)) gives

N ≥ (1/ε) (ln(1/δ) + ln |H|)

If the learning algorithm returns a hypothesis that is consistent with this many examples, then with probability at least (1 − δ) the learning algorithm has an error of at most ε, and the hypothesis is Probably Approximately Correct.

Probably Approximately Correct hypothesis h: the probability of a small error (error(h) ≤ ε) is greater than or equal to a given threshold 1 − δ. PAC learnability additionally requires:
– A bound on the number of examples (sample complexity) needed to guarantee PAC that is polynomial. (The more accuracy (smaller ε) and the more certainty desired (smaller δ), the more examples one needs.)
– An efficient learning algorithm.

Theoretical results apply to fairly simple learning models (e.g., decision list learning)
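For concreteness, the bound above is easy to evaluate numerically; a minimal Python sketch (mine, with illustrative numbers):

import math

# N >= (1/eps) * (ln(1/delta) + ln|H|)
def pac_sample_bound(eps, delta, hypothesis_space_size):
    return math.ceil((1 / eps) * (math.log(1 / delta) + math.log(hypothesis_space_size)))

# e.g. |H| = 1000 hypotheses, 5% error tolerance, 99% confidence
print(pac_sample_bound(eps=0.05, delta=0.01, hypothesis_space_size=1000))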

PAC Learning

Two steps:

Sample complexity – a polynomial number of examples suffices to specify a good consistent hypothesis (error(h) ≤ ε) with high probability (1 − δ).

Computational complexity – there is an efficient algorithm for learning a consistent hypothesis from the small sample.

Let’s be more specific with examples.

Example:
Boolean Functions

Consider H, the set of all Boolean functions on n attributes: |H| = 2^(2^n).

N ≥ (1/ε) (ln(1/δ) + ln |H|) = O(2^n)

So the sample complexity grows as 2^n (the same as the number of all possible examples): not PAC-learnable!

So any learning algorithm will do no better than a lookup table if it merely returns a hypothesis that is consistent with all known examples!

Intuitively, what does this say about H?


Finite H required!
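A quick numeric illustration of this blow-up (my own sketch, with arbitrary ε and δ): for Boolean functions, ln |H| = 2^n ln 2, so the bound explodes exponentially in n.

import math

eps, delta = 0.1, 0.05
for n in range(1, 11):
    ln_H = (2 ** n) * math.log(2)          # ln|H| for |H| = 2^(2^n)
    N = math.ceil((1 / eps) * (math.log(1 / delta) + ln_H))
    print(n, N)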
Coping With Learning Complexity

1. Force the learning algorithm to look for the smallest consistent hypothesis.

We considered this for decision tree learning; it is often intractable in the worst case, though.

2. Restrict the size of the hypothesis space.

E.g., decision lists are a restricted form of Boolean functions: hypotheses correspond to a series of tests, each of which is a conjunction of literals.

Good news: only a polynomially sized number of examples is required to guarantee PAC learning of K-DL functions, and there are efficient algorithms for learning K-DL.
Decision Lists

Resemble decision trees, but with a simpler structure: a series of tests, each test a conjunction of literals. If a test succeeds, the decision list specifies the value to return; if it fails, processing continues with the next test in the list.

Example (figure): the decision list “if a then Yes; else if (b ∧ c) then Yes; else No”, where a = Patrons(x, Some), b = Patrons(x, Full), c = Fri/Sat(x).

Note: if we allow arbitrarily many literals per test, decision lists can express all Boolean functions.
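A minimal Python sketch (mine, not from the slides) of representing and evaluating such a decision list; the attribute names follow the restaurant example:

# A decision list: ordered (test, outcome) pairs plus a default outcome.
# Each test is a conjunction of (attribute, value) literals.
decision_list = [
    ([("Patrons", "Some")], "Yes"),                     # (a) -> Yes
    ([("Patrons", "Full"), ("Fri/Sat", True)], "Yes"),  # (b and c) -> Yes
]

def classify(example, dl=decision_list, default="No"):
    # Return the outcome of the first test whose literals all hold.
    for literals, outcome in dl:
        if all(example.get(attr) == val for attr, val in literals):
            return outcome
    return default

print(classify({"Patrons": "Full", "Fri/Sat": True}))   # Yes
print(classify({"Patrons": "None", "Fri/Sat": False}))  # No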
(a) (b) (d) (e) (f) (h) (i)
No Yes No Yes No Yes Yes No

a=Patrons(x,None) b=Patrons(x,Some)
d=Hungry(x)

e=Type(x,French) f=Type(x,Italian) g=Type(x,Thai) h=Type(x,Burger)


Carla P. Gomes
i=Fri/Sat(x) CS4700
K Decision Lists

Decision Lists with limited expressiveness (K-DL) – at most k literals per test

(a) (bc)
2-DL Y Y N

K-DL is PAC learnable!!!

For fixed k literals, the number of examples needed for PAC learning a
K-DL function is polynomial in the number of attributes n.
:

There are efficient algorithms for learning K-DL functions.

So how do we show K-DL is PAC-learnable? Carla P. Gomes


CS4700
K Decision Lists: Sample Complexity

Another 2-DL (figure): tests (x), (y), (w ∧ v), (u ∧ b) with outcomes No, Yes, No, Yes and default No.

What is the size of the hypothesis space H, i.e., |K-DL(n)|? Recall the sample complexity formula

N ≥ (1/ε) (ln(1/δ) + ln |H|)

A K-decision list is a set of tests, each test a conjunction of at most k literals. How many possible tests (conjunctions) of length at most k are there, given n attributes (2n literals)?

|Conj(n, k)| = 2n + C(2n, 2) + C(2n, 3) + ... + C(2n, k) = O(n^k)

A conjunction (test) can appear in the list with output Yes, with output No, or be absent from the list, so there are at most 3^|Conj(n, k)| different K-DL lists (ignoring order). But the order of the tests (conjunctions) in a list matters, so

|K-DL(n)| ≤ 3^|Conj(n, k)| · |Conj(n, k)|!
After some work, we get (a useful exercise! try Mathematica or Maple):

|K-DL(n)| = 2^O(n^k log2(n^k))

1 – Sample complexity of K-DL: substituting into N ≥ (1/ε) (ln(1/δ) + ln |H|) gives

N ≥ (1/ε) (ln(1/δ) + O(n^k log2(n^k)))

For fixed k, the number of examples needed for PAC learning a K-DL function is polynomial in the number of attributes n!

2 – Efficient learning algorithm: a decision list with tests of at most k literals can be learned in polynomial time.

So K-DL is PAC learnable!!!
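To see the polynomial growth concretely, here is a small Python sketch (mine, with illustrative ε, δ, k) that evaluates |Conj(n, k)| and the resulting bound, using ln |K-DL(n)| ≤ |Conj(n, k)| ln 3 + ln(|Conj(n, k)|!):

import math

def conj(n, k):
    # number of conjunctions of at most k literals over n attributes (2n literals)
    return sum(math.comb(2 * n, i) for i in range(1, k + 1))

def ln_kdl_size(n, k):
    c = conj(n, k)
    return c * math.log(3) + math.lgamma(c + 1)   # lgamma(c + 1) = ln(c!)

eps, delta, k = 0.1, 0.05, 2
for n in (5, 10, 20, 40):
    N = math.ceil((1 / eps) * (math.log(1 / delta) + ln_kdl_size(n, k)))
    print(n, conj(n, k), N)   # N grows polynomially in n for fixed k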


Decision-List-Learning Algorithm

Greedy algorithm for learning decision lists:

– repeatedly find a test that agrees with some subset of the training set (all examples it matches have the same classification);

– add the test to the decision list under construction and remove the corresponding examples;

– continue with the remaining examples until there are none left, constructing the rest of the decision list.

(See R&N, page 672, for details on the algorithm.)
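A minimal Python sketch of this greedy scheme (my own simplification, not the book's pseudocode; the data format and candidate-test generation are assumptions):

from itertools import combinations

def candidate_tests(attrs, examples, k=2):
    # all conjunctions of at most k (attribute, value) literals seen in the data
    literals = {(a, ex[a]) for ex in examples for a in attrs}
    for size in range(1, k + 1):
        yield from combinations(sorted(literals, key=repr), size)

def matches(test, ex):
    return all(ex[a] == v for a, v in test)

def learn_decision_list(examples, attrs, k=2):
    # examples: list of (attribute-dict, label) pairs
    dl, remaining = [], list(examples)
    while remaining:
        for test in candidate_tests(attrs, [e for e, _ in remaining], k):
            covered = [y for e, y in remaining if matches(test, e)]
            if covered and len(set(covered)) == 1:   # test agrees with a uniform subset
                dl.append((test, covered[0]))
                remaining = [(e, y) for e, y in remaining if not matches(test, e)]
                break
        else:
            return None                              # failure: no consistent test exists
    return dl

data = [({"Patrons": "Some", "Hungry": True}, "Yes"),
        ({"Patrons": "Full", "Hungry": False}, "No"),
        ({"Patrons": "None", "Hungry": True}, "No")]
print(learn_decision_list(data, ["Patrons", "Hungry"]))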

Decision-List-Learning Algorithm

Restaurant data.
Examples

1. H = the space of all Boolean functions: not PAC-learnable, the hypothesis space is too big, so too many examples are needed (sample complexity not polynomial)!
2. K-DL: PAC learnable.
3. Conjunctions of literals: PAC learnable.

Probably Approximately Correct Learning
(PAC)Learning (summary)

A class of functions is said to be PAC-learnable if there exists an efficient learning algorithm such that, for all functions in the class, for all probability distributions on the function's domain, and for any values of ε and δ (0 < ε, δ < 1), using a polynomial number of examples the algorithm will produce a hypothesis whose error is smaller than ε with probability at least 1 − δ.

The error of a hypothesis is the probability that it will differ from the target function on a random element of its domain, drawn according to the given probability distribution.

Basically, this means that:


• there is some way to learn efficiently a “pretty good” approximation of the target function.
• the probability is as big as you like that the error is as small as you like.
(Of course, the tighter you make the bounds, the harder the learning algorithm is likely to have to work).

Discussion

Computational Learning Theory studies the tradeoff between the expressiveness of the hypothesis language and the complexity of learning.

Probably Approximately Correct learning concerns efficient learning:
– Sample complexity: a polynomial number of examples.
– An efficient learning algorithm.

Word of caution: PAC learning results are worst-case complexity results.
Sample Complexity for Infinite Hypothesis
Spaces I: VC-Dimension

• The PAC learning framework has two disadvantages:
– It can lead to weak bounds.
– The sample complexity bound cannot be established for infinite hypothesis spaces.

• We introduce new ideas for dealing with these problems:
– A set of instances S is shattered by hypothesis space H iff for every dichotomy of S there exists some hypothesis in H consistent with this dichotomy.

VC Dimension: Example

Sample Complexity for Infinite Hypothesis
Spaces I: VC-Dimension

The Vapnik-Chervonenkis dimension, VC(H), of hypothesis space H defined over instance space X is the size of the largest finite subset of X shattered by H.

If arbitrarily large finite sets of X can be shattered by H, then VC(H) = ∞.

VC Dimension: Example 2

• H = axis-parallel rectangles in R²
• What is the VC dimension of H?
• Can we PAC-learn it?

Learning Rectangles
• Consider axis-parallel rectangles in the real plane.
• Can we PAC-learn them?
(1) What is the VC dimension?

• Some sets of four instances (e.g., one point on each side of a rectangle) can be shattered.

This shows that VC(H) ≥ 4.

Learning Rectangles
• Consider axis-parallel rectangles in the real plane.
• Can we PAC-learn them?
(1) What is the VC dimension?

• But no set of five instances can be shattered: among any five points, take the leftmost, rightmost, topmost, and bottommost; any axis-parallel rectangle that contains these four must also contain the fifth, so the labeling that makes those four positive and the fifth negative cannot be realized. Therefore VC(H) = 4.
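As a sanity check (my own illustration, not from the slides), the lower bound can be verified computationally: a set of four points, one per side of a square, is shattered by axis-parallel rectangles.

from itertools import product

points = [(0, 1), (0, -1), (1, 0), (-1, 0)]       # one point on each side of a square

def shattered(points):
    # Check every labeling; it suffices to test the tightest axis-parallel
    # rectangle around the positively labeled points.
    for labels in product([False, True], repeat=len(points)):
        pos = [p for p, lab in zip(points, labels) if lab]
        if not pos:
            continue                              # an empty rectangle handles all-negative
        xs, ys = [x for x, _ in pos], [y for _, y in pos]
        inside = tuple(min(xs) <= x <= max(xs) and min(ys) <= y <= max(ys)
                       for x, y in points)
        if inside != labels:
            return False
    return True

print(shattered(points))                          # True: VC(H) >= 4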
Learning Rectangles
• Consider axis-parallel rectangles in the real plane.
• Can we PAC-learn them?
(1) What is the VC dimension?
(2) Can we give an efficient algorithm?

Learning Rectangles
• Consider axis-parallel rectangles in the real plane.
• Can we PAC-learn them?
(1) What is the VC dimension?
(2) Can we give an efficient algorithm?

Find the smallest rectangle that contains the positive examples. (Necessarily, it will not contain any negative example, so the hypothesis is consistent.)

Axis parallel rectangles are efficiently PAC learnable.
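A minimal sketch of that tightest-fit learner (mine; the ((x, y), label) data format is an assumption):

def learn_rectangle(examples):
    # smallest axis-parallel rectangle containing all positive examples
    pos = [p for p, label in examples if label]
    if not pos:
        return None                        # no positives: predict negative everywhere
    xs, ys = [x for x, _ in pos], [y for _, y in pos]
    return (min(xs), max(xs), min(ys), max(ys))

def predict(rect, point):
    if rect is None:
        return False
    x0, x1, y0, y1 = rect
    x, y = point
    return x0 <= x <= x1 and y0 <= y <= y1

data = [((1, 1), True), ((2, 3), True), ((5, 5), False), ((0, 4), False)]
rect = learn_rectangle(data)
print(rect, predict(rect, (1.5, 2)))       # (1, 2, 1, 3) True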

The Mistake Bound Model of Learning

• The Mistake Bound framework is different from the PAC framework in that it considers learners that receive a sequence of training examples and that predict, upon receiving each example, what its target value is.
• The question asked in this setting is: “How many mistakes will the learner make in its predictions before it learns the target concept?”
• This question is significant in practical settings where learning must be done while the system is in actual use.

Optimal Mistake Bounds

• Definition: Let C be an arbitrary nonempty concept class. The optimal mistake bound for C, denoted Opt(C), is the minimum over all possible learning algorithms A of M_A(C), the maximum number of mistakes A can make while learning any concept in C:

Opt(C) = min over all learning algorithms A of M_A(C)

• Proposition: For any concept class C, the optimal mistake bound is bounded as follows:

VC(C) ≤ Opt(C) ≤ log2(|C|)

