
Analysis of Boolean Functions (CMU 18-859S, Spring 2007)

Lecture 9: Learning Decision Trees and DNFs


Feb. 18, 2007
Lecturer: Ryan O'Donnell    Scribe: Suresh Purini
1 Two Important Learning Algorithms
We recall the following definition and two important learning algorithms discussed in the previous lecture.
Definition 1.1 Given a collection $\mathcal{S}$ of subsets of $[n]$, we say $f : \{-1,1\}^n \to \mathbb{R}$ has $\epsilon$-concentration on $\mathcal{S}$ if
$$\sum_{S \notin \mathcal{S}} \hat{f}(S)^2 \le \epsilon.$$
Theorem 1.2 Let $C$ be a class of $n$-bit functions such that every $f \in C$ is $\epsilon$-concentrated on $\mathcal{S} = \{S \subseteq [n] : |S| \le d\}$. Then the class $C$ is learnable under the uniform distribution to accuracy $O(\epsilon)$, with probability at least $1 - \delta$, in time $\mathrm{poly}(|\mathcal{S}|, 1/\epsilon) \cdot \mathrm{poly}(n) \cdot \log(1/\delta)$, using random examples only.
This algorithm is called the Low-Degree algorithm and was proposed by Linial, Mansour, and Nisan in [3]. Refer to Theorem 5.4 in Lecture Notes 8.
Theorem 1.3 Let $C$ be a class of $n$-bit functions such that every $f \in C$ is $\epsilon$-concentrated on some collection $\mathcal{S}$. Then the class $C$ is learnable using membership queries (via the Goldreich-Levin algorithm) in time $\mathrm{poly}(|\mathcal{S}|, 1/\epsilon) \cdot \mathrm{poly}(n) \cdot \log(1/\delta)$.
This algorithm is called the Kushilevitz-Mansour algorithm [2]. Refer to Corollary 5.5 in Lecture Notes 8.
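A minimal Python sketch of the Low-Degree algorithm of Theorem 1.2 is given below; the names (low_degree_learn, chi) and the representation of examples as $\pm 1$ NumPy vectors are illustrative assumptions rather than anything fixed by the lecture. It estimates every Fourier coefficient of degree at most $d$ by an empirical average over random examples and outputs the sign of the resulting low-degree polynomial.

    import itertools
    import numpy as np

    def chi(S, x):
        # chi_S(x) = product of the coordinates x_i for i in S (empty product = 1).
        return float(np.prod(x[list(S)]))

    def low_degree_learn(examples, n, d):
        # examples: list of (x, y) pairs with x a +/-1 NumPy array of length n and y = f(x).
        # Estimate hat{f}(S) = E[f(x) chi_S(x)] for every |S| <= d by empirical averages.
        coeffs = {}
        for k in range(d + 1):
            for S in itertools.combinations(range(n), k):
                coeffs[S] = np.mean([y * chi(S, x) for x, y in examples])
        # Hypothesis: the sign of the estimated degree-d part of f.
        def hypothesis(x):
            return 1 if sum(c * chi(S, x) for S, c in coeffs.items()) >= 0 else -1
        return hypothesis

The running time is dominated by the $|\mathcal{S}| = \binom{n}{\le d}$ coefficients being estimated, matching the $\mathrm{poly}(|\mathcal{S}|)$ factor in Theorem 1.2.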
2 Learning Decision Trees
A decision tree is a binary tree in which the internal nodes are labeled with variables and the leaves are labeled with either $-1$ or $+1$; the left and right edges out of any internal node are labeled $-1$ and $+1$ respectively. We can think of the decision tree as defining a boolean function in the natural way. For example, the decision tree in Figure 1 defines a boolean function whose DNF formula is $x_1 x_2 x_3 + x_1 x_2 x_4 + x_1 x_2$.
Note that, given any boolean function we can come up with a corresponding decision tree.
Let $P$ be a path in the decision tree. An example of a path in Figure 1 is $P = (x_1 = -1,\ x_2 = +1,\ x_4 = -1)$.
Figure 1: An example decision tree on the variables $x_1, x_2, x_3, x_4$.
Let $1_P : \{-1,1\}^n \to \{0,1\}$ be the indicator function for the path $P$. For example,
$$1_P(x) = \begin{cases} 1 & \text{if } x_1 = -1,\ x_2 = +1,\ x_4 = -1, \\ 0 & \text{otherwise.} \end{cases}$$
Observation 2.1 A boolean function $f$ can be expressed in terms of the path functions $1_P$ corresponding to the various paths in the decision tree of $f$ as follows:
$$f(x) = \sum_{\text{paths } P} 1_P(x)\, f(P),$$
where $f(P)$ is the label on the leaf reached when the function $f$ takes the path $P$ in its decision tree.
Observation 2.2 Let $V$ be the set of variables occurring in a path function $1_P$ and let $d$ be the cardinality of $V$. Then the Fourier expansion of $1_P$ looks like
$$\sum_{S \subseteq V} \pm 2^{-d}\, X_S.$$
It is easy to see the proof of the above observation by noting that the Fourier expansion for the path function $1_P$, when $P = (x_1 = -1,\ x_2 = +1,\ x_4 = -1)$, is
$$1_P = 1_{\{x_1 = -1\}}\, 1_{\{x_2 = +1\}}\, 1_{\{x_4 = -1\}} = \left(\tfrac{1}{2} - \tfrac{1}{2}x_1\right)\left(\tfrac{1}{2} + \tfrac{1}{2}x_2\right)\left(\tfrac{1}{2} - \tfrac{1}{2}x_4\right).$$
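Concretely, multiplying out this product gives all $2^3$ Fourier coefficients of $1_P$ explicitly:
$$1_P = \tfrac{1}{8}\bigl(1 - x_1 + x_2 - x_4 - x_1 x_2 + x_1 x_4 - x_2 x_4 + x_1 x_2 x_4\bigr),$$
so every $S \subseteq \{x_1, x_2, x_4\}$ appears with coefficient $\pm 2^{-d}$, here with $d = 3$, as Observation 2.2 asserts.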
Proposition 2.3 If $f : \{-1,1\}^n \to \{-1,1\}$ is computable by a depth-$d$ decision tree, then
1. the Fourier expansion of $f$ has degree at most $d$, i.e., $\sum_{|S| > d} \hat{f}(S)^2 = 0$;
2. all Fourier coefficients are integer multiples of $2^{-d}$;
3. the number of nonzero Fourier coefficients is at most $4^d$.
Proof: (1) follows from Observation 2.1. For (2), observe that every Fourier coefficient is a sum of terms of the form $k\,2^{-d'}$ for some $d' \le d$, and each such term can be written as $k\,2^{-d'} = \bigl(k\,2^{\,d-d'}\bigr)\,2^{-d}$, an integer multiple of $2^{-d}$. This proves (2). A depth-$d$ decision tree has at most $2^d$ leaves, and hence we have at most $2^d \cdot 2^d = 4^d$ nonzero Fourier coefficients, which proves (3). $\Box$
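The three properties can be checked numerically by brute force on a small tree; the depth-2 tree below is a hypothetical example chosen only for illustration.

    import itertools
    import numpy as np

    n, d = 3, 2
    cube = list(itertools.product([-1, 1], repeat=n))

    def tree(x):
        # A depth-2 decision tree: query x1; on -1 output x2, on +1 output x3.
        return x[1] if x[0] == -1 else x[2]

    # Brute-force Fourier coefficients: hat{f}(S) = E_x[f(x) * prod_{i in S} x_i].
    coeffs = {}
    for k in range(n + 1):
        for S in itertools.combinations(range(n), k):
            coeffs[S] = np.mean([tree(x) * np.prod(np.array(x)[list(S)]) for x in cube])

    nonzero = {S: c for S, c in coeffs.items() if abs(c) > 1e-9}
    assert all(len(S) <= d for S in nonzero)                                      # property (1)
    assert all(abs(c * 2**d - round(c * 2**d)) < 1e-9 for c in nonzero.values())  # property (2)
    assert len(nonzero) <= 4**d                                                   # property (3)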
Corollary 2.4 Depth-$d$ decision trees are exactly learnable with random examples in time $\mathrm{poly}(4^d) \cdot \mathrm{poly}(n) \cdot \log(1/\delta)$.
Proof: Use the Kushilevitz-Mansour algorithm with $\epsilon = \frac{2^{-d}}{4}$ and round each Fourier coefficient estimate to the nearest multiple of $2^{-d}$. $\Box$
Remark 2.5 $\log(n)$-depth decision trees are exactly learnable in polynomial time. This algorithm can be derandomized.
Observation 2.6 Size-$s$ decision trees are $\epsilon$-close to depth-$\log(s/\epsilon)$ decision trees.
Proof: Let $T$ be a decision tree of size $s$ computing the boolean function $f$. Consider the decision tree $T'$ obtained from $T$ by chopping every path whose depth is greater than $\log(s/\epsilon)$ down to depth $\log(s/\epsilon)$. The tree $T'$ gives an incorrect value for $f(x)$ only when $x$ takes a path of length greater than $\log(s/\epsilon)$ in $T$. When we pick $x$ at random, it follows any particular such path with probability at most $2^{-\log(s/\epsilon)} = \epsilon/s$. Therefore, by a union bound over the at most $s$ such paths, we get $\Pr_{x \in \{-1,1\}^n}[T(x) \neq T'(x)] \le \epsilon$. $\Box$
Corollary 2.7 Size-$s$ decision trees are $O(\epsilon)$-concentrated on a collection of size $4^{\log(s/\epsilon)} = (s/\epsilon)^2$.
Definition 2.8 Given a function $f : \{-1,1\}^n \to \mathbb{R}$, the spectral norm or $L_1$-Fourier norm of $f$ is
$$\|\hat{f}\|_1 = \sum_{S \subseteq [n]} |\hat{f}(S)|.$$
Observation 2.9 If a function $f$ is an AND of literals, then $\|\hat{f}\|_1 = 1$. Refer to Observation 2.2 for the proof idea.
The following observation follows from the facts that for all $a, b \in \mathbb{R}$, $|a + b| \le |a| + |b|$ and $|ab| = |a|\,|b|$.
Observation 2.10
1. $\|\widehat{f + g}\|_1 \le \|\hat{f}\|_1 + \|\hat{g}\|_1$
2. $\|\widehat{c f}\|_1 = |c|\,\|\hat{f}\|_1$
Proposition 2.11 If $f$ has a decision tree of size $s$, then $\|\hat{f}\|_1 \le s$.
Proof: Writing $f = \sum_{\text{paths } P} 1_P\, f(P)$ as in Observation 2.1 and using Observations 2.9 and 2.10,
$$\|\hat{f}\|_1 \;\le\; \sum_{\text{paths } P} \big\|\widehat{1_P\, f(P)}\big\|_1 \;=\; \sum_{\text{paths } P} \big\|\widehat{1_P}\big\|_1 \;\le\; s. \qquad \Box$$
Proposition 2.12 Given any function $f$ with $\|f\|_2^2 \le 1$ and $\epsilon > 0$, let $\mathcal{S} = \bigl\{S \subseteq [n] : |\hat{f}(S)| \ge \epsilon / \|\hat{f}\|_1\bigr\}$. Then $f$ is $\epsilon$-concentrated on $\mathcal{S}$. Note that $|\mathcal{S}| \le \bigl(\|\hat{f}\|_1 / \epsilon\bigr)^2$.
Proof:
$$\sum_{S \notin \mathcal{S}} \hat{f}(S)^2 \;\le\; \Bigl(\max_{S \notin \mathcal{S}} |\hat{f}(S)|\Bigr)\sum_{S \notin \mathcal{S}} |\hat{f}(S)| \;\le\; \Bigl(\max_{S \notin \mathcal{S}} |\hat{f}(S)|\Bigr)\Bigl(\sum_{S \notin \mathcal{S}} |\hat{f}(S)| + \sum_{S \in \mathcal{S}} |\hat{f}(S)|\Bigr) \;\le\; \frac{\epsilon}{\|\hat{f}\|_1}\,\|\hat{f}\|_1 \;=\; \epsilon. \qquad \Box$$
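The size bound in Proposition 2.12 follows from Parseval's identity together with the assumption $\|f\|_2^2 \le 1$:
$$|\mathcal{S}| \cdot \Bigl(\frac{\epsilon}{\|\hat{f}\|_1}\Bigr)^2 \;\le\; \sum_{S \in \mathcal{S}} \hat{f}(S)^2 \;\le\; \|f\|_2^2 \;\le\; 1, \qquad\text{so}\qquad |\mathcal{S}| \le \Bigl(\frac{\|\hat{f}\|_1}{\epsilon}\Bigr)^2.$$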
Corollary 2.13 Any class of functions $C = \bigl\{f : \|f\|_2^2 \le 1 \text{ and } \|\hat{f}\|_1 \le s\bigr\}$ is learnable with random examples in time $\mathrm{poly}(s, \frac{1}{\epsilon})$.
Let us now consider functions which are computable by decision trees whose nodes branch on arbitrary parities of variables. Figure 2 contains an example of a function computable by a decision tree that branches on parities of various subsets of the variables. Another example is the parity function itself, which is computable by a depth-1 parity decision tree.
Proposition 2.14 If a function $f : \{-1,1\}^n \to \{-1,1\}$ is expressible as a size-$s$ decision tree on parities, then $\|\hat{f}\|_1 \le s$.
Figure 2: An example decision tree whose nodes branch on parities of subsets of the variables.
Proof: Let $1_P$ be the $\{0,1\}$-indicator function for a path $P$ in the decision tree. Let the path be $P = (X_{S_1} = b_1, \ldots, X_{S_d} = b_d)$, i.e., we get the path $P$ by taking the edges labeled $b_1, \ldots, b_d \in \{-1,1\}$ starting from the root node. We have
$$1_P = \left(\tfrac{1}{2} + \tfrac{1}{2} b_1 X_{S_1}\right) \cdots \left(\tfrac{1}{2} + \tfrac{1}{2} b_d X_{S_d}\right).$$
It can be seen that $\|\widehat{1_P}\|_1 = 1$. Since $f(x) = \sum_{\text{paths } P} 1_P(x)\, f(P)$, we have $\|\hat{f}\|_1 \le s$. $\Box$
Definition 2.15 An AND of parities is called a coset.
Remark 2.16 If a function $f : \{-1,1\}^n \to \{-1,1\}$ is expressible as $\sum_{i=1}^{s} \pm 1_{P_i}$, where the $P_i$'s are cosets, then $\|\hat{f}\|_1 \le s$.
Remark 2.17 Proposition 2.14 implies that we can learn all parity functions in $\mathrm{poly}(\frac{1}{\epsilon})$ time. Observe that we could not have obtained this result straightforwardly from ordinary decision trees for parity functions.
Theorem 2.18 [1] If a function $f : \{-1,1\}^n \to \{-1,1\}$ satisfies $\|\hat{f}\|_1 \le s$, then
$$f = \sum_{i=1}^{2^{2^{O(s^4)}}} \pm\, 1_{P_i},$$
where the $P_i$'s are cosets.
3 Learning DNFs
Proposition 3.1 If $f$ has a size-$s$ DNF formula, then it is $\epsilon$-close to a width-$\log(s/\epsilon)$ DNF.
Proof: Let the function $f : \{-1,1\}^n \to \{-1,1\}$ have a size-$s$ DNF. Drop from the DNF of $f$ all terms whose width is larger than $\log(s/\epsilon)$, and let the new DNF represent the function $f'$. If we look at a particular term of the DNF of $f$ whose width is greater than $\log(s/\epsilon)$, then the probability that a randomly chosen $x \in \{-1,1\}^n$ satisfies it (i.e., sets it to $1$ if we look at $f$ as a boolean function from $\{0,1\}^n$ to $\{0,1\}$) is at most $2^{-\log(s/\epsilon)} = \epsilon/s$. Since there are at most $s$ terms in the DNF, we have $\Pr_x[f(x) \neq f'(x)] \le \epsilon$ by the union bound. $\Box$
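As a sketch of Proposition 3.1 in code, a DNF can be modeled (hypothetically) as a list of terms, each term a set of literals; dropping the wide terms is then a one-liner. The names and representation below are illustrative, not from the lecture.

    import math

    def drop_wide_terms(dnf_terms, s, eps):
        # Keep only the terms of width at most log2(s/eps); by Proposition 3.1 the
        # resulting DNF computes a function that is eps-close to the original one.
        width_cap = math.log2(s / eps)
        return [term for term in dnf_terms if len(term) <= width_cap]

    # Example: s = 4 terms, eps = 0.5, so terms wider than log2(8) = 3 are dropped.
    wide_dnf = [{"x1"}, {"x1", "!x2"}, {"x2", "x3", "!x4", "x5"}, {"x3", "x4"}]
    narrow_dnf = drop_wide_terms(wide_dnf, s=len(wide_dnf), eps=0.5)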
Proposition 3.2 If a function $f : \{-1,1\}^n \to \{-1,1\}$ has a width-$w$ DNF, then the total influence satisfies $I(f) \le 2w$.
Proof: Left as an exercise. $\Box$
Corollary 3.3 If a function $f : \{-1,1\}^n \to \{-1,1\}$ has a width-$w$ DNF, then $f$ is $\epsilon$-concentrated on $\mathcal{S} = \{S : |S| \le \frac{2w}{\epsilon}\}$. Thus the function $f$ is learnable in time $n^{O(w/\epsilon)}$.
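The concentration claim follows from Proposition 3.2 by a Markov-type argument, assuming the formula $I(f) = \sum_S |S|\,\hat{f}(S)^2$ for total influence from the earlier lectures:
$$\sum_{|S| > 2w/\epsilon} \hat{f}(S)^2 \;\le\; \frac{\epsilon}{2w} \sum_{S} |S|\,\hat{f}(S)^2 \;=\; \frac{\epsilon}{2w}\, I(f) \;\le\; \frac{\epsilon}{2w}\cdot 2w \;=\; \epsilon.$$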
In the rest of the class, we shall prove the following theorem, making use of Håstad's switching lemma.
Theorem 3.4 DNFs of width $w$ are $\epsilon$-concentrated on degrees up to $O\bigl(w \log(\frac{1}{\epsilon})\bigr)$.
Remark 3.5 Observe that we are replacing the $\frac{1}{\epsilon}$ factor of Corollary 3.3 with a $\log(\frac{1}{\epsilon})$ factor in the bound on the maximum degree of the Fourier coefficients.
Definition 3.6 A random restriction with $\delta$-probability on $[n]$ is a random pair $(I, X)$ where $I$ is a random subset of $[n]$ chosen by including each coordinate independently with probability $\delta$, and $X$ is a random string from $\{-1,1\}^{|\overline{I}|}$.
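A minimal Python sketch of sampling such a $\delta$-random restriction, with hypothetical names and with the restricted function represented as a closure over the fixed coordinates:

    import random

    def random_restriction(f, n, delta):
        # Sample (I, X): each coordinate is "live" (in I) independently with probability delta;
        # the remaining coordinates are fixed to independent uniform +/-1 values X.
        I = [i for i in range(n) if random.random() < delta]
        X = {i: random.choice([-1, 1]) for i in range(n) if i not in I}

        def restricted(y):
            # y maps each live coordinate in I to a value in {-1, +1}.
            return f([y[i] if i in y else X[i] for i in range(n)])

        return I, restricted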
Given a function $f : \{-1,1\}^n \to \{-1,1\}$, we shall write $f^{X \to \overline{I}} : \{-1,1\}^{|I|} \to \mathbb{R}$ for the restriction of $f$ in which the coordinates outside $I$ are fixed according to $X$. If the function $f$ is computable by a width-$w$ DNF, then after a random restriction with $\delta$-probability $\delta = \frac{1}{10w}$, with very high probability $f^{X \to \overline{I}}$ has an $O(1)$-depth decision tree. Intuitively, the reason is that in each term of the DNF only $\frac{1}{10}$ of a variable survives the random restriction on average, resulting in a constant-depth decision tree. This intuition is formalized in the following lemma due to Håstad.
Theorem 3.7 (Håstad's Switching Lemma) Let $f : \{-1,1\}^n \to \{-1,1\}$ be computable by a width-$w$ DNF. When we apply a random restriction with $\delta$-probability to the function $f$, then
$$\Pr_{(I,X)}\bigl[\mathrm{DT\text{-}depth}(f^{X \to \overline{I}}) > d\bigr] \le (5\delta w)^d.$$
Theorem 3.8 Let $f$ be computable by a width-$w$ DNF. Then for all $d \ge 5$,
$$\sum_{|U| \ge 20dw} \hat{f}(U)^2 \le 2^{-d+1}.$$
Proof: Let $(I, X)$ be a random restriction with $\delta = \frac{1}{10w}$. We know from Håstad's switching lemma that $f^{X \to \overline{I}}$ has decision-tree depth greater than $d$ with probability less than $(5\delta w)^d = 2^{-d}$. Hence the following sum is nonzero (and at most $1$) with probability less than $2^{-d}$, since it vanishes whenever $f^{X \to \overline{I}}$ has a decision tree of depth at most $d$ (Proposition 2.3):
$$\sum_{S \subseteq I,\, |S| > d} \widehat{f^{X \to \overline{I}}}(S)^2.$$
Therefore, we have
$$2^{-d} \;\ge\; \mathbf{E}_{(I,X)}\Biggl[\sum_{\substack{S \subseteq I \\ |S| > d}} \widehat{f^{X \to \overline{I}}}(S)^2\Biggr] = \mathbf{E}_{I}\Biggl[\mathbf{E}_{X \in \{-1,1\}^{|\overline{I}|}}\Biggl[\sum_{\substack{S \subseteq I \\ |S| > d}} \widehat{f^{X \to \overline{I}}}(S)^2\Biggr]\Biggr] = \mathbf{E}_{I}\Biggl[\sum_{\substack{S \subseteq I \\ |S| > d}} \mathbf{E}_{X \in \{-1,1\}^{|\overline{I}|}}\bigl[F_{S|I}(X)^2\bigr]\Biggr] \qquad (\text{recall } F_{S|I}(X) = \widehat{f^{X \to \overline{I}}}(S))$$
$$= \mathbf{E}_{I}\Biggl[\sum_{\substack{S \subseteq I \\ |S| > d}} \sum_{T \subseteq \overline{I}} \widehat{F_{S|I}}(T)^2\Biggr] = \mathbf{E}_{I}\Biggl[\sum_{\substack{S \subseteq I \\ |S| > d}} \sum_{T \subseteq \overline{I}} \hat{f}(S \cup T)^2\Biggr] = \sum_{U} \hat{f}(U)^2\, \Pr_{I}\bigl[|U \cap I| > d\bigr].$$
Suppose $|U| \ge 20dw$. Then $|U \cap I|$ is binomially distributed with mean at least $20dw \cdot \delta = 2d$. Using a Chernoff bound, we get that $\Pr_{I}[|U \cap I| > d] \ge \frac{1}{2}$ when $d \ge 5$. Therefore we have
$$\sum_{U} \hat{f}(U)^2\, \Pr_{I}\bigl[|U \cap I| > d\bigr] \le 2^{-d} \;\;\Longrightarrow\;\; \sum_{|U| \ge 20dw} \hat{f}(U)^2 \cdot \frac{1}{2} \le 2^{-d} \;\;\Longrightarrow\;\; \sum_{|U| \ge 20dw} \hat{f}(U)^2 \le 2^{-d+1}. \qquad \Box$$
Remark 3.9 By putting $d = \log(\frac{1}{\epsilon})$, so that $dw = w \log(\frac{1}{\epsilon})$, we get Theorem 3.4.
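Explicitly, with $d = \log(\frac{1}{\epsilon})$ (and $d \ge 5$), Theorem 3.8 gives
$$\sum_{|U| \ge 20 w \log(1/\epsilon)} \hat{f}(U)^2 \;\le\; 2^{-\log(1/\epsilon)+1} \;=\; 2\epsilon,$$
i.e., width-$w$ DNFs are $O(\epsilon)$-concentrated on degrees up to $O\bigl(w \log(\frac{1}{\epsilon})\bigr)$.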
Further References: Yishay Mansour's survey paper [4] also contains some of the ideas in these lecture notes.
References
[1] B. Green and T. Sanders. A quantitative version of the idempotent theorem in harmonic analysis. ArXiv Mathematics e-prints, Nov. 2006.
[2] E. Kushilevitz and Y. Mansour. Learning decision trees using the Fourier spectrum. In STOC '91: Proceedings of the Twenty-Third Annual ACM Symposium on Theory of Computing, pages 455-464, New York, NY, USA, 1991. ACM Press.
[3] N. Linial, Y. Mansour, and N. Nisan. Constant depth circuits, Fourier transform, and learnability. J. ACM, 40(3):607-620, 1993.
[4] Y. Mansour. Learning Boolean functions via the Fourier transform. In V. Roychowdhury, K.-Y. Siu, and A. Orlitsky, editors, Theoretical Advances in Neural Computation and Learning. Kluwer, 1994.