Adapting To Unknown Smoothness
R. M. Castro
May 20, 2011
1 Introduction
In this set of notes we see how to make the most of the oracle bounds we
proved in class. We begin by re-deriving our bounds for regression of Lipschitz
smooth functions, and then see how to generalize the result to regression of
functions of unknown smoothness.
Suppose we have a function $f^* : [0,1] \to [-R,R]$ that is Lipschitz smooth, i.e.,
$$|f^*(s) - f^*(t)| \le L|s - t| \qquad \text{for all } s,t \in [0,1] .$$
We have seen these functions can be well approximated by piecewise constant
functions of the form
$$g(x) = \sum_{j=1}^{m} c_j \mathbf{1}\{x \in I_j\} , \qquad \text{where } I_j = \left(\frac{j-1}{m}, \frac{j}{m}\right] .$$
Let's use maximum likelihood to pick the best such model. Suppose we have the
following regression model
$$Y_i = f^*(x_i) + \epsilon_i , \qquad i = 1, \ldots, n ,$$
where $x_i = i/n$ and $\epsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)$. To be able to use the corollary we derived
we need a countable or finite class of models. The easiest way to do so is to
discretize/quantize the possible values of each constant piece in the candidate
models. Define
$$\mathcal{F}_m = \left\{ \sum_{j=1}^{m} c_j \mathbf{1}\{x \in I_j\} \,:\, c_j \in Q \right\} ,$$
where
$$Q = \left\{ -R,\ -R\,\frac{n-1}{n},\ \ldots,\ R \right\} = \left\{ -R + \frac{k}{n}R ,\ k = 0, \ldots, 2n \right\} .$$
Therefore $\mathcal{F}_m$ has exactly $(2n+1)^m$ elements in total. This means that, by
taking $c(f) = \log_2\!\left((2n+1)^m\right) = m \log_2(2n+1)$ for all $f \in \mathcal{F}_m$, we satisfy the
Kraft inequality:
$$\sum_{f \in \mathcal{F}_m} 2^{-c(f)} = \sum_{f \in \mathcal{F}_m} \frac{1}{|\mathcal{F}_m|} = 1 .$$
So we are ready to apply our oracle bound. Since $c(f)$ is just a constant (not
really a function of $f$) the estimator is simply the MLE
$$\hat{f}_n = \arg\min_{f \in \mathcal{F}_m} \left\{ \frac{1}{n}\sum_{i=1}^{n} (Y_i - f(x_i))^2 + \frac{4\sigma^2 c(f) \log 2}{n} \right\} = \arg\min_{f \in \mathcal{F}_m} \left\{ \frac{1}{n}\sum_{i=1}^{n} (Y_i - f(x_i))^2 \right\} .$$
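Since $c(f)$ is constant over $\mathcal{F}_m$, the least-squares criterion decouples across the bins: on each $I_j$ the minimizer is the average of the corresponding $Y_i$, rounded to the nearest value in $Q$. The following sketch illustrates this; the function name `fit_piecewise_constant` and its interface are ours, not from the notes.

```python
import numpy as np

def fit_piecewise_constant(y, m, R):
    """Least-squares fit over the discretized class F_m (a sketch).

    y : array of n observations Y_i taken at x_i = i/n
    m : number of constant pieces
    R : range parameter; each piece takes values in Q = {-R + k*R/n, k = 0, ..., 2n}
    Returns the fitted grid value for each of the m pieces.
    """
    n = len(y)
    grid = -R + (R / n) * np.arange(2 * n + 1)                  # the set Q
    x = np.arange(1, n + 1) / n                                 # design points x_i = i/n
    bins = np.minimum(np.ceil(x * m).astype(int) - 1, m - 1)    # index j-1 of the bin I_j containing x_i
    c_hat = np.empty(m)
    for j in range(m):
        yj = y[bins == j]
        mean_j = yj.mean() if yj.size else 0.0
        # a quadratic in c_j restricted to a grid is minimized at the grid
        # point closest to its unconstrained minimizer (the bin average)
        c_hat[j] = grid[np.argmin(np.abs(grid - mean_j))]
    return c_hat
```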
The corollary then says that
$$\mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n} \big(\hat{f}_n(x_i) - f^*(x_i)\big)^2\right] \le 2 \min_{f \in \mathcal{F}_m} \left\{ \frac{1}{n}\sum_{i=1}^{n} (f^*(x_i) - f(x_i))^2 + \frac{4\sigma^2 c(f) \log 2}{n} \right\} .$$
So far the result is extremely general, as we have not made use of the Lipschitz
assumption. We have seen earlier that there is a piecewise constant function
$$\bar{f}_m(x) = \sum_{j=1}^{m} \bar{c}_j \mathbf{1}\{x \in I_j\}$$
such that for all $x \in [0,1]$ we have $|f^*(x) - \bar{f}_m(x)| \le L/m$. The problem is that, in general, $\bar{f}_m \notin \mathcal{F}_m$ since $\bar{c}_j \notin Q$. Take instead the element of $\mathcal{F}_m$ that is closest to $\bar{f}_m$, namely
$$\tilde{f}_m = \arg\min_{f \in \mathcal{F}_m} \sup_{x \in [0,1]} |f(x) - \bar{f}_m(x)| .$$
It is clear that $|\tilde{f}_m(x) - \bar{f}_m(x)| \le R/n$ for all $x \in [0,1]$; therefore, by the triangle
inequality, we have
$$|\tilde{f}_m(x) - f^*(x)| \le |\bar{f}_m(x) - f^*(x)| + |\tilde{f}_m(x) - \bar{f}_m(x)| \le \frac{L}{m} + \frac{R}{n} .$$
Now, we can just use this in our bound:
$$\begin{aligned}
\mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n} \big(\hat{f}_n(x_i) - f^*(x_i)\big)^2\right]
&\le 2 \min_{f \in \mathcal{F}_m} \left\{ \frac{1}{n}\sum_{i=1}^{n} (f^*(x_i) - f(x_i))^2 + \frac{4\sigma^2 c(f) \log 2}{n} \right\} \\
&\le \frac{2}{n}\sum_{i=1}^{n} \left(\frac{L}{m} + \frac{R}{n}\right)^2 + \frac{8\sigma^2 m \log_2(2n+1) \log 2}{n} \\
&= 2\left(\frac{L}{m} + \frac{R}{n}\right)^2 + \frac{8\sigma^2 m \log_2(2n+1) \log 2}{n} .
\end{aligned}$$
So, to ensure the best bound possible we should choose $m$ minimizing the right-hand
side. This yields $m \asymp (n/\log n)^{1/3}$ and
$$\mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n} \big(\hat{f}_n(x_i) - f^*(x_i)\big)^2\right] = O\!\left( (n/\log n)^{-2/3} \right) ,$$
which, apart from the logarithmic factor, is the best we can ever hope for (this
logarithmic factor is due to the discretization of the model classes, and is an
artifact of this approach). If we want the truly best possible bound we need
to minimize the above expression with respect to m, and for that we need to
know L. Can we do better? Can we automagically choose m using the data?
The answer is yes, and for this we will start taking full advantage of our oracle
bound.
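As a quick numerical illustration (not part of the notes, and with made-up constants $L = R = \sigma = 1$), one can minimize the right-hand side of the bound over $m$ and check that the minimizer indeed grows like $(n/\log n)^{1/3}$:

```python
import numpy as np

def lipschitz_bound(m, n, L=1.0, R=1.0, sigma=1.0):
    # right-hand side of the oracle bound for the Lipschitz class:
    # 2 (L/m + R/n)^2 + 8 sigma^2 m log2(2n+1) log(2) / n
    return 2 * (L / m + R / n) ** 2 + \
           8 * sigma ** 2 * m * np.log2(2 * n + 1) * np.log(2) / n

for n in (10 ** 3, 10 ** 4, 10 ** 5):
    ms = np.arange(1, n + 1)
    m_star = ms[np.argmin(lipschitz_bound(ms, n))]
    print(n, m_star, (n / np.log(n)) ** (1 / 3))   # m_star tracks (n/log n)^(1/3) up to constants
```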
Since we want to choose the best possible m we must consider the following
class of models
$$\mathcal{F} = \bigcup_{m=1}^{\infty} \mathcal{F}_m .$$
This is clearly a countable class of models (but not finite). So we need to be
a bit more careful in constructing the map $c(\cdot)$. Let's use a coding argument:
begin by defining
$$m(f) = \min\{ m \in \mathbb{N} : f \in \mathcal{F}_m \} .$$
Encode $f \in \mathcal{F}$ using first the bits $00\ldots01$ ($m(f)$ bits in total) to encode $m(f)$,
and then $\log_2 |\mathcal{F}_{m(f)}|$ bits to encode which model inside $\mathcal{F}_{m(f)}$ is $f$. This is clearly
a prefix code and therefore satisfies the Kraft inequality. More formally,
$$c(f) = m(f) + \log_2 |\mathcal{F}_{m(f)}| = m(f) + \log_2\!\left((2n+1)^{m(f)}\right) = m(f)\left(1 + \log_2(2n+1)\right) .$$
Although we know, from the coding argument, that the map $c(\cdot)$ satisfies
the Kraft inequality for sure, we can do a little sanity check, and ensure this is
indeed true:
$$\begin{aligned}
\sum_{f \in \mathcal{F}} 2^{-c(f)}
&= \sum_{m=1}^{\infty} \sum_{f \in \mathcal{F} : m(f) = m} 2^{-c(f)}
= \sum_{m=1}^{\infty} \sum_{f \in \mathcal{F} : m(f) = m} 2^{-m - \log_2 |\mathcal{F}_m|} \\
&\le \sum_{m=1}^{\infty} \sum_{f \in \mathcal{F}_m} 2^{-m - \log_2 |\mathcal{F}_m|}
= \sum_{m=1}^{\infty} 2^{-m} \sum_{f \in \mathcal{F}_m} \frac{1}{|\mathcal{F}_m|}
= \sum_{m=1}^{\infty} 2^{-m} = 1 .
\end{aligned}$$
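The upper bound in the last display can also be checked numerically: each $\mathcal{F}_m$ contributes $|\mathcal{F}_m|\, 2^{-m - \log_2|\mathcal{F}_m|} = 2^{-m}$, so truncating the sum at a moderate $m$ already gives a value close to 1. A minimal sketch (our own, with $n = 100$):

```python
import math

n = 100
total = 0.0
for m in range(1, 60):
    c = m * (1 + math.log2(2 * n + 1))      # code length of any f with m(f) = m
    total += (2 * n + 1) ** m * 2 ** (-c)   # |F_m| * 2^{-c} = 2^{-m}
print(total)                                # ~1 (the sum is truncated at m = 59)
```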
Now, similarly to what we had before,
$$\begin{aligned}
\hat{f}_n &= \arg\min_{f \in \mathcal{F}} \left\{ \frac{1}{n}\sum_{i=1}^{n} (Y_i - f(x_i))^2 + \frac{4\sigma^2 c(f) \log 2}{n} \right\} \\
&= \arg\min_{f \in \mathcal{F}} \left\{ \frac{1}{n}\sum_{i=1}^{n} (Y_i - f(x_i))^2 + \frac{4\sigma^2 m(f)\left(1 + \log_2(2n+1)\right) \log 2}{n} \right\} ,
\end{aligned}$$
which is no longer the MLE, but rather a maximum penalized likelihood estimator. Then
$$\begin{aligned}
\mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n} \big(\hat{f}_n(x_i) - f^*(x_i)\big)^2\right]
&\le 2 \min_{f \in \mathcal{F}} \left\{ \frac{1}{n}\sum_{i=1}^{n} (f^*(x_i) - f(x_i))^2 + \frac{4\sigma^2 m(f)\left(1 + \log_2(2n+1)\right) \log 2}{n} \right\} \\
&= 2 \min_{m \in \mathbb{N}} \left\{ \min_{f \in \mathcal{F}_m} \left\{ \frac{1}{n}\sum_{i=1}^{n} (f^*(x_i) - f(x_i))^2 \right\} + \frac{4\sigma^2 m\left(1 + \log_2(2n+1)\right) \log 2}{n} \right\} \\
&\le \min_{m \in \mathbb{N}} \left\{ 2\left(\frac{L}{m} + \frac{R}{n}\right)^2 + \frac{8\sigma^2 m\left(1 + \log_2(2n+1)\right) \log 2}{n} \right\} .
\end{aligned}$$
Therefore this estimator automatically chooses the best possible number of
parameters $m$. Note that the price we pay is very modest: the only change from
what we had before is that the term $\log_2(2n+1)$ is replaced by $1 + \log_2(2n+1)$,
which is a very minor change, as $\log_2(2n+1) \gg 1$ for reasonable sample sizes.
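For concreteness, here is a sketch of the resulting adaptive procedure for the piecewise constant class. It reuses the hypothetical `fit_piecewise_constant` from the earlier sketch and assumes the noise level $\sigma$ is known; for each $m$ it computes the penalized empirical risk and keeps the minimizer.

```python
import numpy as np

def adaptive_fit(y, R, sigma, m_max=None):
    """Penalized least squares over F = union of the F_m (a sketch).

    Assumes fit_piecewise_constant() from the earlier sketch; m_max caps
    the search over m, which is harmless since the penalty grows linearly in m.
    Returns the selected m and the fitted values at the design points.
    """
    n = len(y)
    if m_max is None:
        m_max = n
    x = np.arange(1, n + 1) / n
    best = (np.inf, None, None)
    for m in range(1, m_max + 1):
        c_hat = fit_piecewise_constant(y, m, R)
        bins = np.minimum(np.ceil(x * m).astype(int) - 1, m - 1)
        fitted = c_hat[bins]
        # penalty = 4 sigma^2 c(f) log(2) / n with c(f) = m (1 + log2(2n+1))
        penalty = 4 * sigma ** 2 * m * (1 + np.log2(2 * n + 1)) * np.log(2) / n
        crit = np.mean((y - fitted) ** 2) + penalty
        if crit < best[0]:
            best = (crit, m, fitted)
    return best[1], best[2]
```

Note that the procedure never uses the Lipschitz constant $L$; only the penalty, which depends on $m$, $n$, and $\sigma$, enters the selection.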
Although this is remarkable, we can probably do much better, and adapt to
unknown smoothness. We'll see how to do it in the next section.
2 Hölder smooth functions
For $0 < \alpha \le 1$, define the space of functions
$$H^{\alpha}(C) = \left\{ f : \sup_{x,h} \frac{|f(x+h) - f(x)|}{|h|^{\alpha}} \le C \right\} ,$$
for some constant $0 < C < \infty$. This class contains functions that are bounded,
but less smooth than Lipschitz functions. Indeed, the space of Lipschitz functions
corresponds to $\alpha = 1$. Functions in $H^{\alpha}$ with $\alpha \le 1$ are uniformly continuous, but
not necessarily differentiable. For $\alpha > 1$ the definition is extended recursively:
$$H^{\alpha}(C) = \left\{ f : \frac{\partial f}{\partial x} \in H^{\alpha - 1}(C) \right\} .$$
In other words, $f \in H^{\alpha}(C)$ essentially means that
$$|f(x) - T_y(x)| \le C |x - y|^{\alpha} \qquad \forall x, y ,$$
where $\lfloor \alpha \rfloor$ is the largest integer such that $\lfloor \alpha \rfloor < \alpha$, and $T_y$ is the Taylor
polynomial of $f$ of degree $\lfloor \alpha \rfloor$ around the point $y$. In words, a Hölder-$\alpha$ smooth
function is locally well approximated by a polynomial of degree $\lfloor \alpha \rfloor$. In this
lecture we will work with the first definition (this will also give you an indication
of why the two definitions are equivalent). Note: if a function is Hölder-$\alpha_2$ smooth
and $\alpha_1 < \alpha_2$, then the function is also Hölder-$\alpha_1$ smooth.
Note that since Hölder smoothness essentially measures how differentiable
functions are, the Taylor polynomial is the natural way to approximate Hölder
smooth functions. We will focus on Hölder smooth function classes with $0 < \alpha \le 2$.
Thus, we will work with piecewise linear approximations, that is, Taylor
polynomials of degree 1. If we were to consider smoother functions, $\alpha > 2$, we
would need to consider higher-degree Taylor polynomial approximations,
i.e. quadratic, cubic, etc.
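As a quick numerical illustration of the definition (our own example, not from the notes): $f(x) = \sqrt{x}$ on $[0,1]$ is Hölder smooth with $\alpha = 1/2$ but is not Lipschitz, and the ratio in the definition makes this visible near $x = 0$.

```python
import numpy as np

# f(x) = sqrt(x): the ratio |f(x+h) - f(x)| / h^alpha stays bounded for
# alpha = 0.5 but blows up near x = 0 for alpha = 1 (not Lipschitz).
f = np.sqrt
x = np.linspace(0, 1, 1001)[:-1]      # points in [0, 1)
h = 1e-4
for alpha in (0.5, 1.0):
    ratio = np.abs(f(x + h) - f(x)) / h ** alpha
    print(alpha, ratio.max())         # ~1 for alpha = 0.5, ~100 for alpha = 1
```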
3 Regression of Hölder smooth functions
Consider the usual regression model
$$Y_i = f^*(x_i) + \epsilon_i , \qquad i = 1, \ldots, n ,$$
where $x_i = i/n$ and $\epsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)$. Let's assume $f^* : [0,1] \to [-R, R]$ is a
smooth function, in the sense that $f^* \in H^{\alpha}(C)$ for some unknown $0 < \alpha \le 2$. The
smoother $f^*$ is, the better we should be able to estimate it, and the smoother it is,
the larger the bins over which we should average. Also, we will need to exploit the
extra smoothness in our approximation of $f^*$, so we consider piecewise linear
candidate functions of the form
$$\sum_{j=1}^{m} (a_j + b_j x) \mathbf{1}\{x \in I_j\} , \qquad \text{where } I_j = \left(\frac{j-1}{m}, \frac{j}{m}\right] .$$
As before, we want to consider countable/finite classes of models to be able to
apply our corollary, so we will consider a slight modification of the above. Each
linear piece can be described by its values at the beginning and end points of the
corresponding interval, so we are going to restrict those values to lie on a grid
(see Figure 1).
Define the class
$$\mathcal{F}_m = \left\{ f(x) = \sum_{j=1}^{m} \ell_j(x) \mathbf{1}\{x \in I_j\} \right\} ,$$
Figure 1: Example of the discretization of $f$ on the interval $\left(\frac{j-1}{m}, \frac{j}{m}\right]$.
where
$$\ell_j(x) = \frac{x - (j-1)/m}{1/m}\, b_j + \frac{j/m - x}{1/m}\, a_j = (mx - j + 1)\, b_j + (j - mx)\, a_j ,$$
and $a_j, b_j \in \left\{ \frac{k}{\sqrt{n}} R - R \,:\, k \in \{0, \ldots, 2\sqrt{n}\} \right\}$. Clearly $|\mathcal{F}_m| = (2\sqrt{n} + 1)^{2m}$.
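A small sketch of how such a candidate can be evaluated (the name `eval_piecewise_linear` and the vectorized indexing are ours; $a_j$ and $b_j$ are assumed to already lie on the grid above):

```python
import numpy as np

def eval_piecewise_linear(a, b, x, m):
    """Evaluate sum_j l_j(x) 1{x in I_j}, where on I_j = ((j-1)/m, j/m] the
    line l_j takes the value a_j at the left endpoint and b_j at the right
    endpoint: l_j(x) = (m x - j + 1) b_j + (j - m x) a_j.
    a, b : arrays of length m with the (gridded) endpoint values
    x    : array of evaluation points in [0, 1]
    """
    j = np.clip(np.ceil(x * m), 1, m).astype(int)   # 1-based bin index
    t = x * m - (j - 1)                             # position within the bin, in (0, 1]
    return t * b[j - 1] + (1 - t) * a[j - 1]
```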
Since we don't know the smoothness $\alpha$ a priori, we must choose $m$ using the
data. Therefore, in the same fashion as before, we take the class
$$\mathcal{F} = \bigcup_{m=1}^{\infty} \mathcal{F}_m ,$$
with $m(f) = \min\{ m \in \mathbb{N} : f \in \mathcal{F}_m \}$, and
$$c(f) = m(f) + \log_2 |\mathcal{F}_{m(f)}| = m(f)\left(1 + 2\log_2(2\sqrt{n} + 1)\right) .$$
Exactly as before, define the estimator
$$\hat{f}_n = \arg\min_{f \in \mathcal{F}} \left\{ \frac{1}{n}\sum_{i=1}^{n} (Y_i - f(x_i))^2 + \frac{4\sigma^2 m(f)\left(1 + 2\log_2(2\sqrt{n}+1)\right) \log 2}{n} \right\} .$$
Then
$$\begin{aligned}
\mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n} \big(\hat{f}_n(x_i) - f^*(x_i)\big)^2\right]
&\le 2 \min_{f \in \mathcal{F}} \left\{ \frac{1}{n}\sum_{i=1}^{n} (f^*(x_i) - f(x_i))^2 + \frac{4\sigma^2 m(f)\left(1 + 2\log_2(2\sqrt{n}+1)\right) \log 2}{n} \right\} \\
&= 2 \min_{m \in \mathbb{N}} \left\{ \min_{f \in \mathcal{F}_m} \left\{ \frac{1}{n}\sum_{i=1}^{n} (f^*(x_i) - f(x_i))^2 \right\} + \frac{4\sigma^2 m\left(1 + 2\log_2(2\sqrt{n}+1)\right) \log 2}{n} \right\} .
\end{aligned}$$
In the above, the first term is essentially our familiar approximation error, and
the second term is in a sense bounding the estimation error. Therefore this
estimator automatically seeks the best balance between the two. To say something
more concrete about the performance of the estimator we need to bring in the
assumptions we have on $f^*$. Suppose first that $1 < \alpha \le 2$, so that $f^*$ is
differentiable. Take $x \in I_j$; by Taylor's theorem,
$$f^*(x) = f^*\!\left(\frac{j-1}{m}\right) + \frac{\partial f^*}{\partial x}(\bar{x})\left(x - \frac{j-1}{m}\right)$$
for some $\bar{x} \in \left(\frac{j-1}{m}, x\right)$. Consider the piecewise linear approximation
$$\bar{f}_m(x) = \sum_{j=1}^{m} \left[ f^*\!\left(\frac{j-1}{m}\right) + \frac{\partial f^*}{\partial x}\!\left(\frac{j-1}{m}\right)\left(x - \frac{j-1}{m}\right) \right] \mathbf{1}\{x \in I_j\} .$$
Note that this is not necessarily the best piecewise linear approximation to $f^*$,
but it is good enough for our purposes. What can we say about $|\bar{f}_m(x) - f^*(x)|$?
Take again $x \in I_j$. Now
$$\begin{aligned}
\left|\bar{f}_m(x) - f^*(x)\right|
&= \left| f^*\!\left(\tfrac{j-1}{m}\right) + \frac{\partial f^*}{\partial x}\!\left(\tfrac{j-1}{m}\right)\left(x - \tfrac{j-1}{m}\right) - f^*(x) \right| \\
&= \left| f^*\!\left(\tfrac{j-1}{m}\right) + \frac{\partial f^*}{\partial x}\!\left(\tfrac{j-1}{m}\right)\left(x - \tfrac{j-1}{m}\right) - f^*\!\left(\tfrac{j-1}{m}\right) - \frac{\partial f^*}{\partial x}(\bar{x})\left(x - \tfrac{j-1}{m}\right) \right| \\
&= \left| \frac{\partial f^*}{\partial x}\!\left(\tfrac{j-1}{m}\right) - \frac{\partial f^*}{\partial x}(\bar{x}) \right| \left(x - \tfrac{j-1}{m}\right) \\
&\le C \left|\bar{x} - \tfrac{j-1}{m}\right|^{\alpha-1} \left(x - \tfrac{j-1}{m}\right) \\
&\le C \left(\frac{1}{m}\right)^{\alpha-1} \frac{1}{m} = C m^{-\alpha} ,
\end{aligned}$$
where the last steps follow simply from the use of the smoothness assumption
(namely $\frac{\partial f^*}{\partial x} \in H^{\alpha-1}(C)$), together with the fact that $x - (j-1)/m \le 1/m$ and
$\bar{x} - (j-1)/m \le 1/m$.
So, we just showed that, for the piecewise linear function of the form considered,
$$\forall x \in [0,1] \quad \left|\bar{f}_m(x) - f^*(x)\right| \le C m^{-\alpha} .$$
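This rate is easy to check numerically (our own example, not from the notes): for $f(x) = |x - 1/2|^{3/2}$, which is Hölder smooth with $\alpha = 3/2$, the sup-norm error of the Taylor-based piecewise linear approximation decays like $m^{-3/2}$.

```python
import numpy as np

alpha = 1.5
f = lambda x: np.abs(x - 0.5) ** alpha
df = lambda x: alpha * np.sign(x - 0.5) * np.abs(x - 0.5) ** (alpha - 1)

xs = np.linspace(0, 1, 100001)
for m in (10, 20, 40, 80):
    left = (np.clip(np.ceil(xs * m), 1, m) - 1) / m   # left endpoint of the bin containing each x
    approx = f(left) + df(left) * (xs - left)         # degree-1 Taylor polynomial at the left endpoint
    err = np.abs(approx - f(xs)).max()
    print(m, err, err * m ** alpha)                   # last column is roughly constant
```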
Now, clearly $\bar{f}_m$ is not necessarily in $\mathcal{F}_m$, so we still need a bit of work. Take the
element of $\mathcal{F}_m$ that is closest to $\bar{f}_m$,
$$\tilde{f}_m = \arg\min_{f \in \mathcal{F}_m} \sup_{x \in [0,1]} |f(x) - \bar{f}_m(x)| .$$
Because of the way we discretized the endpoint values we know that
$\sup_{x \in [0,1]} |\tilde{f}_m(x) - \bar{f}_m(x)| \le R/\sqrt{n}$, and therefore, by the triangle inequality,
$$\forall x \in [0,1] \quad \left|\tilde{f}_m(x) - f^*(x)\right| \le C m^{-\alpha} + \frac{R}{\sqrt{n}} .$$
If $f^* \in H^{\alpha}(C)$ with $0 < \alpha \le 1$, take instead the piecewise constant approximation
$$\bar{f}_m(x) = \sum_{j=1}^{m} f^*\!\left(\frac{j-1}{m}\right) \mathbf{1}\{x \in I_j\} .$$
Then, for $x \in I_j$,
$$\left|\bar{f}_m(x) - f^*(x)\right| = \left| f^*\!\left(\frac{j-1}{m}\right) - f^*(x) \right| \le C \left|\frac{j-1}{m} - x\right|^{\alpha} \le C m^{-\alpha} ,$$
and similarly, for the closest element $\tilde{f}_m \in \mathcal{F}_m$,
$$\forall x \in [0,1] \quad \left|\tilde{f}_m(x) - f^*(x)\right| \le C m^{-\alpha} + \frac{R}{\sqrt{n}} .$$
So, we can just plug these results into the bound of the corollary:
$$\begin{aligned}
\mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n} \big(\hat{f}_n(x_i) - f^*(x_i)\big)^2\right]
&\le 2 \min_{m \in \mathbb{N}} \left\{ \min_{f \in \mathcal{F}_m} \left\{ \frac{1}{n}\sum_{i=1}^{n} (f^*(x_i) - f(x_i))^2 \right\} + \frac{4\sigma^2 m\left(1 + 2\log_2(2\sqrt{n}+1)\right) \log 2}{n} \right\} \\
&\le 2 \min_{m \in \mathbb{N}} \left\{ \left(C m^{-\alpha} + \frac{R}{\sqrt{n}}\right)^2 + \frac{4\sigma^2 m\left(1 + 2\log_2(2\sqrt{n}+1)\right) \log 2}{n} \right\} \\
&\le 2 \min_{m \in \mathbb{N}} O\!\left( \max\left\{ m^{-2\alpha},\ \frac{m^{-\alpha}}{\sqrt{n}},\ \frac{1}{n},\ \frac{m \log n}{n} \right\} \right) .
\end{aligned}$$
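As a rough numerical check (our own, with made-up constants $C = R = \sigma = 1$), minimizing the second line of the display over $m$ gives a minimizer that tracks $(n/\log n)^{1/(2\alpha+1)}$ up to constants:

```python
import numpy as np

def holder_bound(m, n, alpha, C=1.0, R=1.0, sigma=1.0):
    # 2 [ (C m^{-alpha} + R/sqrt(n))^2 + 4 sigma^2 m (1 + 2 log2(2 sqrt(n)+1)) log(2) / n ]
    approx = (C * m ** (-alpha) + R / np.sqrt(n)) ** 2
    penalty = 4 * sigma ** 2 * m * (1 + 2 * np.log2(2 * np.sqrt(n) + 1)) * np.log(2) / n
    return 2 * (approx + penalty)

n = 10 ** 5
for alpha in (0.5, 1.0, 2.0):
    ms = np.arange(1, n + 1)
    m_star = ms[np.argmin(holder_bound(ms, n, alpha))]
    print(alpha, m_star, (n / np.log(n)) ** (1 / (2 * alpha + 1)))
```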
It is not hard to see that the first and last terms dominate the bound, and so
we attain the minimum by taking (in the bound)
$$m \asymp \left(\frac{n}{\log n}\right)^{1/(2\alpha+1)} ,$$
which yields
$$\mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n} \big(\hat{f}_n(x_i) - f^*(x_i)\big)^2\right] = O\!\left( \left(\frac{n}{\log n}\right)^{-\frac{2\alpha}{2\alpha+1}} \right) .$$
Note that the estimator does not know $\alpha$! So we are indeed adapting to
unknown smoothness. If the regression function $f^*$