Regression

Risi Kondor

February 5, 2004
Given data points $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$ where $x \in \mathcal{X}$ and $y \in \mathbb{R}$, the task of regression is to fit a real valued function $f \colon \mathcal{X} \to \mathbb{R}$ to these points. In the simplest case $\mathcal{X} = \mathbb{R}$. In multidimensional regression $\mathcal{X} = \mathbb{R}^D$. It is sometimes necessary to do regression on more complicated spaces, but we are not going to deal with that here.
The easiest way to attack the regression problem is to look for $f$ in a finite dimensional space of functions spanned by a given basis. In other words, we specify a set of functions $\phi_0, \phi_1, \ldots, \phi_P$ from $\mathcal{X}$ to $\mathbb{R}$ and look for $f$ in the form of a linear combination

$$ f(x) = \sum_{i=0}^{P} \alpha_i \, \phi_i(x). \qquad (1) $$
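As a concrete illustration of (1) in code (a sketch of my own, not part of the notes; the basis functions and coefficient values below are made up), assuming NumPy is available:

    import numpy as np

    # Equation (1): f(x) = sum_i alpha_i * phi_i(x) for a chosen set of basis functions.
    phi = [lambda x: 1.0,       # phi_0: the constant function
           lambda x: x,         # phi_1
           lambda x: x ** 2]    # phi_2

    alpha = np.array([0.5, -1.0, 2.0])   # illustrative coefficients

    def f(x):
        """Evaluate the linear combination f(x) = sum_i alpha_i * phi_i(x)."""
        return sum(a * p(x) for a, p in zip(alpha, phi))

    print(f(1.5))   # 0.5 - 1.0*1.5 + 2.0*1.5**2 = 3.5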
Different Bases
Linear regression
The simplest case is that of linear regression. In the one dimensional case we would simply take $\phi_0(x) = 1$ and $\phi_1(x) = x$. This gives

$$ f(x) = \sum_{i=0}^{1} \alpha_i \, \phi_i(x) = \alpha_0 + \alpha_1 x, $$

so by tuning $\alpha_0$ and $\alpha_1$ we can make $f$ be any linear function. In the multidimensional case we would take $\phi_1(x) = [x]_1$, $\phi_2(x) = [x]_2$, all the way to $\phi_D(x) = [x]_D$.
Here $[x]_i$ denotes the $i$th component of the vector $x$. Unfortunately, we cannot use the simpler notation $x_i$ for this purpose, because that is already reserved for the $i$th data point. All linear functions $f \colon \mathbb{R}^D \to \mathbb{R}$ can be expressed in this basis:

$$ f(x) = \sum_{i=0}^{D} \alpha_i \, \phi_i(x) = \alpha_0 + \alpha_1 [x]_1 + \ldots + \alpha_D [x]_D. $$
Realizing the constant term by setting $\phi_0(x) = 1$ will be common to all the function classes we discuss.
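In code, the linear basis amounts to prepending a constant column to the data matrix (a sketch under the assumption that the data points are rows of a NumPy array; the helper name is my own):

    import numpy as np

    def linear_features(X):
        """Map each row x of X to (phi_0(x), ..., phi_D(x)) = (1, [x]_1, ..., [x]_D)."""
        return np.hstack([np.ones((X.shape[0], 1)), X])   # constant feature, then the components of x

    X = np.array([[1.0, 2.0],
                  [3.0, 4.0]])   # two made-up data points in R^2
    print(linear_features(X))
    # [[1. 1. 2.]
    #  [1. 3. 4.]]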
Polynomial regression
Another possible choice of basis (in the one-dimensional case) is to set $\phi_i(x) = x^i$ for $i = 1, 2, \ldots, P$. This lets us choose $f$ from the class of polynomial functions of degree at most $P$:

$$ f(x) = \sum_{i=0}^{P} \alpha_i \, \phi_i(x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2 + \ldots + \alpha_P x^P. $$
In the multidimensional case the basis functions are monomials in the components of $x$, indexed by multi-indices $k = (k_1, \ldots, k_D)$. For example, with $D = 3$ the degree-two terms of $f$ take the form

$$ \sum_{k} \alpha_k \, \phi_k(x) = \alpha_{(0,1,1)} [x]_2 [x]_3 + \alpha_{(1,0,1)} [x]_3 [x]_1 + \alpha_{(2,0,0)} [x]_1^2 + \alpha_{(0,2,0)} [x]_2^2 + \alpha_{(0,0,2)} [x]_3^2 . $$
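For the one-dimensional polynomial basis, the feature map is a Vandermonde-style matrix; a sketch (the function name and sample values are my own):

    import numpy as np

    def polynomial_features(x, P):
        """Columns are phi_0(x) = 1, phi_1(x) = x, ..., phi_P(x) = x**P for each data point."""
        return np.vander(np.asarray(x, dtype=float), N=P + 1, increasing=True)

    x = np.array([0.0, 1.0, 2.0])
    print(polynomial_features(x, P=3))
    # [[1. 0. 0. 0.]
    #  [1. 1. 1. 1.]
    #  [1. 2. 4. 8.]]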
Gaussian RBFs
The last basis we look at is that of Gaussian Radial Basis Functions (RBFs)

$$ \phi_z(x) = e^{-\| x - z \|^2 / (2\sigma^2)} $$

where $\sigma$ is a pre-set variance parameter. Of course, this would give an uncountably infinite number of basis functions ($z$ can be anywhere in $\mathbb{R}^D$), which we cannot have. We remedy the situation by only considering Gaussian RBFs centered at the data points themselves,

$$ \phi_i(x) = e^{-\| x - x_i \|^2 / (2\sigma^2)} . $$

This step is not as arbitrary as it sounds, but we cannot describe the justification for it here. As before, $\phi_0$ is still the constant function $\phi_0(x) = 1$.
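A sketch of the data-centred Gaussian RBF basis in code (the value of sigma and the sample data are placeholders of my own):

    import numpy as np

    def rbf_features(X, centers, sigma):
        """phi_0(x) = 1, then phi_i(x) = exp(-||x - x_i||^2 / (2 sigma^2)) for each center x_i."""
        sq_dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)   # pairwise ||x - x_i||^2
        rbf = np.exp(-sq_dists / (2.0 * sigma ** 2))
        return np.hstack([np.ones((X.shape[0], 1)), rbf])

    X = np.array([[0.0], [1.0], [2.0]])           # made-up 1-D data points
    print(rbf_features(X, centers=X, sigma=1.0))  # each row: (1, phi_1(x), phi_2(x), phi_3(x))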
Empirical risk minimization

To fit the coefficients we minimize the squared error loss

$$ L(y, f(x)) = \frac{1}{2} \left( y - f(x) \right)^2 . $$
This is the simplest possible loss function, and it just says that the loss is
proportional to the square of the difference between the predicted value and the
true value. The empirical risk is then
Remp (f ) = Remp (0 , 1 , . . . , P ) =
!2
P
N
N
X
1 X
1 X
j j (xi ) .
L(yi , f (xi )) =
yi
2N i=1
2N i=1
j=0
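Written out directly as a double sum, the empirical risk looks like this in code (a sketch; the basis, coefficients, and data are made-up examples):

    import numpy as np

    def empirical_risk(alpha, phi, x, y):
        """R_emp = (1 / 2N) * sum_i ( y_i - sum_j alpha_j * phi_j(x_i) )^2."""
        residuals = [y_i - sum(a * p(x_i) for a, p in zip(alpha, phi))
                     for x_i, y_i in zip(x, y)]
        return sum(r ** 2 for r in residuals) / (2.0 * len(x))

    phi = [lambda x: 1.0, lambda x: x]       # linear basis in one dimension
    alpha = np.array([0.0, 2.0])
    x = np.array([0.0, 1.0, 2.0])
    y = np.array([0.0, 2.0, 5.0])
    print(empirical_risk(alpha, phi, x, y))  # only the third point has a residual: 1/(2*3) * 1^2 = 0.1667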
To simplify the development, we introduce the vectors

$$ \alpha = \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_P \end{pmatrix}, \qquad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix} $$

and the $N \times (P+1)$ matrix

$$ Q = \begin{pmatrix} \phi_0(x_1) & \phi_1(x_1) & \ldots & \phi_P(x_1) \\ \phi_0(x_2) & \phi_1(x_2) & \ldots & \phi_P(x_2) \\ \vdots & & \ddots & \vdots \\ \phi_0(x_N) & \phi_1(x_N) & \ldots & \phi_P(x_N) \end{pmatrix} . \qquad (2) $$
On the slides $X$ is used for $Q$, but in the general case where the $\phi_i$ are not linear functions that might be misleading. The empirical risk can then be written in the much shorter form
$$ R_{\mathrm{emp}} = \frac{1}{2N} \, \| y - Q \alpha \|^2 . $$
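In code the matrix form is a one-liner once $Q$ has been built; a sketch (the function names are my own, and the tiny example matches the one above):

    import numpy as np

    def design_matrix(phi, x):
        """Q[i, j] = phi_j(x_i), as in equation (2)."""
        return np.column_stack([np.array([p(xi) for xi in x]) for p in phi])

    def empirical_risk_matrix(alpha, Q, y):
        """R_emp = ||y - Q alpha||^2 / (2N)."""
        r = y - Q @ alpha
        return (r @ r) / (2.0 * len(y))

    phi = [lambda x: 1.0, lambda x: x]
    x = np.array([0.0, 1.0, 2.0])
    y = np.array([0.0, 2.0, 5.0])
    Q = design_matrix(phi, x)
    print(empirical_risk_matrix(np.array([0.0, 2.0]), Q, y))   # 0.1667, same as the sum-based version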
To find $\alpha$, we can just set the derivatives of the empirical risk with respect to each $\alpha_i$ equal to zero,

$$ \frac{\partial R_{\mathrm{emp}}}{\partial \alpha_i} = \frac{\partial}{\partial \alpha_i} \, \frac{1}{2N} \, \| y - Q \alpha \|^2 = 0, $$
and solve for $\alpha$. In shorthand, this is written as the single equation

$$ \nabla_{\!\alpha} R_{\mathrm{emp}} = 0 . $$
We can then solve for the optimal $\alpha$ by

$$ \begin{aligned} 0 = \nabla_{\!\alpha} R_{\mathrm{emp}} &= \nabla_{\!\alpha} \left[ \frac{1}{2N} \, \| y - Q \alpha \|^2 \right] \\ &= \nabla_{\!\alpha} \left[ \frac{1}{2N} \, (y - Q\alpha)^{T} (y - Q\alpha) \right] \\ &= \nabla_{\!\alpha} \left[ \frac{1}{2N} \left( y^{T} y - 2 y^{T} Q \alpha + \alpha^{T} Q^{T} Q \alpha \right) \right] \\ &= \frac{1}{2N} \left( -2 Q^{T} y + 2 Q^{T} Q \alpha \right), \end{aligned} $$

leading to

$$ Q^{T} Q \, \alpha = Q^{T} y $$

and

$$ \alpha = \left( Q^{T} Q \right)^{-1} Q^{T} y . \qquad (3) $$
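Equation (3) translates directly into code (a sketch; forming $Q^T Q$ and calling a linear solver is fine for small problems, though np.linalg.lstsq is the numerically safer route):

    import numpy as np

    def fit(Q, y):
        """Solve the normal equations Q^T Q alpha = Q^T y for the optimal coefficients."""
        return np.linalg.solve(Q.T @ Q, Q.T @ y)

    # Made-up example: fit a line alpha_0 + alpha_1 * x to three points.
    x = np.array([0.0, 1.0, 2.0])
    y = np.array([0.0, 2.0, 5.0])
    Q = np.column_stack([np.ones_like(x), x])   # phi_0 = 1, phi_1 = x
    print(fit(Q, y))                            # approximately [-0.167, 2.5]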
For the Gaussian RBF basis, the resulting regression function is a weighted sum of Gaussians centered at the data points,

$$ f(x) = \alpha_0 + \sum_{i=1}^{P} \alpha_i \, e^{-\| x - x_i \|^2 / (2\sigma^2)} . $$
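Putting the pieces together, here is a small end-to-end sketch of Gaussian RBF regression on made-up one-dimensional data (all names, the noise level, and the choice sigma = 1 are my own):

    import numpy as np

    def rbf_design_matrix(X, centers, sigma):
        """Constant column phi_0 = 1, then phi_i(x) = exp(-||x - x_i||^2 / (2 sigma^2))."""
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        return np.hstack([np.ones((X.shape[0], 1)), np.exp(-sq / (2.0 * sigma ** 2))])

    rng = np.random.default_rng(0)
    X = np.linspace(0.0, 6.0, 20).reshape(-1, 1)            # made-up inputs
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(20)     # noisy targets

    sigma = 1.0
    Q = rbf_design_matrix(X, centers=X, sigma=sigma)
    alpha, *_ = np.linalg.lstsq(Q, y, rcond=None)           # least-squares solution, as in (3)

    # Predict at new points with f(x) = alpha_0 + sum_i alpha_i * exp(-||x - x_i||^2 / (2 sigma^2)).
    X_new = np.array([[1.5], [4.5]])
    print(rbf_design_matrix(X_new, centers=X, sigma=sigma) @ alpha)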