
Regression by linear combination of basis functions
Risi Kondor
February 5, 2004
Given data points $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$ where $x \in \mathcal{X}$ and $y \in \mathbb{R}$, the task of regression is to fit a real valued function $f : \mathcal{X} \to \mathbb{R}$ to these points. In the simplest case $\mathcal{X} = \mathbb{R}$. In multidimensional regression $\mathcal{X} = \mathbb{R}^D$. It is sometimes necessary to do regression on more complicated spaces, but we are not going to deal with that here.
The easiest way to attack the regression problem is to look for $f$ in a finite dimensional space of functions spanned by a given basis. In other words, we specify a set of functions $\phi_0, \phi_1, \ldots, \phi_P$ from $\mathcal{X}$ to $\mathbb{R}$ and look for $f$ in the form of a linear combination

$$f(x) = \sum_{i=0}^{P} \alpha_i \, \phi_i(x). \qquad (1)$$

Performing the regression then reduces to finding the real parameters $\alpha_1, \alpha_2, \ldots, \alpha_P$.

Different Bases
Linear regression
The simplest case is that of linear regression. In the one dimensional case we would simply take $\phi_0(x) = 1$ and $\phi_1(x) = x$. This gives

$$f(x) = \sum_{i=0}^{1} \alpha_i \, \phi_i(x) = \alpha_0 + \alpha_1 x,$$

so by tuning $\alpha_0$ and $\alpha_1$ we can make $f$ be any linear function. In the multidimensional case we would take $\phi_1(x) = [x]_1$, $\phi_2(x) = [x]_2$, all the way to $\phi_D(x) = [x]_D$. Here $[x]_i$ denotes the $i$th component of the vector $x$. Unfortunately, we cannot use the simpler notation $x_i$ for this purpose, because that is already reserved for the $i$th data point. All linear functions $f : \mathbb{R}^D \to \mathbb{R}$ can be expressed in this basis:

$$f(x) = \sum_{i=0}^{D} \alpha_i \, \phi_i(x) = \alpha_0 + \alpha_1 [x]_1 + \alpha_2 [x]_2 + \ldots + \alpha_D [x]_D.$$

Realizing the constant term by setting $\phi_0(x) = 1$ will be common to all the function classes we discuss.
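As a concrete illustration, the linear basis can be stacked column by column into a matrix of feature values (this is exactly the matrix $Q$ defined in (2) below). A minimal NumPy sketch, with the helper name linear_design_matrix chosen purely for illustration:

    import numpy as np

    def linear_design_matrix(X):
        """Linear basis for an (N, D) array of data points:
        a constant column phi_0(x) = 1 followed by phi_i(x) = [x]_i."""
        X = np.asarray(X, dtype=float)     # shape (N, D), one row per data point
        ones = np.ones((X.shape[0], 1))    # phi_0(x) = 1 for every data point
        return np.hstack([ones, X])        # shape (N, D + 1)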

Polynomial regression
Another possible choice of basis (in the one-dimensional case) is to set $\phi_i(x) = x^i$ for $i = 1, 2, \ldots, P$. This lets us choose $f$ from the class of polynomial functions of degree at most $P$:

$$f(x) = \sum_{i=0}^{P} \alpha_i \, \phi_i(x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2 + \ldots + \alpha_P x^P.$$

The multidimensional case is more complicated, because $i$ has to become a multi-index $i = (i_1, i_2, \ldots, i_D)$ and the sum over $i$'s becomes a sum over all multi-indices with $i_1 + i_2 + \ldots + i_D \leq P$. For example, for $D = 3$ and $P = 2$,

$$f(x) = \sum_{(i_1, i_2, i_3)} \alpha_i \, \phi_i(x) = \alpha_0 + \alpha_{(1,0,0)} [x]_1 + \alpha_{(0,1,0)} [x]_2 + \alpha_{(0,0,1)} [x]_3 + \alpha_{(1,1,0)} [x]_1 [x]_2 + \alpha_{(0,1,1)} [x]_2 [x]_3 + \alpha_{(1,0,1)} [x]_3 [x]_1 + \alpha_{(2,0,0)} [x]_1^2 + \alpha_{(0,2,0)} [x]_2^2 + \alpha_{(0,0,2)} [x]_3^2.$$
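In the one-dimensional case the corresponding matrix of feature values is simply a Vandermonde matrix. A NumPy sketch along the same lines (helper name again illustrative):

    import numpy as np

    def polynomial_design_matrix(x, P):
        """One-dimensional polynomial basis phi_i(x) = x**i, i = 0, ..., P;
        the i = 0 column is the constant term."""
        x = np.asarray(x, dtype=float).ravel()         # shape (N,)
        return np.vander(x, N=P + 1, increasing=True)  # column i holds x**i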
Gaussian RBFs
The last basis we look at is that of Gaussian Radial Basis Functions (RBFs)

$$\phi_z(x) = e^{-\| x - z \|^2 / (2\sigma^2)}$$

where $\sigma^2$ is a pre-set variance parameter. Of course, this would give an uncountably infinite number of basis functions ($z$ can be anywhere in $\mathbb{R}^D$), which we cannot have. We remedy the situation by only considering Gaussian RBFs centered at the data points themselves,

$$\phi_i(x) = e^{-\| x - x_i \|^2 / (2\sigma^2)}.$$

This step is not as arbitrary as it sounds, but we cannot describe the justification for it here. As before, $\phi_0$ is still the constant function $\phi_0(x) = 1$.
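A NumPy sketch of this basis, with the RBF centers taken to be the data points themselves and sigma standing in for the pre-set variance parameter (names illustrative):

    import numpy as np

    def rbf_design_matrix(X, centers, sigma):
        """Gaussian RBF basis: column j is phi_j(x) = exp(-||x - c_j||^2 / (2 sigma^2)),
        preceded by the constant column phi_0(x) = 1."""
        X = np.asarray(X, dtype=float)          # (N, D) points to evaluate at
        C = np.asarray(centers, dtype=float)    # (P, D) centers (the data points)
        sq_dists = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)   # (N, P)
        rbf = np.exp(-sq_dists / (2.0 * sigma ** 2))
        ones = np.ones((X.shape[0], 1))         # bias column phi_0(x) = 1
        return np.hstack([ones, rbf])           # (N, P + 1)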

Solving for the $\alpha$'s

To find the optimal value for $\alpha_0, \alpha_1, \ldots, \alpha_P$ we
1. define a loss function $L$;
2. using the loss function define the empirical risk $R_{\mathrm{emp}}(\alpha)$ quantifying the loss over all the training data for particular values of $\alpha_0, \alpha_1, \ldots, \alpha_P$;
3. solve for the particular setting of the parameters (denoted $\hat\alpha_0, \hat\alpha_1, \ldots, \hat\alpha_P$) that minimizes the empirical risk.
We shall use the squared error loss function

$$L(y, f(x)) = \frac{1}{2} \left( y - f(x) \right)^2.$$

This is the simplest possible loss function, and it just says that the loss is proportional to the square of the difference between the predicted value and the true value. The empirical risk is then

$$R_{\mathrm{emp}}(f) = R_{\mathrm{emp}}(\alpha_0, \alpha_1, \ldots, \alpha_P) = \frac{1}{N} \sum_{i=1}^{N} L(y_i, f(x_i)) = \frac{1}{2N} \sum_{i=1}^{N} \Biggl( y_i - \sum_{j=0}^{P} \alpha_j \, \phi_j(x_i) \Biggr)^{\!2}.$$
To simplify the development, we now introduce the vectors

$$\alpha = \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_P \end{pmatrix}, \qquad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}$$

and the matrix

$$Q = \begin{pmatrix} \phi_0(x_1) & \phi_1(x_1) & \ldots & \phi_P(x_1) \\ \phi_0(x_2) & \phi_1(x_2) & \ldots & \phi_P(x_2) \\ \vdots & & \ddots & \vdots \\ \phi_0(x_N) & \phi_1(x_N) & \ldots & \phi_P(x_N) \end{pmatrix}. \qquad (2)$$

On the slides X is used for Q, but in the general case where the $\phi_i$ are not linear functions that might be misleading. The empirical risk can then be written in the much shorter form

$$R_{\mathrm{emp}} = \frac{1}{2N} \, \| y - Q\alpha \|^2.$$
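In this matrix form the empirical risk is a one-liner to evaluate; a small NumPy sketch, with alpha a candidate coefficient vector:

    import numpy as np

    def empirical_risk(Q, y, alpha):
        """R_emp = ||y - Q alpha||^2 / (2N)."""
        residual = y - Q @ alpha
        return float(residual @ residual) / (2 * len(y))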

To find $\hat\alpha$, we can just set the derivatives of the empirical risk with respect to each $\alpha_i$ equal to zero,

$$\frac{\partial R_{\mathrm{emp}}}{\partial \alpha_i} = \frac{\partial}{\partial \alpha_i} \, \frac{1}{2N} \, \| y - Q\alpha \|^2 = 0,$$

and solve for $\alpha$. In shorthand, this is written as the single equation $\nabla_{\alpha} R_{\mathrm{emp}} = 0$.
We can then solve for the optimal $\alpha$ by

$$\begin{aligned}
0 &= \nabla_{\alpha} R_{\mathrm{emp}} \\
  &= \nabla_{\alpha} \, \frac{1}{2N} \, \| y - Q\alpha \|^2 \\
  &= \nabla_{\alpha} \, \frac{1}{2N} \left[ (y - Q\alpha)^{\top} (y - Q\alpha) \right] \\
  &= \nabla_{\alpha} \, \frac{1}{2N} \left[ y^{\top} y - 2\, y^{\top} Q \alpha + \alpha^{\top} Q^{\top} Q \,\alpha \right] \\
  &= \frac{1}{2N} \left[ -2\, Q^{\top} y + 2\, Q^{\top} Q \,\alpha \right],
\end{aligned}$$

leading to

$$Q^{\top} Q \,\alpha = Q^{\top} y$$

and

$$\alpha = \left( Q^{\top} Q \right)^{-1} Q^{\top} y. \qquad (3)$$

This optimal value of $\alpha$ we denote $\hat\alpha$.


In summary, we can use the same formula (3) no matter whether we do linear, polynomial or RBF regression; the only thing that changes is the definition of the matrix $Q$ in (2).
In fact, a notable simplification occurs in the RBF case if we omit the bias term (leave $\phi_0(x) = 1$ out of the basis). In this case $Q$ is a symmetric square matrix, so (3) reduces to just $\hat\alpha = Q^{-1} y$.
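In code one would typically let a least-squares routine solve the normal equations rather than forming $(Q^{\top} Q)^{-1}$ explicitly, which behaves better when $Q^{\top} Q$ is close to singular. A NumPy sketch:

    import numpy as np

    def fit_coefficients(Q, y):
        """Minimize ||y - Q alpha||^2, i.e. solve the normal equations (3)."""
        alpha_hat, *_ = np.linalg.lstsq(Q, y, rcond=None)
        return alpha_hat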
To plot the resulting regression function we substitute $\hat\alpha_0, \hat\alpha_1, \ldots, \hat\alpha_P$ back into (1). For example, in the RBF case

$$f(x) = \hat\alpha_0 + \sum_{i=1}^{P} \hat\alpha_i \, e^{-\| x - x_i \|^2 / (2\sigma^2)}.$$
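Putting the pieces together for the RBF case, here is a minimal end-to-end sketch on synthetic one-dimensional data, reusing the illustrative helpers from the earlier sketches:

    import numpy as np

    # Toy data set (x_i, y_i), i = 1, ..., N.
    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(-3.0, 3.0, size=20))
    y = np.sin(x) + 0.1 * rng.standard_normal(20)

    sigma = 0.5
    Q = rbf_design_matrix(x[:, None], x[:, None], sigma)   # (2): RBFs centered at the data
    alpha_hat = fit_coefficients(Q, y)                      # (3): optimal coefficients

    # Substitute alpha_hat back into (1) on a dense grid to plot f.
    grid = np.linspace(-3.0, 3.0, 200)
    f_grid = rbf_design_matrix(grid[:, None], x[:, None], sigma) @ alpha_hat

Plotting grid against f_grid then displays the fitted curve alongside the data points.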
