Research On Interpolation and Data Fitting: Basis and Applications
University of Reading
Abstract
In the era of big data, we first need to manage the data, which requires us to
recover missing values or predict trends; this calls for operations such as interpolation
and data fitting. Interpolation is the process of deducing new data points within the
range of a known, discrete set of data points. When solving scientific and engineering
problems, data points are usually obtained by sampling, experiments, and other
methods, and they may represent the values of a function at finitely many points of
the independent variable. From these data, we often want to obtain a continuous
function, i.e. a curve, or a denser set of discrete points, that is consistent with the
known data; this process is called fitting.
Since the formal definitions are already written in the textbooks, this article focuses
on explaining why the main ideas arise logically and how to apply the various methods.
At the same time, we give examples to help introduce the definitions and to show the
applications.
Contents
1 Interpolation by Polynomials
 1.1 Introduction
 1.2 Lagrange Interpolation
  1.2.1 Main Idea
  1.2.2 Standardization
 1.3 Newton Interpolation
  1.3.1 Background
  1.3.2 Main Idea
 1.4 Uniqueness and Error
  1.4.1 Uniqueness
  1.4.2 Error
5 Data Fitting
 5.1 Key Difference
 5.2 Least Squares Method
  5.2.1 Main Idea
  5.2.2 Advantages
1 Interpolation by Polynomials
1.1 Introduction
Put simply, polynomial interpolation is the interpolation of a given data set through
polynomials. Given a set of n + 1 distinct data points (xi , yi ), i = 0, 1, · · · , n, we try
to find a polynomial p, such that
p(xi ) = yi , i = 0, 1, · · · , n.
Note that the uniqueness theorem, which will be covered later, shows that such a polynomial p
of degree at most n exists and is unique.
1.2 Lagrange Interpolation
1.2.1 Main Idea
Suppose we have three distinct points (x1, y1), (x2, y2), (x3, y3); obviously we have to
construct a quadratic curve (degree at most two) which goes through all three points. Rather
than finding it directly from its general form f(x) = ax² + bx + c, we want to follow
Joseph-Louis Lagrange's idea: construct the desired function as a combination of three
functions, each associated with one of the three points.
Hence we want three functions f1, f2, f3 such that
• y1 f1(x) takes the value y1 at x1, and 0 at the other two points;
• y2 f2(x) takes the value y2 at x2, and 0 at the other two points;
• y3 f3(x) takes the value y3 at x3, and 0 at the other two points.
Hence f(x) = y1 f1(x) + y2 f2(x) + y3 f3(x) must go through (x1, y1), (x2, y2), (x3, y3). We
can test and verify this numerically.
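A minimal sketch of such a check (written here in Python; the report's own verification used MATLAB), with three sample points chosen purely for illustration:

import numpy as np

# Three sample points (x1, y1), (x2, y2), (x3, y3), chosen arbitrarily.
pts = [(-1.0, 2.0), (0.0, -1.0), (2.0, 3.0)]
xs = np.array([p[0] for p in pts])
ys = np.array([p[1] for p in pts])

def f_k(k, x):
    # Basis function f_k: equals 1 at x_k and 0 at the other two nodes.
    others = [j for j in range(3) if j != k]
    num = np.prod([x - xs[j] for j in others], axis=0)
    den = np.prod([xs[k] - xs[j] for j in others])
    return num / den

def f(x):
    # f(x) = y1*f1(x) + y2*f2(x) + y3*f3(x).
    return sum(ys[k] * f_k(k, x) for k in range(3))

# The quadratic reproduces the prescribed values at all three points.
print([f(xk) for xk in xs])   # approximately [2.0, -1.0, 3.0]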
1.2.2 Standardization
The only work left is to standardize the procedure above. We use the symbols l_i(x),
i = 0, 1, · · · , n, which satisfy:
• l_i(x) is a polynomial of degree n, where n is the number of data points minus 1
(we start counting from 0);
• l_i(x_j) = 1 if i = j, and l_i(x_j) = 0 if i ≠ j.
Hence we can construct
l_i(x) = \prod_{0 \le j \le n,\; j \ne i} \frac{x - x_j}{x_i - x_j}, \qquad i = 0, 1, \dots, n,
and the Lagrange interpolation polynomial is L_n(x) = \sum_{i=0}^{n} y_i\, l_i(x). Moreover, let
\omega_{n+1}(x) = \prod_{j=0}^{n} (x - x_j),
then
\omega'_{n+1}(x_i) = \prod_{j=0,\; j \ne i}^{n} (x_i - x_j).
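As a sketch of the standardized formula, the basis polynomials l_i and the sum L_n(x) = Σ y_i l_i(x) can be implemented directly; the node and value arrays below are arbitrary illustrative data:

import numpy as np

def lagrange_basis(x, nodes, i):
    # l_i(x) = prod over j != i of (x - x_j) / (x_i - x_j).
    terms = [(x - xj) / (nodes[i] - xj) for j, xj in enumerate(nodes) if j != i]
    return np.prod(terms, axis=0)

def lagrange_interpolate(x, nodes, values):
    # L_n(x) = sum over i of y_i * l_i(x).
    return sum(values[i] * lagrange_basis(x, nodes, i) for i in range(len(nodes)))

nodes = np.array([0.0, 1.0, 2.0, 3.0])
values = np.array([1.0, 2.0, 0.0, 5.0])
print(lagrange_interpolate(nodes, nodes, values))   # reproduces the values at the nodes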
The Lagrange interpolation is simple, but when we add one more point, the whole
calculation has to be repeated from scratch. For this reason we introduce the Newton
interpolation.
1.3 Newton Interpolation
1.3.1 Background
Suppose we have n + 1 distinct points (x0, y0), (x1, y1), · · · , (xn, yn) and try to find a
function f to interpolate these points such that yi = f(xi), i = 0, 1, · · · , n.
First we consider f1(x) passing through the points (x0, f(x0)) and (x1, f(x1)):
f_1(x) = f(x_0) + k_1 (x - x_0).
Note that the structure of k_1(x − x_0) ensures f_1(x_0) = f(x_0). By substituting f_1(x_1) =
f(x_1), we get the coefficient
k_1 = \frac{f(x_1) - f(x_0)}{x_1 - x_0},
hence
f_1(x) = f(x_0) + \frac{f(x_1) - f(x_0)}{x_1 - x_0}\,(x - x_0).
With the same procedure, we can find the function f2 for (x0, f(x0)), (x1, f(x1)), (x2, f(x2)).
This leads to the difference quotients:
• f[x_i, x_j] = \frac{f(x_i) - f(x_j)}{x_i - x_j}, for i ≠ j, is the first order difference quotient;
• f[x_i, x_j, x_k] = \frac{f[x_i, x_j] - f[x_j, x_k]}{x_i - x_k}, for distinct i, j, k, is the second order difference quotient.
And in general,
f[x_0, x_1, \dots, x_k] = \frac{f[x_1, \dots, x_k] - f[x_0, \dots, x_{k-1}]}{x_k - x_0}
is the k-th order difference quotient of f(x).
Hence we can use the difference quotients to define and simplify the Newton inter-
polation.
Definition. The Newton interpolation of the points (x0, f(x0)), (x1, f(x1)), ..., (xn, f(xn))
is
P_n(x) = f(x_0) + f[x_0, x_1](x - x_0) + f[x_0, x_1, x_2](x - x_0)(x - x_1) + \cdots
\qquad + f[x_0, x_1, \dots, x_{n-1}](x - x_0)(x - x_1)\cdots(x - x_{n-2})
\qquad + f[x_0, x_1, \dots, x_n](x - x_0)(x - x_1)\cdots(x - x_{n-1}).
We can arrange the difference quotients in a table to organize the calculation in the
Newton interpolation. We can also see that when a new point is added, only one new
term has to be calculated instead of repeating the whole procedure, as the sketch below
illustrates.
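A minimal sketch of this table-based calculation, with illustrative data; the helper names divided_differences and newton_eval are ours, not from the report:

import numpy as np

def divided_differences(x, y):
    # Column k of the table holds the k-th order difference quotients;
    # row 0 therefore contains f[x0], f[x0,x1], f[x0,x1,x2], ...
    n = len(x)
    table = np.zeros((n, n))
    table[:, 0] = y
    for k in range(1, n):
        for i in range(n - k):
            table[i, k] = (table[i + 1, k - 1] - table[i, k - 1]) / (x[i + k] - x[i])
    return table

def newton_eval(x_nodes, coeffs, x):
    # Evaluate the Newton form with nested (Horner-like) multiplication.
    result = coeffs[-1]
    for k in range(len(coeffs) - 2, -1, -1):
        result = result * (x - x_nodes[k]) + coeffs[k]
    return result

x_nodes = np.array([0.0, 1.0, 2.0, 4.0])
y_vals = np.array([1.0, 3.0, 2.0, 5.0])
coeffs = divided_differences(x_nodes, y_vals)[0]
print(newton_eval(x_nodes, coeffs, x_nodes))   # reproduces y_vals at the nodes
# Adding one more point only appends one new coefficient to the table.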
1.4 Uniqueness and Error
1.4.1 Uniqueness
Writing the interpolation polynomial as p(x) = a0 + a1 x + · · · + an x^n and imposing
p(xi) = yi gives a linear system for the coefficients. We have learned from linear algebra
that this system has a unique solution if and only if the determinant of its Vandermonde
matrix is not 0.
When i ≠ j and x_i ≠ x_j, we have
|V| = \prod_{0 \le i < j \le n} (x_j - x_i) \ne 0,
which means that when xi , i = 0, 1, · · · , n are distinct points, the interpolation poly-
nomial exists and is unique.
Hence we can conclude that the results of the undetermined coefficient method, the Lagrange
interpolation and the Newton interpolation are the same polynomial, as illustrated below.
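As an illustration, the following sketch compares the undetermined coefficient (Vandermonde) construction with SciPy's Lagrange form on arbitrary sample data; up to rounding, both give the same polynomial:

import numpy as np
from scipy.interpolate import lagrange   # Lagrange form of the interpolating polynomial

x_nodes = np.array([-1.0, 0.0, 1.0, 2.0])
y_vals = np.array([2.0, 1.0, -1.0, 4.0])

# Undetermined coefficient method: solve the Vandermonde system V a = y.
V = np.vander(x_nodes, increasing=True)          # columns 1, x, x^2, x^3
a = np.linalg.solve(V, y_vals)

x_test = np.linspace(-1.0, 2.0, 7)
p_vander = sum(a[k] * x_test**k for k in range(len(a)))
p_lagrange = lagrange(x_nodes, y_vals)(x_test)

print(np.allclose(p_vander, p_lagrange))         # True: the same polynomial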
1.4.2 Error
Definition. Let f(x) be an (n + 1)-times differentiable function passing through the n + 1
points (x0, y0), (x1, y1), · · · , (xn, yn), and let Pn(x) be the n-th order interpolation polynomial.
Then the n-th order error is expressed as Rn(x), and
R_n(x) = f(x) - P_n(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}\, \omega_{n+1}(x),
where ξ is some point lying between the interpolation nodes and
\omega_{n+1}(x) = \prod_{j=0}^{n} (x - x_j).
Note: this error formula applies to both the Lagrange and the Newton interpolation, since
by the proof of the uniqueness theorem above they are the same polynomial.
If we know two points, the interpolation naturally amounts to finding a point on the straight
line connecting them. (Note: that is the reason why we call it 'linear'.) It can be
verified that the following formula is the line from (x0, y0) to (x1, y1) (x0 ≠ x1):
P_{0,1}(x) = \frac{x - x_1}{x_0 - x_1}\, y_0 + \frac{x - x_0}{x_1 - x_0}\, y_1.
Then we consider the case of three points. We interpolate linearly between (x0, y0) and
(x1, y1), which gives P_{0,1}(x) as above, and between (x1, y1) and (x2, y2):
P_{1,2}(x) = \frac{x - x_2}{x_1 - x_2}\, y_1 + \frac{x - x_1}{x_2 - x_1}\, y_2.
We then combine the two linear interpolants, weighting each one by how close x is to the
nodes it uses:
P_{0,1,2}(x) = \frac{x - x_2}{x_0 - x_2}\, P_{0,1}(x) + \frac{x - x_0}{x_2 - x_0}\, P_{1,2}(x).
Expanding, we get
P_{0,1,2}(x) = \frac{(x - x_1)(x - x_2)}{(x_0 - x_1)(x_0 - x_2)}\, y_0 + \frac{(x - x_0)(x - x_2)}{(x_1 - x_0)(x_1 - x_2)}\, y_1 + \frac{(x - x_0)(x - x_1)}{(x_2 - x_0)(x_2 - x_1)}\, y_2,
so we can find the recursion
P_{0,1,\dots,n}(x) = \frac{x - x_0}{x_n - x_0}\, P_{1,2,\dots,n}(x) + \frac{x - x_n}{x_0 - x_n}\, P_{0,1,\dots,n-1}(x).
We can generalize these steps into a recursive scheme: we interpolate step by step, which
is why this method is called 'stepwise' interpolation. Hence we obtain the theorem as follows.
Theorem 2.1 (L. & D. (2011)). Let a function f be defined at the points x0, x1, · · · , xk,
and let xj and xi be two distinct numbers in this set. Then
P(x) = \frac{(x - x_j)\, P_{0,1,\dots,j-1,j+1,\dots,k}(x) - (x - x_i)\, P_{0,1,\dots,i-1,i+1,\dots,k}(x)}{x_i - x_j}
is the k-th Lagrange polynomial that interpolates the function f at the k + 1 points
x0, x1, · · · , xk.
The polynomials P_{0,1,\dots,j-1,j+1,\dots,k} and P_{0,1,\dots,i-1,i+1,\dots,k} are often denoted Q̂ and Q, respectively,
for ease of notation, so that
P(x) = \frac{(x - x_j)\, Q̂(x) - (x - x_i)\, Q(x)}{x_i - x_j}.
The procedure that uses the result of Theorem 2.1 to recursively generate interpolat-
ing polynomial approximations is called Neville’s method.(L. & D. 2011)
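A minimal sketch of Neville's recursive scheme, following the recursion in Theorem 2.1; the sample data (values of the natural logarithm) are chosen only for illustration:

import numpy as np

def neville(x_nodes, y_vals, x):
    # Returns P_{0,1,...,n}(x), built up column by column.
    n = len(x_nodes)
    Q = np.array(y_vals, dtype=float)            # Q[i] starts as P_i(x) = y_i
    for k in range(1, n):
        for i in range(n - k):
            Q[i] = ((x - x_nodes[i + k]) * Q[i] - (x - x_nodes[i]) * Q[i + 1]) / (
                x_nodes[i] - x_nodes[i + k]
            )
    return Q[0]

x_nodes = np.array([1.0, 1.3, 1.6, 1.9, 2.2])
y_vals = np.log(x_nodes)
print(neville(x_nodes, y_vals, 1.5), np.log(1.5))   # the two values should be close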
2.2.1 Advantages
The advantage of Neville's interpolation is that, if the data are arranged in order of
closeness to the interpolated point, none of the work performed to obtain a specific
degree result must be redone to evaluate the next higher degree result. (Rexhepi
et al. 2018)
2.2.2 Disadvantages
The disadvantage of Neville’s interpolation is that all of the work must be re-
done for each new value of x. The amount of work is essentially the same as for a
Lagrange polynomial. The divided difference polynomial minimizes these disadvan-
tages.(Rexhepi et al. 2018)
We have introduced some methods of interpolation, which usually concern only the values
at the points, say yi. However, in practice the derivatives at the points, say y′i, are also
important since they represent rates of change, and we do not wish to discard
this information. That is the reason we introduce the Hermite interpolation, which
requires only one more rule than the Lagrange interpolation: the interpolating function
H(x) must also match the given derivatives, i.e.,
H'(x_i) = y'_i,
where i = 0, 1 in the two-point case considered below.
One could just use the undetermined coefficient method: let H3(x) = ax³ + bx² + cx + d. However,
this would be hard to calculate and not easy to generalize to higher orders, so we intro-
duce the method of 'base functions': denote four base functions α0(x), α1(x), β0(x), β1(x),
which satisfy:
α0(x0) = 1, α0(x1) = 0, α0′(x0) = 0, α0′(x1) = 0;
α1(x0) = 0, α1(x1) = 1, α1′(x0) = 0, α1′(x1) = 0;
β0(x0) = 0, β0(x1) = 0, β0′(x0) = 1, β0′(x1) = 0;
β1(x0) = 0, β1(x1) = 0, β1′(x0) = 0, β1′(x1) = 1.
Note that α0(x), α1(x), β0(x), β1(x) are all cubic (order-3) functions. Let
H_3(x) = y_0\, \alpha_0(x) + y_1\, \alpha_1(x) + y'_0\, \beta_0(x) + y'_1\, \beta_1(x),
so that H3 is a polynomial interpolation of degree at most three which matches both the
values and the derivatives at x0 and x1.
The remaining problem is how to find the base functions. First, we try to find α0(x).
3.2 Standardization
Definition (L. & D. (2011)). If f ∈ C 1 [a, b] and x0 , . . . , xn ∈ [a, b] are distinct, the
unique polynomial of least degree agreeing with f and f 0 at x0 , . . . , xn is the Hermite
polynomial of degree at most 2n + 1 given by
H_{2n+1}(x) = \sum_{j=0}^{n} f(x_j)\, H_{n,j}(x) + \sum_{j=0}^{n} f'(x_j)\, \hat{H}_{n,j}(x),
where, with L_{n,j}(x) denoting the j-th Lagrange coefficient polynomial of degree n, we have
H_{n,j}(x) = \left[1 - 2(x - x_j)\, L'_{n,j}(x_j)\right] L^2_{n,j}(x) \quad \text{and} \quad \hat{H}_{n,j}(x) = (x - x_j)\, L^2_{n,j}(x).
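A minimal sketch of this construction, assembling H_{n,j} and Ĥ_{n,j} from the Lagrange basis polynomials; the nodes and the choice of f = sin are illustrative assumptions:

import numpy as np

def lagrange_basis(nodes, j, x):
    # L_{n,j}(x) = prod over k != j of (x - x_k) / (x_j - x_k).
    terms = [(x - xk) / (nodes[j] - xk) for k, xk in enumerate(nodes) if k != j]
    return np.prod(terms, axis=0)

def hermite(nodes, f_vals, df_vals, x):
    # H_{2n+1}(x) = sum_j f(x_j) H_{n,j}(x) + sum_j f'(x_j) Hhat_{n,j}(x).
    total = 0.0
    for j, xj in enumerate(nodes):
        dL = sum(1.0 / (xj - xk) for k, xk in enumerate(nodes) if k != j)   # L'_{n,j}(x_j)
        Lj = lagrange_basis(nodes, j, x)
        H_j = (1.0 - 2.0 * (x - xj) * dL) * Lj**2
        Hhat_j = (x - xj) * Lj**2
        total = total + f_vals[j] * H_j + df_vals[j] * Hhat_j
    return total

nodes = np.array([0.0, 0.5, 1.0])
x = np.linspace(0.0, 1.0, 101)
err = hermite(nodes, np.sin(nodes), np.cos(nodes), x) - np.sin(x)
print(np.max(np.abs(err)))   # a very small error on [0, 1]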
3.3 Error
Definition (Phillips & Taylor (1996)). If H2n+1(x) is the Hermite interpolation poly-
nomial of f(x) at the points x0, x1, . . . , xn ∈ [a, b], then the (2n + 1)-th order error is
R_{2n+1}(x) = f(x) - H_{2n+1}(x) = \frac{f^{(2n+2)}(\xi_x)}{(2n+2)!}\, \omega^2_{n+1}(x),
where \omega_{n+1}(x) = \prod_{j=0}^{n} (x - x_j) and \xi_x \in (a, b).
We have shown above that if we have n + 1 points (x0, y0), (x1, y1), · · · , (xn, yn), then
we can obtain an n-th order polynomial by any of the interpolation methods above. However,
some problems are encountered when n is large.
Figure 6: Demonstration of Runge's phenomenon for y = 1/(1 + 25x²), x ∈ [−1, 1].
4.1.1 Proof
The Weierstrass approximation theorem states that for every continuous function
f (x) defined on an interval [a, b], there exists a set of polynomial functions Pn (x)
for n = 0, 1, 2, · · · , each of degree at most n, that approximates f (x) with uniform
convergence over [a, b] as n tends to infinity, that is,
\lim_{n \to \infty} \max_{a \le x \le b} |f(x) - P_n(x)| = 0.
Consider the case where one desires to interpolate through n + 1 equispaced points
of a function f (x) using the n-degree polynomial Pn (x) that passes through those
points. Naturally, one might expect from Weierstrass’ theorem that using more points
would lead to a more accurate reconstruction of f (x). However, this particular set
of polynomial functions Pn (x) is not guaranteed to have the property of uniform
convergence; the theorem only states that a set of polynomial functions exists, without
providing a general method of finding one.
The Pn(x) produced in this manner may diverge away from f(x) as n increases;
this typically occurs in an oscillating pattern that is magnified near the ends of the
interpolation interval. This phenomenon is attributed to Runge (Epperson 1988), and can
be reproduced numerically as follows.
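A minimal sketch of this divergence, interpolating the Runge function at equispaced points for increasing n; SciPy's lagrange helper is used purely for convenience:

import numpy as np
from scipy.interpolate import lagrange

runge = lambda x: 1.0 / (1.0 + 25.0 * x**2)

x_plot = np.linspace(-1.0, 1.0, 1001)
for n in (5, 10, 15):
    nodes = np.linspace(-1.0, 1.0, n + 1)         # equispaced interpolation points
    p = lagrange(nodes, runge(nodes))
    print(n, np.max(np.abs(p(x_plot) - runge(x_plot))))   # the maximum error grows with n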
When n is large, instead of using the single n-th order interpolation polynomial,
we can use several low-order polynomials on small subintervals. One common construction
is the cubic spline, which pieces together multiple 3-order (cubic) polynomials into
the final interpolation formula.
Definition. (L. & D. 2011) Given a function f defined on [a, b] and a set of nodes
a = x0 < x1 < · · · < xn = b, a cubic spline interpolant S for f is a function that
satisfies the following conditions:
(a) S(x) is a cubic polynomial, denoted Sj (x), on the subinterval [xj , xj+1 ] for each
j = 0, 1, . . . , n − 1;
(b) Sj (xj ) = f (xj ) and Sj (xj+1 ) = f (xj+1 ) for each j = 0, 1, . . . , n − 1;
(c) Sj+1 (xj+1 ) = Sj (xj+1 ) for each j = 0, 1, . . . , n − 2 (Implied by (b));
(d) S'_{j+1}(x_{j+1}) = S'_j(x_{j+1}) for each j = 0, 1, . . . , n − 2;
(e) S''_{j+1}(x_{j+1}) = S''_j(x_{j+1}) for each j = 0, 1, . . . , n − 2;
(f ) One of the following sets of boundary conditions is satisfied:
(i) S 00 (x0 ) = S 00 (xn ) = 0 (natural (or free) boundary);
(ii) S 0 (x0 ) = f 0 (x0 ) and S 0 (xn ) = f 0 (xn ) (clamped boundary).
When the free boundary conditions occur, the spline is called a natural spline(L. &
D. 2011). Note that the natural spline is the most commonly used spline.
Theorem 4.1. (L. & D. 2011) If f is defined at a = x0 < x1 < · · · < xn = b, then f
has a unique natural spline interpolant S on the nodes x0 , x1 , . . . , xn ; that is, a spline
interpolant that satisfies the natural boundary conditions S 00 (a) = 0 and S 00 (b) = 0.
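A minimal sketch of a natural cubic spline applied to the same Runge function, using SciPy's CubicSpline with the natural boundary condition; the node count is an arbitrary choice:

import numpy as np
from scipy.interpolate import CubicSpline

runge = lambda x: 1.0 / (1.0 + 25.0 * x**2)

nodes = np.linspace(-1.0, 1.0, 11)                              # equispaced nodes
spline = CubicSpline(nodes, runge(nodes), bc_type='natural')    # S''(a) = S''(b) = 0

x_plot = np.linspace(-1.0, 1.0, 1001)
print(np.max(np.abs(spline(x_plot) - runge(x_plot))))
# The error is far smaller than that of the single high-order polynomial above.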
4.2.1 Construction
We have stated that Runge's phenomenon can occur if we choose a set of
equispaced interpolation points, which suggests it can be avoided by selecting non-
equispaced points, say, Chebyshev nodes.
Definition. Let z = eiθ be a point on the unit circle. The associated x coordinate
is x = cos θ or θ = cos−1 x where x ∈ [−1, 1]. Define the nth degree Chebyshev
polynomial to be Tn (x) = cos nθ. The Chebyshev nodes x0 , x1 , · · · , xn are the roots of
Chebyshev polynomial Tn+1 .
The reason why Chebyshev nodes work originates from the error between
the generating function and the interpolating polynomial of order n, which is given
by
f(x) - P_n(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!} \prod_{i=0}^{n} (x - x_i).
For the case of the Runge function interpolated at equidistant points, each
of the two factors in the upper bound for the approximation error grows to in-
finity with n. Since the factor \frac{f^{(n+1)}(\xi)}{(n+1)!} cannot be controlled by the choice of nodes,
we need to optimize \prod_{i=0}^{n} |x - x_i| instead. If we choose the Chebyshev nodes x_0, \dots, x_n, then
(x - x_0)(x - x_1)\cdots(x - x_n) = \frac{T_{n+1}(x)}{2^n},
where
x_k = \cos\frac{(2k+1)\pi}{2(n+1)}, \qquad k = 0, 1, \dots, n.
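A minimal sketch of the same experiment with Chebyshev nodes in place of equispaced points, using the node formula above:

import numpy as np
from scipy.interpolate import lagrange

runge = lambda x: 1.0 / (1.0 + 25.0 * x**2)

x_plot = np.linspace(-1.0, 1.0, 1001)
for n in (5, 10, 15):
    k = np.arange(n + 1)
    nodes = np.cos((2 * k + 1) * np.pi / (2 * (n + 1)))   # Chebyshev nodes, roots of T_{n+1}
    p = lagrange(nodes, runge(nodes))
    print(n, np.max(np.abs(p(x_plot) - runge(x_plot))))   # the error now decreases with n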
5 Data Fitting
5.1 Key Difference
In interpolation, we construct a curve through the data points. In doing so, we make
the implicit assumption that the data points are accurate and distinct. Data fitting
is applied to data that contain scatter, like noise, usually due to measurement errors.
Here we want to find a smooth curve that approximates the data in some sense. Thus
the curve does not necessarily hit the data points. (Kiusalaas 2013)
5.2 Least Squares Method
5.2.1 Main Idea
The least squares method is introduced to determine the parameters of a curve such that
the sum of the squared distances from the data points to the curve achieves its minimum.
We consider an example here, suppose we have a series of data points as follows:
i 1 2 3 4 5 6
xi −9.19 −5.26 −1.39 6.71 4.7 2.66
yi −8.01 6.78 −1.47 4.71 4.1 4.23
Now we want to find a line that satisfies the least squares condition.
For our convenience, we consider only 3 points A1: (−9.19, −8.01), A2: (−5.26, 6.78),
A3: (−1.39, −1.47). Denote the vertical deviations of the points A1, A2, A3 from the line
y = ax + b as d1, d2, d3 respectively. We want to minimize
D^2 = d_1^2 + d_2^2 + d_3^2.
We get
d_1 = -9.19a + b + 8.01,
d_2 = -5.26a + b - 6.78,
d_3 = -1.39a + b + 1.47.
Hence:
D^2 = (-9.19a + b + 8.01)^2 + (-5.26a + b - 6.78)^2 + (-1.39a + b + 1.47)^2.
With the knowledge of calculus, we can obtain the values of a and b at which D² achieves its
minimum by setting its partial derivatives to zero. The result for the coefficients is
a ≈ 0.958, b ≈ −0.584,
so the fitting line of our example by the least squares method is
y = 0.958x − 0.584.
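A minimal sketch of this computation, applied here to the full six-point data set from the table above; the three-point subexample is handled in exactly the same way:

import numpy as np

# The six data points from the table above.
x = np.array([-9.19, -5.26, -1.39, 6.71, 4.7, 2.66])
y = np.array([-8.01, 6.78, -1.47, 4.71, 4.1, 4.23])

# Least squares fit of y = a*x + b: minimize D^2 = sum of (a*x_i + b - y_i)^2.
A = np.column_stack([x, np.ones_like(x)])
a, b = np.linalg.lstsq(A, y, rcond=None)[0]
print(a, b)          # slope and intercept of the least-squares line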
5.2.2 Advantages
The least squares method is the most convenient procedure for determining best linear
approximations, but there are also important theoretical considerations that favour
it. The minimax approach generally assigns too much weight to a bit of data that is
badly in error, whereas the absolute deviation method does not give sufficient weight
to a point that is considerably out of line with the approximation. The least squares
approach puts substantially more weight on a point that is out of line with the rest of
the data, but will not permit that point to completely dominate the approximation.
An additional reason for considering the least squares approach involves the study of
the statistical distribution of error.(J. 1982)
Sometimes we assume that the data follow an exponential or power-law relationship, which
would require us to fit a curve of the form
y = b e^{ax}
or
y = b x^a,
for example by minimizing
E = \sum_{i=1}^{n} (y_i - b x_i^a)^2.
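One common way to handle such fits is to take logarithms and determine the parameters of the linearized model by ordinary least squares; the sketch below uses hypothetical data, and note that it minimizes the residuals of ln y rather than E itself:

import numpy as np

# Hypothetical positive data assumed to behave roughly like y = b * exp(a*x).
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([1.1, 1.8, 2.9, 4.6, 7.4])

# Linearization: ln y = ln b + a*x, fitted by ordinary least squares.
a, ln_b = np.polyfit(x, np.log(y), 1)
b = np.exp(ln_b)
print(a, b)   # approximate parameters of y = b * exp(a*x)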
5.2.4 Standardization
In general, for a model f(x; α1, α2, . . . , αn) we form the residuals
r_i = f(x_i; \alpha_1, \alpha_2, \dots, \alpha_n) - y_i, \qquad i = 1, 2, \dots, n,
and minimize the sum of their squares; setting the partial derivatives to zero simplifies to
the normal equations. These will be discussed in the later part.
Suppose we have m samples, each sample has n continuous features and a label value
y. Now it is necessary to find out the linear relationship between features and label
values. We can define such a cost function:
Definition. The cost function for m samples is
J(\theta_0, \theta_1, \dots, \theta_n) = (h_\theta(x_1) - y_1)^2 + (h_\theta(x_2) - y_2)^2 + \cdots + (h_\theta(x_m) - y_m)^2 = \sum_{i=1}^{m} \left[ h_\theta(x_i) - y_i \right]^2.
Note that
• yi is the true value of the i-th training sample;
• hθ(xi) is the prediction for the i-th training sample;
• θ are the coefficients in the loss function.
Looking at the form of J(θ), we can relate it to the D² we talked about before. We want
to find the coefficient set Θ = (θ0, θ1, . . . , θn) which minimizes the value of the loss function.
We just need to set the partial derivatives to zero:
\frac{\partial}{\partial \theta_j} J(\Theta) = 0, \qquad j = 0, 1, \dots, n.
Then, for an example data set of four samples, each with three features (plus the constant
term x0 = 1), we get
X = \begin{pmatrix} 1 & 3000 & 33 & 5 \\ 1 & 6500 & 30 & 2 \\ 1 & 3500 & 31 & 4 \\ 1 & 2800 & 28 & 5 \end{pmatrix}, \qquad y = \begin{pmatrix} 4500 \\ 12000 \\ 8500 \\ 4000 \end{pmatrix}.
By the method of normal equations, we can calculate
\Theta = \left( X^T \cdot X \right)^{-1} \cdot X^T \cdot y,
substituting the matrices X and y above.
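A minimal sketch of this calculation with NumPy, using the X and y above; np.linalg.solve is used in place of an explicit matrix inverse:

import numpy as np

# The design matrix X and target vector y from the example above.
X = np.array([
    [1.0, 3000.0, 33.0, 5.0],
    [1.0, 6500.0, 30.0, 2.0],
    [1.0, 3500.0, 31.0, 4.0],
    [1.0, 2800.0, 28.0, 5.0],
])
y = np.array([4500.0, 12000.0, 8500.0, 4000.0])

# Normal equations: (X^T X) Theta = X^T y.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)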
5.3.2 Standardization
We will give several definitions, which would help us standardize the whole process.
Definition. The cost function of multi-variable regression is
J(\theta_0, \theta_1, \cdots, \theta_n) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2,
where
h_\theta(x) = \Theta^T x = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n.
The normal equation is to find the parameters that minimize the cost function by
solving the following equation:
\frac{\partial}{\partial \theta_j} J(\Theta) = 0, \qquad j = 0, 1, \dots, n.
Definition. For m training samples, where each sample has n features, the dataset is
defined as
X = \begin{pmatrix} x_0^{(1)} & \cdots & x_n^{(1)} \\ \vdots & \ddots & \vdots \\ x_0^{(m)} & \cdots & x_n^{(m)} \end{pmatrix},
where x_j^{(i)} denotes the j-th feature of the i-th sample.
Definition. The coefficient is defined as:
Θ = [θ0 , θ1 , · · · , θn ]T .
By simplifying we have:
\frac{1}{2m} \left( -2 X^T Y + 2 X^T X \Theta \right) = 0,
hence X^T X \Theta = X^T Y and
\Theta = \left( X^T X \right)^{-1} X^T Y.
References
Epperson, J. (1988), ‘On the Runge example’, The American Mathematical Monthly
94, 329–341.
J., L. H. (1982), Introduction to probability theory and statistical inference, John Wiley
and Sons, New York, United States of America.
L., B. R. & D., F. J. (2011), Numerical Analysis (9th ed.)., Brooks Cole, Boston, MA,
United States of America.
Rexhepi, S., Iseni, E., Shaini, B. I. & Zenku, T. (2018), ‘Some notes on Neville’s algo-
rithm of interpolation with applications to trigonometric interpolation’, MathLAB
Journal 1(3), 302–313.