
Research on Interpolation and Data Fitting: Basis and Applications

Yijie Xu, Runqi Xu


27803035, 27803052

Department of Mathematics and Statistics



University of Reading

February 11, 2022

Abstract

In the era of big data, we first need to manage the data, which requires us to
recover missing data or predict trends, so we need operations including interpolation
and data fitting. Interpolation is the process of deducing new data points within a
range from known, discrete data points. When solving scientific and engineering
problems, data points are usually obtained by sampling, experiments, and other
methods. Such data may represent the values of a function at a finite number of
values of the independent variable. From these data, we often want to obtain a
continuous function, i.e., a curve, or a denser set of discrete points consistent with
the known data; the latter process is called fitting.

Since the definitions are already written in the textbooks, this article describes how
each main idea arises logically and how to apply the various methods. At the same
time, we give examples to help introduce the definitions or show the applications.

For structure, we divide interpolation into several parts by method or purpose.
First comes polynomial interpolation, which contains Lagrange interpolation and
Newton interpolation; these are elementary but essential. Then we introduce a
typical stepwise linear interpolation, Neville's algorithm. If we are concerned about
derivatives, we turn to Hermite interpolation; if we focus on smoothness, we turn to
cubic splines and Chebyshev nodes. Finally, in the data fitting part, we introduce
the most typical method, the least squares method, which is completed by means of
the normal equations.


Contents

1 Interpolation by Polynomials
  1.1 Introduction
  1.2 Lagrange Interpolation
    1.2.1 Main Idea
    1.2.2 Standardization
  1.3 Newton Interpolation
    1.3.1 Background
    1.3.2 Main Idea
  1.4 Uniqueness and Error
    1.4.1 Uniqueness
    1.4.2 Error

2 Neville's Algorithm: Stepwise Linear Interpolation
  2.1 Main Idea
  2.2 Advantages and Disadvantages
    2.2.1 Advantages
    2.2.2 Disadvantages

3 Hermite Interpolation: About Derivatives
  3.1 Main Idea
  3.2 Standardization
  3.3 Error

4 About Runge's Phenomenon: Cubic Splines and Chebyshev Nodes
  4.1 Runge's Phenomenon
    4.1.1 Proof
  4.2 Solution 1: Piecewise Polynomials
    4.2.1 Construction
  4.3 Solution 2: Chebyshev Nodes

5 Data Fitting
  5.1 Key Difference
  5.2 Least Squares Method
    5.2.1 Main Idea
    5.2.2 Advantages
    5.2.3 Some Other Forms
    5.2.4 Standardization
  5.3 Normal Equations
    5.3.1 Main Idea
    5.3.2 Standardization

List of Figures

1 The polynomial interpolation of three points
2 Three curves fi(x), i = 1, 2, 3
3 The result of interpolation
4 Table for difference quotients
5 Recursion for Neville's algorithm
6 Demonstration of Runge's phenomenon for y = 1/(1 + 25x^2), x ∈ [−1, 1]
7 Comparison of interpolation and data fitting
8 The fitting line

List of Tables

1 A table of data points
2 Example: loan record in a bank


1 Interpolation by Polynomials

1.1 Introduction

Put simply, polynomial interpolation is the interpolation of a given data set through
polynomials. Given a set of n + 1 distinct data points (xi , yi ), i = 0, 1, · · · , n, we try
to find a polynomial p, such that

p(xi ) = yi , i = 0, 1, · · · , n.

Note that the uniqueness theorem, covered later, shows that such a polynomial p
exists and is unique.

1.2 Lagrange Interpolation

1.2.1 Main Idea

Suppose we have three distinct points (x1, y1), (x2, y2), (x3, y3); we have to
construct a second-order (quadratic) curve which passes through all three points.
Rather than finding it directly in the form f(x) = ax^2 + bx + c, a ≠ 0, we want to
follow Joseph-Louis Lagrange's thought: construct the desired function as a
combination of three functions, one associated with each of the three points.

Figure 1: The polynomial interpolation of three points

We would construct three curves, where


• The first curve f1 (x) takes value 1 at x1 , while taking 0 at other two points;
• The second curve f2 (x) takes value 1 at x2 , while taking 0 at other two points;
• The third curve f3 (x) takes value 1 at x3 , while taking 0 at other two points.
with the figures shown below.


Figure 2: Three curves fi (x), i = 1, 2, 3

Hence
• y1 f1(x) takes the value y1 at x1 and 0 at the other two points;
• y2 f2(x) takes the value y2 at x2 and 0 at the other two points;
• y3 f3(x) takes the value y3 at x3 and 0 at the other two points.
Hence, f(x) = y1 f1(x) + y2 f2(x) + y3 f3(x) must pass through (x1, y1), (x2, y2),
(x3, y3). We can test and verify it with MATLAB:

Figure 3: The result of interpolation
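For readers without MATLAB, the same check can be done as a minimal Python sketch (numpy and matplotlib assumed; the three point values below are illustrative, not the ones used in the figure):

```python
import numpy as np
import matplotlib.pyplot as plt

# Three sample points (illustrative values, not those in Figure 3)
pts = [(-1.0, 2.0), (0.0, -1.0), (2.0, 1.0)]
(x1, y1), (x2, y2), (x3, y3) = pts

# Basis curves: f_i takes the value 1 at x_i and 0 at the other two points
f1 = lambda x: (x - x2) * (x - x3) / ((x1 - x2) * (x1 - x3))
f2 = lambda x: (x - x1) * (x - x3) / ((x2 - x1) * (x2 - x3))
f3 = lambda x: (x - x1) * (x - x2) / ((x3 - x1) * (x3 - x2))

# The weighted sum passes through all three points
f = lambda x: y1 * f1(x) + y2 * f2(x) + y3 * f3(x)
for xi, yi in pts:
    assert abs(f(xi) - yi) < 1e-12

xs = np.linspace(-1.5, 2.5, 200)
plt.plot(xs, f(xs))
plt.scatter([p[0] for p in pts], [p[1] for p in pts])
plt.show()
```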

However, this is a simple case, so we standardize it in the next section so that it can
be used more generally.


1.2.2 Standardization

The only work left is to standardize the procedure above. We use the symbol li(x),
i = 0, 1, ..., n, which satisfies:
• li(x) must be an n-th order polynomial, where n is the number of data points minus
1, considering we start from 0;
• li(xj) = 1 if i = j, and li(xj) = 0 if i ≠ j.
Hence we can construct li(x):

\[ l_i(x) = \prod_{\substack{0 \le j \le n \\ j \ne i}} \frac{x - x_j}{x_i - x_j}. \]

Obviously, li(xj) satisfies the above conditions. So finally, we get

\[ L(x) = \sum_{i=0}^{n} y_i\, l_i(x), \]

which is the Lagrange interpolation. It can also be defined as follows:

Definition. The n-th order Lagrange interpolation for the function f(x) is

\[ L_n(x) = \sum_{i=0}^{n} f(x_i)\, \frac{(x - x_0)\cdots(x - x_{i-1})(x - x_{i+1})\cdots(x - x_n)}{(x_i - x_0)\cdots(x_i - x_{i-1})(x_i - x_{i+1})\cdots(x_i - x_n)}. \]

Moreover, let

\[ \omega_{n+1}(x) = \prod_{j=0}^{n} (x - x_j), \qquad \text{so that} \qquad \omega'_{n+1}(x_i) = \prod_{\substack{j=0 \\ j \ne i}}^{n} (x_i - x_j); \]

then Ln(x) can also be expressed as

\[ L_n(x) = \sum_{i=0}^{n} \frac{\omega_{n+1}(x)}{(x - x_i)\, \omega'_{n+1}(x_i)}\, f(x_i). \]
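A direct implementation of this definition, as a sketch in Python (numpy assumed; the function name is our own):

```python
import numpy as np

def lagrange(x_nodes, y_nodes, x):
    """Evaluate the Lagrange interpolation polynomial L_n at x."""
    x_nodes = np.asarray(x_nodes, dtype=float)
    total = 0.0
    for i, (xi, yi) in enumerate(zip(x_nodes, y_nodes)):
        # l_i(x) = prod_{j != i} (x - x_j) / (x_i - x_j)
        others = np.delete(x_nodes, i)
        total += yi * np.prod((x - others) / (xi - others))
    return total

# Example: interpolating y = x^2 from three nodes reproduces it exactly
nodes = [0.0, 1.0, 3.0]
vals = [x**2 for x in nodes]
print(lagrange(nodes, vals, 2.0))  # 4.0 up to rounding
```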

The Lagrange interpolation is simple, but when we add one point, the whole
calculation must be repeated. This motivates the Newton interpolation.


1.3 Newton Interpolation

1.3.1 Background

Constructing the interpolation by the undetermined coefficient method or by
Lagrange interpolation solves the problem in principle, but the computational
workload becomes enormous for large data sets, especially when points are added.
The Newton interpolation is introduced to overcome this shortcoming.

1.3.2 Main Idea

Suppose we have n + 1 distinct points (x0, y0), (x1, y1), ..., (xn, yn) and try to find
a function f to interpolate these points such that yi = f(xi), i = 0, 1, ..., n.
First we consider f1(x) passing through the points (x0, f(x0)), (x1, f(x1)):

\[ f_1(x) = f(x_0) + k_1(x - x_0). \]

Note that the structure of the term k1(x − x0) ensures f1(x0) = f(x0). By
substituting f1(x1) = f(x1), we get the coefficient k1:

\[ k_1 = \frac{f(x_1) - f(x_0)}{x_1 - x_0}, \]

hence

\[ f_1(x) = f(x_0) + \frac{f(x_1) - f(x_0)}{x_1 - x_0}(x - x_0). \]

With the same procedure, we can find the function f2 for (x0, f(x0)), (x1, f(x1)), (x2, f(x2)):

\[ f_2(x) = f_1(x) + k_2(x - x_0)(x - x_1). \]

By solving f2(x2) = f(x2), we get

\[ k_2 = \frac{\dfrac{f(x_2)-f(x_1)}{x_2-x_1} - \dfrac{f(x_1)-f(x_0)}{x_1-x_0}}{x_2-x_0}, \]

hence

\[ f_2(x) = f(x_0) + \frac{f(x_1)-f(x_0)}{x_1-x_0}(x-x_0) + \frac{\dfrac{f(x_2)-f(x_1)}{x_2-x_1} - \dfrac{f(x_1)-f(x_0)}{x_1-x_0}}{x_2-x_0}(x-x_0)(x-x_1). \]
Analysing the forms of k1 and k2, a pattern emerges, so we introduce the difference
quotient (divided difference).

Definition. The k-th order difference quotient at x0, x1, ..., xk of f(x) is defined as
follows:

\[ f[x_i, x_j] = \frac{f(x_i) - f(x_j)}{x_i - x_j}, \quad i \ne j, \]

is the first order difference quotient;

\[ f[x_i, x_j, x_k] = \frac{f[x_i, x_j] - f[x_j, x_k]}{x_i - x_k}, \quad i \ne j \ne k, \]

is the second order difference quotient. And in general,

\[ f[x_0, x_1, \cdots, x_k] = \frac{f[x_1, \cdots, x_k] - f[x_0, \cdots, x_{k-1}]}{x_k - x_0} \]

is the k-th order difference quotient of f(x).
is the k-th order difference quotient of f (x).

Hence we can use the difference quotients to define and simplify the Newton
interpolation.

Definition. The Newton interpolation of the points (x0, f(x0)), (x1, f(x1)), ..., (xn, f(xn))
is

\[ \begin{aligned} f(x) = {} & f(x_0) + f[x_0, x_1](x - x_0) + f[x_0, x_1, x_2](x - x_0)(x - x_1) + \cdots \\ & + f[x_0, x_1, \cdots, x_n](x - x_0)(x - x_1)\cdots(x - x_{n-1}). \end{aligned} \]

We can follow the table below to calculate the difference quotients in Newton
interpolation. We can also see that when a new point is added, only one new entry
per column (one new diagonal of the table) must be computed, instead of
recalculating the whole procedure; a sketch of this computation in code follows the
table.

Figure 4: Table for difference quotients
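A sketch of the divided-difference computation and the evaluation of the Newton form in Python (function names are our own):

```python
import numpy as np

def divided_differences(xs, ys):
    """Return the top edge f[x0], f[x0,x1], ..., f[x0,...,xn] of the table."""
    xs = np.asarray(xs, dtype=float)
    table = np.array(ys, dtype=float)
    coeffs = [table[0]]
    for k in range(1, len(xs)):
        # Each pass turns column k-1 of the table into column k
        table = (table[1:] - table[:-1]) / (xs[k:] - xs[:-k])
        coeffs.append(table[0])
    return coeffs

def newton_eval(xs, coeffs, x):
    """Evaluate the Newton form with Horner-like nesting."""
    result = coeffs[-1]
    for c, xi in zip(coeffs[-2::-1], xs[len(coeffs) - 2::-1]):
        result = result * (x - xi) + c
    return result

xs = [0.0, 1.0, 3.0]
ys = [1.0, 2.0, 10.0]          # samples of y = x^2 + 1
c = divided_differences(xs, ys)
print(newton_eval(xs, c, 2.0))  # 5.0, the interpolated value at x = 2
```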

Now that we have introduced two kinds of interpolation, it is necessary to point out
that these two formulas yield the same interpolation polynomial given the same
points, as will be proved in the next section.

1.4 Uniqueness and Error

1.4.1 Uniqueness

We have shown multiple methods for calculating the polynomial interpolation of a
given set of points. The uniqueness theorem shows that the result is always the
same regardless of the method used, which can be proved by linear algebra.


Consider the case of constructing the polynomial interpolation by the undetermined
coefficient method:

\[ \begin{cases} a_0 + a_1 x_0 + \cdots + a_n x_0^n = f(x_0) \\ a_0 + a_1 x_1 + \cdots + a_n x_1^n = f(x_1) \\ \qquad \vdots \\ a_0 + a_1 x_n + \cdots + a_n x_n^n = f(x_n). \end{cases} \]

Usually, we solve a system of equations by converting it to matrix form, so the
Vandermonde matrix is introduced.

Definition. The (n + 1)-th order Vandermonde matrix is

\[ V = \begin{pmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^n \\ 1 & x_1 & x_1^2 & \cdots & x_1^n \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^n \end{pmatrix}. \]

Then the polynomial can be written in matrix form, i.e.,

\[ \begin{pmatrix} f(x_0) \\ f(x_1) \\ \vdots \\ f(x_n) \end{pmatrix} = V \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{pmatrix}. \]

We know from linear algebra that the system of equations has a unique solution if
and only if the determinant of the Vandermonde matrix is non-zero. When xi ≠ xj
for i ≠ j, we have

\[ |V| = \prod_{0 \le i < j \le n} (x_j - x_i) \ne 0, \]

which means that when the xi, i = 0, 1, ..., n, are distinct points, the interpolation
polynomial exists and is unique.
Hence we can conclude that the results of the undetermined coefficient method,
Lagrange interpolation, and Newton interpolation are the same. A small numerical
check of this equivalence is sketched below.
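A sketch comparing the Vandermonde (undetermined coefficient) solve with a direct Lagrange evaluation (numpy assumed; data values are illustrative):

```python
import numpy as np

xs = np.array([0.0, 1.0, 2.0, 4.0])
ys = np.array([1.0, 3.0, 2.0, 5.0])

# Undetermined coefficients: solve V a = y for the monomial coefficients
V = np.vander(xs, increasing=True)   # rows [1, x_i, x_i^2, x_i^3]
a = np.linalg.solve(V, ys)

# Lagrange form evaluated at a test point
def lagrange_at(x):
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        others = np.delete(xs, i)
        total += yi * np.prod((x - others) / (xi - others))
    return total

x_test = 3.0
# Both methods give the same value, as the uniqueness theorem predicts
print(np.polyval(a[::-1], x_test), lagrange_at(x_test))
```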

1.4.2 Error

Definition. Let f(x) be the function passing through the n + 1 points (x0, y0),
(x1, y1), ..., (xn, yn), and let Pn(x) be the n-th order interpolation; then the n-th
order error is expressed as Rn(x), and

\[ R_n(x) = f(x) - P_n(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}\, \omega_{n+1}(x), \]

where

\[ \omega_{n+1}(x) = \prod_{j=0}^{n} (x - x_j). \]

Note: This error applies to both Lagrange and Newton interpolation, since they
yield the same expression, by the uniqueness theorem proved above.


2 Neville's Algorithm: Stepwise Linear Interpolation

2.1 Main Idea

If we know two points, interpolation amounts to finding a point on the straight line
connecting them (note: that is the reason we call it 'linear'). It can be verified that
the following formula is the line from (x0, y0) to (x1, y1) (x0 ≠ x1):

\[ P_{0,1}(x) = \frac{x - x_1}{x_0 - x_1}\, y_0 + \frac{x - x_0}{x_1 - x_0}\, y_1. \]

Then we consider the case of three points. We interpolate one line through (x0, y0)
and (x1, y1) as above, and another through (x1, y1) and (x2, y2):

\[ P_{1,2}(x) = \frac{x - x_2}{x_1 - x_2}\, y_1 + \frac{x - x_1}{x_2 - x_1}\, y_2. \]

We then treat P0,1(x) and P1,2(x) themselves as values to be linearly interpolated,
using the outer nodes x0 and x2:

\[ P_{0,1,2}(x) = \frac{x - x_2}{x_0 - x_2}\, P_{0,1}(x) + \frac{x - x_0}{x_2 - x_0}\, P_{1,2}(x). \]

By expansion we get

\[ P_{0,1,2}(x) = \frac{(x - x_1)(x - x_2)}{(x_0 - x_1)(x_0 - x_2)}\, y_0 + \frac{(x - x_0)(x - x_2)}{(x_1 - x_0)(x_1 - x_2)}\, y_1 + \frac{(x - x_0)(x - x_1)}{(x_2 - x_0)(x_2 - x_1)}\, y_2, \]

so we can find the recursion:

\[ P_{0,1,\cdots,n}(x) = \frac{x - x_0}{x_n - x_0}\, P_{1,2,\cdots,n}(x) + \frac{x - x_n}{x_0 - x_n}\, P_{0,1,\cdots,n-1}(x). \]

We can generalize the steps as shown in the following figure:

Figure 5: Recursion for Neville's algorithm

We can interpolate step by step; that is the reason we call it 'stepwise'. Hence we
have the following theorem.
Theorem 2.1 (L. & D. (2011)). Let a function f be defined at the points x0, x1, ..., xk,
and let xj and xi be two distinct numbers in this set. Then the polynomial

\[ P(x) = \frac{(x - x_j)\, P_{0,1,\cdots,j-1,j+1,\cdots,k}(x) - (x - x_i)\, P_{0,1,\cdots,i-1,i+1,\cdots,k}(x)}{x_i - x_j} \]

is the k-th Lagrange polynomial that interpolates f at the k + 1 points x0, x1, ..., xk.


The polynomials P0,1,...,j−1,j+1,...,k and P0,1,...,i−1,i+1,...,k are often denoted Q̂ and Q,
respectively, for ease of notation:

\[ P(x) = \frac{(x - x_j)\, \hat{Q}(x) - (x - x_i)\, Q(x)}{x_i - x_j}. \]

The procedure that uses the result of Theorem 2.1 to recursively generate
interpolating polynomial approximations is called Neville's method. (L. & D. 2011)
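A sketch of Neville's method in Python; the tableau layout mirrors Figure 5 (names are our own):

```python
def neville(xs, ys, x):
    """Evaluate the interpolating polynomial at x via Neville's tableau."""
    n = len(xs)
    Q = [list(ys)]  # Q[k][i] interpolates nodes i, ..., i+k
    for k in range(1, n):
        row = []
        for i in range(n - k):
            # Combine two overlapping lower-degree interpolants (Theorem 2.1)
            num = (x - xs[i]) * Q[k - 1][i + 1] - (x - xs[i + k]) * Q[k - 1][i]
            row.append(num / (xs[i + k] - xs[i]))
        Q.append(row)
    return Q[n - 1][0]

xs = [1.0, 2.0, 4.0]
ys = [1.0, 4.0, 16.0]        # samples of y = x^2
print(neville(xs, ys, 3.0))  # 9.0 up to rounding
```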

2.2 Advantages and Disadvantages

2.2.1 Advantages

The advantage of Neville's interpolation is that, if the data are arranged in order of
closeness to the interpolated point, none of the work performed to obtain a
specific-degree result must be redone to evaluate the next higher-degree result.
(Rexhepi et al. 2018)

2.2.2 Disadvantages

The disadvantage of Neville's interpolation is that all of the work must be redone
for each new value of x. The amount of work is essentially the same as for a
Lagrange polynomial. The divided difference polynomial minimizes these
disadvantages. (Rexhepi et al. 2018)


3 Hermite Interpolation: About Derivatives

3.1 Main Idea

We have introduced some methods of interpolation, which usually concern only the
values at the points, say, y_i. However, in practice the derivatives at the points,
say, y_i', are also important, since they represent rates of change and we do not
wish to discard this information. That is the reason we introduce the Hermite
interpolation, which requires only one more rule than the Lagrange interpolation:
the interpolation function H(x) must also match the given derivatives, i.e.,

\[ H'(x_i) = y_i', \quad i = 0, 1, \cdots, n, \]

where n is the number of conditions minus 1, considering we start from 0.


The Hermite interpolation shares the main idea of the Lagrange interpolation:
constructing 'basis functions'. Without loss of generality, we discuss the two-point,
third-order case of the Hermite interpolation. Usually, an n-th order interpolation
requires n + 1 distinct points, but for the Hermite interpolation we need only about
half as many points, since we have twice as many conditions as before thanks to
the additional derivative conditions. Suppose we have two points (x0, y0), (x1, y1);
the function H3(x) we construct needs to satisfy the following conditions:

\[ H_3(x_i) = y_i, \qquad H_3'(x_i) = y_i', \]

where i = 0, 1.
Just using the undetermined coefficient method, let H3(x) = ax^3 + bx^2 + cx + d;
this would be hard to calculate and not easy to generalize to higher orders, so we
introduce the method of 'basis functions': take four basis functions α0(x), α1(x),
β0(x), β1(x), which satisfy

\[
\begin{cases} \alpha_0(x_0) = 1, \\ \alpha_0(x_1) = 0, \\ \alpha_0'(x_0) = 0, \\ \alpha_0'(x_1) = 0, \end{cases}
\qquad
\begin{cases} \alpha_1(x_0) = 0, \\ \alpha_1(x_1) = 1, \\ \alpha_1'(x_0) = 0, \\ \alpha_1'(x_1) = 0, \end{cases}
\qquad
\begin{cases} \beta_0(x_0) = 0, \\ \beta_0(x_1) = 0, \\ \beta_0'(x_0) = 1, \\ \beta_0'(x_1) = 0, \end{cases}
\qquad
\begin{cases} \beta_1(x_0) = 0, \\ \beta_1(x_1) = 0, \\ \beta_1'(x_0) = 0, \\ \beta_1'(x_1) = 1. \end{cases}
\]

Note that α0(x), α1(x), β0(x), β1(x) are all third-order functions. Let
H3(x) = y0 α0(x) + y1 α1(x) + y0' β0(x) + y1' β1(x), so that H3 is a polynomial
interpolation of degree at most three satisfying all the value and derivative
conditions.
The remaining problem is how to find the basis functions. First, we try to find
α0(x). From the given conditions we have

\[ \alpha_0(x_1) = 0, \quad \alpha_0'(x_1) = 0 \ \Rightarrow\ \alpha_0(x) = [a + b(x - x_0)]\,(x - x_1)^2, \]

and then

\[ \alpha_0(x_0) = 1 \ \Rightarrow\ a = \frac{1}{(x_0 - x_1)^2}, \qquad \alpha_0'(x_0) = 0 \ \Rightarrow\ b = \frac{2}{(x_1 - x_0)(x_0 - x_1)^2}. \]

From all of the above, we get

\[ \alpha_0(x) = \left( 1 + 2\,\frac{x - x_0}{x_1 - x_0} \right) \left( \frac{x - x_1}{x_0 - x_1} \right)^2. \]
Similarly, we get

\[ \alpha_1(x) = \left( 1 + 2\,\frac{x - x_1}{x_0 - x_1} \right) \left( \frac{x - x_0}{x_1 - x_0} \right)^2, \qquad \beta_0(x) = (x - x_0) \left( \frac{x - x_1}{x_0 - x_1} \right)^2, \qquad \beta_1(x) = (x - x_1) \left( \frac{x - x_0}{x_1 - x_0} \right)^2. \]
So the third-order Hermite interpolation can be expressed as

\[ \begin{aligned} H_3(x) = {} & \left( 1 + 2\,\frac{x - x_0}{x_1 - x_0} \right) \left( \frac{x - x_1}{x_0 - x_1} \right)^2 y_0 + (x - x_0) \left( \frac{x - x_1}{x_0 - x_1} \right)^2 y_0' \\ & + \left( 1 + 2\,\frac{x - x_1}{x_0 - x_1} \right) \left( \frac{x - x_0}{x_1 - x_0} \right)^2 y_1 + (x - x_1) \left( \frac{x - x_0}{x_1 - x_0} \right)^2 y_1'. \end{aligned} \]

3.2 Standardization

Definition (L. & D. (2011)). If f ∈ C^1[a, b] and x0, ..., xn ∈ [a, b] are distinct, the
unique polynomial of least degree agreeing with f and f' at x0, ..., xn is the Hermite
polynomial of degree at most 2n + 1, given by

\[ H_{2n+1}(x) = \sum_{j=0}^{n} f(x_j)\, H_{n,j}(x) + \sum_{j=0}^{n} f'(x_j)\, \hat{H}_{n,j}(x), \]

where, with Ln,j(x) denoting the j-th Lagrange coefficient polynomial of degree n,

\[ H_{n,j}(x) = \left[ 1 - 2(x - x_j)\, L_{n,j}'(x_j) \right] L_{n,j}^2(x) \qquad \text{and} \qquad \hat{H}_{n,j}(x) = (x - x_j)\, L_{n,j}^2(x). \]
 

3.3 Error

Definition (Phillips & Taylor (1996)). If H2n+1(x) is the Hermite interpolation
polynomial of f(x) at the points x0, x1, ..., xn ∈ [a, b], then the (2n + 1)-th order
error is expressed as R2n+1(x), and

\[ R_{2n+1}(x) = f(x) - H_{2n+1}(x) = \frac{f^{(2n+2)}(\xi_x)}{(2n+2)!}\, \omega^2(x) \]

for some ξx ∈ (a, b), where

\[ \omega(x) = \prod_{i=0}^{n} (x - x_i). \]


4 About Runge's Phenomenon: Cubic Splines and Chebyshev Nodes

We have shown above that given n + 1 points (x0, y0), (x1, y1), ..., (xn, yn), we can
obtain an n-th order polynomial by any of the interpolation methods. However,
problems are encountered when n is large.

4.1 Runge’s Phenomenon

In summary, Runge's phenomenon is an oscillation problem that occurs at the
edges of an interval when polynomial interpolation of high degree is used over a set
of equispaced interpolation points: when the order of the polynomial is too high,
the error grows large and may even diverge to infinity.

Figure 6: Demonstration of Runge's phenomenon for y = 1/(1 + 25x^2), x ∈ [−1, 1]

4.1.1 Proof

The Weierstrass approximation theorem states that for every continuous function
f (x) defined on an interval [a, b], there exists a set of polynomial functions Pn (x)
for n = 0, 1, 2, · · · , each of degree at most n, that approximates f (x) with uniform
convergence over [a, b] as n tends to infinity, that is,
 
\[ \lim_{n \to \infty} \left( \max_{a \le x \le b} |f(x) - P_n(x)| \right) = 0. \]

Consider the case where one desires to interpolate through n + 1 equispaced points
of a function f (x) using the n-degree polynomial Pn (x) that passes through those
points. Naturally, one might expect from Weierstrass’ theorem that using more points


would lead to a more accurate reconstruction of f (x). However, this particular set
of polynomial functions Pn (x) is not guaranteed to have the property of uniform
convergence; the theorem only states that a set of polynomial functions exists, without
providing a general method of finding one.
The Pn (x) produced in this manner may diverge away from f (x) as n increases;
this typically occurs in an oscillating pattern that magnifies near the ends of the
interpolation points. This phenomenon is attributed to Runge. (Epperson 1988)
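A short numerical demonstration of this divergence for the Runge function, as a Python sketch (numpy assumed; `np.polyfit` through exactly n + 1 points recovers the degree-n interpolant):

```python
import numpy as np

f = lambda x: 1.0 / (1.0 + 25.0 * x**2)   # Runge's function
xs_fine = np.linspace(-1, 1, 2001)

for n in (5, 10, 15, 20):
    nodes = np.linspace(-1, 1, n + 1)        # equispaced nodes
    coeffs = np.polyfit(nodes, f(nodes), n)  # degree-n interpolant
    err = np.max(np.abs(f(xs_fine) - np.polyval(coeffs, xs_fine)))
    # The max error grows with n (numpy may warn about conditioning
    # for the largest n; that is itself a symptom of the problem)
    print(f"n = {n:2d}, max error = {err:.3f}")
```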

4.2 Solution 1: Piecewise Polynomials

Instead of using the n-th order interpolation polynomial considering n is too large,
we would use multiple polynomials among small intervals, one common application
is the Cubic spline, which is to use multi 3-order polynomials and combine them to
the final interpolation formula instead.

Definition. (L. & D. 2011) Given a function f defined on [a, b] and a set of nodes
a = x0 < x1 < · · · < xn = b, a cubic spline interpolant S for f is a function that
satisfies the following conditions:
(a) S(x) is a cubic polynomial, denoted Sj(x), on the subinterval [xj, xj+1] for each
j = 0, 1, . . . , n − 1;
(b) Sj(xj) = f(xj) and Sj(xj+1) = f(xj+1) for each j = 0, 1, . . . , n − 1;
(c) Sj+1(xj+1) = Sj(xj+1) for each j = 0, 1, . . . , n − 2 (implied by (b));
(d) S'j+1(xj+1) = S'j(xj+1) for each j = 0, 1, . . . , n − 2;
(e) S''j+1(xj+1) = S''j(xj+1) for each j = 0, 1, . . . , n − 2;
(f) One of the following sets of boundary conditions is satisfied:
(i) S''(x0) = S''(xn) = 0 (natural (or free) boundary);
(ii) S'(x0) = f'(x0) and S'(xn) = f'(xn) (clamped boundary).

When the free boundary conditions occur, the spline is called a natural spline
(L. & D. 2011). Note that the natural spline is the most commonly used spline.

Theorem 4.1. (L. & D. 2011) If f is defined at a = x0 < x1 < · · · < xn = b, then f
has a unique natural spline interpolant S on the nodes x0 , x1 , . . . , xn ; that is, a spline
interpolant that satisfies the natural boundary conditions S 00 (a) = 0 and S 00 (b) = 0.

4.2.1 Construction

The conditions stated in the definition above yield a system of 4n equations for the
4n unknown coefficients, which we solve algorithmically on a computer when n is
large. Here follows the algorithm for calculating the natural cubic spline; note that
its time complexity is O(n).


Algorithm 1 Algorithm for natural cubic spline (L. & D. 2011)

Input: n; x0, x1, . . . , xn; a0 = f(x0), a1 = f(x1), . . . , an = f(xn)
Output: aj, bj, cj, dj for j = 0, 1, . . . , n − 1
1: for i = 0, 1, . . . , n − 1 do
2:   set hi = xi+1 − xi.
3: for i = 1, 2, . . . , n − 1 do
4:   set αi = (3/hi)(ai+1 − ai) − (3/hi−1)(ai − ai−1).
5: Set l0 = 1;
6:   µ0 = 0;
7:   z0 = 0.
8: for i = 1, 2, . . . , n − 1 do
9:   set li = 2(xi+1 − xi−1) − hi−1 µi−1;
10:  µi = hi / li;
11:  zi = (αi − hi−1 zi−1) / li.
12: Set ln = 1;
13:   zn = 0;
14:   cn = 0.
15: for j = n − 1, n − 2, . . . , 0 do
16:   set cj = zj − µj cj+1;
17:   bj = (aj+1 − aj)/hj − hj(cj+1 + 2cj)/3;
18:   dj = (cj+1 − cj)/(3hj).
19: return aj, bj, cj, dj for j = 0, 1, . . . , n − 1
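A direct Python transcription of Algorithm 1, as a sketch (on each subinterval the spline is S_j(x) = a_j + b_j(x − x_j) + c_j(x − x_j)^2 + d_j(x − x_j)^3; the sample data are our own):

```python
import numpy as np

def natural_cubic_spline(x, a):
    """Coefficients (a_j, b_j, c_j, d_j) of the natural cubic spline."""
    n = len(x) - 1
    h = np.diff(x)                       # steps 1-2
    alpha = np.zeros(n + 1)
    for i in range(1, n):                # steps 3-4
        alpha[i] = 3/h[i] * (a[i+1] - a[i]) - 3/h[i-1] * (a[i] - a[i-1])

    # Forward sweep of the tridiagonal system (steps 5-14)
    l, mu, z = np.ones(n + 1), np.zeros(n + 1), np.zeros(n + 1)
    for i in range(1, n):
        l[i] = 2 * (x[i+1] - x[i-1]) - h[i-1] * mu[i-1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i-1] * z[i-1]) / l[i]

    # Back substitution (steps 15-18); c[n] = 0 enforces the natural boundary
    b, c, d = np.zeros(n), np.zeros(n + 1), np.zeros(n)
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j+1]
        b[j] = (a[j+1] - a[j]) / h[j] - h[j] * (c[j+1] + 2 * c[j]) / 3
        d[j] = (c[j+1] - c[j]) / (3 * h[j])
    return a[:-1], b, c[:-1], d

x = np.array([0.0, 1.0, 2.0, 3.0])
a = np.exp(x)   # sample f(x) = e^x at the nodes
print(natural_cubic_spline(x, a))
```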

4.3 Solution 2: Chebyshev Nodes

We have stated that Runge's phenomenon happens when we choose a set of
equispaced interpolation points, which means it can be avoided by selecting
non-equispaced points, for example, Chebyshev nodes.

Definition. Let z = e^{iθ} be a point on the unit circle. The associated x coordinate
is x = cos θ, or θ = cos^{−1} x, where x ∈ [−1, 1]. Define the n-th degree Chebyshev
polynomial to be Tn(x) = cos nθ. The Chebyshev nodes x0, x1, ..., xn are the roots
of the Chebyshev polynomial Tn+1.

The reason why Chebyshev nodes work originates from the error between the
generating function and the interpolating polynomial of order n, which is given by

\[ f(x) - P_n(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!} \prod_{i=0}^{n} (x - x_i) \]

for some ξ in (−1, 1). Thus,

\[ \max_{-1 \le x \le 1} |f(x) - P_n(x)| \le \max_{-1 \le x \le 1} \frac{\left| f^{(n+1)}(x) \right|}{(n+1)!} \cdot \max_{-1 \le x \le 1} \prod_{i=0}^{n} |x - x_i|. \]


For the case of the Runge function interpolated at equidistant points, each of the
two factors in the upper bound for the approximation error grows to infinity with
n. Since f^{(n+1)}(x)/(n+1)! can be regarded as fixed, we need to optimize
∏_{i=0}^{n} |x − x_i| instead. If we choose Chebyshev nodes x0, ..., xn, then

\[ (x - x_0)(x - x_1) \cdots (x - x_n) = \frac{T_{n+1}(x)}{2^n}, \]

where

\[ x_k = \cos\left( \frac{(2k+1)\pi}{2(n+1)} \right), \quad k = 0, 1, \ldots, n. \]

Note that Tn+1(x)/2^n has the smallest ‖·‖∞ value over the interval [−1, 1] among
monic polynomials of degree n + 1. We can prove this by assuming there exists a
monic polynomial qn+1 whose ‖·‖∞ value is smaller than that of Tn+1/2^n. Now
|Tn+1/2^n| attains its maximum value 2^{−n}, with alternating signs, at n + 2 points
within [−1, 1]. By assumption, |qn+1(x)| < 1/2^n at each of these n + 2 extreme
points. Thus

\[ D(x) = \frac{T_{n+1}(x)}{2^n} - q_{n+1}(x) \]

is a polynomial of degree ≤ n (the leading terms cancel, since both polynomials are
monic of degree n + 1) and has the same sign as Tn+1/2^n at each of the n + 2
extreme points. Hence D(x) must change sign n + 1 times on [−1, 1], which is
impossible for a non-zero polynomial of degree ≤ n; this yields a contradiction. So
we get better results using Chebyshev nodes rather than equispaced nodes.
(Peirce n.d.)
Note that Chebyshev nodes lie only in [−1, 1], but we can rescale any interval to
[−1, 1] to achieve universality (L. & D. 2011).

Theorem 4.2. (Peirce n.d.) For t ∈ [−1, 1], let

\[ x = \frac{t(b - a) + a + b}{2}; \]

then we can rescale any interval [a, b] to [−1, 1].
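A sketch comparing equispaced and Chebyshev nodes on the Runge function (numpy assumed; the rescaling of Theorem 4.2 is included so the node generator works on a general interval):

```python
import numpy as np

f = lambda x: 1.0 / (1.0 + 25.0 * x**2)

def chebyshev_nodes(n, a=-1.0, b=1.0):
    """Roots of T_{n+1}, rescaled from [-1, 1] to [a, b] (Theorem 4.2)."""
    k = np.arange(n + 1)
    t = np.cos((2 * k + 1) * np.pi / (2 * (n + 1)))
    return (t * (b - a) + a + b) / 2

n = 15
xs_fine = np.linspace(-1, 1, 2001)
for name, nodes in [("equispaced", np.linspace(-1, 1, n + 1)),
                    ("Chebyshev", chebyshev_nodes(n))]:
    p = np.polyfit(nodes, f(nodes), n)
    err = np.max(np.abs(f(xs_fine) - np.polyval(p, xs_fine)))
    print(f"{name:10s} max error: {err:.4f}")  # Chebyshev is far smaller
```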


5 Data Fitting

5.1 Key Difference

In interpolation, we construct a curve through the data points. In doing so, we make
the implicit assumption that the data points are accurate and distinct. Data fitting
is applied to data that contain scatter, like noise, usually due to measurement errors.
Here we want to find a smooth curve that approximates the data in some sense. Thus
the curve does not necessarily hit the data points. (Kiusalaas 2013)

Figure 7: Comparison of interpolation and data fitting

5.2 Least Squares Method

5.2.1 Main Idea

The least squares method is introduced to determine the parameters of a curve
such that the sum of the squared distances from the data points to the curve
achieves its minimum. We consider an example here. Suppose we have a series of
data points as follows:

i 1 2 3 4 5 6
xi −9.19 −5.26 −1.39 6.71 4.7 2.66
yi −8.01 6.78 −1.47 4.71 4.1 4.23

Table 1: A table of data points

Now we want to find a line that satisfies the least squares condition.
For convenience, we consider only 3 points: A1: (−9.19, −8.01), A2: (−5.26, 6.78),
A3: (−1.39, −1.47). Denote the (vertical) distances from the points A1, A2, A3 to
the line y = ax + b as d1, d2, d3 respectively. We want to minimize

\[ D^2 = d_1^2 + d_2^2 + d_3^2. \]


We get

\[ \begin{cases} d_1 = -9.19a + b + 8.01, \\ d_2 = -5.26a + b - 6.78, \\ d_3 = -1.39a + b + 1.47. \end{cases} \]

Hence

\[ D^2 = (-9.19a + b + 8.01)^2 + (-5.26a + b - 6.78)^2 + (-1.39a + b + 1.47)^2. \]

With the knowledge of calculus, we can obtain the values of a, b at which D^2
achieves its minimum by setting its derivatives to zero. The result for the
coefficients is

\[ a \approx 0.958, \qquad b \approx -0.584. \]

So the fitting line given by the least squares method for our example is

\[ y = 0.958x - 0.584. \]

Figure 8: The fitting line
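The same computation can be scripted; a sketch with numpy follows (`np.polyfit` with degree 1 performs exactly this least squares line fit; note that the fitted coefficients depend on which of the data points are included):

```python
import numpy as np

x = np.array([-9.19, -5.26, -1.39])
y = np.array([-8.01, 6.78, -1.47])

# Minimize D^2 = sum (a*x_i + b - y_i)^2 over a, b
a, b = np.polyfit(x, y, 1)
print(f"fitting line: y = {a:.3f}x + {b:.3f}")

# Equivalent normal-equations form, previewing Section 5.3
A = np.column_stack([x, np.ones_like(x)])
coef = np.linalg.solve(A.T @ A, A.T @ y)
print(coef)  # same (a, b)
```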

5.2.2 Advantages

The least squares method is the most convenient procedure for determining best linear
approximations, but there are also important theoretical considerations that favour
it. The minimax approach generally assigns too much weight to a bit of data that is
badly in error, whereas the absolute deviation method does not give sufficient weight
to a point that is considerably out of line with the approximation. The least squares
approach puts substantially more weight on a point that is out of line with the rest of
the data, but will not permit that point to completely dominate the approximation.
An additional reason for considering the least squares approach involves the study
of the statistical distribution of error. (J. 1982)


5.2.3 Some Other Forms

Sometimes we assume that the data follow an exponential relationship, which
requires us to approximate a form

\[ y = b e^{ax} \qquad \text{or} \qquad y = b x^a \]

for constants a, b. Then we need to minimize

\[ E = \sum_{i=1}^{n} (y_i - b e^{a x_i})^2 \qquad \text{or} \qquad E = \sum_{i=1}^{n} (y_i - b x_i^a)^2, \]

respectively.
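Minimizing E directly is a nonlinear problem; a common practical shortcut (not identical to minimizing E itself) is to take logarithms, ln y = ln b + ax, and fit a line to (x_i, ln y_i). A sketch with illustrative synthetic data:

```python
import numpy as np

# Noisy samples of y = 2 e^{0.5 x} (illustrative data)
rng = np.random.default_rng(0)
x = np.linspace(0.1, 4.0, 20)
y = 2.0 * np.exp(0.5 * x) * (1 + 0.01 * rng.standard_normal(20))

# Fit ln y = ln b + a x by linear least squares
a, log_b = np.polyfit(x, np.log(y), 1)
print(f"a ≈ {a:.3f}, b ≈ {np.exp(log_b):.3f}")  # close to 0.5 and 2
```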

5.2.4 Standardization

Given a function f(x; α1, α2, . . . , αn) with parameters α1, . . . , αn, and N distinct
data points (x1, y1), (x2, y2), . . . , (xN, yN), we give the definition of residuals:

Definition. The residual ri for the function f(x; α1, α2, . . . , αn) at the data point
(xi, yi) is

\[ r_i = f(x_i; \alpha_1, \alpha_2, \ldots, \alpha_n) - y_i, \quad i = 1, 2, \ldots, N. \]

The aim is to minimize the sum of squared residuals, that is,

\[ \min \sum_{i=1}^{N} r_i^2 = \min \sum_{i=1}^{N} \left[ f(x_i; \alpha_1, \alpha_2, \cdots, \alpha_n) - y_i \right]^2, \]

which simplifies to the normal equations; these are discussed in the next part.

5.3 Normal Equations

5.3.1 Main Idea

Suppose we have m samples, each sample having n continuous features and a label
value y. We need to find the linear relationship between the features and the label
values. We can define the following cost function:

Definition. The cost function for m samples is

\[ J(\theta_0, \theta_1, \ldots, \theta_n) = (h_\theta(x_1) - y_1)^2 + (h_\theta(x_2) - y_2)^2 + \cdots + (h_\theta(x_m) - y_m)^2 = \sum_{i=1}^{m} \left[ h_\theta(x_i) - y_i \right]^2. \]


Note that
• yi is the true value of the i-th training sample;
• hθ(xi) is the prediction for the i-th training sample;
• θ are the coefficients in the loss function.
Looking at the form of J(θ), we can relate it to the D^2 we discussed before. We
want to find the coefficient set Θ = (θ0, θ1, . . . , θn) which minimizes the value of
the loss function. We just need to set the partial derivatives to zero:

\[ \frac{\partial}{\partial \theta_j} J(\Theta) = 0, \quad j = 0, 1, \cdots, n. \]

By solving this we get, in matrix form,

\[ \Theta = \left( \mathbf{X}^T \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{Y}. \]
A detailed proof and calculation are given later. Now we look at an example of
loan records in a bank, where the x's represent different features and y represents
the loan value:

x0   Income (x1)   Age (x2)   Credit level (x3)   Value (y)
1    3000          33         5                   4500
1    6500          30         2                   12000
1    3500          31         4                   8500
1    2800          28         5                   4000

Table 2: Example: loan record in a bank

Then we get

\[ \mathbf{X} = \begin{pmatrix} 1 & 3000 & 33 & 5 \\ 1 & 6500 & 30 & 2 \\ 1 & 3500 & 31 & 4 \\ 1 & 2800 & 28 & 5 \end{pmatrix}, \qquad \mathbf{y} = \begin{pmatrix} 4500 \\ 12000 \\ 8500 \\ 4000 \end{pmatrix}. \]

By the method of normal equations, we can calculate

\[ \Theta = \left( \mathbf{X}^T \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{y}, \]

that is,

\[ \Theta = \left( \begin{pmatrix} 1 & 1 & 1 & 1 \\ 3000 & 6500 & 3500 & 2800 \\ 33 & 30 & 31 & 28 \\ 5 & 2 & 4 & 5 \end{pmatrix} \begin{pmatrix} 1 & 3000 & 33 & 5 \\ 1 & 6500 & 30 & 2 \\ 1 & 3500 & 31 & 4 \\ 1 & 2800 & 28 & 5 \end{pmatrix} \right)^{-1} \begin{pmatrix} 1 & 1 & 1 & 1 \\ 3000 & 6500 & 3500 & 2800 \\ 33 & 30 & 31 & 28 \\ 5 & 2 & 4 & 5 \end{pmatrix} \begin{pmatrix} 4500 \\ 12000 \\ 8500 \\ 4000 \end{pmatrix}. \]

Hence the coefficient vector for X, i.e., Θ, is

\[ \Theta = \begin{pmatrix} -22921 \\ 5 \\ -210 \\ 2764 \end{pmatrix}. \]
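This computation is a few lines in NumPy; a sketch follows (for numerical stability, solving the linear system is preferred over forming the explicit inverse):

```python
import numpy as np

X = np.array([[1, 3000, 33, 5],
              [1, 6500, 30, 2],
              [1, 3500, 31, 4],
              [1, 2800, 28, 5]], dtype=float)
y = np.array([4500, 12000, 8500, 4000], dtype=float)

# Normal equations: (X^T X) Theta = X^T y
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)
```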


5.3.2 Standardization

We will give several definitions, which will help us standardize the whole process.

Definition. The cost function of multi-variable regression is

\[ J(\theta_0, \theta_1, \cdots, \theta_n) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2. \]

Definition. The prediction function hθ(x) is defined as

\[ h_\theta(x) = \Theta^T X = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n. \]

The normal equation finds the parameters that minimize the cost function by
solving

\[ \frac{\partial}{\partial \theta_j} J(\Theta) = 0. \]

Definition. For m training samples, each having n features, the dataset is defined
as

\[ \mathbf{X} = \begin{pmatrix} x_0^{(1)} & \cdots & x_n^{(1)} \\ \vdots & \ddots & \vdots \\ x_0^{(m)} & \cdots & x_n^{(m)} \end{pmatrix}, \]

where x_j^{(i)} denotes the j-th feature of the i-th sample.

Definition. The coefficient vector is defined as

\[ \Theta = \left[ \theta_0, \theta_1, \cdots, \theta_n \right]^T. \]

Definition. The output variable is defined as

\[ \mathbf{Y} = \left[ y^{(1)}, y^{(2)}, \cdots, y^{(m)} \right]^T. \]


With the definitions above, we can express the cost function in matrix form.

Definition. The cost function is defined as

\[ J(\theta_0, \theta_1, \cdots, \theta_n) = \frac{1}{2m} (\mathbf{X}\Theta - \mathbf{Y})^T (\mathbf{X}\Theta - \mathbf{Y}) = \frac{1}{2m} \left( \mathbf{Y}^T\mathbf{Y} - \mathbf{Y}^T\mathbf{X}\Theta - \Theta^T\mathbf{X}^T\mathbf{Y} + \Theta^T\mathbf{X}^T\mathbf{X}\Theta \right). \]

Taking the derivative, we get

\[ J'(\Theta) = \frac{1}{2m} \left[ \frac{\partial\, \mathbf{Y}^T\mathbf{Y}}{\partial \Theta} - \frac{\partial\, \mathbf{Y}^T\mathbf{X}\Theta}{\partial \Theta} - \frac{\partial\, \Theta^T\mathbf{X}^T\mathbf{Y}}{\partial \Theta} + \frac{\partial\, \Theta^T\mathbf{X}^T\mathbf{X}\Theta}{\partial \Theta} \right]. \]

Simplifying and setting it to zero, we have

\[ \frac{1}{2m} \left( -2\mathbf{X}^T\mathbf{Y} + 2\mathbf{X}^T\mathbf{X}\Theta \right) = 0. \]


Hence the coefficients can be derived as

\[ \Theta = \left( \mathbf{X}^T\mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{Y}. \]

This completes the proof.


References

Epperson, J. (1988), 'On the Runge example', The American Mathematical Monthly 94, 329–341.

J., L. H. (1982), Introduction to Probability Theory and Statistical Inference, John Wiley and Sons, New York, United States of America.

Kiusalaas, J. (2013), Numerical Methods in Engineering with Python, Cambridge University Press.

L., B. R. & D., F. J. (2011), Numerical Analysis (9th ed.), Brooks Cole, Boston, MA, United States of America.

Peirce, A. (n.d.), 'Lecture notes on variational and approximate methods in applied mathematics', p. 3. Accessed: 2021-03-01.

Phillips, G. M. & Taylor, P. J. (1996), Theory and Applications of Numerical Analysis, Academic Press, Cambridge, Massachusetts, United States of America.

Rexhepi, S., Iseni, E., Shaini, B. I. & Zenku, T. (2018), 'Some notes on Neville's algorithm of interpolation with applications to trigonometric interpolation', MathLAB Journal 1(3), 302–313.

