APM2613_Lesson 5_0_2023
Numerical Analysis I
APM2613
Year Module
1 Lesson 5: Polynomial Interpolation and Approximation
1.1 Objectives
The objectives of this Lesson are:
• To highlight further useful theory and properties of functions;
2. To compute the actual error incurred in using the interpolating polynomial approximations at
a given point;
4. To use the following functions to approximate a function whose values are tabulated at certain
points in the least-squares sense:
1.3 Introduction
It is common for data to be generated from experiments, observations or similar activities, in the form of tabulated points $(x_0, f(x_0)), (x_1, f(x_1)), \ldots, (x_n, f(x_n))$. Since functions are used to summarise relationships between such points, it is desirable to be able to estimate values that are not in the collected data. The function $f(x)$ itself is normally not known. Two contexts in which other functions are used to approximate a function represented by data are discussed: interpolation and least-squares approximation. Owing to their convenient properties of continuity, differentiability and integrability, polynomials are once again popular first candidates in the
APM2613/LN05/0/2022
Interpolation is discussed first (Chapter 3 in the textbook). This technique is about finding a func-
tional expression that approximates the true function whose data are given and require that the
approximating function coincides with the data points or tabulated points. Hence the main focus
of interpolation is to find polynomials of suitable degree that interpolate the given data. Different
forms of polynomials given different names are presented in the textbook, but the bottom line is
that they are just different ways of expressing the same polynomial. The fundamental theorem on
which all the formulation is based is the Weierstrass Approximation Theorem (Theorem 3.1).
Least squares approximation (Chapter 8.1), on the other hand, relaxes the requirement that the approximating polynomial equal the tabulated function values. It requires only that the approximating function minimise the total squared deviation from the tabulated points (the least-squares sense). Here the approximating function is extended to classes other than polynomials.
The chapter begins with the Lagrange form of the interpolating polynomial, then tackles Neville's method and the divided-difference forms. All of these are standard forms of the interpolating polynomial.
Section 3.4 onwards presents the so called osculating polynomials. These are characterised by
the fact that they do not only coincide with the function f (x) at the nodes x0 , x1 , . . . , xn but also have
the derivatives up to order less than or equal to n that coincide with the same order derivatives of
the function.
Although several data points would normally be used to construct interpolating functions (polynomials), the basic building block of polynomials is the linear polynomial. We need only two points to define a line, so only two of the data points are needed. Without going into the implications of using only two points out of the many, let us start with the linear polynomial to illustrate the notion of polynomial interpolation.
First we consider passing a first degree polynomial (or straight line) through any two successive tabulated points $(x_k, f(x_k))$ and $(x_{k+1}, f(x_{k+1}))$ to approximate $f(\hat{x})$, $\hat{x} \in [x_k, x_{k+1}]$.
The equation of the straight line that passes through $(x_k, f(x_k))$ and $(x_{k+1}, f(x_{k+1}))$ is
\[ P_1(x) = f(x_k) + \frac{f(x_{k+1}) - f(x_k)}{x_{k+1} - x_k}\,(x - x_k). \qquad (1) \]
We note that if in the interpolating polynomial $P_1(x)$ we let $x = x_k$, a tabulated node, then
\[ P_1(x_k) = f(x_k) + 0 = f(x_k), \]
and in the same way $x = x_{k+1}$ yields
\[ P_1(x_{k+1}) = f(x_k) + \frac{f(x_{k+1}) - f(x_k)}{x_{k+1} - x_k}\,(x_{k+1} - x_k) = f(x_{k+1}), \]
and thus $P_1(x)$ is said to interpolate exactly at $x_k$ and $x_{k+1}$.
Rearranging (1) above as
\[ P_1(x) = \frac{f(x_{k+1}) - f(x_k)}{x_{k+1} - x_k}\,x + f(x_k) - \frac{f(x_{k+1}) - f(x_k)}{x_{k+1} - x_k}\,x_k, \]
which is in turn written as
\[ P_1(x) = a_1 x + a_0, \]
gives a linear expression in $x$; hence the term linear interpolation. Once more, the requirement that the straight line passes through $(x_k, f(x_k))$ and $(x_{k+1}, f(x_{k+1}))$ can easily be confirmed for this form of the linear equation.
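As a minimal sketch of linear interpolation between two tabulated points, the following Python function evaluates the straight line through them. The function name `linear_interpolate` and the sample data (taken from $f(x) = x^2$) are illustrative assumptions, not part of the textbook.

```python
def linear_interpolate(xk, fk, xk1, fk1, x):
    """Evaluate P1(x), the straight line through (xk, fk) and (xk1, fk1)."""
    slope = (fk1 - fk) / (xk1 - xk)  # a1 in P1(x) = a1*x + a0
    return fk + slope * (x - xk)


# Hypothetical data: two tabulated values of f(x) = x**2.
estimate = linear_interpolate(1.0, 1.0, 2.0, 4.0, 1.5)
print(estimate)  # 2.5, whereas the true value f(1.5) is 2.25
```

The gap between the estimate and the true value illustrates why higher degree polynomials are considered next.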
Suppose now that a second degree polynomial $a_2 x^2 + a_1 x + a_0$ is used to interpolate the data. Then $P_n(x) = P_2(x)$ would be required to coincide with $f(x)$ at three points, say $x_k$, $x_{k+1}$ and $x_{k+2}$. The system of equations resulting from this requirement is
\[ P_2(x_k) = a_2 x_k^2 + a_1 x_k + a_0 = f(x_k) \]
\[ P_2(x_{k+1}) = a_2 x_{k+1}^2 + a_1 x_{k+1} + a_0 = f(x_{k+1}) \]
\[ P_2(x_{k+2}) = a_2 x_{k+2}^2 + a_1 x_{k+2} + a_0 = f(x_{k+2}) \]
which, when solved, yields the values of the coefficients $a_0$, $a_1$ and $a_2$.
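The three-equation system for a quadratic interpolant can be solved directly, as in the following sketch. It uses a small Gaussian elimination routine written for illustration; the helper names `solve_linear_system` and `quadratic_coefficients` and the sample data (from $f(x) = 2x^2 - 3x + 1$) are assumptions made for this example.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - known) / M[r][r]
    return x


def quadratic_coefficients(xs, fs):
    """Return [a2, a1, a0] so that a2*x^2 + a1*x + a0 passes through 3 points."""
    A = [[x * x, x, 1.0] for x in xs]
    return solve_linear_system(A, fs)


# Hypothetical data sampled from f(x) = 2x^2 - 3x + 1.
a2, a1, a0 = quadratic_coefficients([0.0, 1.0, 2.0], [1.0, 0.0, 3.0])
print(a2, a1, a0)
```

Solving such a system works for any degree, but the cost and conditioning worsen as the number of nodes grows, which motivates the alternative constructions that follow.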
The focus of this topic is in presenting other, hopefully more efficient ways to construct the interpo-
lating polynomial. Beginning with the Lagrange interpolating polynomial, other forms are presented
and discussed in the sequel.
P1 (x) − a1 x − a0 = 0
f (x0 ) − a1 x0 − a0 = 0
f (x1 ) − a1 x1 − a0 = 0
From the results of linear algebra we know that this system of equations has a non-trivial solution
only if the determinant of the system is zero. That is, if
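\[ \begin{vmatrix} P_1(x) & -x & -1 \\ f(x_0) & -x_0 & -1 \\ f(x_1) & -x_1 & -1 \end{vmatrix} = 0. \qquad (2) \]
Expanding the determinant and solving for $P_1(x)$ gives
\[ P_1(x) = f(x_0)L_0(x) + f(x_1)L_1(x), \qquad (3) \]
where
\[ L_0(x) = \frac{x - x_1}{x_0 - x_1}, \qquad L_1(x) = \frac{x - x_0}{x_1 - x_0} \]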
are the coefficients associated with f (x0 ) and f (x1 ), respectively. This is the first degree Lagrange
interpolating polynomial.
NB: The textbook notation, Ln,k (x) is used to denote the coefficient associated with the node xk .
A similar evaluation of a higher order determinant would yield similar expressions for the corre-
sponding polynomial. Using the pattern observed for the coefficients of the linear polynomial, the
general n-th degree Lagrange interpolating polynomial through n + 1 data points
{(x0 , f (x0 )), (x1 , f (x1 )), (x2 , f (x2 )), . . . , (xn , f (xn ))}
is written as
Pn (x) = f (x0 )L0 (x) + f (x1 )L1 (x) + f (x2 )L2 (x) + · · · + f (xn )Ln (x) (4)
To ensure that the requirement
\[ P_n(x_k) = f(x_k) \]
is met at each node $x_k$, the coefficients are chosen so that at $x_k$ only $L_k$ equals 1 and all the other $L_j$ are zero. Hence the general expression for $L_k(x)$ is
\[ L_k(x) = \frac{(x - x_0)(x - x_1)(x - x_2) \cdots (x - x_{k-1})(x - x_{k+1}) \cdots (x - x_n)}{(x_k - x_0)(x_k - x_1)(x_k - x_2) \cdots (x_k - x_{k-1})(x_k - x_{k+1}) \cdots (x_k - x_n)}. \qquad (5) \]
We note the following about the above expression:
• The factor $(x - x_k)$ is missing in the expression of $L_k(x)$;
• At any node $x_j$ with $x_j \neq x_k$, one of the factors in the numerator is zero, so $L_k(x_j) = 0$;
• At $x = x_k$, on the other hand, the numerator and denominator are exactly equal, so $L_k(x_k) = 1$;
• Since the factor $(x - x_k)$ is missing from $L_k(x)$, its numerator is an $n$-th degree polynomial in $x$;
• Since $x$ does not appear in the denominator, each $L_k(x)$ is itself an $n$-th degree polynomial in $x$, and $P_n(x)$ is therefore a polynomial of degree at most $n$.
It is also worth noting that the two forms of the interpolating polynomial represent the same poly-
nomial. There are other forms as well but the interpolating polynomial of degree n is unique.
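The Lagrange form translates almost line by line into code. The sketch below evaluates each $L_k(x)$ as a product and sums the weighted function values; the function names and the sample data (from $f(x) = 1/x$) are illustrative assumptions.

```python
def lagrange_basis(xs, k, x):
    """Evaluate L_k(x) for the nodes xs; the factor with j == k is skipped."""
    value = 1.0
    for j, xj in enumerate(xs):
        if j != k:
            value *= (x - xj) / (xs[k] - xj)
    return value


def lagrange_interpolate(xs, fs, x):
    """Evaluate P_n(x) = sum of f(x_k) * L_k(x)."""
    return sum(fk * lagrange_basis(xs, k, x) for k, fk in enumerate(fs))


# Hypothetical data sampled from f(x) = 1/x.
xs = [2.0, 2.75, 4.0]
fs = [1.0 / x for x in xs]
approx = lagrange_interpolate(xs, fs, 3.0)
print(approx, "versus the true value", 1.0 / 3.0)
```

Each call recomputes every basis polynomial, so this form is convenient for a handful of evaluations but is not the cheapest choice when nodes are added one at a time.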
• Centered-difference: From centre to the right using 1 or 2 differences across the table.
Formula (3.9) suggests that you can start with any node xi , depending on the degree of interest of
the interpolating polynomial, and move diagonally for the respective differences. The denominator
in f [xi , xi+1 , . . . , xi+k ] in (3.9) is xi+k − xi and the general interpolating polynomial is (3.5):
with $a_k = f[x_0, x_1, \ldots, x_k]$. In these formulae, for the case of equally spaced nodes with $x_{i+1} - x_i = h$, the coefficients make use of the value $s$ calculated from the target value $x = x_0 + sh$, which gives $s = (x - x_0)/h$.
NB: The value of the divided difference $f[x_0, x_1, \ldots, x_k]$ is independent of the order of the numbers $x_0, x_1, \ldots, x_k$; i.e. the nodes do not have to be in increasing order to be used in the formulae.
Note also the formulae for computing the respective binomial coefficients:
Forward-difference formula:
\[ \binom{s}{k} = \frac{s(s - 1) \cdots (s - k + 1)}{k!} \]
Backward-difference formula:
\[ \binom{-s}{k} = (-1)^k\,\frac{s(s + 1) \cdots (s + k - 1)}{k!} \]
As to which points to use for a polynomial of a desired degree, the aim is to use a formula that makes the earliest possible use of the data points closest to $x$ and makes use of the $n$-th divided difference; e.g. if $x$ is closer to the lower (upper) end of the nodes, use the forward-difference (backward-difference) formula; if $x$ is near the centre, the centered-difference formula may be more appropriate.
or
\[ P_n(x) = \sum_{k=0}^{n} a_k (x - x_0)(x - x_1) \cdots (x - x_{k-1}). \]
The rest of the discussion focuses on the expressions of the coefficients $a_k$, which are first given in terms of divided differences,
\[ a_k = f[x_0, x_1, \ldots, x_k] = \frac{f[x_1, x_2, \ldots, x_k] - f[x_0, x_1, \ldots, x_{k-1}]}{x_k - x_0}, \]
for $k = 1, 2, \ldots, n$, with $a_0 = f[x_0] = f(x_0)$. This leads to the formula (3.10),
\[ P_n(x) = \sum_{k=0}^{n} f[x_0, x_1, \ldots, x_k](x - x_0)(x - x_1) \cdots (x - x_{k-1}), \]
where the product is empty (equal to 1) for $k = 0$.
Hence the divided-difference form of the polynomial consists of calculating the differences $f[x_0, x_1, \ldots, x_k]$ and putting them together in the summation expression (3.10). In this form of the divided-difference interpolating polynomial, neither the ordering of the data points $x_i$ nor the spacing between them matters.
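A short sketch of the divided-difference construction is given below. It computes the coefficients $f[x_0], f[x_0, x_1], \ldots, f[x_0, \ldots, x_n]$ in place and then evaluates the polynomial by nested multiplication. The function names and the test data (from $f(x) = x^3$) are assumptions for illustration.

```python
def divided_differences(xs, fs):
    """Return [f[x0], f[x0,x1], ..., f[x0,...,xn]]."""
    coeffs = [float(f) for f in fs]
    n = len(xs)
    for k in range(1, n):
        # Go from the bottom up so the order k-1 entries are still available.
        for i in range(n - 1, k - 1, -1):
            coeffs[i] = (coeffs[i] - coeffs[i - 1]) / (xs[i] - xs[i - k])
    return coeffs


def newton_evaluate(xs, coeffs, x):
    """Evaluate sum_k a_k (x - x0)...(x - x_{k-1}) by nested multiplication."""
    result = coeffs[-1]
    for k in range(len(coeffs) - 2, -1, -1):
        result = result * (x - xs[k]) + coeffs[k]
    return result


# Hypothetical data sampled from f(x) = x**3.
xs = [0.0, 1.0, 2.0, 3.0]
fs = [0.0, 1.0, 8.0, 27.0]
coeffs = divided_differences(xs, fs)
print(coeffs)                             # [0.0, 1.0, 3.0, 1.0]
print(newton_evaluate(xs, coeffs, 1.5))   # 3.375

# Reordering the nodes changes the coefficients but not the polynomial.
xs2 = [2.0, 0.0, 3.0, 1.0]
fs2 = [8.0, 0.0, 27.0, 1.0]
print(newton_evaluate(xs2, divided_differences(xs2, fs2), 1.5))
```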
However, the discussion on p126 continues with a simplification of the above formula by requiring that the data points be ordered consecutively and assuming that the spacing between the $x_i$ is equal, say $h = x_{i+1} - x_i$.
If you follow the textbook's derivation, the formula (3.12) comes from this ordering and equal spacing of the $x_i$. Letting $x = x_0 + sh$, each difference becomes $x - x_i = (s - i)h$. Substituting this into (3.10) gives the interpolating polynomial as
\[ P_n(x) = P_n(x_0 + sh) = f[x_0] + sh\,f[x_0, x_1] + s(s - 1)h^2 f[x_0, x_1, x_2] + \cdots + s(s - 1)(s - 2) \cdots (s - n + 1)h^n f[x_0, x_1, x_2, \ldots, x_n], \]
which in turn is written in summation notation, as in (3.11), as
\[ P_n(x) = P_n(x_0 + sh) = f[x_0] + \sum_{k=1}^{n} \binom{s}{k} k!\,h^k f[x_0, x_1, \ldots, x_k], \]
noting that here the product $s(s - 1)(s - 2) \cdots (s - k + 1)$ is replaced using the binomial coefficient and written as
\[ s(s - 1)(s - 2) \cdots (s - k + 1) = \binom{s}{k} k!. \]
Thus (3.10) and (3.11) are equivalent expressions of the divided-difference formula for the interpo-
lating polynomials.
To obtain the Newton forward-difference and backward-difference formulas, Aitken's notation $\Delta$ is reintroduced and used to write the differences $f[x_0, x_1, x_2, \ldots, x_k]$ as
\[ f[x_0, x_1, x_2, \ldots, x_k] = \frac{1}{k!\,h^k} \Delta^k f(x_0). \]
By so doing, the $k!\,h^k$ factors cancel out, leading to (3.12),
\[ P_n(x) = f(x_0) + \sum_{k=1}^{n} \binom{s}{k} \Delta^k f(x_0). \]
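The forward-difference formula can be sketched as follows for equally spaced nodes. The binomial coefficient is updated from one term to the next using $\binom{s}{k+1} = \binom{s}{k}(s - k)/(k + 1)$. The function names and the data (from $f(x) = x^3$ with $h = 1$) are illustrative assumptions.

```python
def forward_differences(fs):
    """Return [f(x0), Δf(x0), Δ²f(x0), ..., Δⁿf(x0)] from tabulated values."""
    diffs = [fs[0]]
    column = list(fs)
    while len(column) > 1:
        column = [b - a for a, b in zip(column, column[1:])]
        diffs.append(column[0])  # top entry of each difference column
    return diffs


def newton_forward(x0, h, fs, x):
    """Evaluate the Newton forward-difference polynomial at x = x0 + s*h."""
    s = (x - x0) / h
    total = 0.0
    binom = 1.0  # C(s, 0)
    for k, delta in enumerate(forward_differences(fs)):
        total += binom * delta
        binom *= (s - k) / (k + 1)  # C(s, k+1) from C(s, k)
    return total


# Hypothetical data: f(x) = x**3 at x = 0, 1, 2, 3.
fs = [0.0, 1.0, 8.0, 27.0]
print(forward_differences(fs))            # [0.0, 1.0, 6.0, 6.0]
print(newton_forward(0.0, 1.0, fs, 1.5))
```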
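The backward-difference formula is derived from the same principles, but now the differences in (3.10) are taken backward, starting with $f[x_n]$, and
\[ f[x_n, x_{n-1}, \ldots, x_{n-k}] = \frac{1}{k!\,h^k} \nabla^k f(x_n). \]
With $x = x_n + sh$, this gives
\[ P_n(x) = f[x_n] + s\nabla f(x_n) + \frac{s(s + 1)}{2}\nabla^2 f(x_n) + \cdots + \frac{s(s + 1) \cdots (s + n - 1)}{n!}\nabla^n f(x_n) \]
\[ \phantom{P_n(x)} = f[x_n] + \sum_{k=1}^{n} (-1)^k \binom{-s}{k} \nabla^k f(x_n). \]
The forward-difference and backward-difference formulas differ in two respects: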
• the binomial coefficient in the former involves s while in the latter it involves −s;
• the direction in which the values in the difference table are read - from top left in a downward
diagonal line to the right in the forward-difference and from bottom left in an upward diagonal
in the backward-difference formula.
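A minimal sketch of the backward-difference formula is given below, assuming equally spaced nodes and $x = x_n + sh$. The differences are read from the bottom of each column of the difference table, and the coefficient $s(s + 1) \cdots (s + k - 1)/k!$ is built up term by term. The function names and sample data are assumptions for illustration.

```python
def backward_differences(fs):
    """Return [f(xn), ∇f(xn), ∇²f(xn), ..., ∇ⁿf(xn)] from tabulated values."""
    diffs = [fs[-1]]
    column = list(fs)
    while len(column) > 1:
        column = [b - a for a, b in zip(column, column[1:])]
        diffs.append(column[-1])  # bottom entry of each difference column
    return diffs


def newton_backward(xn, h, fs, x):
    """Evaluate the Newton backward-difference polynomial at x = xn + s*h."""
    s = (x - xn) / h
    total = 0.0
    coeff = 1.0  # s(s+1)...(s+k-1)/k! for k = 0
    for k, nabla in enumerate(backward_differences(fs)):
        total += coeff * nabla
        coeff *= (s + k) / (k + 1)
    return total


# Hypothetical data: f(x) = x**3 at x = 0, 1, 2, 3; evaluate near the upper end.
fs = [0.0, 1.0, 8.0, 27.0]
print(backward_differences(fs))            # [27.0, 19.0, 12.0, 6.0]
print(newton_backward(3.0, 1.0, fs, 2.5))  # 15.625
```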
NB: The error estimation formula can only be applied if the function f (x) whose value is being
approximated is known - this is the difficulty with the Lagrange polynomial error term.
\[ \frac{d^k P(x_i)}{dx^k} = \frac{d^k f(x_i)}{dx^k} \quad \text{for each } i = 0, 1, \ldots, n \text{ and } k = 0, 1, \ldots, m_i. \]
Thus the criteria for constructing these polynomials involve conditions on the derivative values at
the respective nodes.
Three types of osculating polynomials are discussed: Hermite polynomials, cubic splines and Bezier curves. Hermite polynomials are the basis for the Bezier curves. While Hermite interpolation seeks a single polynomial to approximate the desired function over the whole interval containing the nodes, the cubic splines and Bezier curves generate pieces of cubic polynomials that satisfy certain conditions pertaining to $f(x)$ and the polynomials $S_j(x)$. What is common to these polynomials is that they not only agree with $f(x)$ at the nodes, but their derivatives at the nodes also agree with those of $f(x)$. The sets of nodes used to construct these cubic polynomials should span the entire interval containing the nodes.
Because of the involvement of the derivative(s), the derivative values used in the computation
would be included in the tabulated values.
1.7.1 Hermite Polynomials
Two forms of the Hermite polynomials are given in the text:
\[ H_{2n+1}(x) = \sum_{j=0}^{n} f(x_j) H_{n,j}(x) + \sum_{j=0}^{n} f'(x_j) \hat{H}_{n,j}(x), \]
based on the coefficients of the Lagrange polynomial (p134); and the form
\[ H_{2n+1}(x) = f[z_0] + \sum_{k=1}^{2n+1} f[z_0, \ldots, z_k](x - z_0)(x - z_1) \cdots (x - z_{k-1}), \]
based on Newton's interpolatory divided-difference formula. These are discussed in detail on pages 134-139. What is worth noting for these is that the data points include an additional column of derivatives $f'(x_k)$.
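The divided-difference form of the Hermite polynomial can be sketched as below. Each node is doubled, $z_{2i} = z_{2i+1} = x_i$, and wherever a first divided difference would involve a repeated node, the derivative value $f'(x_i)$ is used in its place. The function names and the test data (from $f(x) = x^3$) are assumptions for illustration, and the nodes are assumed to be distinct.

```python
def hermite_coefficients(xs, fs, dfs):
    """Return (zs, coeffs) with coeffs = [f[z0], f[z0,z1], ..., f[z0..z_{2n+1}]]."""
    zs, col = [], []
    for x, f in zip(xs, fs):
        zs += [x, x]            # each node appears twice
        col += [float(f), float(f)]
    m = len(zs)
    coeffs = [col[0]]
    # First-order differences: use f'(x_i) where the two nodes coincide.
    first = []
    for i in range(1, m):
        if zs[i] == zs[i - 1]:
            first.append(float(dfs[i // 2]))
        else:
            first.append((col[i] - col[i - 1]) / (zs[i] - zs[i - 1]))
    col = first
    coeffs.append(col[0])
    # Higher-order differences follow the usual recurrence.
    for order in range(2, m):
        col = [(col[i + 1] - col[i]) / (zs[i + order] - zs[i])
               for i in range(len(col) - 1)]
        coeffs.append(col[0])
    return zs, coeffs


def hermite_evaluate(zs, coeffs, x):
    """Evaluate the Newton form on the doubled nodes by nested multiplication."""
    result = coeffs[-1]
    for k in range(len(coeffs) - 2, -1, -1):
        result = result * (x - zs[k]) + coeffs[k]
    return result


# Hypothetical data: f(x) = x**3 and f'(x) = 3x**2 at the nodes 0 and 1.
zs, coeffs = hermite_coefficients([0.0, 1.0], [0.0, 1.0], [0.0, 3.0])
print(coeffs)                               # [0.0, 0.0, 1.0, 1.0]
print(hermite_evaluate(zs, coeffs, 0.5))    # 0.125
```

With two nodes the Hermite polynomial has degree at most 3, so it reproduces the cubic test function exactly.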
The main form of the cubic spline pieces $S_j(x)$ is (the first equation on p145):
\[ S_j(x) = a_j + b_j(x - x_j) + c_j(x - x_j)^2 + d_j(x - x_j)^3 \]
for each $j$-th piece (arc), $j = 0, 1, \ldots, n - 1$, between the nodes $(x_j, y_j)$ and $(x_{j+1}, y_{j+1})$, with spacing $h_j = x_{j+1} - x_j$. These spacings may differ between successive pairs of nodes.
Constructing a cubic spline consists of calculating the coefficients $a_j, b_j, c_j, d_j$ for each of the arcs starting at the nodes $x_0, x_1, \ldots, x_{n-1}$.
To design the spline we take two adjacent pieces, $S_j(x)$ and $S_{j+1}(x)$, $j = 0, 1, 2, \ldots, n - 2$, at a time. Together they interpolate the three points $x_j, x_{j+1}, x_{j+2}$ and define a 'cubic piece' over the two adjacent subintervals $[x_j, x_{j+1}]$ and $[x_{j+1}, x_{j+2}]$. Note that each pair of adjacent subintervals shares the common node $x_{j+1}$, but the subintervals do not otherwise overlap.
The definition specifies conditions to be met by each pair of adjacent ’spline pieces’ and used to
determine the coefficient sets {aj , bj , cj , dj }, j = 0, 1, 2 . . . , n − 1. The meaning of these conditions
is what we use to unpack the equations that should be solved in order to determine the set of
coefficients for each ’spline piece’.
(a) This condition reiterates the description of the spline in terms of the ’spline pieces’.
(f) The final condition applies to the whole spline $S(x)$ at the endpoints $x_0$ and $x_n$, the boundary of the domain of the function $f(x)$. The two parts of the condition are interpreted as follows:
(i) The natural boundary condition $S''(x_0) = S''(x_n) = 0$ can be interpreted in terms of calculus to mean that the spline has zero curvature at the endpoints, as at an inflection point.
(ii) The clamped boundary condition $S'(x_0) = f'(x_0)$ and $S'(x_n) = f'(x_n)$ can be interpreted to mean that the slopes of the spline at the endpoints $x_0$ and $x_n$ coincide with the slopes of the function in question at those points. This can also be seen to mean that the direction of the spline coincides with that of the function at the endpoints.
Applying the conditions given in Definition 3.10 yields a set of equations in the various coefficients. These are simplified first by observing that the coefficients $a_j$ are determined from the given $f(x_j)$. With some ingenuity, this system of equations reduces to a system $Ax = b$, where the solution vector $x = (c_0, c_1, \ldots, c_n)$ is obtained with $A$ consisting of expressions in terms of the $h_j$, and $b$ involving only the $h_j$ and the coefficients $a_j$. Note the different formulations of $A$ and $b$ for the natural spline (p146-147) and the clamped spline (p152).
Once the $a$'s and the $c$'s are determined, the $c$'s are related to the $d$'s by
\[ c_{j+1} = c_j + 3d_j h_j \;\Rightarrow\; d_j = \frac{c_{j+1} - c_j}{3h_j}, \]
and the $b$'s are computable from the relation
\[ b_j = \frac{1}{h_j}(a_{j+1} - a_j) - \frac{h_j}{3}(c_{j+1} + 2c_j). \]
This now fully determines all the coefficients $a_j$, $b_j$, $c_j$ and $d_j$ defining the $j$-th piece $S_j$, $j = 0, 1, \ldots, n - 1$, of the piecewise polynomial, to obtain the cubic spline
\[ S(x) = \begin{cases} S_0(x) & \text{on } [x_0, x_1] \\ S_1(x) & \text{on } [x_1, x_2] \\ \quad\vdots \\ S_j(x) & \text{on } [x_j, x_{j+1}] \\ \quad\vdots \\ S_{n-1}(x) & \text{on } [x_{n-1}, x_n] \end{cases} \]
Example 2 (p148) is a detailed procedure for constructing a spline (a natural spline in this case).
Note that the examples in the textbook use equally spaced nodes. This is for ease of computation in demonstrating the method; it is not a requirement.
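The construction of a natural cubic spline can be sketched as follows. The tridiagonal system for the $c_j$ is solved by forward elimination and back substitution, after which $b_j$ and $d_j$ follow from the relations between the coefficients. The function names and the three sample points are assumptions for illustration; the layout follows the standard tridiagonal approach and is not claimed to reproduce the textbook's algorithm line for line.

```python
def natural_cubic_spline(xs, ys):
    """Return a list of (a_j, b_j, c_j, d_j) for j = 0..n-1 (natural boundary)."""
    n = len(xs) - 1
    a = [float(y) for y in ys]
    h = [xs[j + 1] - xs[j] for j in range(n)]
    # Right-hand side of the tridiagonal system for the c_j.
    alpha = [0.0] * (n + 1)
    for j in range(1, n):
        alpha[j] = 3.0 / h[j] * (a[j + 1] - a[j]) - 3.0 / h[j - 1] * (a[j] - a[j - 1])
    # Forward elimination; c_0 = c_n = 0 encodes S''(x0) = S''(xn) = 0.
    l = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for j in range(1, n):
        l[j] = 2.0 * (xs[j + 1] - xs[j - 1]) - h[j - 1] * mu[j - 1]
        mu[j] = h[j] / l[j]
        z[j] = (alpha[j] - h[j - 1] * z[j - 1]) / l[j]
    # Back substitution, then b_j and d_j from the c_j.
    c = [0.0] * (n + 1)
    b = [0.0] * n
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (a[j + 1] - a[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])
    return [(a[j], b[j], c[j], d[j]) for j in range(n)]


def spline_evaluate(xs, pieces, x):
    """Evaluate S(x) using the piece whose subinterval contains x."""
    j = 0
    while j < len(pieces) - 1 and x > xs[j + 1]:
        j += 1
    a, b, c, d = pieces[j]
    dx = x - xs[j]
    return a + dx * (b + dx * (c + dx * d))


# Hypothetical data: three points, giving two spline pieces.
xs = [1.0, 2.0, 3.0]
pieces = natural_cubic_spline(xs, [2.0, 3.0, 5.0])
print(pieces)
print(spline_evaluate(xs, pieces, 1.5))
```

Nothing in the sketch assumes equal spacing: each $h_j$ is computed separately from the nodes.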
Two choices need to be made in the construction of the Bezier curves: the guidepoints; and the
step size of the parameter t. The step size determines the smoothness of the curve, while the
guidepoints control the approximation of the derivative, and hence the tangent line (computation of
the coefficients).
Note the scale factor of 3 incorporated in equations (3.25) and (3.26) for the Bezier curve, the general parametric equations. Algorithm 3.6 simply generates the coefficients $a_k^{(i)}$ and $b_k^{(i)}$ for the $i$-th curve using two nodes $(x_i, y_i)$ and $(x_{i+1}, y_{i+1})$ and two guidepoints $(x_i + \alpha_0, y_i + \beta_0)$ and $(x_{i+1} - \alpha_1, y_{i+1} - \beta_1)$. The guidepoints in the text algorithm are denoted by $(x_i^+, y_i^+)$ and $(x_{i+1}^-, y_{i+1}^-)$, respectively. The coefficients $a_k^{(i)}, b_k^{(i)}$, $k = 0, \ldots, 3$, generated in the algorithm summarise the coefficients in (3.25) and (3.26) and are used in the first equation of the algorithm:
\[ (x_i(t), y_i(t)) = \left(a_0^{(i)} + a_1^{(i)} t + a_2^{(i)} t^2 + a_3^{(i)} t^3,\; b_0^{(i)} + b_1^{(i)} t + b_2^{(i)} t^2 + b_3^{(i)} t^3\right), \]
for the $C_i$, the pieces of the curve between $(x_i, y_i)$ and $(x_{i+1}, y_{i+1})$, $i = 0, 1, \ldots, n - 1$. Note that the coordinates $(x_i(t), y_i(t))$ yield a parametric curve $C_i$ between $(x_i, y_i)$ and $(x_{i+1}, y_{i+1})$ as $t$ varies over the interval $0 \le t \le 1$ at the predetermined steps.
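The following sketch generates the cubic coefficients of one Bezier piece from two nodes and two guidepoints, then evaluates the parametric curve. It uses the standard power-basis expansion of a cubic Bezier curve, which is assumed here to correspond to the coefficients produced by the textbook algorithm; the function names and the sample points are hypothetical.

```python
def bezier_coefficients(p0, g0, g1, p1):
    """Return [(a0, a1, a2, a3), (b0, b1, b2, b3)] for one cubic Bezier piece.

    p0, p1 are the end nodes; g0 is the guidepoint attached to p0 and
    g1 the guidepoint attached to p1. Each point is an (x, y) pair.
    """
    coeffs = []
    for u0, ug0, ug1, u1 in zip(p0, g0, g1, p1):
        c0 = u0
        c1 = 3.0 * (ug0 - u0)                    # note the scale factor of 3
        c2 = 3.0 * (u0 + ug1 - 2.0 * ug0)
        c3 = u1 - u0 + 3.0 * ug0 - 3.0 * ug1
        coeffs.append((c0, c1, c2, c3))
    return coeffs


def bezier_point(coeffs, t):
    """Evaluate (x(t), y(t)) for 0 <= t <= 1 by nested multiplication."""
    return tuple(c0 + t * (c1 + t * (c2 + t * c3)) for c0, c1, c2, c3 in coeffs)


# Hypothetical nodes and guidepoints.
coeffs = bezier_coefficients((0.0, 0.0), (1.0, 1.0), (2.0, 1.0), (3.0, 0.0))
steps = 4  # step size 1/4 for the parameter t
for i in range(steps + 1):
    print(bezier_point(coeffs, i / steps))
```

A smaller step size in $t$ gives more points on the same curve and hence a smoother plot; moving the guidepoints changes the curve itself.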
that the approximating function agrees with the sought function at these points.
Let $g(x)$ be such an approximating function. This function is required to minimise the least squares error (the sum of the squares of the deviations from the tabulated points)
\[ E = \sum_{i=1}^{m} (y_i - g(x_i))^2 = \sum_{i=1}^{m} y_i^2 - 2\sum_{i=1}^{m} g(x_i)\,y_i + \sum_{i=1}^{m} (g(x_i))^2. \]
The general requirement for minimising this error is that the set of partial derivatives with respect
to the unknown coefficients in g(x) be equal to zero. This leads to a set of normal equations which
are solved and used in the expression of g(x).
Two cases of the function $g(x)$ are considered: a polynomial $P_n(x)$, and an exponential function $y = be^{ax}$ or power function $y = bx^a$. The $n$-th degree least squares polynomial approximation uses
\[ P_n(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 = \sum_{k=0}^{n} a_k x^k. \]
The task here is to determine the coefficients $a_k$, $k = 0, 1, \ldots, n$, that satisfy the system of normal equations
\[ \sum_{k=0}^{n} a_k \sum_{i=1}^{m} x_i^{j+k} = \sum_{i=1}^{m} y_i x_i^{j}, \quad \text{for each } j = 0, 1, \ldots, n. \]
The values in the coefficient matrix and the right hand side can all be calculated from the tabulated
points. Once calculated, the system can be solved by suitable methods.
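As a sketch of this procedure, the code below assembles the normal equations from the tabulated points and solves them by Gaussian elimination. The helper names and the sample data (points lying exactly on $y = 2x + 1$) are assumptions for illustration; for high degrees the normal equations become ill-conditioned, so this direct approach is only meant for small $n$.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - known) / M[r][r]
    return x


def least_squares_polynomial(xs, ys, n):
    """Return [a0, a1, ..., an] satisfying the normal equations."""
    # Entry (j, k) of the coefficient matrix is the sum of x_i^(j+k).
    A = [[sum(x ** (j + k) for x in xs) for k in range(n + 1)]
         for j in range(n + 1)]
    # Entry j of the right-hand side is the sum of y_i * x_i^j.
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(n + 1)]
    return solve_linear_system(A, b)


# Hypothetical data lying exactly on y = 2x + 1.
a0, a1 = least_squares_polynomial([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0], 1)
print(a0, a1)
```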
Finding the approximating functions $y = be^{ax}$ or $y = bx^a$ is more manageable if they are converted to linear form by taking the logarithm of both sides. This reduces them to linear forms, which can then be treated as linear least squares polynomials. For $y = be^{ax}$, taking logarithms gives
\[ \ln y = \ln b + ax, \]
and letting $z = \ln y$ and $c = \ln b$ gives
\[ z = ax + c. \]
For $y = bx^a$,
\[ \ln y = \ln b + a \ln x \]
can be replaced by
\[ z = c + a \ln x. \]
It is critical to remember to substitute back to obtain the original values:
\[ c = \ln b \;\Rightarrow\; b = e^c. \]
An example of this kind of calculation is given on p512.
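The log-linearisation for $y = be^{ax}$ can be sketched as below: fit a straight line to the points $(x_i, \ln y_i)$ and substitute back with $b = e^c$. The function name `fit_exponential` and the synthetic data are assumptions for illustration, and all $y_i$ are assumed to be positive. Note that this minimises the squared error in $\ln y$, not in $y$ itself.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = b*exp(a*x) by linear least squares on z = ln y; return (a, b)."""
    m = len(xs)
    zs = [math.log(y) for y in ys]          # requires every y > 0
    sx = sum(xs)
    sz = sum(zs)
    sxx = sum(x * x for x in xs)
    sxz = sum(x * z for x, z in zip(xs, zs))
    a = (m * sxz - sx * sz) / (m * sxx - sx ** 2)   # slope of z = a*x + c
    c = (sz - a * sx) / m                           # intercept
    return a, math.exp(c)                           # substitute back: b = e^c


# Hypothetical data generated from y = 3*exp(0.5*x).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [3.0 * math.exp(0.5 * x) for x in xs]
a, b = fit_exponential(xs, ys)
print(a, b)
```

For $y = bx^a$ the same routine applies after replacing each $x_i$ by $\ln x_i$.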
The error in the approximation of $f(\hat{x})$ is estimated by substituting the approximating function into the error term
\[ E = \sum_{i=1}^{m} (y_i - g(x_i))^2. \]
1.9 In Summary
Different forms of the interpolating polynomial have been presented. Although the expressions are
different, the interpolating polynomial is unique. Hence the different forms are different expressions
of the same polynomial. Generally, there is no requirement that the spacing between the nodes be
equal. If the nodes are equally spaced, it may simplify the formulae by replacing xi+1 − xi by h in
the formulae.
For $n + 1$ nodes given on $[x_0, x_n]$, polynomials of the various degrees make use of some of the nodes: $k + 1$ nodes for a $k$-th degree polynomial, $k \le n$. Thus the challenge remains as to which set of $k + 1$ nodes would give the closest approximation of a particular value $f(x)$, $x \in (x_0, x_n)$. The highest degree of an interpolating polynomial on $n + 1$ nodes is $n$. It may be necessary to compare different polynomials of the same degree using different sets of nodes.
Intuitively, the selection of the nodes to be used in the interpolating polynomial is based on en-
suring that the value where the function is approximated is included in the interval containing the
nodes.
2 Useful Formulas
1. Lagrange interpolating polynomial on the nodes $(x_0, f(x_0)), (x_1, f(x_1)), \ldots, (x_n, f(x_n))$:
\[ P_n(x) = \sum_{k=0}^{n} f(x_k) L_k(x), \]
where
\[ L_k(x) = \frac{(x - x_0)(x - x_1) \cdots (x - x_{k-1})(x - x_{k+1}) \cdots (x - x_n)}{(x_k - x_0)(x_k - x_1) \cdots (x_k - x_{k-1})(x_k - x_{k+1}) \cdots (x_k - x_n)}. \]