MATH/APPM 4650
Class notes, February 18
Divided differences and Hermite interpolation
Recall that the Lagrange polynomial, given three points {x0, x1, x2}, is
P2(x) = f(x0) [(x-x1)(x-x2)] / [(x0-x1)(x0-x2)] + f(x1) [(x-x0)(x-x2)] / [(x1-x0)(x1-x2)] + f(x2) [(x-x0)(x-x1)] / [(x2-x0)(x2-x1)].
Although it's easy to write an algorithm to find this, there are a few problems with this form:
- Having the previous Lagrange polynomial doesn't much help us get the next one.
- The polynomial is not in Horner (nested) form. So it will take a long time to evaluate it at any particular x.
- It's not obvious how many terms one would need to get a good approximation.
- As all the points approach x0, the Lagrange polynomial should approach the Taylor polynomial about x0. That's not obvious from this form.
An alternative formula for the same thing was discovered by Newton, using divided differences.
P2(x) = f[x0] + f[x0, x1] (x-x0) + f[x0, x1, x2] (x-x0) (x-x1).
The divided differences are computed recursively using
f[x0] = f(x0)
f[x0, x1] = [ f[x1] - f[x0] ] / [ x1 - x0 ]
f[x0, x1, x2] = [ f[x1, x2] - f[x0,x1] ] / [ x2 - x0 ]
etc.
We can make a table out of these:
|
dd[0,0]
|
|
|
|
|
dd[1,0]
|
dd[1,1]
|
|
|
|
dd[2,0]
|
dd[2,1]
|
dd[2,2]
|
|
|
dd[3,0]
|
dd[3,1]
|
dd[3,2]
|
dd[3,3]
|
Here the general recursion formula is
dd[j,i] = ( dd[j,i-1] - dd[j-1,i-1] ) / ( x[j] - x[j-i] ).
So each term depends both on the term to the left, and the term above that one.
Hence each term along the diagonal (the one we want) depends on the entire triangle formed to the left and above it.
Here is a pseudocode algorithm to compute the divided differences, given an array of values "xvals."
dd[0,0] = f(xvals[0])
for j from 1 to Max do
dd[j,0] = f(xvals[j])
for i from 1 to j do
dd[j,i] = (dd[j,i-1]-dd[j-1,i-1]) / (xvals[j]-xvals[j-i])
end
end
The polynomial is then
Pn(x) = dd[0,0] + dd[1,1] (x - x0) + dd[2,2] (x - x0) (x - x1) + ... + dd[n,n] (x - x0) (x - x1) ... (x - xn-1).
Obviously to compute this for any particular value of x, we should use Horner's method.
This algorithm computes the jth interpolating polynomial at x, given the array dd[0,0] through dd[j,j] and the array xvals[0] through xvals[j-1].
poly[j](x) = dd[j,j]
for k from 1 to j do
poly[j](x) = (x-xvals[j-k])*poly[j](x) + dd[j-k,j-k]
end do
Practically, these programs rely on being able to store the coefficients as doubly-indexed arrays.
In Maple, you could first declare the arrays you need as follows. You list 0 (the first index), two periods, and Max (the last index).
dd := array(0..Max, 0..Max);
xvals := array(0..Max);
and then access the terms using
dd[j,i]
xvals[i]
In MATLAB, things are a bit different. Although you can work with arrays without declaring them first, in the long run it's better to declare them.
You declare an array by filling up an array of the desired size with zeros. Also, you cannot start with index zero (which is sometimes awkward).
dd = zeros(Max+1, Max+1)
xvals = zeros(Max+1)
You access the terms using
dd(j,i)
xvals(i)
In Mathematica, you declare an array as follows.
Array[dd, {Max,Max}, {0,0}]
Array[xvals, Max, 0]
The terms are accessed using
dd[j,i]
xvals[i]
We then come to a few important questions.
- How should we choose the points x0, x1, ..., xn so as to minimize the error?
- How many terms will we need to take to get a good approximation?
- The answer to the first question is a bit subtle. Recall the error formula for interpolating,
f(x) - Pn = f(n+1)(ξ) / (n+1)! (x - x0) (x - x1)
... (x - xn)
We have no control over the (n+1)st derivative of f, especially since we don't know ξ. All we can do is try to make (x - x0) (x - x1) ... (x - xn) small for all values of x.
Here is what the errors look like if we use equally-spaced interpolation points. Observe that by far the worst error occurs close to the endpoints.
Errors (rescaled to see easily) in using equally-spaced points to interpolate.
We saw this phenomenon on Friday, when the interpolating polynomials tend to do poorly near endpoints.
To avoid such things, one could choose the interpolating points better, with fewer points in the middle and more points on the ends.
Chebyshev figured out the optimal way to do this, which involves choosing {x0,x1,...,xn} in order to minimize the maximum of Q(x) = (x - x0) (x - x1)
... (x - xn).
The solution is the set of Chebyshev points xk = cos(π (2k+1) / (2n+2)) for 0 ≤ k ≤ n.
The worst error we can get from using these points is the following.
Errors (rescaled to see easily) in using the Chebyshev points to interpolate.
This is good to know, and it will come up again when we study integration.
However, you can't pick these points unless you already know in advance how large of an n to use. Which brings us to our next question.
- Practically, it is very difficult to know how many terms you'll need.
If you can estimate all the derivatives of the function, you can use the official error estimate.
But usually this is impossible.
Alternatively, you can use the wrong-but-common method: stop adding terms when the digits stop changing.
(It's wrong because it doesn't work for the harmonic series, where the terms get small but the sum is still infinite.)
Observe that the last term one adds is
f[x0, x1, ..., xn] (x - x0) (x - x1) ... (x - xn-1),
so you could just compute this quantity for some values of x and check to see that it's small.
If it's not small, add another point xn+1 somewhere and try again.
There's no ideal way of doing this, so it basically comes down to experience, trial and error, and such things.
It's the same problem one has in approximating a function by its Taylor polynomials: how many terms do you take?
Speaking of Taylor polynomials...
Recall that we obtain the nth Taylor polynomial about x0 by knowing the (n+1) numbers f(x0), f'(x0), ..., f(n)(x0).
We obtain the nth Lagrange polynomial at numbers {x0, x1, ..., xn} by knowing the (n+1) numbers f(x0), f(x1), ..., f(xn).
Also, if we let the points {x0, x1, ..., xn} all approach the single point x0, the Lagrange polynomial becomes the Taylor polynomial.
We can try to get the best of both worlds by using mi derivatives at points {x0, x1, ..., xk}, where
m0 + m1 + ... + mk + k = n.
The nth degree polynomial we get is called an osculating polynomial.
For a Taylor polynomial, k=0 and m0 = n.
For a Lagrange polynomial, k=n, and m0 = 0, ..., mn = 0.
The Hermite polynomial is what we get if n is even, and k=n/2, and m0=1, ..., mk=1.
In other words, we use half as many points as we would for the Lagrange polynomial, but we incorporate not just the values of the function, but the derivatives at each interpolating point.
We expect this to approximate not just the function, but also the shape of the function.
The easiest way to derive the formula is to imagine there were 2k points z0, z1, ..., z2k, z2k+1.
Let's do k=2 to make things simple.
Then the Lagrange polynomial is
P(x) = f[z0] + f[z0, z1] (x - z0) + f[z0, z1, z2] (x - z0) (x - z1) + f[z0, z1, z2, z3] (x - z0) (x - z1) (x - z2)
We compute the divided differences using our table algorithm:
|
f[z0]
|
|
|
|
|
f[z1]
|
f[z0, z1]
|
|
|
|
f[z2]
|
f[z1, z2]
|
f[z0, z1, z2]
|
|
|
f[z3]
|
f[z2, z3]
|
f[z1, z2, z3]
|
f[z0, z1, z2, z3]
|
And now we ask what happens as z0 and z1 both approach x0, and z2 and z3 both approach x1?
- The first column is easy.
- f[z0] → f[x0]
- f[z1] → f[x0]
- f[z2] → f[x1]
- f[z3] → f[x2]
- We have trouble in the second column: we have to divide by (z1-z0), (z2-z1), and (z3-z2). But it's OK.
- f[z0, z1] = ( f(z1) - f(z0) ) / (z1 - z0) → f'(x0)
- f[z1, z2] → f[x0, x1]
- f[z2, z3] → f'(x1)
- Nothing goes wrong in the third column. We're dividing by (z2-z0) and (z3-z1), and these both become (x1-x0).
- f[z0, z1, z2] → ( f[x0, x1] - f'(x0) ) / (x1 - x0).
- f[z1, z2, z3] → ( f'(x1) - f[x0, x1] ) / (x1 - x0).
- Nothing wrong in the fourth column either.
- f[z0, z1, z2, z2] = ( f[z1, z2, z3] - f[z0, z1, z2] ) / (x1 - x0).
We can get this result by arranging the matrix as follows:
|
f[x0]
|
|
|
|
|
f[x0]
|
f[x0, x0] = f'(x0)
|
|
|
|
f[x1]
|
f[x0, x1]
|
f[x0, x0, x1]
|
|
|
f[x1]
|
f[x1, x1] = f'(x1)
|
f[x0, x1, x1]
|
f[x0, x0, x1, x1]
|
The algorithm for doing this is very similar to the divided differences algorithm.
Quiz:
What would the table look like if we wanted to use the value and two derivatives at x0 and just the value at x1 to get the osculating polynomial?