MATH/APPM 4650
Class notes, February 11

Interpolation and the Lagrange polynomial

Typical interpolation problem: The war in Iraq cost $88 billion in 2003 and $170 billion in 2007. How much did it cost in 2004?

To solve this, using no other information, we assume that the cost is a linear function of time: C = m T + b, for some numbers m and b. We then solve for m and b using the given conditions.
C is cost in billions, T is time in years since 2000.
88 = 3 m + b,
170 = 7 m + b.

Solving, we get 4m=82 so m = 20.5, while b=26.5.

Thus we have deduced the equation C = 20.5 T + 26.5, and we conclude that in 2004, the total cost was $108.5 billion.

The actual cost was $111 billion.
This seems like a pretty good prediction, until...

An interpolation using only the two points in red.

Keep in mind: given only the two points in red, all we can really do is fit a line through those two points. This is interpolation.
Given all five points, we could try to fit a line through them through a method like least-squares approximation. This is regression.

A regression using all five points.


In Chapter 3, we will only discuss the interpolation problem: we will have the same number of unknowns as the number of data points, and all our functions will pass through all our data points.


Another typical problem (extrapolation): Social Security outlays for 1980: $118.5 billion; for 1990: $248.6 billion; for 2000: $409.4 billion. Estimate Social Security outlays for 2005.

We write this as three data points: S(0) = 118.5, S(1) = 248.6, and S(2) = 409.4. We want to estimate S(2.5).

Given three data points, the simplest interpolation model is a parabola S(t) = a t2 + b t + c. We want to solve for a, b, and c.
We get three equations: S(0)=c=118.5, S(1)=a+b+c=248.6, S(2)=4a+2b+c=
S(0)=c=118.5
S(1)=a+b+c=248.6
S(2)=4a+2b+c=409.4

The solutions are easily seen to be a=15.35, b=114.75, c=118.5.

The prediction for S(2.5) is therefore
S(2.5) = 15.35 (2.5)2 + 114.75 (2.5) + 118.5 = 501.3.
The actual value is $523.3.

Not super close, but not terrible.

One has to be very careful with extrapolation though.
Estimating values close to given values is usually pretty good; estimations farther out (like trying to estimate S(6) from this data) are usually pretty terrible.

A recent much-discussed example is the projections of the Social Security Administration.
The SSA Trustees periodically make predictions about how much money SSA will take in and pay out for the next 75 years. They make three sets of assumptions: "optimistic," "pessimistic," and "intermediate." There are huge variations in the three predictions, despite the fact that the actual numbers involved (assumed rate of productivity, for example) are very close to each other.




Basic principle of interpolation (Theorem 3.2):
Given (n+1) data points (x0, y0), ..., (xn, yn), there is exactly one polynomial P of degree n which passes through all the points. In other words, P(x0) = y0, ..., P(xn) = yn.

Two points determine a line, three points determine a parabola, etc.

This theorem is proved using the fact that the polynomial is determined by (n+1) coefficients, so we have (n+1) equations for (n+1) unknowns. The special form of the equations means that the determinant of the matrix is always nonzero, so there is always a unique solution.

However, as we will see in Chapter 6, solving a general system of linear equations is difficult numerically. Therefore we want to use the special properties of the interpolation system.

Lagrange's idea (special case):
To find the polynomial of degree two passing through (x0, y0), (x1, y1), and (x2, y2), we use the following technique:

Example:
Find the cubic polynomial passing through (-1,5), (1,6), (2,-2), and (4,-1).
L3,0(x) = -(x-1) (x-2) (x-4)/30.
L3,1(x) = (x+1) (x-2) (x-4)/6.
L3,2(x) = -(x+1) (x-1) (x-4)/6.
L3,3(x) = (x+1) (x-1) (x-2)/30.
Then the polynomial is
P(x) = -(x-1) (x-2) (x-4)/6 + (x+1) (x-2) (x-4) - (x+1) (x-1) (x-4)/3 - (x+1) (x-1) (x-2)/30.


All mathematical software can compute an interpolating polynomial given data.
For the polynomial above, suppose you want P(0). Maple gives the polynomial in standard form; MATLAB just gives the coefficients; Mathematica gives the polynomial in Horner's method form.

Normally we'd like to use an array to represent the x-values, and another array to represent the y-values.
(This makes it easier to enter in values from a file.)

Suppose the file sampledata.txt looks like this:
-1 5
1 6
2 -2
4 -1



In Maple, this is awkward.
with(ListTools):
sillydata := readdata("C:/Documents and Settings/Stephen Preston/Desktop/website/math4650/sampledata.txt",2);
Transpose(sillydata);
interp(sillydata[1],sillydata[2],x);


MATLAB loves reading files into arrays.
load sampledata.txt
xvals = sampledata(:,1)
yvals = sampledata(:,2)
polyfit(xvals,yvals,3)


Mathematica, like Maple, is a bit awkward. (It looks easy because Mathematica wants its interpolation data in exactly the opposite form as Maple and MATLAB.)
sillydata=ReadList["C:/Documents and Settings/Stephen Preston/Desktop/website/math4650/sampledata.txt", {Number,Number}]
InterpolatingPolynomial(sillydata,x)






If you have (n+1) data points found by observation, obviously all you can do to interpolate is find the nth-order Lagrange polynomial. Then hope the actual function f(x) is close to the interpolating polynomial Pn(x) at the values of x you want.

This is not a good way to use interpolating polynomials.
Caution: Never use a high-order polynomial for experimental data!

Consider the example from above, fitting (-1,5), (1,6), (2,-2), and (4,-1). We add in a fifth point, colored in red, at x=1.5, and see what happens as we change the y-value from 0 to 4.
Maple worksheet.

Observe: the nature of the polynomial changes drastically.
Also, values predicted by the polynomial are way outside the range of the data points.

You should only use high-order interpolating polynomials if you know exactly what the function is.

What if you can find out exactly what f(x) is, at any x you'd like, but it takes a very long time to compute it?

Example: f(x) = ∫0x exp(-t2) dt.
Example: f is the solution of f(x)f(x) = x.

If you need a bunch of values of f, you might find it much easier to compute just a few values of it and use interpolation to find the rest.
This is a common application of interpolation.


What is the error?
Let Pn(x) be the nth-order interpolating polynomial, which agrees with f(x) at (n+1) points.

Then for every x, there is a ξ(x) such that
f(x) - Pn(x) = f(n+1)(ξ)/(n+1)! (x-x0) ... (x-xn).


Of course, if we knew the (n+1)st derivative of f, we'd probably know f as well. So this formula is mostly useless.

The book suggests instead interpolating as much as you need until the numbers stop changing:
For example, you want to compute f everywhere in [0,1]. Obviously, for this to be a decent algorithm, we don't want to have to start over to compute P4.
Already having P2 should help somehow.

But the default algorithms don't really help us with this.
That's why the book proposes Neville's method.