Curve Fitting, Data Modelling, Interpolation, Extrapolation

This vague heading covers a multitude of closely related endeavors. There are many approaches, described in the following sections.

Interpolation

For one-dimensional work, see Stoer & Bulirsch 1980 or Press et al 1992. No doubt some of the techniques described there can be found in netlib (see Netlib) by using the NIST guide (see Indices of NA Software on the Net).

For one or more dimensions, some references are:

IAMG (International Association for Mathematical Geology) resources.

Watson, Dave, 1992, Contouring: A Guide to the Analysis and Display of Spatial Data, Pergamon Press, ISBN 0-08-040286-0. Text includes a floppy with source code in Basic (sigh). Subject areas: isoline maps, isochor maps, methodology, data sorting, subset selection, local coordinates, binary local coordinates, barycentric coordinates, rectangular local coordinates, natural neighbour coordinates, gradient estimation, least squares gradients, least squares quadratics, spline gradients, cross product gradients, neighbourhood-based gradients, interpolation, inverse distance weighting, inverse distance weighted gradients, minimum curvature splines, neighbourhood-based interpolation, selected references to contouring literature.

[Dave Watson] writes: Linear triangle-based interpolation is quick and dirty; probably the best compromise available if you are in a hurry. Natural neighbour interpolation is much slower but does provide a result that most people consider to be closest to that which an experienced manual draftsman would generate. The 'kriging' method is a variation on the splines technique; the main difference being that a subjectively selected probabilistic function is used rather than one of the analytically derived radial basis functions.
This class of method generates discontinuities at points where the interpolation subset changes, and kriging attempts to ameliorate this effect by recomputing the coefficients for each interpolation point, which makes it very slow.

Interpolation on spheres: see Dave Rusin's FAQ. See also the comp.graphics.algorithms FAQ and the graphics FAQ.

Software:

[Albrecht Preusser] wrote: You could take routines from ACM Alg. 624, "Triangulation and Interpolation of Arbitrarily Distributed Points in the Plane" by Robert J. Renka. This algorithm is available from netlib (see Netlib) in the toms directory. Another algorithm in FORTRAN is contained in Akima's ACM Alg. 526; it is faster but takes much more memory. By the way, I have developed methods based on the Delaunay triangulation from both algorithms for computing non-linear contours and non-linear interpolation, which are published as ACM Alg. 626 and 684.

Spatial and Geometric Analysis toolbox (SaGA): a MATLAB package for various aspects of spatial data interpolation and analysis and geometric modeling. It is available at: [Glenn] [Kirill]

OAX is a spatial objective analysis program. OAX applies the method of optimal interpolation (objective analysis) to estimate the values of variables at specified points in a multidimensional space. For each point, the computation is based on a weighted average of the values of a specified number of data points (i.e. "nearest neighbours") which are closest to that point. For the sake of efficiency, several dependent variables can be interpolated simultaneously using the same weights if the underlying statistical model and relative noise level in the input data are assumed to be the same for each dependent variable.
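The weighted-average, nearest-neighbour idea underlying programs like OAX can be illustrated with a much simpler scheme: inverse distance weighting, mentioned by Watson above. The sketch below is not OAX's optimal-interpolation algorithm (which derives its weights from a statistical model); the function name and parameter choices are illustrative only.

```python
import numpy as np

def idw_interpolate(xy, values, query, k=6, power=2.0):
    """Inverse-distance-weighted estimate at `query` from the k nearest
    data points.  A simple stand-in for the weighted-average idea used
    by objective-analysis codes; not OAX itself."""
    d = np.linalg.norm(xy - query, axis=1)      # distances to all data points
    nearest = np.argsort(d)[:k]                 # indices of the k closest points
    d_near = d[nearest]
    if d_near[0] == 0.0:                        # query coincides with a data point
        return float(values[nearest[0]])
    w = 1.0 / d_near**power                     # inverse-distance weights
    return float(np.sum(w * values[nearest]) / np.sum(w))

# usage: sample a smooth function at scattered points, estimate at a new point
rng = np.random.default_rng(0)
pts = rng.uniform(0, 1, size=(50, 2))
vals = np.sin(pts[:, 0]) + np.cos(pts[:, 1])
est = idw_interpolate(pts, vals, np.array([0.5, 0.5]))
```

Note the discontinuity mentioned in the text: as the query point moves, the set of k nearest neighbours changes abruptly, and the estimate can jump.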
In this version, OAX extends the capability of earlier implementations (OPTIMAL_ESTIMATOR directive, oax3.0, oax4.0) by allowing the parameters controlling the calculation of the weights to vary as a function of the independent variables, and by allowing the specification of a relative uncertainty (NOISE) for each input data record separately.

Statistical Curve Fitting

See also: Probability and Statistics. See also: Levenberg-Marquardt algorithm.

Books

Jobson, J.D., Applied Multivariate Data Analysis, volumes 1 and 2, Springer-Verlag, New York, 1991.

Johnson, Richard A.; Wichern, Dean W., Applied Multivariate Statistical Analysis, Third Edition.

Lawson, Charles L.; Hanson, Richard J., Solving Least Squares Problems, Prentice-Hall, Englewood Cliffs, N.J., 1974. [Ramin Samadani] writes: I like the book by Lawson and Hanson, Solving Least Squares Problems. I think it's supposed to be back in print now or soon.

Press et al 1992 also has a description of the field.

Spaeth, Helmuth, Mathematical Algorithms for Linear Regression, Academic Press, 1987.

Software

ODRPACK, from Netlib, is a portable collection of Fortran subprograms for fitting a model to data. It is designed primarily for instances when the explanatory as well as the response variables have significant errors, implementing a highly efficient algorithm for solving the weighted orthogonal distance regression problem, i.e., for minimizing the sum of the squares of the weighted orthogonal distances between each data point and the curve described by the model equation.

Levenberg-Marquardt algorithm

[Alan Miller]: The nonlinear least-squares problem is to find a vector of parameter values, a, to minimize the sum of squares of differences between a given vector, y, of observed values and a vector of fitted values f(a,x), where the values of x are known. e.g. a particularly notorious problem is to fit

    f(a,x) = a(0) + a(1).exp(-a(2).x) + a(3).exp(-a(4).x)

sometimes with more exponential terms, to a set of (x,y) pairs.
Writing S = sum_i [y(i) - f(a,x(i))]^2, where i denotes the i-th observation pair, we have that the first and second derivatives of S with respect to the parameters, a(j), are:

    dS/da(j)       = -2 sum_i r(i).(df/da(j))
    d2S/da(j)da(k) =  2 sum_i [ (df/da(j)).(df/da(k)) - r(i).(d2f/da(j)da(k)) ]

where

    r(i) = y(i) - f(a,x(i))

The Gauss-Newton (GN) method uses the linearization of f about the latest estimates of a, and solves a linear least squares problem to obtain a new vector a. If J denotes the Jacobian, the n x k matrix of values of df/da(j) at the current estimates of the parameter values, where n is the number of observations and k is the number of parameters, then the GN method ignores the second derivatives of f in S'' and uses

    (J'J) . delta = J'r

to obtain the correction delta to a. This guarantees a downhill search direction but often overshoots unless good starting values are available. The Levenberg-Marquardt (LM) method uses a trust-region approach, also known as ridge regression or regularization, to shrink the step sizes to ensure a reduction in S at each iteration. In the LM method

    (J'J + lambda.D) . delta = J'r

where D is a diagonal matrix. Levenberg suggested that D = d.I, that is, that the same scalar value be added to each diagonal element. In practice, the diagonal elements of J'J are often of vastly different magnitudes, and Marquardt suggested

    D = diag(J'J)

For the first iteration, lambda is given a fairly large value (say 1.0). If the first attempt at an iteration reduces S, then lambda is reduced for the next iteration by a factor of say 5-10. If the first step increases S, then lambda is increased by a factor of perhaps 3-5 until S is reduced. The final value of lambda for the iteration is then used for the next iteration. The LM method converges upon a stationary point (or sometimes a subspace), which may be the minimum being sought, but it could be a saddle-point or a local minimum, or one or more of the parameter values may tend to infinity. The Jacobian of first derivatives is often ill-conditioned, so QR factorizations are usually employed to solve the LS equations. The LM algorithm is often very efficient. Methods which use second derivatives require substantially more computation, and it is then common to find that the Hessian matrix is not positive definite very close to the minimum.
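The LM iteration just described (Marquardt's scaling of the diagonal, lambda adjusted by the suggested factors) can be sketched in a few lines of Python with NumPy. This is an illustrative sketch, not production code; for serious work use MINPACK or one of the packages listed below. The exponential test model follows the example above.

```python
import numpy as np

def levenberg_marquardt(f, jac, a_start, x, y, max_iter=100, tol=1e-10):
    """Minimal Levenberg-Marquardt sketch with Marquardt's scaling
    D = diag(J'J).  f(a, x) returns fitted values; jac(a, x) the
    n x k Jacobian df/da(j)."""
    a = np.asarray(a_start, dtype=float)
    lam = 1.0                                   # initial damping (text suggests ~1.0)
    S = np.sum((y - f(a, x))**2)
    for _ in range(max_iter):
        J = jac(a, x)
        r = y - f(a, x)
        JtJ = J.T @ J
        g = J.T @ r
        while True:
            D = np.diag(np.diag(JtJ))           # Marquardt: D = diag(J'J)
            step = np.linalg.solve(JtJ + lam * D, g)
            S_new = np.sum((y - f(a + step, x))**2)
            if S_new < S:
                lam /= 5.0                      # S reduced: relax damping
                break
            lam *= 3.0                          # S increased: damp harder, retry
            if lam > 1e12:
                return a
        a = a + step
        if S - S_new < tol * (S + tol):         # relative decrease negligible: done
            return a
        S = S_new
    return a

# usage: fit f(a, x) = a0 + a1*exp(-a2*x), a small version of the
# notorious sum-of-exponentials problem, on noise-free synthetic data
def f(a, x):
    return a[0] + a[1] * np.exp(-a[2] * x)

def jac(a, x):
    e = np.exp(-a[2] * x)
    return np.column_stack([np.ones_like(x), e, -a[1] * x * e])

x = np.linspace(0, 4, 40)
y = f(np.array([1.0, 2.0, 1.5]), x)
a_fit = levenberg_marquardt(f, jac, [0.5, 1.0, 1.0], x, y)
```

Even on this clean problem, a poor starting vector can send the parameters toward a different stationary point, which is the behaviour the text warns about.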
Software:

MINPACK includes LM algorithms using either analytic derivatives or derivatives estimated from differences. Obtain from the minpack directory of netlib (see Netlib).

For many problems, the function f is linear in some of the parameters. In the sum-of-exponentials example mentioned, f is linear in a(0), a(1) and a(3). The VARPRO code uses an LM type of algorithm but separates the linear from the nonlinear parameters. This code is available from the opt directory of netlib (see Netlib).

The package NL2SOL (TOMS algorithm 573) uses a hybrid combination of GN and estimation of a positive-definite approximation to the Hessian from differences of first derivatives. It is available from the toms directory of netlib (see Netlib). (Fortran 90 version from Alan Miller.) Updates which allow for bounds on parameter values can be found in netlib directory port with file names starting n2f, n2g, rn2f or rn2g. (See Netlib.)

There is a very easy-to-use shareware package for PCs (NLREG), based upon the NL2SOL package, which includes graphics. It is more successful than many of the nonlinear least-squares solvers found in commercial statistical packages. It can be downloaded from the math directory of simtelnet.

Time series analysis

Time series analysis may be used for some types of data. See:

Brockwell, Peter J.; Davis, Richard A., 1991, Time Series: Theory and Methods, Second Edition, Springer-Verlag, New York, ISBN 0-387-97429-6. [SJS]: Beautiful approach with emphasis on theory. Emphasis on the linear model.

Hamilton, James D., 1994, Time Series Analysis, Princeton U. Press, Princeton, NJ, ISBN 0-691-04289-6. [SJS]: Applied approach, with heavy emphasis on applications from economics. Not having an economics background is a detriment in reading this text. Broad mix of models.

Montgomery, Douglas C.
/ Forecasting and Time Series Analysis (1990)

Pankratz, Alan / Forecasting with Univariate Time Series Models: Concepts and Cases (1983) / good, clear

Vandaele, Walter / Applied Time Series and Box-Jenkins Models (1983). Good; pedagogical and practical.

ARfit is a Matlab package (a collection of Matlab routines) for the estimation and spectral decomposition of multivariate autoregressive models. ARfit may be freely distributed; suggestions for improvement are greatly appreciated. M-files, documentation, and a paper describing the algorithms used are available.

Also see Statlib. Also see Probability and Statistics.

Splines

Useful for modelling one-dimensional and multi-dimensional data.

Books:

de Boor, Carl, 1978, A Practical Guide to Splines, Springer-Verlag, New York, ISBN 0-540-90356-9. [Tim Strotman]: Provides an excellent blending of theory and practical implementation.

Dierckx, Paul, 1993, Curve and Surface Fitting with Splines, Clarendon Press, ISBN 0-19-853441-8.

Farin, Gerald E., 1995, NURB Curves and Surfaces: From Projective Geometry to Practical Use, A.K. Peters, ISBN 1-56881-038-5.

Farin, Gerald E., 1993, Curves and Surfaces for Computer Aided Geometric Design: A Practical Guide, third edition, Academic Press.

Nurnberger, Gunther, 1989, Approximation by Spline Functions, Springer-Verlag.

Piegl, Les; Tiller, Wayne, 1995, The NURBS Book, Springer-Verlag, New York, ISBN 0-387-55069-0. [Tim Strotman]: Provides an excellent blending of theory and practical implementation.

Spaeth, Helmuth, 1995, One Dimensional Spline Interpolation Algorithms, A.K. Peters, ISBN 1-56881-016-4.

Spaeth, Helmuth, 1995, Two Dimensional Spline Interpolation Algorithms, A.K. Peters, ISBN 1-56881-017-2.

Wahba, Grace, 1990, Spline Models for Observational Data, SIAM, Philadelphia.

Software:

See Indices of NA Software on the Net.

Nearest Neighbor algorithms

Books:

Dasarathy, Belur V., 1991, Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques, IEEE Computer Society Press, Los Alamitos, CA.
[SJS]: Excellent.

Fitting a circle to a set of points

Problem: given N points (x(i), y(i)), i = 1..N, we want to find the parameters (X0, Y0, R) of the best-fit circle

    (x - X0)^2 + (y - Y0)^2 = R^2
(Note: for fitting an *ellipse*, substitute the equation for an ellipse for the equation for a circle in the "Brute Force Approach".)

Brute Force Approach (leads to a nonlinear system) [Amara Graps]

Idea: minimize by least squares the root-mean-squared error of the radius in the equation for a circle. In this method, one minimizes the sum of the squares of the *length* differences, which gives you a nonlinear problem with all the associated pitfalls.
That is, minimize the root-mean-squared radial error

    E = sqrt( sum(i=1..N) (r(i) - R)^2 / (N - 3) )

where:

    r(i) = sqrt( (x(i) - X0)^2 + (y(i) - Y0)^2 )
1) Get a first estimate for (X0, Y0, R).
(Details: find some points to make first estimates; either solve the circle exactly through three of the points (3 equations, 3 unknowns) to get a first estimate of the center and radius, or just do a kind of centroid calculation on all points to get a rough estimate of the center and radius.)
2) Calculate r(i) for each point using the current (X0, Y0).
3) Calculate the root-mean-squared error. For example, for 5 given points on the circle:

    E = sqrt( ( (r(1)-R)^2 + (r(2)-R)^2 + (r(3)-R)^2 + (r(4)-R)^2 + (r(5)-R)^2 ) / (5 - 3) )
(the "3" accounts for the 3 free parameters X0, Y0 and R.)
4) Change (X0, Y0, R) slightly, using a minimization algorithm, so as to reduce the error.

5) Calculate r(1), r(2), etc. again from the new (X0, Y0, R) above.

6) Calculate the RMS error again.

7) Compare the current RMS error with the previous RMS error. If it doesn't change by more than some small amount, say 10^(-3), then you're done; otherwise continue with steps 4-7.
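The brute-force steps can be sketched as below, handing steps 4-7 to a "canned" minimizer as the text recommends. This sketch assumes SciPy's Nelder-Mead routine; any general minimizer would do, and the function name is illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def fit_circle_rms(pts):
    """Brute-force circle fit: centroid first estimate (step 1), then a
    canned minimizer varies (X0, Y0, R) to reduce the radial RMS error
    (steps 2-7)."""
    n = len(pts)
    x0, y0 = pts.mean(axis=0)                          # step 1: centroid estimate
    r0 = np.mean(np.hypot(pts[:, 0] - x0, pts[:, 1] - y0))
    def rms(p):                                        # steps 2-3: r(i), then RMS error
        X0, Y0, R = p
        r = np.hypot(pts[:, 0] - X0, pts[:, 1] - Y0)
        return np.sqrt(np.sum((r - R)**2) / (n - 3))
    # steps 4-7: iterate until the error stops changing (handled internally)
    res = minimize(rms, [x0, y0, r0], method='Nelder-Mead')
    return res.x

# usage: noise-free samples of a circle centred at (2, -1) with radius 3
rng = np.random.default_rng(1)
t = rng.uniform(0, 2 * np.pi, 30)
pts = np.column_stack([2 + 3 * np.cos(t), -1 + 3 * np.sin(t)])
X0, Y0, R = fit_circle_rms(pts)
```

With a reasonable first estimate the nonlinear pitfalls mentioned above rarely bite; with a poor one, the minimizer can stall far from the true circle.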
Minimization details: because minimization algorithms can get very computationally intensive, if one's circle-fitting problem is simple, I would look for a "canned" minimization routine. Some commercial computer programs for plotting and spreadsheets do this sort of thing. For example, the Excel spreadsheet has the built-in "Solver" that will perform minimization.

Other possibilities for minimization software:

Matlab with the optimization toolbox; IDL with Markwardt's MPFIT; or MACSYMA.

ODRPACK from Netlib (see Netlib) is an excellent library for a production-mode approach to fitting these data, with a graphical user interface at Plato: Decision Tree for Optimization Software. At the very beginning of the User's Guide there is a description of how to fit data to an ellipse or circle. [Don Wells, Hans D. Mittelmann]

GaussFit is a superb response to the whole family of problems of adjusting parameters to fit functions to data; it is especially effective and easy to use for one-time problems and for exploratory work. [Don Wells]

The L-BFGS-B nonlinear optimization package allows one to specify bounds on the variables; see: LBFGSB. [Helmut Jarausch]

The Fortran programs fcircle.f and fellipse.f, together with the Lawson & Hanson least-squares routines ls.f, show how to implement these least-squares problems; see: fcircle.f, fellipse.f, etc. Alan Miller has updated the above for Fortran 90; the least-squares software is at: Alan Miller's Software.

Some Web Guides [Amara Graps]

GraphPad Guide to Nonlinear Regression.

User's Manual, GaussFit: A System for Least Squares and Robust Estimation, by William H. Jefferys, Michael J. Fitzpatrick, Barbara E. McArthur and James E. McCartney. [Don Wells]

Other Papers

Kasa, I., "A Curve Fitting Procedure and Its Error Analysis", IEEE Trans. on Inst. Meas., 3/96, describes a modified least-squares analysis that uses the difference of the squares of the radii rather than the absolute difference. [Rod Loewen]

Moura, L. and Kitney, R. I. (1991).
A direct method for least squares circle fitting. Computer Physics Communications 64, 57-63. [Colin B. Desmond]

Fitzgibbon, A.W., "A Buyer's Guide to Conic Fitting", British Machine Vision Conference, 1995.

Cunningham, Robert W., "Comparison of three methods for determining fit parameter uncertainties for the Marquardt Compromise", Computers in Physics 7(5), 570 (1993). [Amara Graps]

Gander, W.; Golub, G. H.; Strebel, R., "Least Squares Fitting of Circles and Ellipses", BIT vol. 34, 1994, pp. 558-578. [Suhail A Islam]

Berman & Somlo, "Efficient procedures for fitting circles...", IEEE Trans. Instrum. & Meas., IM-35, no. 1, pp. 31-35, 1986. [Peter Somlo]

Boggs, Paul T.; Byrd, Richard H.; Schnabel, Robert B., "A stable and efficient algorithm for nonlinear orthogonal distance regression", SIAM Journal on Scientific and Statistical Computing, 1987, 8, 6, 1052-1078. [Don Wells]

Boggs, P. T.; Donaldson, J. R.; Byrd, R. H.; Schnabel, R. B., "Algorithm 676, ODRPACK: Software for weighted orthogonal distance regression", ACM Transactions on Mathematical Software, 1989, 15, 348-364. [Don Wells]

Other (more elegant) approaches reduce the problem to a linear system.

Linear solution

If you choose to minimize the squares of the *area* differences, you get a linear problem, which is a much safer way. [Pawel Gora, Zdislav V. Kovarik, Daniel Pfenniger; condensed by Amara Graps]

1. The function to minimize (the sum of the squared area differences) is:

    F(a,b,c) = sum(i=1..N) ( x(i)^2 + y(i)^2 + a.x(i) + b.y(i) + c )^2
2. A necessary condition is the system of 3 equations in the 3 unknowns:

    dF/da = 0,   dF/db = 0,   dF/dc = 0
3. Developing, we get the linear least-squares problem:

    [ x(1)  y(1)  1 ] [ a ]      [ x(1)^2 + y(1)^2 ]
    [ x(2)  y(2)  1 ] [ b ]  = - [ x(2)^2 + y(2)^2 ]
    [  ...   ...  . ] [ c ]      [       ...       ]
    [ x(5)  y(5)  1 ]            [ x(5)^2 + y(5)^2 ]
(for example, for 5 points). Take any good least-squares algorithm to solve it, yielding a, b, c. The final circle solution is then given by

    X0 = -a/2,   Y0 = -b/2,   R = sqrt( a^2/4 + b^2/4 - c )
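The linear solution above fits in a few lines of Python with NumPy (illustrative; any least-squares solver would do in place of lstsq):

```python
import numpy as np

def fit_circle_linear(pts):
    """Linear circle fit: minimize the squared 'area' residuals
    x^2 + y^2 + a*x + b*y + c over (a, b, c), then recover the
    centre (X0, Y0) and radius R."""
    x, y = pts[:, 0], pts[:, 1]
    A = np.column_stack([x, y, np.ones_like(x)])     # one row per point
    rhs = -(x**2 + y**2)
    (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    X0, Y0 = -a / 2, -b / 2
    R = np.sqrt(a**2 / 4 + b**2 / 4 - c)
    return X0, Y0, R

# usage: exact points on a unit circle centred at (1, 2)
t = np.linspace(0, 2 * np.pi, 7, endpoint=False)
X0, Y0, R = fit_circle_linear(np.column_stack([1 + np.cos(t), 2 + np.sin(t)]))
```

No starting estimate or iteration is needed, which is why this route is "much safer" than the brute-force approach; the price is that the algebraic (area) residual is minimized rather than the geometric one.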
By the way, with 5 points you can also find the unique quadratic form (ellipse, hyperbola) which passes exactly through the 5 points. With more than 5 points one can do a linear least-squares fit as well. The problem is then to minimize:

    sum(i=1..N) ( A.x(i)^2 + B.x(i)y(i) + C.y(i)^2 + D.x(i) + E.y(i) + F )^2

(subject to some normalization of the coefficients, since the trivial solution A = B = ... = F = 0 must be excluded).
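A sketch of this general-conic least-squares fit follows. Since the text leaves the side condition open, this version assumes the common unit-norm constraint on the coefficient vector, for which the minimizer is the right singular vector of the design matrix belonging to the smallest singular value.

```python
import numpy as np

def fit_conic(pts):
    """Least-squares conic fit: coefficients (A, B, C, D, E, F) of
    A*x^2 + B*x*y + C*y^2 + D*x + E*y + F = 0 minimizing the algebraic
    residual, normalized so the coefficient vector has unit length."""
    x, y = pts[:, 0], pts[:, 1]
    M = np.column_stack([x**2, x * y, y**2, x, y, np.ones_like(x)])
    # the smallest right singular vector minimizes ||M v|| over ||v|| = 1
    _, _, vt = np.linalg.svd(M)
    return vt[-1]

# usage: 8 points on the ellipse x^2/4 + y^2 = 1
t = np.linspace(0, 2 * np.pi, 8, endpoint=False)
coef = fit_conic(np.column_stack([2 * np.cos(t), np.sin(t)]))
```

With exactly 5 points in general position the residual is zero and the unique conic through the points is recovered; with more points the fit is least-squares in the algebraic sense.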
The robust, or L1, or least-first-power approximation [Zdislav V. Kovarik]: if you try to minimize

    W(a,b,r) = sum(i=1..N) | sqrt( (x(i)-a)^2 + (y(i)-b)^2 ) - r |

then, for 5 points, all you have to do is set up the 10 (that is, 5-choose-3) circles passing through every choice of 3 of the 5 points, calculate W(a,b,r) for each of them, and pick the minimizing circle. The extra feature is that this procedure separates and discards points which are conspicuously far from the majority trend. (Of course, it becomes increasingly time-consuming as the number of given points increases.)

The related problem of determining the minimum bounding circle of a set of _circles_ is solved, with code available, at: [Sky Coyote]

ZunZun Interactive 2D and 3D Data Modeling

ZunZun is an online interactive fitting and graphing service for 2D and 3D data sets.

CurveExpert

CurveExpert is a comprehensive curve-fitting system for Windows. XY data can be modelled using a toolbox of linear regression models, nonlinear regression models, interpolation, or splines. Over 30 models are built in, but custom regression models may also be defined by the user. Full-featured graphing capability allows thorough examination of the curve fit. The process of finding the best fit can be automated by letting CurveExpert compare your data to each model and choose the best curve. This program was designed to be simple but powerful, so that all users can obtain a model for their data quickly and easily.
