Chapter 7 - Solving Simultaneous Equations, Curve Fitting, and FFTs

Solving Simultaneous Equations
- Solving Simultaneous Equations with Multiple Right Hand Sides
Linear Least Squares Data Analysis
Fast Fourier Transforms

QED-FORTH provides a powerful set of commands to solve sets of simultaneous equations, do linear least squares data analysis, and perform Fast Fourier Transformations (FFTs). These make it easy for an application program to accomplish sophisticated data processing.

Solving Simultaneous Equations

In the "Getting Started With the QED Board" booklet we solved the set of simultaneous equations:

1.07 x₁ + 0.19 x₂ + 0.23 x₃ = 2.98
0.38 x₁ + 1.00 x₂ + 0.74 x₃ = 3.48
0.12 x₁ + 0.53 x₂ + 1.20 x₃ = 3.70

which is written in matrix form as

MX = R

where M is a coefficient matrix, X is a matrix of unknowns, and R is a matrix of constants (sometimes called the residue matrix in linear algebra), and

      1.07 0.19 0.23          x₁            2.98
M = 0.38 1.00 0.74    X = x₂       R = 3.48
      0.12 0.53 1.20          x₃            3.70

The following code solves the system of equations by dimensioning and initializing the coefficient and residue matrices, placing the xpfas of the 3 relevant matrices on the stack, and calling the command SOLVE.EQUATIONS which calculates the values of the X matrix:

MATRIX: M ↓ ok \ create coefficient matrix
3 3 ' M DIMMED ↓ ok \ dimension coefficient matrix
MATRIX M = 1.07 0.19 0.23 ↓ \ initialize coefficients
0.38 1.00 0.74 ↓
0.12 0.53 1.20 ↓ ok
MATRIX: R ↓ ok \ create residue matrix
3 1 ' R DIMMED ↓ ok \ dimension residue matrix
MATRIX R = 2.98 3.48 3.70 ↓ ok \ initialize residue
MATRIX: X ↓ ok \ create matrix of unknowns
' M ' R ' X SOLVE.EQUATIONS ↓ ok \ solve the equations
' X M.. ↓ \ print the results
Matrix X = \ QED-FORTH prints the answer
2.099
0.8259
2.509
ok

Solving Simultaneous Equations with Multiple Right Hand Sides

Sometimes it is necessary to solve several systems of equations, each having the same left hand sides but with different right hand sides, or residue matrices. In this case, the equation MX=R is most efficiently solved by finding the LU decomposition of M, using the word LU.Decomposition. ("LU" refers to "lower/upper", a decomposition method that minimizes the propagation of round-off error). Now the matrix equation becomes,

LU(M) X= R

where LU(M) is the decomposition of the M matrix. This equation can be solved for the unknown matrix X with a minimum of computation using the word LU.BACKSUBSTITUTION. To solve a given equation for several residue matrices, LU(M) needs to be computed only once. LU.BACKSUBSTITUTION can then solve each residue matrix to get each solution matrix with minimal additional computation. The following program implements this technique for the single system of equations above:

MATRIX: INDEX.ARRAY ↓ ok \ we need this temporary matrix
\ We then replace M with its LU decomposition:
' INDEX.ARRAY ' M LU.DECOMPOSITION ↓ ok
\ and drop the determinant's sign and do the backsubstitution:
DROP ' R ' M ' INDEX.ARRAY LU.BACKSUBSTITUTION ↓ ok
\ replacing R with the solution.
CR CR ." The solution for the R residue matrix is: " ' R M. ↓
The solution for the R residue matrix is:
2.099
0.8259
2.509
ok

As expected, this solution is the same as that calculated by SOLVE.EQUATIONS. Now for a new right hand side matrix named Q, the following program solves the equations with minimal additional computation:

MATRIX: Q 3 1 ' C DIMMED ↓ ok \ define a second right hand side
MATRIX Q = 1.2 2.6 4.3 ↓ ok \ and initialize
' Q ' M ' INDEX.ARRAY LU.BACKSUBSTITUTION
CR CR ." The solution for the Q residue matrix is: " ' Q M.
The solution for the Q residue matrix is:
0.38
-0.2495
3.656
ok

This approach efficiently solves the equations for multiple residue matrices.

Linear Least Squares Data Analysis

One of the most powerful and generally used techniques of data analysis is least squares analysis, also called curve fitting, in which a model containing adjustable coefficients is fitted to a set of data. The kernel word LEAST.SQUARES determines the optimal values for these coefficients given a particular set of data. Using this word it is possible to write curve fitting words for any kind of model that is linear in the coefficients.

This section describes how to perform curve fitting using LEAST.SQUARES. The description starts with simple examples and progresses to a detailed look at the analysis of errors calculated by the curve fit. You need only read as much as your application dictates. The sample programs listed in this section demonstrate exactly how to use the LEAST.SQUARES word, and the glossary entry provides a terse description of the actions performed by LEAST.SQUARES.

This section is not meant as a tutorial on least squares curve fitting. If after reading this description you still have questions, please refer to more complete references, for example, Data Reduction and Error Analysis for the Physical Sciences by Philip R. Bevington, McGraw Hill 1969, Mathematical Handbook for Scientists and Engineers by G.A. Korn and T.M. Korn, McGraw Hill 2nd Ed 1968, or Numerical Recipes - The Art of Scientific Computing by W.H. Press, B.P. Flannery, S.A. Teukolsky and W.T. Vetterling, Cambridge University Press, 1986.

In least squares analysis, data is condensed and summarized by fitting it to a "model" that depends linearly on coefficients. The model is usually an equation in which functions of independent variable(s) are summed to form a function of the dependent variable(s),

Y^model = C₀ X₀ + C₁ X₁ + C₂ X₂ + ... C_m-1 X_m-1 = ∑_j C_j X_j for j=0…m-1

in which Ymodel represents a function of the dependent variable(s), the C0-Cm-1 are the coefficients and the X0-Xm-1 are functions of the independent variable(s). The "linear" in least squares data analysis refers to the coefficients; the functions may themselves be either linear or nonlinear in the independent variables, and in fact need not be analytical functions at all. For example the model could be a polynomial,

Y^model = C₀ + C₁x + C₂x^{2<sup> + … C_m-1x<sup>m-1}

or it could be based on trigonometric functions:

Ymodel = C₀ + C₁COS²(x) + C₂SIN⁴(xz)

The adjustable coefficients C0, C1, etc. determine the contributions of the functions of the independent variable(s) to the sum.

LEAST.SQUARES is a general curve fitting word that does all of the following:

It fits the model to the dependent data values to find the optimum values of the model coefficients.
It provides error estimates for the determined coefficients.
It provides a statistical measure of the goodness of fit.
It allows individual data points to be weighted to reflect the measurement errors or expected variances for those points.

The general model equation described above can be represented by the matrix equation,

Y_np = X_nm C_mp

where:

X is a matrix of n evaluations of the m functions, the X0-Xm-1 above, of the independent variable(s)
Y is a matrix of n instances of p functions of the dependent variable(s).
C is an m by p matrix of unknown model coefficients

As long as the number of instances n is greater than the number of functions m, that is, as long as we have a sufficient number of data points, the matrix equation is said to be over-determined. We can use least squares fitting to solve an over-determined matrix equation for optimal values of the matrix C. Most problems require only a single function of the dependent variable(s) so Y and C are generally just single-column matrices (p=1). It turns out that many useful problems can be framed in this general matrix equation format.

For example, assume that we have an observable process to be modeled by the equation,

y = C₀ + C₁x + C₂x² for x = 0, 1, 2, 3, 4

where x is the independent variable, C0, C1, and C2 are the coefficients that we want to determine, and y is the measured dependent variable. The model could represent the calibration of a thermocouple in which the Y are several thermocouple junction potentials corresponding to several temperatures, x. We can perform an experiment by setting x to each of each of the four prescribed values (0,1,2,3,4) and measuring the corresponding four y outputs of the system we are modeling. Then the matrix equation can be written as

Ynp     =     Xnm    Cmp

y0        1  0  0    C0
y1    =   1  1  1    C1
y2        1  2  4    C2
y3        1  3  9
y4        1  4 16

where y0...y4are the measured outputs, and we have filled in the X matrix using the specified values of x for the terms x0, x1, and x2 (called "basis functions") in the model. Note that the first basis function is just a constant function, of value unity, for all instances of x. In this example:

n, the number of "evaluations" or "observations" or "instances", equals 5,
p, the number of functions of the dependent variable, equals 1, and
m, the number of (basis) functions of the dependent variable equals 3.

The goal of a general least squares program is to find the "best" values of the coefficients so that the model most closely fits the dependent data values. The figure of merit minimized by LEAST.SQUARES to determine the closeness of fit is the observed sample variance. It is defined as the weighted sum of squares of the differences between the actual instances of Y and the modeled Y, divided by the number of degrees of freedom. It is a variance per instance. For each column of Y the associated observed sample variance is,

Sample.Error^{2 = ∑ w_i (y_i - y_i<sup>model} )² /(n-m)

where

w_i	are "normalized" weights
y_i	are the input y values
y_i^model	are the predicted values of the dependent variable(s) computed from Y = X C
(n-m)	is the number of degrees of freedom in the optimization, equal to the number of instances (rows of X or Y) minus the number of coefficients determined (columns of X or rows of C).

The summation is taken over the number of instances, or rows in Y. The "normalized weight", w_i, associated with each measured data point y_i, when known, is proportional to the inverse of the variance expected for that data point.

LEAST.SQUARES finds the best values for the coefficients that minimize the Sample.Error2. The routine reports the value of Sample.Error² for each column of Y, and can optionally report the predicted y values (y_i^model), the errors on y, ( y_i - y_i^model), and the covariance matrix whose diagonal elements are proportional to the variances of the determined coefficients and whose off-diagonal elements are proportional to the covariances between them.

In the following sections we'll demonstrate using LEAST.SQUARES by way of several examples.

A Least Squares Example - Fitting a Line

Suppose that you wish to straight-line fit a measured value, y, to an independent variable x. You have nine measurements whose x,y data pairs are given as

( x0, y0 ) = 1.0 15.6 ( x5, y5 ) = 6.0 61.6
( x1, y1 ) = 2.0 17.5 ( x6, y6 ) = 7.0 64.2
( x2, y2 ) = 3.0 36.6 ( x7, y7 ) = 8.0 70.4
( x3, y3 ) = 4.0 43.8 ( x8, y8 ) = 9.0 98.8
( x4, y4 ) = 5.0 58.2

The expected variances on the y values aren't known so you decide to uniformly weight the data. You'd like to determine the coefficients (the slope and intercept) together with their expected errors for the equation

y_i^model = C₀ + C₁x_i

The following program solves this problem and prints the values of the coefficients and their standard errors.

ANEW LINE.FIT.EXAMPLE
 
\ First define and initialize matrix Y-DATA that holds the measured (dependent) data.
\ The Y data matrix is just the list of y values:
 
MATRIX: Y-DATA    9 1    '  Y-DATA DIMMED
MATRIX  Y-DATA =     15.6 17.5 36.6 43.8 58.2 61.6 64.2 70.4 98.8
 
\ The X matrix is constructed by evaluating the two basis functions at each of
\ the values of the independent variable x.  The two basis functions in this
\ case are X0 = 1 and X1 = x, so we have,
 
 
MATRIX: X-DATA
9 2 ' X-DATA DIMMED     \ Dimension and initialize and  the X matrix.
\ X0=1, and X1=x
MATRIX X-DATA =
    1     1.0
    1.0    2.0
    1.0    3
     1.0    4.0
     1.0    5
     1    6.0
     1    7.0
     1    8.0
     1    +9
\ Note that numbers can be entered as integer or real numbers.
\ The word MATRIX accepts either, and converts all numbers to floating point.
 
\  The following additional data structures are needed:
    MATRIX: COEFFICIENTS    \ will hold the determined coefficients
    MATRIX: Y-MODEL    \ after the fit this holds predicted y values
    MATRIX: Y-ERRORS    \ after the fit this holds errors on y values
    MATRIX: SAMPLE.VARIANCE    \ to hold the observed sample variance
    MATRIX: COVARIANCES    \ after the fit this holds the variances and
        \ covariances on the determined coefficients
 
 
: CLEAN.UP ( -- )
    \ Delete the various matrices.
    ' COEFFICIENTS    DELETED    ' Y-MODEL     DELETED
    ' X-DATA    DELETED    ' Y-ERRORS    DELETED
    ' Y-DATA    DELETED    ' COVARIANCES    DELETED
    ' SAMPLE.VARIANCE     DELETED
    ;
 
\ This word ensures that if the matrices are forgotten they are first deallocated.
: ON.FORGET ( -- )
    CLEAN.UP
    ;
 
\ The following word does the curve fit by setting up the stack and calling the
\ kernel word LEAST.SQUARES :
 
: LINE.FIT (  --  )
    \ Performs a straight line curve fit.  First, set up the stack picture:
     ' X-DATA          \ Tell where the functions of the x values are
    ' Y-DATA          \ and tell where to find the y values.
    ' COEFFICIENTS        \ Tell where to put the determined
    ' SAMPLE.VARIANCE    \ coefficients and the sample variance.
     0\0            \ There is no weighting other than uniform weighting
                \ so we pass a 0\0 instead of a matrix.
    ' Y-MODEL        \ Tell where to place the model predictions
    ' Y-ERRORS          \ and the individual Y errors.
     ' COVARIANCES        \ Tell where to put the variance-covariance matrix.
    LEAST.SQUARES      \ Call the curve fit.
     \ As explained in a later section, we now multiply the variance-covariance
     \ matrix by the observed sample variance
     \ to get absolute variances on the determined coefficients:
     0 0 SAMPLE.VARIANCE F@   ' COVARIANCES XDUP   S*MATRIX
    ;
 
: PRINT.RESULTS ( -- )
    CR ." The intercept is "
    0 0 COEFFICIENTS F@ F.    ." +/- " 0 0 COVARIANCES F@ FSQRT F.
    CR ." and the slope is "
    1 0 COEFFICIENTS F@ F.    ." +/- " 1 1 COVARIANCES F@ FSQRT F.
    CR ." The variance/covariance matrix for the determined coefficients is"
    ' COVARIANCES M.
    CR CR ." The rms average sample deviation of the data from the best fit is "
    0 0 SAMPLE.VARIANCE F@ FSQRT F.
    CR CR ." The predicted values of the data are: "        ' Y-MODEL M.
    CR ." The deviations of the data from the best fit are: "        ' Y-ERRORS M.
    ;

\ Now execute the fit, print the answer, and clean up by deleting the matrices:
LINE.FIT PRINT.RESULTS CLEAN.UP↓
The intercept is 4.814 +/- 4.886
and the slope is 9.408 +/- 0.8683
The variance/covariance matrix for the determined coefficients is
23.87 -3.77
-3.77 0.7539
The average sample standard deviation of the data from the best fit is 6.726
The predicted values of the data are:
14.22
23.63
33.04
42.45
51.86
61.26
70.67
80.08
89.49
The deviations of the data from the best fit are:
1.377
-6.131
3.561
1.353
6.345
0.3364
-6.472
-9.68
9.312
ok

Another Example: Fitting an Exponential Equation Using Weights

To optimally determine the coefficients in a least squares fit, each y value should be weighted with a number proportional to the inverse of its expected or measurement variance. In this way, the greater the uncertainty associated with a particular measurement, the lower will be its weight, and the less that data point contributes to the determination of the coefficients. LEAST.SQUARES gives you the option of including a matrix of weights. If the measurement variances are known, the weights should be represented as a matrix that is parallel to (i.e., has a one-to-one correspondence with) the Y matrix. If the variances are not known or if they are expected to be the same for all of the y values, then LEAST.SQUARES automatically assigns a unity weight to each dependent data point.

In the following example, because a nonlinear transformation is applied to the raw data before performing the curve fit, properly weighting the data is essential to correct data analysis.

Consider an experiment in which the activity of a radioactive source is measured as a function of time and we wish to determine the initial activity and the rate of decay. A Geiger counter counts radioactive events as a function of time with the following results:

R(    0 sec)    =    106    R(    75 sec)    =    73
R(    15 sec)    =    80    R(    90 sec)    =    49
R(    30 sec)    =    98    R(    105 sec)=    38
R(    45 sec)    =    75    R(    120 sec)=    37
R(    60 sec)    =    74    R(    135 sec)=    22

Theoretical considerations dictate that the function should be a decreasing exponential of the form,

R(t) = R₀ exp(-t/T)

We would like to fit this function to the data to find the initial rate, R0, and the characteristic decay time, T. Unfortunately the equation is not linear in the coefficients we'd like to determine. It is, however, what is called "intrinsically linear", meaning that with a little algebra we can write it so that it is linear in the two coefficients. If we take the logarithm of each side we obtain,

Ln(R) = Ln(R₀) + (-1/T)t

which is now in a form equivalent to our standard,

y = C0 X₀ + C1 X₁

if we only identify y= Ln(R), C₀= Ln(R₀), C₁= (-1/T), X₁= 1, and X₁=t.

The LEAST.SQUARES routine calculates the optimal values of C0 and C1, and we can solve the original equation

R(t) = R₀ exp(-t/T)

by substituting for C₀ and C₁ as

R(t) = exp(C₀) * exp(C₁*t)

We know from statistical considerations that in counting experiments like this the expected variance on each count (the square of the measurement error), VAR(R), is numerically equal to the count, R. But we are not fitting the count directly; rather, we're fitting a logarithmic function of it, so we must transform the expected variance appropriately. The variance on y, VAR(y) is found by differentiating y with respect to R as,

VAR(y) = (dy/dR)² VAR(R)

Because dy/dR = 1/R and VAR(R)= R, we find VAR_(y)=1/R. The weights should be inversely proportional to the variance so we choose weights to be given by R, w_i = R_i.

We can now code and solve the problem:

ANEW WEIGHTED.FIT.EXAMPLE
 
MATRIX: Y-DATA    10 1 ' Y-DATA  DIMMED    \ First we define and dimension the data matrices.
MATRIX: X-DATA    10 2 ' X-DATA  DIMMED
MATRIX: WEIGHTS    10 1 ' WEIGHTS DIMMED
 
\ We'll also need matrices to place the results in:
 
MATRIX: SAMPLE.VARIANCE        MATRIX: COEFFICIENTS        MATRIX: COVARIANCES
 
\ For convenience we'll define a word to set up the stack for LEAST.SQUARES and then execute it:
 
: WEIGHTED.LINE.FIT ( -- )
    ' X-DATA    \ We set up the stack for LEAST.SQUARES
    ' Y-DATA    \ by giving it the X matrix and the Y matrix,
    ' COEFFICIENTS    \ a place to put the determined coefficients
    ' SAMPLE.VARIANCE    \ and the least squared error, and
    ' WEIGHTS    \ telling it where to find the input weights.
    0\0        \ We don't need the best fits or the errors on each data
    0\0           \ point so we pass dummy matrix.xpfas.
    ' COVARIANCES    \ This is where we'd like the covariance matrix.
    LEAST.SQUARES    \ Do the curve fit.
    \ We now multiply the variance-covariance matrix by the "a priori" sample
    \ variance to get absolute variances on the determined coefficients.
    \ The "a priori" sample variance is just the inverse of the average
    \ sample weight.  This is explained in a later section of the chapter.
    ' WEIGHTS ?DIM.MATRIX DROP FLOT    \ Number of instances for the average
    ' WEIGHTS MATRIX.SUM F/             \ Use the a priori sample variance
    ' COVARIANCES XDUP S*MATRIX        \ to scale the covariance matrix.
    ;
 
\ and a word to print the results,
 
: PRINT.RESULTS ( -- )
   CR ." The intercept is "
      0 0 COEFFICIENTS F@ F.    ." +/- " 0 0 COVARIANCES F@ FSQRT F.
   CR ." and the slope is "
      1 0 COEFFICIENTS F@ F.    ." +/- " 1 1 COVARIANCES F@ FSQRT F.
   CR ." The variance/covariance matrix for the determined coefficients is"
      ' COVARIANCES M. CR
   CR ." The average sample standard deviation of the data from the best fit is "
   0 0 SAMPLE.VARIANCE F@ FSQRT F.
   ;
 
 
: CLEAN.UP ( -- )
    \ Delete the various matrices.
    ' COEFFICIENTS    DELETED    ' COVARIANCES    DELETED
    ' X-DATA    DELETED    ' WEIGHTS        DELETED
    ' Y-DATA    DELETED
    ' SAMPLE.VARIANCE DELETED
    ;
 
 
: ON.FORGET ( -- )    \ This word ensures that if the matrices are forgotten,
    CLEAN.UP ;    \ for example by ANEW, they are deleted from the heap first.
 
\ The X matrix is initialized as:
\ X0=1, and X1=t.
MATRIX X-DATA =
    1      0
    1     15
    1     30
    1     45
    1     60
    1     75
    1     90
    1    105
    1    120
    1    135
 
\ and the data matrix and weights are initialized by the following.  Note that
\ we are entering the logarithms of the radioactivity counts directly, but
\ we could just as well have entered the counts and then used a word like
\ TRANSFORM.MATRIX to convert the entire matrix to its logarithms.
 
MATRIX Y-DATA =
    4.663 4.382 4.585 4.317 4.304 4.290 3.892 3.638 3.611 3.091
    \ Enter the logs of the original measured variable.
 
MATRIX WEIGHTS =
    106      80    98    75    74    73    49    38    37    22
    \ The weights are numerically equal to the measured variable.

\ and we run the program by typing,
WEIGHTED.LINE.FIT PRINT.RESULTS ↓

The intercept is 4.725 +/- 0.06449
and the slope is -0.009031 +/- 0.001005
The variance/covariance matrix for the determined coefficients is
0.004159 -5.150E-05
-5.150E-05 1.010E-06
The average sample standard deviation of the data from the best fit is
0.1722 ok

Thus we have solved the original equation

R(t) = R₀ exp(-t/T)

in the form

R(t) = exp(C₀) * exp(C₁*t)

and the result is

R(t) = exp(4.725) * exp(-0.009031*t)
= 112.73 * exp(-0.009031*t)

Fitting Several Functions of the Dependent Variables

Each of the rows of the X matrix and of the Y matrix represents a "trial" in the experiment corresponding to a particular value of the independent variable(s). We could perform several experiments, each with the same values of independent variable(s), but producing different measured values of y (because of noise, experimental error, etc.) These additional experiments would be represented as additional columns in the Y and C matrices; the X matrix would be unchanged. Or, we might want to fit the same X matrix to several different functional forms of the dependent variable in order to determine which provides the better model. In any case, the LEAST.SQUARES routine can handle these multi-column problems as well as the simpler single-column case shown in the above examples.

Interpreting the Covariance Matrix

If the user provides no weights, or if there is uniform weighting, or if all weight columns are equal to one another, then the variance-covariance matrix returned by LEAST.SQUARES provides the relative covariances of the coefficients in any of the columns of the coefficient matrix, C. Because the input weights are not considered to be equal to the inverses of the expected variances on the data, but only proportional to those inverses, (this is the most versatile way of solving the problem) the variances and covariances for the determined coefficients are also not absolute but only relative.

There are two ways to compute the actual variances and covariances for a particular column of C. The first should be used if the actual experimental or measurement errors are known. In this case the coefficient variances are determined from the expected measurement errors by multiplying the variance-covariance matrix by the "a priori" sample variance. The "a priori" sample variance is just the inverse of the average sample weight, where the sample weight for each data point is the inverse of the actual variance assigned to that data point. We used this method in the second example above.

The second method is used if the uncertainties associated with a set of measurements are not known in advance. In this case, the curve fit's estimate of the average sample variance, the observed sample variance returned by LEAST.SQUARES, is used instead of the "a priori" sample variance. In the first example above we used this method to estimate the errors on the slope and intercept because we did not know what errors to expect on the y values; instead we estimated them from the observed deviations of the data points from the fitted values. In that example, the covariance matrix was multiplied by the observed sample variance, found in SAMPLE.VARIANCE. If there is more than one column in Y and C then the variance-covariance matrix should be multiplied in turn by each entry of SAMPLE.VARIANCE to get the actual variance-covariance matrix for the coefficients of the corresponding column of C.

If you are solving a problem with multiple columns in Y and the weights differ from column to column, the covariance matrix that is returned applies only to the first column of C, corresponding to the first column of Y or W. If you need the covariance matrix for each column of C then separate least squares problems should be solved, one for each column of Y with different weights.

After being scaled with the observed sample variance or with the a priori sample variance, the diagonal elements of the variance-covariance matrix are the variances on the coefficients of C, and their square roots are the expected errors on the coefficients. Comparing these expected errors with the magnitude of the determined values gives an indication of the sensitivity of the model to the coefficients. The off-diagonal elements provide the covariances between pairs of determined coefficients. If the experiment is designed so that the X matrix after weighting is orthogonal, then the covariances between determined coefficients are zero. This is the ideal experimental design. As columns in the X matrix become more similar the covariances increase, and if the experimental design is so faulty that the columns of X (the functions X0-Xm-1) are degenerate (that is, mutually linearly dependent or redundant) then the covariances increase without bound and the results become nonsense. It is the responsibility of the experimental designer to pick model functions that span the data set and yet are not redundant. Near redundancy in the model functions can be diagnosed by inspection of the covariances; if coefficients are highly correlated then some of the model functions should be deleted, perhaps with the aid of an orthogonalization procedure. Another solution is to apply a curve fitting technique such as singular value decomposition that automatically identifies and removes redundancy in the X matrix.

Evaluating Goodness of Fit

The LEAST.SQUARES routine reports a value for observed sample variance for each column of Y in the user-supplied SAMPLE.VARIANCE matrix. If the user provides appropriate matrix.xpfas LEAST.SQUARES also reports the predicted Y values in Y.model.matrix, the residuals or errors on Y, (Y-Ymodel), in Y.err.matrix, and the variance-covariance matrix for the determined coefficients in Covar.matrix.

The observed sample variance is used to evaluate the goodness-of-fit. The observed sample variance can be interpreted as the sample variance (the average variance per data point) that is observed in light of the fitted model. The model or fit is considered good if the observed sample variance is comparable to that expected from a priori knowledge of the sample variance. If the experimenter does not know what errors to expect, the validity of the model cannot be determined by any means, no matter how small the observed errors or how close the fit. This does not preclude the possibility of the model's being very useful still, but its validity in explaining the data cannot be determined unless it is known a priori just how precisely the model must explain the data in order to be valid.

If the expected sample variance is known from a priori considerations, then it should be compared to the observed sample variance. If the weights are indeed equal to the inverses of the actual experimental variances for each data point then the expected sample variance is computed from the weights as n/∑wi , where n is the number of instances. The most useful comparison is to calculate the ratio:

observed sample variance ÷ expected sample variance

This quantity is what is generally referred to in the statistics books as chi-squared or reduced chi-squared (reduced because we have divided by the number of degrees of freedom in computing the observed sample variance). If the reported sample variance is comparable to the expected sample variance, that is, chi-squared is approximately unity, it is likely that the fit is good and that the observed errors simply result from expected statistical fluctuations.

A more precise quantitative evaluation of the goodness-of-fit is found by considering the chi-squared probability distribution. The probability that statistical fluctuation would be responsible for the observance of a chi-squared of at least the magnitude observed or greater is given by the "chi-distribution for n-m degrees of freedom". This probability for different chi-squared and numbers of degrees of freedom is usually tabulated in appendices to statistics books but can also be computed from the complementary incomplete gamma function. Please refer to the above cited references, in particular to the Numerical Recipes book for more detail.

If the reported sample variance is very much larger than expected than there are several possibilities:

The a priori measurement errors were underestimated badly - they are actually larger than stated.
The model is wrong (it is genuinely a poor fit) and an analysis of residuals may reveal significant trends in the errors.
The errors are not nearly normally distributed.

If the reported sample variance is consistently much smaller that expected then it may be that:

The measurement errors had been overestimated, or
The data had been preprocessed to inappropriately bias the results, perhaps, for example, by overly enthusiastic elimination of "outliers".

Another measure related to the goodness-of-fit that is often used (and more frequently abused) is given by the linear correlation coefficient, r, for each column of Y. The linear correlation coefficient spans the range zero to one; it is unity if the model explains all the variance in the y data, and zero if the model explains none of the variance. It is given by,

r² = ( 1 - ((n-m)/n) Sample.Error² / Unmodeled.Sample.Variance )

where,

Unmodeled.Sample.Variance = ∑w_iy_i² /n - (∑w_iy_i/n)²

and w_i represents normalized weights; the weights are normalized so that ∑w_i = n).

How Least Squares Works

Weights are considered to be relative; their absolute magnitudes are ignored and the user's weight matrix is internally normalized by LEAST.SQUARES so that the column averaged weight is unity, that is, the sum of weights in each column is made equal to the number of instances (rows) in the column. This is done transparently to the user; the user's matrix is not actually modified:

w_i<-- n w_i/ ∑w_i

For the unweighted or uniformly weighted problem the normal equations are built as

X^TXC = X^TY

and they are solved as

C = (X^TX)-¹X^TY

For the weighted problem, the normal equations are built only for a single column of C, W, and Y, and solved for a single column of C as

C_j = (X^TW_jX)-¹X^TW_jY_j

where C_j and Y_j are the jth columns of C and Y respectively and W_j is a diagonal matrix comprising the jth column of the normalized weights matrix. The variance-covariance matrix is given by

(X^TX)^-1

for the unweighted case. For the weighted case the variance-covariance matrix that is returned,

(X^TW₀X)^-1

applies only to the first columns of the Y and C matrices.

Y^model is given by

Y^model = X C

and the errors are given by

Y_err = Y - Y^model

The quantity that is minimized by the determined coefficients is the observed sample variance for the jth column of Y, given by,

Sample.Error_j² = ∑_iw_ij(y_ij - y_ij^model)² /(n-m) = Y_jerr^TW_jY_jerr/(n-m)

Fast Fourier Transforms

Many signal processing techniques rely on analyzing or manipulating the frequency spectrum of a digitized waveform. The Discreet Fourier Transform (DFT) converts a digitized time-domain signal (that is, a signal that varies as a function of time) into a frequency domain signal that is represented as a function of frequency. The DFT requires of the order of Nˆ2 computations, where N is the number of digitized samples in the waveform. If the number of samples is an integer power of two, a Fast Fourier Transform (FFT) algorithm can be used which takes only of the order of N*log(N) computations. QED-FORTH implements a floating point decimation-in-time FFT as described in standard references. The 16 point FFT shown below executes in 0.1 seconds; a 64 point FFT executes in 0.6 seconds.

The maximum size FFT that can be performed is 8192 points.

How to Use the FFT Function

To perform an FFT, simply initialize the matrix containing the input waveform, place its xpfa on the stack and call the kernel word FFT to perform the transformation.

The Fast Fourier Transform operates on complex numbers. A quick discussion of complex numbers will clarify how to initialize the input matrix and interpret the results.

A complex number takes the form

A + i B

where i is the imaginary operator, equal to the square root of -1. A is called the "real part" and B is called the "imaginary part" of the complex number. Complex numbers are convenient when a compact representation of both the amplitude and the phase of a signal is required, and they facilitate the representation of sines and cosines as exponential quantities.

QED-FORTH treats a complex number as a pair of floating point numbers (A,B) stored in sequential memory locations with the imaginary part B stored just above the real part A in memory.

The input to the FFT is a set of N complex numbers, where N is the number of samples in the waveform to be transformed. The input waveform is in a matrix that has 2 rows and N columns. Row#0 holds the real part of each complex sample, and row#1 holds the imaginary part. Each column holds a complex value that corresponds to one sample of the input waveform.

To perform the transformation call the kernel word

FFT ( data.xpfa -- )

This word expects on the stack the xpfa of the input data. The FFT results are placed in the input data matrix. The interpretation of the transform results is discussed in conjunction with the following example.

To perform the inverse Fast Fourier Transform, execute

IFFT ( data.xpfa -- )

The inverse FFT converts a frequency domain signal into a time domain representation.

An Example of Fast Fourier Transformation

Let's transform a 16-point input waveform that comprises a constant plus a 2-cycle sinusoid (i.e., one that repeats twice in the course of the 16 points) plus a 5-cycle sinusoid. For example, if the entire waveform is 0.1 second long, then it consists of a DC term plus a 20 Hz sinusoid plus a 50 Hz sinusoid. We assume that these three terms have magnitudes of 1.5, 2.0, and 3.0, respectively. The first part of the program initializes the input waveform.

ANEW FFT.EXAMPLE
\ generate waveform with 1.5 * DC + 2.0 * 2 cycles + 3.0 * 5 cycles
 
DECIMAL
 
INTEGER:    #POINTS    16    TO #POINTS
REAL:        DC.COEFFICIENT    1.5    TO DC.COEFFICIENT
REAL:        2X.COEFFICIENT    2.0    TO 2X.COEFFICIENT
REAL:        5X.COEFFICIENT    3.0    TO 5X.COEFFICIENT
 
: SINE.TERM    ( n1\n2 -- r | n1 = #cycles, n2 = time,  r = sin )
    \ given a frequency n1 and time index n2, and the #points N,
    \ calculate the sine{2πn1n2/N}
    FLOT ROT FLOT F*         \ n1*n2
    2.0 F* PI F*             \ 2πn1*n2
    #POINTS FLOT F/         \ 2πn1*n2/N
    FSIN
    ;
 
MATRIX: WAVEFORM    \ define the input waveform matrix
 
: INIT.WAVEFORM        ( -- )
    \ initialize the input waveform matrix
    2 #POINTS ' WAVEFORM DIMMED    \ dimension as 2xN
    ' WAVEFORM ZERO.MATRIX        \ set imaginary terms to 0.0
    #POINTS 0                    \ for each sample
    DO                        \ init real terms in row 0
        DC.COEFFICIENT             \ add the 3 terms per sample
        2 I SINE.TERM 2X.COEFFICIENT F* F+
        5 I SINE.TERM 5X.COEFFICIENT F* F+
        0 I WAVEFORM F!            \ save real part of sample
    LOOP ;
 
1 RIGHT.PLACES !  2 LEFT.PLACES !
FIXED        \ set printing format for easy viewing of data & results

INIT.WAVEFORM ' WAVEFORM M..
Matrix WAVEF___ =

1.5 5.7 1.4 1.8 4.5 -1.1 -2.6 2.9 1.5 0.1 5.6 4.1 -1.5 1.2 1.6 -2.7
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
ok

The input waveform has been initialized. Note that there are N columns of data with row#0 containing the real part of the input and row#1 containing the imaginary part. For an actual time-domain signal, the imaginary parts are all 0.

The next part of the program performs the FFT and prints the results:

' WAVEFORM FFT \ do the FFT
' WAVEFORM M.. \ print the results of the FFT

Matrix WAVEF___ =
24. -0. -0. -0. 0. 0. 0. 0. -0. 0. 0. 0. 0. -0. -0. -0.
0. 0. -16. -0. -0. -24. 0. 0. 0. -0. -0. 24. 0. 0. 16. -0.
ok

The transformation results are placed in the input matrix WAVEFORM. Row#0 is the real part and row#1 is the imaginary part. Assume that the input waveform spans T seconds. Then column#0 of the transform corresponds to a frequency of 0, column #1 corresponds to 1/T, column#2 corresponds to 2/T, etc. up to a maximum frequency of (plus or minus) N/2T. After the midway point (N/2), the frequencies are negative. That is, the last point at column#N-1 corresponds to -1/T, the next to last point at column#N-2 corresponds to -2/T, etc.

The FFT results for the example show a frequency component of magnitude 24 at 0 frequency. This is the DC component of the input waveform. The magnitude equals the number of points (16) times the input magnitude (1.5). At frequencies of +/- 2/T there are imaginary components of amplitude -/+16. These represent the "2-cycle" term in the time waveform. The sum of the magnitudes in the frequency domain (32) equals the number of points (16) times the amplitude in the time domain (2). At frequencies of +/- 5/T there are imaginary components of amplitude -/+24 which represent the "5-cycle" term in the time waveform.

Sine waves in the time domain are converted to imaginary numbers in the frequency domain. Cosines in the time domain are converted to real numbers in the frequency domain. Thus the relationship of the real and imaginary parts of the frequency components indicates the phase of the time-based signal.

To convert the frequency spectrum back into the time domain, the inverse FFT is performed as

' WAVEFORM IFFT \ do the inverse FFT

Matrix WAVEF___ =
1.5 5.7 1.4 1.8 4.5 -1.1 -2.6 2.9 1.5 0.1 5.6 4.1 -1.5 1.2 1.6 -2.7
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
ok

Note that the original time signal is recovered. To clean up, we can delete the WAVEFORM matrix and re-establish the default floating point print format by executing

' WAVEFORM DELETED
FP.DEFAULTS

For Experts: Definition of the Discrete Fourier Transform

You need not read or understand this section to use the FFT function.

Those interested in the decimation-in-time algorithm may consult Digital Signal Processing, A.V.Oppenheim and R.W.Schafer, Prentice-Hall, Inc., 1975, pg 291-302. For a description of the standard format of input and output arrays for the FFT, see Numerical Recipes, W.H. Press, B. P. Flannery, et.al., Chapter 12.2, FFT, Cambridge Univ. Press, N.Y., 1986.

For a repeating time series x(t) of N points ( t=0....N-1 ), the discrete Fourier transform X(f) is given by:

X(f) = ∑ x(t)exp[-2πift/N]

where t is time, f is frequency, i is the imaginary operator, and the sum is taken from t = 0 to t = N-1.

The inverse discrete Fourier transform is given by:

x(t) = (1/N) ∑ X(f)exp[+2πift/N]

where the sum is taken from f = 0 to f = N-1.

The FFT and IFFT routines have access to a pre-initialized constant matrix of exponential constants holding the values

exp[-i(π/2ˆI)]

as complex numbers. The specified routine then performs the required summations to transform the input waveform. It becomes clearer that the FFT is performing a sinusoidal decomposition when you note that

exp[-i(π/2ˆI)]= cos(π/2ˆI) - i sin(π/2ˆI)

The Fast Fourier Transform works for waveforms with the number of samples equal to an integer power of 2, up to and including 213=8192 points. It reduces the computation from the order of Nˆ2 for the DFT to the order of N*log(N) computations for the FFT.

This page is about: Solving Simultaneous Equations, Curve Fitting, and FFTs on Single Board Computers – QED FORTH provides powerful set of commands to solve sets of simultaneous equations, do linear least squares data analysis, and perform Fast Fourier Transformations (FFTs). These make it easy for an application program to accomplish sophisticated data …