# Chapter 6 - Arrays, Matrices, and Fast Vector Operations

QED-Forth provides a set of data structures and associated operations that simplify and streamline sophisticated processing tasks. Arrays and matrices can be easily created, dynamically dimensioned, deleted, initialized, and copied. Matrices, which are 2-dimensional arrays of floating point numbers, support a comprehensive set of pre-programmed data editing and matrix algebra operations. This chapter describes how to define and use these data structures.

## Arrays

An array is a data structure stored in memory whose elements can be individually addressed and accessed. QED-Forth supports dynamically dimensioned arrays whose elements are stored in heap. The array can be redimensioned as needed under program control and the array's storage space can be dynamically re-allocated. Consult the "Array" section of the Categorized Word List in the QED-Forth Glossary for a list of available routines in the array library.

### Array Size and Dimensions

The number of elements per dimension must be between 1 and 16,383, and the number of bytes per element must be between 1 and 65,535. The size of an array is limited by the amount of space available in the current heap. The size of the heap, in turn, is limited only by the amount of available contiguous RAM. The heap may occupy multiple contiguous pages.

The value stored in the user variable MAX#DIMENSIONS determines the maximum number of dimensions in an array; the default is 4, but you can specify any number up to a maximum of 255. The higher the contents of MAX#DIMENSIONS, the more memory is required in the array's parameter field which holds the dimensioning information of each array. Each additional dimension requires 2 additional bytes in the parameter field.

If you wish to change the default value of MAX#DIMENSIONS, it is recommended that you do so *before* any arrays are defined. Unpredictable behavior may result if arrays are defined while MAX#DIMENSIONS is small, and then it is increased. This is because the value of MAX#DIMENSIONS determines the size of the array's parameter field when the array is created. If MAX#DIMENSIONS is later increased, operations on the array will assume a larger size parameter field, perhaps writing over memory that is associated with the parameter field of another array.

### Creating Arrays

To create a named array execute ARRAY: followed by a name of your choice, as

ARRAY: ARRAY.NAME↓ok

The array is created but no heap memory is assigned to it yet. Memory is allocated and the array is dimensioned using the command DIMENSIONED which requires the limits of the indices under the number of indices under the number of bytes per element under the array's extended parameter field address (xpfa). There can be up to MAX#DIMENSIONS indices (the default is 4). For example to dimension an already named array to have 2 dimensions, 5 rows and 10 columns, with 2 bytes per element, execute

DECIMAL↓ok\ set base to decimal

5 10 2 2 ' ARRAY.NAME DIMENSIONED↓ok

where ' (pronounced "tick") places the extended parameter field address on the stack; the xpfa is used to refer to the array as a whole. Memory for the array is allocated in the current heap. If the heap has insufficient room for the specified array, QED-Forth will ABORT and print the message:

ARRAY.NAME is too large. Not enough memory !

### Addressing Array Elements

Individual elements are referred to by placing the element's indices on the stack and stating the array's name; the element's extended address is then left on the stack. The element size and all indices are 16-bit unsigned integers, and *indices are zero-based*. This means that for a two dimensional array with 5 rows and 10 columns the rows are numbered 0 through 4 and the columns are numbered 0 through 9. To store the integer 104 at the second row (row#1) and seventh column (column#6) you can say

104 1 6 ARRAY.NAME !↓ok

and to fetch and print the value from the same location you can execute,

1 6 ARRAY.NAME @ .↓104 ok

If DEBUG is ON, indices are checked to insure that they are within their limits. Out-of-range indices cause an ABORT and an error message. For example,

5 6 ARRAY.NAME @↓Error at [] ARRA______ 5 is an out-of-range index ! ok

The error message reports the routine that detected the error (which is [], a word that calculates an address given the indices), the name of the array, the offending index, and the reason for the ABORT. Note that only the first 4 letters of the array's name were printed. This is because we assume that the user variable WIDTH equals its default value of 4. The remainder of the name is represented by the underbar character. (To improve the reporting of names in error messages, you can increase WIDTH with a simple ! command).

If DEBUG is OFF, error checking is not done, and array element addressing occurs more rapidly.

### References to Entire Arrays

An entire array can be referred to, as opposed to referring to only an individual element, by using the kernel word ' (tick) before the name. Executing ' followed by the array's name leaves the 32-bit extended parameter field address (xpfa) of the array on the stack; this address must be used whenever an entire array is treated as a single object. It is useful when you need to pass an entire array to a FORTH word which then operates on all elements of the array rather than on just a single element. References to arrays can be passed from one word to another without the called word needing prior knowledge of the array's name.

### Operations for Entire Arrays

An entire array can be initialized to zero as

' ARRAY.NAME ZERO.ARRAY↓ok

Suppose a character array is needed to hold a character string. To define and dimension a 1-dimensional array with 80 single-byte elements, execute

ARRAY: CHARACTER.ARRAY 80 1 1 ' CHARACTER.ARRAY DIMENSIONED↓ok

and then initialize it to contain ascii blanks as

' CHARACTER.ARRAY BLANK.ARRAY↓ok

The array can be initialized to contain any specified character using FILL.ARRAY. For example, to initialize an array to hold ascii zeros (whose code is 48), execute

' CHARACTER.ARRAY ASCII 0 FILL.ARRAY↓ok

Try fetching and printing the first element of the array to verify that the operation worked:

0 CHARACTER.ARRAY C@ .↓48 ok

Note that C@ was used to retrieve a single byte from the array; @ would have retrieved two successive bytes.

The word SWAP.ARRAYS expects the xpfa's of two arrays on the stack. It swaps the contents of the parameter fields of the arrays (which contain dimensioning information and an indirect pointer to the contents) without moving the contents in the heap. Thus the swap is accomplished very rapidly.

The contents of one array can be copied to another array, irrespective of the prior dimensions or contents of the destination array, using COPY.ARRAY. For example, to copy ARRAY.NAME to CHARACTER.ARRAY, execute

' ARRAY.NAME ' CHARACTER.ARRAY COPY.ARRAY↓ok

Deleting an array de-allocates its memory in heap, undimensions it by clearing its parameter field, and leaves only its name remaining. It is accomplished by executing

' CHARACTER.ARRAY DELETED↓ok

### Determining a Pre-existing Array's Dimensions

?ARRAY.SIZE returns the 32-bit number of elements (not number of bytes!) of an array:

' ARRAY.NAME ?ARRAY.SIZE D.↓50 ok

and the limits of all its indices are returned by

' ARRAY.NAME ?DIMENSIONS↓[ 4 ] \ 5 \ 10 \ 2 \ 2 ok

which are the dimensions we specified. The stack may be cleared using

SP!↓ok

### Internal Representation of Arrays

This section is provided for those interested in understanding the definitions of array defining words. You need not read or understand this section to use arrays.

An array has four parts:

- Name: Stored in the name area of the dictionary.
- Action: Performed by the FORTH word [] which is automatically called when the array's name is executed with indices on the stack; the extended address of the specified array element is placed on the stack. Error checking is done if DEBUG is ON.
- Parameter field: Holds the limits for each of the array indices, the number of dimensions, the element size, a handle to the array's storage space in the heap, and a reference to CURRENT.HEAP. The parameter field is stored in the variable area in RAM.
- Contents: Stored in the heap as a relocatable block of memory.

After an array name is created, the array dimensions (index limits) and element size are stored in the parameter field and the array elements are stored in the heap.

The contents of the parameter field are as follows:

Address | Description |
---|---|

pfa+0 | 32-bit extended handle to the array's contents in heap |

pfa+4 | addr of CURRENT.HEAP (page is same as handle's) |

pfa+6 | size in bytes of each element |

pfa+8 | number of dimensions; 255 is maximum allowed |

pfa+10 | limit of first index (#cols); for 0,1,2,...n : limit=n+1 |

pfa+12 | limit of second index (#rows) |

pfa+14 | limit of third index, etc.... |

... | |

pfa+10+2*MAX#DIMENSIONS | limit of last index |

If the handle is zero then the array is undimensioned. The element size and all indices are 16-bit unsigned integers. Indices are all zero-based. The command

ARRAY: <name>

creates a dictionary entry for <name> and initializes its dimensions to 0. It does not allocate heap space. Memory is allocated when arrays are dimensioned or redimensioned using DIMENSIONED. For example:

#index.n #index.n-1 ... #index.1 n element.size ' <name> DIMENSIONED

allocates sufficient memory from the heap for the array and stores the handle and dimensions in the parameter field of <name>. Any prior contents of the array are deleted. The command

' <name> DELETED

de-allocates memory for the array and returns its handle to the heap manager.

Note that executing an X@ from the xpfa of the array returns the extended address of the handle to the heap item; executing an X@ from this handle returns the base xaddress of the heap item. As discussed in the heap chapter, the handle remains valid as long as the array is dimensioned, but the base xaddress (the contents of the handle) may change every time the heap is compacted. The array words take care of address calculation for you so you don't have to perform explicit memory fetches to find the address of an array element. Just place the indices on the stack and state the array's name to obtain an element's address.

## Matrices

*A matrix is a two-dimensional array of floating point numbers.* Matrix defining words work analogously to array defining words except that the number of dimensions (2) and element size (4 bytes) are fixed and need not be explicitly stated. QED-Forth routines have been defined so that matrices can be added, subtracted, multiplied, transposed, edited, and inverted easily. As described in the next chapter, systems of linear equations can be solved, linear least squares curve fitting can be done, and FFTs can be performed, each with a single command. The Categorized Word List in the QED-Forth Glossary summarizes all of the available matrix functions under the headings "Matrix Dimensioning and Access", "Matrix Editing", "Matrix I/O", "Matrix Math", and "Matrix Row/Col Operations". An overview of some of the capabilities offered by this comprehensive matrix algebra library is presented here.

### Creating, Dimensioning, and Addressing Matrices

To create a matrix use MATRIX: followed by the desired name. For example, to create a matrix called MATRIX.NAME execute

MATRIX: MATRIX.NAME↓ok

This command creates the matrix but does not assign dimensions or allot memory to it. It can be dimensioned using DIMMED. For example to dimension a matrix with 5 rows and 10 columns execute

5 10 ' MATRIX.NAME DIMMED↓ok

which allocates memory from the heap and sets up the parameter field to specify the number of dimensions. At this point the matrix elements are not initialized.

The useful word ?DIM.MATRIX expects the xpfa of a dimensioned matrix on the stack, and returns the number of rows under the number of columns for the specified matrix:

' MATRIX.NAME ?DIM.MATRIX↓ok ( 2 ) \ 5 \ 10

..↓10 5 ok

If DEBUG is on, ?DIM.MATRIX aborts with an error message if the specified xpfa is not associated with a dimensioned matrix. Thus it provides error checking as well as returning the dimensions of the matrix.

Floating point matrices that consist of a single row or a single column are still assigned two dimensions. For example, a column vector with N rows is dimensioned as an Nx1 (pronounced "n-by-one") matrix, and a row vector with M columns is dimensioned as an 1xM matrix. This keeps the matrix notation standard, and allows row vectors and column vectors to be distinguished from one another; this is important for certain matrix operations.

Placing the row and column indices on the stack and invoking the name of the matrix leaves the extended address of the matrix element on the stack. The word M[], which is automatically called when the matrix name is executed, performs the address calculation. Given the element address, the standard floating point store and fetch operators F! and F@ can be used to access the element. For example, to store the floating point number 15.6 at the second row (row#1) and fifth column (column#4), execute

15.6 1 4 MATRIX.NAME F!↓ok

and to fetch and print the number execute

1 4 MATRIX.NAME F@ F.↓15.6 ok

Storing and fetching floating point values can also be accomplished with the words M[]! and M[]@, as

34.5 2 1 ' MATRIX.NAME M[]!↓ok

2 1 ' MATRIX.NAME M[]@ F.↓34.5 ok

Another useful word is [0] which converts a matrix xpfa on the stack into the address of the first element. Its advantage is that using ' MATRIX.NAME [0] is faster than using 0 0 MATRIX.NAME or 0 0 ' MATRIX.NAME M[] .

### Matrix Initialization

After being dimensioned, a matrix can be initialized to contain all zeros by executing

' MATRIX.NAME ZERO.MATRIX↓ok

or to contain random gaussian numbers by executing

' MATRIX.NAME RANDOMIZED

RANDOMIZED repeatedly calls the word RANDOM.GAUSSIAN to install in each element a random number drawn from a gaussian distribution with a mean of zero and a variance of unity.

A matrix can be initialized to be the identity matrix (unity values on the diagonals, and zero elsewhere) using IS.IDENTITY.

The word MATRIX can be used to enter a set of data values into a dimensioned matrix. For example, if we re-dimension MATRIX.NAME as a 3 by 2 (that is, a matrix with 3 rows and 2 columns) as

3 2 ' MATRIX.NAME DIMMED↓ok

we can then enter 6 data points as

MATRIX MATRIX.NAME = 1.5 2.5 3 4 5 6.45↓ok

Note that either floating point or integer values can be entered; the integer values are automatically converted to floating point numbers before being stored into the matrix.

### Matrix Display

To display the contents of a matrix, use M.. as

' MATRIX.NAME M..↓Matrix Matr___ =1.5 2.53. 4.5. 6.45ok

Note that, except for the "ok", QED-Forth's response is a valid initialization command. That is, the M.. display format is the same as that required by the QED-Forth initialization word MATRIX . The command M. displays the contents of the matrix without printing the name of the matrix.

### Copying and Swapping Matrices

A matrix can be copied to a destination matrix as

' SOURCE.MATRIX ' DESTINATION.MATRIX COPY.MATRIX

The destination matrix need not be dimensioned in advance; COPY.MATRIX dimensions it properly.

SWAP.MATRIX interchanges the contents of the parameter fields of two matrices to rapidly swap the matrices without moving their contents in heap.

### Matrix Multiplication

Matrix multiplication is performed by MATRIX* as

' FIRST.MATRIX ' SECOND.MATRIX ' DESTINATION.MATRIX MATRIX*

The above command multiplies FIRST.MATRIX by SECOND.MATRIX and places the result in DESTINATION.MATRIX. Matrix multiplication is one of the most widely used operations in data processing, and all linear algebra textbooks explain the rules that specify how two matrices are multiplied. The order of the operands is important, and the two source matrices must have compatible dimensions as specified by the multiplication rules (the number of columns in the first source matrix must equal the number of rows in the second source matrix). If DEBUG is ON, QED-Forth will ABORT if incompatible dimensions are detected, issuing an error message such as

Error at FIRST.MATRIX SECOND.MATRIX --Incompatible sizes !

The destination for the matrix multiplication may be one of the source matrices. For example, we could state

' FIRST.MATRIX ' SECOND.MATRIX XDUP MATRIX*

which duplicates the extended parameter field address of SECOND.MATRIX so that it is both a source and the destination matrix.

*Except where explicitly noted in the glossary, all matrix arithmetic words allow the destination matrix to be one of the source matrices.* The matrix operation checks to see if the destination is also a source and, if necessary, dimensions a temporary matrix in the heap to hold temporary results and deletes it before returning. The programmer doesn't have to worry about these details.

The words 2xN.MATRIX* (pronounced "2-by-N-matrix-star") and 3xN.MATRIX* perform very fast multiplications where the first source matrix is a 2x2 or 3x3. 2xN.MATRIX* multiplies a 2x2 matrix (i.e., a matrix having 2 rows and 2 columns) by a 2xN matrix to produce a 2xN result. 3xN.MATRIX* multiplies a 3x3 matrix by an 3xN matrix to produce a 3xN result.

For example, if FIRST.MATRIX is dimensioned as a 2x2, and SECOND.MATRIX is dimensioned as a 2x6, then the command

' FIRST.MATRIX ' SECOND.MATRIX 2xN.MATRIX*

performs the multiplication and leaves the results in SECOND.MATRIX. To maximize speed, no error checking of dimensions is performed, so be certain that the matrices are properly dimensioned before using these words. A 2x2 matrix multiplication is performed in just 5.3 msec, and a 3x3 takes only 17 msec.

### Matrix Arithmetic

Matrices can be added using MATRIX+ as

' FIRST.MATRIX ' SECOND.MATRIX ' DESTINATION.MATRIX MATRIX+

Likewise, they can be subtracted using MATRIX-, and element-by-element division and multiplication are performed by MATRIX.ELEMENT/ and MATRIX.ELEMENT* (not to be confused with the standard matrix multiplication MATRIX*).

MATRIX.MAX and MATRIX.MIN place the element-by-element maximum and minimum, respectively, of two source matrices in the specified destination matrix.

MATRIX.SUM leaves the floating point sum of all the matrix elements on the stack, and MATRIX.VARIANCE leaves a floating point number equal to the sum of the squares of the matrix elements.

In the glossary you'll find additional words to perform other arithmetic operations, to transpose matrices, to do mixed scalar/matrix computations and to copy or delete matrices. Just as for arrays, matrices should be deleted before being forgotten so that their heap space is deallocated. (The kernel word ON.FORGET described in the "Program Development Techniques" chapter is useful in this regard).

### Scalar/Matrix Arithmetic

S*MATRIX (the name suggests "scalar times matrix") multiplies each element in the matrix by a specified floating point scalar. For example, to double each element in FIRST.MATRIX and put the result in DESTINATION.MATRIX, execute

2.0 ' FIRST.MATRIX ' DESTINATION.MATRIX S*MATRIX

S+MATRIX adds the scalar to each element, S-MATRIX subtracts each element from the scalar, and S/MATRIX divides the scalar by each element. As usual, the source matrix can equal the destination matrix.

### Applying a Functional Transformation to a Matrix

A unary operation can be applied to each element of a matrix using the powerful word TRANSFORM.MATRIX. A unary operation transforms a single floating point input to produce a single floating point output. For example, the squaring function is unary. To use TRANSFORM.MATRIX, the unary operation must be performed by a single FORTH word. For example, we can define

: F^2 ( r -- r^2 ) FDUP F* ;

Then, to square each element in FIRST.MATRIX and place the results in DESTINATION.MATRIX, we can execute

' FIRST.MATRIX ' DESTINATION.MATRIX CFA.FOR Fˆ2 TRANSFORM.MATRIX

The code field address (cfa) is the address of the first byte of a word's definition. The command CFA.FOR Fˆ2 leaves the extended cfa of the unary function on the stack, and TRANSFORM.MATRIX uses this to apply the function to each element of the matrix.

### Finding the Inverse of a Matrix

The inverse of any matrix can be computed by executing the kernel word INVERTED which expects references to source and destination matrices on the stack. INVERTED uses an LU (lower-upper) decomposition scheme rather than Gaussian Elimination to minimize round-off error propagation for large matrices.

### Editing Matrices

A set of matrix editing words is available. To concatenate the columns of two source matrices to form a destination matrix, you can state

' FIRST.MATRIX ' SECOND.MATRIX ' DESTINATION.MATRIX

COLUMN.CONCATENATE

The number of rows in the two source matrices must be the same; otherwise, an ABORT will be executed and an "incompatible sizes" error message will be printed.

Similarly, ROW.CONCATENATE concatenates the rows of two source matrices to form a destination matrix.

ROW/COL.DELETED eliminates a specified row or column from a matrix, and ROW/COL.INSERTED inserts a specified row or column into a matrix. COLUMN.TRUNCATE eliminates a specified number of columns from the end of a matrix. SELECT.COLUMNS accepts a list of column indices, properly dimensions a destination matrix, and copies the specified columns from the source to the destination. Consult the glossary for stack pictures and detailed descriptions of these editing commands.

## Fast Vector Arithmetic

QED-Forth defines a "vector" as a collection of floating point numbers in memory, either stored contiguously or with elements evenly separated in memory. A vector is specified by a base xaddress, a separation, and the double number of elements in the vector. Separation is expressed as a multiple of the size of a floating point number (4 bytes), and the maximum separation is 16,383 (which means that the maximum separation between successive elements is 65,532 bytes). The base address must be an even multiple of 4 bytes; this constraint is imposed because it improves the speed of the vector operations. Matrix dimensioning words and the heap manager automatically perform this address alignment. The maximum number of elements in a vector is limited only by available memory. A vector may cross one or more page boundaries.

For example, a matrix can be represented as a vector. The base xaddress is the extended address of the first matrix element in the heap, the separation is 1 because the floating point matrix elements are stored contiguously in memory, and the number of elements is simply the number of rows times the number of columns, expressed as a 32-bit double number. A single row of a matrix could be represented as a vector with a non-unity separation. (For experts: Matrices are stored by column, so the base xaddress of the row vector is the xaddress of the first element in the row, the separation equals the number of rows, and the number of elements equals the number of columns expressed as a double number; see the glossary entries for ROW→ V and COL→ V). Thus the vector representation can be used to describe a variety of floating point data structures.

Performing mathematical operations on a vector is faster than performing operations on an equivalent matrix using standard matrix addressing. The calculation of a matrix address given a row and column requires a time-consuming multiplication, while stepping through a vector can be accomplished using fast addition.

The floating point package includes routines to perform fast vector arithmetic, comparison, scaling, dot product, swapping and copying. A particularly powerful word is 2V.TRANSFORM, which allows you to perform a specified floating point transformation that combines two source vectors and places the result in a destination vector. Please consult the QED-Forth Glossary for descriptions of all of the vector words; they are listed in the "Vector" section of the Categorized Word List.

The majority of the matrix transformation words described in the previous section are coded using the fast vector routines. Using vector arithmetic can greatly speed up many common matrix operations should you choose to write your own matrix words. See for example the second definition of TRANSPOSED presented later in this chapter.

## Matrix Row/Column Operations

A full set of operators for matrix rows and columns is included in the dictionary. These are based on the fast vector arithmetic words, and allow portions of matrices to be transformed.

For example, to sum the elements in column 2 of FIRST.MATRIX and print the result, we could execute

-1 2 ' FIRST.MATRIX ROW/COL.SUM F.

The first two numbers specify the row or column. A -1 as the row index indicates that we are not specifying a row; the 2 specifies the column index. Then the extended pfa of the matrix is placed on the stack using ' (tick), and ROW/COL.SUM places the sum on the stack and F. prints it. If we want to sum the elements in row 3, we can execute

3 -1 ' FIRST.MATRIX ROW/COL.SUM F.

where 3 is the row number and -1 indicates that a column is not being specified. All of the ROW/COL words expect on the stack a pair of numbers, one equal to -1, and the other equal to a valid row or column number.

ROW/COL words can operate on a specified row or column to do scalar/vector arithmetic, add, subtract, multiply, or divide rows and columns by one another, copy and swap rows and columns, and perform comparisons, dot products and unary transformations. Consult the "Matrix Row/Col Operations" section of the Categorized Word List in the QED-Forth Glossary for a complete list of the available functions.

## Constant Arrays and Matrices

Arrays and matrices have their parameter fields in the variable area and their contents in the heap. Because both of these areas are in RAM, the dimensions and the contents can be modified at run time.

When defining look-up tables and other data structures that hold unchanging data, it is useful to put the parameter field as well as the contents in the definitions area which may be write-protected when the application is finished. This sets up an array or matrix of constants which cannot be modified or corrupted at run time.

The kernel words DIM.CONSTANT.ARRAY: and DIM.CONSTANT.MATRIX: create and dimension an array or matrix, respectively, in the definitions area. The dimensioning is done at the time of creation so that the proper amount of space is ALLOTed in the dictionary. A constant array or matrix should not be redimensioned, as this could corrupt the dictionary.

For example to dimension a constant array named LOOKUP to have 1 dimension with 5 elements and 2 bytes per element, execute

5 1 2 DIM.CONSTANT.ARRAY: LOOKUP↓ok

To define and dimension a constant matrix with 6 rows and 4 columns, execute

6 4 DIM.CONSTANT.MATRIX: LOOKUP.MATRIX↓ok

Constant arrays and matrices behave just like standard arrays and matrices except that their dimensions and contents are not modifiable at run time. A constant array or matrix may cross one or more page boundaries, and so may advance the dictionary pointer DP to another page. DIM.CONSTANT.MATRIX: automatically aligns the base address of the matrix to be an even multiple of 4 bytes; this ensures that the constant matrix can be operated upon by all of the matrix library functions as well as the fast vector operations.

To obtain an element address, simply place indices on the stack and state the name of the array or matrix. Use ' to refer to the data structure as a whole. Constant arrays and matrices must be initialized before the dictionary is write-protected.

## An Example: Finding the Transpose of a Matrix

The transpose AT of a matrix A is the matrix whose rows are the columns of A and whose columns are the rows of A. The following definition of #1.TRANSPOSED finds the transpose of any input matrix, under the condition that the destination must be different from the source matrix. This definition is presented as an example of the usage of the matrix manipulation words.

A second example of a transposition word is then presented to demonstrate the use of fast vector words to speed execution. Then, in Chapter 9 (Designing Re-entrant Code), a third version is presented that allows the destination matrix to equal the source by dimensioning a temporary matrix while preserving the re-entrancy of the code for multitasking.

: #1.TRANSPOSED ( src.xpfa\dest.xpfa -- ) \ places the transpose of the source in the destination; \ source and destination must be different matrices XOVER ?DIM.MATRIX ( src.xpfa\dest.xpfa\#src.rows\#src.cols -- ) LOCALS{ &#cols &#rows x&dest.pfa x&src.pfa } &#cols &#rows x&dest.pfa DIMMED \ dimension dest, reversing rows, cols &#rows 0 DO \ for each source row &#cols 0 DO \ for each source column J I x&src.pfa M[] F@ \ fetch source element... I J x&dest.pfa M[] F! \ ...and store in destination element LOOP LOOP ;

#1.TRANSPOSED takes references to the source matrix and the destination matrix from the stack. It finds the dimensions of the source matrix, and loads the extended parameter field addresses of the source and destination as well as the number of rows and columns into local variables. Using local variables minimizes stack manipulations and makes the code easier to read. The number of rows and columns in the source matrix are swapped and used to dimension the destination matrix. A pair of nested DO LOOPs is used to fetch each element of the source array and copy it to the appropriate location in the destination array.

Note how the outer loop index J and the inner loop index I are used to specify the row and column numbers. Because TRANSPOSED doesn't know the names of the passed matrices, it can not determine matrix element addresses just by invoking the matrix name with indices on the stack. Instead, it uses the word M[] which expects a matrix xpfa on the stack over the indices and returns the element address.

Several concatenations are available which could make this word slightly more efficient. The kernel word J\I (pronounced "J under I") is a short form for J I, and I\J is a concatenation for I J. The word M[]@ is a short form for M[] F@ . It calculates a matrix address given the indices and pfa, and fetches the floating point contents. Similarly, M[]! ( a concatenation of M[] F! ) stores a floating point number to the specified matrix address.

### A Faster Matrix Transpose Routine

In #1.TRANSPOSED, each element is copied to the destination matrix one at a time. Transposition can be accomplished must more rapidly, however, by using the fast vector arithmetic word VECTOR.COPY to copy an entire column of the source to a row of the destination. The following definition illustrates how this is accomplished:

: #2.TRANSPOSED ( src.xpfa\dest.xpfa -- ) \ places the transpose of the source in the destination; \ source and destination must be different matrices XOVER ?DIM.MATRIX ( src.xpfa\dest.xpfa\#src.rows\#src.cols -- ) LOCALS{ &#cols &#rows x&dest.pfa x&src.pfa } &#cols &#rows x&dest.pfa DIMMED \ dimension dest, reversing rows, cols &#cols 0 DO \ for each column in source I x&src.pfa COL->V \ convert source column to vector 2DROP \ drop d.#el I x&dest.pfa ROW->V \ convert destination row to vector V.COPY \ copy source col to destination row LOOP ;

The first three lines of this definition are similar to the previous one. But #2.TRANSPOSED only needs a single loop, one for each column in the source. Inside the loop the stack picture needed by V.COPY (vector-copy) is built up. V.COPY copies a source vector (in this case, a column in the source matrix) to a destination vector (a row in the destination matrix). COL→ V converts the specified column number and source matrix xpfa to a vector specification. The number of elements is not needed (it equals the number of elements in the destination vector) and is dropped. ROW→ V converts the specified row in the destination to a vector specification, and V.COPY copies the source column into the destination row. V.COPY rapidly copies an entire column at a time, and performs significantly faster than the first version.

## Benchmarks

The BENCHMARK: word described in the floating point chapter shows that #1TRANSPOSED transposes a 40x40 matrix in 1.84 seconds, while #2TRANSPOSED requires only 295 msec. Thus using the fast vector words to code the transposition results in more than a 6-to-1 speed improvement.

QED-Forth performs a 10x10 matrix inversion in 0.8 sec, and a 3x3 matrix multiplication in only 17 msec. A 10x10 matrix multiply performs 2,000 floating point array element fetches, 1,000 floating point multiplies, 1,000 floating point additions, and 100 floating point array element stores, yet it only takes 0.56 seconds.

With an 8 MHz crystal on the QED Board, the fast vector operations used in a program can operate at a sustained rate of over 3500 floating point operations per second (FLOPS) for an even mix of floating point additions and multiplications (division is even faster). The "raw speed" of the floating point operations is over 6000 FLOPS, but the overhead associated with accessing matrices in the heap and pushing and popping floating point numbers on the data stack results in a sustainable speed of 3500 FLOPS. Of course, these speeds double if the board is operated at a 16 MHz crystal frequency. This is an impressive computational speed for a microcontroller.