Chapter 5 - Using Heap Memory
Like many workstations but unlike most microcontroller systems, QED-Forth provides a heap memory management system. QED-Forth's advanced array and matrix math routines make extensive use of the heap memory manager, but they do it transparently. To use these features you need only a cursory familiarity with the information in this chapter. On the other hand, if you want to create your own sophisticated dynamically allocated data structures, you should read this chapter carefully.
A heap is a pool of memory from which smaller blocks of memory of variable size are allocated. The heap manager allocates blocks of memory requested by the user's program from this pool. When the program is finished with the block of memory, it can return it to the heap. This results in efficient use of memory, especially for data structures which can vary in size and for temporary data structures which are used and then de-allocated. The de-allocated memory becomes available for use by other segments of the program. QED-Forth can maintain any number of heaps simultaneously, each containing numerous data structures.
Those interested in a the implementation of a heap manager may consult the final chapter in Object Oriented Forth – Implementation of Data Structures, by Dick Pountain, Academic Press, 1987. QED-Forth's implementation of the heap manager is similar to that proposed by Pountain. It is also similar to that used by the Apple Macintosh computer as described in Apple's Inside Macintosh publications.
Heap Compacting
The heap manager works by allocating blocks of memory beginning at the designated starting address of the heap. When a heap item is de-allocated and returned to the heap, its memory is free for use by other programs. A simple-minded heap manager which simply keeps track of which areas of the heap are used and which are free encounters the problem of heap fragmentation. For example, if heap items #1, #2, and #3 are sequentially allotted memory, they will occupy blocks of memory starting at the bottom of the heap. Then if item #2 is de-allocated, the heap's free memory will consist of two disjoint sections: the memory that was occupied by item #2, and the memory above item #3. As this process of allocation and de-allocation continues, the heap gets fragmented into smaller available pieces so that less of the heap consists of contiguous memory. This makes it difficult or impossible for the heap manager to respond to requests for large blocks of contiguous memory.
To solve this problem, the QED-Forth heap manager compacts the heap each time a heap item is de-allocated. This involves rearranging the blocks in the heap so that all of the free heap memory is available in one contiguous block above the allocated heap items.
Handles to Heap Items
The problem then becomes one of keeping track of where the items are in the heap. The process of compaction moves the heap items so that the base addresses of the heap items change. But the program that requests a heap allocation must have a reliable address that can be used to refer to the heap item. The heap manager solves this problem by adding a level of indirection in the addressing. Instead of informing the requesting program of the actual base address of the heap item when it is allocated, the heap manager returns to the requesting program a "handle" (also called an "xhandle") which is an extended address that contains the extended base address of the heap item. As long as a particular heap item is allocated, its handle does not change and can be used by the requesting program. The heap manager changes the handle's contents appropriately as the heap is compacted, and the requesting program can fetch the contents of the handle at any time to obtain a valid base xaddress for the heap item.
Thus the problems of heap fragmentation and the changing of base addresses are solved by compacting the heap and using handles instead of direct addresses.
Heap Size
In QED-Forth, the maximum size of a heap is limited only by the amount of available contiguous RAM. A heap can flow over page boundaries. Likewise, the size of a data structure in the heap is limited only by the available memory in the heap.
The only heap limitation is that the list of handles which is maintained near the top of the heap must be on the last page of the heap, and the handle list cannot flow over a page boundary. Because each handle requires only 4 bytes, this poses very little limitation for most practical applications.
The following implementation details clarify how the heap works. The next chapter discusses how array and matrix data structures use the heap to facilitate dynamic dimensioning and memory allocation.
Initializing the Heap
To create a heap, specify a starting extended address and an ending extended address and execute IS.HEAP. For example, let's designate the 20 Kbyte RAM area from 3000H to 7FFFH on page 15 as the heap by executing
HEX 3000 F 7FFF F IS.HEAP↓ ok
This command sets the user variable CURRENT.HEAP in the user area to 7FFF\FH, and initializes the heap management variables on the heap page to indicate that START.HEAP is at 3000\FH and there are no allocated heap items. The hexadecimal number base is more convenient than decimal for specifying memory map address locations, so it will be used for the remainder of the chapter.
The user variable called CURRENT.HEAP contains an extended address that specifies the address and page of the top byte + 1 in the heap. The heap allocation begins in low memory at the extended address pointed to by START.HEAP; a variable named HEAP.PTR holds the extended address of the next byte available for allocation. The contents of START.HEAP and HEAP.PTR as well as 2 other variables that manage the heap (HANDLE.PTR and FREE.HANDLE) are kept in the heap area, just below the specified end of the heap. The handles are kept below these variables in the heap.
The kernel word ROOM calculates the amount of space available in the current heap and returns it as a 32-bit number on the stack:
ROOM D.↓ 4FE7 ok
which is nearly 20 Kbytes, as expected. When all available heap memory has been allocated and no more items can be added to the heap, ROOM returns 0\0.
Allocating Heap Items
To request a block of 1FEH bytes from the heap execute
DIN 1FE FROM.HEAP↓ ok [ 2 ] \ 7FEF \ F
which expects a double number size in bytes and returns the extended address of the handle for the heap item. A non-zero handle indicates that the heap manager successfully allocated the memory. The requesting program should save this handle because its contents contain the base address needed to access the data in the heap item. It can be saved in a self-fetching variable whose name corresponds to the function of the heap item. For example,
XADDR: 1ST.DATA.BLOCK↓ ok [ 2 ] \ 7FEF \ F
TO 1ST.DATA.BLOCK↓ ok \ save the handle in the variable
A handle of 0\0 is returned if there is not enough memory in the heap to allocate the item. None of the heap manager words ABORT if an error is detected; rather, they signal the error by passing a zero handle or an error flag to the calling program. It is the calling program's responsibility to take the appropriate response in case of an error.
If the requested size passed to FROM.HEAP is not an even multiple of 4 bytes, the heap manager rounds the size up to the next 4-byte multiple. In this case, the size is rounded up to 200H which is 2 bytes more than the requested size. In addition, the heap manager ensures that the base address of each heap item is an even multiple of 4 bytes. These steps speed heap compaction and also assure proper operation of the fast vector and matrix arithmetic functions which require all elements to be aligned on 4-byte boundaries.
To obtain the base address of the heap item, fetch the extended base address from the handle:
1ST.DATA.BLOCK X@↓ ok ( 2 ) \ 3004 \ F
SP!↓ ok \ clear the data stack
The contents, 3004H on page FH, specify the base address of the heap item at this time. The heap manager saves the 32-bit size of the heap item in the four bytes below the base address. The word ?HANDLE.SIZE expects the extended handle address on the stack and returns the 32-bit size of the heap item:
1ST.DATA.BLOCK ?HANDLE.SIZE D.↓ 200 ok
which prints 200H as expected.
The word .HANDLES prints the status of the current heap:
.HANDLES↓
Size Addr Handle
200 3004\ F 7FEF\ F
ok
This shows that one handle (7FEF\FH) has been allocated with current base address 3004\FH and size 200H. .HANDLES always prints in hexadecimal base, regardless of the current number conversion base.
Another heap item can be allocated as
XADDR: 2ND.DATA.BLOCK↓ ok
DIN 400 FROM.HEAP↓ ok [ 2 ] \ 7FEB \ F
TO 2ND.DATA.BLOCK↓ ok
The heap manager returns a handle 7FEB\FH and again the allocation is successful.
De-allocating Heap Items
To de-allocate the first heap item heap item execute
1ST.DATA.BLOCK TO.HEAP .↓ FFFF ok
TO.HEAP expects a handle xaddress on the stack, and returns a success flag. A true flag indicates that the handle is valid and that the heap item has been successfully de-allocated, and a false flag indicates that the handle was invalid and no action was taken. Execute .HANDLES to see the effect of the de-allocation:
.HANDLES↓
Size Addr Handle
400 3004\ F 7FEB\ F
* 0\ 0 7FEF\ F
ok
The handle of 1ST.DATA.BLOCK that was returned to the heap is shown with an asterisk to indicate that it is not in use (because it has been de-allocated) and that the listed base address and size are not significant. The handle of 2ND.DATA.BLOCK has not changed, but its base address has been changed to 3004\FH. It was moved to the bottom of the heap during the heap compaction that occurred when 1ST.DATA.BLOCK was de-allocated. The heap manager moved the memory and adjusted the contents of the handle to implement the heap compaction. All of the available memory is maintained as one contiguous block above the last allocated heap item.
Maximizing the Efficiency of Heap Compaction
Heap compaction occurs each time an item is returned to the heap. All of the bytes above the de-allocated item are moved down in memory. This move requires 17 microseconds per byte, or 17 milliseconds per Kbyte. Notice, however, that compacting the heap to de-allocate the last item allocated is accomplished by simply adjusting the heap pointer; there is no need to move a block of memory in the heap. When possible, programs should de-allocate heap items in the proper order so that the last item allocated is the first item de-allocated; this maximizes execution speed. For example, when allocating several temporary matrices to hold the intermediate results of a calculation, de-allocating them in the proper order improves performance.
Resizing and Copying Heap Items
RESIZE.HANDLE expects a valid 32-bit handle under a 32-bit number of bytes on the stack, and tries to resize the heap item to the specified size, preserving as much of the item's data as possible. The heap must have enough room to copy the heap item. A flag is returned to indicate whether the resizing was successful.
DUP.HEAP.ITEM expects a handle xaddress on the stack, and creates a copy of the specified item in the current heap. The copy's handle is returned if the duplication was successful; otherwise 0\0 is returned. For example, to duplicate 2ND.DATA.BLOCK, execute
XADDR: 2ND.COPY↓ ok \ create a name for the new heap item
2ND.DATA.BLOCK DUP.HEAP.ITEM↓ ok [ 2 ] \ 7FEF \ F
TO 2ND.COPY↓ ok \ save the handle
To verify that the heap item has been duplicated, execute .HANDLES
.HANDLES↓
Size Addr Handle
400 3004\ F 7FEB\ F
400 3408\ F 7FEF\ F
ok
The heap manager has re-used the handle from the recently de-allocated word 1ST.DATA.BLOCK and assigned it to the newly allocated item.
Multiple Heaps
In a timesliced multitasking environment where several tasks are using heap memory, each task must have its own separate heap. If tasks share a single heap, an interrupting task could compact the heap and change the contents of a handle that is about to be used by another task. Thus multitasking environments may require multiple heaps. There may be other cases where it is advantageous for a single task to have multiple heaps.
To use multiple heaps, the programmer must manage the user variable CURRENT.HEAP. To see how this works, let's set up a second heap with a size of 8 Kbytes that starts on page 6 at address 7000H and ends on page 7 at 1000H. Check to be sure that a 128K RAM is installed in the RAM/ROM socket of your board so that this memory area is available, and turn DIP switch #2 OFF to make sure that pages 6 and 7 are not write protected. Then execute:
7000 6 1000 7 IS.HEAP↓ ok \ set up new heap
XADDR: 1ST.HEAP↓ ok
XADDR: 2ND.HEAP↓ ok \ define save variables for CURRENT.HEAP
7FFF F TO 1ST.HEAP↓ ok
1000 7 TO 2ND.HEAP↓ ok \ initialize the save variables
The self-fetching variables 1ST.HEAP and 2ND.HEAP hold the values of CURRENT.HEAP that specify the two heaps. Recall that the contents of the 32-bit user variable CURRENT.HEAP specifies the heap in which heap items are to be allocated and de-allocated. Thus manipulating the single extended address in CURRENT.HEAP enables multiple heaps to be managed. For example, to make the heap on page F the current heap, execute
1ST.HEAP CURRENT.HEAP X!
and to make the heap on pages 6 and 7 the current heap, execute
2ND.HEAP CURRENT.HEAP X!
Transferring Heap Items
TRANSFER.HEAP.ITEM copies a heap item into any specified heap on any page. For example, to transfer a heap item from the 1ST.HEAP into 2ND.HEAP, put the handle of the source heap item and the destination CURRENT.HEAP on the stack and execute TRANSFER.HEAP.ITEM as
XADDR: 3RD.DATA.BLOCK↓ ok
2ND.DATA.BLOCK 2ND.HEAP TRANSFER.HEAP.ITEM↓ ok [ 2 ] \ FF0 \ 7
TO 3RD.DATA.BLOCK↓ ok \ save the new handle
The new item resides in 2ND.HEAP and its handle is saved in 3RD.DATA.BLOCK. The transfer was successful; otherwise, TRANSFER.HEAP.ITEM would have returned a handle of 0\0 .
To verify the status of 2ND.HEAP, execute
2ND.HEAP END.HEAP X! \ set current heap
.HANDLES↓
Size Addr Handle
400 7004\ 6 FF0\ 7
ok
which shows that the heap item was copied to 2ND.HEAP .
Heap-Based Data Structures
Data structures that are maintained in the heap are usually created by "defining words" such as ARRAY: and MATRIX:. These defining words create and initialize a parameter field in the variable area, as described in the next chapter. The contents of the parameter field are filled in when the data structure is dimensioned or allocated. When the data structure is dimensioned it is assigned to the heap specified by CURRENT.HEAP. At this same time the dimensioning information (e.g., number of rows and columns), the handle (which contains the base xaddress of the data) and the value of CURRENT.HEAP are saved in the parameter field associated with the data structure. Based on the information in the parameter field, the memory manager can resize, copy, or de-allocate the heap item irrespective of the current contents of the user variable CURRENT.HEAP.
For example, the word DELETED which de-allocates arrays and matrices automatically saves CURRENT.HEAP, looks at the parameter field of the item to be deleted, puts its heap specification into the user variable CURRENT.HEAP, executes TO.HEAP to de-allocate the item, and restores the original value of CURRENT.HEAP. Thus arrays and matrices in any heap can be deleted. Other array and matrix operations perform similar user-transparent manipulations of CURRENT.HEAP to make the words powerful and remove book-keeping chores from the programmer.
Because both the heap and the variable area where the parameter field is stored are in modifiable memory (RAM), the data structure can be allocated, re-dimensioned, or de-allocated "on the fly" as the program executes. This is a powerful programming capability. It is the basis for the comprehensive matrix mathematics package which is discussed in detail in the next chapter.
