Using the Forth Assembler
During software development using the Forth high-level language, it is often useful to optimize time-critical functions using low-level 68HC11 or 9S12 (HCS12) assembly code. QED-Forth's in-line assembler enables you to compile machine code by typing the Motorola assembler mnemonics to specify the desired instructions. High level FORTH-like control structures allow you to easily implement testing, branching, and looping constructs in assembly code.
For most applications you don't need to use the assembler to write high performance code. Forth compiles fast subroutine threaded code, and most kernel words are already optimized in assembler code. But for certain critical applications, especially in interrupt service routines, assembly coding can improve performance.
QED-Forth's assembler is fully integrated with the rest of the integrated development environment (IDE), and it is easy to switch back and forth between high level and assembly code. Because QED-Forth is a subroutine threaded language, the compiled definition of each high level word consists of executable machine code. QED-Forth's assembler also compiles executable machine code into the dictionary. Since high level and assembly code are indistinguishable in the dictionary, assembly code can be inserted "in-line" into a high level definition, giving great programming flexibility.
Debugging assembly coded routines
Assembly coded routines can be debugged using all of the built-in debugging tools. You can install break points, single step through code routines, and view the contents of the programming registers as each instruction executes. The Debugging Programs chapter of the PDQ Board Users Guide describes the debugging C programs in detail, and a brief summary is presented at the end of this chapter.
Using control structures
The assembler operates in a single pass and, unlike traditional assemblers, does not use labels. Rather, loops and branches are specified using control structures patterned after those of FORTH. The mnemonics for these control structures end with a comma to suggest that they are compiling code into the dictionary and to distinguish them from their high level FORTH analogs. The following control structures are implemented:
- IF, … ENDIF, ( or THEN, )
- IF, … ELSE, … ENDIF, ( or THEN, )
- BEGIN, … UNTIL,
- BEGIN, … AGAIN,
- BEGIN, … WHILE, … REPEAT,
In addition, each assembly subroutine defined with the CODE command has a name specified by the programmer when the subroutine is defined. The name allows a subroutine to be called by subsequently defined routines. This procedure results in highly structured label-less source code for easy readability.
Unlike the C in-line assembler, QED-Forth's assembler uses a FORTH-like post-fix style mnemonic sequence: the operand is stated first, followed by the address mode and then the instruction. The assembler works while the QED-Forth interpreter is in execution mode (not in compilation mode). Operands and address modes leave numeric values on the stack, and the instruction mnemonic uses these values to compile the proper opcode sequence into the dictionary.
It is important to understand that execution of the assembly source code mnemonics does not execute the assembly instruction; rather, it causes compilation of the proper opcode sequences into the dictionary. The examples presented in the following text will make this clear.
We don't attempt here to describe the assembly commands of the 68HC11. Motorola's manuals give a complete description of each assembly command and its mnemonic representation. Each Motorola mnemonic has a corresponding entry (with the same spelling) in the ASSEMBLER vocabulary in QED-Forth's dictionary. Consult the QED-Forth Assembler Glossary for definitions of all of the assembler mnemonics and directives.
Creating an assembly routine definition
Assembly coded definitions are created using the words CODE and END.CODE (or its synonym END-CODE). This creates a routine that can be called from QED-Forth. CODE performs the following actions:
- Create a header in the name area for the word following CODE, and "smudge" the header so it cannot be found in the dictionary until END.CODE executes. Remain in the execution mode.
- Execute ASSEMBLER to make the assembler vocabulary the context (search) vocabulary so that the assembler mnemonics can be found in the dictionary.
After CODE opens the definition and creates a header, assembly instructions are entered to form the body of the definition. Some instructions consist of a single mnemonic; for example, the instruction ABA tells the processor to add the contents of accumulator B to accumulator A and place the result in A. No operand or address mode is needed to specify this instruction. Other instructions require an operand and an addressing mode as well as the instruction mnemonic. For example, to compile code to load register X with the quantity 0x1234, execute (assuming that QED-Forth is in hexadecimal base),
1234 IMM LDX
which specifies 0x1234 as the operand, IMMediate as the addressing mode, and LDX (load register X) as the instruction. When this set of mnemonics is executed, the 3-byte sequence CE 12 34 is compiled into the dictionary. When later executed, these bytes instruct the processor to perform the desired operation of loading the value 0x1234 into register X.
The addressing modes are assigned the following mnemonics:
|Address mode mnemonic||Description|
|IMM||Immediate. The actual operand is specified in the instruction.|
|REL||Relative. The operand is a signed relative offset byte between -128 and +127 that specifies a branch destination.|
|DIR||Direct. The operand resides at an address whose most significant byte is|
|EXT||Extended. The operand resides at the address specified in the instruction.|
|IND,X||Indexed, register X. The operand resides at the address calculated by adding the single-byte positive offset specified in the instruction to the contents of the X register.|
|IND,Y||Indexed, register Y. The operand resides at the address calculated by adding the single-byte positive offset specified in the instruction to the contents of the Y register.|
|INH||Inherent. The instruction does not need an operand. Specification of this mode is optional.|
The inherent addressing mode keyword is never required; the assembler knows which instructions are inherent (i.e., do not need a specified address mode). Nevertheless, the INH addressing mode keyword is defined as a do-nothing command in case you want to use it to clarify your source code.
Please consult the Motorola 68HC11 Manual for a detailed discussion of the meanings of these address modes.
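As an illustrative sketch (not taken from the Motorola or QED-Forth manuals), the following word combines several of these address modes in the operand, address mode, instruction order described above. The word name MODES.DEMO is hypothetical, and the example assumes that address 0xB000 is readable RAM on your board. LDAB, ADDB and CLRA are standard Motorola mnemonics, assumed to be present in the ASSEMBLER vocabulary with the same spelling.

```forth
HEX
CODE MODES.DEMO ( -- n )   \ hypothetical name, for illustration only
   B000 EXT LDAB           \ extended: B = byte stored at address 0xB000 (assumed to be RAM)
   12 IMM ADDB             \ immediate: B = B + 0x12
   INH CLRA                \ inherent: A = 0, so D = 00:B (the INH keyword is optional)
   DEY DEY                 \ make room on the data stack
   0 IND,Y STD             \ indexed: store D at the address held in Y plus offset 0
   RTS
END.CODE
```

Each line leaves its operand and address mode on the stack at assembly time, and the instruction mnemonic then compiles the matching opcode bytes.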
The last instruction in an assembly coded definition is usually an RTS (return from subroutine) which is the equivalent of ; in high level FORTH. END.CODE completes the definition by executing FORTH to return to the standard vocabulary and "smudging" the header created by CODE so that the name of the definition can be found in the dictionary. END.CODE also prints an error message if excess items were placed on or removed from the data stack during the course of the code definition.
QED-Forth register usage
The 68HC11 has three 16-bit registers available for programming use:
- Index register X,
- Index register Y, and
- the D register, which is made up of two 8-bit accumulators:
  - A (most significant byte) and
  - B (least significant byte)
There is also a return stack pointer register (S), a condition code register (CCR), and a program counter (PC); these manage the execution of the instructions. All mathematical operations use one or both of the accumulators that comprise the D register. The X and Y registers may be used as index registers which point to memory locations whose contents then become the operands of machine instructions. The programming registers are fully explained in the 68HC11 manuals.
Your assembly coded routines may freely use the D and X registers. Their contents need not be preserved, as QED-Forth uses them only as scratchpad registers.
The contents of the Y register must be handled with care, however. QED-Forth uses the Y register as the data stack pointer. Assembly coded words must not corrupt Y by leaving it in an arbitrary state. If an assembly routine does not need to access the data stack and needs to use the Y register to hold temporary data, it should save the data stack pointer on the return stack with a PSHY instruction, use the Y register as necessary, and restore the data stack pointer with a PULY instruction before exiting the routine. This may cause hard-to-diagnose problems, however, if the assembly coded routine is interrupted after Y is modified; the interrupt routine may assume that Y is a valid data stack pointer. In addition, note that assembly code routines that do not use Y as a data stack pointer may be difficult to debug using the QED-Forth trace and debugging tools. In summary, it is recommended that assembly routines preserve the Y register's role as the data stack pointer.
To ensure that code is fully pre-emptable and will not cause multitasking failures, follow these rules when putting items on or taking them off the data stack: When putting an item on the data stack, decrement the stack pointer first (e.g., using the DEY instruction) and then put the value on the stack. The following code fragment puts the value 1234 on the data stack:
    1234 IMM LDD      \ put value in D
    DEY DEY           \ decrement data stack pointer to make room on stack
    0 IND,Y STD       \ then put the value on data stack
When taking an item off the stack, move the value into another register or storage location before incrementing the Y register. This code fragment correctly removes an item from the stack and puts it in the X register:
    0 IND,Y LDX       \ put top stack item in X register
    INY INY           \ then increment stack pointer to drop the item
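Both rules can be combined in a word that consumes two stack items and leaves one. The following is a minimal sketch under the conventions stated above; the name PLUS.DEMO is hypothetical and is not a kernel word.

```forth
CODE PLUS.DEMO ( n1\n2 -- n1+n2 )   \ hypothetical name, for illustration only
   0 IND,Y LDD      \ copy n2 into D before the stack pointer is moved
   INY INY          \ now drop n2; Y points at n1
   0 IND,Y ADDD     \ D = n2 + n1
   0 IND,Y STD      \ overwrite n1 with the sum
   RTS
END.CODE
```

Because n2 is read into D before Y is incremented, an interrupt or task switch that uses the data stack between any two instructions cannot overwrite a value the routine still needs.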
The following assembly coded routine duplicates an item on the stack and then adds 8 to the top stack item. Executing these instructions causes the definition to be compiled into the dictionary:
    HEX
    CODE DUP.8+ ( n -- n\n+8 )
       0 IND,Y LDD    \ put top stack item into D
       8 IMM ADDD     \ add 8 to contents of D
       DEY DEY        \ make room on stack for new item
       0 IND,Y STD    \ put incremented item on data stack
       RTS            \ return
    END.CODE
CODE and END.CODE delimit a set of assembly mnemonics that cause the compilation of assembly code into the dictionary. CODE creates a header for DUP.8+ in the dictionary. Executing DUP.8+ from the terminal causes the word to be executed, while using it inside a colon definition causes it to be compiled. In short, it behaves just like a FORTH word.
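As a brief usage sketch, assuming DUP.8+ has been compiled as shown and the number base is still hexadecimal, the word can be tried from the terminal and from inside a colon definition. The name SHOW.DUP.8+ is hypothetical.

```forth
HEX
5 DUP.8+ . .             \ executed from the terminal; prints both stack items, top item first

: SHOW.DUP.8+ ( n -- )   \ hypothetical name, for illustration only
   DUP.8+ . . ;          \ here DUP.8+ is compiled as a call, like any FORTH word

5 SHOW.DUP.8+            \ should print the same two values as the line above
```

In both cases the incremented copy (0xD) is on top of the stack, so it should be printed before the original 5.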
Performing in-line assembly within high level definitions
It is easy to assemble in-line code into a definition. From a high level definition, >ASSM invokes the ASSEMBLER vocabulary and enters execution mode, enabling assembler mnemonics to be entered into the definition. >FORTH returns to the high level definition, executing FORTH to restore the original vocabulary and returning to compilation mode. For example, the following high level definition uses the in-line assembler to code arithmetic operations, while the I/O operations are handled by FORTH commands:
    : SUM4 ( n1\n2\n3\n4 -- | prints sum of n1+n2+n3+n4 )
       >ASSM                       \ invoke assembler
       3 IMM LDX                   \ X register is loop counter; loop 3 times
       0 IND,Y LDD                 \ init D register to n4 (Y is the stack pointer)
       BEGIN,                      \ start the loop
          INY INY                  \ DROP and point to next data stack item
          0 IND,Y ADDD             \ add top item to accumulator D
          DEX                      \ decrement counter
       EQ UNTIL,                   \ loop until counter = 0
       0 IND,Y STD ( -- sum )      \ put answer on stack
       >FORTH                      \ go back to high level
       CR ." The sum is " . CR ;   \ print sum
The use of the BEGIN, <condition.code> UNTIL, construction is explained later, in the section "Testing, branching, and looping structures".
Calling high level Forth words from within assembly code definitions
A single word or a set of previously defined words can be called from an assembler definition by simply stating
>FORTH <routine's.name> <another.routine's.name> ... >ASSM
The routines could be kernel words, words defined in high level FORTH by the programmer, or even words previously defined in assembly using CODE. For example, the following word calculates a sum in assembler code and then calls high level Forth words to print the result.
    CODE SUM4&PRINT ( n1\n2\n3\n4 -- | prints sum of n1+n2+n3+n4 )
       3 IMM LDX                   \ X register is loop counter
       0 IND,Y LDD                 \ init D register to n4
       BEGIN,                      \ start the loop
          INY INY                  \ point to next data stack item
          0 IND,Y ADDD             \ add top item to accumulator D
          DEX                      \ decrement counter
       EQ UNTIL,                   \ loop until counter = 0
       0 IND,Y STD ( -- sum )      \ put answer on stack
       >FORTH                      \ go to high level
       CR ." The sum is " . CR     \ print CR and sum
       >ASSM                       \ return to assembly
       RTS
    END.CODE
SUM4 and SUM4&PRINT compile identical code; one is a high level colon definition that invokes the assembler, and one is a low level CODE definition that calls high level FORTH routines.
Also note that high level FORTH routines may modify the contents of the X and D registers. In the SUM4&PRINT definition, for example, we know that the sum is present in the D register before >FORTH is invoked, but we cannot assume that the sum will still be present in D after we re-enter the assembly definition. In fact, the CR and ." printing statements modify the contents of D. If you need to preserve the contents of X or D while calling high level FORTH routines, push the register contents onto the return stack before invoking the high level commands, and pull the saved contents back into the registers after returning to the assembly coded portion of the routine.
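The following is a minimal sketch of this save-and-restore technique, not a routine from the QED-Forth kernel. The name DOUBLE.AFTER.CR is hypothetical. It saves D with the standard Motorola PSHB and PSHA instructions and restores it with PULA and PULB, since these byte-wide pushes are available on the 68HC11.

```forth
CODE DOUBLE.AFTER.CR ( n -- 2n | prints a carriage return, then doubles n )
   0 IND,Y LDD        \ D = n
   PSHB PSHA          \ save D on the return stack (low byte first, then high byte)
   >FORTH CR >ASSM    \ high level call; it may change D and X
   PULA PULB          \ restore D in the reverse order of the pushes
   0 IND,Y ADDD       \ D = n + n
   0 IND,Y STD        \ replace n with 2n on the data stack
   RTS
END.CODE
```

The pushes and pulls must balance before RTS executes, because the return address sits beneath the saved bytes on the same stack.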
Rely on high level routines to perform page changes
When an address on a different page must be accessed from inside an assembly coded definition, it is best to use high level code to perform the page change. Similarly, when calling previously defined high level or assembly coded words from within an assembly routine, you could execute,
>FORTH <routine's.name> >ASSM
to compile the call. The high level compiler correctly compiles page changes if necessary. A more convenient way of calling a single word from within a CODE definition is to use the directive CALL followed by the name of the word to be called.
For example, the following definition fetches from an extended address and adds 9 to the result:
    CODE @9+ ( xaddr -- n | n = contents of xaddr + 9 )
       CALL @ ( -- n1 )          \ compile a call to @
       0 IND,Y LDD               \ put n1 in D
       9 IMM ADDD                \ add 9 to contents of D
       0 IND,Y STD ( -- n )      \ put answer on stack
       RTS
    END.CODE
Note that the kernel word @ is used to perform the page change required to fetch from the specified address. An equivalent but less elegant way to compile a call to a previously defined routine from within an assembly code definition is to execute,
>FORTH @ >ASSM
Note that the high level interpreter and the CALL routine correctly compile page changes to called routines when required. Compilation of page changes is discussed in detail in the Advanced Forth Programming Topics chapter of the PDQ Board Users Guide.
Testing, branching, and looping structures
QED-Forth's assembler compiles branches and loops without the need for labels to specify destinations. The branch and loop structures are similar to those used in high level FORTH, making assembly coding very straightforward.
There is an important difference between high-level and assembly-coded conditionals. The high level words remove a flag from the data stack at run time and base their test on the value of the flag. In assembler code, on the other hand, the test is based on the value of the condition code register (CCR) at run time. A conditional instruction typically follows an instruction that sets the CCR. The programmer specifies which condition should be tested for.
For example, a typical test and branch structure is implemented as,
    <instruction that sets condition codes>
    <condition> IF,
       <instructions to be executed if condition is met>
    ELSE,
       <instructions to be executed if condition is not met>
    ENDIF,
    <continue ...>
where the ELSE, clause is optional, and ENDIF, can be replaced by THEN, which is a synonym. These commands are analogous to IF … ELSE … ENDIF in high level FORTH.
Assembly coded loops are assembled using the commands
BEGIN, ... <condition> WHILE, ... REPEAT,
which loops while the condition is true, and,
BEGIN, ... <condition> UNTIL,
which loops until the condition is true, and,
BEGIN, ... AGAIN,
which is an infinite loop. These operate in similar fashion to the analogous high level words. Consult the glossary for detailed descriptions of each of these control words.
The conditions specified in the stack pictures of these constructs are named according to the mnemonics for the 68HC11's branch instructions. For example, the mnemonic BMI (branch if minus) leads to the condition name MI, for "minus". This condition can be placed before any of the conditional words "IF,", "WHILE,", and "UNTIL," (the trailing comma is part of each word's name).
Note that all comparisons are zero-based, and rely on the contents of the condition code register. Thus every conditional instruction should follow an instruction that sets the condition code register. For example, for the code fragment
    1234 IMM LDD      \ puts 12 in A, 34 in B
    CBA               \ do A - B (without changing A or B), set CCR
    MI IF,
       0000 IMM LDD
    ENDIF,
the result of the subtraction (0x12 - 0x34) is a negative number (less than zero) and the condition codes are set accordingly. The code between the IF, and ENDIF, is executed if the result is negative, so 0 is placed in D.
For the experts among you, the assembler actually compiles a BPL ("branch if plus") for the example above; the compiled branch is always the logical opposite of the specified condition. After the CBA instruction, the condition code register reflects that the result of the subtraction is less than zero. The BPL instruction is encountered, and the branch is not taken so the 0 IMM LDD instruction is executed. Thus 0 is placed in D. The programmer need not worry about this level of detail; control structures can be programmed using the same logic that is used to code high level conditional and looping statements.
The following table lists the names of the condition codes and an English statement of the condition. Remember that all conditions are referenced to 0.
|Condition Code Mnemonics|
|HI||higher than, unsigned numbers|
|LS||lower or same, unsigned numbers|
|HS||higher or same, unsigned numbers|
|LO||lower than, unsigned numbers|
|VC||two's complement overflow clear|
|VS||two's complement overflow set|
|PL||plus (most significant bit clear)|
|MI||minus (most significant bit set)|
|GE||greater than or equal, signed numbers|
|LT||less than, signed numbers|
|GT||greater than, signed numbers|
|LE||less than or equal, signed numbers|
|ANY.BITS.SET||any of the bits specified by a mask are set|
|ANY.BITS.CLR||any of the bits specified by a mask are clear|
Notice that CC is defined as a condition code in the QED-Forth assembler. Thus if you need to use the hexadecimal number 0xCC during assembly coding, it is recommended that you type it as 0xCC or 0CC to avoid confusion.
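As a sketch of how one of the signed conditions in the table might be used after a compare instruction, the following word limits a number to a maximum of 0x64. The name CLAMP.64 is hypothetical. CPD is the standard Motorola "compare D" instruction, assumed to be present in the ASSEMBLER vocabulary like the other mnemonics.

```forth
HEX
CODE CLAMP.64 ( n -- n' | n' = n if n <= 0x64, otherwise 0x64 )
   0 IND,Y LDD        \ D = n
   64 IMM CPD         \ compute D - 0x64 and set the CCR; D itself is unchanged
   GT IF,             \ signed test: true if n - 0x64 is greater than zero
      64 IMM LDD      \ replace the value with the limit
      0 IND,Y STD     \ and write it back to the data stack
   ENDIF,
   RTS
END.CODE
```

The compare makes the "referenced to 0" rule concrete: GT tests whether the difference D - 0x64 is greater than zero, which is the same as asking whether n is greater than 0x64.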
Using the ANY.BITS.SET and ANY.BITS.CLR conditions
The last two conditions in the above table, ANY.BITS.SET and ANY.BITS.CLR, implement branches by the instructions BRCLR and BRSET which combine testing of a byte and branching in one instruction. For example, the following code tests the top bit of the top stack item to see if it is negative. If so, it leaves a TRUE flag (-1) on the stack; otherwise, it leaves a FALSE (0).
    HEX
    CODE NEGATIVE? ( n -- flag | flag is true if n is negative, false if positive )
       80 0 IND,Y ANY.BITS.SET IF,   \ if top bit of most significant byte is set
          FFFF IMM LDD               \ ... put -1 flag in D
       ELSE,                         \ if top bit isn't set...
          0 IMM LDD                  \ ... put 0 flag in D
       ENDIF,
       0 IND,Y STD                   \ put flag on data stack
       RTS
    END.CODE
In the first instruction, 0x80 is a mask that specifies which bits are to be tested, and 0 IND,Y specifies the byte on the data stack to be tested (remember that Y is the data stack pointer). The top bit in 0x80 is a 1 and all the other bits are 0; thus only the top bit is tested. 0 IND,Y is the top byte on the data stack, which is the most significant byte of the number to be tested. If the top bit is set, then the code between IF, and ELSE, executes and the flag left on the stack is true. Otherwise the flag is false.
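The complementary condition can be sketched in the same way. The following hypothetical word, EVEN?, follows the mask, offset, address mode ordering of NEGATIVE? and tests the lowest bit of the least significant byte, which is assumed to sit at offset 1 from Y because the most significant byte is at offset 0.

```forth
HEX
CODE EVEN? ( n -- flag | flag is true if n is even )   \ hypothetical name
   01 1 IND,Y ANY.BITS.CLR IF,   \ mask 0x01 on the low byte: is bit 0 clear?
      FFFF IMM LDD               \ ... yes, n is even: put -1 flag in D
   ELSE,
      0 IMM LDD                  \ ... no, n is odd: put 0 flag in D
   ENDIF,
   0 IND,Y STD                   \ replace n with the flag
   RTS
END.CODE
```

With a single-bit mask, "any of the masked bits are clear" reduces to a test of that one bit.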
Using BEGIN, ... UNTIL, and BEGIN, ... WHILE, ... REPEAT, loops
The following two words use slightly different methods to clear the 1 Kbyte on-chip RAM that resides at hex addresses 0xB000 to 0xB3FF. The first uses a BEGIN, ... UNTIL, loop starting at the bottom of the region, and the second uses a BEGIN, ... WHILE, ... REPEAT, loop starting at the top of the region. They illustrate the use of the looping constructs:
    HEX
    CODE CLEAR.CHIP.RAM ( -- )
       B000 IMM LDX      \ put base address in X
       BEGIN,            \ start the loop
          0 IND,X CLR    \ clear the current byte
          INX            \ increment the pointer
          B400 IMM CPX   \ are we at the top?
       EQ UNTIL,         \ loop until top is reached
       RTS               \ done
    END.CODE

    CODE CLEAR.CHIP.RAM ( -- )
       B400 IMM LDX      \ put top address in X
       BEGIN,            \ start the loop
          DEX            \ decrement the pointer
          B000 IMM CPX   \ do X - B000; are we at the bottom?
       HS WHILE,         \ if not, continue
          0 IND,X CLR    \ clear the current byte
       REPEAT,           \ go to beginning of loop
       RTS               \ done
    END.CODE
Debugging assembly coded routines
QED-Forth's debugger can be used with assembly coded routines just as it is with high level words. Before assembly, execute TRACE ON which forces the trace instruction to be compiled before every assembly command. Then if the DEBUG flag is true, execution of the assembly coded word will print the name of each mnemonic (but not address modes or operands) and the stack picture. Turning the variable DUMP.REGISTERS ON will cause the contents of all of the programming registers to be printed, and turning SINGLE.STEP ON enters the BREAK> interpreter after each line of the routine. This lets you examine variable locations or perform other diagnostics at any point during the routine's execution. With the exception of the data stack pointer register Y, the values of the programming registers are saved before entry into the BREAK> mode, and restored on exit from BREAK>. Thus your diagnostics will not corrupt the register state of the machine. The Y register is purposefully not saved and restored; this lets you correct data stack errors while in the BREAK> mode. Consult the C V4.4 Interactive Debugger Glossary for a complete description of words used in the debugging system.
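A traced session might look like the following sketch, which reuses the DUP.8+ definition from earlier in this chapter. Two details are assumptions rather than statements from the text above: that OFF is the counterpart of ON, and that the DEBUG flag is set with DEBUG ON in the same way as the other variables. Consult the debugger glossary to confirm both.

```forth
TRACE ON                 \ compile a trace instruction before each assembly command
HEX
CODE DUP.8+ ( n -- n\n+8 )
   0 IND,Y LDD
   8 IMM ADDD
   DEY DEY
   0 IND,Y STD
   RTS
END.CODE
TRACE OFF                \ assumption: OFF stops tracing of later definitions

DEBUG ON                 \ assumption: sets the DEBUG flag true
DUMP.REGISTERS ON        \ print the programming registers at each step
SINGLE.STEP ON           \ enter the BREAK> interpreter after each line
5 DUP.8+                 \ run the traced word
```

Only definitions compiled while TRACE is on contain trace instructions, so the word must be reassembled as shown before it can be stepped through.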
Note that compilation of a definition with TRACE ON requires much more memory than an untraced definition, as each trace instruction requires 8 bytes in the definitions area. This extra memory requirement may cause problems when tracing some assembly coded definitions that have many instructions inside a branching structure such as IF, … ENDIF, or BEGIN, … UNTIL,. The extra bytes devoted to the trace may cause the branches to exceed ±128 bytes, which will cause an error message to be issued when the routine is being compiled. To solve this problem, simply define the loop's contents as a separate subroutine that is then CALLed from within the loop.