Using the Forth Assembler

How to write in-line assembly code for fast execution of time-critical code in a Forth program

In-line assembly
Debugging assembly coded routines
Using control structures
Post-fix syntax
Creating an assembly routine definition
QED-Forth register usage
An example
Performing in-line assembly within high level definitions
Calling high level Forth words from within assembly code definitions
- Rely on high level routines to perform page changes
Testing, branching, and looping structures
- Using the ANY.BITS.SET and ANY.BITS.CLR conditions
- Using BEGIN, ... UNTIL, and BEGIN, ... WHILE, ... REPEAT, loops
Debugging assembly coded routines

During software development using the Forth high-level language, it is often useful to optimize time-critical functions using low-level 68HC11 or 9S12 (HCS12) assembly code. QED-Forth's in-line assembler enables you to compile machine code by typing the Motorola assembler mnemonics to specify the desired instructions. High level FORTH-like control structures allow you to easily implement testing, branching, and looping constructs in assembly code.

For most applications you don't need to use the assembler to write high performance code. Forth compiles fast subroutine threaded code, and most kernel words are already optimized in assembler code. But for certain critical applications, especially in interrupt service routines, assembly coding can improve performance.

In-line assembly

QED-Forth's assembler is fully integrated with the rest of the integrated development environment (IDE), and it is easy to switch back and forth between high level and assembly code. Because QED-Forth is a subroutine threaded language, the compiled definition of each high level word consists of executable machine code. QED-Forth's assembler also compiles executable machine code into the dictionary. Since high level and assembly code are indistinguishable in the dictionary, assembly code can be inserted "in-line" into a high level definition, giving great programming flexibility.

Debugging assembly coded routines

Assembly coded routines can be debugged using all of the built-in debugging tools. You can install break points, single step through code routines, and view the contents of the programming registers as each instruction executes. The Debugging Programs chapter of the PDQ Board Users Guide describes debugging in detail, and a brief summary is presented at the end of this chapter.

Using control structures

The assembler operates in a single pass and, unlike traditional assemblers, does not use labels. Rather, loops and branches are specified using control structures patterned after those of FORTH. The mnemonics for these control structures end with a comma to suggest that they are compiling code into the dictionary and to distinguish them from their high level FORTH analogs. The following control structures are implemented:

IF, … ENDIF, ( or THEN, )
IF, … ELSE, … ENDIF, ( or THEN, )
BEGIN, … UNTIL,
BEGIN, … AGAIN,
BEGIN, … WHILE, … REPEAT,

In addition, each assembly subroutine defined with the CODE command has a name specified by the programmer when the subroutine is defined. The name allows a subroutine to be called by subsequently defined routines. This procedure results in highly structured label-less source code for easy readability.

Post-fix syntax

Unlike the C in-line assembler, QED-Forth's assembler uses a FORTH-like post-fix style mnemonic sequence: the operand is stated first, followed by the address mode and then the instruction. The assembler works while the QED-Forth interpreter is in execution mode (not in compilation mode). Operands and address modes leave numeric values on the stack, and the instruction mnemonic uses these values to compile the proper opcode sequence into the dictionary.

It is important to understand that execution of the assembly source code mnemonics does not execute the assembly instruction; rather, it causes compilation of the proper opcode sequences into the dictionary. The examples presented in the following text will make this clear.

We don't attempt here to describe the assembly commands of the 68HC11. Motorola's manuals give a complete description of each assembly command and its mnemonic representation. Each Motorola mnemonic has a corresponding entry (with the same spelling) in the ASSEMBLER vocabulary in QED-Forth's dictionary. Consult the QED-Forth Assembler Glossary for definitions of all of the assembler mnemonics and directives.

Creating an assembly routine definition

Assembly coded definitions are created using the words CODE and END.CODE (or its synonym END-CODE). This creates a routine that can be called from QED-Forth. CODE is defined as follows:

Create a header in the name area for the word following CODE, and "smudge" the header so it cannot be found in the dictionary until END.CODE executes. Remain in the execution mode. Execute ASSEMBLER to make the assembler vocabulary the context (search) vocabulary so that the assembler mnemonics can be found in the dictionary.

After CODE opens the definition and creates a header, assembly instructions are entered to form the body of the definition. Some instructions consist of a single mnemonic; for example, the instruction ABA tells the processor to add the contents of accumulator B to accumulator A and place the result in A. No operand or address mode is needed to specify this instruction. Other instructions require an operand and an addressing mode as well as the instruction mnemonic. For example, to compile code to load register X with the quantity 0x1234, execute (assuming that QED-Forth is in hexadecimal base),

1234 IMM LDX

which specifies 0x1234 as the operand, IMMediate as the addressing mode, and LDX (load register X) as the instruction. When this set of mnemonics is executed, the 3-byte sequence CE 12 34 is compiled into the dictionary. When later executed, these bytes instruct the processor to perform the desired operation of loading the value 0x1234 into register X.

The addressing modes are assigned the following mnemonics:

Address mode mnemonic	Mode	Description
IMM	immediate	The actual operand is specified in the instruction.
REL	relative	The operand is a signed relative offset byte between -128 and +127 that specifies a branch destination.
DIR	direct	The operand resides at an address whose most significant byte is 0x00 and whose least significant byte is specified in the instruction. This mode yields faster and denser code when the operands are contained at addresses 0x0000–0x00FF.
EXT	extended	The operand resides at the address specified in the instruction.
IND,X	indexed, register X	The operand resides at the address calculated by adding the single-byte positive offset specified in the instruction to the contents of the X register.
IND,Y	indexed, register Y	The operand resides at the address calculated by adding the single-byte positive offset specified in the instruction to the contents of the Y register.
INH	inherent	The instruction does not need an operand. Specification of this mode is optional.

The inherent addressing mode need not be specified; the assembler knows which instructions are inherent (i.e., do not need a specified address mode). Nevertheless, the INH addressing mode keyword is defined as a do-nothing command in case you want to use it to clarify your source code.

Please consult the Motorola 68HC11 Manual for a detailed discussion of the meanings of these address modes.
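
As a hedged illustration of the post-fix ordering for several of these modes, the sketch below assembles one load instruction per mode. The word name ADDRESS.MODE.DEMO and the addresses 0040 and B000 are arbitrary choices made for this example and are not part of QED-Forth. The routine only reads memory and leaves the data stack untouched.

```forth
HEX
CODE ADDRESS.MODE.DEMO  ( -- )   \ hypothetical name, for illustration only
  12 IMM LDAA       \ immediate: load the constant 0x12 into A
  40 DIR LDAA       \ direct: load the byte at address 0x0040
  B000 EXT LDAA     \ extended: load the byte at address 0xB000
  B000 IMM LDX      \ immediate: X now holds 0xB000
  4 IND,X LDAA      \ indexed: load the byte at address X + 4
  INH ABA           \ inherent: INH is a do-nothing keyword, plain ABA also works
  RTS
END.CODE
```

In every line the operand comes first, the address mode second, and the instruction mnemonic last.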

The last instruction in an assembly coded definition is usually an RTS (return from subroutine) which is the equivalent of ; in high level FORTH. END.CODE completes the definition by executing FORTH to return to the standard vocabulary and "smudging" the header created by CODE so that the name of the definition can be found in the dictionary. END.CODE also prints an error message if excess items were placed on or removed from the data stack during the course of the code definition.

QED-Forth register usage

The 68HC11 has three 16-bit registers available for programming use:

Index register X,
Index register Y, and,
the D register, which is made up of two 8-bit accumulators:
- A (most significant byte) and
- B (least significant byte)

There is also a return stack pointer register (S), a condition code register (CCR), and a program counter (PC); these manage the execution of the instructions. All mathematical operations use one or both of the accumulators that comprise the D register. The X and Y registers may be used as index registers which point to memory locations whose contents then become the operands of machine instructions. The programming registers are fully explained in the 68HC11 manuals.

Your assembly coded routines may freely use the D and X registers. Their contents need not be preserved, as QED-Forth uses them only as scratchpad registers.

The contents of the Y register must be handled with care, however. QED-Forth uses the Y register as the data stack pointer. Assembly coded words must not corrupt Y by leaving it in an arbitrary state. If an assembly routine does not need to access the data stack and needs to use the Y register to hold temporary data, it should save the data stack pointer on the return stack with a PSHY instruction, use the Y register as necessary, and restore the data stack pointer with a PULY instruction before exiting the routine. This may cause hard-to-diagnose problems, however, if the assembly coded routine is interrupted after Y is modified; the interrupt routine may assume that Y is a valid data stack pointer. In addition, note that assembly code routines that do not use Y as a data stack pointer may be difficult to debug using the QED-Forth trace and debugging tools. In summary, it is recommended that assembly routines preserve the Y register's role as the data stack pointer.
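
The save and restore sequence described above might look like the following minimal sketch. The word name Y.AS.SCRATCH and the address B000 are assumptions chosen only for illustration, and the interrupt and debugging cautions from the preceding paragraph still apply to any routine written this way.

```forth
HEX
CODE Y.AS.SCRATCH  ( -- )   \ hypothetical name, for illustration only
  PSHY              \ save the data stack pointer on the return stack
  B000 IMM LDY      \ Y now holds temporary data (here, an address)
  0 IND,Y CLR       \ use Y as an ordinary index register
  PULY              \ restore the data stack pointer before exiting
  RTS
END.CODE
```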

To ensure that code is fully pre-emptable and will not cause multitasking failures, follow these rules when putting items on or taking them off the data stack: When putting an item on the data stack, decrement the stack pointer first (e.g., using the DEY instruction) and then put the value on the stack. The following code fragment puts the value 1234 on the data stack:

1234 IMM LDD  \ put value in D
DEY DEY       \ decrement data stack pointer to make room on stack
0 IND,Y STD   \ then put the value on data stack

When taking an item off the stack, move the value into another register or storage location before incrementing the Y register. This code fragment correctly removes an item from the stack and puts it in the X register:

0 IND,Y LDX   \ put top stack item in X register
INY INY       \ then increment stack pointer to drop the item
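
Combining the two rules, a word that consumes two stack items and leaves one could be sketched as follows. The name ADD.TOP.TWO is hypothetical. The point of the example is the ordering: each value is read before the stack pointer moves past it.

```forth
CODE ADD.TOP.TWO  ( n1\n2 -- n1+n2 )   \ hypothetical name
  0 IND,Y LDD       \ read n2 into D while it is still on the stack
  INY INY           \ only then drop n2; Y now points to n1
  0 IND,Y ADDD      \ add n1 to D
  0 IND,Y STD       \ overwrite n1 with the sum
  RTS
END.CODE
```

Because the sum replaces n1 in place, no DEY instructions are needed in this case.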

An example

The following assembly coded routine duplicates an item on the stack and then adds 8 to the top stack item. Executing these instructions causes the definition to be compiled into the dictionary:

HEX
CODE DUP.8+  ( n -- n\n+8 )
  0 IND,Y LDD   \ put top stack item into D
  8 IMM ADDD    \ add 8 to contents of D
  DEY DEY       \ make room on stack for new item
  0 IND,Y STD   \ put incremented item on data stack
  RTS           \ return
END.CODE

CODE and END.CODE delimit a set of assembly mnemonics that cause the compilation of assembly code into the dictionary. CODE creates a header for DUP.8+ in the dictionary. Executing DUP.8+ from the terminal causes the word to be executed, while using it inside a colon definition causes it to be compiled. In short, it behaves just like a FORTH word.
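
A short usage sketch, assuming DUP.8+ has been compiled as shown above; the colon definition SHOW.DUP.8+ is a hypothetical name introduced only for this example:

```forth
HEX
5 DUP.8+ . .              \ interpreted: prints the values D (5 + 8) and then 5
: SHOW.DUP.8+  ( n -- )   \ hypothetical high level word
  DUP.8+                  \ compiled just like any other word
  . . ;                   \ print n+8 and then n
5 SHOW.DUP.8+             \ prints the same two values
```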

Performing in-line assembly within high level definitions

It is easy to assemble in-line code into a definition. From a high level definition, >ASSM invokes the ASSEMBLER vocabulary and enters execution mode, enabling assembler mnemonics to be entered into the definition. >FORTH returns to the high level definition, executing FORTH to restore the original vocabulary and returning to compilation mode. For example, the following high level definition uses the in-line assembler to code arithmetic operations, while the I/O operations are handled by FORTH commands:

: SUM4  ( n1\n2\n3\n4 -- | prints sum of n1+n2+n3+n4 )
  >ASSM                     \ invoke assembler
    3 IMM LDX               \ X register is loop counter; loop 3 times
    0 IND,Y LDD             \ init D register to n4 (Y is the stack pointer)
    BEGIN,                  \ start the loop
      INY INY               \ DROP and point to next data stack item
      0 IND,Y ADDD          \ add top item to accumulator D
      DEX                   \ decrement counter
    EQ UNTIL,               \ loop until counter = 0
    0 IND,Y STD  ( -- sum ) \ put answer on stack
  >FORTH                    \ go back to high level
  CR ." The sum is " . CR ; \ print sum

The use of the BEGIN, … <condition.code> UNTIL, construction is explained later, in the section Testing, branching, and looping structures.

Calling high level Forth words from within assembly code definitions

A single word or a set of previously defined words can be called from an assembler definition by simply stating

>FORTH <routine's.name> <another.routine's.name> ... >ASSM

The routines could be kernel words, words defined in high level FORTH by the programmer, or even words previously defined in assembly using CODE. For example, the following word calculates a sum in assembler code and then calls high level Forth words to print the result.

CODE SUM4&PRINT ( n1\n2\n3\n4 -- | prints sum of n1+n2+n3+n4 )
  3 IMM LDX          \ X register is loop counter
  0 IND,Y LDD        \ init D register to n4
  BEGIN,             \ start the loop
    INY  INY         \ point to next data stack item
    0 IND,Y ADDD     \ add top item to accumulator D
    DEX              \ decrement counter
  EQ UNTIL,          \ loop until counter = 0
  0 IND,Y STD ( -- sum ) \ put answer on stack
  >FORTH             \ go to high level
    CR ." The sum is " . CR \ print CR and sum
  >ASSM              \ return to assembly
  RTS
END.CODE

Note that SUM4 and SUM4&PRINT compile identical code; one is a high level colon definition that invokes the assembler, and one is a low level CODE definition that calls high level FORTH routines.

Also note that high level FORTH routines may modify the contents of the X and D registers. In the SUM4&PRINT definition, for example, we know that the sum is present in the D register before >FORTH is invoked, but we cannot assume that the sum will still be present in D after we re-enter the assembly definition. In fact, the CR and ." printing statements modify the contents of D. If you need to preserve the contents of X or D while calling high level FORTH routines, push the register contents onto the return stack before invoking the high level commands, and pull the saved contents back into the registers after returning to the assembly coded portion of the routine.
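
One way to preserve D across a high level call is sketched below, using only the 68HC11 push and pull instructions for the two accumulators. The word name CR.THEN.PUSH and the value 1234 are assumptions for illustration.

```forth
HEX
CODE CR.THEN.PUSH  ( -- n )   \ hypothetical name
  1234 IMM LDD       \ value that must survive the high level call
  PSHB PSHA          \ save D on the return stack (B first, then A)
  >FORTH CR >ASSM    \ high level call; it may modify D and X
  PULA PULB          \ restore D by pulling in the reverse order
  DEY DEY            \ make room on the data stack
  0 IND,Y STD        \ put the preserved value on the data stack
  RTS
END.CODE
```

The X register can be preserved in the same way with PSHX and PULX.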

Rely on high level routines to perform page changes

When an address on a different page must be accessed from inside an assembly coded definition, it is best to use high level code to perform the page change. Similarly, when calling previously defined high level or assembly coded words from within an assembly routine, you could execute,

>FORTH <routine's.name> >ASSM

to compile the call. The high level compiler correctly compiles page changes if necessary. A more convenient way of calling a single word from within a CODE definition is to use the directive CALL followed by the name of the word to be called,

CALL <routine's.name>

For example, the following definition fetches from an extended address and adds 9 to the result:

CODE @9+   ( xaddr -- n  | n = contents of xaddr + 9 )
  CALL  @  ( -- n1 )    \ compile a call to @
  0 IND,Y LDD           \ put n1 in D
  9 IMM ADDD            \ add 9 to contents of D
  0 IND,Y STD  ( -- n ) \ put answer on stack
  RTS
END.CODE

Note that the kernel word @ is used to perform the page change required to fetch from the specified address. An equivalent but less elegant way to compile a call to a previously defined routine from within an assembly code definition is to execute,

>FORTH @ >ASSM

Note that the high level interpreter and the CALL routine correctly compile page changes to called routines when required. Compilation of page changes is discussed in detail in the Advanced Forth Programming Topics chapter of the PDQ Board Users Guide.

Testing, branching, and looping structures

QED-Forth's assembler compiles branches and loops without the need for labels to specify destinations. The branch and loop structures are similar to those used in high level FORTH, making assembly coding very straight-forward.

There is an important difference between high-level and assembly-coded conditionals. The high level words remove a flag from the data stack at run time and base their test on the value of the flag. In assembler code, on the other hand, the test is based on the value of the condition code register (CCR) at run time. A conditional instruction typically follows an instruction that sets the CCR. The programmer specifies which condition should be tested for.

For example, a typical test and branch structure is implemented as,

<instruction that sets condition codes>
<condition>
IF,
   <instructions to be executed if condition is met>
ELSE,
   <instructions to be executed if condition is not met>
ENDIF,
<continue ...>

where the ELSE, clause is optional, and ENDIF, can be replaced by THEN, which is a synonym. These commands are analogous to IF … ELSE … ENDIF in high level FORTH.

Assembly coded loops are assembled using the commands

BEGIN,
   ...
<condition> WHILE,
   ...
REPEAT,

which loops while the condition is true, and,

BEGIN,
   ...
<condition> UNTIL,

which loops until the condition is true, and,

BEGIN,
   ...
AGAIN,

which is an infinite loop. These operate in similar fashion to the analogous high level words. Consult the glossary for detailed descriptions of each of these control words.

The conditions specified in the stack pictures of these constructs are named according to the mnemonics for the 68HC11's branch instructions. For example, the mnemonic BMI (branch if minus) leads to the condition name MI, for "minus". This condition can be used with the conditional and looping words whose names are IF, and WHILE, and UNTIL, (note that each name ends with a comma).

Note that all comparisons are zero-based, and rely on the contents of the condition code register. Thus every conditional instruction should follow an instruction that sets the condition code register. For example, for the code fragment

1234 IMM LDD   \ puts 12 in A, 34 in B
CBA            \ do A - B (without changing A or B), set CCR
MI IF,
  0000 IMM LDD
ENDIF,

the result of the subtraction (0x12 - 0x34) is a negative number (less than zero) and the condition codes are set accordingly. The code between the IF, and ENDIF, is executed if the result is negative, so 0 is placed in D.

For the experts among you, the assembler actually compiles a BPL ("branch if plus") for the example above; the compiled branch is always the logical opposite of the specified condition. After the CBA instruction, the condition code register reflects that the result of the subtraction is less than zero. The BPL instruction is encountered, and the branch is not taken so the 0 IMM LDD instruction is executed. Thus 0 is placed in D. The programmer need not worry about this level of detail; control structures can be programmed using the same logic that is used to code high level conditional and looping statements.
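
As a further hedged sketch of a CCR-based test with both branches, the following word relies on the fact that LDD itself sets the zero bit of the CCR when the loaded value is zero, so no separate compare instruction is needed. The name ZERO.FLAG? is hypothetical.

```forth
HEX
CODE ZERO.FLAG?  ( n -- flag | flag is true if n = 0 )   \ hypothetical name
  0 IND,Y LDD       \ load n; LDD sets the Z bit of the CCR if n is zero
  EQ IF,            \ if the loaded value equals zero...
    FFFF IMM LDD    \ ... put -1 (true) flag in D
  ELSE,             \ otherwise...
    0 IMM LDD       \ ... put 0 (false) flag in D
  ENDIF,
  0 IND,Y STD       \ replace n with the flag
  RTS
END.CODE
```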

The following table lists the names of the condition codes and an English statement of the condition. Remember that all conditions are referenced to 0.

Condition Code Mnemonics
ALWAYS	always
NEVER	never
HI	higher than, unsigned numbers
LS	lower or same, unsigned numbers
CC	carry clear
CS	carry set
HS	higher or same, unsigned numbers
LO	lower than, unsigned numbers
NE	not equal
EQ	equal
VC	two's complement overflow clear
VS	two's complement overflow set
PL	plus (most significant bit clear)
MI	minus (most significant bit set)
GE	greater than or equal, signed numbers
LT	less than, signed numbers
GT	greater than, signed numbers
LE	less than or equal, signed numbers
ANY.BITS.SET	any of the bits specified by a mask are set
ANY.BITS.CLR	any of the bits specified by a mask are clear

Notice that CC is defined as a condition code in the QED-Forth assembler. Thus if you need to use the hexadecimal number CC during assembly coding, it is recommended that you type it as 0xCC or 0CC to avoid confusion.

Using the ANY.BITS.SET and ANY.BITS.CLR conditions

The last two conditions in the above table, ANY.BITS.SET and ANY.BITS.CLR, implement branches by the instructions BRCLR and BRSET which combine testing of a byte and branching in one instruction. For example, the following code tests the top bit of the top stack item to see if it is negative. If so, it leaves a TRUE flag (-1) on the stack; otherwise, it leaves a FALSE (0).

HEX
CODE NEGATIVE?  ( n -- flag | flag is true if n is negative, false otherwise )
  80 0 IND,Y ANY.BITS.SET IF,    \ if top bit of most significant byte is set
    FFFF IMM LDD                \ ... put -1 flag in D
  ELSE,                         \ if top bit isn't set...
    0 IMM LDD                   \ ... put 0 flag in D
  ENDIF,
  0 IND,Y STD                   \ put flag on data stack
  RTS
END.CODE

In the first instruction, 0x80 is a mask that specifies which bits are to be tested, and 0 IND,Y specifies the byte on the data stack to be tested (remember that Y is the data stack pointer). The top bit in 0x80 is a 1 and all the other bits are 0; thus only the top bit is tested. 0 IND,Y is the top byte on the data stack which is the most significant byte of the number to be tested. If the top bit is set, then the code between IF, and ELSE, executes and the flag left on the stack is true. Otherwise the flag is false.
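
The complementary condition can be sketched the same way. The hypothetical word EVEN? below tests bit 0 of the least significant byte, which sits at offset 1 from the data stack pointer because the most significant byte is stored first. With a single-bit mask, ANY.BITS.CLR simply asks whether that one bit is clear.

```forth
HEX
CODE EVEN?  ( n -- flag | flag is true if n is even )   \ hypothetical name
  01 1 IND,Y ANY.BITS.CLR IF,   \ if bit 0 of least significant byte is clear
    FFFF IMM LDD                \ ... put -1 flag in D
  ELSE,                         \ if bit 0 is set...
    0 IMM LDD                   \ ... put 0 flag in D
  ENDIF,
  0 IND,Y STD                   \ replace n with the flag
  RTS
END.CODE
```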

Using BEGIN, ... UNTIL, and BEGIN, ... WHILE, ... REPEAT, loops

The following two words use slightly different methods to clear the 1 Kbyte on-chip RAM that resides at hex addresses 0xB000 to 0xB3FF. The first uses a

BEGIN, ... UNTIL,

loop starting at the bottom of the region, and the second uses a

BEGIN, ... WHILE, ... REPEAT,

loop starting at the top of the region. They illustrate the use of the looping constructs:

HEX
CODE CLEAR.CHIP.RAM  ( -- )
  B000 IMM LDX      \ put base address in X
  BEGIN,            \ start the loop
    0 IND,X CLR     \ clear the current byte
    INX             \ increment the pointer
    B400 IMM CPX    \ are we at the top?
  EQ UNTIL,         \ loop until top is reached
  RTS               \ done
END.CODE
 
CODE CLEAR.CHIP.RAM  ( -- )
  B400 IMM LDX      \ put top address in X
  BEGIN,            \ start the loop
    DEX             \ decrement the pointer
    B000 IMM CPX    \ do X - B000; are we at the bottom?
  HS WHILE,         \ if not, continue
    0 IND,X CLR     \ clear the current byte
  REPEAT,           \ go to beginning of loop
  RTS               \ done
END.CODE

Debugging assembly coded routines

QED-Forth's debugger can be used with assembly coded routines just as it is with high level words. Before assembly, execute TRACE ON which forces the trace instruction to be compiled before every assembly command. Then if the DEBUG flag is true, execution of the assembly coded word will print the name of each mnemonic (but not address modes or operands) and the stack picture. Turning the variable DUMP.REGISTERS ON will cause the contents of all of the programming registers to be printed, and turning SINGLE.STEP ON enters the BREAK> interpreter after each line of the routine. This lets you examine variable locations or perform other diagnostics at any point during the routine's execution. With the exception of the data stack pointer register Y, the values of the programming registers are saved before entry into the BREAK> mode, and restored on exit from BREAK>. Thus your diagnostics will not corrupt the register state of the machine. The Y register is purposefully not saved and restored; this lets you correct data stack errors while in the BREAK> mode. Consult the C V4.4 Interactive Debugger Glossary for a complete description of words used in the debugging system.
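
A debugging session along these lines might be set up as in the sketch below, which recompiles the earlier DUP.8+ example with tracing enabled. It assumes that DEBUG, like the other flags named above, is a variable that can be set with ON, and that TRACE OFF stops the compilation of trace instructions. Check both against the debugger glossary before relying on them.

```forth
TRACE ON              \ compile a trace instruction before each assembly command
HEX
CODE DUP.8+  ( n -- n\n+8 )
  0 IND,Y LDD         \ put top stack item into D
  8 IMM ADDD          \ add 8 to contents of D
  DEY DEY             \ make room on stack for new item
  0 IND,Y STD         \ put incremented item on data stack
  RTS
END.CODE
TRACE OFF             \ assumption: later definitions are compiled untraced
DEBUG ON              \ assumption: DEBUG is a flag variable set with ON
DUMP.REGISTERS ON     \ print the programming registers at each step
SINGLE.STEP ON        \ enter the BREAK> interpreter after each line
5 DUP.8+              \ execute the traced word
```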

Note that compilation of a definition with TRACE ON requires much more memory than an untraced definition, as each trace instruction requires 8 bytes in the definitions area. This extra memory requirement may cause problems when tracing some assembly coded definitions that have many instructions inside a branching structure such as IF, … ENDIF, or BEGIN, … UNTIL,. The extra bytes devoted to the trace may cause the branches to exceed ±128 bytes, which will cause an error message to be issued when the routine is being compiled. To solve this problem, simply define the loop's contents as a separate subroutine that is then CALLed from within the loop.
