Failure and Run-Time Error Recovery

How to employ the processor's built-in hardware features for robust error recovery

Getting started and getting stopped
The COP watchdog timer and clock monitor
The clock monitor
Processor operating modes

This chapter describes a variety of useful hardware features on the QCard Controller Single Board Computer (SBC):

The Freescale 68HC11 processor’s external hardware interrupt /IRQ, may be used by external devices to request immediate service.
Three nonmaskable interrupts cause a hardware reset: the external reset, the COP, and the clock monitor. The main reset is activated on power-up or when the /RESET pin is pulled low for more than 4 machine cycle. Enabling the computer operating properly circuit, COP, sets up a watchdog timer that resets the processor unless a special register is periodically updated. This provides a means of recovering from crashes in an embedded application. Use of the COP feature requires installation of an autostart routine which services the COP. The clock monitor backs up the COP by resetting the machine if the system clock fails.
STOP and WAI instructions are available to put the CPU in low power modes with different degrees of power savings
An on-board jumper allows selection of the standard operating mode or the special cleanup mode. Proper access of these hardware features from your C-language application program will improve the reliability and robustness of your instrument control applications.

Getting started and getting stopped
restarts and resets

External hardware resets

The main reset interrupt of the 68HC11 processor is activated upon power-up or when the active-low /RESET signal is pulled low. The processor does not distinguish between a power-on reset and a reset caused by a low level on the /RESET input pin; both result in the same hardware initialization and software restart sequence.

The /RESET line is normally held high by a pull-up resistor. You can pull the /RESET line low by pushing on the reset switch. Moreover, any peripheral device can reset the processor by driving the /RESET signal low for at least 2 microseconds using an open-collector output.

The active-low /RESET signal is controlled by the power monitor circuitry. On power-up, the monitor asserts the reset signal until the positive supply has stabilized above 4.5 Volts.

Internal resets

The 68HC11 resets itself when a failure condition is detected by either the computer-operating-properly (COP) or the clock monitor circuit. When either of these failure conditions occur, the processor drives the /RESET line low for less than 4 machine cycles to reset itself and any peripherals that are connected to the /RESET line. The processor then determines which failure (COP or clock monitor) caused the reset, and branches to the associated service routine. QED-Forth initializes the interrupt vectors for the COP and clock monitor to perform the standard restart sequence, and the programmer may change the vectors if desired. The operation of the COP and clock monitor are described in the following sections.

Crashes

A computer "crashes" when it executes a set of instructions that it is not supposed to. This can cause the processor to write over memory locations that are not write-protected. The processor may get into an infinite loop of legal instructions (in which case it will not respond to your commands), or it may eventually execute an "illegal opcode". Illegal instructions are detected by the processor’s illegal opcode trap and result in a restart , in which case you will see the QED-Forth startup message on your terminal, or execution of the autostart program, if present.

The best response to a crash during program development is to push the reset button. This initializes all of the registers and performs a restart. In most cases a "warm restart" will be performed, which should allow you to continue programming with access to all of the words that you have defined. In other cases, the state of the user area or the dictionary may be corrupted. If QED-Forth detects the corruption, it will automatically execute a "cold restart"; otherwise you may execute COLD which performs the restart. The cold restart re-initializes all of the user variables that control QED-Forth’s operation.

Resets versus restarts

To clarify the discussion of crashes, some terms must be defined. A "reset" is an initialization process invoked by the hardware of the 68HC11, while a "restart" is an initialization process controlled by software.

A reset can be caused by any of four events:

power is applied to the processor
the reset button is pushed
the clock monitor detects a clock failure
the computer operating properly (COP) circuit detects a failure

The hardware of the 68HC11 is configured by a set of registers that reside at locations 8000H through 805FH. (These hardware registers should not be confused with the programming registers D, X, Y, etc.) The reset initializes essentially all of the registers, and then initiates an interrupt response sequence. The interrupt calls a specified response program whose address is stored in an interrupt vector near the top of memory. The power-on and reset-button resets share the same interrupt vector at FFFE. The clock monitor and COP resets are re-vectored to addresses in EEPROM where the programmer can install customized service routines, if desired. All of these service routines are initialized to perform the default restart routine.

A "restart" is an initialization process performed by software. After a (hardware-invoked) reset, the 68HC11 calls a restart routine which re-initializes some of the registers to accommodate QED-Forth, and initializes other memory locations including all or part of the user area. A restart can also be invoked solely via software, by executing the kernel words COLD or WARM. When the illegal opcode trap detects an illegal instruction, it calls a restart routine, but does not perform a hardware reset. Note that a reset always results in a restart, but that a restart can be performed without a reset.

COLD is the most comprehensive software-invoked initialization command. Executing COLD after a crash usually puts the machine into a well-known state by completely initializing the user area which controls QED-Forth’s operation. But COLD does not initialize all of the registers. Therefore, in crashes where the contents of key hardware registers are corrupted, it may be necessary to perform a hardware reset by pushing the reset button or powering the machine off and on again.

Cold versus warm restarts

There are two types of restart: cold and warm. A cold restart initializes all of the parameters used by the QED-Forth system. These parameters are stored in the "user area", which is a 256-byte block of memory in the common RAM. All of the memory management pointers, format variables to control numeric conversion, quantities that enable the compilation of local variables, and many other system values are stored in the user area. COLD initializes these to default values. COLD also initializes several vital interrupt vectors so that they will perform the startup sequence if they are invoked. These vital interrupts --clock monitor, computer operating properly, and illegal opcode trap-- were discussed in the last chapter.

A warm restart, on the other hand, assumes that most of the user variables have already been properly initialized. A warm restart initializes only a few of these parameters, including stack pointers (it clears the stacks) and some multitasking variables (it makes sure that a single task is running and that it has control of the serial port).

A warm restart preserves the prior number base (whatever you had set it to before the restart occurred) while a cold restart always sets the base to decimal. A warm restart preserves the user’s memory map and QED-Forth’s ability to find user defined words, while a cold restart sets a default memory map and forgets all words except those in the original kernel.

The default restart program decides whether to perform a cold or a warm restart by checking a location in the user area to see if a specified pattern (1357H) is stored there. If the correct pattern is present, the restart program assumes that the user area is already properly initialized, so it performs a warm restart. If the location does not contain the proper value, the restart program assumes that some event (perhaps a crash) has corrupted the user area, so a cold restart is executed to force the system to a known state.

Because the QCard’s common RAM is battery backed (except for the 1K of RAM at B000H-B3FFH on the 68HC11 itself), the user area (including the location where the startup pattern is stored) maintains its contents even when it is powered down. Thus a warm restart will be performed most of the time when you turn on the QCard. This is convenient: it means that access to the words you defined, your memory map, and the contents of the user area are not altered by removal of power. It also means that pushing the restart button and powering the machine off and on again have similar effects, except that powering the machine off loses the contents of the 1K of RAM on the 68HC11 at addresses B000H-B3FFH.

If a crash over-writes the user area, the next restart will be a cold restart. QED-Forth signals a cold startup by printing a COLDSTART statement before the QED-Forth V4.4x startup message is printed. If the crash did not corrupt the startup pattern in the user area, a warm restart would be performed, and you could continue debugging. In most cases, all of the words that you defined would still be accessible. If the machine is behaving in an unpredictable manner, however, it may be necessary to reset the machine and perform a cold restart to establish a known initialized state.

Recovery tricks

Some crashes may be difficult (but not impossible!) to recover from. For example, if the name area of the dictionary is corrupted, QED-Forth may not be able to find even the most basic commands in the dictionary. If every command you give is met with the ? error message, try executing COLD. The FIND word in the interpreter is programmed to always recognize the word COLD, even if the dictionary is corrupted.

If all else fails, use the special cleanup mode

These recovery techniques may not work if you have a buggy autostart word or a major crash. If typing COLD or pressing the reset button does not greet you with the standard "QED-Forth V4.4x" prompt, you may need to use the special cleanup mode to restore your system to a proper state. This involves installing Jumper J1 and then pressing the reset button. The special cleanup procedure places the QCard in the same state it was in when it was shipped from the factory.

The COP watchdog timer and clock monitor

In many embedded control applications, it is important that processor crashes be detected quickly so that the system can rapidly be returned to a proper operating condition. The Computer Operating Properly subsystem, also known as a "watchdog timer" or "COP", provides this capability. It gives the programmer a way to force a processor reset if an application program crashes or gets lost. When enabled, the COP resets the processor if the application program fails to periodically update a specified register within a predetermined time-out period. The COP time-out period is programmable to any of four values between 8 msec to 0.5 seconds.

To use the COP, design and debug an application program that, in addition to performing all of its normal tasks, periodically writes a 2-byte pattern to the COP reset (COPRST) register as described below. The specified pattern must be written before the COP "times out". Then install the application as an autostart routine using the QED-Forth word AUTOSTART or PRIORITY.AUTOSTART, and enable the COP.

If the application program ever allows the time-out period to be exceeded without writing the specified pattern, the COP resets the processor. Presumably the pattern will not be properly written if the processor crashes for any reason, so the COP provides a way of automatically resetting the processor to recover from crashes. Then, because the application program has been installed as an autostart routine, the application is automatically restarted when the COP forces a reset.

Be careful with the COP

Before enabling the COP, make sure that a debugged application program that properly updates the COPRST register has been installed as an Autostart() or PriorityAutostart() routine. If the startup program is improperly designed so that it is unable to service the COP on time, the COP will reset the machine, thereby invoking the startup program again, and leading to an infinite series of COP resets.

If you find yourself in this situation you can return the QCard Controller to its "pristine" state by entering the special clean-up mode: install Jumper J1 and then press the reset button to resume normal operation with the COP disabled and any autostart routine removed.

The COP feature should prove error-free as long as the application program is:

fully debugged;
capable of updating the COPRST in a timely fashion; and,
installed as an autostart routine.

Configuring the COP

Three bits are used to configure and enable/disable the COP. They are named CR0, CR1, and NOCOP. CR0 and CR1 are located in the OPTION register. These bits determine the amount of time which can elapse between updates of the COPRST register by the application program. If the time-out period is exceeded, the COP forces a reset. The four available time-out periods are:

Table 7-1 COP Time-out Period

CR1	CR0	Time-out Period
0	0	8.192 ms
0	1	32.768 ms
1	0	131.07 ms
1	1	524.5 ms

The CR1 and CR0 bits in the OPTION register may be modified only during the first 64 cycles after a reset. The function InstallRegisterInits() makes it easy to specify a value that will be automatically stored into the OPTION register after every reset; consult its glossary entry for details.

The third control bit is called NOCOP and is located in the CONFIG register. The QCard is shipped with this bit set so that the COP is disabled. To enable the COP, clear this bit. The CONFIG register’s contents are non-volatile, and so are maintained even after the processor has been powered down.

Servicing the COP

Servicing the COP is accomplished by writing 55H and AAH to the COPRST register. Although the order of the writes is important, the number of intermediate instructions between them is inconsequential. The two writes must be performed before the time-out period has elapsed. Once AAH has been stored, the COP will need to be serviced again before the next time-out period has elapsed.

The clock monitor

The clock monitor provides a second level of security by monitoring the main system clock and resetting the processor if the clock signal disappears or oscillates too slowly. The clock monitor does not initiate a reset as long as the E-clock frequency is greater than 200 kHz (the E-clock frequency is one quarter the frequency of the on-board crystal). A reset is always triggered at E-clock frequencies below 10 kHz, and may be triggered at frequencies as high as 200 kHz.

The clock monitor is primarily used as a backup for the COP. The COP relies on the clock’s presence for reliable operation, and the clock monitor can ensure that the processor is safely reset if the clock fails.

Enabling the clock monitor is accomplished by setting the CME (clock monitor enable) bit in the OPTION register. This bit may be set or reset at any time. A second bit named FCME (force clock monitor enable) is also involved. When the FCME bit is in its default state of 0, the bit has no effect, and when FCME is set, the clock monitor feature cannot be disabled until a reset occurs. We will assume that FCME is 0, and that the CME bit controls the clock monitor. See MC68HC11F1 Technical Data Manual, p.5-3 for further details. Note also that if the clock monitor on the CPU is enabled, a STOP assembly instruction will trigger a reset because it stops the clock, as discussed in the "Low Power Modes" section below.

Processor operating modes

Low power CPU modes

The 68HC11F1 CPU has two low power modes. These modes are enabled by assembly instructions STOP and WAI (wait). The STOP command puts the CPU into its lowest power-consumption mode by stopping all clocks, thereby stopping all processing (MC68HC11F1 Technical Data Manual, p.5-17). If the clock monitor is enabled, a reset will be triggered when the clocks stop due to a STOP instruction. To use a STOP instruction when the clock monitor reset is enabled, disable the monitor before the STOP instruction, and re-enable it after returning from the STOP.

Pulling either /RESET or /IRQ low wakes the processor up after a STOP instruction. Pulling the reset line low awakens the CPU and performs the standard reset startup sequence. For the CPU to be awakened by the /IRQ line going low, the I bit in the CCR register must be clear so that interrupts are globally enabled. When /IRQ goes low and the I bit is clear, execution begins with the /IRQ handler and then executes the code following the STOP instruction.

The STOP instruction is executed as a NOP unless the S bit in the CCR is cleared. After clearing the S bit, any occurrence of a STOP instruction puts the CPU into its lowest power mode. After each reset or restart, QED-Forth leaves the S bit in the CCR in its default set position, meaning that the STOP mode is disabled.

WAI low power mode

The WAI instruction also puts the 68HC11F1 in a low power mode. However, clocks are not disabled in the wait mode, so power consumption is greater than the STOP mode. After a WAI instruction, the machine state is stacked and processing stops. Power savings can be increased by setting the I bit in the CCR and disabling the COP. Further savings can be achieved by disabling the on-chip subsystems, including executing A/D8.OFF to turn off the A/D (MC68HC11F1 Technical Data Manual, pp.5-17...5-18).

The WAI low power state can only be exited by an unmasked interrupt or by pulling the /RESET pin low. When an unmasked interrupt occurs, (for example /IRQ goes low, the COP is not serviced, clock monitor failure or reset occurs), the appropriate interrupt handler is executed and then processing continues with the instructions following the WAI. Implementing the WAI lower power mode is accomplished by simply executing WAI.

Summary of low power modes

In sum, power can be saved by putting the CPU in a low power mode while processing is not required. The 68HC11F1 has two low power modes with different degrees of savings. Both modes are terminated by unmasked interrupts. While the WAI instruction can be called without any preparation, the STOP instruction must be enabled by clearing the S bit of the CCR register.

Operating modes of the 68HC11F1 CPU

The 68HC11F1 microcontroller has four operating modes: expanded nonmultiplexed, special test, single chip, and special bootstrap modes (M68HC11 Reference Manual, Section 3 and MC68HC11F1 Technical Data Manual, pp.4-1...2). The standard operating mode is expanded nonmultiplexed, meaning that the processor has access to expanded memory beyond its on-chip memory, and that the address and data lines are not multiplexed together (as they are on other members of the 68HC11 family). The QCard also makes use of the special test mode, renaming it the "special cleanup" mode. This mode makes it possible to rapidly recover from any programming error that causes repeated machine crashes. The single chip mode takes away the ability of the processor to address external memory, and special bootstrap allows startup code to be inserted into the processor; these two modes are not used on the QCard.

The processor’s operating mode is determined by the states of two pins named MODA and MODB (refer to the schematic in Appendix C). On the QCard, MODA is always high and MODB may be pulled LOW by installing Jumper J1; this invokes the special cleanup mode. When Jumper J1 is not installed, the board is in the standard operating mode.

Special cleanup mode

The Special Cleanup Mode is useful if a buggy startup routine has been installed (using the AUTOSTART or PRIORITY.AUTOSTART words) or if invalid register initializations have been specified (for example, using the InstallRegisterInits() word). To recover from these problems, simply enter the special cleanup mode by installing Jumper J1 and pressing the reset button. This completely re-initializes the system software to its "pristine" state, and displays the QED Forth startup message at your terminal.

This page is about: Robust Error Recovery on 68HC11 Microcontroller – For robust error recovery, use the 68HC11 processor's maskable and nonmaskable interrupts, COP (Computer Operating Properly) watchdog monitor, clock monitor, power-on reset and the special cleanup mode. cold restart, warm restart, recovery tricks