a heuristic architecture


aha source

I have chosen the name aha for the new system I am implementing on the F21 Microprocessor. Perhaps it should be called aha Forth because of the holistic approach that uses hardware and software techniques developed by  Charles Moore the inventor of Forth and he calls this Forth.

The hardware platform, F21, is an unusual microprocessor that uses a hardware architecture based on the Forth virtual machine. It is a Forth machine in silicon that simplifies and matches very well to the Forth software methods developed by its inventor. The hardware design goals were to create a chip with a thousand times fewer transistors, lower power consumption, and production cost than other microprocessors in the 500 Mhz or higher performance range. It is suitable for low cost low power embedded applications from toys to high end aerospace applications. It is optimized for the execution of Forth code, logic, expert systems, real-time, distributed, and AI applications. It is also a highly integrated device combining a CPU, memory interface coprocessor, high speed analog I/O coprocessor, video I/O coprocessor, high speed serial/network coprocessor and router, real time clock, interrupt controller, 10 gigahertz timer, and parallel port on a tiny piece of silicon. Other factors that influence the design of the software are that F21 is word addressing machine with a 20 bit data bus and a 21 bit address bus. The hardware has been optimized for small size, cost, and power consumption, simplicity and speed. It is also known as a Minimal Instruction Set Computer.

The aha software design is influenced by traditional Forth systems where Forth provides an integrated operating system. The aha system will have a Graphic User Interface, a high speed source code compiler, a source code debugger/browser, an optimizing compiler/editor, and it hosts application programs. The software is designed to be close match to the hardware in a holistic way and optimized for small size, simplicity and speed like the hardware.

The aha system is heuristic, "involving or serving as an aid to learning, discovery, or problem-solving by experimental methods and especially trail-and-error methods". The system is holistic, "relating to or concerned with wholes or with complete systems rather than with the analysis of, treatment of, or dissection into parts." (*1)


Isomorphism in Forth hardware and Forth software.
Isomorphism in OS/GUI/compiler/debugger/editor/application integration.
Isomorphism in Distributed Processing using Symmetric Multiprocessing supported by on-chip hardware.
Hardware accelerated GUI implementation.
Tokenized source code database manager.
Source code debugging with instant compilation.
Smalltalk-like GUI design and integration.
LOGO-like simultaneous manipulation of executable code via source code editing.
Highly optimized for heuristic AI, rule based expert systems, neural net emulation, and genetic algorithms in a scalable environment.
Extreme simplicity, compactness and speed.


The aha system is heavily influenced by the software developed and used by Charles Moore over a period of thirty years. He has carried the ideas of empowering a programmer by making all aspects of system and code design simple understandable and accessible, from the inception of Forth in the 1960s through his recent work on Forth chips and new Forth software designs.

Chuck moved from early experimental Forths to the construction of commercial Forths at Forth Inc. and designed systems that became the basis of traditional Forth and later the widely used ANS Forth standard. The PolyForth system that he worked on there before leaving to work on Forth chips at Novix was successful as a fast portable Forth that could provide a stand alone integrated OS and GUI or run in a foreign OS such as DOS. The first time I saw it I was impressed that it loaded so quickly and was surprised to be told that it was actually compiling 700 KB of object code faster than many systems would simply load such code.

At Novix Chuck developed cmForth for use on the Forth in hardware Novix chip. Chuck said that he had the opportunity to go back and streamline and simplify some things in designing a system that fit more closely to the new hardware. The system was remarkably small, fast, and simple. On a floppy disk the system could load a core and compile a new system in about one rotation of the floppy. But he said that he was not very pleased with the very non portable and overly complex optimization techniques in the software that were needed to match the Novix chip's ability to execute certain multiple Forth word instruction sequences in a single opcode and a single cycle. The Novix used multiple busses for the data stack, the return stack, and instruction/data space and could at times manipulate all three in a single cycle. The hardware and software caught my attention because it was faster than Intel's fastest at the time and used one hundred times fewer transistors. The Novix was optimized for subroutine calls and returns and manipulation of a data stack. But Chuck wanted hardware and software that could use a more simple technique for code generation. His goal was software that was simpler, clearer and easier to understand and use.

On the hardware front he moved from a 16 bit design to a 32 bit design with his ShBoom processor. And frustrated by the tools that he had to use that didn't understand his goals or share the focus of his work he began to develop his own CAD software. He also began an experiment in sourceless programming. His idea was that the best way to represent his chip was the design of the chip itself rather than an abstracted text description of the chip. He extended this idea to the software where he felt that he the best representation of the code was the code itself not an abstracted text description of the code. As he often put it in those days, "The map is not the territory."

He built an operating system that he first called Three Forth, as in 3/4, because it did not contain all of Forth, it had no source interpreter or compiler. Later he called it OK, after the traditional command prompt in Forth systems.  OK is a minimal OS and GUI that uses only a few bytes of structured menu code for each graphic screen. The complete system consisted of the boot code, the menus and graphic screens, a HEX memory editor, a graphics screen editor, a character set editor, and an code editor and decompiler. OK was implemented on the Novix, the ShBoom, an Apple IIc, 386, 486, MuP21, and Pentium machines all with the same design and without any source code. He would begin in a debugger and build the OK tools needed to construct the rest of the OK system. Since the total code involved was only a few K on each system it was not a large task and he made small improvements to the design in each iteration.

But OK was just the platform for the application which was his CAD system which he has used exclusively for a dozen years for chip design. He called the system  OKAD, pronounced Oh-CAD. Using  OK, as it does, OKAD is organized into menus with associated graphic screens for the functions of layout browsing, layout editing, design rule checks, simulation, format conversion, and other functions. In this same period Chuck moved from designing for Programmable Gate Arrays to designing for custom Very Large Scale Integration. He moved from using conventional tools that were not designed for laying out highly tuned microprocessors to tools designed to do just what he needed to improve his productivity. He moved from designing for PGA to designing custom VLSI to, as he put it, take advantage of the potential for a ten to one hundred fold improvement in performance and at the same time a one hundred fold reduction in manufacture cost.

The sourceless programming approach was a controversial idea and raised as many questions as did his experiments in using various custom chorded keyboards in his effort to improve the efficiency of his use of his menu based software. After years of use he abandoned the sourceless programming experiment. He declared that it had problems with ambiguous object code sequences that could have been decompiled in different ways and that the decompiler was not altogether satisfactory. He said, "One comment about the ambiguity of decompile: My biggest problem was the lack of expressions that evaluate and document literals. Comments don't help, because you can't rely on their accuracy."

He created assembler source code to the OK and OKAD system to facilitate porting. He declared that most importantly he had come to need to have the ability to compile and interpret code when he needed to create reusable test scripts and automate testing designs in the  OKAD chip simulator. In short, once again he needed Forth.

In this period his chip designs evolved and so did his idea of how Forth should be done on these chips. His chips had gone to using small opcodes where multiple opcodes could be packed into a memory word to allow the CPU to clock instructions at some multiple of the memory clock. The choice of opcodes and other architectural details evolved though years of designs and simulations of various options and simulations of their effect on compiler design and application performance.

He came to call his simple optimizing native code Forth compiler for his chips  Machine Forth. Machine Forth is characterized by a very close match to the hardware and another round of streamlining of the compiler internals. Chuck was pleased by Machine Forth and it was popular with the programmers who have used it for the last eight years. Most versions were simple ports of the page of code he wrote as the first cross compiler for MuP21 but there have also been stand alone and embedded versions.

When he embedded a Forth into his OKAD system he made another set of changes to the internals of the compiler and to the user interface. He called this system   Color Forth. He said that the definition of   Color Forth was putting information into the white space of your source code. He said that when he says "color" he means visually distinctive. He choose to use color tokens that change the visual appearance of following Forth words to make them visually distictive to the programmer and to direct the compiler to do the appropriate thing with them.

Traditionally Forth was a language with almost no syntax, simply words and spaces. Chuck was now using different kinds of spaces to separate words to change the appearance of source code, to reduce the number of Forth words, and to simplify and speed up the internals of the Forth system. He could use color tokens to inform the compiler what to do with the following words and was able to make the compiler faster than it would be if he had used traditional techniques. He had fewer words in the dictionary to search during compilation and did not have to search the dictionary for numbers or convert them from string to binary representation at compile time.

Since a different part of the brain is used in the perception of visual differences like color than syntactic differences in a stream of symbols the approach offered the potential of changing the focus of the user in reading, writing, editing and understanding source code. The idea was seen as very controversial by many people many of whom would complain that they were color blind and would not be able to use his system. He would always reply that they could display the effect of his tokens on source text in a way that would make them visually distinctive to them by using layout on the page, different fonts, different font attributes or whatever they could see. He would always remind these people that in this context "color" means visually distinctive.

It seems that the idea of merging Forth source code with HTML tags, which include color tags, to make sources more clear, more attractive, and in a form that has become a defacto standard for documentation has a lot in common with what Chuck is doing but is not seen as a controversial idea. Perhaps some of this is that Chuck does not feel compelled to follow standards based on what he was doing twenty years ago and has always shown a tendency to try new things and to experiment with new ideas. Perhaps some of this is that when Chuck says that these new techniques allow him to do things that are simpler, faster and more clear that people infer that this means that what they are doing is more complex, slower, and less clear, and find the idea controversial.

The details of the internals of his Color Forth have changed and continue to evolve. He began by simply using text in his OKAD character set packed into memory words for improved access by his word addressing hardware designs. He went into some detail in his 1999 Fireside chat to the Silicon Valley chapter of the Forth Interest Group about the internals of his Color Forth. In July of 2000 he detailed his most recent changes to the design including Huffman encoded character packaging in addition to the color change tokens in his source code in his Color Forth 2000 presentation to SVFIG.

Chuck's software ideas were so closely matched to the hardware that he has designed, including the UltraTechnology F21, that I decided to adapt some of his ideas into a project I had been working on to embed a compiler into the GUI demonstration software that I had done in the  F21 in a mouse demo. I have chosen an odd collection of ideas for  aha which brings me to the design tradeoff details.

In this phase of the design I have not decided exactly how the editor should look to a user. In this system one could have multiple editors that present the source code with different looks. It could look like traditional Forth with certain tokens being displayed as tradition Forth words, it could look like Chuck's  Color Forth or whatever else I decide it should look like.

aha Implementation Details

The design goals of compact source representation and fast compilation made me reject ANS Forth mandated 20 bit characters  (*2) and the byte oriented ASCII string representation of source that I have been using for target compilation. Instead I selected a representation using several types of records to maximize compilation speed and minimize storage costs. There is no requirement here that the source code be in a format that you can process with the editor that you are already using because you are not already using an editor on this machine. The source will not have to be in a format understood by Wordpad or Emacs because those things don't exist in this environment. Utility programs can translate and import or export source code in ASCII byte oriented string format or in HTML or other formats.

Sourceless programming has the minimal source, none. But it has the drawback that it cannot be decompiled unambiguously even in this architecture. An ideal representation must have clear word names, clear high level constructs and comments. Sourceless programming was rejected as the total solution but the technique was borrowed and used as part of the aha design.

In aha the source code will be created and packaged by the editor. It will present and process nicely formatted clear source code. It will make compiler optimization decisions at edit time and enforce certain things that I consider good style in the process. It will process the source code in records and package it into one of several types of record containers.

Much of the source can compile into unambiguous packed Forth opcodes. For this code the binary opcodes are an excellent internal representation of source. The machine has Forth opcodes that directly correspond to the most frequently used Forth words. For sequences where no ambiguity in decompilation exists they are a very compact representation of these Forth words, requiring only five bits per Forth word. The editor will determine which sequences can be safely represented in this way. It is a representation that is about six times more dense than an ASCII representation of the source words. In addition to being a very dense representation of some of the source code it also provides a mechanism that allows these records to simply be moved into the object code being compiled. Much of the code can be stored in this fashion without giving up anything. The editor can present this code to a user in symbolic format and the user need not know how the editor stores packs it into the source. Nothing is lost and this code can be converted from source format to part of executable objects at speeds from ten million to one hundred million Forth words per second by F21. The idea here is to use the five bit opcodes as tokens in packed opcode token records to represent some of the sybolic source code. It provides a high degree of source compression and blinding compilation speed for much of the  aha code.

A second type of record will be tokens to represent another list of other Forth words. This record type will distinguish these tokens from opcode tokenized sequences. This provides a dense representation of source code that would otherwise generate ambiguous opcode sequences at compile time and that could not be decompiled back into original source code. This provides a dense representation that can be processed at high speed at compile time. This provides a mechanism for code that requires localization and binding with addresses at compile time. These tokens will include the common Forth words that are not opcodes on the machine, ones that act as macros or compiler directives. These tokens will include the kind of tokens that Chuck has implemented in his Color Forth that allow the compiler to do things more quickly than traditional Forth compilers. It is the responsibility of an editor program to give them some visual representation. Some will be tranditional Forth immediate words. Some will be newer style compiler directives.

Traditional Forth compilers must parse source code byte by byte while  aha will not have to parse opcode tokens or functional tokens at all. The opcode tokens need only be moved by the compiler and the functional tokens can operate with a simple shift out and indexed jump table mechanism. The dense representation of these tokens, the count fields, and the simple execution mechanism will make  aha compile faster than systems that must parse source code for every word and match every word to a dictionary at compile time.

The use of records allows numbers to be represented in binary form with a token to represent the number base in which they should be displayed in the editor. The traditional Forth compilation and interpretation technique of first parsing a number in string format in the source then searching the dictionary for it and upon failure to find it as a defined word attempt to convert it from string format to binary format at compile time is thus eliminated. The edit of a number field is done at edit time and certain errors can be caught at edit time instead of slipping through to compile or run time.

A third type of record will be packed counted strings. These records will contain the names of defined words, comments, and string data. Because they will use a counted and packed representation the string data they will be faster to process than a simple sequential character string representation. The compiler will be able to jump over comment records and pick up names or hashed representations of names more efficiently with the word addressing CPU instructions. The name dictionary can be built into the source code at edit time. Defined names have Code Field Address fields which are initialized to zero at edit time.

A fourth type of record is the defined name string pointer record. These records represent defined words in one cell or less and point to the CFA field in the dictionary. At compile time the CFA field is set and the compiler need only follow the CFA pointer to CFA, combine it with a subroutine call opcode and compile it.

A fifth type of record will be binary data records. These records can contain arbitrary binary data including bitmaps and run length compressed bitmaps. The record header can distinguish run length compressed representation which will allow faster processing of the data because fewer memory operations will be involved and memory operations are much slower than opcodes on this machine.

The first component to be constructed will be the compiler. The compiler will read source records and distinguish and process each record type and Forth word properly. The opcode tokens need not be compiled in a tradional sense at compile time, the functional tokes need not be parsed as strings or looked up in a dictionary only branching opcodes need be compiled and only defined words need be looked up in the dictionary to be compiled. The dictionary does not have to be built at compile time. All of these things simplify the  aha compiler.

The compiler will not need to write source code records but it, or another utility program, could compress the source data further after, or in the processor of, the complete compilation of executable code. The compiler will first be created by a Machine Forth target compiler on another machine. It can then process source code, including its own source to produce new executable object code. The second component is the source code to be processed by the compiler to build the component programs in the  aha system.

The third component, the editor/optimizing-pre-compiler will need to be able to read and write source code records. It must include the functions of the Machine Forth compiler and be able to recognize and convert opcode and functional tokens. They are presented to the user in symbolic source form. Defined words have names and comments in packed counted string records. Controled changes to the symbolic source generate the changes to the record database, certain compiler optimizations and error processing can be done at edit time.

The fourth component is an source-code-browser/debugger/editor/optimizing-compiler. It will need to be able to read and write source code records, generate executable object code using the compiler, and link the source to executable objects. The source records could be compressed further since they would not have to contain redundant copies of the opcode tokenized records that were moved into the compiled executable object code. Only a few new links between source and object code are needed to support source level debugging.

Compiled programs will be able to use standard and custom source record formats to store information. The system will not be designed to prevent a user from modifying code that might result in a system crash. However any source information and compiled code can be protected by removing records or restricting access by the sourc-code-browswer. For some applications privaleged access could easily be controled by the tools.

Should modifying and recompiling component source code cause a system failure the system can be rebooted manually or in some cases could be rebooted by a watchdog interupt driven routine. Since the original unmodified source code and compiler can be loaded from ROM or FLASH in a fraction of a second it should not represent a serious problem. If modified code contains new bugs after an edit and is saved back to FLASH without testing locating the bug then the system may fail upon an attempted reboot and would require the replacement of a back up copy of the ROM or FLASH with the original system or from a reduntant copy on the device if possible.

Implementation Progress

As of 11/23/00 this design document, a few dcode snippets, and the F21 machine code compiler that will become part of the editor/optimizing-pre-compiler are the only parts of  aha implementation that have been completed.

I got Chuck's first impression of the design on 11/27/00. He said, "Aha sounds great. Your history of OKAD tells it well. I've considered a token providing a binary array, but haven't provided it yet." He also offered the comment quoted above about the ambiguity of decompile.

As of 11/27/00 I have designed the compiler code to process opcode token and binary data token records, compressed opcode token and binary token records, and functional token records. I am working on the design of the code that will manage the dictionary for the packed counted string records. I wrote some test code to compile opcode tokenized records, binary data records, and run length compressed opcode and binary records. This took only a few lines of code.

I have decided that  aha could be inmplemented only with packed counted string and string pointer records. Such an implementation would be close to a traditional simple byte oriented ASCII string representation of source. However a linked dictionary of defined names created at edit time can be used to compress the representation of these names to a pointer to the packed counted string record/CFA record in the name dictionary in the source code. Only one copy of each string will exist in  aha and subsequent occurances of defined words are stored as string pointer tokens.

The branching instructions operate efficiently to a 16K word home page in DRAM or SRAM (on F21d) and within the current 16K word page. Branches outside of this range require a longer macro sequence. The use of a twenty bit word pointer would support images up to a megaword in the compiler. If the number of defined words is given a specific limit an indexed address table could be used to represent defined words in as little as ten bit pointers. ASCII representation of source requires several characters and a space for the typical word so representing a Forth source name as a single cell or less is a more compact representation of the source, allows more error checking at edit time, and does not require a dictionary lookup at compile time.

The CFA in each record of the name dictionary in the source code are initialized to zero. The compiler sets the address in the CFA field when the name is defined and all references to the word appear as pointers to the CFA field. The compilation of a named word only requires following the string pointer token to the CFA of the named word then fetching it and compiling it to the appropriate form of subroutine call or jump instruction. No dictionary search is needed to interpret or compile a defined word in this represetnation. Inlining sequences could be supported as an option to the default function of simple compile CFA to subroutine call instruction. The dictionary exists at compile time and need only be instantiated with the compile address of each defined word at compile time.

Dictionary searches occur only at edit time when words being defined are stored in dictionary records or when byte oriented ASCII source code is tokenized by an application program. At the pre-process time, or edit time, words entered by the user or words processed from a file are parsed and tokenized as either opcode tokens, function tokens, defined word tokens, data tokens, or comments. It could be translated back to ASCII format with no loss. In  aha no dictionary searchs are needed at compile time, opcode tokens are simply moved, function tokens are invoked from a table, words being defined have their CFA set, and defined word pointer tokens are used to simply fetch a CFA and compile a subroutine call.

page under construction
Construction in progress 01/14/01.

On-site References

Introducing Aha
UltraTechnology Homepage
People at UltraTechnology Chuck Moore
Chuck Moore (owner of Computer Cowboys)
Fireside Chat 2000  Chuck Moore to SVFIG 11/11/00
Fireside Chat 1999  Chuck Moore to SVFIG 12/18/99 Color Forth (posted 11/2/00)
Dispelling the User Illusion w/ Color Forth code examples. Chuck Moore to SVFig on 5/22/99
1x Forth  Chuck Moore 4/13/99
Chuck Moore interviewed in his home 6/6/93
Fireside Chat 1998  Charles Moore 11/30/98 (first 50 min)
F21 in a mouse GUI demo of desktop with application in 600 words of code. 3/8/00
UltraTechnology site dated index with info on and pictures of F21d tests.
Streaming Video Theater
Forth -- the LEGO of Programming Languages 1/19/95 11/00
OKAD reference page 9/3/00
More on Forth Engines Volume 16, 1992 -  OKAD articles  Charles Moore, Dr. C. H. Ting, 2/25/99
Color Forth Update 9/16/97
Color Forth  Charles Moore 7/27/97
Fireside Chat 1996  Charles Moore 11/16/96
Life Beyond MuP21  Charles Moore to the SVFIG 5/27/95
MuP21 a High Performance MISC Processor Dr. C.H. Ting and Charles Moore 3/17/95
Chuck Moore to SVFIG on 4/23/93.
Fireside Chat 1993 Charles Moore to SVFIG 11/93
Forth - a Language for Interactive Computing  Charles Moore 1970 HTML (first Forth document), posted 1995
  PDF version (formatted like original typed paper) posted 6/2000
  Microsoft Word version (formatted like original typed paper) posted 1995
Parallel Forth - the new approach  OCCAM and Forth-Linda, Dr. M. Montvelishsky, FD 1993
F21 and F*F :  F21 and Parallelizing Forth, J. Fox, Dr. M. Montvelishsky, FORML 1993
Forth (Thoughtful Programming) Jeff Fox 12/99
Low Fat Computing Jeff Fox 12/6/98
ANSI Forth is ANTI Forth Jeff Fox 2/28/99,  Chuck Moore 7/26/98, 3/5/99
Distributed Shared Memory in Forth  Parallel Forth, J. Fox, 1995
Forth-Linda  Parallelizing Forth, J. Fox,  FORML 1991
Forth Stamp Jeff Fox 9/2/00
Forth Chip Reference Page Jeff Fox 8/2000
F21 Analog I/O Coprocessor
F21 Network I/O Coprocessor
F21 Network and Robotics Examples Topology diagrams and Robotics Examples 7/9/96
Forth Meta Compilation J. Fox 8/21/97
Forth Meets Laws of Form 1994 J. Fox
html guide to  OK version 1.01 Source code to  OK ver 1.01 for the MuP21 4/18/95
P21Forth 1.02 User's Manual in hypertext and Word for Windows formats 4/16/95

Off-site References

*1 Merriam-Webster
*2 ANSI Forth X3J14  E.2.2
Forth Interest Group

UltraTechnology Homepage