A Simple Meta Compiler

Meta eForth DTC

For the last few years the Forth Interest Group has been distributing a highly portable Forth system called eForth. eForth was developed by Bill Muench as a Forth system with maximized portability through a minimized set of CODE words. I don't know if the E in eForth stands for Embedded, Educational, or Easy. eForth 1.0 was released in both Forth source and MASM source code. It was felt by many that MASM was nearly universally available, and that this format would be easily understood by people who might not be so comfortable with Forth source. It was also felt that meta compiling of Forth systems was what was preventing many programmers from ever understanding Forth.

Meta compiling is often regarded as an activity for black belt Forth programmers, but I am of the opinion that a meta compiler can be simpler than most Forth applications, and easily understood by Forth beginners.

The first Forth system that I brought up on one of my own computers some fifteen years ago was ported using a FIG assembly listing, and a ROM monitor, so I can understand the logic that an assembly listing of Forth could make sense. I was several years later that I first worked with Forth meta compilers. When I began working with eForth I developed a couple of new implementations of eForth using the MASM tools. The idea is that since eForth had only thirty CODE words these words could be coded for a new machine and using macros or data statements MASM can generate a new eForth. This idea works, and many versions of eForth have been created over the last few years, most of them using MASM. But in my own experience I found MASM to be a problem. Macros did not work the same in the three versions I had so I spent too much time trying to figure out how to get MASM to do fancy things. MASM needed lots of my disk space, and it was very slow.

When I upgraded to the then experimental eForth 2.0 I got it in Forth source only, but with Bill's meta compiler. The meta compiler did not come with a manual, but I was able to hack it from compiling 8086 object code to compiling MuP20 object code. It was much easier to tame than MASM, and instead of ten minute assembly times I was dealing with one and a half second compile times!

I metacompiled for Indirect Threaded, Direct Threaded, Subroutine Threaded, simulated MuP20, F20, MuP21, and F21 chips. But when I began to experiment with a subroutine threaded model with packed instruction primitives, and many code packing and inlining optimizations I found that I was fighting the tools. So I started writing my own meta compilers for eForth.

Recently I have been meta compiling eForth 2.4 for the MuP21 simulator and development system. I will use a simple version of this meta compiler as the example for this paper on meta compiling eForth. I have wanted to release a meta compiler for eForth that would run under eForth, and this one is now very close. Although this version is for FPC and some of it is specific to that Forth system.

I think that the most important thing in writing a meta compiler is to understand the system that you want to meta compile. The next most important thing is understanding the target machine. If you have a very large and complex Forth system it may require a large and complex meta compiler. s. I intend to concentrate entirely on the metacompiler itself here, and to the eForth model as it relates to metacompiling. I do not intend to cover the coding of the assembler that generates the CODE words, I will leave that up to the user who wants to metacompile a new eForth for their own new machine.

The eForth compiler is extrememly simple. Look up a word, execute it if it is immediate, or compile a token to represent it (usually an address), and if it is not in the Forth dictionary then it is either a number or an error.

An eForth metacompiler is almost as simple, but it is a little more complex because it is generating code that is not immediately executable. Often the code produced is for a different computer, and cannot be executed on the machine doing the metacompiling. The address in memory into which code is metacompiled is often not the same as the address where the code will run when it is executed.

What makes a normal Forth compiler work is that certain words are IMMEDIATE and will execute at compile time while other words are being compiled. Take this example:

Lets assume that WORD1 WORD2 and WORD3 are ordinary non-immediate words in this Forth. First : will execute and read one more word from the input stream to create a new Forth word named in this case MYDEMO. It puts the system into a compile state where non-immediate words get compiled. WORD1 is read from the input stream and its address or token is compiled. WORD2 is read from the input stream and is compiled. IF is read from the input stream, and when the dictionary is searched it will be found to be an IMMEDIATE word, so it will execute. IF in eForth will compile the word If which is a conditional jump if T is not equal to 0, and it will leave space for an address for the jump. It does not yet know this address, so it just leaves space, and leaves the address of the location of this jump address on the stack for later. WORD3 is read and compiled. Then the THEN is found, and it is another IMMEDIATE word. THEN goes back to the address on the stack and stores the current dictionary address there sothat the If code will branch to where the THEN is in the word. Finally the IMMEDIATE ; is found, and it compiles EXIT and returns the eForth system to the interpret state.

A meta compiler does something similar to produce identical object code. It is different because none of the words metacompiled can be executed while metacompiling. Somehow WORD1 needs to compile a reference to WORD1, and IF needs to do the same thing as IF in a normal Forth compiler.

The easiest way to get this to happen is to use the Forth wordlist search order to make it all happen. Remember there are going to be many versions of certain words. IF is a good example. There is the IF in Forth used in the normal Forth compiler, and possibly and IF in the assembler on the Forth system, and an IF in the meta compiler, and possibly and IF in the assembler for the TARGET eForth, and certainly an IF in the eForth source being meta compiled, and they are all different.

In the spirit of eForth I have tried to make this version of the meta compiler as and fairly portable. I have used only three vocabularies, and only two words to change the search order, FORTH and TARGET.

This is the key to how it all works. I use two search orders:

FORTH DEFS FORTH ROOT  the order set by FORTH  and
XASM DEFS FORTH ROOT   the order set by TARGET.
All of the IMMEDIATE words from eForth are given a defintion in the XASM vocabulary. These words perform the same function that they do in a normal forth compiler, except that they do it to a TARGET system ratherthan to their own system. The code to do this is almost exactly the same as the code for these functions in eForth.

The normal Forth compiler normally creates a header for every word compiled. This behaviour is modifed in my meta compiler so that whenever words are defined two headers are created. One header is the header for the word in the TARGET system, and the other is a header for a definition in the DEFS vocabulary. The words in the the DEFS vocabulary all have a similar code structure. They have the same header as a definition in the TARGET system, and they have code that will compile a reference to the code in the TARGET system into the TARGET system. These words in the DEFS vocabulary are also all IMMEDIATE words.

Here is the definition of T_: the word that is used defining several types of words in the meta compiler:

\ T_: WILL COMPILE a header into DEFS and TARGET, and
\     WHEN the DEFS version of a word compiled with T_: is RUN IT

: T_:       ( at runtime -- )
  FIRSTSLOT \ MAKE SURE START OF WORD in MuP21, a target align
  HERE      \ get HERE in the TARGET space
  CREATE    \ modified for TWIN headers, one in DEFS one in TARGET
  ,         \ COMPILE MuP21 address INTO FPC WORD AT PARAM FLD
  IMMEDIATE \ all of the words in DEFS are IMMEDIATE
  DOES> @   \ at run time get MuP21 address ( n ) from param
  T, ;      \ for direct threaded code model and compile it
So when WORD1 is defined in the eForth source code being read by the meta compiler two WORD1 words are created. One word is the actual WORD1 in the TARGET system, and cannot be executed. The other WORD1 is a word in the DEFS vocabulary that will compile a reference to the WORD1 in the TARGET into the TARGET. So when MYWORD is defined in the eForth source code the word WORD1 will appear. The compiler searches XASM first for the normally IMMEDIATE words in the eForth compiler, then it searches DEFS and finds the immediate word WORD1, and executes it. The WORD1 in DEFS will then compile a reference to the WORD1 in the TARGET into the TARGET definition of MYWORD.

In fact all of the words in the eForth source get compiled into IMMEDIATE words in the DEFS vocabulary. But since some of these words need to have immediate function to make the compiler work properly there words already have a defintion in the XASM vocabulary. TARGET sets the search order to XASM DEFS FORTH ROOT so that words like IF that require immediate execution during compilation will be found there before DEFS is searched.

This makes the metacompiler very simple. ALL words that it compiles create IMMEDIATE words in the DEFS vocabulary that do more compiling later if they are ever executed. And words that are normally immediate when compiling are found in XASM before DEFS is searched.

It is also handy to be able to switch back to Forth at times and make it the first vocabulary in the search order. When I finish the eForth resident version of the eForth meta compiler I expect that it will be even simpler than the FPC version. The complexity of FPC somewhat complicates the definition of some of the words in this metacompiler. Since the metacompiler makes TWIN headers for defintions the way FPC does headers is modified. Also some other compiler behaviour is modified that is FPC specific.

Here are most of the words in the XASM vocabulary that are not part of the MuP21 opcode assembler. Most of them should be very familiar.

ABORT" ." C" S" ,
Some words like CELLS appear in the eForth source both inside of defintions when the system is in compile mode, and in interpreted code also outside of definitions where it is used to compute parameters needed for compiling. CELLS might be used with ALLOT or it might be used inside of defintion. If CELLS gets executed outside of a defintion then there must be a CELLS in the XASM vocabulary. In fact since MuP21 is a word addressing machine CELLS is the same as 1 *. And for efficiency ALIGN, ALIGNED, CELLS, and CHARS are immediate NOOPs for MuP21 and F21.

Here are most of the words in the FORTH vocabulary that are not part of the MuP21 opcode assembler.


Jeff Fox