Forth Meta Compilation

Table of Contents
Introduction
What is a Forth compiler?
Forth Compiler Basics.
What makes one Forth compiler more complex than another?
What makes one Forth meta compiler more complex than another?
Chuck Moore's cmForth and its meta compiler
A minimal meta compiler?
An ANS Forth meta compiler?

Introduction

The first step in using the computer language Forth is using the compiler word : to create user defined words. The second step is often learning to define words that define words. The third step is often learning to write a new Forth in Forth.

The simplicity of the Forth language makes it easy to not only extend the language into a tongue that is perfectly suited to a particular problem but rewrite the language from the ground up as you see fit. It is a language the encourages experimentation with problems, thought, and the language itself.

The language was first developed by Charles Moore around 1968 by his recollection. The language has changed over the years as one can see if one compares the first paper published on Forth FORTH - A Language for Interactive Computing from 1970 with modern Forths.

The Forth Interest Group is a non-profit organization to support the Forth programming language and Forth community. They offer a number of public domain Forth systems and some with meta compilers. However I was not aware of any web pages that attempt to explain Forth meta compilation so I decided to create this one.

I had used Forth meta compilers for years without needing to understand what they were really doing. All I needed to know whas that I could simply edit the source code to these Forth systems and generate a whole new Forth system. I only began to understand what a meta compiler really does when I wrote one. I gave a couple of talks to FIG about these compilers and put some information about them at my web site but previously I had only some text documents here about meta compilation. I began by creating html versions of a couple of old documents. A Simple Meta Compiler and A One Word Meta Compiler were my first attempts to explain meta compilation.

This document will provide some prerequisite information and try to give an overview of the subject. Hopefully it will demystify the subject for some people.

Definition of Terms

There are people who prefer to use a wider defintion for meta compiler than I. I prefer to distinguish between target compilers and meta compilers while some other people prefer to use the term meta compiler to refer to both.

The normal operation of Forth is to act as both an interpretter and a compiler. It is a command interpretter because it can accept commands, look them up in a dictionary, and execute them by name. It can also act as a compiler by creating new executable programs or new named commands. To be able to use the language one only needs to understand a few concepts. One must first understand that a Forth virtual machine uses stacks, and that it has two of them. There is one stack that holds temporary data and one stack that holds program return addresses. There language is defined by a set of named operations on those stacks and some named operations to create new named operations.

The user of Forth often does not need to understand the inner details of their compiler. If they wish to modify the inner workings of their Forth or meta compile a new and different Forth, or write their own Forth from scratch they will need to understand some of the inner details of their Forth system.

The normal Forth system contains layers of code starting with code that implements the Forth virtual machine. The next layer of code usually supports the name and code dictionaries and a minimal compiler and interpreter. The following layers usually contain additional functions. This is the way the ANS Forth standard is organized. There is a small set of words called the CORE that are the minimal set needed to define a system that conforms to the standard, and there are additional sets of standard Forth defintions for standard extensions to the basic core of the language.

This type of design can simplify the compilation of a new Forth system because once you have a system that supports the CORE wordset it can compile code. So all you need to do is compile the core of a new system and it can then compile extensions to its core itself. This is the normal operation of Forth.

In the normal Forth system one compiles application code on top of the Forth system itself using its compiler. The application becomes an extension of Forth. The application itself becomes a named command in the new language. The programmers developing code simply saves the system from time to time. It will include the new words they have added. There is no clear distinction between the Forth system, its compiler, and the application.

Figure 1.

An Application Combined with a Normal Forth System

Application code
Forth extensions
Forth compiler
Forth virtual machine

In the traditional Forth system there is usually the option to turnkey the application so that a sealed application will boot up and go directly to the application code. It may or may not include the Forth compiler or command interpreter as part of the application.

Some Forth systems offer the ability to remove the names or headers from the words in the system when sealing it up as a turnkey application. Other Forth systems offer the ability to compile a stand alone turnkey application that does not include any of the Forth compiler code itself. I like to use the term target compiler for this type of compilation.

Figure 2.

A Forth Target Compiler and Application

Program 1: Application code

Program 2: Forth extensions
Forth compiler
Forth virtual machine

A target compiler produces an application program that is separate from the compiler. In Forth we call this a target compiler, but it the way most compilers work in other languages.

If the target application code contains a Forth system then it is a special class of target compiler. When a Forth compiles a new Forth we call it a meta compiler.

Meta compilers represent a more complex problem than target compilers because a target application will just be made up of new words while a target Forth system will be made up of words of a new set of Forth words. A meta compiler will have to deal with two sets of Forth words at the same time.

There are many different models used to implement Forth such as Indirect Threaded, Direct Threaded, Subroutine Threaded, Native Code, Tokenized, and Bit Threaded. There are many different machines and operating systems and target environments. And of course there are many different Forth meta compilers.

The target (for the target or meta compiler) may or may not use the same Forth model, and may or may not be for the same machine or OS. This is the main difference between a normal Forth compiler and a Forth meta compiler. Also a meta compiler must deal with two Forths at once.

The terms that I use in this document are:

Forth Compiler compiles application code into a Forth system.
Target Compiler compiles a separate stand alone application.
Meta Compiler compiles a separate new Forth system.
Clone Compiler compiles a separate new Forth system for the same model and same machine.

From these definitions it is clear that all Clone Compilers are Meta Compilers, and that all Meta Compilers are Target Compilers.

Forth Compiler Basics

Before one can understand a Forth meta compiler one must first understand a Forth compiler. The first thing one must understand about this compiler is that one uses : and ; to use it. The Forth language is based on words. These words have a name and some executable code. New words are created by the compiler by compiling source code.

If we have a word called X that does something and another word called Y that does something else we can compile a new word, call it Z, that does X and then Y by saying


: Z  X Y ;

The : starts compiling and takes the next name as the name of the new word. Then the words X and Y get compiled into the defintion. Finally ; ends the word, stops compiling, and switches back to interpretation.

Most Forth words do different things whey they appear inside of a definition or outside of a defintion. That is when they appear between : and ; they get compiled and if they don't appear between those words they get executed.

We can define a word that prints out 123 in may ways. We could say


: 123.  ." 123" ;

In a Forth system this defines a new word named 123. that will print out the string 123 if executed. If you type 123. in such a system it will respond by executing that defintion and printing out 123. However if this word appears inside of another definition then the word will be compiled into the new word. Instead of printing out the 123 at the time it is encountered it would add that functionality to the word being defined and one of the things it will do when it is executed is to print out 123.

So now if we say

: DEMO  123. 123. 123. ;

We will not see 123 being printed out while DEMO is being defined. We would however see it printed out three times if the word DEMO is executed.

There are some words in Forth however that don't get compiled inside of a Forth definition. We call these words IMMEDIATE. When wcompiling these words don't get compiled, they get executed immediately instead. We have already seen one of these words even though we have only seen two words so far. The ; above is an IMMEDIATE word because unlike our 123. it must do something besides simply compiling one definition into a new word, it will also stop the compilation.

Ok, so we have established that when compiling new words a Forth compiler compiles most words and executes IMMEDIATE words.

Lets look at those two things in more detail. How does a Forth compiler actually compile most of those words?

How a Forth compiler actually compiles a normal Forth word depends on the type of compiler. If it is threaded Forth (Direct, Indirect, or Bit threaded) it compiles an execution token that is normally nothing more than the execution address of the Forth word. In threaded Forth most words are mostly just lists of addresses of other words.

In a tokenized Forth the tokens that get compiled may be smaller than addresses and may be indexes into a table of functions, or predefined functions. The most common tokenized Forth is the boot language on SUN workstations. Open Firmware has also been adopted as the language for implementing portable BIOS code for cards on several new busses. SPARC, PowerPC, and Intel x86 microprocessors will be booting up in Forth on many systems being developed today. For more information look into FirmWorks. Sun offers the Open Firmware Home Page. These tokenized Forths compile 8 bit tokens.

In subroutine threaded or native code Forth compiles the compiler compiles words by compiling a subroutine call or some inlines machine code rather than an address or a token.

Some Forth compilers also implement some optimizations on the compiled code. Some of these optimizations may be done on any Forth system while others take advantage of things are that specific to a particular model or processor.

To understand a Forth compiler one must also understand what sorts of things those IMMEDIATE words do. Lets examine the word IF. Lets begin with a simple example.


: DEMO IF X THEN Y ;

In that example the word DEMO is defined. The defintion begins with the word IF. The meaning of the IF is that the new word should execute the words that follow the IF only if the top of the stack is not zero, otherwise it execution should continue after the THEN. To accomplish this the word IF is IMMEDIATE and will execute when it is encountered inside of the defintion. When it executes it compiles code to do a conditional branch. As discussed before the way it actually compiles this code depends on the Forth model. The code that will be compiled by the IF is a piece of runtime code sometimes called 0BRANCH. It could be given almost any name, but it is a word that tests the top of the stack for zero and branches if the top of stack is zero. It will branch past the compiled code that comes after it if the top of stack is zero or it will continue on to the compiled code that comes after it if the top of stack is not zero. At the time that the IF executes the address it will branch to is not known so it actually compiles an unresolved forward branch. When the THEN is encountered in the source code it is an IMMEDIATE word and resoves the address in the IF branch with the proper destination. I n this case it would resolve the conditional jump before the X to after the X and before the Y.

It is clear that IF and THEN are not simply compiled like the X and Y they are executed immediately to compile some primitive operation and do some other things.

The Forth compiler will construct two dictionaries, one for names and one for code. The name dictionary is often a linked list, but it could also use a hash table or some other mechanism for searching the list of names. The compiler or interpreter search through the list of names to get the execution token for a Forth word. The name and code dictionaries may be mixed together, or they may be in separate locations in memory.

Forth provides mechanisms for managing more than one list or words. These are called word lists in ANS Forth and vocabularies in older Forths. This provides a way for a single name to exist in several different places and for each version to have different code attached to it. The programmer can choose the meaning for the word by specifying which word list to search first to find the version of a word that they want.

Complexity in Forth Systems

Forth system differ greatly in their design and implementation. There is a wide range in both size and complexity. Some are very small. A native code port of one eForth for F21 fit the code dictionary into 1K word of memory. Chuck's latest Forth needs only 1K bytesThere are a number of factors that influence the complexity of a Forth system and the meta compiler that will be needed to compile it.

Although size is generally related to complexity they are not the same thing. A larger Forth system has more potential for complexity because there are more places for complexity to hide but it is possible that a small Forth system could be very complex and that a large Forth system could be very simple in its structure.

Generally threaded Forths have a simpler implementation model than subroutine threaded or native code Forths. In threaded Forths most of the definitions are little more than address lists. In these Forths there is often a clear and simple relationship between the source code description and the compiled lists of addresses. For this reason decompilers for these Forths are quite simple. For native code compilers or compilers with optimizations there may not be such a clear and simple relationship between source and object code.

Feature rich Forths tend to be more complex than Forths that have fewer features. But just as with size the implementation of those features is more important than the number of features. A system can be large and have many features but if those features are created with only a few simple constructs the overall complexity may be less than a smaller Forth with fewer features but where more variety of implementation techniques are used to build the features that are there.

Some criteria for judging the complexity of a Forth:

How many words are implemented in CODE?

Even in threaded Forths not all of the words will be implemented as simple lists of high level addresses. Some of the words must be implemented in assembly CODE. Since one must understand some assembly language to understand these words they add to the complexity of a system. The exception is Forth systems that are written in some other high level language and where one uses this high level language to define the primitive words in Forth rather than using an assembler.

How complex is the assembler in this implementation?

Since we have established that some of a Forth system will be in assembler it is clear that the complexity of that Forth will be related to the complexity of the assembler.

The minimal assembler is , that is the Forth word COMMA stores a value into memory. The minimal assembler is no assembler at all, simply store the numbers for the target opcodes into memory, machine code.

The number of opcodes, the number of addressing modes, the depth of the pipelines, and the use of caches all contribute to the overall complexity of the assembler portion of a Forth system if it uses an assembler. The real issue is how complex is the target processor and the target implementation. If the implementation only has a few words in assembler and these words are short one may understand the code without having to understand everything about the target machine. If an implementation is well tuned to a particular machine then an understanding of the code and the design of the implementation will require a more in depth knowledge of the target machine.

What flow control structures are used in this implementation?
How many IMMEDIATE words are there in this implementation?

There are a number of flow control structures available in Forth such as IF, BEGIN, and DO structures. ANS makes it possible to combine these in ways that are not used or are not legal in some Forths.

When Chuck designed cmForth he knew that the target hardware supported a count down loop in hardware and so he implemented a FOR NEXT construct. This was a simpler approach than the traditional DO LOOP (?DO +LOOP) approach and it mapped better to the hardware on the Novix.

As noted earlier most words in Forth are not IMMEDIATE and are simply compiled when encountered inside of a definition. All words that must have some immediate action when compiling must be given special definitions that specify what they will do when executed at compile time. For instance the word IF will compile a primitive sometimes called 0BRANCH and leave the address of the unresolved forward reference to be resolved later by some other word. It may also do other things like implement compiler security or optimizations.

The kernel of Forth systems that implement things up to the working CORE compiler is the minimal amount of code that a meta compiler must compiler in order to have a working target Forth system. Once you have such a system it may be expanded to any level of complexity with its own compiler. The meta compiler may therefore only have to deal with the source for a simple kernel or a more complex kernel and/or a target application.

Bill Muench designed the original eForth 1.0 for simplicity and portability. It had only 30 words written in assembler and used only BEGIN_UNTIL BEGIN_WHILE_REPEAT IF_ELSE_THEN and FOR_NEXT in its source. The second release reduced the number of code words to 28 and removed the FOR_NEXT constructs from the source code and replaced them with BEGIN constructs. I was pleased to learn this when I was designing a meta compiler to generate a version of eForth for a new target. It meant that there were fewer IMMEDIATE words that were needed in the meta compiler. The meta compiler no longer needed to compile FOR_NEXT constructs.

Does this implementation of Forth support wordlists?

The real question is really how and where does this implementation put wordlists. A Forth that has all the code for its CORE organized in one wordlist requires a less complex meta compiler to compile it because the meta compiler does not need to deal with searching multiple target wordlists. A version that also uses an assembler wordlist is slightly more complex. A version of Forth might require eight or more wordlists or it might need only one to compile its CORE.

Does this implementation of Forth use CREATE DOES> in its source?

CREATE DOES> provide a powerful mechanism to define new defining words. It does a little complexity to the design and will further compilicate meta compiling. The defining words that use CREATE DOES> must be executed. Meta compiling will complicated by the use of CREATE DOES> defining words because these defining words in many cases must run on the host system. Some of what they do must happen on the host compiler and some of must happen on the target system.

Does this implementation support multitasking?

Multitasking is not covered in ANS Forth. If a Forth supports multitasking this will add some complexity. The most simple round robin cooperative multitaskers add only a small amount of complexity. Certain words will have to use user variables instead of normal variables to support multitasking. The systems that use the OS memory manager, or memory management hardware in the system to support multitasking may have very complex multitasking.

Does this implementation support code optimization?

Code optimizations in the compiler can of course complicate the design of a compiler to whatever degree one wishes to pursue. Traditional Forth compilers that generate simple address lists may employ no optimizations. Many users prefer to use a simple and straightforward compiler rather than an optimizing compiler because they value the benefits of the simplicity over the speed offered by optimizers.

The inner loop of a simple Forth compiler simply checks if a word is in the dictionary and if it is it checks if it is IMMEDIATE. If it is IMMEDIATE it executes it and if it is not then it simply compiles a Code Field Address. If the word is not in the dictionary it either converts it into a number or generates an error.

Complexity in Forth Meta Compilers

Forth compilers may be simple or complex, and the same is true for Forth meta compilers. Most of the same things that measure the complexity of a Forth system apply to the meta compiler that will compile the source code to that Forth. A Forth meta compiler can be almost trivial or it can be very complex.

Usually a Forth meta compiler is more complex than a normal Forth compiler. It will have two sets of dictionaries to deal with, the host name and code dictionaries and the target name and code dictionaries. Sometimes there may be no target name dictionary. Sometimes there are other dictionaries and wordlists involved as well.

There are a number of other reasons for complexity that don't apply to normal Forth compiling. When compiling a Forth for a target system there are some problems that just don't appear when you are compiling executable code on top of a running Forth. (see Fig. 1)

Do the host and target machines have the same width bus?

When the host machine has a bus as large as or larger than the target machine it may generate target code using single cells on the host machine. This is simpler than generating code for a target machine that has a wider bus than host machine. In the case where a host cell cannot contain as much as a target cell the host meta compiler will have to use doubles to contain target cells. In fact this would apply to a 16 bit Forth for the 80x86 generating a new 32 bit Forth for the same 80x86.

Do the host and target machines use the same representation on the bus?

I first ran into this when porting code from one microprocessor to another. One machine may be big endian and the other machine may be little endian. I ran into a problem where one machine had a postive bus and the other machine had a negative bus. This means that 5V on the bus on one machine represented a logical 1 while on the other machine a 5V on the bus represented a logical 0. To port the data section of the application from one ROM to another all the data had to XORed with FF.

When target or meta compiling if the host and target machines don't use the same representation on the bus then some logical operation is needed when moving data from the host to the target. When moving from a positive to a negative bus machine or visaversa one must XOR with all 1s. (FF XOR all bytes)

Chuck's MISC computers use a third representation on the bus. In these machines even and odd bits use alternate positive and negative bus logic. This applies to both the data and address busses on these machines. As a result when target or meta compiling for these machines from a PC one must XOR both data and addresses with AAAAA.

Having to change the representation of data or addresses is not a problem that appears in normal Forth compiling nor does it apply to target or meta compiling when the target machine uses the same physical representation for logic on the bus. This is not a problem for clone compilers.

Will the host assembler generate machine code for the target?

If the host and target machines have the same or code compatible CPU then the host assembler can be used to genterate any target code words. If the target machine needs a different assembler then this adds to the complexity of the meta or target compiler. It is one of the reasons why a clone compiler is simpler than a compiler for a different target system.

Will the target code be executable on the host compiler?

If the host and target machines have the same or code compatible CPU then the target compiler may be able to execute the code generated for the target. This may reduce the amount of code that would be required in the meta compiler if it can execute target words almost as if they were just part of the host compiler. This is the other reason that a clone compiler may be only slightly more complex than a normal Forth compiler.

A meta compiler that includes a target simulator will be able to simulate execution of target words including words written in the target assembler. This will add some degree of complexity to the meta compiler.

Some target or meta compilers use an umbilical connection between two machines. These are sometimes called tethered systems. In these configurations the host machine may request execution of target code on the target machine. The system may act as if a normal Forth compiler was running on the target system with full freedom to interpret and incrementally compile target code. In fact their may be no Forth compiler on the target system and the compiler can reside on the host system but appear to user as if it were running on the target system.

Will the target Forth be created at the address where it runs?

Sometimes when one must generate code for a new Forth system the addreses used by the target Forth and the host Forth are the same or for some other reason it is impossible to generate the target code at the same address at which it will run. When this is the case it adds some complexity to the design of the compiler. In the simple case it would require using some offset with the addresses being compiled.

Even if the target is a different machine will the meta compiler also produce a second version of the target system code that can be executed on the host meta compiler?

In some meta compilers code is generated in two places when the target code is compiled. One copy of the code goes into the target and perhaps may never be executable on the host computer. But a second copy of target code is generated somewhere in the host system. This code can be execute on the host compiler making it possible to freely execute target code while meta compiling! This feature may not add much complexity to the meta compiler and it may reduce the complexity of the source code for the target system. Without this feature on systems that target for a different machine one may never execute target code and target source code must be careful to specify when it is compiling for the target and when it is compiling or executing host code. Normal Forth source code that freely mixes compilation and execution of new words will not run on some target compilers without this feature. This will not apply to clone compilers that can execute target words as if they were just part of a normal Forth.

Does this implementation of Forth use CREATE DOES> in its source?

The use of CREATE DOES> defining words will compilicate meta compiling. The defining words that use CREATE DOES> must be executed and in most cases must run on the host system. Some of what they do must happen on the host compiler and some of must happen on the target system. The problems are related to the last few issues regarding host execution of words defined in the target source.

How many targets will the meta compiler support?

If a meta compiler will generate code for a different target machine it may generate code for multiple targets. The ability to support multiple targets will add some complexity compared to single target or even clone compilers.

How many optimizations will the meta compiler support?

Optimization if it is done will add complexity. If the target machine is different than the host machine or if the host compiler does not support optimizations needed in the target this adds complexity to the target compiler. If multiple targets are supported they may each require their own set of optimizations.

Does the system depart from the traditional way of meta compiling?

Rob Chapmans' Timbre system has an interesting approach to generic metacompilation. To quote, "Timbre is a translation engine which allows free-form phrase translation. It uses rules and rule sets to determine how to transform input into output. It has been used for language translation, target compiling, meta-compiling, code verification, binary viewing, code documentation and RTF parsers and reformatters. Timbre was meant to capture the essence of translation allowing for very simple yet powerful ways of translating anything to anything. The example included with the distribution is for source to source translation with peephole optimization."

cmForth from Chuck Moore

Chuck Moore wrote cmForth for the NOVIX computer in 1985. This was the first hardware implementation based on the Forth virtual machine. The NOVIX hardware was a 16 bit machine with three memory busses. By using a separate buses for the data stack, the return stack, and main memory it could perform operations on main memory on the data stack and on the return stack all in one cycle. With a clever compiler it could execute up to five Forth words in a single clock cycle. Chuck wrote a clever compiler called cmForth. It departed in a number of ways from tranditional Forths.

When the subject of cmForth came up in the comp.lang.forth newsgroup on the internet there was an excellent description of cmForth and its meta compiler by Marc De Groot. It is often mentioned in discussions of small Forth systems and small meta compilers.

cmForth generates optimized native code for the Novix. A version was also ported to the Harris RTX 2000. Instead of a compiler that distinguishes between normal and IMMEDIATE words cmForth uses a compiler and an interpretter wordlist. It uses FOR NEXTs instead of DO LOOPs to simplify things. You can meta compile it from source on the PC and it can meta compile itself from source from a few lines of code. The meta compiler is simple because cmForth is simple and because the target machine and the target Forth system use the same assembler and compiler words as the meta compiler. Actually cmForth is the assembler since it is really a native code compiler.

Dr. C. H. Ting's book Footsteps in an Empty Valley began his More on Forth Engines newsletter series with information on cmForth and Novix. It is one of his products at Offete Enterprises Inc. It contains about fifty pages about the Novix NC4000 and another hundred pages about cmForth.

From chapter 3 in Footsteps in an Empty Valley, the cmForth Operating System:

"There are many important innovations in cmForth worth of your special attention. First is the optimizing compiler which takes normal Forth source code and compiles very short and efficient NC4000 machine instructions. Next is the dual threaded vocabulary structure in which all the compiler directives and smart compiling/assembling words are linked together, separate from the regluar FORTH vocabulary. The interpreter searches only the FORTH vocabulary. The compiler first searches the COMPILER vocabulary and then the FORTH vocabulary. This searching method eliminates the necessity of immediate words. Lastly, a very simple linking method is used to compile words into a target dictionary, which can later be isolated from the main dictionary and burned into EPROM. These innovations in time will make significant impact in the Forth community, leading to better and more efficient Forth systems and programs."

A One Word Meta Compiler?

My first meta compilers were simple meta compilers written to generate versions of eForth for the simulated MuP20 and F20 processors. (later to evolve into MuP21 and F21.) They were simple not because I was clever, in fact some of the code is not clever. These were simple compilers because they just didn't need to be very complex because the eForth target didn't need much.

It gave a presentation on A Simple Meta Compiler to a local FIG chapter. This was a traditional but simple meta compiler. It used several wordlists on the host to hold the meta compiler, the target assembler, and the target wordlist (or actually host words with the target names that when executed compile target words into the target.) After I had a working target system P21Forth could now be made to compile a new copy of itself with a new meta compiler. I realized that this clone compiler could be really simple because everything was the simplest case. I only had to redesign a couple of the IMMEDIATE words in eForth so the eigth words that I needed to change all began with a literal containing the address of the word that they compiled.

I did another presentation for FIG about A One Word Meta Compiler and got some nice positive feedback. The heart of this clone compiler is a single word called META. This word transforms the normal host compiler into a meta compiler by modifying just a few of the host compiler IMMEDIATE words into meta compiling words.

We examined what the Forth word IF does earlier. In a normal Forth compiler this word is an IMMEDIATE word that executes during compiling and compiles code to execute the host word 0BRANCH (or its equivalent) into the host Forth system and leaves the address of the unresolved branch to be resolved later by an ELSE or a THEN.

In a meta compiler an IF must execute when meta compiling and compile into the target the code to execute the target 0BRANCH word and leave an address to be resolved later. In the case of this clone compiler the normal Forth IF can be transformed into a meta compiling IF by making one small change. It holds the address of the host 0BRANCH in a LITERAL it uses this to compile this into the host Forth. If it is pointed at a target Forth instead it will happily compile the host 0BRANCH into this target system. That would of course do no good since the target system couldn't execute it. But if the address of the target 0BRANCH is placed into the data field of the LITERAL at the start of the IF then the IF will compile the target 0BRANCH into the target system. It is as simple as that.

In this target system their are only 8 IMMEDIATE words that need to be transformed from normal compiling words into meta compiling words. These words are

:NONAME ; LITERAL IF UNTIL AGAIN ABORT" ."

All of these words begin with a LITERAL that contains the address of a host primitive word used in compiling so they can all be transformed into meta compiling words by the same word.

: META ( -- ) ' ' 2 CELLS + ! ; \ as in META 0BRANCH IF

This word contains two TICKs. In the example above the first TICK returns the address of the first 0BRANCH found, and this will be the one in the target system. The second TICK returns the address of the first IF found, and this will be the one in the host meta compiler. The phrase META 0BRANCH IF must appear in the target source code before the first IF is encountered in the target code and after the target 0BRANCH is defined.

METAis first applied to the word ; right after the target primitive EXIT used by ; is defined. The word ; appears in the source code this Forth 138 times before the target ; is defined. This means that the first 138 times that ; appears in the target source code the target compiling ; will be used. After the target ; is defined it will be found first and it will be execute and it will do exactly what a normal Forth ; does.

The second word to be transformed with META is :NONAME. :NONAME in this version of Forth is used in the definition of : so :NONAME is the word to be transformed rather than :. In this direct threaded Forth colon definitions begin with a call to DO: so this is the address stored in the LITERAL field at the start of :NONAME. The word META is used to store the target DO: into this word to transform it into a meta compiling word. : is used 137 times in P21Forth before it is defined in the target. The first 137 times : is encountered in the target source the meta compiling : is executed and after that the target : will be executed.

The third word to be transformed is LITERAL. This word compiles a primitive called lit. META lit LITERAL is used to transform the normal Forth word LITERAL into a meta compiling one by substituting the address of the target lit. The word LITERAL appears 25 times in the target source before it is defined in the target. Until then the transformed meta compiling version will be executed.

The fourth word to be transformed is IF and we have covered that one above. IF is used 57 times in P21Forth before it is defined in the source.

The only other words needing to be transformed are UNTIL AGAIN ABORT" and ." and these words are used 10, 1, 3, and 3 times repsectively before they are defined in the source.

In this clone compiler the meta compiler executes a combination of normal Forth compiler code, meta compiler code, and target code. The target code dictionary is built and executed from a different address than the target compiler so both can be run. The target name dictionary is built at a different address but is linked into the metacompiler wordlist and after the compilation is complete the link between the two name lists is cut. Since the name dictionarys are treated as a single list the differences between this clone compiler and a normal Forth compiler are minimal.

This meta compiler is little more than a single word defined with only a few simple words. The word META is defined on one line, and invoked in eight other lines of the target source. Except for cutting the target system loose this is the full source:


: META ( -- ) ' ' 2 CELLS + ! ; \ as in META 0branch IF

 META EXIT ;         \ ; used 138 times
                     \ before it is defined in target
 META do: :NONAME    \ : used 137 times
 META lit LITERAL    \ LITERAL used 25 times
 META 0branch IF     \ IF used 57 times
 META 0branch UNTIL  \ UNTIL used 10 times
 META else AGAIN     \ AGAIN used 1 time
 META abort" ABORT"  \ ABORT" used 3 times
 META d" ."          \ ." used 3 times

An ANS Forth Meta Compiler?

The ANS Forth Standard does not specify anything about metacompiling. So the terms I have been using are not official standard terms. Still I consider a Forth meta compiler to be a Forth application program. If a Forth program meets certain requirements it may be considered compliant with the ANS Forth standard. If a meta compiler is written as an ANS compliant program then I say you have an ANS Forth Meta Compiler.

One might also argue that an ANS Forth meta compiler is a meta compiler that compiles an ANS compliant Forth system. These two things are not mutually exclusive. I think there is some justification for that position. However I for one would never suggest that the term an ANS Forth meta compiler should refer to a meta compiler that could compile any ANS Forth for any target on any host ANS Forth machine.

UltraTechnology
Jeff Fox Ultra Technology Inc.
2512 10th Street Berkeley, CA 94710