This html document is a transcript of a presentation that I did for the Silicon Valley Chapter of the Forth Interest Group back in December 2000 on the history of Forth and aha, a system that I had begun working on. I was told by some old time members of the group that although they had been coming to monthly SVFIG meetings for years that my presentation helped them connect the dots and understand more deeply many things that they had not understood over the years. I hope that this transcript will help other people who have not had the benefit of attending monthly SVFIG meetings to understand the evolution of the ideas in Forth.

Jeff Fox
June 2002

12/16/00 SVFIG
a History of Forth and aha

video of Jeff Fox on Forth and aha 12/00

All right, well I'd like to thank all of you for giving me the opportunity to talk about this in front of the group. it is a chance for me to practice explaining it and perhaps get some new ideas from some of you. I need to go into the history of this project a little so that you will understand what brought me to this point and what my goals and objectives were. Most of you saw the `F21 in a Mouse' demo that I did down here some time ago where I taken the 68hc05 out of an old mouse and replaced it with one of my chips (F21) then wrote a mouse driver, an event handler, a desktop, and an application, a simple crude application as a demonstration. I put it into the mouse and do demos with it from time to time. That environment let me to realize some of the limitations that I was faced with and some of the problems that I needed to solve to move that environment forward.

Figure 1. Forth time phases.

Design         Edit            Compile                 Run
                               (ICE)                   (ICE)
Problem        Ascii           Interpret               Execute
Analysis       Files/Blocks    Compile                 Interpret
                               Dictionary Build        Compile
                               Dictionary Search

Chuck Moore often talks about the concept of Forth having four different time spaces; the design time, the edit time, the compile time and the run time, and how each of these is more expensive. How you would like to move things from run time to compile time if you can and move things from compile time to edit time if you can and from edit time to design time if you can, rather than going the other way and waiting to do all these things in your application at run time. that was one of the things that influenced me and I came to the conclusion that I was kind of getting a little tired of the limitations of machineforth, that I had been doing this for ten years now and I was very happy with machineforth but I that there were certain limitations so I decided to spend some time in the design phase and really look at the problems. As Chuck says, solutions do not require large amounts of code they require small amounts of well thought out code. so I began to look at the problems that I was facing.

Now in this diagram (Figure 1.) I have tried to diagram the traditional view of what Forth is: the design phase, you have a problem, you analyze it, you try to figure out how you are going to deal with your problem and solve it. That involves all the problems that you have such as the complexities of hardware, the complexities of operating systems, the complexities of the Forth that you are using, and the complexity of the problem that you are trying to solve. Once you have done that you take your design and you move to the editing phase. incrementally you factor, in this phase you factor your problem into the most logical small groups so that it express the problem in the most eloquent and clear and effective way to solve that problem.

Then you move to the edit phase and you edit the code. Then from the edit phase you move to the compile phase and compile it. Sometimes in compile phase you have to go back to the edit phase and edit the code again and then compile it again and then run it. Then sometimes you have to go back to the edit phase again and then compile your code again and test it incrementally.

Figure 2. Forth evolution timeline.

Aha                00
colorforth         95
machineforth       92
OK                 87
ANSforth           86
cmforth            85
F83                83
FIGforth           70s
Commercial Forth   70s
Early Forth        68

Furthermore what we did in these phases was quite traditional. If you look at the history of Forth (Figure 2.) Chuck Moore did the first early Forths sometime around 1968 and then he wrote the first commercial Forths and Forth Inc. was formed in the early seventies and they wrote more Forths. Later in the seventies the Forth Interest Group popularized Forth and distributed FIG-Forth. Later F83, these were not official standards, in a sense, but in a sense they were standards.

Not like ANS Forth, ANS Forth was the first meta-Forth definition. The first definition of Forth designed to describe a wide range of implementations, and a wide range of machines. To not specify implementation details but to specify things that would be portable. The word CELL, for instance, was key to this because these earlier Forths were tied to a particular word size machine. So if you look back at this legacy code from FIG-Forth and from Forth-83 you see a lot of the word two-times ( 2* ). And two-time sometimes means multiply something by two and sometimes it means convert from cell units to addressing units. So when you convert this code from FIG-Forth to ANS Forth you have to identify which two-times means two-times and where it really means CELLS and you replace it with the word CELLS. And if you are on a machine which has cell addressing then CELLS is one-times which is an immediate word that compiles nothing. It becomes two-times on a sixteen-bit machine with byte addressing, it becomes four-times on a thirty-two bit machine with byte addressing.

And you have portable code and you have words like and again you go through your old Forth and figure out which words should be CELLS and now you can write portable code. CELL+ CELL- and again you go through your old Forths and figure out where you were really referring to cells. Then you can have portable code that will run on 16-bit, or 32-bit, or 8-bit, or 20-bit machines, whatever.

However up to this point this was basically the picture. You analyze your problem in the design phase. Then you go to an editor in the edit phase. The source code was in ascii string format and you store it either files or blocks.

Then you move to the compiler phase where Chuck Moore talks about ICE; Interpret, Compile, Execute. Chuck criticizes many Forth programmers for not having enough interpretation in their application and for too much compilation and execution. And if you analyze his code and you look at the metrics you notice that it's almost half interpreted code and half compiled code. In the middle of a definition he drops out of compilation mode and goes into interpretation mode, classically with a bracket, computes something during that phase of compilation, then goes back into compilation phase and compiles the result of that computation.

So your compiler interprets your ascii text, compiles your ascii text, and can execute the code that you have compiled. And to do this it builds a dictionary. The dictionary is typically a linked list of names of your compiled words, of strings, of data structures. Everything in Forth becomes a word and this dictionary is the container for that structure. Of course it doesn't have to be a linked list, it could use a hash table or have other mechanisms to link your dictionary together but traditionally it was a linked list.

It gets built at compile time and it gets searched at compile time whenever you encounter a word in source it and finds out if it should execute it or compile it or whatever is appropriate. When you run your application it executes your compiled code. It can also interpret more source code at execution time and in some cases it may take advantage of the fact that the Forth compiler is, in most cases, still there because have simply extended Forth towards an application specific language.

But the Forth compiler is still there so your application can still compile. And that's a big advantage for some things because you can take certain problems and from within the Forth compiler generate executable code from some specifications at run time. It's a very powerful feature and some Forth programmers take advantage of that today.

So I was in the design phase looking at my problem in F21 in a Mouse. But before I get to that let me go through history a little bit more. Around the same time as ANS Forth Chuck Moore began working of Forth hardware. And as you all know, the first Forth hardware that he did was the Novix chip and for the Novix chip he wrote a version of Forth that he called cmForth, which is on one of his license plates. And cmForth was a little bit different than traditional Forths. It compiled native code, it compiled optimized native code. It did not use IMMEDIATE words with an IMMEDIATE bit (in a dictionary header) the way Forth had traditionally been done. It used a MACRO wordlist and a FORTH wordlist and IMMEDIATE words went into the MACRO wordlist which gets searched first. He did this to simplify the design and make it run faster.

He wrote into it the optimizations that matched the chip. Well, as John's (John Rible) talk about hardware today showed, hardware is strange, it works in strange ways. The details of how to optimize code on the Novix chip were very specific to that chip. It was an unusual chip, it had three separate data busses, one for the data stack, one for the return stack, and one for main memory. And this allowed it to operate on all three of those environments in a single clock cycle. So it could take items from the data stack combine them send them to the ALU, take something off the return stack, manipulate the return stack, and operate in the main memory space all in the same machine cycle. In a single opcode it could execute up to five traditional Forth high level language words in a single clock cycle. It was remarkably small and simple and only used 4000 gates. And I was very impressed by the way it could out perform Intel's fastest chip at the time on many problems, especially real-time problems.

Because the solutions needed by Intel, such as deeper pipelines and memory caches became very complex. When you added in that there might be interrupt code running in the background it became very non-deterministic. And for real-time problems if you profiled the execution rates it was not the few times that it runs a little bit slower than normal that got you, it was those times that it executed your routine a hundred times slower than normal that made it difficult to solve certain real-time problems. So I was impressed by the way that Novix which was a hundred times smaller and much simpler and was able to outperform these large chips that Intel was making.

And I got a Forthkit III from Computer Cowboys and asked Chuck Moore a lot of questions about it. I was please with how open and helpful he was and that the didn't treat me like some lower person just because I didn't know very much about it. I didn't know anything about his chip or anything about cmForth and he was very helpful. And I very much liked the little manual that came with it. It was full of exercises. It reminded me of the lab notebook that I had in physics class in college in Electronics. Where we went through a lot of experiments. It was full of experiments that you could do and code that he had written to do each little experiment. It was wonderfully educational. (note to self: I should publish a copy on the net sometime.)

I was also very impressed by the fact that cmForth was very small. The total system source including the metacompiler was about thirty screens. The total system was about 6K words and the compiler itself was about 2K words. And it could compile itself incredibly fast. Unless of course you were loading the source code through a 9600 baud serial link from your host computer which is what I was doing.

But Chuck was not altogether happy with cmForth. He felt it was just too complex and too specific for his chip it had all this bizarre optimization that only matched his chip. He wanted to design a chip that would not require this kind of complex optimization to get good clean native code. He wanted a compiler that could produce native code in a simple way that would run on a simple chip. So that set him down the road the MISC chips starting with the original sh-boom and then later the p20 design that evolved into dr. ting's p21 the F21, and the i21, p32, P32, P8, v21, this whole list of chips that he did for different clients. But what was important about these was that he developed a new version of Forth for these chips called machineforth that departed more radically from traditional Forth.

Oh, I'm sorry. I skipped OK. This is an important phase that Chuck went through. He decided that he liked the ideal of minimal source code and as Chuck often says the minimal amount of source code was zero. So he said I'll try that.

He said I'm going to do sourceless programming. He designed a system in which the code was going to be represented by itself. He had already decided that this was the correct way to design a chip. You don't design a chip by providing a text description and handing it to someone else's compiler to compile your chip for you, that you have no control over, that is going to produce something that you don't understand, that is sub-optimal. As he said many times in those days, "The map is not the territory."

You want to deal with a chip? You describe the chip precisely. He designed an environment where he would design the chip at the component, sub-component, level, not in a high level description. He decided to do the same thing with his software and he called it OK. Originally he called it threeForth, 3/4th, he also has a license plate 3Forth, and the idea is that it is part of a Forth system. It isn't a complete Forth because it doesn't have an interpreter it doesn't have a compiler it doesn't have source code but it still basically Forth.

It's sort of like Forth. It's structured like Forth internally but he's the compiler and there is no source code. The code is the description of itself. This became OK, from the Forth command prompt. And he implemented this on a variety of machines implementing it each time without source code.

He began by bootstrapping something into a debugger to bootstrap his activity. Then he would write a hex editor and then an opcode editor and then he would write a screen editor, a graphics editor, then a font editor. Then he would use those as the OK system. OK was the minimal GUI. I don't know, something like six bytes of code. It was basically a menu controlled function for each screen in his application so he had a graphics environment but it was menu based each graphic screen had a menu of associated commands that appeared on the screen and a table of eight functions which were indexed into by his keyboard input.

And he also experimented with what types of keyboards were most effective for him to enter one of eight key commands. You all remember those days of three key keyboards and keyboards glued to wood and keyboards glued to rocks and keyboards attached to his fingers, that he could operate like this (motioning as with castanets). And he tried all kinds of different things. Eventually he became dissatisfied with it. He declared that sourceless programming was a dead end. And the reason that he decided that was that he found that what he originally intended, that the code could represent itself. You could just decompile it if you wanted to see the source. But he found that decompiling was not a trivial problem. It was complicated by the fact that different source code sequences could compile to produce the same object code. Furthermore data, which was compiled by interpretation of source code that then generated binary data at compile time, resulted in just binary data. You just could not decompile it into just how it was defined. So this ambiguity left him dissatisfied with OK. Furthermore he wanted to make it easier to port his CAD system which had been implemented in OK. John Rible was involved in trying to unravel this large mass of object code with no source and produce source code so that it could be compiled, or assembled actually, and produce a new version of OK and OKAD.

So now that he had these tools working and was designing chips he designed a new type of Forth chip that was simpler conceptually than the Novix chip. The optimizations required for this chip were relatively trivial and the compiler to produce that code become known as machineForth.

This is one version of machineForth, the original version of machineForth, and this is it. This (holding up a half page of code) is the optimizing native code compiler for the chip that he designed. In fact this is a cross compiler written for Dr. Ting's P21 and it's written in F-PC which is a 16-bit Forth so it's about twice as complicated as it needs to be since you have to use doubles to compile 21-bit numbers. Furthermore because it is a cross compiler and because he did some odd things inside of the silicon, which he called patterns and numbers, you have to do some conversions when you cross compile from another system. Now of course these conversions are nothing new to anyone who has been involved in this kind of thing before.

The first time that I converted an application from an embedded Motorola chip to an embedded Intel chip it didn't work and I said, why isn't this working? And then I realized that one machine had positive logic and the other machine had negative logic. That is to say if I took the same ROM and put it into the other machine then all of the data statements were now inverted because an 0FF was now a 00 because a logical one on the data bus represented five volts on one machine and zero volts on the other machine. So when moving a ROM from Motorola to Intel I not only had to change all the code, I had to change a macro for all the data statements to exclusive-or them with 0FF.

So when I encountered the concept that if I developed the code on a PC then I would have to exclusive-or all the data statements with 0AAAAA this didn't seem too strange to me. (a little laughter) However the first time I heard Chuck explain this in front of a Forth meeting it confused the Hell out of me. I said, if he can confuse me, and I've been doing this now for a while, I'm sure that everybody else in the room is really confused. And in fact he made it sound like if anybody who wanted to use the chip, even to add two numbers together, had to exclusive-or the arguments or exclusive-or the result and in any event it seemed nightmarishly complicated. And this just wasn't the case, if you wanted to add two numbers, you put two numbers on the stack and you add them.

That's all there was to it. But still, if your a cross compiler writer working on a PC, which has a different bus than this chip, you have to take that into account. You had to exclusive-or addresses. You had to exclusive-or numbers. And because Chuck was working at the hardware level he's thinking of these things as bit patterns they were represented one way and in software they were represented another way and he had to switch back and forth thinking of them as numbers or thinking of them as patterns. And he'd get up in front of the Forth Interest Group and talk about this and since for the most part none of you have to deal with being inside of a chip in a CAD system and outside of the chip it just was very confusing for everyone. Eventually he learned not to talk about that. And said that it was probably a mistake to have ever used that terminology in public because it really confused people.

So for many years I was using various versions of machineForth, cross compilers. I wrote a version of machineForth that ran natively on P21 and it was much simpler and much smaller and more trivial. As you can see from a page of code it is not a complicated or complex program. And for the most part I was very happy with it, but it had its limitations.

One thing about cross compiling makes it difficult to move through the interactive loop of edit, compile, test, edit compile, test because you have to edit, compile, download, test. Sometimes burn a ROM in the process. So it takes time. And we experimented with tethered Forths so that you could speed up the process, but still it was a slow process. Furthermore the chip is relatively small, it doesn't need to be big because everything, everything in the software is so compact. John talked about the advantage of these tiny opcodes but you really don't understand it unless you have been in there and worked with these tiny 5-bit opcodes where you can put four opcodes into a 20-bit word in memory.

When he designed P21 Chuck only used instructions where they had access to a 1K page. He did this because the RAM chips had 1K pages and everything ran fast if you kept it on one page. So he said, "Well most programs fit into 1K so all I need is 1K branches." I thought, "What planet is he from that he thinks that programs fit in 1K." Well I learned that he was from planet Forth and that programs did fit in 1K. After ten years almost all of them, every well written program, fit in 1K for the most part.

It all depends I guess on what you call a program. People sometimes refer to a program as a collection of different programs that get loaded at different times. And he refers to his OKAD system as being about 20 separate programs, that's the way people implement it when they write twenty ten megabyte programs and put them together. And he had twenty 1K programs which he put together so he demonstrated that his programs do fit in about 1K.

When I worked at iTV we had people who were programming in ANS Forth and people who were programming in machineForth and for the most part the machineForth people always kept things under 1K. And when I say to people that programs fit in 1K they think I am from outer space and I have to say, "Well, ours is word addressing machine, 1K words means two and half KBytes. So that's the first translation that you have to make. The second translation that you have to make is that 1K words means up to 4K Forth opcodes. So if you compare that to a 32-bit Forth where words are 32-bits that's 16 KBytes. Now if you say to a Forther that most programs fit in 16 KBytes they don't look at you like you were from outer space. They say, "Well yeah, that's the way we have been doing it for twenty years." But when you say that they fit in 1K they don't quite grasp what you mean. But 1K words on our machine might be the equivalent of 16K bytes for other people.

Meanwhile, while I was happily working away in machineForth all these years writing things for iTV and for myself and my chip and seeing how remarkably small and bragging about the F21 in a mouse that many people think that it is Windows when they see it running has only 600 words of code total. The boot code, the video drivers, the video coprocessor code, the event handler, the mouse driver, the desktop, the application, if you add them all up 600 words. So yes, the compiled code is remarkably small. But the source code was small, yes smaller than other Forths, but still I was dealing with this conceptually, source in ascii files or blocks, interpret, compile, build a dictionary, search the dictionary, create an executable, go execute it. So I had remarkably small object code but the source code was still on a conventional scale.

iTV for instance had a fairly complex application, an operating system, compressed files that decompress when they come out of ROM, flash file system, a GUI with windowing and fonts and colors and font attributes, an Internet browser, a TCP/IP stack, a whole list of Internet protocols, graphics decode, graphics display, email, (a Forth command line and remote line) and the entire image that they had was about 200K words of object code and data.

Michael Montvelishsky did a profile of it down here a few years ago and I found it quite striking. It was totally obvious which modules were written in machine Forth and which modules were written in ANS Forth. He would say this module is 1K, and this module is 1K, and this module is 2K, and this module is 50K, and this module is 70K, and you could tell which was which.

But the point was that the total source was bigger than the ROM on F21. So you couldn't compile the code on F21 and have object code. You just simply could not fit the source code onto the machine. You had to have a separate machine to build it.

But here I have this lovely desktop that looks like Windows with 600 words of code. I don't have the problem of download files from the Internet, store them in the flash and those sorts of things in this environment so I would like to put source code into the flash and be able to do things in a more sophisticated way. So I wrote down all of the features that were my implementation goals.

I wanted to be able to have isomorphism in Forth hardware and software so they matched tightly. Isomorphism in the operating system, GUI, debugger, editor and applications. All integrated in the same way. Isomorphism using distributed processing with symmetric multiprocessing and the network interface that I designed for my chip and the software designed to do this, a hardware accelerated GUI using the video accelerator built into a few transistors in Chuck's design, tokenized source code, a tokenized source code database manager which would allow source code debugging and virtually instantaneous compilation. My GUI, that I had written for myself and for iTV, were based on the Smalltalk model but in Forth and I wanted to do something like that here.

I expanded this, but I also wanted it to have a Logo like property of simultaneously being able to edit the source code and instantly have the compiled code changed so that a child could browse the system, modify a word and instantly they have new executable code.

I wanted a highly optimized heuristic environment ideal for artificial intelligence, rule based expert systems, neural net simulations and genetic algorithms. That's what the hardware was designed to do, that's what the hardware was optimized to do, and I wanted software optimized for that purpose. I wanted it to be a scalable environment where you could plug more than one chip together and have it be as big as you want. And I wanted it to be extremely simple, compact, and fast. Those were my design goals.

Meanwhile I was kind of jealous of Chuck because he had left MachineForth and was now doing weird new stuff in ColorForth. ColorForth is what he describes as putting information into the white space of your source code for the purpose of making the source code more clear, the source code smaller, and making the compiler faster.

It does this in many ways. He has talked about how it works and the evolution of his colorForth over the years. And he changes it from time to time. The last explanation, last July, explained how he was now using Huffman encoding for his character strings and how he picked out the most advantageous way to represent each character in a minimum number of bits and pack them together. And it was very clear that he was thinking about his hardware. His hardware is word oriented, it is not byte oriented. He doesn't process bytes. He doesn't have bytes in his source code, he doesn't have bytes in OK, he doesn't have bytes in OKAD. He's got words, and all of his code reads words and his hardware reads words.

Furthermore he pointed out that Forth itself is word oriented. And he doesn't mean words as in CELLS there, he means words as in Forth is words and spaces and that that is what makes it a distinct language. It doesn't have a great deal of syntax. It doesn't have a lot of rules about how this word has to be in the context of all of these rules or that you have to formulate your expressions if you are going to use this keyword in this way. Forth kind of gets around that, it's got words and spaces.

A word is like an object, it can contain anything, source code, executable code, data, links, pointers, whatever. Listening to Chuck's reasoning, why he was doing things the way he was doing them in colorForth, it was very obvious that he was designing code to run on the chips that he designed. Well, I was writing code to run on the chips that he designed but I was still stuck in this traditional concept (Figure 1.which I describe as the thirty year old concept that everybody in Forth uses and nobody every questions.

This (Figure 1.) is the definition of Forth until Chuck comes down here and gives presentations that George (Perry) calls (jokingly) heresy and reminds us that it is alright to be a heretic and question these thirty year old ideas. So I looked at what I was doing that came from this and break out of these old ideas. So I looked at what I was doing that came from this (ANS), what I was doing that came from this (cmForth), what I was doing that came from this (OK), what I was doing that came from this (machineForth), that got me up to this phase, and what Chuck was doing in colorForth that was different and why it was superior and why I would like to do some of that. And I looked at my design goals which were to write code that smaller, faster, and simpler, and I took all of these parts and started trying to see how I could fit them together and I said, "Aha!"

So I named the project aha, a word designed to express an epiphany, an understanding of the nature of something that suddenly appears before you. And what I did in aha was to refactor this picture, as Chuck had refactored it. In colorForth Chuck had taken some of what a compiler was doing and moved it over here into the editor. As a result of that his compiler was simpler. Well his editor was more complicated but hey, editors are not complicated things. Traditionally people in Forth write their own editors to do exactly what they want. I have written them, many people have written them, some of you have written your own editors. It is generally speaking part of the Forth style since we had these integrated development environments with compiler and editor all rolled together.

By moving functions that used to be in the compiler into the editor Chuck could not find errors at edit time that would normally hang around and wait for compile time. By compile time they are not as fresh in your mind as when you wrote it. When the compiler pops it up you have to look at it and figure it out all over again and think through it. But with Chuck these errors would get caught at edit time to speed up the development cycle.

Furthermore, it sped up the compiler because now the editor could package things in a way more advantageous than just ascii files or ascii blocks. For one thing he was thinking about hardware that was word oriented. It's not byte oriented, he doesn't want to pull them in one byte at a time and chew it up one byte at a time because it not very efficient on his machines. He wants to pull in a word at a time, so he began packing his source code into (Huffman) packed counted strings that are word oriented, cell oriented. So he said, "We need to recognize that Forth is word oriented and take advantage of that." His way of taking advantage of that was to use cells and pack strings and counts into this representation here (source) the compiler looks at it and says, ok this is a counted string and it can skip over it if it is a comment or I can read it in and I know how long it is I don't have to count through it looking for a space to see where it ends. Furthermore he used source tokens that could say, this (next item) is a binary number.

Now you go up to this point (cmForth) and the concept is that for a number first you search the dictionary, and you know that it's not there because numbers are not in the dictionary, but you search it anyway because that how it works in the old compiler and that's it beauty, it's simple. But when you get a number first you search the dictionary, well that's not a big problem for Chuck, he's only got a few hundred words in his dictionary, but when you start dealing with Forths with 20,000 words in their dictionary that's a lot of overhead. So you determine that it's not in the dictionary and they you try to convert it into a number. You take the string, and then you parse it as a string, and then you perform this operation on it to convert the string to a binary number and if it fails you have an error which you couldn't catch at edit time. You only caught that at compile time so you go back to edit time and re-edit your source code because the compiler, parsing your strings and converting your strings into numbers, determined that this was a problem.

So I saw that Chuck was getting this advantage. He moved functions out of this more expensive time phase (compile time) to this less expensive time phase (edit phase) passing some complexity from the compiler to the editor. And I said, I'd like to do that. I want to have packed strings and counted strings and identify numbers but there were other good ideas here. There were other good ideas in cmForth and OK but basically I am sticking with machineForth. I had this big pile of ideas and when I looked at how they could all fit together in a new way and had the `aha' realization I came up with a new concept for this picture.

I said, well let's do this, let's move the dictionary build and dictionary search into edit time as well.

Figure 3. Aha time phases.

Design         Edit            Compile                 Run
                               (ICE)                   (ICE)
Problem        x-ascii         Interpret               Execute
Analysis       x-Files/Blocks  (some) Compile          Interpret
               <-------------- (some) Compile          Compile
               <-------------- Dictionary Build
               <-------------- Dictionary Search

How do you do that? That was two of the mechanisms that I came up with. In fact, I came up with the concept of instead of using an ascii file or block I used something much more like a database management system for the source code. It's in records, and the records are packaged in little packets. And the packets have little tokens that tell the compiler, or the editor, what's in them.

And I determined that what I wanted was five different kinds of packets for my source code. Now, why would I want to do this? I had the realization that I had been doing a lot of sourceless programming, because I had these tools. I had simulators that looked like an old fashion monitor program where you edit the machine code. You can decompile machine code and see what it represents. You see the names of the words, you change a name and it changes the bits for you. You don't have to remember which bits are which opcodes, you just edit it. But it did have limitations.

One version had a dictionary link so that it would show the names, the names for subroutine calls .... noise... OK which was that with binary data you had no idea what it was suppose to mean except that there is some binary data there that, well if you want to change it you have to go way back to here (edit source) and rethink all this do all this to get back and that was the limitation. But for a lot of code that I was doing it wasn't ambiguous, it did decompile completely, and a very good way to represent that code was with tokens, very small tokens, 5-bit tokens that happened to be the opcodes.

So my first record type is Opcode Tokens, 5-bit tokens. In other words I'm representing one class of words in Forth by 5-bit numbers.

Figure 4. Aha DBMS Record Types.

1 Opcode Tokens
2 Function Tokens
3 Packed Counted Strings
4 String Pointer Tokens
5 Binary Records
  Compressed Binary Records

It is a very attractive idea. I don't lose anything and I compress Forth words down from being five character long strings, they average five characters in length if you count the space character that delimits them, to 5-bits which is a 6/1 compression.

Now the next type of token is Function Token. And these work like a traditional tokenized Forth. You take a token, index into a table, and you execute something. So, these words (holding up the half-page machineForth listing) with the exceptions of those that are Forth opcodes, words like IF, THEN, WHILE, BEGIN, UNTIL, and words that are macros in this optimizing compiler are represented also by only a few bits. They are not opcodes but they are source. Opcode tokens are source also. These tokens are very small like the opcode tokens (5 to 6-bits) and they simply go to a lookup table in memory. You pull them out of a word, you pull out a few bits, you index into a table, and you execute the same function that gets done in machineForth.

But in machineForth it works this way. You take the name, you search the dictionary, find it, and then execute the code. That's what the compiler had to do. Whereas now I don't have to do a dictionary search for Opcode Tokens. I don't have to do a dictionary search for Function Tokens. So now that just leaves me with defined words to deal with.

Well I wanted to do what Chuck was doing in colorForth; pack those words, count them, store them packed with a count to speed up processing. And I said, you know, I can pack them into something that looks just like a Forth dictionary inside of the source code. That just has to happen in the editor. The editor processes these things and builds the dictionary. However the dictionary has a CFA field (Code Field Address slot) that is not initialized, it's empty, it's got a zero but it's there, and I get compression that way.

When used in conjunction with the next type of token, the String Pointer Token, it becomes really advantageous because the String Pointer says now when I encounter that word in the source code the next time I look in the dictionary, I find the word, and I represent that as a token, which is a pointer to the CFA field in the dictionary. So I do the dictionary build, and the dictionary search I convert the words into this type of token (String Pointer Token) so that first time you see a word defined, Chuck does that with a red word, or what we do with a name that follows colon ( : ) or a name that follows VARIABLE or a name that follows CREATE, is to create a packed counted string in a dictionary. Any time I encounter that word in the future I package it in a String Pointer Token which represents it as a pointer into the dictionary. So I compress the strings down into a cell or less than a cell if I want to limit myself to less than a million definable words. So I get a lot of compression.

But the real advantage of this that the compiler doesn't have to look them up in the dictionary. The compiler says, ok, this is a pointer into the dictionary that already exists, it's a pointer to a CFA, so the compiler, when it compiles this code, and it encounters the name of a defined word it just fetches the CFA, when it compiles object code it sets the CFA field in the dictionary. Now when it encounters the defined word in the following source code all it has to do is take the representation, which is a pointer into the dictionary, take one fetch, get the CFA, turn it into the appropriate type of subroutine call based on current and destination addresses and compile it into the object code. So there is no dictionary search. So by using these four types of tokens I completely eliminate dictionary searching at compile time.

Now I have a fifth type of record, a binary record. From the compiler's point of view it is analogous to an opcode record but it's got a different flag bit in the data record header that tells the compiler (and editor what it is). I also have packed counted strings for comments. I can put as many comments as I want in there and the compiler says ok, this is a comment and at the same time it knows how long he comment is and it adds that to the pointer and moves past it and it doesn't slow down the compiler.

So I can speed up the compiler by an incredible amount by doing all this. I coded up the function to compile these (Opcode Tokens), and this (Binary Records), and this (Compressed Binary Records) record type number five, compressed binary records. If you are going to represent a thousand zeros in memory you don't want a thousand statements in source code, you want to say repeat this zero one thousand times. Much smaller in source code and much faster, especially in an environment that is limited by memory operations. The CPU is running at 500 Mhz and the memory is much slower. It's much better if you can put it on the stack and then just write it out to memory instead of read it from memory, write it to memory, read it from memory, write it to memory. Again, this speeds things up.

So again, the first code that I wrote was the code to process this type of token and this type of token and this type of token. I've already got the code that processes these types of tokens (Function Tokens), I mean I have the code in machineForth, you saw it right there, that does all the things that these functions do and it's already written. All I need to do is convert it to use a table lookup by pulling these tokens out.

The code to process these types of tokens was about five or six words of object code, several lines of source code. And I benchmarked it at compiling a maximum of 120 million Forth words per second. The way I get that number is that using fast SRAM it takes 30ns to compile one of these opcode tokens which represents up to four source words so I am roughly compiling Forth words of that type at a rate of up to 120 million per second. When I looked at how many operations I need to pull out a function token, AND out one of them, index into the table and execute some code and it is much slower. Way down here around around only two million per second. Defined words are in between. You only get one per word so there are more memory operations involved. So it is somewhere in between say 20 million per second. I am must pulling that out of the air right now so I could even use 2 million per second.

It depends on what your application is as to how much compression, and how much speed you get, but you have to remember that I am talking about things that are six hundred words of object code for this complete (mouse demo) system. Ten or twenty words of source code for this much of the compiler. (Function and string token processing had not been coded at that time.) So if you take these numbers with an average of ten million words per second and figure out how long it takes to compile ten words and you end up with microsecond compile times.

Now I showed all this stuff to Chuck and he thought it was very clever. And he said, "That's just in time compiling but it's so fast that you could compile interrupt service routines from source code." I thought it was a very funny idea and I suppose I could. I don't intend to, but what I can do is now on this little chip I can have a whole bunch of different applications stored in ROM (FLASH) in this packeted format and I can pull them out and compile them as I need to.

The machine doesn't have relocatable code therefore if I am going to put six applications into the ROM and put six icons on the desktop I was previously forced to put all six into fixed locations in memory at all times because the act of compiling is what ties them to a particular location in memory and if you know you might to run all six of these you have to put them at fixed locations. But this scheme allows me to compile them in microseconds into a location in memory. So they are effectively relocatable now.

I can put more applications into memory, I can put a whole bunch of applications in that ROM (FLASH) and pull anything that the user points to and click on the desktop and it will come out of ROM and get compiled in a few microseconds. You have an application now, it pops up, it runs like the way we all expect a GUI to operate.

What I effectively did was by moving these three functions over here (Figure 3.) I refactored things. Now I have a compiler that's pretty and incredibly fast and incredibly small. And I have an editor which can enforce certain rules about how the machine code should work. It can optimize the code at edit time and package it into the database system. It does everything that me editor does now. It does some of the things that the machineForth compiler does now. I say it does, but it doesn't exist yet so it doesn't do anything yet. But the editor will take some of the functions form a traditional compiler and pass them upstream in time to this less expensive time phase.

It also allows me to build a source code browser - debugger - editor - optimizing compiler that will work like Logo and Smalltalk and Forth all packed together.

The records allow changes to the source code to be localized. You don't have to compile the whole application, you just pre-compile the packet that your changing. The editor can do these things between keystrokes. That's no problem. And it allow me to build what I wanted, something that looks like Logo and Smalltalk and Forth but is much smaller and cleaner and simpler.

And I was really very happy. I wanted to make my code a little cleaner, a little faster and a little more compact and make the compiler a little smaller and a little faster. But in fact I had made it a lot more compact, a lot smaller, and a lot faster and I was very happy with that.

Now a final note. This is related. I got an email from a young man in Australia named Sean Pringle who had written a stand-alone PC Forth system that he called Enth as in this is the n'th Forth system that someone has written. And it was an ANS Forth based environment with boot code and drivers for DMA, for the interrupt controller, keyboard and VGA drivers. Similar to Chuck's stand-alone colorForth that he has been doing but it is written in assembler and uses VGA instead of the latest ATI 3D video accelerator card that Chuck has now ported to, and when Chuck chose to change his video drivers to this card he locked a lot of people out. Of course the fact that he is using his own character set, that he is using his own keyboard arrangement, that he's writing it in colorForth, and that it's designed to run OKAD already locks almost everyone out from even being able to get in there and play with it.

So I was pleased to hear about Sean's doing this but he had not only written Enth, which is a stand-alone ANS Forth for the Pentium, but he also wrote Flux which he had much more fun doing, which is a flavor of colorForth. It uses tokens like Chuck, color tokens, but it has the same boot code and driver code for the Pentium as Enth and all that. So we started having some discussions about it and I explained to him what I was doing in aha with packed counted strings, building a dictionary in the editor and how I used dictionary pointers in the source code for defined words and that he could do that in his colorForth and steal this idea from aha.

I was very pleased that he came back in two days and said I implemented it and it works nicely. (laughter) He is currently on vacation and when he gets back he, he has a web page about Flux but hasn't done any documentation yet on Enth. I hope to get a copy of Enth and copy of Flux available at the UltraTechnology website so that people can get them and play with them. There are a lot of people who have expressed interest in playing with either a stand-alone boot Forth for the Pentium, or a colorForth and I doing it with Chuck's code is problematic (at this time).

I got a copy of one of Chuck's colorForths and I thought that it might boot up on that machine (pointing) because it does have an ATI video card before he switched to the latest 3D ATI video card and it wouldn't boot up. You can't read the disk with anything unless you write a utility to pull in the first track and store it in a file as a binary image and then you can try to figure out what all those bits mean and I wasn't about to do that and I don't think anyone else is either.

So the idea that you could get it assembler source code and do something with it stand-alone Forth on a Pentium and have ANS or colorForth I found very attractive. I encouraged Sean. He was worried about talking about this in public. (he had lurked in comp.lang.forth) He was worried that people would jump all over him and call him an idiot and make fun of him. And you know, he was just a college kid so he was really worried about that and I said, "I won't do that. I will support you. I'll encourage you, I'll try to help you. I'll be happy to post information at my site and we can talk about it in the MISC mail list. I know that there's people there who won't jump all over you for having these ideas."

Since then I have learned of a few other colorForths. Terry Loveall has a colorForth that he's written that boots up from floppy under DOS so it is a little bit like Flux. It doesn't have this function from aha to eliminate dictionary searches. And by the way, this is the only mechanism that Sean is using. So his Flux is much simpler than my aha. I am willing to go the extra distance of adding in the complexity of Opcode Tokens and Function Tokens because they compress the source code down several times more. Because I've got this environment where there is such a tight connection between the Forth that Chuck wrote and the chip that Chuck designed that they just fit together like that.

So I'm willing to do it, especially since I can compile Forth source code at a hundred million Forth words per second by doing it. So I'm willing to live with a little extra complexity in the aha compiler and the aha editor to do that. And he (Sean) doesn't need to do that. And he has written his Flux to look very much Chuck's colorForth in the sense that it's using a machineForth virtual machine model. It is kind of designed to run on Chuck's chips even though it's implemented on a Pentium. And that's what Chuck is doing. Chuck is writing for the Pentium but he is using the machineForth virtual machine model from his chips. I've also learned of another colorForth from James Hague.

When I get to the stage of writing an editor I plan to write several editors. The first editor will be a machineForth editor. It will look like what I have been doing except that it will be able to have comments and the editor will compress the code into these DBMS records to be processed by the compiler. And I can write an editor that displays the same thing as colorForth because that's an editor function not a compiler function. I can even display it as ANS Forth if I want; for people who need to see it that way and have a third editor. After all these editors are jus little tiny programs that I can pull in and execute with the same tiny little compiler.

So the colorForth picture now is that I am aware of five people working on colorForth, Chuck Moore, Terry Loveall, Sean Pringle, James Hague and myself. Four of them have working colorForths at the moment and I don't. I'm just leaving the design phase for the edit phase (bootstrapping) writing the first code and I am not going to get to writing this code (editor) until after I write this code (compiler) so I don't have a colorForth yet but I've made provisions to do what Chuck is doing and some other things in aha. So it combines stuff from traditional Forths with ideas from cmForth, from OK, from machineForth, and from colorForth and adds some ideas of my own. And that's the story on aha.

Any questions? (laughter) homepage