introducing aha

Introducing Aha

I have chosen the name aha for the new system I am implementing on the F21 Microprocessor. I chose the name aha because it is an expression used to convey an epiphany felt upon the sudden manifestation of the essence of meaning of something. In this case I was examining the models and methodologies used in software development. I had spent nearly ten years examining how well different software development strategies applied to a new tiny computer.

Comuters are the only general purpose machines ever made in that any computer can perform any possible computation given sufficient time and memory. But all computers have limited processing speed and limited memory and these limit the range of computations that are physically practical or economically justified. Wider, deeper, richer, and faster processors with more available memory can perform larger more complex computing tasks with proportional increases in power consumption and cost. Smaller machines can perform a wider range of economically justified but less demanding applications. Specialized machines can perform some tasks more efficienty but lose some generality and performance on other problems.

One ever present factor in computer hardware and software is efficiency. For a given computation the efficiency of one set of logic circuits can be measured in comparison to another set. For a given computation the efficiency of one sequence of instructions can be measured in comparison to another set. Once a given hardware architecture is choosen it imposes a set of constraints on which approaches to software will be more efficient.

Given the many differences between processor designs and the nearly infinite number of variations on the software to do a specific computation the problem of quantifying efficiency becomes very difficult. Chip makers provide clock rates as a metric to gauge chip performance but software developers know well that clocks per instruction and what the instructions actually do are just as important as clock rate and are difficult to quantify or measure. Benchmarks are often used to make relative comparisons between different hardware and software designs on some problems but the line between marketing MIPS and real performace can be difficult to find.

Even for a given hardware platform the efficiency of the software may also be difficult to quantify or measure. Efficiency in this context is only a relative measure of the speed or use of memory between two programs that perform the same fuction. The increase in the features implemented in a program make it difficult to quantify its performance by relative efficiency of some subset of the total functions that two different programs have in common. Rarely do vendors provide any kind of relative efficiency information about the features in their products because they are often not flattering. The assumption is made that adding new features may make old features relatively less efficient but that it will not be important because it will simply require, demand, bigger, faster, more modern computer hardware to run in the first place. The vendors don't advertize the regular drop in the efficiency of the functions in their software or emphasize that the drop in efficiency may closely match the required increase in the performance of the hardware.

Software vendors are market driven to emphasize the increase in features and de-emphasize the decrease in relative software efficiency of their products. The market assumes that machines will get bigger and faster at a rate that will make up for any software efficiecy decrease and that the next software release should require more hardware. Hardware vendors are market driven to emphasize the increase in features and clock rates and de-emphasize the decrease in efficiency in performance relative to required hardware costs and resources. Consumers get faster hardware and more features in software and the system moves forward with diminishing returns.

Ten years ago I started down a path of examining the hardware efficiency of unusual tiny and somewhat specialized computer designs. These were machines with an architecture closely resembling the virtual machine in the Forth programming languages. As one might expect I found that they were very efficient for executing Forth programs and less efficient for executing code written in some other languages. What seemed amazing was that for a certain class of problems we could get similar performance to that of conventional designs that required one thousand times more transistors to perform the computation.

Forth, as a language, has always encouraged more experimenation with algorithms to find more efficient solutions than most other programming languages. One often sees in Forth, 'C', Java, and other languages that the degree of optimization of code generated by different compilers can result in programs that have efficiency ratios of ten to one or thirty to one. These differences in efficiency can often be traced to the compilation of optimized native code or the compilation of code for a Forth or Java virtual machine.

A more striking difference in efficiency is sometimes seen when a better algorithm is used in the computation. Compiling a given algorithm to native code instead of tokens to be interpreted by a virtual machine in software can often provide a speedup of ten times but a better algorithm might calculate the same result a hundred times, a thousand times, or a million times faster.

Having made the decision to explore a class of relatively small processors and machines I found myself confronting, more often than most, the issue of software efficiency. Other people would routinely plan to use more memory and more processing power tomorrow in the next release. Because they hope to see a bigger increase in hardware performance than relative decrease in software efficiency they do not see software efficiency as a problem that they need to confront or solve.

Ten years ago I have fully appreciated the simple and obvious solution to implementing efficient software on this tiny harware. Because we have what looks very much like a virtual machine in silicon we can use high level language techniques of writing for this virtual machine but because it is a real machine in silicon we can generate native code at the same time. We get the best of assembler, simplicity and speed, and the best of the high level language constructs, abstraction, clarity, and portability.

I had also not appreciated how trying new things and experimenting with different algorithms and implementation strategies would uncover so many ways to improve hardware and software efficiency that I had never considered. One of the things that deeply influenced my thinking was seeing how Chuck Moore implemented and used his VLSI CAD software and his Forth compilers. Everyone else who was designing custom microprocessors were using very large. very expensive, very slow, and very complex software. Chuck wrote his own custom code to do everything the way he wanted to do it. He wrote very small, very fast, very simple code that was easy for him to update and change.

Simulating hardware and software for new computers is an open ended problem. No amount of computing power would be too much. It is in a class of very demanding problems where performance requires very efficient software on a smaller machine or very expensive hardware.

One of the things I had not appreciated was that while most people using Forth had adapted a standard that was based on the kind of Forth Chuck was doing twenty years ago that Chuck had continued to experiment and change his ideas about how Forth should work. He was years ahead of me in his understanding of the style of programming and kind of algorithms that would map well to the chips that he was designing. Like many other people I had the habit of thinking that the definition of Forth was what we had been doing for twenty years.

We had an unusual enviroment where a native code optimizing compiler was a page of code. I simulators, compilers, target compilers, meta comilers, operating system, GUI, and application programs in these environments for UltraTechnology and other companies. Mems analysis of the Machine Forth code and traditional virtual machine threaded Forth code on these machines made it obvious why there sometimes was such a huge differences in code size and speed.

After lots of close examination of the techniques that Chuck has explored in his programming for the last 30 years I had an aha experience. I saw that I was writing for a simple silicon machine and in a new style but was still thinking that Forth had to do many things the way I had done them for twenty years. I had lots of better algorithms to choose from and an opportunity to do things better than I had before.

I observed that Chuck talks about Design time, Edit time, Compile time, and Run time as stages in the development process. Having studied and taught formal design methodologies I could see that Chuck had stepped outside of the thinking he had been using thirty years earlier. In the early days of Forth he had developed integrated development enviroments with fast, one pass, compilation. He could move from design, to edit, to compile, to test and back to design or edit. The speed, simplicity, and integrated nature of the tools made this possible. A decade later this became popular on a wider scale.

In the fifteen years since he moved primarliy from software to hardware design he has explored many new ideas in how to improve the efficiency of his hardware and software development. His ideas were not getting any exposure outside of a small group so as a service I made an effort to document the direction of his work and some of the more noteworthy details.

In porting a compiler and editor to a new F21 environement I took a long look at Chucks work. I wanted to make the source code smaller to be able to fit more into memory or mass storage. I wanted to make the tools as fast a possible to speed the development cycles. I wanted to make the tools small and simple to keep development easy and make compilation faster. I wanted to make the source code clearer and simpler. I wanted to make the tools more logical and to factor them in a better way to improve development efficiency. Just because things seemed small and fast now I still wanted to make them smaller and faster so I spend some time in the design phase.

I said, aha. I was hoping to make the source code a little smaller and the compiler a little smaller and a little faster and the tools a little more efficient. I realized that by refactoring things and changing some algorithms, that I had been using for twenty years without questioning, that I could make the source code and compiler much smaller and the compiler much faster and the tools more powerful than I had first imagined. I found ways to get greatly compressed source code, incredible compilation speed, and make the code look exactly the way I want it to look.

aha
a heuristic architecture