Factoring in Forth

Chuck Moore, the inventor of Forth tells people to "factor, factor, factor" their Forth code. But what is meant by the term factor in the context of the Forth programming language?

This means more than to just write definitions rather than repeat the same sequences of code in multiple places. For one thing it means the many things it means in mathematics. If one is asked to write a program to compute the result of "a^2 +2ab +b^2" given arguments a and b one could do that in Forth.

( stack pictures are shown in paranthesis to show what is being done. )

( a b ) OVER OVER ( a b a b ) * ( a b a*b ) 2* ( a b 2ab )
>R ( a b ) DUP ( a b b ) * ( a b^2 ) SWAP ( b^2 a ) DUP ( b^2 a a )
* ( b^2 a^2 ) + ( a^2+b^2 ) R> ( a^2+b^2 2ab ) + ( a^2+2ab+b^2 )

Which gives the result we want. Without the comments that is;

OVER OVER * 2* >R DUP * SWAP DUP * + R> +

Is that what a Forth programmer should do? Not if they factor.

The problem with that unfactored code is that it has four multiplies, two additions, and seven stack manipulation words. Less code would be better.

In high school or even grade school one might have learned that

(a + b)^2 = a^2 + 2ab + b^2

So we can substitue that expression and get the same result. To code (a + b)^2 in Forth given a and b we can code;

( a b ) + ( a+b ) DUP ( a+b a+b ) *

Without the comments that is;

+ DUP *

That is one addition, one muliplication and one stack manipulation word but gives the same result as the first much longer and complicated program.

Here are some guidelines to what "factoring" means in the Forth I do and that I see the inventor of Forth doing. But these are just a few of the meanings of factor in the context of coding in Forth. Other people may have dozens of other guidelines that they follow.

Factor for the problem at hand. The solution should follow the natural factors of the problem. Choose the best algorithm not simply the most popular one or the first one suggested.

Factor for just the specific problem to be solved. In most cases you don't need to solve the general problem. Don't factor in hooks that you think might be useful in the future. The code can always be expanded or redesigned when changes are made to the requirements.

Factor for short definitions. Everyone says they factor their code even the people who say they used 1000 line functions in C and only a few hundred lines for definitions in Forth. My teachers make nine out of ten definitions well less than one line and averaging only ten characters. My mentors put as many as fifty or more definitions in sixty-four words of compiled code and the source code for those words are very short.

Factor definitions to use about the same amount of executed code as compiled code to make the code clearer, faster, and smaller and do more at compile time. Forth is done in multiple time phases. Don't do at runtime what you can do at compile time. Use [ ] inside of colon definitions or yellow words in colorforth definitions.

Factor to reuse code by using a named routine in a single location for code that is used often. Do this to make code smaller and clearer by removing secondary complexity and attempting to approach the essential complexity of the problem at hand and its solution.

Factor magic numbers out of code by showing what they are and how they are computed by using interpreted sequences. Use [ ] if inside of definitions. It makes code easier to read, easier to maintain, and avoids some bugs.

Factor and refactor to find the smallest or fastest sequence of inlined code. To do this you have to know the cost of each word you use. Without that you don't know which sequence will cost the least.

Factor to conserve stack space. Entire applications may only need a dozen or so stack cells. You should keep track of the data stack use inside of each word. If you have trouble counting items then count them on your fingers. Long before you run out of fingers you need to refactor your code.

Factor for a real return stack. Without it there is no hope for compact or efficient code. Don't be afraid of phrases like "carnal knowledge" as you will not burn in hell forever for knowing what you are doing.

Factor for optimized source and don't try to rely on premature global optimization in a compiler to transform bad code into good code. Don't use that as an excuse to write bad code. Don't let a compiler second guess your code. Excessive global optimization on well written source isn't needed, leads to bugs that may be hard to find, and because we compile often can waste a lot of time. Show some pride and self-esteme and don't assume that a compiler is smarter than you are.

Factor to use short definitions that make the most common errors in longer defintions impossible. In particular in most cases try to use only one control flow construct per definition. Arbitrarily deep nested control flow is the source of most bugs and the reason most time gets wasted in debugging of these bugs.

Factor the math in the specification as part of factoring in Forth. To do n^4 don't code DUP 2DUP * * * since n^4=(n^2)^2 code DUP * DUP *

Factor to refactor. Working code is an existence proof of one way to code something and make it work but it can almost always be improved after it has been made to work.

Factor to make code clear by not using names when the sequence of code in question is smaller, faster, and clearer to a reader than a named routine. A good example is R> DROP EXIT or in a different Forth dialect POP DROP ;

Factor to use the stack in most code. If you have stack juggling, meaning a lot of stack manipulation words, then refactor to remove it and use the stack in a more natural way.

Factor to use global variables where needed. Don't be afraid of them as they may be needed for things that need to persist do not really belong on the stack.

Factor not to use local variables instead of the stack or global variables just because you think your only alternative to your own badly written stack code is to do something worse yet. Don't use local variables in Forth just because that is your habit a different programming language.

Factor to do the simple thing. Don't try to write the most complex or largest code you can. Anyone can do that. Simple code is better. It is easier to read, easier to understand, and will tend to be both smaller and faster.

Factor to favor lower level words. Don't write new unecessary high-level words when more efficient lower level words already exist. This is part of knowing the cost of each word.

Factor Forth with Forth style, don't copy algorithms and methods factored for very different languages. It is likely that you will often be shown C code instead of a specification without the C-ness required in the C code. Don't copy the C code in Forth.

Factor to use meaningful and communicative short names for definitions to be reused. Don't use long-hyphenated-phrases as names and don't use long_phrases_with_underlines like in C. Try to use natural language as best you can for factoring short communicative word names.

Factor to use Forth as your OS and compiler to get the most from Forth. From the start Forth has always had its greatest advantage in size and speed by having the OS and compiler tuned for what is needed in a given environment. If you choose a one-size fits all environment when a solution that fits the problem is better. If you don't use tested Forth code for which you have source you may be building on tens of thousands of bugs for which you have no source code.

Factor to use Forth for your Forth source and don't abandon Forth to write your Forth code in a language other than Forth. Forth has always worked best to define Forth. It is very easy to integrate an assembler into Forth when needed so try not to rely on external assemblers or compilers that require you to understand and write in something other than Forth in the name of doing Forth.

Factor to conserve return stack space. If you use recursion you will need a larger return stack but don't be wasteful when you avoid it. Remember that a real return stack is a precious and limited resource and using it takes time and money.

Factor to mix control flow when you can keep it simple and need to make smaller and faster code than code factored into multiple definitions. A good example of this is seen in the phrase BEGIN FOR ... IF ... XNEXT ; THEN ... NEXT ;

Factor to avoid the word ELSE. Use an EXIT somewhere after the word IF and a return somewhere after the word THEN. This avoids a unneeded jump. It also makes coding and debugging easier. Use ELSE when you need to reduce the nesting overhead.

Factor aliased words to not need an extra jump.

Factor deferred words to do no more than a jump or nop.

Factor your Forth code by doing a profile of your code that uses a base addess and an offset to use an offset of zero for the most frequently used access. This way one can use the mathematical technique of cancelation to eliminate loading the zero offset and doing the unnecessary addition with the zero. An optimizing compiler is not likely to reorder by knowing which thing should enjoy the benefit of being at offset zero.

Factor your Forth code by doing a profile of your use of structures to eliminate ninety percent of the base plus offset calculations. Reorder the fields in the structure so all the most frequently used code to do sequential accesss most of the time when you can and then instead of just caching your pointer use an auto-increment access instruction where possible to reduce the amount of code needed. Factor to fold the pointer arithmetic into the access when you can.

Factor with modern coding techniques rather than following thirty or fourty year old practices that were made obsoleted long ago. Use -IF and -UNTIL and -WHILE that test sign as you would use the words IF and UNTIL and WHILE to test for zero on stack. Learn why they obsolete so many old Forth words. Learn why they are actually more powerful that the older traditional words that test for zero.

Factor to embrace what hardware wants to do. Hardware and software do not exist in isolation of one another. But do not abandon Forth style to embrace other languages by adding extra complexity to Forth just because the hardware may have extra complexity needed by other languages and do the simpler thing.

Factor to use definition fall-through at the end of one Forth word into a Forth definition that would otherwise be called at the end of the first definition. This makes code clearer, simpler, shorter, and faster where possible.

Factor to use tail-recursion to make code simpler, shorter and faster and to conserve valuable return stack space. Don't make a call to a word that returns to a return when a tail-recursive jump can be done instead.

Factor to use features of other languages in Forth code when it is a natural fit to the problem but don't try to embed other languages inside of Forth.

Factor to conserve power. If two sequences of code are the same size and same speed factor for the one that uses less power. This a variation of the idea that you should know the cost of each word you use.

Factor based an really understanding the real requirements.

There is a lesson to be learned from the story of the CEO and the shredder.

A manager at a company sees the CEO standing in front of a shredder looking perplexed. He asks, "Can I help you?"

The CEO says, "I don't know how this thing works."

The manager says, "You just feed your papers in here like this." (whir whir)

The CEO says, "Thanks. I only need two copies."

4/28/2011