Chuck Moore to SVFIG on 4/23/93. A video of this presentation is available in the store.
Chuck Moore's presentation to Silicon Valley Forth Interest Group on 4/24/1993.
The champagne is welcome. It's been a long, long time. One of the things I am proudest of in having made this chip work is sticking it out for so long. The project was scheduled for six months to a year, and it has lasted more like two years. That has strained the resources almost beyond credibility.
But we do have chips that work. They don't work perfectly, but they work well enough to justify celebration. You've seen this board before. This is not one of the current chips; this is an old one that doesn't work, so if anyone wants to come up here and touch it, they can. I was showing this board off at a Silicon Valley company and people expressed shock and amazement that I would actually let them touch a CMOS chip. I hadn't really paid that much attention to that. I handle these things all the time myself, usually with a strap on my wrist. But the world is aware that you don't casually handle valuable silicon. This is now functional and you are welcome to come up and look at it. I also have in this little box one of the naked die. Amazingly enough, this has proved one of the most convincing selling points: you show someone that this is really a small chip. If you have not seen it, do come up and look at it, because it's really small. People think of chips as being the size of your thumbnail or maybe your little fingernail; this is small. (someone holds up a pocket microscope) Yes, a microscope would help to see the detail.
What does it mean that I've got a chip that works? You have seen the instruction set. The instructions all work. They all work at 100 MIPS. I think I have executed everything, I think I have looked at everything, but I can't read and write DRAM. That slows things down, essentially: I can't get out of the boot code. The chip is supposed to boot up from 8-bit SRAM, typically a PCMCIA card, and it will do that quite happily. When it comes to DRAM, I have badly generated the timing signals; they are much too fast. The DRAM doesn't have a chance to react properly before the signals go away. This is compounded by the fact that the rise times on my output signals are much slower than I had expected. Because the timing is too fast and the signals come out of the chip too slowly, in some cases the signals are seen rising and then they go away. They never have a chance to reach the state where they operate the DRAM properly.
I want to talk a little bit about why that is. I imagine that what you are interested in is two things: first, what the chip does, and second, why it didn't do it long ago.
Actually, the history of the mistakes I have made is a story in itself. I don't know if it will ever be told, because it is almost as if I deliberately tried to forget the mistakes, and I have only a vague record of what they were. But someday perhaps the story will be out there.
The first thing I want to do is show you some code. I actually started coding OK for the P21 about a week ago and it is very interesting. Those of you who have bought OK or have looked at OK know that the 386 OK is really a prototype for the P21. This is the one that is going to count. This is the one with all the interesting characteristics. It is written in Forth, in FPC actually.
I must say that I really regret the absence of blocks. I've coded this as a great big long file. It has no structure except the structure that I forced myself to impose. The only reasonable structure in that regard is that these things should be sort of pages; each of these ought to be a sort of self-contained entity. It is not the same as being three blocks.
This is not the first page; this is the second page. This is part of the video output. The word FRAME fills memory with one frame buffer. It calls all the words above it. Once you have constructed the video buffer, you can throw away the code and overwrite it.
But the way video is set up, you have to generate horizontal retrace, you have to generate vertical retrace, and you have to put in the colorburst for NTSC video.
You see all those constants with commas; this is obsolete code, it is clearly a week old, it can't be current. The commas will be coded as store-pluses. I am just laying down a pattern of data in memory. It's Forth. This is the most Forth code I have written in years and I must recommend the language to you. (laughter)
It's very highly nested. But I think you can see that FRAME calls VR2, VR2 calls HR, and HR calls 0S. The return stack on P21 is only 4 deep, so that code won't work, but this is my first cut. You can't nest more than four deep. That is sort of a form of discipline on a programmer that is either good or bad depending on how you look at it. It's not particularly bad. You probably shouldn't write infinitely deeply nested code that needs a larger stack; you should structure things more shallowly, and P21 enforces that. The next version of P21 will probably have larger stacks. There always seems to be a way of doing it, and one way of doing it, of course, is to pop things off the stack, store them off to memory, and then bring them back when you are through. Four is all I could fit into the space there. Actually, that dates back about a year; since then I have more compact registers, so I could get six now, but I haven't gone back and made that change yet.
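To make the depth limit concrete, here is a small Python model of a 4-deep return stack together with the workaround described above: pop entries off, park them in memory, and push them back when the deeper calls are done. The class and method names are hypothetical, and raising an error on overflow is purely a modeling choice for illustration; it is not a description of what the P21 hardware actually does when its stack overflows.

```python
class ReturnStack:
    """Toy model of a fixed-depth hardware return stack (hypothetical names)."""

    DEPTH = 4  # P21 return stack depth quoted in the talk

    def __init__(self):
        self.cells = []   # on-chip entries; the top of stack is the last item
        self.memory = []  # parked entries, standing in for RAM

    def push(self, addr):
        # Modeling choice: refuse to overflow instead of silently losing an entry.
        if len(self.cells) == self.DEPTH:
            raise OverflowError("return stack is only %d deep" % self.DEPTH)
        self.cells.append(addr)

    def pop(self):
        return self.cells.pop()

    def spill(self, n):
        # Pop the top n entries off the stack and store them to memory.
        for _ in range(n):
            self.memory.append(self.cells.pop())

    def fill(self, n):
        # Bring the n most recently parked entries back, in their original order.
        for _ in range(n):
            self.cells.append(self.memory.pop())


rs = ReturnStack()
for addr in (1, 2, 3, 4):  # four nested calls fill the stack
    rs.push(addr)
rs.spill(2)                # park two levels in memory before nesting deeper
rs.push(5)
rs.push(6)                 # two more levels now fit
rs.pop()
rs.pop()
rs.fill(2)                 # restore the parked levels on the way back out
print(rs.cells)            # [1, 2, 3, 4]
```

The cost of the workaround is explicit: every parked level is a memory store going in and a memory fetch coming out, which is why shallow nesting is the cheaper habit.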
This word 6! is worth talking about. It has six words of store-pluses. It illustrates the fact that we are talking about DRAM: as long as you stay in the same 1K page you get 40 ns access. If you go to a different page it costs you three times the access time, so you want to stay on the page. For code like this there is zero probability that the code will be on the same page as the data, so every time you go from one instruction word to the next you go off page. You fetch an instruction slowly, then the first data access is slow, and the next three are very fast. What I'm doing here is filling an array with zeros. This is the word CLEAR, which is supposed to clear screen memory so you get a blank screen. I want to do this 96 times: I have 96 words in a row constituting 384 pixels, 4 per word. I do 24 stores 4 times.
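The page-miss arithmetic can be written out as a rough cost model. This is a sketch under stated assumptions: 40 ns for an on-page access, three times that (120 ns) off page, four !+ instructions packed into each instruction word, and call/return overhead ignored. The function names are hypothetical.

```python
ON_PAGE_NS = 40               # access within the same 1K DRAM page
OFF_PAGE_NS = 3 * ON_PAGE_NS  # "three times the access time" when changing pages


def word_of_stores_ns(stores_per_word=4):
    """Cost of one instruction word full of !+ when code and data sit on different pages."""
    fetch = OFF_PAGE_NS                        # instruction fetch: code page, page miss
    first_store = OFF_PAGE_NS                  # first store: back to the data page, page miss
    rest = (stores_per_word - 1) * ON_PAGE_NS  # remaining stores stay on the data page
    return fetch + first_store + rest


def clear_row_ns(words=96, stores_per_word=4):
    """Rough time to zero one 96-word row (384 pixels at 4 per word)."""
    instruction_words = words // stores_per_word
    return instruction_words * word_of_stores_ns(stores_per_word)


print(word_of_stores_ns())  # 360
print(clear_row_ns())       # 8640
```

Under these assumptions the two page misses in each word (240 ns) cost twice as much as the three on-page stores together (120 ns), which is why packing as many stores as possible into each word matters.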
I don't need a loop counter. A loop counter has two drawbacks: first, it is slow, and second, it takes stack space. This doesn't use any return stack locations. Just for the record, if you want to store N words, or if you want to do a thing N times, the most efficient way is to do it the square root of N times in a routine and then call that routine the square root of N times. In this case N is 24; the square root of that is about 5, and 5*5 is close to 6*4. You can either do this 4 times and repeat it 6, or do it 6 times and repeat it 4. If you want to do it 25 times, you can do it 24 times and then one more time. So there is a lot of freedom in doing this; it is a sort of subroutine looping, or subroutine counting. I plan to use it a lot, because it is more efficient than any other kind of looping: there is no overhead for the loop counter. So FRAME is done once and CLEAR is done occasionally, and when it is done you want it to happen quickly. At the top is a cursor word. I can set the cursor to the upper left or the lower left or the middle. Having set the cursor somewhere, I can TAB N positions on the screen.
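The square-root rule follows from code size. Writing a copies of an action inside a routine and then writing b calls to that routine gives N = a × b repetitions with no counter, at a cost of roughly a + b cells of code; for a fixed product, a + b is smallest when a and b are both close to √N. The Python sketch below (the name best_split is hypothetical) searches the exact factor pairs of N for that minimum.

```python
import math


def best_split(n):
    """Return (inner, outer) with inner * outer == n and inner + outer as small as possible.

    inner = copies of the action written out inside one routine,
    outer = calls to that routine written out in sequence.
    """
    best = (1, n)
    for inner in range(1, math.isqrt(n) + 1):
        if n % inner == 0 and inner + n // inner < sum(best):
            best = (inner, n // inner)
    return best


print(best_split(24))  # (4, 6): do it 4 times, repeat that 6 times (or the reverse)
print(best_split(96))  # (8, 12)
```

When N has no good factor pair (a prime such as 23 only splits as 1 × 23), the approach from the talk applies instead: cover a nearby product and then do the remainder separately.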
Carriage return (CR) falls into TAB. This is headerless Forth code. I think that would be essential to any future Forth system that I do and it is done that way in OK.
Sometimes two or three words fall into each other. It saves you a jump instruction or something. It really makes you feel good, it gives you a warm feeling. Carriage return (CR) falls into TAB and that tells it the cursor position.
The constants are most amusing. I'll come to some better examples later, but essentially you just have to construct constants in programs, often with very little rhyme or reason as to why the number is what it is.
Here is the menu code. Basically this is the key to OK. OK is driven by menus which have seven entries in them. This is how much code it takes to implement them.
VARIABLE menu
: MENU POP DUP PUSH ; + menu A! ! ;
: BUTTON ( - n 2) 700DF COM A! 7 AND ;
: blank CLEAR
: KEY ( - x n 4) 1000 BEGIN VRI 2* C=1 UNTIL DROP
  10 B9B01 BEGIN BUTTON T=1 IF ; THEN DROP VRI +* C=1 UNTIL 2/ ;
: -- menu KEY + PUSH ;
: label LL menu A@ @ 8 + A!
: LINE 1000 E8000
: TYPE ( a n) BEGIN @+ SYM *+ C=1 UNTIL DROP DROP ;
: LABEL label -- ;
: LIST ( a - x) UL A! 100 BEGIN LINE OR 2* C=1 UNTIL ;
The word dash-dash (--) is what you execute when you don't care what happens. You are finished and are going to go back and wait for another keystroke.
I tend to use lower case to represent variables nowadays and upper case to represent words. That doesn't work in FPC. Is there a way to make FPC case sensitive? Because I think it is foolish to throw away half the characters. On the other hand, I was trying out a version of PolyForth recently and it wouldn't work, because I wasn't typing in caps. That surprised me. We learn these things, but we only have to learn them once every few years.
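The cost of a case-insensitive dictionary is easy to see in a toy model. In this Python sketch (a hypothetical Dictionary class, not how FPC actually searches its dictionary), folding names to upper case makes the variable menu and the word MENU the same key, so looking up menu finds the word instead of the variable.

```python
class Dictionary:
    """Toy Forth-style dictionary keyed by word name (hypothetical)."""

    def __init__(self, case_sensitive):
        self.case_sensitive = case_sensitive
        self.words = {}

    def _key(self, name):
        # Case folding maps "menu" and "MENU" to the same key.
        return name if self.case_sensitive else name.upper()

    def define(self, name, definition):
        self.words[self._key(name)] = definition

    def find(self, name):
        return self.words.get(self._key(name))


folded = Dictionary(case_sensitive=False)
folded.define("menu", "variable")
folded.define("MENU", "word")
print(folded.find("menu"))  # word: the variable can no longer be reached by name

exact = Dictionary(case_sensitive=True)
exact.define("menu", "variable")
exact.define("MENU", "word")
print(exact.find("menu"))   # variable
```

A real Forth keeps both headers and its search simply finds the newer one first, but the effect on lookup is the same as the overwrite shown here.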
BUTTON reads something, in this case something from 1 to 255. (questions and comments on FPC and PolyForth) The word KEY calls the word BUTTON every vertical retrace interval. So basically it will actually hit every 8 (noise). At the beginning of KEY you see a BEGIN loop that waits for 8 VRI. It is the most efficient way, or at least a reasonable way, to do a loop that repeats 8 times: I put that 1000 on the stack and I am going to shift it left until it reaches the carry position. The P21 now has two conditional jump instructions; there are no skip instructions any longer. On P21 you can jump if T equals zero, which is what you need for IF, or you can jump if carry equals zero, carry being bit 20, the leftmost bit. But it really isn't a carry; it is just the 21st bit.
The syntax I am using here is C=1 UNTIL. When I write 2* C=1 UNTIL, that is the jump on C equals zero. So I wait for 8 VRI and look to see if there is a button, and I do that B9B01 times. That is the other kind of loop, where I do an add instead of a shift. A shift only works up to 20 times; it lets you count 20 times (for a loop counter on a 20-bit machine). But a shift only requires one data stack position. A longer loop than that takes two data stack positions: an increment of 10 and a count of B9B01. Instead of doing 2* C=1 I do +* C=1. +* is a conditional add that does not consume either of its arguments. It is used as a multiply step. It requires that the low bit of the top of the stack equals one, hence the B9B01. I successively add 10 to that until I get overflow. When I get overflow I know I have done this thing 5000 times. I am timing out 5 minutes.
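Both counting loops can be modeled in Python to check the arithmetic. This sketch assumes a 21-bit T register whose bit 20 is the "carry" tested by C=1 UNTIL, and models +* as: if the low bit of T is 1, add the second stack item to T, leaving the second item in place. The function names are hypothetical; the constants 1000, 10 and B9B01 are the ones from the talk, written as hex.

```python
CARRY = 1 << 20         # bit 20, the 21st bit, tested by C=1 UNTIL
T_MASK = (1 << 21) - 1  # keep T to 21 bits


def shift_loop(t):
    """BEGIN ... 2* C=1 UNTIL: count passes until a left shift reaches bit 20."""
    passes = 0
    while True:
        passes += 1
        t = (t << 1) & T_MASK
        if t & CARRY:
            return passes


def plus_star_loop(increment, t):
    """BEGIN ... +* C=1 UNTIL: count passes of a conditional add until bit 20 is set."""
    passes = 0
    while True:
        passes += 1
        if t & 1:  # +* only adds when the low bit of T is 1, hence the odd count
            t = (t + increment) & T_MASK
        if t & CARRY:
            return passes


print(shift_loop(0x1000))             # 8
print(shift_loop(0x1))                # 20, the most a single shift counter can count
print(plus_star_loop(0x10, 0xB9B01))  # 18000
```

If each pass takes one vertical retrace at a nominal 60 Hz field rate (an assumption), 18000 passes come to 300 seconds, a five-minute time-out. Note that an even starting count would never trigger the add, so the loop would never end; that is the reason the count must have its low bit set.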
(question about the numbers and Chuck mentions that he is dealing with five digit hex numbers on a twenty bit machine.)
If you look at the BEGIN BUTTON T=1 loop there, that is the screen saver. It is amusing that a whole industry has been created around something that takes six instructions. (laughter) And it will put all kinds of pretty patterns on your screen. I was at court recently watching the clerk of the court and her computer, and I don't think I have ever actually seen a computer that didn't have a screen saver display on it. And they are extremely distracting. This is not what you want in an office environment, having fish on the screen or fireworks exploding. You really do want the screen to go blank. That is what this does. After a 5-minute time-out, after 5000 vertical retraces, it comes back to you with a keystroke of zero. So the 0 entry in the menu is what you do on a time-out, and 1 through 7 are what you do if you get a keystroke.
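The menu convention can be sketched as a dispatch table of eight entries: entry 0 is taken on a time-out, entries 1 through 7 on a keystroke. On P21 the table is a row of jumps compiled right after the call to MENU, and -- adds the key to the table address; the Python below only models that indexing. The function dispatch and the sample actions are hypothetical.

```python
from typing import Callable, List, Optional, Sequence

Action = Callable[[], str]


def dispatch(menu: Sequence[Action], button: Optional[int]) -> str:
    """Run one menu entry: None (time-out) selects entry 0, keys 1-7 select the rest."""
    if len(menu) != 8:
        raise ValueError("a menu has 8 entries: the time-out plus keys 1 through 7")
    key = 0 if button is None else button
    if not 0 <= key <= 7:
        raise ValueError("keys run from 1 to 7")
    return menu[key]()


def blank() -> str:
    return "blank screen"


def main_menu() -> str:
    return "main menu"


def ignore() -> str:  # plays the role of --: ignore the key and wait for another
    return "wait for next key"


sample_menu: List[Action] = [blank, main_menu] + [ignore] * 6

print(dispatch(sample_menu, None))  # blank screen
print(dispatch(sample_menu, 1))     # main menu
print(dispatch(sample_menu, 5))     # wait for next key
```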
Then we come to the code for LIST. LIST calls LINE. LINE falls into TYPE. TYPE calls SYM. SYM should probably be called EMIT. Here is all the code for putting text on the screen. LABEL puts one line of text at the bottom line of the screen.
: 2/7 2/ 2/ 2/ 2/
: 2/3 2/ 2/ 2/ ;
: .H ( n) F AND 10 -OR SYM FFFF8 TAB ;
: .ADR .2
: .3 ( n) .1
: .2 ( n) .1
: .1 ( n) DUP .H 2/ 2/3 ;
: .BYTE ( n) 0 SYM .2 ;
VARIABLE dump
: DUMP dump A! LL 8 CLEAR BEGIN RED A .ADDR @+ .BYTE .BYTE .BYTE .BYTE FF3F0 + 2* C=1 UNTIL ;
: RING MID digit A@ @ 7 + 2* 2* TAB 3F +SYM ;
: +DUMP ( n) PUSH dump A! @ + ! POP ;
FORWARD MEMORY FORWARD CHANGE
VARIABLE digit
CREATE mask F , F0 , F00 , F000 , F0000 ,
: +H 11111
: +- digit A! @ mask + A! @ AND OVER COM OVER AND PUSH OVER PUSH AND + POP AND POP -OR ! CHANGE ;
: -H FFFFF +- ;
: +1 digit A! @ 4 -OR T=1 IF @ 1 + ELSE 1 +DUMP THEN ! CHANGE ;
: -1 -1 digit A! @ T=1 IF + ELSE DROP +DUMP 4 THEN (cut off on projector)
There are going to be two very important words in the system, 2/7 and 2*7, which just give you somewhere to jump: you do that many shift rights or shift lefts. The way the system works... where is it called? There you are. 2/3 preceded by 2/ gives you a shift of four. The way I have formatted this, each line gives you four instructions; it represents one word. A call can be preceded by a single instruction in the same word, so if I say 2/ 2/3, that is one word. You do one 2/, then you jump to the code to do three more. That is probably not efficient; I could string out the four in place, but this way you get the semicolon for free. You tack a semicolon onto a call and you get a jump; tail recursion is a very handy thing to do. So once again you get a free semicolon, and in many, many cases you fall through.
.1 puts out a hex digit, .2 puts out two of them, .3 puts out three of them. All these numbers are being put out from right to left; it is not the most efficient way of doing it, but it is the simplest way of doing it.
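The right-to-left digit printing can be modeled like this. Each step of the sketch does what .1 appears to do: show the low hex digit, shift right four bits (2/ 2/3), and step the cursor back one character. The step back stands in for FFFF8 TAB, whose units the talk doesn't spell out, and the function name dot_digits is hypothetical.

```python
HEX = "0123456789ABCDEF"


def dot_digits(n, count, width):
    """Write `count` hex digits of n right to left, ending in the last of `width` columns."""
    line = [" "] * width
    col = width - 1
    for _ in range(count):        # .2 is .1 twice, .3 is .1 three times, .ADR is five
        line[col] = HEX[n & 0xF]  # like .H: mask with F AND, then emit the digit
        n >>= 4                   # 2/ 2/3: a shift of four
        col -= 1                  # move the cursor back one character
    return "".join(line)


print(dot_digits(0x12345, 5, 5))         # 12345, like .ADR
print(dot_digits(0xAB, 2, 4).strip())    # AB, like .2
```

Printing from the low digit up means no division and no digit buffer: the number is shifted as it is consumed, which is the simplicity the talk is after.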
Then there is the code for DUMP. For DUMP, the screen format is 24 characters by 18 lines with this kind of NTSC video. That gives you characters that are 16 pixels wide by 26 scan lines high. Vertical resolution is ridiculously over-implemented, but if you take advantage of that it eliminates scan lines.
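The screen geometry can be cross-checked with a little arithmetic. This sketch assumes the 96-word row cleared by CLEAR is one full display line and that each character row is 26 scan lines; the constant names are hypothetical.

```python
WORDS_PER_ROW = 96   # from CLEAR: 96 words in a row
PIXELS_PER_WORD = 4
COLUMNS, LINES = 24, 18
CHAR_HEIGHT = 26     # scan lines per character row

pixels_per_row = WORDS_PER_ROW * PIXELS_PER_WORD  # 384
char_width = pixels_per_row // COLUMNS           # 16
text_scan_lines = LINES * CHAR_HEIGHT            # 468

print(pixels_per_row, char_width, text_scan_lines)  # 384 16 468
```

Under these assumptions 18 rows of text use 468 scan lines, a little under the roughly 480 visible lines of an NTSC frame.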
DUMP puts up 18 words of address and data, from bottom to top. RING puts a ring around one of those digits, and then these words at the bottom let you move the cursor and alter the digit under the ring.
This is the trickiest code I have written here: +-. I didn't know what to call it. It is used to either increment or decrement one of the five hex digits in a word. +1 and -1 increment and decrement the cursor position. The way I am doing it here turned out to be the easiest way: the center line on the screen is the one you are editing. Each time, you slide the display up or down to get the line into the center, and then you alter the digits in that word. It is simpler to do it that way than to calculate where on the screen the cursor should be. And I am always going to do it the simplest way.
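What +- computes can be modeled arithmetically. The sketch below is an interpretation of the listing, not a line-by-line translation of the stack code: the mask table selects one hex digit, +H supplies 11111 and -H supplies FFFFF, and ANDing that with the mask gives +1 or, since F is -1 modulo 16, -1 in the selected digit. Adding and re-masking changes only that digit, so a carry or borrow never spills into its neighbors. The function name plus_minus is hypothetical.

```python
WORD = 0xFFFFF                              # 20-bit word
MASKS = [0xF << (4 * d) for d in range(5)]  # F, F0, F00, F000, F0000, as in `mask`
PLUS_H, MINUS_H = 0x11111, 0xFFFFF


def plus_minus(word, digit, step):
    """Add +1 (step=PLUS_H) or -1 (step=MINUS_H) to one hex digit, with no carry out."""
    mask = MASKS[digit]
    delta = step & mask               # 1 or F, positioned in the selected digit
    changed = (word + delta) & mask   # wraps within the digit; any carry is dropped
    return (word & ~mask & WORD) | changed


print(hex(plus_minus(0x1234F, 0, PLUS_H)))   # 0x12340: F wraps to 0, no carry
print(hex(plus_minus(0x12340, 0, MINUS_H)))  # 0x1234f: 0 wraps to F, no borrow
print(hex(plus_minus(0x12345, 2, MINUS_H)))  # 0x12245
```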
: CHANGE DUMP RING MENU LABEL ;
MEMORY ; MEMORY ; -- ; -- ; -1 ; +H ; -H ; +1 ;
0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 ,
FORWARD MAIN FORWARD CODE
: +10 10
: +N +DUMP MEMORY ;
: -10 FFFF0 +N ;
: +1000 1000 +N ;
: -1000 FF000 +N ;
: BLANK blank
: MAIN 0 LIST MENU LABEL ;
BLANK ; MAIN ; RING ; CODE ; -10 ; +1000 ; -1000 ; +10 ;
0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 ,
: CODE
: BLANK blank
: MAIN 0 LIST MENU LABEL ;
BLANK ; -- ; -- ; -- ; -- ; -- ; -- ; -- ;
0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 ,
Here are the three menus I have implemented so far. The MAIN menu here goes to one of seven which I haven't listed. The MEMORY menu calls DUMP and displays memory. The CHANGE menu calls DUMP, shows you the cursor, and lets you change things.
The interesting thing about this is that I have to make forward references. I don't know if FPC does that for you. (comment from someone about DEFER)
Here is where I come back to "the map is not the territory." I have mentioned this before, but here is a good concrete example of it. I have to say FORWARD MAIN because I am going to be referring to MAIN in MEMORY, and MAIN has not been defined yet because MAIN uses MEMORY. I can't do this in any sequence that makes everything work with backward references.
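One way to handle such a reference, in the spirit of the DEFER suggestion from the audience, is an indirection cell that is declared first and filled in once the real definition exists. This Python sketch is hypothetical (Forward is not the mechanism FPC or OK actually uses); Python resolves names late anyway, so the cell is there only to make the declare-then-resolve step explicit.

```python
class Forward:
    """A named placeholder that can be referenced before its definition exists."""

    def __init__(self, name):
        self.name = name
        self.target = None

    def resolve(self, definition):
        self.target = definition

    def __call__(self):
        if self.target is None:
            raise RuntimeError(self.name + " was referenced but never resolved")
        return self.target()


MAIN = Forward("MAIN")         # like FORWARD MAIN


def memory():
    return ["dump", MAIN]      # MEMORY's menu can already name MAIN


def main():
    return "main menu, can reach " + memory()[0]  # MAIN uses MEMORY


MAIN.resolve(main)             # the forward reference is filled in here
print(memory()[1]())           # main menu, can reach dump
```

The price of this style is one extra indirection on every call through the cell; a compiler that patches the final address in place avoids that at run time.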
I remember looking at assembler programs for almost any computer. The way people write assembler programs is that they write the code and then at the end of the code they have the data space. And they guarantee that any reference to data space is a forward reference, so they guarantee that they make excellent use of the one-and-a-half or two-pass compiler. If they had put their data first, the whole thing would be a lot simpler, but it just doesn't happen.
There is a sequentialness to a source code description. You start at the top and you read down. I would argue that you should start at the bottom and read up. But in either case you are starting somewhere and moving. And that is totally unrealistic. A program does not have a beginning and an end; a program is in random access memory. It makes very little difference which direction you go from any position within that program.
And yet we have this very strong bias for moving sequentially through it. The map is not the territory. The description of a program is very very different from the program, different in a philosophical sense.
(comment about hypertext editors and browsers)
The thing is, of course, a two-pass compiler can resolve all of these, so there is really no problem. But what we have trained ourselves to do is start at the beginning of a page and bring in another page. There are insidious effects of this predisposition which I am sure we don't appreciate. I would argue that we want to avoid that wherever possible. Let the program just be whatever shape it is, and deal with it that way rather than transforming it into a different shape and learning to deal with it in that fashion.
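A minimal sketch of the two-pass idea: the first pass only assigns addresses to names, and the second emits code with every reference, forward or backward, already known. The tiny instruction format and the function name assemble are invented for illustration.

```python
def assemble(program):
    """program: list of ("label", name), ("call", name) or ("op", text) items."""
    addresses, pc = {}, 0
    for kind, arg in program:   # pass 1: assign an address to each label
        if kind == "label":
            addresses[arg] = pc
        else:
            pc += 1             # every call or op occupies one cell
    code = []
    for kind, arg in program:   # pass 2: emit, with all labels known
        if kind == "call":
            code.append(("call", addresses[arg]))
        elif kind == "op":
            code.append(("op", arg))
    return code


program = [
    ("label", "MEMORY"), ("op", "DUMP"), ("call", "MAIN"),  # forward reference
    ("label", "MAIN"), ("op", "LIST"), ("call", "MEMORY"),  # backward reference
]
print(assemble(program))  # [('op', 'DUMP'), ('call', 2), ('op', 'LIST'), ('call', 0)]
```

The order of definitions in the source no longer matters to the result; only the cost of reading the source twice remains.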
The closer we get to our computers, the happier we are going to be in the future, if there is any way that we can ever be happy with computers.
This is the amount I have coded so far and I think you can see that it is a little sketchy. I just wanted to show you because this is the greatest bulk of P21 code (coded by Chuck) the world has ever seen.
These lines are supposed to be the menu lines that go on the bottom of the screen. Each of these can have whatever you want to put on the screen to clue you in as to what keys do what.
(question about how Chuck has evaluated these ideas)
This has all been implemented on a 386 so I have been living with it for a couple of years. And before that it was implemented on a ShBoom and before that on a Novix. I have been working on this for a long time. These ideas are nicely evolved.
This code is really what it has all been aimed at. The others were crude simulacra: useful, because they show you how to implement such a thing, but crude; they are not polished, because I could never seem to invest the effort in the 386, probably because it is the worst computer ever invented.
This is a good text description of it. The reason I am doing this is probably to short-circuit the development cycle, probably to get booted up. Once I get this code running on P21, I will throw it away and live with the code on P21.
I have to basically add three more menus to this code to get what I want. The first is the big bit editor that lets me construct characters. The second is the screen editor that lets me construct the things that LIST lists, so that I can put pictures on the screen. The third is the code editor that lets me display the memory dump as decompiled code and edit the code. When those three things are added, which will about double the size of what I have shown you, OK will be complete.
I can get closer to the hardware on a 386 this way. And I tell you frankly that I detest every software package for the 386 with one exception, TETRAS. TETRAS looks like it was done simply. But everything else is just ...
Question: Chuck, I have been away from Forth for a year or two. What is the purpose of the P21? What are you trying to do with it? Is it purely a video chip or is it a Forth engine?
Chuck: It has two purposes. First is to implement my CAD package, which is OK plus silicon layout. The second is to provide a development system for people to judge the value of this chip for their purposes and to encourage them to commission me to design a chip that suits their purposes. It is both a demonstrator of the technology and a tool that I can use for development.
Hopefully someone will pick up this chip and decide they like it and have five million of something to install them in. But that person doesn't exist yet. So this is a speculative investment on the part of myself and Ting. I want to commend and thank Ting for his support for the last couple of years.
Question: Is the P21 suitable for a controller, or a laser printer, or a generalized computer or just what exactly ..., all the above?
Chuck: It is a general-purpose problem solver. It has the ability to access a very large memory, a million words of memory. It is very fast, 100 MIPS. It need not cost very much; in quantities of about 50,000 it should cost about $1 per part. I don't know of anything you could do with some current machine that you can't do with it. But it does have NTSC video output (composite, programmed for NTSC or PAL), which is a peculiarity that kind of focuses it on the video market. I would be glad to take out the video if you don't want that.
Part 2: (~1hr)
Questions and answers:
Timing and engineering details about pin drivers, inductance, coupling, packaging, BGA, transmission lines, timing signals, interconnect capacitance, parallel plate capacitance, threshold voltage, diffusion capacitors, conventional transistor models, SPICE equations, OKAD transistor models, async architectural characteristics, stories about engineers at Samsung, silicon black magic, NOVIX and ShBoom patents, future technologies, P21 details, debugging with a scope, ideas, financing, chips, pizza, etc.
2512 10th Street
Berkeley, CA 94710