Life beyond MuP21

Jeff Fox's report on Chuck Moore's presentation to the Silicon Valley Chapter of the Forth Interest Group at its meeting on May 27, 1995.

Chuck said that F21 was suffering from too many improvements. He said you just have to draw the line at some point and finish everything, but F21 had enough added to it that it was difficult to finish. It is complex enough that it has a sort of tail that you would like to cut off somewhere. The design required some chip-wide changes based on the results of the last P8 prototype run. The simulator currently says 400 mips internally on F21; at one point it was tuned to 500, but Chuck worried that he had pushed it too far. He said that there is generally no advantage to making the internal clock slower, even if memory is already the bottleneck, because if he slows things down the signals may not be as good. He said there was a sort of natural resonance for the design. This design has a delay of about 300 picoseconds for each of the primitive operations that occur on the chip, and it takes ten or twelve of these steps to execute a CPU instruction.

One of the things that slows Chuck down is the speed of simulation. It takes about 5 minutes to simulate 12 instructions, so testing is very slow. Chuck is still using the 486 version of OKAD and has only done a partial port of OKAD to MuP21. Since the hardware simulation in OKAD is billions of times slower than the actual chip, much of the testing will have to be done with real hardware. Chuck hopes he will have "participant clients" who will be involved in first-level testing. He originally thought that testing chips would be fairly simple: he would just write a little program to test all the instructions. But on a chip like F21, with three coprocessors all capable of interrupting the CPU, there are many conditions that need to be simulated to verify the design.
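A quick back-of-the-envelope check shows where "billions" comes from. This sketch assumes the chip actually runs at the simulator's 400 mips estimate; the constant names are ours, not OKAD's.

```python
# Rough check of the simulation-speed figures quoted above.
# Assumption: real silicon runs at the 400 mips the simulator predicts.
CHIP_IPS = 400e6           # instructions per second on the real chip
SIM_SECONDS = 5 * 60       # about 5 minutes of OKAD simulation...
SIM_INSTRUCTIONS = 12      # ...to cover 12 instructions

chip_seconds = SIM_INSTRUCTIONS / CHIP_IPS   # time the chip needs for the same work
slowdown = SIM_SECONDS / chip_seconds        # how many times slower simulation is

print(f"chip time: {chip_seconds * 1e9:.0f} ns")
print(f"slowdown: {slowdown:.1e}x")
```

The chip would finish those 12 instructions in about 30 ns, so the simulation is roughly ten billion times slower.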

Chuck said it was about a one-day job to stretch the F21 CPU design into the 32 bit P32 design. He no longer asks clients how deep they would like the stacks, because he gets the wrong answer. Replicating the stack structure and making it all work appears to be a problem. F21 has 17/18 deep stacks, but Chuck said that 8/12 may be a more reasonable stack depth to implement on the P32.

Chuck said he is working on the analog signals. He says that at the speeds he is dealing with, all the square waves have rounded edges and resemble sine waves anyway. He is interested in doing a modem and thinks there are opportunities to simplify things. He said that the idea that you want 200 transistors in an op amp seemed strange to him.

The traditional models of things are not always the most useful. The most important property of the capacitor model in OKAD is that one side of the capacitor carries the same amount of charge as the other side, with opposite sign. Transistors are also simplified in OKAD: the equations Chuck uses are much simpler and more accurate, and the "body effect problem" in SPICE is not a problem in OKAD.

The P8 design had a problem with the reset circuit. Chuck reasoned that a p-channel (PMOS) transistor with its gate tied high would be off at power up. Instead it came up powered on, because Chuck had not correctly guessed its power-up logic state. Chuck would very much like to be able to predict power-up states, because it would be very valuable, but he has found no documentation on the power-up state of these VLSI designs.

Chuck used to play chess and solve puzzles for fun. Today his work on chips stimulates the same parts of his brain that these activities once did. He enjoys what he does. Chuck says he almost feels that he has been breaking the laws of physics by making microprocessors that are five times faster than the industry-accepted speed of a flip-flop in a given process technology.

Chuck has been thinking about sub-0.1 micron technology. It will be a big problem for the industry, because at these scales interconnect delay becomes more significant than gate delay, and the old cell libraries and CAD programs are not designed to handle this. Chuck's answer to this problem is what he does now in OKAD: manual place and route. Chuck says the industry will need either very advanced AI software or engineers who get involved in the designs to solve these problems.

Chuck has also been thinking about nanotechnology. He feels that mechanical computers at that scale are not the only way to go; electrons or fields will still be faster.

In OKAD one of the units is the femtocoulomb, which is about 6000 electrons. This is getting close enough to think about single electrons. In the smaller transistors that Chuck makes there are about 3000 atoms along the edge of the transistor, and the oxide layer is about 60 atoms deep. So there are only about a billion atoms in a transistor, and the statistical distribution of dopants in transistors that small creates potential problems.
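The figures above can be checked with a little arithmetic. The atom count is only an order-of-magnitude reading: it assumes a square transistor footprint 3000 atoms on a side multiplied by the 60-atom oxide depth, which is our assumption since the talk gave only the edge and depth figures.

```python
# Back-of-the-envelope numbers behind the OKAD charge unit and transistor size.
ELEMENTARY_CHARGE = 1.602176634e-19   # coulombs per electron (exact SI value)
FEMTOCOULOMB = 1e-15                  # coulombs

electrons_per_fc = FEMTOCOULOMB / ELEMENTARY_CHARGE

# Assumption: square footprint, 3000 atoms per edge, 60-atom-deep oxide.
EDGE_ATOMS = 3000
OXIDE_DEPTH_ATOMS = 60
atoms = EDGE_ATOMS * EDGE_ATOMS * OXIDE_DEPTH_ATOMS

print(f"electrons per fC: {electrons_per_fc:.0f}")   # electrons per fC: 6242
print(f"atoms: {atoms:.1e}")                        # atoms: 5.4e+08
```

About half a billion atoms under these assumptions, the same order of magnitude as the billion quoted.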

Chuck feels he is on the right path with integrating systems onto a chip. Where other people are thinking megatransistors, Chuck is thinking kilotransistors. He said that memory is also expensive, and is getting relatively more expensive all the time. The use of compact instruction sets and on-chip I/O coprocessors makes for very efficient use of memory.

Chuck has been dealing with NASA and the Air Force on projects like satellites and the Mars rover. Chuck said that he perceives that the antagonism to Forth has faded, and they aren't locked into Ada. "They have this system that is even worse than C." There are still waivers, although they were supposed to have disappeared ten years ago.

The home-page branch instructions on the F21 are a good example of how an idea evolves. The feature ended up being much more involved than Chuck had planned. "It was worth it, but it was a complication."

The MuP21 could put a branch instruction in either of the first two instruction slots. If you put it in the first slot, it still used only the last ten bits for an on-page branch, so the second five bits in the word were just wasted. If you put it in the second five bit instruction slot, that frees up the first slot for some other instruction. Chuck says that his code uses a literal in the first slot and a call afterward.
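The difference between the two placements can be sketched by packing 20 bit words made of four 5 bit fields. This is a simplified model, not MuP21's real encoding: slot 0 is assumed to occupy bits 19-15, the branch address is the low 10 bits, and the opcode values are hypothetical placeholders.

```python
# Sketch of MuP21 branch placement in a 20-bit word of four 5-bit slots.
# Assumptions: slot 0 is bits 19-15, the branch address is the low 10 bits,
# and BRANCH / LIT are hypothetical opcode values, not MuP21's real ones.
WORD_BITS = 20
SLOT_BITS = 5
ADDR_MASK = (1 << 10) - 1

BRANCH = 0b00010   # hypothetical branch opcode
LIT = 0b01000      # hypothetical literal opcode

def branch_in_slot0(addr):
    # Slot 0 holds the branch; bits 14-10 (slot 1) are left unused.
    return (BRANCH << 15) | (addr & ADDR_MASK)

def branch_in_slot1(first_op, addr):
    # Slot 0 holds another instruction; slot 1 holds the branch.
    return (first_op << 15) | (BRANCH << 10) | (addr & ADDR_MASK)

def slots(word):
    # Split a word into its four 5-bit fields, slot 0 first.
    return [(word >> (WORD_BITS - SLOT_BITS * (i + 1))) & 0x1F for i in range(4)]

w0 = branch_in_slot0(0x123)
w1 = branch_in_slot1(LIT, 0x123)
print(slots(w0))   # [2, 0, 9, 3]: the second field is wasted
print(slots(w1))   # [8, 2, 9, 3]: the first field does useful work
```

Both words branch to the same on-page address; only the second one gets a useful instruction out of the first slot.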

Chuck has been convinced that programs may get larger than 1K words, the reach of a 10 bit on-page branch.

The F21 will have three types of branch instruction. When a branch appears in the first slot, the rest of the word provides a potential 15 bit address. If the most significant of those address bits (a14) is 0, the instruction is a branch within a 14 bit range; 16K words hold up to 64K Forth instruction opcodes. If a14 is 1, it becomes a home-page branch. The home page can be set to the low page of either DRAM or high-speed SRAM in the configuration register on F21. So a branch can take a 10 bit argument and have an opcode before it, use a 14 bit page argument, or branch to the home page in a different address space. Since the return stack is a full 21 bits wide, the return instruction will always get you back.
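The three branch forms can be summarized as a small decoder. This is only a sketch under stated assumptions: the branch field is taken as the low 15 bits of the word for a slot 0 branch and the low 10 bits for a slot 1 branch, on-page and 14 bit branches are assumed to replace the low bits of the current address, and HOME_PAGE_BASE is a hypothetical stand-in for whatever page the configuration register selects.

```python
# Sketch of the three F21 branch forms described above.
# Assumptions: slot 0 branches use the low 15 bits of the word, slot 1
# branches the low 10 bits; on-page and 14-bit branches replace the low
# bits of the current address; HOME_PAGE_BASE is hypothetical.
HOME_PAGE_BASE = 0x100000   # hypothetical base of the configured home page

def branch_target(word, slot, current):
    if slot == 1:
        # 10-bit on-page branch: an opcode can precede it in slot 0.
        return (current & ~0x3FF) | (word & 0x3FF), "on-page"
    if slot == 0:
        field = word & 0x7FFF              # potential 15-bit address, a14..a0
        if (field & 0x4000) == 0:
            # a14 = 0: branch within a 14-bit (16K word) range
            return (current & ~0x3FFF) | field, "14-bit"
        # a14 = 1: home-page branch, possibly into another address space
        return HOME_PAGE_BASE | (field & 0x3FFF), "home-page"
    raise ValueError("this sketch only models branches in slots 0 and 1")

for word, slot, current in [(0x00155, 1, 0x2400),
                            (0x01234, 0, 0x1C000),
                            (0x04010, 0, 0x2400)]:
    target, kind = branch_target(word, slot, current)
    print(kind, hex(target))
```

Setting a14 is what turns the same slot 0 encoding into a jump to a fixed page, which is why the home-page branch costs no extra opcode.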

The @R+ and !R+ instructions will also be very useful. Having a second memory addressing register will simplify many data handling routines.
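A block move is the classic case where a second address register helps. The toy model below is an illustration, not F21's exact instruction semantics: it assumes A is the usual auto-incrementing address register and that the top of the return stack, R, serves as the second one for @R+ and !R+. The class and method names are hypothetical.

```python
# Toy model of a block move using two auto-incrementing address registers.
# Assumptions: A is the usual address register and R (top of the return
# stack) is the second one used by @R+ and !R+; names are hypothetical.
class ToyCpu:
    def __init__(self, memory):
        self.mem = memory
        self.a = 0   # address register A
        self.r = 0   # top of return stack, used as a second address register

    def fetch_a_plus(self):          # like @A+: read at A, then A += 1
        value = self.mem[self.a]
        self.a += 1
        return value

    def store_r_plus(self, value):   # like !R+: write at R, then R += 1
        self.mem[self.r] = value
        self.r += 1

def block_move(cpu, src, dst, count):
    # Source and destination each keep their own pointer, so the loop never
    # has to save and reload a single address register.
    cpu.a, cpu.r = src, dst
    for _ in range(count):
        cpu.store_r_plus(cpu.fetch_a_plus())

mem = [1, 2, 3, 4, 0, 0, 0, 0]
cpu = ToyCpu(mem)
block_move(cpu, src=0, dst=4, count=4)
print(mem)   # [1, 2, 3, 4, 1, 2, 3, 4]
```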

P32 will indeed use six five bit instructions. This means it can take advantage of some of those extra internal CPU mips: the CPU clock can run at up to six times the memory clock instead of four as on the x21 designs. Six five bit slots take 30 bits, so a 32 bit word leaves two extra bits. One of them would be a return instruction. The other has not yet been assigned, but many options have been considered.
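Packing six 5 bit instructions plus a return bit into a 32 bit word can be sketched as follows. The bit assignments are assumptions for illustration only: slot 0 is placed in the most significant slot position, bit 31 is used as the return bit, and bit 30 stands for the still-unassigned spare bit.

```python
# Sketch of packing six 5-bit instructions into a 32-bit P32 word.
# Assumptions: slot 0 is the most significant slot, bit 31 is the return
# bit, and bit 30 is the unassigned spare bit (left clear by this sketch).
RETURN_BIT = 1 << 31
SPARE_BIT = 1 << 30
SLOTS = 6
SLOT_BITS = 5

def pack(opcodes, ret=False):
    if len(opcodes) != SLOTS:
        raise ValueError("a P32 word holds exactly six instructions")
    word = 0
    for op in opcodes:
        if not 0 <= op < 32:
            raise ValueError("opcode must fit in 5 bits")
        word = (word << SLOT_BITS) | op   # slot 0 ends up in bits 29-25
    if ret:
        word |= RETURN_BIT
    return word

def unpack(word):
    # Recover the six opcodes (slot 0 first) and the return flag.
    ops = [(word >> (SLOT_BITS * (SLOTS - 1 - i))) & 0x1F for i in range(SLOTS)]
    return ops, bool(word & RETURN_BIT)

word = pack([1, 2, 3, 4, 5, 6], ret=True)
print(hex(word), unpack(word))
```

Putting the return in a dedicated bit means a subroutine can end without giving up one of the six instruction slots.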

Chuck talked a little about P8. He was tired of working on a "brain damaged" chip, and has more or less given up on such specialized function chips. He said that when he first implemented the ROM he drew in the individual bits to program the device. Then, when he had to change the program, he decided it was time to write a program that would read a ROM image and generate all the transistors.
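The idea of generating ROM transistors from an image can be sketched in a few lines. This is not Chuck's OKAD code: it assumes a simple grid with one row per word and one column per bit, with a transistor placed wherever a bit is 1 (real mask ROMs may use either polarity), and it leaves out loading the image from a file and all cell geometry.

```python
# Sketch of generating ROM transistor placements from a ROM image.
# Assumptions: one row per word, one column per bit, and a transistor
# wherever a bit is 1; OKAD's real output and geometry are not modelled.
def rom_transistors(image, width):
    """Yield (row, column) positions for each transistor in the ROM array."""
    for row, word in enumerate(image):
        if word >> width:
            raise ValueError(f"word {row} does not fit in {width} bits")
        for col in range(width):
            if (word >> col) & 1:
                yield row, col

image = [0b1010, 0b0001, 0b0000]
print(list(rom_transistors(image, width=4)))   # [(0, 1), (0, 3), (1, 0)]
```

Changing the program then means editing the image and regenerating, instead of redrawing individual bits by hand.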

Chuck said he was using new names for the signals previously called I/O and SRAM on MuP21, and that he was now calling these RAM and ROM. This makes the memory spaces more obvious: fast and slow, 8 bit and 20 bit.

Chuck talked about the analog I/O coprocessor, the network interface processor, and the configuration registers on F21. F21 has many bits in the configuration register to adjust timing on the memory coprocessor, and each of the I/O coprocessors has bits to adjust its timing. The F21 memory coprocessor can use 12 or 25 ns SRAM and 150 or 250 ns ROM, and there are three other bits to adjust all timing up and down. This should allow the chip to run the memory interface faster to compensate for running at 3V. It will also make the chip usable with a wider range of memories.
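The memory timing options can be pictured as fields in a configuration word. The bit positions in this sketch are hypothetical, since the talk did not give the register layout, and the 3 bit adjustment is returned as a raw value because its scaling was not described.

```python
# Sketch of decoding the F21 memory-timing bits described above.
# Assumptions: all bit positions are hypothetical, and the 3-bit
# adjustment is returned raw because its scaling is not given.
SRAM_SLOW_BIT = 0      # 0 -> 12 ns SRAM, 1 -> 25 ns SRAM
ROM_SLOW_BIT = 1       # 0 -> 150 ns ROM, 1 -> 250 ns ROM
ADJUST_SHIFT = 2       # three bits that shift all memory timing up or down
ADJUST_MASK = 0b111

def decode_memory_timing(config):
    return {
        "sram_ns": 25 if (config >> SRAM_SLOW_BIT) & 1 else 12,
        "rom_ns": 250 if (config >> ROM_SLOW_BIT) & 1 else 150,
        "adjust": (config >> ADJUST_SHIFT) & ADJUST_MASK,
    }

print(decode_memory_timing(0b10110))   # {'sram_ns': 12, 'rom_ns': 250, 'adjust': 5}
```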
