Chuck Moore's Fire Side Chat 1996
Chuck Moore gave his annual fireside chat to the combined Forth Day meeting of the Silicon Valley, Sacramento, and North Bay chapters of the Forth Interest Group on November 16, 1996.
I was one of the speakers in the morning session and gave a progress report on the developments at the iTV Corporation. For those who didn't know the story I gave some background explaining that the iTV Corporation was developing a low cost set-top box for browsing the internet. This box will contain iTV's i21 chip designed by Chuck Moore, and software written by a number of programmers at iTV.
I showed the iTV glossies of its Pegasus product and its i21 chip.
I noted that I was pleased to see that the competition has now started running ads on TV. Not only do you see articles in the papers and magazines all the time about browsing the web on your TV, but now you see commercials on your TV for browsing the internet on your TV. I noted also that I was pleased to see the inside of the Sony and Philips boxes and see why their boxes will be so much more expensive. I recalled that in one of our meetings at iTV Joe Zott was told that one connector would cost a few cents more than another and he said something like "but a penny is a lot of money on our board. When you are going to make ... boards it is a lot of money." I said that I was pleased to see MISC technology moving into a product where it can take advantage of low cost. So often over the last few years many people have said to me "worry about efficiency or memory use, or cost? My workstation (or PC) has lots of power and lots of memory and who cares if the hardware and software are fat and bloated." They really just don't get the idea behind MISC chips.
Our product is not yet ready for market and we expect to ramp up sales in the first quarter of '97. I said that it was fun to browse and do email with Forth programs and it was fun working with the people at iTV and with MISC chips.
In the afternoon Chuck gave his annual fireside chat. I took several pages of notes which I will present here. There will be a few direct quotes, but I am summing up. This is from my notes and is not an actual transcript. :-)
Howdy. I am now working for iTV and we are making a set-top box using what some people have called "Moore technology", in this case the i21 chip.
iTV is currently in the process of raising money. It has been in this process since it was founded and will probably always be in this process. :-) As such we have a lot of people who drop by to take a tour. Mostly they are from Taiwan, Korea, and Japan. I put on a demo and show the co-evolution of the chips and design tools. This situation is very satisfying for me as each makes the other possible.
There are two classes of visitors: investors or management types, and engineers. You never really know what the investors or management types think, but the engineers listen, ask a few questions, blink, and go away impressed.
I have given various numbers for the speed of our chip; my latest measurements show we are now running at 500 MIPS for a sequence of instructions in a word. So it executes each instruction in 2 nanoseconds, then waits a long time for memory. We have 16k transistors and five processors on the chip.
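As a quick check of the arithmetic, the sketch below converts an instruction rate into a time per instruction. The function names are made up for this illustration, and the figure of four opcodes per word is taken from later in these notes; none of this is Chuck's code.

```python
def instruction_time_ns(mips: float) -> float:
    """Time per instruction in nanoseconds for a rate given in MIPS."""
    instructions_per_second = mips * 1e6
    return 1e9 / instructions_per_second


def word_time_ns(mips: float, opcodes_per_word: int = 4) -> float:
    """Time to run every opcode packed into one word, ignoring memory waits."""
    return opcodes_per_word * instruction_time_ns(mips)


if __name__ == "__main__":
    print(f"{instruction_time_ns(500):.1f} ns per instruction")
    print(f"{word_time_ns(500):.1f} ns per four-opcode word")
```

The word time leaves out the memory wait, which by the account above is much longer than the execution time itself.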
We have produced i21a,b,c,d,e,f,i and are about to submit j,k so it is not a one try process. Sooner or later I will make one that works perfectly. J and k go in next week for multi wafer fab. So instead of making 25 parts we are making thousands this time. It is very exciting to get thousands of chips that don't work. :-)
J, like previous versions, has 56 pads, but k will use 68. In quantities of 25, ceramic packaging is cheaper, but in larger quantities you want to do it in plastic. We are pad limited so going to 68 pads means expanding the die. This means we will have much more empty space. The extra pads will be used for power and ground. On the previous chips we had 3 power and 3 ground but on the k we will have 16 power and ground pins. This is because inductance on the power and ground leads produces a drop in voltage or a ground bounce on chip. This can be very bad.
We have spent a lot of time and money down at Acurel trying to look at things on our chips. We used an electron beam probe to look at what was happening on the chip and see pulse widths and such but pretty much failed to get any useful information this way.
It's not just our problem, everyone puts lots of power and ground pins on their chips because of this problem. On the h chip I mis-estimated the power dissipation and could see about a 1 volt voltage drop from this problem.
I was using .07 ohms per square for metal and 70 ohms per square for diffusion and figured that the .07 ohms per square was going to be negligible. I figured it is three orders of magnitude lower than the impedance of the diffusion so I didn't think it would matter. But there are so many tiles that it added up to an unacceptable level. The tiles are 2.6 micrometers on a side and the metal wires are 1 micron wide. A 2mm chip is 2000 tiles long.
Last month I beefed up the power busses. I was not simulating the power resistance in OKAD so it was the prime candidate for the problem we were seeing. The worst case was where I write all 1s on the data bus and all 1s on the address bus when these were all zero. This draws about 400 mA for 2ns. This results in a voltage drop of 5V. :-) The drop is only to the i/o pads and at that time nothing is happening on the chip because we are waiting for memory access but it is still too much.
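To see how a small per-square figure adds up, here is a minimal sketch of the standard sheet-resistance calculation: resistance equals ohms per square times the number of squares, where the number of squares is length divided by width. It assumes one isolated wire, 1 micron wide, running the full 2 mm; a real power network has many wires in parallel, so the result is illustrative only. The function and constant names are invented for this example.

```python
METAL_OHMS_PER_SQUARE = 0.07       # figure quoted in the talk
DIFFUSION_OHMS_PER_SQUARE = 70.0   # figure quoted in the talk


def squares(length_um: float, width_um: float) -> float:
    """Number of squares in a rectangular wire (length over width)."""
    return length_um / width_um


def wire_resistance_ohms(ohms_per_square: float,
                         length_um: float,
                         width_um: float) -> float:
    """End-to-end resistance of a uniform wire from its sheet resistance."""
    return ohms_per_square * squares(length_um, width_um)


if __name__ == "__main__":
    # Assumption: a single 1 micron wide metal wire spanning 2 mm (2000 microns).
    n = squares(2000.0, 1.0)
    r = wire_resistance_ohms(METAL_OHMS_PER_SQUARE, 2000.0, 1.0)
    print(f"{n:.0f} squares, about {r:.0f} ohms end to end")
    # Doubling the width halves the number of squares and so the resistance.
    r_wide = wire_resistance_ohms(METAL_OHMS_PER_SQUARE, 2000.0, 2.0)
    print(f"twice as wide: about {r_wide:.0f} ohms")
```

Under these assumptions the wire is 2000 squares long, so even 0.07 ohms per square comes to roughly 140 ohms, which is far from negligible.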
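The two figures quoted here imply an effective resistance in the supply path, if the drop is treated as purely resistive. That is an assumption made only for this sketch: the talk also blames lead inductance, which Ohm's law alone does not capture. The function names are hypothetical.

```python
def implied_resistance_ohms(voltage_drop_v: float, current_a: float) -> float:
    """Effective resistance that would give this drop at this current (V = I * R)."""
    return voltage_drop_v / current_a


def voltage_drop_v(current_a: float, resistance_ohms: float) -> float:
    """Resistive voltage drop for a given current and path resistance."""
    return current_a * resistance_ohms


if __name__ == "__main__":
    r = implied_resistance_ohms(5.0, 0.400)  # 5 V at 400 mA, as quoted
    print(f"implied path resistance: about {r:.1f} ohms")
    # Hypothetical: halving the path resistance halves the drop at the same current.
    print(f"drop at half the resistance: about {voltage_drop_v(0.400, r / 2):.2f} V")
```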
OKAD now draws the traces of four current signals across the screen and at the top I have added a trace for the current use. It is very interesting to see where the power use happens. It is sort of black magic. We have P transistors to power and N transistors to ground and when you construct a complementary transistor pair like this you get some parasitic capacitance. This can lead to circulating currents. So the amount of current going in and out of the transistor isn't the total current. The amount of current on chip vs off chip is hard to predict.
What happens is that there is virtually zero power until an instruction, then there are 4 peaks as the four opcodes in a word execute at 2ns intervals, and each peak is higher until it reaches a maximum of about 150 mA on the last peak. As a result it is not always safe to execute four instructions in one word. It is always safe if you have 3 nops and one other opcode per word. Sometimes you can use 4 instructions, sometimes 2, sometimes one. It depends on the instruction and, with some instructions, on the data. So it is only safe in the general case now to run with 3 nops in a word. The new chip will hopefully fix this.
We now have great wide power busses. I used to think that narrow power lines were pretty but now they are ugly. Wide ones are pretty. I would bulldoze a path for a wider power bus across the chip like building a freeway across a city, plowing down transistors. The stacks were a problem because there was very little space there. I wrote a program that would widen the power bus. I had to notch it in places to fit around some stack circuit edges, but now the program produces some zero-width rectangles so I have to fix it.
The power busses are now twice as wide and we bring power in from the top and bottom so we have four times as much in the stacks and six times as much as we did elsewhere. So I expect this will solve the problem.
I do believe in "satisficing", that is, doing a design that is just good enough.
We have a box, inside is an i21, some DRAM, some flash. It has video output so you can connect it to a TV or monitor and it has a serial internet interface over a modem. The hardware is not complex, the software is not complex, it is an appliance.
You may not know it, but the .gif file format is patented. It's one of those terrible patents. If you decode an image in .gif format you have to pay. The charge is $.25 for each box. $.25 is a lot when Joe is concerned about $.01 extra on a connector. The patent runs out in a couple of years.
We generate video in 384x480 format. We have a different aspect ratio than the PC so we must resample images when we display them.
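A rough sketch of what such a resampling step could look like follows. It assumes the source image was authored for a 640 pixel wide PC screen with the same 480 lines, so only the width needs rescaling (384/640 = 0.6), and it uses simple nearest-neighbor sampling. Both the 640 figure and the method are assumptions for illustration; the notes do not say how the box actually does it.

```python
def target_width(src_width: int,
                 src_screen_width: int = 640,   # assumed PC screen width
                 dst_screen_width: int = 384) -> int:
    """Width an image should have on the box to cover the same share of the screen."""
    return max(1, round(src_width * dst_screen_width / src_screen_width))


def resample_row(row: list, new_width: int) -> list:
    """Nearest-neighbor resample of one row of pixels to new_width pixels."""
    old_width = len(row)
    return [row[(x * old_width) // new_width] for x in range(new_width)]


def resample_image(pixels: list, new_width: int) -> list:
    """Resample every row; the number of rows is left unchanged."""
    return [resample_row(row, new_width) for row in pixels]


if __name__ == "__main__":
    image = [list(range(10)) for _ in range(2)]   # tiny 10x2 test image
    narrow = resample_image(image, target_width(10))
    print(narrow[0])
```

Nearest-neighbor is the cheapest choice; a filter that averages neighboring pixels would look smoother at the cost of more work per pixel.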
You can code in a high level Forth with stacks in memory or use assembler. Assembler restricts you to assembler opcodes, there is no OR, no ROT, no SWAP, and memory addressing is a little different than Forth's @ and !. You can code in high level and convert critical routines to assembler. You can gain an order of magnitude in speed by using the on chip stacks and assembler but you must live with some restrictions. You can carefully craft the assembler code to get it to run fast by keeping data access on-page. You pay a 3 times penalty when data access is not on-page.
There is documentation for OKAD which few people have seen. In it there was a section that read "it would be easy to record the time when signals transit 2.5 volts." I would read it and think yes that might be useful. Then I decided to take that out of the documentation and just about that time I added it to OKAD. Now I can measure pulse width in OKAD. When I did I was horrified. Pulses that were supposed to be 1 to 2ns were 700 picoseconds. I was off by about 2 to 1. Now I can point to any circuit and see pulse width or capacitance.
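The idea of recording when a signal transits 2.5 volts can be sketched as follows. This is not OKAD's code, which is not shown in these notes; it is a minimal illustration that assumes the simulator yields a list of sample times and voltages, and it finds each crossing by linear interpolation between samples. All names are invented.

```python
def crossing_times(times, volts, threshold=2.5):
    """Return (time, direction) for each threshold crossing.

    direction is +1 for a rising crossing and -1 for a falling one.
    The crossing time is linearly interpolated between adjacent samples.
    """
    crossings = []
    for i in range(1, len(times)):
        v0, v1 = volts[i - 1], volts[i]
        if (v0 < threshold) != (v1 < threshold):   # samples straddle the threshold
            frac = (threshold - v0) / (v1 - v0)
            t = times[i - 1] + frac * (times[i] - times[i - 1])
            crossings.append((t, 1 if v1 > v0 else -1))
    return crossings


def pulse_widths(times, volts, threshold=2.5):
    """Width of each high pulse: falling crossing time minus the preceding rising one."""
    widths = []
    rise = None
    for t, direction in crossing_times(times, volts, threshold):
        if direction > 0:
            rise = t
        elif rise is not None:
            widths.append(t - rise)
            rise = None
    return widths


if __name__ == "__main__":
    # Made-up waveform: times in ns, a 0 V to 5 V pulse with 0.5 ns edges.
    t_ns = [0.0, 0.5, 1.0, 1.5, 2.0]
    v = [0.0, 5.0, 5.0, 0.0, 0.0]
    print(pulse_widths(t_ns, v))
```

With the made-up waveform above, the rising edge crosses 2.5 V at 0.25 ns and the falling edge at 1.25 ns, so the measured width is 1 ns.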
I know that the ideal pulse for my shift register is about 650ps, 750ps for a counter, and 900ps would be too much.
John Rible has been re-engineering the OKAD code. I have been saying for years that "the map is not the territory." It is time to revisit this issue. There was no source for OKAD to start with. We went from object code to a MASM source. 12,000 bytes of object, 12,000 lines of MASM. I'm disappointed on this one. I wanted object code to stand on its own, but I failed. It is the biggest most complex program I have ever done and I have spent far more time using it than any other thing I have ever written.
We purchased a subset of the Mentor VLSI tools to see what they could do. We paid $160k. It can read my chip layout, it has SPICE, schematic capture, and can simulate the entire chip. But they can't really. Their simulator can't simulate this chip and mine can because mine was designed to do it.
One of my favorite circuits is the phase lock loop. We extracted just a PLL set of transistors and imported it into Mentor and did a schematic capture and simulation. Mentor returns garbage. They can't even simulate a simple circuit from i21 let alone the entire chip. It is a question of the number of man years it would take to get Mentor to simulate the chip. The original estimate was man weeks, it has been man months, and who knows how long it would take.
This is a chip without a description. But Mentor needs schematic, they NEED schematic capture, they can't do anything without it.
It is backwards to get a schematic from a chip. I am dubious that iTV will invest the effort needed, and I question the value. My simulator is at least as accurate as theirs. Things I don't have they don't have either.
We factor things so that manufacturing process details have a file, chip parameters have a file, and there is a file for simulation parameters. But really each chip has its own version of OKAD. I could have done it with one version but it would have required lots of state variables at run time. It is better to settle these at compile time, thus many versions.
I am still the only user of OKAD. We have the OKAD hardware simulation of the chip, Jeff's software simulator, the Mentor simulator, and the actual chip. We run code and test routines through all of them.
There is a delay in getting parts fabricated and it results in a sort of pipeline. We have had the pipeline filled and I have been putting in chips with changes before seeing the actual chips come back from 3 previously submitted designs. The pipeline is now empty. I think our wafer run will be faster than the 25 part runs, 4 weeks plus 2 weeks to package.
(someone asked how big was the box and Chuck showed with his hands) Marketing said the box had to be a minimal size for customer perception. From the engineering standpoint it could be a bulge in the cable. But the box has to be a certain size and heavier is better.
Other design tools start with a design then generate schematics then transistors and finally VLSI components. People spend a lot of time making pretty schematics but then the software that does everything else with the schematic does a poor job.
There is a tendency today for everything to be text. I looked at what was being done with the Mentor tools. It was all ASCII text manipulations. Everything seems to be going this way and it is wrong. We need to work more closely with the "thing" not with an ASCII description of it.
We have plans for 32 bit chips. We plan to put RAM on the chip. That is what you do with all the white space on the die that we are not currently using. With on chip RAM you could actually sustain that 500 MIPS operation. With external DRAM it will be more like 100. With the fab pipeline empty I may work on getting RAM on the chip.
The lesson we have learned here is that things must be kept simple. We have a simple chip, a simple board, and simple software.
(Chuck was asked if the competition was aware of iTV)
The word we got was that Sony did know about us and they don't believe it is possible. I have the 0.8 micron process running at 500 MIPS with 650ps pulse widths etc. Engineers will tell you that this is not possible. Conventional engineering says the limit is ten times lower than this and that these numbers are just not possible, but there is the chip.
page created by Jeff Fox
UltraTechnology's F21 Microprocessor
was designed by Chuck Moore using his OKAD system