Nosc Mail List Archive
Nosc is for discussions of the design of Forth CPUs and No Operand Set Computers, i.e. Zero Operand or Stack Machines.
4/20/01 Myron Plichota wrote:
>
> This is a repeat posting of something that went out successfully to the
> original MISC list:
>
> Dear MISCers,
>
> It gives me great pleasure to announce the successful
> construction/testing/shakedown of the first Steamer16-based prototype
> system. The shakedown was not a cakewalk due to transmission line phenomena
> experienced on the hand-built circuit board, but it now seems solid enough
> to do some serious development work for a startup venture I am involved
> with. I still suspect a marginal data hold time problem during write cycles
> to asynchronous SRAM, but I am not going to rush into a redesign or further
> modifications of the prototype until more real-world experience has been
> gained. It is running at the theoretical limit of 20 MHz which yields a
> performance of 20 peak MIPs and 16.7 to 10 aggregate MIPs depending on the
> particulars of the opcode mix in a given instruction packet.
>
> The design was written in VHDL and fitted to the Cypress CY37128P84-125JC
> CPLD.
>
> I'd like to express thanks to those of you who encouraged me to see this
> through, and in particular, for recent close support by my brother Vic.
>
> I am extremely busy working on the R&D for the startup venture, but I will
> reply ASAP to any inquiries from the readership.
>
> Following is an excerpt from the Steamer16 documentation.
>
> Myron Plichota
> -------------------------------------------------------------------------
> Programming model:
>
> Steamer16 consists of a program counter (PC) and a 3-deep evaluation
> stack. All instructions except "lit," operate on data already on the
> stack. PC is cleared on reset. Stack entries are undefined until loaded
> using "lit," instructions. There is no program status word.
>
> Instruction encoding:
>
> The PC addresses inline data or instruction packets, not individual
> instructions.
> Five 3-bit instruction "slots" are packed left-justified
> into each 16-bit instruction packet with the lsb a don't care. The most
> significant slot in a packet is executed first. All instructions execute
> identically in any slot. Inline data to satisfy any "lit," instructions
> follows the packet itself.
>
> Stack diagrams:
>
> Stack diagrams are used to describe instruction behavior by showing the
> inputs and results on the stack in a concise notation. The inputs are on
> the left-hand side of the "--" before/after separator, the results are on
> the right. The input list shows only the relevant stack entries. The
> output list shows all three entries. The symbols x, y, and z, are used to
> denote the original values of any surviving independent stack entries.
>
> Instruction descriptions and opcode assignments:
>
> NOP, {0} (           -- x y z)      no operation
> lit, {1} (           -- y z data)   PC++ push data at PC, increment PC
> @,   {2} (      addr -- x y data)   fetch data from addr
> !,   {3} ( data addr -- x x x)      store data to addr
> +,   {4} (     n1 n2 -- x x n1+n2)  add 2ND to TOP
> AND, {5} (     n1 n2 -- x x n1&n2)  and 2ND to TOP
> XOR, {6} (     n1 n2 -- x x n1^n2)  exclusive-or 2ND to TOP
> zgo, {7} (  flg addr -- x x x)      if flg equals 0 then jump to addr
>                                     else continue
>
> Instruction timing:
>
> First, an instruction fetch cycle is required to load the instruction
> register with the packet of 5 instructions currently addressed by the
> PC. Next, the instructions contained in the packet execute in 1 cycle
> each. An exception is made when the remainder of a packet consists
> entirely of NOPs or zgo, takes a jump. In either of these cases the
> current packet is aborted and another instruction fetch cycle follows
> immediately. Packets are fetched and executed in 6 cycles during
> straight-ahead execution for an average of 1.2 cycles per instruction.
> Any packet containing 4 trailing NOPs will execute in 2 cycles, i.e.
> an instruction fetch cycle followed by execution of the first
> instruction in the packet, whether it is a NOP or not.

Myron Plichota wrote:
>
> This originally was bounced when the MISC list went "poof":
>
> Dear MISCers,
>
> Steamer16 has just successfully completed a field test of a machine vision
> application in an industrial environment. The handwired prototype is
> performing at the design limit of 20 MIPS with all of the anomalies
> encountered during system shakedown fixed. Even though 20 MIPS is not a big
> number by today's standards, it's amazing what can be done in real time by
> coupling software with a minimal peripheral set of parallel inputs and
> outputs and a 20 MHz free-running timer. This vindicates the original
> concept of a microprocessor being used to replace lots of dedicated
> hardware. It is extremely gratifying to be running a simple, low-cost
> homebuilt system with such performance and 100% reliability. Goodbye Z80,
> 8031, PIC, and other similar technology that was the previous economic and
> performance limit for maverick computer experimenters. I need to be slapped
> because I'm having so much fun and can't stop grinning.
>
> The sport of CPLD and FPGA design using low-cost tools seems to me to be the
> 21st century equivalent of the early days of microprocessor hacking before
> the monopolies forsook the needs of experimenters and stampeded towards the
> big bu$ine$$ orientation that I have bemoaned ad nauseam in some of my
> previous postings. I encourage those of you who are pursuing your own CPU
> designs to see them through and publish news of the results via the MISC
> list or email direct to me.
>
> Myron Plichota

Eric Laforest wrote:
>
> On Fri, Apr 20, 2001 at 01:59:18PM +0000, Myron Plichota thus spake:
> > This originally was bounced when the MISC list went "poof":
> >
> > Dear MISCers,
> >
> > Steamer16 has just successfully completed a field test of a machine vision
> > application in an industrial environment.
> > The handwired prototype is
>
> Excellent!
> Good to know someone is succeeding!
> I'm curious as to how 'heavy' a machine vision task can be run in real-time
> by a 20 MIPS MISC chip...
>
> Eric LaForest

Rick Hohensee wrote:
>
> > This originally was bounced when the MISC list went "poof":
> >
> > Dear MISCers,
> >
> > The sport of CPLD and FPGA design using low-cost tools seems to me to be the
> > 21st century equivalent of the early days of microprocessor hacking before
> > the monopolies forsook the needs of experimenters and stampeded towards the
> > big bu$ine$$ orientation that I have bemoaned ad nauseam in some of my
> > previous postings. I encourage those of you who are pursuing your own CPU
> > designs to see them through and publish news of the results via the MISC
> > list or email direct to me.
> >
> > Myron Plichota
>
> ------------------------
>
> Congratulations.
> For those that don't read comp.lang.forth, H3sm, Hohensee's 3-stack
> machine, now exists entirely as x86 assembly, and seems pretty snappy. The
> 80 or so primitives require about 3k of assembly. Your machine looks like
> it would require about, oh, 200 bytes, for a 32 bit subroutine-threaded
> VM, if I am in the ballpark as to what it involves.
>
> Questions about Steamer16
>
> It's a one-stack MISC with 3 parameter stack cells?
> how many gates/transistors on the device in question?
> how much room is left?
> how long did it take to write the VHDL?
> you're using real good old SRAM with it?
> what are its bus widths?
> etc etc
>
> MORE INFO! :o)
>
> Rick Hohensee
> www.clienux.com

4/22/01 Jeff Fox wrote:
>
> Dear NOSC list readers:
>
> I have completed the upload of all my videos of Dr. Ting's
> presentations and John Rible's VLSI design classes for SVFIG
> to the streaming video theater including the latest
> presentations by Dr. Ting from 4/14/01. This is where Dr.
> Ting talks about the release of P8, P16, P24, and P32
> sources and some other stuff on CD-ROM.
>
> Most of these videos were never on the web and some are
> new and were not available previously on CD-ROM.
>
> Jeff Fox

4/24/01 Rick Hohensee wrote:
>
> > Rick Hohensee's questions about Steamer16:
> >
> > 1) It's a one-stack MISC with 3 parameter stack cells?
> >
> > Yes, that's all that would fit on the CY37128P84. I figured
> > this was enough to do something useful after a paper
> > evaluation using the quadratic solution as a benchmark
> > as it could be performed on a Hewlett-Packard RPN
> > calculator.
> >
> > 2) how many gates/transistors on the device in question?
> >
> > Cypress's datasheet doesn't use those metrics to quantify things.
> > The next heading sets the record straight.
> >
> > 3) how much room is left?
> >
> > The following is an excerpt from the report file generated by
> > Cypress's Warp2 VHDL compiler. There is not enough left over
> > to add more instructions or stack registers. I know because
> > I tried ;)
> >
> > Information: Macrocell Utilization.
> >
> >      Description          Used     Max
> >  ______________________________________
> >  | Dedicated Inputs   |     1  |     1  |
> >  | Clock/Inputs       |     2  |     4  |
> >  | I/O Macrocells     |    59  |    64  |
> >  | Buried Macrocells  |    52  |    64  |
> >  | PIM Input Connects |   242  |   312  |
> >  ______________________________________
> >                           356  /   445   = 80 %
> >
> >                                 Required   Max (Available)
> >  CLOCK/LATCH ENABLE signals         2          12
> >  Input REG/LATCH signals            0          69
> >  Input PIN signals                  3           5
> >  Input PINs using I/O cells         0           0
> >  Output PIN signals                59          64
> >
> >  Total PIN signals                 62          69
> >  Macrocells Used                  111         128
> >  Unique Product Terms             476         640
> >
> > 4) how long did it take to write the VHDL?
> >
> > The initial implementation (~18 months ago) was written in 3 days.
> > After reviewing the boolean equations in the report file and
> > convincing myself that I expressed myself correctly, I spent 4
> > days running simulations.
> > This was essentially a warmup exercise
> > with the then-unfamiliar tools prior to doing "serious" work for
> > a startup which went nowhere. The design sat in the can for a
> > year and I kept wondering whether it was worth realizing, due
> > to its obvious limitations and much sexier examples of silicon
> > that are out there. The CY37128P84 was chosen because it was the
> > biggest gun available that would fit in a socket I could deal
> > with, with a view to hand-wiring a prototype for the (unfunded)
> > aforementioned startup. I had inventory of a 60 pc. min. qty.
> > purchase all dressed up with nowhere to go, so I dusted off
> > the design files and convinced myself that it WAS worth taking
> > to completion.
> >
> > I wrote myself an assembler (in Forth, of course ;) and became
> > unhappy with the fact that the NOP, padding that was frequently
> > required between the last explicit instruction in a packet and a
> > following (packet-aligned) jump target label cost useless clock
> > cycles to execute, so I redesigned the instruction sequencer to
> > force a fetch cycle under those circumstances. While I was at it,
> > I made the master reset synchronous. This took 1 day, followed by
> > 4 days of fresh simulator sessions.
> >
> > All hell broke loose when I fired up the prototype! Series
> > termination on the clocks and high-speed strobes solved some of
> > the problems, but some of the test software crashed consistently.
> > Persistence and trial and error showed that the zgo, instruction
> > failed when a jump was taken in 2 out of 5 slots: inline literals
> > were being executed. It took me 2 days of staring at the VHDL code
> > for the instruction packet sequencer to figure out what the problem
> > was and fix it. I also changed the write strobe timing prior to
> > identifying the real problem and have kept it that way since.
> > I took a look at my test vectors and confirmed that they never
> > generated the scenario which caused the zgo, instruction to fail.
> > Oops!
> >
> > So in summary it took ~6 days of VHDL design or debugging and ~8
> > days of simulation (should have been more in retrospect) to bring
> > things to the current happy state of affairs.
> >
> > 5) you're using real good old SRAM with it?
> >
> > I'm using a pair of 25 nS 32Kx8 SRAMs liberated from the L2 cache
> > sockets on obsolete PC motherboards on their way to the dumpster. A15
> > goes directly to the /CE pins of the SRAMs, mapping them to the low 32K
> > cells.
> >
> > 6) what are its bus widths?
> >
> > Both the address and data busses are 16 bits. There is no byte
> > addressing capability. 2 early (R/W and W/R) and 2 late (/RD and /WR)
> > control signals are generated to obviate the need for external
> > decoding (and consequent delays) in most foreseeable cases. W/R goes
> > to the SRAM /OE pins and /WR goes to the /WE pins. The /WR strobe is
> > synchronous to the 40 MHz 2x master clock and pipelined such that it
> > pulses low when the 1x clock (derived from the 2x clock) goes low to
> > put data and address hold time well clear of the SRAM's 0 nS minimum.
> > The /RD strobe is combinatorially generated when the 20 MHz 1x clock
> > is low to assure data hold time during read cycles, but this signal
> > is not used on the prototype system. All of the internal registers are
> > synchronously clocked by the rising edge of the 1x clock.
> >
> > 7) etc etc
> >
> > The 20 MHz clock limit is due to the ripple-carry adder implementation.
> > BTW, the first VHDL module I wrote used the syntax "+" to specify the
> > behavior of the adder, and that _in_itself_ exceeded the capacity of the
> > chip. Apparently, a full lookahead-carry implementation was attempted.
> > Later on I noticed that there were radio buttons in the project options
> > dialog box: Goal <- area|speed which defaulted to speed.
> > I had already explicitly written the ripple-carry solution and
> > haven't tried the "+" syntax with area as the goal.
> >
> > Interrupts, wait states, and bus sharing features are not implemented
> > due to fitting limitations. A 2nd CY37128 is used on my prototype
> > system as an I/O companion chip with 16 bits of parallel input, 16
> > bits of parallel output, a free-running 20 MHz 16-bit timer, and a
> > register which provides the 2/ function. It is self-decoded near the
> > top of memory. A cable from the host PC provides the JTAG interface
> > for burning the JEDEC fuse maps into the devices and downloading code
> > using the boundary scan registers to drive the target busses. The
> > JTAG code downloader is integrated with the assembler.
> >
> > The 5V operating current was measured at 440 mA on the prototype system
> > with 2x CY37128 (Steamer16 + I/O), 2x 25 nS SRAMs, and a 40 MHz 2x
> > clock oscillator module with the system out of reset and running a real
> > application. The situation could be improved by invoking the low-power
> > options of the CY37128, downgrading the 1x clock to ~10 MHz, and using
> > low-power 70 nS SRAMs, which could be battery backed up to retain code
> > and data during the powerdown condition.
> >
> > The code is not particularly space efficient despite the compact
> > instruction encoding due to the heavy incidence of inline literals,
> > the primitive instruction set, and NOP, padding statistics.
> >
> > The software tools are written in a bastard dialect of Forth-79 that I
> > have been using since 1990. The assembler proper is lean and mean.
> > Colon definitions in an include file are used to implement macros.
> >
> > Because Steamer16 is not a true Forth chip, and in fact lacks call and
> > return instructions, it is necessary to use a re-entrant stack frame
> > strategy I call ArF (don't ask unless you have a sense of humor) which
> > is a hybrid between Forth and C under the hood.
> > A call is implemented
> > as a sequence of 7 instructions, a return is 5 instructions, and a read
> > or write indexed into the stack frame is 5 instructions. The coding
> > style that inevitably results can be considered bad Forth that
> > over-uses static variables and the PICK operator, but on the other
> > hand relieves the programmer of optimizing the order of operands on
> > the stack or resorting to the use of stack reordering operators. ArF
> > mandates preservation of the input arguments: the results are
> > appended to the list and the calling parent is solely responsible for
> > building up or tearing down the stack frame as in C, therefore Forth
> > operators like DUP, OVER, and R@ are not required. The on-chip
> > evaluation stack is nevertheless used for bursts of Forth-like
> > activity until it is appropriate to write an intermediate or final
> > result to the stack frame or a static variable. The fine-grained
> > subroutine factoring that is one of Forth's major strengths is not
> > as attractive as it is with a true Forth chip, but there are some
> > compensations and opportunities for optimization that are unique
> > to the Steamer16/ArF genre. I have experimented enough with the
> > software end of things to come to discover that agonizing over
> > shaving off a cycle or two from a subroutine is very seldom
> > worthwhile and sometimes, very surprisingly, retrograde. I believe
> > this is due to the instruction alignment statistics, which are
> > deterministic but difficult to exactly predict as one is writing
> > code. This is not to say that the programmer should not be
> > performance-conscious, but rather that the straightforward ArF macros
> > offer _pretty_good_ runtime performance. The more I use it the more I
> > like it, and it beats the snot out of any off-the-shelf CPU that I
> > have used so far, except for DSPs.
> > Exiting.
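
As a reading aid for the thread so far, the programming model, packet encoding and timing rules from the Steamer16 documentation excerpt in the 4/20/01 post can be sketched in Python. This is only an illustration built from that excerpt, not Myron's VHDL or his assembler. The names pack, slots, Steamer16 and run_packet are made up for this sketch, the stack is zeroed at reset here although the real part leaves it undefined, and the don't-care lsb is simply written as 0.

```python
# Opcode numbers as given in the posted instruction table.
NOP, LIT, FETCH, STORE, ADD, AND, XOR, ZGO = range(8)
MASK = 0xFFFF

def pack(*ops):
    """Pack up to five 3-bit opcodes left-justified into a 16-bit packet."""
    if len(ops) > 5:
        raise ValueError("a packet holds at most five slots")
    word = 0
    for op in list(ops) + [NOP] * (5 - len(ops)):
        word = (word << 3) | (op & 7)
    return word << 1              # lsb is a don't care; written as 0 here

def slots(packet):
    """Return the five opcodes of a packet, most significant slot first."""
    return [(packet >> (13 - 3 * i)) & 7 for i in range(5)]

class Steamer16:
    def __init__(self, memory):
        # Assumption: a flat 64K-word memory, padded with zeros.
        self.mem = list(memory) + [0] * (0x10000 - len(memory))
        self.pc = 0
        self.stack = [0, 0, 0]    # [x, y, z], z is TOP (undefined on real hardware)
        self.cycles = 0

    def _push(self, value):
        _, y, z = self.stack      # ( -- y z data): the bottom entry is lost
        self.stack = [y, z, value & MASK]

    def _pop2(self):
        x, second, top = self.stack
        self.stack = [x, x, x]    # ( a b -- x x x): the bottom entry is replicated
        return second, top

    def run_packet(self):
        """Fetch and execute one packet, following the posted timing rules."""
        ops = slots(self.mem[self.pc])
        self.pc = (self.pc + 1) & MASK
        self.cycles += 1                          # instruction fetch cycle
        for i, op in enumerate(ops):
            self.cycles += 1                      # each slot executes in 1 cycle
            if op == LIT:
                self._push(self.mem[self.pc])     # inline data follows the packet
                self.pc = (self.pc + 1) & MASK
            elif op == FETCH:
                self.stack[2] = self.mem[self.stack[2]]
            elif op == STORE:
                data, addr = self._pop2()
                self.mem[addr] = data
            elif op in (ADD, AND, XOR):
                n1, n2 = self._pop2()
                result = {ADD: n1 + n2, AND: n1 & n2, XOR: n1 ^ n2}[op]
                self.stack[2] = result & MASK
            elif op == ZGO:
                flag, addr = self._pop2()
                if flag == 0:
                    self.pc = addr
                    break                         # jump taken: packet aborted
            if all(o == NOP for o in ops[i + 1:]):
                break                             # only NOPs remain: packet aborted

cpu = Steamer16([pack(LIT, LIT, ADD), 2, 3])
cpu.run_packet()
print(cpu.stack[2], cpu.cycles)                   # 5 4
```

The first slot always executes, so a packet of five NOPs costs 2 cycles and a packet with five real instructions costs 6, matching the figures in the excerpt.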
>
> I'd heard a prominent Forther was thinking of doing a one-stack Forth, and
> I thought he was BSing me, but if the one stack was the parameter stack
> that would explain it. Your instruction counts for call/return are about
> proportional to how long one near call instruction actually takes on a 386
> anyway, since it's at some level doing the same things.
>
> CLD -- Clear Direction Flag
>
> Opcode   Instruction   Clocks   Description
>
> FC       CLD           2        Clear direction flag; SI and DI
>                                 will increment during string
>                                 instructions
>
> E8 cw    CALL rel16    7+m      Call near, displacement relative
>                                 to next instruction
>                        ^^^best case timing
>
> Linux has 60% more calls than returns. That's probably a LOT of functions
> that are in fact inline code that's being called. The subroutine threaded
> H3sm has 3 times the calls as returns though, since all thread words are
> mostly calls. If I did hardware I'd be looking into what I was talking to
> Jan Coombs about almost always doing a call on each instruction
> fetch. Then you can maybe get the return stack activity in parallel.
>
> Rick Hohensee
> www.clienux.com

4/25/01 Myron Plichota wrote:
>
> I have my website online now at:
>
> http://www3.sympatico.ca/myron.plichota/
>
> Zipped archives are there for:
>
> Steamer16 VHDL
> DTC Forth with CORDIC sin/cos function for TI's C6x DSP family
> optimized floating-point FIR and IIR filters in GNU C
>
> I want to add a few enhancements to the software development tools
> before I release them and also provide reference schematics. I will post
> notifications as more info becomes available.
>
> Myron Plichota

4/26/01 Myron Plichota wrote:
>
> On Tue, 24 Apr 2001 21:17:11 -0400 (EDT), Rick Hohensee wrote:
>
> First, a gentle reminder: please try to keep the size of postings to a
> minimum. My first reply on this thread was deleted by the server filter,
> and Martin was good enough to intervene (thanks Martin), but that was
> pushing my luck.
> In particular, I think that cutting out all but the
> relevant portions of the message being replied to will go a long way to
> conserving the server's resources.
>
> > I'd heard a prominent Forther was thinking of doing a one-stack Forth, and
> > I thought he was BSing me, but if the one stack was the parameter stack
> > that would explain it. Your instruction counts for call/return are about
> > proportional to how long one near call instruction actually takes on a 386
> > anyway, since it's at some level doing the same things.
>
> I once read somewhere that RISC design philosophy was to expose an
> orthogonal microcode-like instruction set to compiler optimization, and
> the MISC/NOSC machines we have seen so far do so as well, but in the
> Forth tradition of small well-factored subroutines, or in the case of
> Steamer, as comparatively memory-hungry assembler macros and
> coarser-grained subroutine threading. In any event, synthesis of a
> variety of addressing modes requires multiple instructions and clock
> cycles. It's interesting how the comparisons pan out with respect to
> CISCs like the x86. It costs you at one level or the other it seems.
>
> > Linux has 60% more calls than returns. That's probably a LOT of functions
> > that are in fact inline code that's being called. The subroutine threaded
> > H3sm has 3 times the calls as returns though, since all thread words are
> > mostly calls. If I did hardware I'd be looking into what I was talking to
> > Jan Coombs about almost always doing a call on each instruction
> > fetch. Then you can maybe get the return stack activity in parallel.
>
> I haven't subscribed to comp.lang.forth, so I'm in the dark about H3sm.
> Do you have a zipped archive you could send directly to me?
>
> BTW, if there are any C compiler design gurus on the list, I am very
> interested in an appraisal of Steamer16 as a target for a tiny,
> integer-only flavor.
> I started on a Forth compiler, but code bloat and
> low performance set in very quickly and it turns out to be a poor
> marriage :(
>
> Myron Plichota

Rick Hohensee wrote:
>
> > On Tue, 24 Apr 2001 21:17:11 -0400 (EDT), Rick Hohensee wrote:
> >
> > I haven't subscribed to comp.lang.forth, so I'm in the dark about H3sm.
> > Do you have a zipped archive you could send directly to me?
>
> ftp://linux01.gwdg.de/pub/cLIeNUX/interim and the last hardcopy "Forth
> Dimensions"
>
> > BTW, if there are any C compiler design gurus on the list, I am very
> > interested in an appraisal of Steamer16 as a target for a tiny,
> > integer-only flavor. I started on a Forth compiler, but code bloat and
> > low performance set in very quickly and it turns out to be a poor
> > marriage :(
>
> Martin Richards' page, author of BCPL, predecessor of C, and Tripos,
> predecessor of AmigaDos.
> http://www.cl.cam.ac.uk/~mr/
>
> I uncovered a thing called smallc from gatekeeper.dec.com. Honk if you
> can't find it.
> Small C version C3.0R1.1
> (SCC3)
>
> Chris Lewis
>
> Rick Hohensee
> www.clienux.com

5/27/01 Myron Plichota wrote:
>
> http://www3.sympatico.ca/myron.plichota

6/5/01 Lonnie Reed wrote:
>
> If you haven't seen this yet:
> http://www.eet.com/story/OEG20010604S0087
>
> This looks like a great idea. Instead of a grid layout, use diagonal
> connections. It reduces power consumption, connection length/signal
> delay and increases performance.
>
> If Chuck is working on OKAD III, he might want to look into this.
>
> Lonnie

7/01/01 Jeff Fox wrote:
>
> http://www.mindspring.com/~chipchuck
>
> "How many chips could a chipchuck chuck if a
> chipchuck could chuck chips?"

Mark Sandford wrote:
>
> How much would it cost to build a 25x chip?
> $14K for protos but how much for NRE?

7/03/01 Eric Laforest wrote:
> > Code is loaded into the on-chip 512w memory for execution...
>
> This seems to imply that the internal and external RAM are in a flat,
> contiguous memory space.
>
> I presume one simply copies the words one wants to use right now into
> internal RAM and then do a subroutine call to it?
>
> ...or is the dictionary simply set up to place the shorter, more oft-called
> words on the chip?
>
> Eric LaForest

Jeff Fox wrote:
>
> Eric Laforest wrote:
> > > Code is loaded into the on-chip 512w memory for execution...
> >
> > This seems to imply that the internal and external RAM are in a flat,
> > contiguous memory space.
>
> The X18 chip and 25x chip are a little different in that
> X18 has a pinout that is a superset of a mirror image
> of a 512Kx18 cache SRAM. 25x is a superset of that.
>
> So X18 has access to >1MB of external 4ns memory and
> access to internal 1ns DRAM and ROM.
>
> 25x adds 24 X18 cores but without their external SRAM
> connection pins. So they have limited memory. They
> have register to register or memory to memory
> (I don't know which) communication links in rows and
> columns for a total of 180Gbps internal communication
> bandwidth or something like that.
>
> Processors on the outside of the block have connections
> to I/O pins that can be programmed to do digital or
> analog I/O and whatever protocols will fit. Chuck's
> ideas about software drivers for I/O are part of his
> idea of Forth. You give up some speed in exchange
> for a wider range of I/O capability than with
> dedicated I/O hardware. But with a 2400Mip core
> most things in the outside world look pretty slow.
>
> It does not have one pool of memory with a 25 way
> bus arbitration unit or anything like that.
>
> I don't know what mechanism Chuck uses to distinguish
> internal addresses from external since the chip is
> 18 bits wide and the external address bus is 18 bits.
> I would guess that it uses some paging mechanism
> but I couldn't find the information on the site yet.
>
> The site is in progress. There is nothing there on
> the CAD internals yet. I know Chuck plans to add
> a lot of things.
>
> He does not have time to do unpaid support for
> hobbyists wanting to play with ColorForth or
> his chip designs, but we can collect a list of
> things, like missing bits of documentation
> in our mail lists and pass our requests as
> a group for what we need on to Chuck.
>
> Since the instruction set on X18 is basically the
> same as F21 with a little more information one
> could modify the simulators and emulators for
> F21 to do X18 and then later 25X. Armed with
> those tools people could develop real code
> suitable for framing in ROM.
>
> The best ideas for donated code routines could
> go into the ROM. Chuck will have some interesting
> ideas about what to put into the ROM but he
> can be influenced by logic if someone else
> does identify code that deserves to be ROMmed.
>
> > I presume one simply copies the words one wants to
> > use right now into internal RAM and then do a
> > subroutine call to it?
>
> The processors are asynchronous and software must
> coordinate processes. On X18 there is 4ns and
> 1ns memory just as on the F21 prototypes in .8u
> there are SRAM and DRAM and ROM busses. The
> cache SRAM chips are not used as cache, there is
> no cache controller. The software running on
> the CPU manages things by putting the things
> that need to run fast into the faster memory.
> Instead of the old external memories that
> provided 18ns SRAM and 30ns DRAM access these
> chips have 1ns internal DRAM and 4ns external SRAM.
>
> > ...or is the dictionary simply set up to place
> > the shorter, more oft-called words on the chip?
>
> Dictionary is software. It does whatever.
> One would assume that time critical code and/or
> more often used code would be run on chip.
>
> Remember that Chuck says that most programs fit
> into 1K. He has room for 1.5K inlined Forth
> opcodes or .5K calls on chip or some mix in
> between.
>
> Chuck has also said that we could scale the tiles
> on F21 down to .18u and modernize the memory busses
> if there was interest.
> The increased speed and
> reduced power make it more scifi. The 10G
> timer could become a 50G timer. 200M analog and
> faster bit banged analog and digital I/O etc.
>
> One advantage of .8u on the old prototypes is that
> the chips could be made in the third world on
> "obsolete" fabs for almost nothing. They have
> quoted prices for wafers that look like what we
> paid for die even in small quantities. And 500Mips
> per node is sufficient for many problems. The
> nodes are also bigger than 25x nodes but require
> external memory. I guess Chuck could fit 20K words
> of memory onto an F21 in .18u without the die
> becoming too large.
>
> 25x was Chuck's first multiprocessor to have
> multiple CPUs instead of a CPU and multiple I/O
> coprocessors. He picked 5x5 to keep the chip
> tiny and cheap. If someone wanted to pay for
> a big one with bigger node clusters that could
> also be done. A large wafer could hold thousands
> of 2400 MIP X18 cores.
>
> These designs are well suited to a class of
> computationally challenging problems. The problems
> are real and attacking them today is very expensive.
> People are using things like machines with thousands
> of Pentium chips. If you do a MIP/$ or MIP/W
> comparison you can see the idea. They are not
> designed to run sluggish bloated popular software.

Eric Laforest wrote:
>
> On Tue, Jul 03, 2001 at 12:37:48PM -0700, Jeff Fox thus spake:
> >
> > He does not have time to do unpaid support for
> > hobbyists wanting to play with ColorForth or
> > his chip designs, but we can collect a list of
> > things, like missing bits of documentation
> > in our mail lists and pass our requests as
> > a group for what we need on to Chuck.
>
> Absolutely.
>
> My first request is more details on the memory system. :)
>
> Actually, is he planning to release the sources to his ColorForth/OKAD II
> systems on this site?
>
> Eric LaForest

Jeff Fox wrote:
>
> Eric Laforest wrote:
> > Actually, is he planning to release the sources
> > to his ColorForth/OKAD II
> > systems on this site?
>
> ColorForth will get source and object files soon.
>
> CAD will get tutorials, examples and component
> libraries since the links are there but no files yet.
>
> As to the full five hundred lines of source to OKAD II
> and the full source to the chips themselves that is
> something else. He has wanted to get paid something
> for the twenty years of work he has invested into
> chip designs mostly without pay. Chuck is past
> a reasonable retirement age given his family history.
>
> He has discussed his option to put the CAD and
> chip sources into the public domain. That might
> be a last ditch attempt to get someone to pay for
> consulting work. But he has no hard plans to do that
> at the moment. Personally I think it is a lot
> to ask. I think it would have more negatives than
> positives.
>
> One problem I see with that is the one of
> where that would lead. The main interest so far
> has not been from "talking Chinese doll makers" as
> someone wrote in one of the other mail lists.
> The chips may just end up as weapons anyway but
> it has been an issue.
>
> There was an interesting story on the news
> yesterday about Fort Lauderdale Florida where
> the police have many surveillance cameras around
> town connected to a computer with face recognition
> software that do criminal records searches on
> people in a crowd. If you make the cost, size, and
> power consumption of that sort of technology zero
> you enable not just big brother but things like smart
> bullets and human hunting robots which are seriously
> funded projects that have wanted these chips. And like
> I have said many times, I have many stories from the
> last ten years. It gives another reading to the
> popular advice that we need to find a "killer app."
>
> The idea of truly inexpensive computers with
> reasonable software that could enable and educate
> more than 2% of the world's population seems to be of
> no interest to anyone. Most people in the lucrative
> computer industry prefer the shell game with the
> highest profit margins and are opposed in every way
> to that idea. Some people who should know better do
> all they can to trivialize and distort what Chuck has
> done in order to protect their investments in time,
> money, and skills in conventional computer technology.
>
> Very few people have invested time, money, or
> skills in support of what Chuck has tried to do.
> Most fans have only provided lip service and only
> to other fans. I must admit that I expected
> some support from the Forth community rather than
> have them be the most opposed group and the most
> threatened by Chuck's ideas (ideas that are not
> already at least twenty years old that is. ;-)
>
> One nice thing about Chuck getting his own website
> is that I could more easily get out of the loop
> altogether and avoid the kill (meaning insult, distort,
> trivialize, demonize, name call etc.) the messenger
> syndrome that has characterized the last decade
> where I was the only person making information public
> on this stuff.

Eric Laforest wrote:
>
> On Tue, Jul 03, 2001 at 02:31:46PM -0700, Jeff Fox thus spake:
> > Eric Laforest wrote:
> > > Actually, is he planning to release the sources
> > > to his ColorForth/OKAD II
> > > systems on this site?
> >
> > ColorForth will get source and object files soon.
>
> Very cool.
>
> > As to the full five hundred lines of source to OKAD II
> > and the full source to the chips themselves that is
> > something else. He has wanted to get paid something
> > for the twenty years of work he has invested into
> > chip designs mostly without pay. Chuck is past
> > a reasonable retirement age given his family history.
>
> Hmm.....~14000$US for a run of 25 packaged chips from MOSIS.
> This means ~560$/chip. (somewhat more really to cover Chuck's consulting fees) > This is not an impossible sum for many people. > Is there enough interest among 25 or more people to fund a run of x25/F21 chips? > At that cost, a group of determined enthusiasts could fund yearly > or half-yearly runs with rewards of helping Chuck, getting cool > technology and furthering MISC as a whole. > > > > > One problem I see with that is the one of > > where that would lead. The main interest so far > > has not been from "talking Chinese doll makers" as > > someone wrote in one of the other mail lists. > > The chips may just end up as weapons anyway but > > it has been an issue. > > Indeed a difficult situation to avoid. > > Eric LaForest
7/04/01 Mark Sandford wrote: > > Are there people to group fund such an effort? > Exactly how much more than the 14K would it take > for Chuck's time? Using his $100/hr = $4000/week, > let's say he needs 3 months; that comes out to $62K. > Are there say 12 people willing to put up 5K each > to help this thing along? Chuck might be willing to > work for less if it is a group rather than commercial > effort, in which he would still retain all rights. > > I think that there are a lot of possibilities here. > With silicon processing being an art rather than a > science, the chip costs come more from the fact that > not every transistor or wire is perfect so the fabs > need to test every chip and toss those that don't pass > for speed or functionality. Some people, particularly > HP Labs, have done work with systems with known flaws > where the system routes around the problems. At 0.18um > you can't expect more than something like 50% fully > functional chips but if you are willing to call 20 > cpus good enough, then you withstand a failure of 20% > which increases yields greatly. Of course if your one > and only failure is in a main bus element or I/O block > you're toast, but still, you beat the odds by a large > margin.
This should be very attractive to consumer > electronics companies, which end up avoiding test and > building products and just doing a final test and > tossing failures in the trash as they need to keep > costs down and repair on small low cost devices isn't > cost effective. > > The space people (NASA and satellite builders) should > also be interested as they can have redundant > processors and just switch to another when a gamma ray > takes out one processor. > > I would suggest that this be retargeted somewhat as 25 > processors seems a little overkill, 16 or 9 (assuming > you like squared numbers) seems more reasonable and > the SRAM at 4ns (250 MHz before timing margins > on-chip), would need to get shared between 25 > processors. Assuming they are doing similar things > this leaves only an effective 10MHz per processor > while they are running at 2400MHz, so unless the > application is heavily, heavily inner loops they will > spend a great amount of time twiddling their thumbs > awaiting their turn on the bus. Even running solid > multiplies at 125M this still leaves a large margin > for data transfers. So firstly I'd trim down the > number of processors and might suggest looking at > pairing the processor with a x36 chip instead of the > x18 to get two "18 bit words" per cycle and > effectively running the memory at 500MHz x18. > > Another area that I might suggest a change is the > memory per processor: 384 words might be 1K words if > the number of processors is trimmed down to 16 or 9 so > you would be more likely to run without needing to > load or store data as frequently. > > I might be interested in contributing to such an > effort, I would need to know more about Chuck's > experience and how likely the first try is to > work (Murphy's Law and all). I bought one of the > original P21 chips and I believe that those didn't > function until the 8th run so this is never a slam > dunk especially if 0.18 and TSMC are new to his > techniques.
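Archive note: the bus-sharing arithmetic in the message above can be checked with a short sketch. It only assumes the figures quoted there (one external SRAM at 4 ns per access, processors running at 2400 MHz, accesses shared evenly); the function names are illustrative and are not part of any F21 or 25x tooling.

```python
def sram_rate_mhz(access_ns: float) -> float:
    """Peak access rate in MHz of a memory with the given cycle time in ns."""
    return 1000.0 / access_ns


def per_cpu_share_mhz(access_ns: float, cpus: int, words_per_access: int = 1) -> float:
    """Even split of the external memory word rate across all processors."""
    return sram_rate_mhz(access_ns) * words_per_access / cpus


def cycles_per_external_word(cpu_mhz: float, share_mhz: float) -> float:
    """Core cycles that pass, on average, between one processor's bus turns."""
    return cpu_mhz / share_mhz


if __name__ == "__main__":
    # Processor counts discussed in the message: 25, 16 and 9.
    for cpus in (25, 16, 9):
        share = per_cpu_share_mhz(4.0, cpus)
        wait = cycles_per_external_word(2400.0, share)
        print(f"{cpus:2d} cpus: {share:5.1f} MHz each, about {wait:.0f} core cycles per word")
```

With 25 processors this gives the 10 MHz share quoted in the message, which is 240 core cycles per external word; passing `words_per_access=2` models the suggested x36 memory and doubles the share.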
Jeff Fox wrote: > > Mark Sandford wrote: > > The space people (NASA and satellite builders) should > > also be interested as they can have redundant > > processors and just switch to another when a gamma ray > > takes out one processor. > > iTV came from NASA. iTV did radiation testing on > i21 and found it to be extremely resistant to > ionizing radiation despite not being designed with > rad-hard rules. They also worked with the Air Force > on processors for spacecraft. > > > I would suggest that this be retargeted somewhat as 25 > > processors seems a little overkill, 16 or 9 (assuming > > you like squared numbers) seems more reasonable and > > the SRAM at 4ns (250 MHz before timing margins > > on-chip), would need to get shared between 25 > > processors. Assuming they are doing similar things > > this leaves only an effective 10MHz per processor > > while they are running at 2400MHz, so unless the > > application is heavily, heavily inner loops they will > > spend a great amount of time twiddling their thumbs > > awaiting their turn on the bus. > > Of course. The same thing applies to workstation farms. > All problems have a balance between node processing > and node communication. The design was not created > for problems that are essentially serial and are > limited by communication bandwidth or serial processing. > > Instead this design is for computationally intense > problems that can use 60,000 MIPS per $1 cluster chip > and not for software or problems that would limit > it to 250MIPS. A single X18 is capable of 2400MIPS > so why limit 25 of them to a total of 250MIPS? > > The proper model for F21 or 25X is a workstation > farm, but without the hardware and software overhead > needed to put C or Unix on each node. A very small, > very cheap, Forth workstation farm. > > > Even running solid > > multiplies at 125M this still leaves a large margin > > for data transfers.
So firstly I'd trim down the > > number of processors and might suggest looking at > > pairing the processor > > Like P21, F21, i21, and others the X18 design was > picked to reduce the prototyping cost and get a > chip with pins that fit the prototyping constraints. > So if someone has their own fab line and is not > restricted by such constraints and is also not > concerned with budget constraints the number of > processors per die is completely variable, from > 1 to thousands. There is interest in thousands > of processors per chip. > > The width is also variable from 5 bits to whatever. > Chuck's designs are in columns so scaling the > width is mostly trivial. Chuck said that making > a P32 from a P21 was about a day's work in OKAD. > > But the + and +* instructions' timing is proportional > to bus width, so those opcodes would be slower with a > wider bus. Also the pin count and costs go up. Pins > are more expensive than silicon in high volume. That > is why a 60,000 MIP 25x can cost about the same thing > as a 2400 MIP X18. > > > with a x36 chip instead of the > > x18 to get two "18 bit words" per cycle and > > effectively running the memory at 500MHz x18. > > It could be done, and still get 2400MIPS from > the internal memories. Larger amounts of > internal memory could be put on larger more > expensive chips if prototyping costs are not > an issue. But these have not been billion > dollar type funding projects so far so > things have been kept small to make it possible. > > > Another area that I might suggest a change is the > > memory per processor: 384 words might be 1K words if > > the number of processors is trimmed down to 16 or 9 so > > you would be more likely to run without needing to > > load or store data as frequently. > > True. Have you a particular application in mind where > you have determined that twice as much on chip > memory is needed? I spent years doing that sort of > thing to tweak F21 before it was fabbed.
> > I suggest that anyone with a particular idea simulate > it extensively to be able to tweak the design to > do what you really find best suited to your needs. > Chuck is in the custom silicon business. He can > make it work in many ways depending on what the client > wants. It is a little like picking items from a > menu. Chuck would love to make many custom versions. > But he would also really like to make a production > run and get some chips into some product somewhere. > It is sort of a key element that hasn't happened. > > > I might be interested in contributing to such an > > effort, I would need to know more about Chuck's > > experience and how likely the first try is to > > work (Murphy's Law and all). I bought one of the > > original P21 chips and I believe that those didn't > > function until the 8th run so this is never a slam > > dunk especially if 0.18 and TSMC are new to his > > techniques. > > It did take 8 tries to get P21 completely working. > It had the thermal bug like all conventional chips > but at only 100MHz in 1.2u Chuck didn't bump into > it and didn't find it. When he scaled down to .8u > and went to 500MHz he discovered a bug in the > transistor model. There were almost thirty > prototypes made at iTV and four by UltraTechnology. > The modeling in OKAD got closer and closer to > what the fabs actually produced. > > The problem was that no one could say what the fabs > would produce. No one knew. People would just > repeat the mantra that it is just too complex to understand > so you just have to trust in your half million dollar > CAD software and accept that if it only tries to get > within 1/10th of the potential speed in a given > process then things will most likely work. The problem > there is that that software can add so much complexity > to the design, and do such poor routing that even a > 90% margin of error may not be enough.
So ultimately > it is a trial and error process whether you use > the half million dollar tools and aim for 10% or if > you try to actually understand what the fab process > will really produce and fine tune your own CAD > software to match it. > > Even the people who wrote the half million dollar > CAD software would usually just say that they > only wrote 1% of it and only really understood 1% > of it the way Chuck needed to understand 100%. > Also by only aiming at 10% of the potential they > could live with a very fuzzy idea of what the > fabs would actually produce. Remember that > it took hundreds of millions of man years of > testing to find some Pentium bugs. > > I was convinced that everything was working in the > last .8u prototypes, the way OKAD predicted, but not > all the bug fixes got put into the last designs prototyped. > The last prototypes were made in 1998 then all funding was > gone. The move to .18u will require prototyping > to get things fine tuned. I doubt if things will > work 100% the first time. Not very likely. But most > of the problems with CAD have been worked out over > the last decade. The only proof will be chips that > do exactly what OKAD predicts they will do. > > The only one of the chips that worked 100% on the > first try was ShBoom. It was a funny story. Chuck > laid out the design, the software routed it and Chuck > said, "This could never run. The software is brain > damaged and doesn't understand which circuits are > critical for timing. It must just create a list and > go through it. Look at this trace, it is the most > important trace on the design but must have been one > of the last ones routed because it goes all over the > chip to get from here to here. It needs to be shorter > and straight. The only solution is to lay out all > the components by hand and hand wire all the sections > together." > > The engineers at OKI thought Chuck was nuts. They > said that his solution was impossible and simply could > not ever work.
But Chuck did it, and it worked 100% on > the first try. Chuck decided from that experience that > he needed his own tools that did what he needed. I > think his explanations at his site of why his CAD tools > work the way they do are very well written. > > The design was stolen from Chuck and eventually found > its way to Patriot Scientific where they spent a > decade making changes and trying to get them to work. > > Even with OKAD as evolved as it is, I would expect > that more than one prototype fab would be needed to > get things working 100%. Also testing every transistor > and combination of instructions etc. is a very > involved process. That is the sort of thing that > Chuck expected from the client and owner of a chip. > > I always thought that it made for an unusual programming > challenge when you have no idea what will work or is > working. You can't count on anything and have to > start almost from scratch each time unless it just > happens to work with a subset of the problems you saw > with the last suite of diagnostic software. > > I thought it would be a fun programming challenge for > Forth day. Here is a simulated processor. Here is > the instruction set that it is designed to support. > Write a program to do such and such to get X points. > You get X extra credit points for each bug where > the simulated processor that we give you does not > do what this documentation says. You also get > points for a work around for each bug that you > identify. You also get points for correctly > documenting the details of the hardware bugs that > you find. The person with the most points at > the end of the lunch hour wins. > > This sort of programming is very different than > what most people do. You can't trust the chip hardware, > you can't trust the board hardware, you can't trust > the compiler, and the first few dozen things that > you try simply may not work at all. So you can't > easily find software bugs by observing that your > program didn't work.
The problem becomes nearly > impossible if you introduce your own software bugs > on top of external hardware bugs that are seen > at a board level due to signal glitches or from > internal hardware bugs in the instruction set > or registers. And the bugs that only appear > once in every few billion executions of a given > sequence of otherwise proper code are very very > hard to find. Most programmers have a hard time > finding their own bugs when given solid chips, > solid boards, solid operating systems, and solid > compilers. The bare metal programming of > prototype chips is tricky business.
Rick Hohensee wrote: > > 60000 MIPS might be too hard to believe. It might be an easier sell to > talk about a one-cycle process-switch. 25 task processors. I don't know if > routing that is easier or harder though. Make it 32 bit, with two 18 bit > cache SRAMs, put it on a PCI card, call it a multimedia board, and don't > tell them it doesn't need the x86. > > Rick Hohensee
7/05/01 Jeff Fox wrote: > > Rick Hohensee wrote: > > > 60000 MIPS might be too hard to believe. > > It is impressive for a $1 chip. The MIPS numbers for > one Pentium sized or wafer sized will be harder for > people to believe. The MIPS numbers for a Pentium > sized version in .1u or smaller would be harder > for people to believe. > > Many people didn't believe that 100MIPS in 1.2u > was possible or that 500MIPS in .8u was possible. > After it was done it was pretty obvious that > they didn't care either. I try not to be > concerned about the people who can't believe > it or don't care. > > > It might be an easier sell to talk about a > > one-cycle process-switch. > > Have you tried to sell the idea? Has there been > serious interest in a super fast task switching > processor with a 0.4ns task switch? > > > 25 task processors. I don't know if > > routing that is easier or harder though.
> > A 25 way bus arbitration unit would be more > complex than what Chuck is doing and would > keep 24 of 25 processors shut down at any > given time. Given that Chuck has gone to > dynamic logic, processors would lose all their > contents if shut down for very long. So > instead of 4% throughput it would be a tiny bit > lower due to the extra overhead to keep all > shut down processors alive waiting for a task > switch. > > > Make it 32 bit, with two 18 bit cache SRAMs, > > Twice the number of pins and some multiple of the > development cost of course. > > > put it on a PCI card, call it a multimedia board, > > and don't tell them it doesn't need the x86. > > Making a product that uses a new chip is a > completely different thing than designing or > making the chip. That can require orders of > magnitude higher budgets. Who or what company > is it that you are suggesting should develop this > PCI card product with the 32 bit chip? What > do you think would be a good PC application > for the product to target? It certainly could > be done and it might be a good idea.
t wrote: > > As Jeff Fox has pointed out in previous posts, extolling the virtues of MISC > Forth on these email lists is preaching to the choir. > > What we have is a failure to communicate between tech (bleeding edge MISC > Forth) and finance/marketing/mgmt (FMM). > > We see the nominal 6 orders of magnitude improvement in code/silicon but > have failed to show FMM how it is to their benefit to adopt it. Since we > have the ideas clearly in our minds, it is our responsibility to convey it > to them in _their_ terms. > > Treat it as an engineering communications problem. Define the parameter > space and their communications protocols and find the optimum fit. Apply the > same brutal efficiency to convey this advantage as is demonstrated in MISC. > E.g. see Chuck's Floppy I/O: > http://www.mindspring.com/~chipchuck/ide.html > for a working example of brutal efficiency, 5 defs for disk I/O!
Can we do > the same in FMM speak? > > For a simple example, a meme (a self replicating concept expressed in a > simple phrase) would set the context. "A million times more efficient" seems > like an obvious one to me, but this may be too much for FMM to accept. > Repetition can get it accepted, as demonstrated by the mindless ads we all > know and love, which means the MISC community has to go out and proselytize. > Devise other more credible memes to spread before it. > > Get involved in the Open Source Community. One post to Slashdot is > 'priceless'. Organize a submission campaign to get Chuck's web site exposed. > Get the MISC advantage out into the world. In a word, ADVERTISE. Just be > covert about it, unless you have money to throw around. > > To restate the situation: this is a communications/protocol mismatch. Figure > out the motivation for Finance, Marketing and Management decisions and > provide the proper stimulus. > > But that's just my opinion. > > Terry Loveall
Rick Hohensee wrote: > > > > > Rick Hohensee wrote: > > > > > 60000 MIPS might be too hard to believe. > > > > It is impressive for a $1 chip. The MIPS numbers for > > one Pentium sized or wafer sized will be harder for > > people to believe. The MIPS numbers for a Pentium > > sized version in .1u or smaller would be harder > > for people to believe. > > > > Many people didn't believe that 100MIPS in 1.2u > > was possible or that 500MIPS in .8u was possible. > > After it was done it was pretty obvious that > > they didn't care either. I try not to be > > concerned about the people who can't believe > > it or don't care. > > > > > It might be an easier sell to talk about a > > > one-cycle process-switch. > > > > Have you tried to sell the idea? Has there been > > No, but it's a transparent drop-in to existing stuff on the scale of > things under discussion here. > > > serious interest in a super fast task switching > > processor with a 0.4ns task switch? > > > > > > 25 task processors.
I don't know if > > > routing that is easier or harder though. > > > > A 25 way bus arbitration unit would be more > > complex than what Chuck is doing and would > > keep 24 of 25 processors shut down at any > > given time. Given that Chuck has gone to > > I assume the only-one-active aspect. > > > dynamic logic, processors would lose all their > > contents if shut down for very long. So > > instead of 4% throughput it would be a tiny bit > > lower due to the extra overhead to keep all > > shut down processors alive waiting for a task > > switch. > > > > Throughput is 100% of "keep the pins busy" and 4% of "keep the silicon > busy", has lower current draw?, lower exotherm?, and I don't think > you'll get near 100% utilization of 25 engines. Clustering might scale > close to that. SMP doesn't. > > > > Make it 32 bit, with two 18 bit cache SRAMs, > > > > Twice the number of pins and some multiple of the > > development cost of course. > > > > > put it on a PCI card, call it a multimedia board, > > > and don't tell them it doesn't need the x86. > > > > Making a product that uses a new chip is a > > completely different thing than designing or > > making the chip. That can require orders of > > magnitude higher budgets. Who or what company > > is it that you are suggesting should develop this > > PCI card product with the 32 bit chip? What > > do you think would be a good PC application > > for the product to target? It certainly could > > be done and it might be a good idea. > > I dono. :o) > > Rick Hohensee
7/06/01 Mark Sandford wrote: > > Jeff Fox wrote: > >Mark Sandford wrote: > -- Space/NASA stuff deleted for length considerations > > >> I would suggest that this be retargeted somewhat as 25 > >> processors seems a little overkill, 16 or 9 (assuming > >> you like squared numbers) seems more reasonable and > >> the SRAM at 4ns (250 MHz before timing margins > >> on-chip), would need to get shared between 25 > >> processors.
Assuming they are doing similar things > >> this leaves only an effective 10MHz per processor > >> while they are running at 2400MHz, so unless the > >> application is heavily, heavily inner loops they will > >> spend a great amount of time twiddling their thumbs > >> awaiting their turn on the bus. > > > >Of course. The same thing applies to workstation farms. > >All problems have a balance between node processing > >and node communication. The design was not created > >for problems that are essentially serial and are > >limited by communication bandwidth or serial processing. > > > >Instead this design is for computationally intense > >problems that can use 60,000 MIPS per $1 cluster chip > >and not for software or problems that would limit > >it to 250MIPS. A single X18 is capable of 2400MIPS > >so why limit 25 of them to a total of 250MIPS? > > > >The proper model for F21 or 25X is a workstation > >farm, but without the hardware and software overhead > >needed to put C or Unix on each node. A very small, > >very cheap, Forth workstation farm. > > Agreed, but a chip (processor farm) that can't do a > significant/interesting demo isn't much of a > technology demonstration. There have been many instances of this > in the past: if you have to wait for the demo and then wait > for an implementation that does something real, people lose > interest. So you can say that this chip will only > work in one class of problems, but if those problems > aren't of interest then the whole technology gets dismissed. > What is described above is the classic problem, and > one that has plagued the CPU industry for years. This > has become a main mantra of mine: a system isn't > limited nearly as much by MIPS as by memory bandwidth, and > as CPU speeds increase at a rate faster than memory > speeds increase this problem grows.
The classic > case is the Sieve which used to be a speed test > but as processor speeds increased beyond what > memory could provide the test became useless. As > such, in processor designs now, while they have faster > processor clocks every year, performance is dominated > by cache size and design. I understand that part of > the MISC concept is that Machine Forth is that much > smaller and thus faster than traditional Bloatware, > but if the chip can only run very small routines, > code or data must be loaded and stored and the speed > of the processor is limited by the available > bandwidth. > As you mentioned workstation farms are bandwidth > limited (with fast, wide memory and large caches, with > one, two or four processors), how is a much faster > set of 25 processors supposed to survive? The technology > could be proven more effectively with a better match > between memory bandwidth and bandwidth requirement. This can be > addressed with faster, wider external memories, and > more on-chip memory such that more routines > can be stored on-chip, reducing the program load > portion of the memory bandwidth equation. 60,000 MIPS > that can't be used is worthless, 20000 MIPS (9 processors) > that can be used is worthwhile. If there isn't > enough bandwidth or the requirements can't be reduced > the 60,000 MIPS don't have value. > > A 36bit chip helps bandwidth, while keeping the size small > and one chip, and more on-chip memory helps reduce requirements > by having more on-chip code. My suggestions are aimed at making > the demonstration chip more viable. Are there really any 60,000 > MIPS applications that run in 384 words and require less than > 250Mwords of data bandwidth? I can't think of any, and without > a compelling application, no matter how powerful, this technology will > go nowhere.
We as engineers can often think of many things > that could be done but as much as we hate to admit it, > if nobody wants or can use what you develop, it's nothing more than a > paper-weight. > > I have a strong belief that the future of processors will be dominated > by the intelligent RAM concept, where you take the relatively small > CPU and put it inside the RAM, which can then be very wide, 128 or 256 bits, > and center the chip on the memory availability which will be the limiting > factor anyway. It is the old "if Muhammad won't go to the mountain, bring the mountain > to him" concept; it sounds backwards but you need to overcome your problems > via the simplest route. > > > > >> Even running solid > >> multiplies at 125M this still leaves a large margin > >> for data transfers. So firstly I'd trim down the > >> number of processors and might suggest looking at > >> pairing the processor > > > >Like P21, F21, i21, and others the X18 design was > >picked to reduce the prototyping cost and get a > >chip with pins that fit the prototyping constraints. > >So if someone has their own fab line and is not > >restricted by such constraints and is also not > >concerned with budget constraints the number of > >processors per die is completely variable, from > >1 to thousands. There is interest in thousands > >of processors per chip. > > > >The width is also variable from 5 bits to whatever. > >Chuck's designs are in columns so scaling the > >width is mostly trivial. Chuck said that making > >a P32 from a P21 was about a day's work in OKAD. > > > >But the + and +* instructions' timing is proportional > >to bus width, so those opcodes would be slower with a > >wider bus. Also the pin count and costs go up. Pins > >are more expensive than silicon in high volume. That > >is why a 60,000 MIP 25x can cost about the same thing > >as a 2400 MIP X18.
> > > >> with a x36 chip instead of the > >> x18 to get two "18 bit words" per cycle and > >> effectively running the memory at 500MHz x18. > > > >It could be done, and still get 2400MIPS from > >the internal memories. Larger amounts of > >internal memory could be put on larger more > >expensive chips if prototyping costs are not > >an issue. But these have not been billion > >dollar type funding projects so far so > >things have been kept small to make it possible. > > > >> Another area that I might suggest a change is the > >> memory per processor: 384 words might be 1K words if > >> the number of processors is trimmed down to 16 or 9 so > >> you would be more likely to run without needing to > >> load or store data as frequently. > > > >True. Have you a particular application in mind where > >you have determined that twice as much on chip > >memory is needed? I spent years doing that sort of > >thing to tweak F21 before it was fabbed. > > > >I suggest that anyone with a particular idea simulate > >it extensively to be able to tweak the design to > >do what you really find best suited to your needs. > >Chuck is in the custom silicon business. He can > >make it work in many ways depending on what the client > >wants. It is a little like picking items from a > >menu. Chuck would love to make many custom versions. > >But he would also really like to make a production > >run and get some chips into some product somewhere. > >It is sort of a key element that hasn't happened. > > > >> I might be interested in contributing to such an > >> effort, I would need to know more about Chuck's > >> experience and how likely the first try is to > >> work (Murphy's Law and all). I bought one of the > >> original P21 chips and I believe that those didn't > >> function until the 8th run so this is never a slam > >> dunk especially if 0.18 and TSMC are new to his > >> techniques. > > > >It did take 8 tries to get P21 completely working.
> >It had the thermal bug like all conventional chips > >but at only 100MHz in 1.2u Chuck didn't bump into > >it and didn't find it. When he scaled down to .8u > >and went to 500MHz he discovered a bug in the > >transistor model. There were almost thirty > >prototypes made at iTV and four by UltraTechnology. > >The modeling in OKAD got closer and closer to > >what the fabs actually produced. > > -- other Chip history comments deleted for space > > It seems a little misleading to say that the prototyping > cost with MOSIS is $14K when it may take 2, 4 or even > 8 tries to get things working. If it really takes 8 tries > the prototyping cost is $112K and 32 months; this doesn't > sound that attractive. Chuck's models and thus experience > have been (as far as I know) at 0.8um and while his software > may be getting better he will have a whole new set of issues > to deal with as the geometry gets smaller. This transition > has been pretty difficult for the traditional CAD software > vendors. The term deep sub-micron refers to the problems > that are seen as geometries drop below 0.3um and the > gate delays that defined performance historically stop > being dominant. At 0.35um gate delays rule, and wire > delays can be ignored. At 0.25um gate delays and wire > delays are near equal and both must be considered. At > 0.18um wires dominate; gate delays can't be ignored > but placement and thus wire lengths now become the determining > factor. As Chuck's transistors are faster and he isn't playing > the safe must-work technology game that the traditional > EDA firms are, he will see these issues in a different fashion, > but still these problems will exist and their nature will > change with geometries. So his software may have improved > with Chuck's understanding of the issues but he will > need multiple tries to calibrate his technology when > operating with his new geometries.
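Archive note: the prototyping-cost estimate in the paragraph above is simple multiplication and can be sketched as follows. The $14K per run is the MOSIS figure quoted in the thread; the 4 months per run is only what "8 tries ... 32 months" implies and is an assumption here, as are the names used.

```python
RUN_COST_USD = 14_000   # quoted price of one MOSIS prototype run
MONTHS_PER_RUN = 4      # assumption: implied by 8 tries taking 32 months


def prototyping_budget(tries: int) -> tuple:
    """Return (total cost in USD, elapsed months) for a number of fab tries."""
    return tries * RUN_COST_USD, tries * MONTHS_PER_RUN


if __name__ == "__main__":
    # The message considers 2, 4 or even 8 tries; 1 is the best case.
    for tries in (1, 2, 4, 8):
        cost, months = prototyping_budget(tries)
        print(f"{tries} tries: ${cost:,} over {months} months")
```

Eight tries reproduce the $112K and 32 months given in the message; consulting time is not included in this figure.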
> > Given the above his best attack may be to put the > processor design to the side for a moment and build > a test chip with various transistor and gate designs > and use this to calibrate his designs before trying > a new processor on a new technology. He could try > various parameters and find either which line up with > his models or tune his models to work with the given > transistors. Once his models are correct getting a > processor to work should be much easier (Murphy's > Law still applies unfortunately). > > This said, while I would like to see Chuck succeed, > it doesn't seem like it would be easy to find investors > to contribute to a technology that requires significant > tuning through multiple iterations to work. The MISC > ideas are very powerful and it seems that >
7/07/01 Jeff Fox wrote: > > Mark Sandford wrote: > > Agreed, but a chip (processor farm) that can't do a > > significant/interesting demo isn't much of a > > technology demonstration. > > Can't? I am curious why you say that. > > But from what I have seen the demos that people want > to see are usually moronic and have nothing to do > with what chips are good for. > > Compression and decompression of data streams in > realtime is pretty much an open ended problem, > things like protein folding, gene sorting, simulations > and problem modeling, AI, and a lot of other things > that need computing power are not the sort of things > the investors want to see. They want to see a > dancing baby doing the latest popular dance. Then > they don't pay for the demo and don't invest anyway. > > > There have been many instances of this in the > > past: if you have to wait for the demo and then wait > > for an implementation that does something real, people > > lose interest. So you can say that this chip will only > > work in one class of problems, but if those problems > > aren't of interest then the whole technology gets > > dismissed. > > True.
I think the real problem there is that the only > problem that is of interest to most people is how to > do anything while carrying a 99.9% overhead built > into their PC. They are only concerned with how to > get a PC to do much of anything while it is hamstrung > with terrible hardware and software overhead for > backwards compatibility reasons. Most people think > that is the only real problem worth addressing, how > to get a few percent increase while carrying the > excess overhead of PC hardware or popular software, and > few are even willing to consider starting by simply > removing the overhead and starting with a clean > slate to get a 1000x improvement. > > > What is described above is the classic problem, and > > one that has plagued the CPU industry for years. This > > has become a main mantra of mine, a system isn't > > limited > > nearly as much by MIPS as by memory bandwidth, and > > Very true. And by the programs being 100 times larger > than they need to be. The overhead is built into the > systems to create the artificial problem that can > be improved in little steps for marketing purposes. > The easiest problems to solve are these sorts of > artificial problems, but they are what drives the > industry. > > > as CPU speeds increase at a rate faster than memory > > speeds increase this problem grows. The classic > > case is the Sieve which used to be a speed test > > but as processor speeds increased beyond what > > memory could provide the test became useless. As > > such, while processor designs now have faster > > clocks every year, performance is dominated > > by cache size and design. I understand that part of > > the MISC concept is that Machine Forth is that much > > smaller and thus faster than traditional Bloatware, > > but if the chip can only run very small routines, > > code or data must be loaded and stored and the speed > > of the processor is limited by the available > > bandwidth.
> > Most programs only need a little memory for code. > If you have lots of memory you can run larger programs. > > If small programs need megabytes of code then large > programs are not possible. You kind of have it backwards. > The problem with 99.9% overhead is that it limits the > machines to only trivial problems. The idea of low > overhead is to be able to solve serious problems. > Anyone can solve trivial problems, but for marketing > reasons the solutions are bloated up to fill the > machine and require hardware and software upgrades > to even do trivial things. > > Look at how the 80386 and 68020 have > been classified as not powerful enough to keep up > with a fast typist. ;-) I read in c.l.f last year > that it was only recently with >500MHz 32 bit deeply > pipelined CPUs and sophisticated optimizing native > code compilers that they were able to solve the > same problems that they could solve twenty years > ago with 5MHz 8 bit machines running threaded Forth. > To me this says that in twenty years they have > more or less canceled out with hardware and software > the 99.9% overhead that was introduced along the way. > > The faster peripherals and larger storage and bigger > displays are the big difference. The 1000x increase > in processing power is more or less canceled out by > a similar increase in processing overhead. SUVs > get better mileage than they used to also. The > improvements in the technology are used to cancel > out the introduced overhead to keep profit margins > high and give the consumers the impression that > things are getting better. > > > As you mentioned workstation farms are bandwidth > > limited (with fast, wide memory and large caches, with > > one, two or four processors), how is a much faster > > set of 25 processors supposed to survive? > > Sometimes the overhead is such a joke that I can't > believe it doesn't wave a red flag to more people.
I > listened to a lot of presentations at the Parallel > Processing Connection over the years. When people > would say that they needed X megabytes on each node > for overhead or X gigabytes total overhead to run > a hello world program I always found it simply amazing. > > > The technology > > could be proven more effectively with a better match > > between memory bandwidth and bandwidth requirement. This can be > > addressed with faster, wider external memories, and > > more on-chip memory such that more routines > > can be stored on-chip reducing the program load > > portion of the memory bandwidth equation. > > This is the classic image of parallel processing where > they see node communications as the limiting factor and > thus want the biggest nodes with biggest processor and > biggest caches possible to reduce the level of > parallelism. But a lot of research over the last > few decades has been into how biological systems can > do so many things so well that these machines can't. > The answer is lots more smaller nodes. > > Instead of a single 1000MHz processor with a huge > cache (that is dwarfed by the size of the software > overhead required) and a huge amount of memory, a > design optimized to carry the marketing introduced > overhead, the same number of transistors can > be 1000x more efficient on problems that are > parallel. > > Almost all problems, certainly almost all interesting > problems, are embarrassingly parallel. The only > problems that are not are the ones we artificially > created for ourselves in our antiquated serial > computers with absurd computational overhead. > > Humans don't look like Pentiums, they have 2*10^11 > processing nodes. They don't run Unix or Windows. > > > 60,000 MIPS that can't be used is worthless, > > If it is considered useless it may never be made. > If people keep repeating that it is useless other > people will keep thinking it is useless.
If none > are ever made the only value will be the educational > value to the few people who study the good ideas > that are there. > > Some of the most brilliant people I have met love > the idea of cheap chips with millions of mips. But > convincing people with money is a more difficult > problem. Convincing most people seems to simply > be a matter of showing them that it has become > mainstream. They equate good idea with mainstream > pure and simple. Followers not leaders. > > > that can be used is worth while. If there isn't > > enough bandwidth or the requirements can't be reduced > > the 60,000 MIPS don't have value. > > 180 billion bits per second bandwidth between nodes, and > 1,200 billion bits per second memory bandwidth in a $1 > chip makes a $1000 PC look pretty sick. But you have > to compare 100 25x to a PC to get the picture. Does > your PC have 18,000 billion bps network and 120,000 > billion bps memory bandwidth? That hasn't stopped it > from being marketed. > > > A 36bit chip helps bandwidth, while keeping > > the size small and one chip, and the more on-chip > > helps reduce requirements by having more on-chip code. > > My suggestions are aimed at making > > the demonstration chip more viable. > > Not really. It cuts it by at least a factor of 2. It > would be useful if the idea is that you have to carry > more overhead on each node. > > When I brought the idea of parallel processing to Chuck > more than ten years ago he was slow to embrace it. It > took him time to understand and appreciate the issues. > > When he brought his ideas of Forth and MISC designs to > me it took me time to understand and appreciate the > issues. For instance I just didn't understand it > when he said, "Most programs fit in one K." > > I didn't understand because I was picturing programs > with overhead built in for marketing purposes. After > watching Chuck for years I began to see that with > his approach most programs fit in one K or less.
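The bandwidth comparison earlier in this message (180 and 1,200 billion bits per second per chip, 18,000 and 120,000 for 100 chips) is a straight multiplication. The sketch below reproduces it. The per-chip figures are taken from the message; the linear scaling with chip count is an assumption of this example, and `aggregate_bandwidth` is a name invented for illustration.

```python
# Per-chip figures quoted in the message, in billions of bits per second.
NODE_BW_GBPS = 180      # bandwidth between nodes on one chip
MEMORY_BW_GBPS = 1_200  # memory bandwidth on one chip


def aggregate_bandwidth(chips):
    """Return (network, memory) bandwidth in Gbps for `chips` chips.

    Assumes bandwidth adds up linearly across chips, which ignores any
    chip-to-chip interconnect limits.
    """
    return chips * NODE_BW_GBPS, chips * MEMORY_BW_GBPS


if __name__ == "__main__":
    network, memory = aggregate_bandwidth(100)
    print(f"100 chips: {network:,} Gbps network, {memory:,} Gbps memory")
```

With 100 chips this gives the 18,000 and 120,000 billion bps figures used in the comparison against a single PC.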
> > Programs that other people felt required 10 megabytes > became 1K for Chuck. His VLSI CAD software is only > 500 lines of code. He doesn't need megabytes to do > a hello world program. > > > Are there really any ... > > Yes. Most problems, and most programs. But most > problems are beyond the machines with artificial > self-imposed problems to solve so most people have > never looked at how they could be solved. > > > We as engineers can often think of many things > > that could be done but as much as we hate to admit it, > > if nobody wants or can use what you develop, it's nothing > > more than a paper-weight. > > That is what the people who hate it, or are threatened > by it, or want to see it fail have kept repeating > for the last decade. But there have been a few hundred > people who have been influenced by the good ideas and > say it has been a benefit to them. So even if no chips > get made, the ideas have been recognized as good > ideas by more people than you might realize. > > I am always amazed by the profiles of the people > downloading stuff from my site. It is popular with > Intel, it is popular with the US Gov, it is popular > with NASA. And I see the ideas spreading even if > our chips are not being produced by anyone. > > But there still are people chanting that it is > worthless or bad. It seems that the biggest resistance > comes from the people who feel threatened by change. The > mainframe types said all the same things about micros > in the old days. Worthless toys, not real computers, > nothing more than paperweights that will never amount > to anything but a curiosity. I have been hearing > that for over thirty years now. > > > I have a strong belief that the future of processors > > will be dominated by the intelligent RAM concept, > > I like that idea too. I have wanted to use Chuck's > CAD technology to make cheap content addressable RAM. > But we would like to sell something to get funding first.
> > > where you take the relatively small > > CPU and put it inside the RAM which can then be very > > wide 128 or 256 bits > > and center the chip on the memory availability which > > will be the limiting > > factor anyway. The old if Muhammad won't go to the > > mountain bring the mountain > > to him concept, it sounds backwards but you need to > > overcome your problems > > via the simplest route. > > The idea of dropping MISC processors into a corner > of conventional memory chips, and being able to access > 1000 words in parallel at once has appealed to a > lot of people. When iTV had large well funded > corporate partners in Asia who were manufacturing > the memory chips that we all use those companies > wanted to do some of that. Then the Asian economies > collapsed and the projects died. > > > It seems a little misleading to say that the > > prototyping > > cost with Mosis is $14K when it may take 2, 4 or even > > 8 tries to get things working. If it really takes 8 > > tries the prototyping cost is $112K and 32 months; this > > doesn't sound that attractive. Chuck's models and thus > > experience have been (as far as I know) at 0.8um and while > > his software may be getting better he will have a whole > > new set of issues to deal with as the geometry gets smaller. > > This is all very true. But any further work rides on > the work already done and the fab runs that other people, > such as I, have already paid for. As all the CAD problems > seem to have been solved a few years ago Chuck's optimism > may not be too overly optimistic and my pessimism may be > overly pessimistic. > > But what you say isn't quite right regarding the > constraints. If you can only afford the lowest budgets > then you have a 4 month turn around. Pay more and get > a 4 day turn around. If you want the project to be > completed in 2 months instead of 32 months that > is really just a budget issue. Professional paths > are more expensive paths funded on hobby budgets.
> Still with mostly hobby budgets we have kept up > with or passed the companies spending billions of > dollars on each round of chip development. > > The problem is always that if you say you can do > 100 times better on 100 times lower budget you > will be asked to do 1000 times better on a 1,000,000 > times lower budget. Then when you do that they > just say they don't care anyway. > > One thing that appealed to me ten years ago was that Chuck's > approach solved the big problem that other people are > now struggling with. Scale. With Chuck's tiled approach > and hand layout, and with simulation that takes transistor > size, load, path length, and temperature effects into account > being used to get the tiled design right, designs scale almost > without effort. Problem solved. > > With a schematic or high level functionality description > and reliance on automated tools to place and route > they never have any idea what to expect until the last > minute and if they change the scale they have to start > over from scratch. This is the major difference between > Chuck's approach and other people's approach to CAD, > they must have schematic capture and trust in tools > while Chuck doesn't need or want it. > > > This transition has been pretty difficult for the > > traditional CAD software vendors. The term deep sub-micron > > refers to the problems that are seen as geometries drop > > below 0.3um and the gate delays that defined performance > > historically stop being dominant. At 0.35um gate delays > > rule, and wire delays can be ignored. At 0.25um gate delays > > and wire delays are near equal and both must be considered. > > At 0.18um wires dominate and gate delays can't be ignored > > but placement and thus wire lengths now become the > > determining factor. > > Exactly! That is why it was the first problem that Chuck > solved ten years ago.
> > > As Chuck's transistors are faster and he isn't playing > > the safe must-work technology game that the traditional > > EDA firms are, he will see these issues in a different > > fashion, but still these problems will exist and their nature > > will change with geometries. So his software may have > > improved with Chuck's understanding of the issues but he will > > need multiple tries to calibrate his technology when > > operating with his new geometries. > > Yes, he still has to make chips and see what happens the > same as everyone else. But instead of billions per new > chip the costs are much lower. If you reduce the costs > by a factor of 1000 he can do it 10 times faster. If > you reduce the funding by 1000000 he can do it about > as fast but it is more work. And we do get tired > of doing it that way. > > > Given the above his best attack may be to put the > > processor design to the side for a moment and build > > a test chip with various transistor and gate designs > > and use this to calibrate his designs before trying > > a new processor on a new technology. He could try > > various parameters and find either which line up with > > his models or tune his models to work with the given > > transistors. Once his models are correct, getting a > > processor to work should be much easier (Murphy's > > Law still applies unfortunately). > > Yes. Chuck's doing that was essential to solve the > industry wide thermal bug in the transistor models. > The details were fascinating but proprietary. > > > This said, while I would like to see Chuck succeed, > > it doesn't seem like it would be easy to find investors > > to contribute to a technology that requires significant > > tuning through multiple iterations to work. The MISC > > ideas are very powerful and it seems that > > Very true.
But don't kid yourself that Pentium > or Alpha designs don't require significant tuning or that > some billion dollar efforts don't just get written > off as development costs for designs that didn't work > at all. They just pick up the pieces and try again. Jeff Fox wrote: > > I wanted to add that Chuck isn't locked into > parallel designs. He would be happy to make > a 32 bit chip with whatever custom features you > want, or a 64 bit chip, or a 128 bit chip or ... > if someone wants to pay for the work. He > would be even more interested if they were > serious about manufacturing them. > > He could make more on royalties than on design > payments if someone has a good idea and it was > more than just advice for how other people > should spend their money. > > He is in the custom silicon business, and has > some unusual tools and skills. Bring him the > ideas and the funding and something can happen. > > His interest in SMP has been because there has > been outside interest in SMP and efforts to > generate funding. If anyone has ideas that they > think are better all they have to do is make him > an offer that he can't refuse. > > His CAD work is much faster now with the new tools. > So he needs more clients with more ideas and more > funding. > > If anyone has ideas, it really doesn't matter too > much if I like their ideas or not. Everyone has their > own opinions. Mine are mine. Yours are yours. Maybe > someone will have some winning ideas sometime. 7/08/01 Myron Plichota wrote: > > As a longtime Forth programmer and hardware designer, and shorter time > MISC -> NOSC subscriber, I've seen many instances where beautifully > simple ideas languish simply because they fly in the face of current > fashion trends in the computer industry.
The creative mavericks have > been paying the price for over a decade; the politically savvy > opportunists have yet to discover how they have assiduously played into > the hands of the monopolies and will soon be hard up for decent paying > jobs, thanks to a general stagnation in the industry, nightmarishly > complicated industry standards that require mega budgets to even > consider attempting, offshore software sweatshops, etc. The glory days > are over unless you work for the monopolies, and not all of us can stand > that kind of corporate culture in the first place. > > It has taken me a long time to appreciate some of the finer points and > the overall consistency in the evolution of the chips, silicon design > tools, and Forth compilers developed by Chuck Moore, Jeff Fox, and Dr. > Ting. These guys have my total admiration for their persistence and > purity of vision in rejecting the emperor's new clothes in favor of > putting the programmer back in the driver's seat. It's great news to me > that Chuck now has his own website, which must be a considerable relief > to Jeff Fox. Thanks Jeff, for all the effort you have put into making > the saga available to all who are interested enough to actually read > through it. > > The recent flurry of NOSC/MForth/CForth postings illustrates what I see > to be a culture clash. Good intentions are apparent, but differences of > opinion as far as how to gain the notoriety required for funding the > ongoing development work abound. I assume these are due to the widely > varying backgrounds and age groups that we encompass. > > IMHO, those who advocate porting to a MISC > implementation are missing a key point behind the MISC initiative, i.e. > an OS typically attempts to be all things to all people, and becomes > very ugly very quickly. 
My take on this (and I have seen this put into > practice many times) is that what is needed are simple interface > routines to perform the I/O functions (keyboard, mouse, video, mass > storage, datacomm, etc.) rather than all-singin'/all-dancin' hardware > abstraction monstrosities that pretend that everything is a file. So > much for OSs. > > I'll go farther here in stating that with enough CPU performance, which > appears to be abundant enough even with my silly little 20 MIPS > Steamer16, you can use software to replace dedicated hardware for an > amazing variety of realtime applications. Sure, you need at least a > free-running timer and parallel ports, and judicious analog and digital > hardware interface assists, but the minimalist approach keeps you in > control of the procurement/availability/obsolescence issues and > allows you to make tradeoffs that are simply denied by the menagerie of > the silicon that is quasi-available out there. I definitely think that > Chuck's current concept of an array of identical processors is superior > to the dedicated coprocessor approach taken in the past. The faster the > CPU is, the less reason to complicate the design with dedicated > hardware. If the CPU design is debugged, then all of them in the array > are as a natural consequence. In a multiprocessor design, you can even > do away with interrupts by roadmapping which processor is responsible > for what task and the interprocessor communication architecture. Why > bother involving a '765 floppy disk controller, 16550 UART, USB > controller, etc. when you can talk directly to the interface cable > buffers? Software is easy to fix, hardware takes another design > iteration and fab run. > > Another good example of solving a serious system-level issue is Chuck's > idea of establishing pin locations such that a 4 nS memory chip can be > mounted on the opposite side of the PCB with no more than 1 cm of trace > length pin to pin.
The best solution to many problems is to sidestep them > in the first place by changing the rules of the game. In this case, > signal integrity is reconciled with simple, low-cost PCB fab technology. > > Maybe the best bet for funding is to approach universities and research > foundations, rather than bored and greedy venture capitalists. Or maybe > we should all buy lottery tickets and pledge the winnings. I agree with > Jeff that wooing the mainstream with me-tooisms is a waste of time. A > unique niche based on nya-nya minimalism is the natural arena for Forth > chips to shine in. The computer industry as a whole is now decadent and > needs a slap in the face. The biggest pair of balls is way too high up > in the clouds to swing a decent kick at unfortunately, just ask the US > Dept. of Justice. > > Don't think of the MISC chips as being a PC-incompatible replacement > that needs a plethora of SVGA PCI card drivers, etc. Think of them as an > opportunity to expose the fraud behind the shortsighted standard > products that get flogged to us this year and discontinued the next in > the guise of progress. And of course if you are a "nobody", you'll never > get the information necessary to write your own driver. Even supposing > you do, it will be necessary to do it all over again when the card is > discontinued. Far better to put a stake in the ground and design your > own high-resolution color display subsystem with whatever features and > hardware/software tradeoffs you see fit, for example. > > I hate to see quibbling over which geometry results in what mind-numbing > MIPS figure. I take it for granted that sophisticated designs are > subject to certain statistics as the fat lady sings. I just want to hear > her perform, even on an off day. > > It's reverse heresy to be conservative on these mailing lists for > heretics ;) > > Myron Plichota Jeff Fox wrote: > > Myron, > > I like the way you phrase things. 
I will have to > save your first paragraph as a sort of manifesto > to show to other people to frame things. > > > It's great news to me that Chuck now has his own website, > > which must be a considerable relief to Jeff Fox. > > You can say that again. I am so pleased to read Chuck's > wonderful explanations of so many of his ideas in a > writing style that reflects his focused programming > and system design. > > After a decade of what I describe as the kill the > messenger syndrome where about a dozen specific > people could always be counted on to post counter > information to anything I said about what we > were working on and a few who chose to repeatedly > characterize me as a liar, cultist, snake oil > salesman, naive ignorant fool or a list of > other names I realize that I have at times lashed > out even at those who wanted to get involved or > provide support but didn't seem to know how. > > I do plan to take a step or two back and let > the dust settle a little now that Chuck has > done such a good job of explaining his current > work. I still think that understanding the > history of his work since he left Forth Inc. > and the logic in making major changes to his > Forth about every five years makes it all > make much more sense. But in providing that > information my web site simply has too much > stuff and isn't as simple and clear as Chuck's > new site. I have a meg of html files alone! > I was able to read all the files at Chuck's > site more than once fairly quickly. > > > The recent flurry of NOSC/MForth/CForth postings illustrates > > what I see to be a culture clash. Good intentions are apparent, > > but differences of opinion as far as how to gain the notoriety > > required for funding the ongoing development work abound. I > > assume these are due to the widely varying backgrounds and > > age groups that we encompass. > > Yes, very true. I used to be famous for my patience.
Over > the last few years it has worn rather thin or at times > been worn away altogether. > > > IMHO, those who advocate porting > > to a MISC implementation are missing a key point behind > > the MISC initiative, i.e. an OS typically attempts to be > > all things to all people, and becomes very ugly very quickly. > > Yes. I tried to address this idea with my "Low Fat > Computing" theme. Chuck's ideas are not minimalism > for the sake of minimalism, as many people have claimed, > or even Ting's idea of minimalism as the key to what > he calls Forth enlightenment, but simply good > computer health instead of the unhealthy digital > bloated fast food that is marketed everywhere that one > turns. > > > My take on this (and I have seen this put into > > practice many times) is that what is needed are simple > > interface routines to perform the I/O functions (keyboard, > > mouse, video, mass storage, datacomm, etc.) rather than > > all-singin'/all-dancin' hardware abstraction monstrosities > > that pretend that everything is a file. So much for OSs. > > Chuck has said that OS is a dirty word. On his new > web site he just says it is an obsolete concept. It is > part of the fat, that provides the artificial problem > that can appear to be improved for marketing purposes > while all the time more problems are slipped in behind > the scenes. (63,000 estimated bugs below your code?) > > It is also the legacy of Forth. In the old days Forth > showed its superiority by making it easy to have custom > hardware drivers that provided improved programmer > productivity and improved system performance. This > more or less ended with Chuck's PolyForth when he > left Forth Inc. and they abandoned the legacy of > Forth and began marketing it as a layer on top of > popular operating systems. As they admit the > constraints to do that dwarfed the constraints to > do Forth itself.
> > But with a generation of programmers who never saw > anything but stuff layered on top of popular OSes > they could not picture Forth in any other way > except in its most ancient forms. "Modern Forth" > is, according to them, a layer on top of everything > else embracing all the things that Chuck invented > Forth to avoid. > > The real problem with this, for me, was that > there have been many people who have been posting > information about how Chuck's chips are really > 1000x slower than what we said because they > simply cannot imagine a computer without all > the conventional stuff built in. > > The best examples I could find were the popular > Unix benchmarks designed to show system performance. > The rationale behind these programs was to take > simple problems and add the overhead that is typical > in Unix programs. Simple computations were intentionally > made more complex by adding Unix system calls and > forks and task switches and file access and all the > overhead that is typical in other programs layered > on top of all those standard services. So they are a > fair way to estimate how a given Unix system will run > typical Unix programs with typical overhead. They > show how well systems can carry all the overhead > in hardware and software needed to do all that. > > They are the opposite of Chuck's idea of making > a system with minimal fat, minimal waste, minimal > unneeded overhead. They show how well giant chips > with giant memory spaces can deal with giant overhead > on otherwise simple problems. So I wrote web pages > saying that anyone who suggests that these benchmarks > are a good way to evaluate Chuck's Forth chips > designed to run Chuck's Forth software is either > completely ignorant of the whole idea in the first > place or is deliberately picking the most deceptive > idea for a benchmark that they could find.
> > I was then amazed at how many well known people in > the Forth community did exactly that and repeatedly > posted information about how these chips are really > 1000 times slower than what we say based on their > estimates of how well they would run their favorite > benchmarks for their favorite OS compiled with their > favorite C compiler tuned for their favorite huge > chips. In other words, they don't carry fat well. > > Chuck said that a generation of people could not > separate the idea of the abstractions from the > actual computer. They didn't understand that > computer hardware and computer software could be > simple and logical. They thought that hardware, > hardware design, OS internals, and compilers > were simply beyond the realm of mortals and > could not distinguish between the layers of > abstractions introduced to tame these introduced > problems and the real computer below. He > said that he wanted to "Dispel the User Illusion" > that a computer is Windows or Unix. > > > I'll go farther here in stating that with enough > > CPU performance, which appears to be abundant enough > > even with my silly little 20 MIPS Steamer16, you can > > use software to replace dedicated hardware for an > > amazing variety of realtime applications. > > This is something that I didn't understand even after > a decade of working with the first few generations of > PCs and I got my first PC in 74. I began to understand > the concept that you just described when I got a Novix > Forth kit from this guy named Charles Moore in 86. It > came with a lab workbook that was very much like my > lab workbook in Electronics Class in the physics > department in college. It was full of tiny simple > programs that could allow the programmer (Chuck) to > trade software for hardware. > > It is essential to understanding Chuck's whole > approach. On F21 the parallel port can be treated > as 12 extra analog I/O lines with software.
Those > are digital lines, but can be programmed to do > analog. They are limited to a few megahertz of > analog however while a few transistors specialized > to do analog on two pins provided 40MSPS. > > The dedicated analog hardware is remarkably simple > and clean. On a garden variety microcontroller > a timer counts down and fires off an interrupt. > The CPU then stops what it is doing and executes > an ISR. The ISR saves registers to memory then > monkeys with A/D or D/A hardware. Eventually > it inputs or outputs a sample. Then context is > restored and the CPU goes back to doing what it > was doing. The result of all of that is that many > many memory cycles are needed for each sample. > On Chuck's design you can bit bang analog samples > on digital pins with a little overhead. With his > analog coprocessor it has its own timer, its own > DMA to memory and needs one memory cycle per sample. > It is 100 times more efficient than conventional > chips at doing what it does. > > Still, I can't say how frustrated I was over all > the years when people would write to me and say, > "We are considering using F21, we currently use > an 8051. But our 8051 can do 20K analog samples > per second and we are not convinced that the > F21 could keep up with it." (Couldn't they read the > 20M or 40M numbers? How could they miss that this > chip does everything 1000x faster than an 8051?) > > Understanding what the chip could do with a little > programming required a little background that > apparently a new generation of hardware and software > engineers just didn't have. They could read the > numbers on the box of a soundcard that they plugged > into a PC and that was about it. > > The same thing applied to what you could do with > a clever video coprocessor with instructions for > windows acceleration, or a clever analog coprocessor > that could be clocked for substantial oversampling, > or a clever programmable active message passing > piece of network hardware.
The ideas were apparently > just too far ahead of their time to be understood > without a very broad and in depth background. Some > of the ideas were new or fresh out of research. > > I can see how the people working with FPGA today > could get these ideas while other people don't. > They can see how specialized and optimized hardware > will always be fastest but has no flexibility, that > programmable I/O coprocessors provide a compromise > between hardware and programmability, and that > even simple hardware lets a clever programmer > trade software for hardware. > > Without the layers of generic abstraction a > simple processor like the steamer can be quite > powerful. With a clever programmer who understands > how to trade software for hardware it can be > remarkably flexible. I learned a lot from Bill > Muench and John Rible about that stuff too. > > Chuck's current designs have no coprocessors but > are fast enough that a few parallel port pins > can be programmed to perform almost any hardware > functions imaginable. With a bunch of Forth processors > on chip some can be specialized I/O coprocessors > in a given design. Chuck's IDE driver in a few lines > of code is a good example. > > His current chips are fast enough to even bit bang > analog at very high speeds on these pins. I still > doubt if people who don't have a certain background > or experience really understand these critical concepts. > It isn't like a PC where everyone assumes certain specialized > complex hardware that is then crippled by layers > of abstractions in generic software layers designed > to make them all look the same. > > On Chuck's new website he explains this in a couple > of sentences. He says that these layers of abstraction > that try to make all these different computers look > just the same must lower them all to the lowest possible > common denominator. He says, accept that they are > different. 
> > I tried to explain this as the spirit of MachineForth, > face the actual machine. Don't think the User Illusion > is the reality. A computer is not Windows or Unix. > > But the most visible people in the Forth community are > marketing the abstraction idea, marketing the portable > abstraction approach. The rest of the community is > buying it. Neither those selling nor those buying want anyone > to point out that the emperor is not wearing any clothes. > > So if anyone tried to explain how the legacy of Forth > was something else, or how Chuck's idea of Forth was > something else a huge number of people would distort > and trivialize any argument that you made or call > you a liar, cultist, snake oil salesman etc. I found > this situation maddening at times. I am so pleased > that Chuck created his own website. I can't tell you. > > > Sure, you need at least a > > free-running timer and parallel ports, and judicious analog > > and digital hardware interface assists, but the minimalist > > approach keeps you in control of the > > procurement/availability/obsolescence issues and > > allows you to make tradeoffs that are simply denied > > by the menagerie of the silicon that is quasi-available > > out there. I definitely think that Chuck's current concept > > of an array of identical processors is superior > > to the dedicated coprocessor approach taken in the past. > > The faster the CPU is, the less reason to complicate the > > design with dedicated hardware. If the CPU design is debugged, > > then all of them in the array are as a natural consequence. In > > a multiprocessor design, you can even do away with interrupts > > by roadmapping which processor is responsible for what task > > and the interprocessor communication architecture. > > You have followed the history and understand the concepts. > You put it well. > > > Why bother involving a '765 floppy disk controller, 16550 UART, > > USB controller, etc. 
when you can talk directly to the > > interface cable buffers? > > Right. The only time you want dedicated hardware is when > you want to push the performance to the bleeding edge. > If you want multiple gigabit self-routing network > coprocessors it only takes a little extra hardware > like what I designed for F21 and that Chuck has included > on the x25 chip. That could have been done with the same > parallel port pins that are programmable and run off > chip, but it gives you the extra inter-node bandwidth > that you want in a multiprocessor. > > > Software is easy to fix, hardware takes another design > > iteration and fab run. > > Very true. The problem is that software for most people > is only the very top layer of abstraction. They want to > be as isolated from the hardware as possible. The > result is predictable. > > We are told that only recently have 500MHz 32 bit > deeply pipelined processors running the most sophisticated > native code optimizing Forth compilers (on top of > you know what) been able to meet the performance levels > set by 5MHz 8 bit simple processors running simple > threaded Forths (written by Chuck) twenty years ago. > Apparently some truth occasionally slips through the > marketing hype. > > Meanwhile Chuck came up with an incredibly simple > native code optimizing compiler for Forth by designing > hardware to make it incredibly simple. The approach > is even simple when applied to a Pentium. And this > was ten years ago; now he has removed the last > remaining syntax in the language and many antiquated > words with his ColorForth approach. > > > Another good example of solving a serious system-level > > issue is Chuck's idea of establishing pin locations such > > that a 4 ns memory chip can be mounted on the opposite > > side of the PCB with no more than 1 cm of trace > > length pin to pin. The best solution to many problems is > > to sidestep them in the first place by changing the rules > > of the game. 
In this case, signal integrity is reconciled > > with simple, low-cost PCB fab technology. > > Yes, yes, yes. After he figured out how to make the CPU > and I/O coprocessors cost a few cents the costs of an > extra layer in a PCB or an extra square inch of board > space became significant. So he concentrated on improving > pinouts to make PCB cheaper and faster. I am so pleased > that someone has noticed and understands the ideas. Too > bad you are not one of the people with money. ;-) > > > Maybe the best bet for funding is to approach universities > > and research foundations, rather than bored and greedy venture > > capitalists. Or maybe we should all buy lottery tickets and > > pledge the winnings. > > Perhaps. I got bored with the Universities because it > looked to me like they didn't want to teach anything that > was not already so mainstream as to be obsolete. They > do have some good research in places however. > > But stuff similar to the active message passing network > hardware that came out of research at UC Berkeley or > that went into F21 is yet to appear in mainstream technology. > There is still a big gap. From what I see they are teaching > that computers are Windows or Unix to most people. But maybe > you are right that there is something there to use. > > > I agree with Jeff that wooing the mainstream with me-tooisms > > is a waste of time. A unique niche based on nya-nya minimalism > > is the natural arena for Forth chips to shine in. The computer > > industry as a whole is now decadent and needs a slap in the face. > > The biggest pair of balls is way too high up in the clouds to > > swing a decent kick at unfortunately, just ask the US > > Dept. of Justice. > > Don't get me started. We have the best government that money > can buy. > > > Don't think of the MISC chips as being a PC-incompatible > > replacement that needs a plethora of SVGA PCI card drivers, > > etc. > > Hear, hear! 
> > > Think of them as an opportunity to expose the fraud > > behind the shortsighted standard products that get > > flogged to us this year and discontinued the next in > > the guise of progress. And of course if you are a "nobody", > > you'll never get the information necessary to write your > > own driver. Even supposing you do, it will be necessary > > to do it all over again when the card is discontinued. Far > > better to put a stake in the ground and design your > > own high-resolution color display subsystem with whatever > > features and hardware/software tradeoffs you see fit, > > for example. > > You can get a lot of hate mail for writing things like > that in the wrong place. I hope the NOSC mail list > is not the wrong place to say it. I have learned > over the years that it is very dangerous to point > out to people the things that they want to hide > from themselves or are just not yet ready to understand. > > At first I had no concerns about threatening anyone > with new ideas. I mean the world is big, there are > countless billions being made, we are just a few > guys with a garage type business. Not a global > threat! > > Many people saw us as some big threat to the industry. > And few people knew about real threats except for > some science fiction writers. > > > I hate to see quibbling over which geometry results > > in what mind-numbing MIPS figure. I take it for granted > > that sophisticated designs are subject to certain > > statistics as the fat lady sings. I just want to hear > > her perform, even on an off day. > > I apologize for any quibbling and for exploding sometimes > when my short fuses get lit. I do overreact sometimes > after being accused so many times of being a lying mindless > sycophant who worships Chuck Moore, thinks he is God, and > is attempting to form a cult. I have also lost almost > all patience at being told what "I" should do even when > it is well intentioned and not meant to be critical or > insulting. 
> > In retrospect I think that I am not very well suited > to the computer industry where almost everyone lies > all the time and has to to earn a living. > > What is the difference between a car salesman and a > computer salesman? The car salesman knows when they > are lying to you. > > > It's reverse heresy to be conservative on these mailing lists for > > heretics ;) > > I am sure that there is some appeal to being a reverse > heretic and defending the status quo. > > I have just lost most of my patience when in a mail list > named MachineForth, which I assumed was for people to ask > and answer questions about programming Chuck's chips > the subjects are lectures about how we will fail or > how Chuck's chips are only good for doorstops or > whatever. I thought the lists were for people who > were interested in this stuff, not another place > for people to insult us. > > Well I plan to take a vacation for a while and will > not be dealing with email, mail lists, usenet, etc. > later in July and August. So take any rumors that > you read about my absence with a grain of salt. > Maybe my level of acceptance or interest will go up > if I cool down a bit. 7/09/01 Jeff Fox wrote: > > Mark Sandford wrote: > > This is true, I have also had a fair portion of career > > spent in parallel processing and the systems are > > generally designed to be 1000 workstations rather than > > 10,000 efficient processors. > > Oh, were it only a 10/1 ratio. But the ratio is from > 1000/1 to 10000/1. So rather than compare 1 1000MHz > workstation to 10 2400MHz processors based on > transistor count, cost, or power consumption you > have to use somewhere between 1000/1 or 10000/1. So > the equivalent of the 1000 workstation multiprocessor > is a 1,000,000 or 10,000,000 node MISC design. I > think it is really hard for people to really picture a > 24 billion MIPS computer very clearly. > > But the whole idea is only possible with Chuck's > ideas on programming which came first. 
Without > those ideas you need 1000x more of everything to do > trivial things like a printf. Printf exposes > the antiquated and poor idea that everything > is a file. > > > They have a full OS that > > takes multiple megabytes to handle communications, a > > big portion of which is getting "printf" statements > > from this processor out to a control console. > > Exactly. > > Remember that Chuck's interest in hardware began > because he felt that that was the remaining problem. > He had invented Forth and mastered it which allowed > him to write smaller, more efficient and faster > solutions and be more productive. > > Given that as he put it, "the software problem was > solved" the remaining problems mostly had to do > with the hardware causing most of the problems. > The hardware had started to fight us at every > turn. What other people wanted to put into the > hardware to support the overhead they put into > their software made doing Forth many times more > complicated than it needed to be. Thus was born > the idea of hardware tuned to do Forth. > > > I spent > > a large portion of my career using the Inmos > > Transputer, now defunct as it didn't fit people's ideas > > of what a processor should be but it followed many of > > the MISC concepts. > > And the Transputer was a strong influence on my > ideas of parallel hardware, parallel programming, > and what sort of custom CPU I would want. I brought > the idea to Chuck eleven years ago. But the people > in parallel processing considered Unix a joke and > wanted big industrial strength operating systems. > So they really didn't get the idea of Forth or > small processors at all. The Forth people didn't > get the idea of parallelism at all. A dozen years > later a few people are beginning to get the ideas. > > > Interested parties should scour > > the web for information as it is a very powerful and > > good model of what a minimalist processor should look > > like. 
It used byte codes and had a 3 element stack > > Yes, but not dual stack, not Forth, not tuned for a > trivially simple second generation optimizing native > code Forth compiler or tuned for instructions smaller > than bytes. It was tuned for Occam. > > P21 is about as minimal as you can get for Forth. > Forth has two stacks, it is hard to put that into > three cells. Once you use those three cells as > stack pointers to memory it looks like a clumsy > version of a conventional processor. > > > architecture, it was designed to be programmed in a > > high level language called OCCAM for which most > > operations turned straight into byte codes so even > > though it looked like a high level language (like C or > > Pascal) it ran like assembly. It also had 4 > > communications links that could tie to 4 other > > processors and create a "computing surface". It is > > very low in transistor count and even contained a > > process scheduler in hardware so that no OS was ever > > required. > > Yes, of course. I knew it well. But because of where > it came from it also needed special support chips that > didn't quite fit in with commodity parts. So engineers > had problems with it. And people who were not already > using Occam didn't want a new language. I had thought > a dozen years ago that there was already an established > community of people happy with programming in Forth > and didn't anticipate how they would change to something > entirely different, ANS Forth, over that fifteen year > period. > > Occam and the transputer were tightly coupled. I don't > know which came first. A transputer could run other > things and Occam could run on other things but then > you lost the synergy. We wanted that kind of synergy > but for Forth. Particularly the exemplary style > of Forth practiced by Mr. Moore. 
> > > It fit with many if not all the MISC > > concepts and the code was sufficiently small that in > > many cases the processor could do many significant > > functions completely from its 4Kbytes of on-chip > > memory. > > Yes, many but not all. Occam isn't Forth. But > Forth can include the features of Occam in a few > lines of Forth code. > > > Many people misused it and added OS's and > > large amounts of external ram but you could build very > > powerful systems efficiently if you took simplicity as > > a design goal. There is much to be learned from this > > processor and hopefully the 25x will use some of these > > concepts. > > The research that led to 25x was built on top of the > Occam and Linda concepts and transputer legacy that I > brought to the table mixed with Chuck's MISC and Forth > ideas. That is ancient history now. > > > that bandwidth issues often dominate and limit > > processing power so make sure that your processing > > power has sufficient bandwidth support, so it doesn't > > spend all its time waiting for data. > > Of course. We looked at workstation farms in the > old days and the communication between nodes was > most often the bottleneck because it was built on > top of hardware designed to compete with slow disk > drives, and on top of layers of OS software designed > to support the file paradigm. That is why Chuck > designed gigabit self-routing network interfaces > that required zero memory bandwidth most of the > time as on F21 or the multiple gigabit serial > links on one P32 to support fiberchannel and > network routing and protocol translation on $1 > chips instead of $10,000 boxes. We did extensive > simulations for years on the variations of the > instruction set, memory space, and interconnection > options. > > Not all things are best for all problems. We > didn't want a universal general purpose solution > that only operated at very low efficiency most of > the time, wasting 99.99% of its power most of the > time. 
So for real code, and real problems that > interested us we picked some designs and tuned things. > > Many people have complained that it isn't the > way they would tune their own design. Fine. > Please, do your own design. Do the research, > tune it for what you want. Not everyone wants > the same thing. > > There is no such thing as a general purpose solution > that gets maximum efficiency on everything. There > are many general purpose designs, or at least that > was the goal of many designs. Our goal was to get > 1000x better efficiency by narrowing the focus. > > If it isn't your focus then tune your design > differently. Chuck is in the custom silicon business. > Bring your ideas and funding and do your thing. It > is easier to be critical of other people's focus > than to have one of your own or to expose it. > > > Maybe my comments > > are related to my current work which is developing a > > chip for voice processing that has 9 DSPs running at a > > relatively high speed (10% of Chuck's 25x speed) and > > we are bandwidth not MIPS limited so I put this > > forward and caution that many people underestimate > > their bandwidth needs and get burned in the long run. > > Yes. And few people invented their own language and > worked exclusively with it for twenty or thirty years > before building hardware based on their understanding > of how the language worked. You didn't get to tune > the instruction set on your chips, so you could not > shrink the code by a factor of 100 and reduce your > memory bandwidth needs. Maybe you didn't spend > a few years simulating the efficiency of different > design options to create a target architecture > tuned to your problem. If you used off the shelf > DSP then you had the problem that led Chuck to > want to design hardware in the first place, that > if someone else makes the hardware design decisions > they are not likely a close match to your software > plans. 
This leads to extra layers of software, > more work programming, reduced hardware and > software and programmer efficiency. And it also > leads to increased memory bandwidth requirements. > > I like to say that at the software level any program > can be represented by one bit on the right hardware > or zero bits if the hardware only runs one program. > Custom hardware people, or programmable hardware > people often take solutions described in software > and compile them directly to logic gates. There > is really no hard line between hardware and software > in that sense. > > But most programmers use off the shelf hardware > where someone else made the hardware decisions and > they have to add software on top to fill in the gap. > Most programmers use off the shelf OS and compilers > where someone else made the software decisions and > they have to add software on top to fill in the gap. > So there are extra layers of hardware that are not > only not needed but get in the way and extra layers > of OS software that are not only not needed but get > in the way and more extra layers of software that > _are_ needed to get around those other extra layers of > hardware and software. Bloat becomes inevitable and > things like memory bandwidth requirements go up and > performance and programmer efficiency go down. > > The effects of the synergy of highly tuned hardware > and software, and of Forth hardware and software are > very difficult for most programmers to grasp because > to them software is all about those extra layers that > they deal with for a living. They just have no > experience with highly tuned efficient hardware and > software and how the line between hardware and > software isn't hard at all in such systems. They > have a hard time stepping out of the world where > they live where hardware is hard, and megabytes > of software is hard, and only some software is > slightly softer. 
> > In our world hardware is soft and the synergy between > highly tuned custom hardware and highly tuned custom > software changes the whole picture completely. > > > You can never have too many friends, money or > > bandwidth. > > I don't know about that. I know a lot of people who > have way too much money for their own good or anyone > else's. I don't think it really helps them. It just > makes them lazy, greedy, arrogant, and mean spirited. > They worry about getting more money not being a better > person or helping anyone else or doing good with their > life. > > Even with friends, quality is everything and quantity > accounts for little. Evil people can find plenty of > other evil people to be their friends. Much better > to find one good person to be your friend. > > I think there is a sense of satisfaction that comes > from doing more with less. Being able to meet your > goals without squandering resources and making other > people slave or starve or breathe excessive levels of > your smoke. But we live in a culture that promotes the > idea of conspicuous consumption and wretched excess as the > ideal and fosters the illusion that more is always better. > > Many people, even in the Forth community, say they > want the "ideal" computer and describe it as an > infinite register machine. I laugh and say that is > exactly the problem. They want infinite waste. > > I think infinite registers means infinite addressing > width, and infinite registers times infinite width > is infinite hardware which will require infinite > cost, infinitely large programs and infinite > time to even decode one instruction. The idea is > simply impossible, but if everything in the universe > was converted into the best approximation of their > infinite waste machine they could meet their goal > of selfishly and stupidly destroying the universe > to make one infinitely useless and infinitely slow > computer for them. 
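The addressing-width point above can be made concrete with a little arithmetic. What follows is only a sketch under stated assumptions: a hypothetical three-operand register instruction with a 6-bit opcode, neither of which comes from this thread. Naming one of N registers takes ceil(log2(N)) bits per operand, so instruction width grows without bound as N does, while a zero-operand stack instruction stays at its opcode width.

```python
def register_field_bits(num_registers):
    """Bits needed to name one of num_registers registers: ceil(log2(N))."""
    if num_registers < 1:
        raise ValueError("need at least one register")
    # (N - 1).bit_length() equals ceil(log2(N)) for N >= 1, with no float rounding.
    return (num_registers - 1).bit_length()


def instruction_bits(num_registers, operands=3, opcode_bits=6):
    """Width of one register-machine instruction.

    operands=3 and opcode_bits=6 are illustrative assumptions,
    not figures taken from the discussion.
    """
    return opcode_bits + operands * register_field_bits(num_registers)


if __name__ == "__main__":
    for n in (16, 32, 256, 65536):
        print(n, "registers:", instruction_bits(n), "bits per instruction")
```

With these assumed parameters the width is 21 bits at 32 registers and 54 bits at 65536 registers, and it keeps climbing as the register file grows.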
> > I always found Math and Physics to have a form > of beauty that rivaled any painting or symphony. > The beauty of simple yet powerful ideas. I liked > the elegance and beauty of the shortest and simplest > solution, not the ugliness of the biggest or > most complex solution to a problem. This is why I > was attracted to the best example of elegant beauty > that I have found in the world of computing, Chuck > Moore's ideas about Forth. > > I know they are not the ultimate ideas, but computers > are still a very young idea. They are just the best > ideas I have found in that field. Mark Sandford wrote: > > --- Jeff Fox wrote: > > Mark Sandford wrote: > > > Agreed, but a chip (processor farm), that can't do > > a > > > significant/interesting demo, isn't much of a > > > technology demonstration. > > > > Can't? I am curious why you say that. > > > > But from what I have seen the demos that people want > > to see are usually moronic and have nothing to do > > with what chips are good for. > > > > Compression and decompression of data streams in > > realtime is pretty much an open ended problem, > > things like protein folding, gene sorting, > > simulations > > and problem modeling, AI, and a lot of other things > > that need computing power are not the sort of things > > the investors want to see. They want to see a > > dancing baby doing the latest popular dance. Then > > they don't pay for the demo and don't invest anyway. > > Ok, let me rephrase. Assuming the 25x is built what > problems do you see it solving and is this specific > configuration the best answer? It appears to me that > Chuck put forth the 25x because that's what fits in 7 > sq mm. 
which is the minimum size for MOSIS at 0.18, > that's fine, the next question is, is that the best > configuration or does code to be run need more than > 384 "words", if it fits then this is the right answer, > if it doesn't you have the option of paging in code or > increasing memory size if the code is only a little > bigger, then increasing the on-chip is the right > answer. If it is a lot bigger then more on-chip is > infeasible and paging will be required, it's fine either > way as long as you understand the trade-offs. Some of > my concerns came with 25 processors feeding off a > single memory chip, if the processors are constantly > paging they don't get nearly the possible amount of > work done. > > As you correctly point out, most code is overly > bloated and inefficient and the HW industry has > accepted this bloat and turned to L1, L2 and L3 caches > to overcome the poor programming. You are also > correct in pointing out that small efficient code > requires significantly smaller bandwidth. The > bandwidth requirements come in two parts, the program > (instruction) and data areas, efficient coding > reduces instruction requirements but data is data and > can be reduced by efficient design to some extent, > but generally this will not change much. With the 25 > processors and a single memory then, assuming each is > doing similar work they will have similar requirements > and thus get equal portions of the available > bandwidth. If the external memory is capable of 250 > Mwords/sec, then each processor could use no more than > 10 Mwords/sec. For some applications this is fine for > others this is not and the processors could be idle > much of the time. For applications like AI, this > probably works out fine, for others the processors > starve. > > -- stuff deleted > > > > What is described above is the classic problem, > > and > > > one that has plagued the CPU industry for years. 
> > This > > > has become a main mantra of mine, a system isn't > > > limited > > > nearly as much by MIPS as by memory bandwidth, and > > > > Very true. And by the programs being 100 times > > larger > > than they need to be. The overhead is built into > > the > > systems to create the artificial problem that can > > be improved in little steps for marketing purposes. > > The easist problems to solve are these sorts of > > artificial problems, but they are what drives the > > industry. > > Agreed > > -- stuff deleted > > Instead of a single 1000Mhz processor with a huge > > cache (that is dwarfed by the size of the software > > overhead required) and a huge amount of memory, a > > design optimized to carry the markeing introduced > > overhead, the same number of transistors can > > be 1000x more efficient on problems that are > > parallel. > > > > Almost all problems, certainly almost all > > interesting > > problems, are embarrasingling parallel. The only > > problems that are not are the one we artificially > > created for ourselves in our antiquated serial > > computers with absurd computational overhead. > > > > Humans don't look like Pentiums, they have 2*10^11 > > processing nodes. They don't run Unix or Windows. > > > > This is true, I have also had a fair portion of career > spent in parallel processing and the systems are > generally designed to be 1000 workstations rather than > 10,000 efficent processors. They have a full OS that > takes multiple megabytes to handle communications, a > big portion of which is getting "printf" statements > from this processor out to a control console. I spent > a large portion of my career using the Inmos > Transputer, now defunct as it didn't fit peoples ideas > of what a processor should be but it followed many of > the MISC concepts. Intrested parties should scoure > the web for information as it is a very powerful and > good model of what a minimalist processor should look > like. 
It used byte codes and had a 3 element stack > architecture, it was designed to be programmed in a > high level language called OCCAM for which most > operations turned straight into byte codes so even > though it looked like a high level language (like C or > Pascal) it ran like assembly. It also had 4 > communications links that could tie to 4 other > processors and create a "computing surface". It is > very low in transistor count and even contained a > process scheduler in hardware so that no OS was ever > required. It fit with many if not all the MISC > concepts and the code was sufficiently small that in > many cases the processor could do many significant > functions completely from its 4Kbytes of on-chip > memory. Many people misused it and added OS's and > large amounts of external ram but you could build very > powerful systems efficiently if you took simplicity as > a design goal. There is much to be learned from this > processor and hopefully the 25x will use some of these > concepts. > > > > 60,000 MIPS that can't be used is worthless, > > > > If it is considered useless it may never be made. > > If people keep repeating that it is useless other > > people will keep thinking it is useless. If none > > are ever made the only value will be the educational > > value to the few people who study the good ideas > > that are there. > > > > Some of the most brilliant people I have met love > > the idea of cheap chips with millions of mips. But > > convincing people with money is a more difficult > > problem. Convincing most people seems to simply > > be a matter of showing them that it has become > > mainstream. They equate good idea with mainstream > > I'm not trying to say that this is a bad idea, just > that bandwidth issues often dominate and limit > processing power so make sure that your processing > power has sufficient bandwidth support, so it doesn't > spend all its time waiting for data. 
Maybe my comments > are related to my current work which is developing a > chip for voice processing that has 9 DSPs running at a > relatively high speed (10% of Chuck's 25x speed) and > we are bandwidth not MIPS limited so I put this > forward and caution that many people underestimate > their bandwidth needs and get burned in the long run. > You can never have too many friends, money or > bandwidth. > > Thanks - mark 7/20/01 List-Admin@chaossolutions.org wrote: > > Hello. > > There may be some disruption over the next 2-3 days, so if you get bounces > etc. wait until after the weekend. > > We have set up our own DNS servers but are relying on our provider to > change their records and provide delegation. > > It could go cleanly, but, it's out of our hands and DNS can be problematic > due to propagation etc. > > Regards...Martin > 7/25/01 List-Admin@chaossolutions.org wrote: > > Hello. > > When this list was started in March, there was not much of a description of > the scope of the list. So here is an update. > > The NOSC (No Operand Set Computers) mailing list is for discussions of > the design of Forth CPU, No Operand Set Computers, ie. Zero Operand or > Stack Machines. > > Please do not stray off the narrow focus of this mailing list. > > There are plenty of other sources for Forth > information http://www.forth.org for example. > > Have fun, ask relevant questions. There is some further information on > NOSC at :- > > http://www.ultratechnology.com > > May the Forth be with you...always > 7/28/01 Eric Laforest wrote: > > Forgive the cross-post, but this is more germane to the NOSC list. > Further replies should go there. > > On Fri, Jul 27, 2001 at 05:49:17PM -0400, Jecel Assumpcao Jr thus spake: > > > > > But I noticed that the people for whom I am doing this project have > > more ambitious plans for the future, so I suggested that we might do > > better with a MISC in FPGA instead. 
A 15K gate Spartan 2 costs $7, so > > even with an external Flash memory it is half the price of the > > ATmega103 we now have. A quick test of the free Xilinx tools with Dr. > > Ting's nice P16 VHDL design resulted in a 25K layout running at a > > surprising 50 MHz for the slowest speed grade chip. I am sure that a > > design that used RAM for the stacks instead of individual flip flops > > would be much smaller. > > With the help of a couple of people I have a preliminary version of > a stack computer that is not unlike the F21 CPU core. > > Summary: > Machine Forth variant, Data/Address/Return stacks (16 deep each), > 3 instructions/word, 32Kword codespace, 96Kw data space, > ( both can be fetched/stored from/to) > ...and it's 17 bits wide. > (3 5-bit instructions or a 15-bit address for 2-bit JMP/JMP0/CALL) > Memory and CPU run at same speed. > (couldn't figure out how to do the F21-style pre-fetch) > If last slot is not an I/O instruction, the next word > is prefetched then, else in next 'dummy' cycle. > All instructions and JMPs/CALLs/RETs are single-cycle. > > It's coded in Verilog. > It synthesizes into ~15K gates (1100 LUTs or 1158 slices) on a > Virtex 50-E (XCV50E-6, the slowest speed grade). > Predicted speed is currently ~50MHz. > > Things to be done: > (feedback is welcome) > > Add interrupts (how many? 4?) > Add I/O lines (~37 enough? think of LCDs and IDE drives) > Clean up code for speed and size...and embarrassing mistakes. :) > Add serial EEPROM boot-loader (In-System-Programmable?) > Add some simple form of UARTs for interfacing to user and to > other identical systems. > (The Virtex II FPGAs have built-in differential signalling, which > would be a very nice feature for inter-system communications.) > > The current idea is to mount the FPGA+SRAM+FPGA EEPROM+FORTH EEPROM > onto a single little board with the IO/IRQ/serial lines brought out > to a connector. 
All peripheral interfacing is done through the > IO/IRQ lines to eliminate many of the headaches of interfacing > slow devices to a fast memory bus. > (At 50+MHz, most peripherals can't be used easily) > This goes with the idea that one board does only a few functions > at most if not only one (like behaving like an LCD or IDE > interface and controller) with only the bare minimum interfacing > hardware needed. > > A larger system would be composed of several of these boards, each > running the core Forth and whatever code is needed to do its tasks. > (One for the LCD, one for the IDE drive, one for ethernet, > one for a few serial ports, one for sound, etc...) > The user gets his/her own board to act as interface. > All boards talk through some point-to-point or crossbar system. > > Instead of designing one chip with intelligent co-processors like > the F21, I want to have as many peripheral functions as possible implemented > in Forth on identical hardware. > > This is just a quick overview of something I've been working on for > a while. I would like feedback from the list members as there > is likely some really useful feature I've omitted or implemented > in a non-optimal way. I try to keep it KISS unless it saves > on hardware and wiring or software speed/efficiency. > > It's late and I must go sleep, hence please forgive the lack of > coherency if any. :) > > Eric LaForest (archive in progress)
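
The 17-bit word format in Eric's summary (three 5-bit instructions, or a 15-bit address with a 2-bit JMP/JMP0/CALL selector) can be illustrated with a small Python sketch. The post does not give the bit layout, so everything specific here is an assumption made for illustration: the 2-bit tag is placed in the top bits, tag 0 is taken to mean "packet of three instruction slots", the tag values for JMP/JMP0/CALL are arbitrary, and the function names are hypothetical. Note that 2 + 15 = 17 bits either way, and a 15-bit address matches the stated 32Kword codespace.

```python
# Hypothetical layout of one 17-bit word: [tag:2][payload:15].
# The tag values and the slot order are assumptions, not taken from the post.
TAGS = {0: "PACKET", 1: "JMP", 2: "JMP0", 3: "CALL"}
TAG_BY_NAME = {name: tag for tag, name in TAGS.items()}


def encode_packet(slot0, slot1, slot2):
    """Pack three 5-bit opcodes into one word (tag 0, slot0 in the high bits)."""
    for op in (slot0, slot1, slot2):
        if not 0 <= op < 32:
            raise ValueError("each opcode must fit in 5 bits")
    return (slot0 << 10) | (slot1 << 5) | slot2


def encode_transfer(kind, address):
    """Pack a JMP/JMP0/CALL with a 15-bit target address (32Kword codespace)."""
    if kind not in ("JMP", "JMP0", "CALL"):
        raise ValueError("kind must be JMP, JMP0 or CALL")
    if not 0 <= address < (1 << 15):
        raise ValueError("address must fit in 15 bits")
    return (TAG_BY_NAME[kind] << 15) | address


def decode_word(word):
    """Split a 17-bit word into (kind, payload) under the assumed layout."""
    if not 0 <= word < (1 << 17):
        raise ValueError("word must fit in 17 bits")
    tag = word >> 15
    payload = word & 0x7FFF
    if tag == 0:
        # Three 5-bit slots, executed left to right in this sketch.
        return ("PACKET", [(payload >> shift) & 0x1F for shift in (10, 5, 0)])
    return (TAGS[tag], payload)
```

For example, decode_word(encode_packet(1, 2, 3)) gives the three slots back in order, and decode_word(encode_transfer("CALL", 0x1234)) gives the call target. The real design may order the slots or assign the selector bits differently.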