Nosc Mail List Archive

Nosc is for discussions of the design of Forth CPUs and No Operand Set Computers (NOSC), i.e. zero-operand or stack machines.

4/20/01
Myron Plichota wrote:
> 
> This is a repeat posting of something that went out successfully to the
> original MISC list:
> 
> Dear MISCers,
> 
> It gives me great pleasure to announce the successful
> construction/testing/shakedown of the first Steamer16-based prototype
> system. The shakedown was not a cakewalk due to transmission line phenomena
> experienced on the hand-built circuit board, but it now seems solid enough
> to do some serious development work for a startup venture I am involved
> with. I still suspect a marginal data hold time problem during write cycles
> to asynchronous SRAM, but I am not going to rush into a redesign or further
> modifications of the prototype until more real-world experience has been
> gained. It is running at the theoretical limit of 20 MHz which yields a
> performance of 20 peak MIPS and 16.7 to 10 aggregate MIPS depending on the
> particulars of the opcode mix in a given instruction packet.
> 
> The design was written in VHDL and fitted to the Cypress CY37128P84-125JC
> CPLD.
> 
> I'd like to express thanks to those of you who encouraged me to see this
> through, and in particular, for recent close support by my brother Vic.
> 
> I am extremely busy working on the R&D for the startup venture, but I will
> reply ASAP to any inquiries from the readership.
> 
> Following is an excerpt from the Steamer16 documentation.
> 
> Myron Plichota
> -------------------------------------------------------------------------
> Programming model:
> 
>   Steamer16 consists of a program counter (PC) and a 3-deep evaluation
>   stack. All instructions except "lit," operate on data already on the
>   stack. PC is cleared on reset. Stack entries are undefined until loaded
>   using "lit," instructions. There is no program status word.
> 
> Instruction encoding:
> 
>   The PC addresses inline data or instruction packets, not individual
>   instructions. Five 3-bit instruction "slots" are packed left-justified
>   into each 16-bit instruction packet with the lsb a don't care. The most
>   significant slot in a packet is executed first. All instructions execute
>   identically in any slot. Inline data to satisfy any "lit," instructions
>   follows the packet itself.
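As a rough illustration of the packet layout described above, the following Python sketch packs five 3-bit opcodes into a 16-bit word, most significant slot first, and unpacks them again. The names pack and unpack are hypothetical, the opcode numbers are the ones listed under "Instruction descriptions and opcode assignments", and writing the don't-care lsb as 0 is an assumption of this sketch.

```python
SLOTS = 5        # instruction slots per packet
SLOT_BITS = 3    # bits per slot

# Opcode numbers as given in the Steamer16 documentation excerpt.
OPCODES = {"NOP,": 0, "lit,": 1, "@,": 2, "!,": 3,
           "+,": 4, "AND,": 5, "XOR,": 6, "zgo,": 7}


def pack(slots):
    """Pack up to five opcodes into a 16-bit packet, first slot in the top bits.

    Missing trailing slots are filled with NOP, (0). The lsb is a don't-care
    in the hardware; this sketch simply writes it as 0 (an assumption).
    """
    if len(slots) > SLOTS:
        raise ValueError("a packet holds at most five instructions")
    padded = list(slots) + [OPCODES["NOP,"]] * (SLOTS - len(slots))
    word = 0
    for op in padded:
        if not 0 <= op < (1 << SLOT_BITS):
            raise ValueError("opcode out of range: %r" % (op,))
        word = (word << SLOT_BITS) | op
    return word << 1


def unpack(word):
    """Return the five opcodes of a packet in execution order, ignoring the lsb."""
    return [(word >> (13 - SLOT_BITS * i)) & 0b111 for i in range(SLOTS)]


if __name__ == "__main__":
    packet = pack([OPCODES["lit,"], OPCODES["lit,"], OPCODES["+,"]])
    print(hex(packet), unpack(packet))
```

Because the first slot sits in bits 15..13, a packet that starts with "lit," always has a value of at least 0x2000, which makes hand-inspection of a memory dump a little easier.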
> 
> Stack diagrams:
> 
>   Stack diagrams are used to describe instruction behavior by showing the
>   inputs and results on the stack in a concise notation. The inputs are on
>   the left-hand side of the "--" before/after separator, the results are on
>   the right. The input list shows only the relevant stack entries. The
>   output list shows all three entries. The symbols x, y, and z, are used to
>   denote the original values of any surviving independent stack entries.
> 
> Instruction descriptions and opcode assignments:
> 
>   NOP,  {0}     ( -- x y z)             no operation
> 
>   lit,  {1}     ( -- y z data) PC++     push data at PC, increment PC
> 
>   @,    {2}     ( addr -- x y data)     fetch data from addr
> 
>   !,    {3}     ( data addr -- x x x)   store data to addr
> 
>   +,    {4}     ( n1 n2 -- x x n1+n2)   add 2ND to TOP
> 
>   AND,  {5}     ( n1 n2 -- x x n1&n2)   and 2ND to TOP
> 
>   XOR,  {6}     ( n1 n2 -- x x n1^n2)   exclusive-or 2ND to TOP
> 
>   zgo,  {7}     ( flg addr -- x x x)    if flg equals 0 then jump to addr
>                                         else continue
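The stack diagrams above can be read as a small state machine. The Python sketch below is one possible interpretation, not the VHDL design: it executes a single packet against a 3-deep stack held as a tuple (3RD, 2ND, TOP). The function name step_packet is hypothetical. Modeling the undefined reset-time stack entries as 0, wrapping sums and the PC to 16 bits, and using a Python list as memory are all assumptions of this sketch.

```python
MASK = 0xFFFF  # 16-bit cells (assumed wrap-around on overflow)


def step_packet(mem, pc, stack):
    """Execute the packet at pc and return (new_pc, new_stack).

    stack is (3RD, 2ND, TOP). Popping duplicates the 3RD entry downward,
    which is how the "x x result" outputs in the stack diagrams are read here.
    """
    a, b, c = stack
    word = mem[pc]
    pc = (pc + 1) & MASK              # PC now addresses any inline data
    for i in range(5):
        op = (word >> (13 - 3 * i)) & 0b111
        if op == 0:                   # NOP,  ( -- x y z)
            pass
        elif op == 1:                 # lit,  ( -- y z data) PC++
            a, b, c = b, c, mem[pc]
            pc = (pc + 1) & MASK
        elif op == 2:                 # @,    ( addr -- x y data)
            c = mem[c]
        elif op == 3:                 # !,    ( data addr -- x x x)
            mem[c] = b
            b = c = a
        elif op == 4:                 # +,    ( n1 n2 -- x x n1+n2)
            b, c = a, (b + c) & MASK
        elif op == 5:                 # AND,  ( n1 n2 -- x x n1&n2)
            b, c = a, b & c
        elif op == 6:                 # XOR,  ( n1 n2 -- x x n1^n2)
            b, c = a, b ^ c
        else:                         # zgo,  ( flg addr -- x x x)
            flag, addr = b, c
            b = c = a
            if flag == 0:
                pc = addr             # jump taken: rest of packet is abandoned
                break
    return pc, (a, b, c)


if __name__ == "__main__":
    mem = [0] * 0x10000
    # lit, lit, +, lit, !   followed by the three inline literals
    mem[0] = (1 << 13) | (1 << 10) | (4 << 7) | (1 << 4) | (3 << 1)
    mem[1:4] = [2, 3, 0x100]
    pc, stack = step_packet(mem, 0, (0, 0, 0))
    print(pc, stack, mem[0x100])
```

The example packet adds the literals 2 and 3 and stores the sum at address 0x100, consuming one packet word and three inline data words.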
> 
> Instruction timing:
> 
>   First, an instruction fetch cycle is required to load the instruction
>   register with the packet of 5 instructions currently addressed by the
>   PC. Next, the instructions contained in the packet execute in 1 cycle
>   each. An exception is made when the remainder of a packet consists
>   entirely of NOPs or zgo, takes a jump. In either of these cases the
>   current packet is aborted and another instruction fetch cycle follows
>   immediately. Packets are fetched and executed in 6 cycles during
>   straight-ahead execution for an average of 1.2 cycles per instruction.
>   Any packet containing 4 trailing NOPs will execute in 2 cycles, i.e.
>   an instruction fetch cycle followed by execution of the first
>   instruction in the packet, whether it is a NOP or not.
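The timing rules above also account for the 16.7 and 10 MIPS figures quoted earlier in the announcement. The Python sketch below counts cycles per packet under those rules; the function names and the jump_slot parameter are hypothetical, and it assumes, as the text states, that every executed instruction (including "lit,") costs exactly one cycle.

```python
CLOCK_MHZ = 20.0   # 1x clock of the prototype
NOP = 0


def executed_slots(slots, jump_slot=None):
    """Number of slots that actually execute in one packet.

    The first slot always executes. Trailing NOPs are skipped, and if the
    zgo, in slot jump_slot (0-based) takes its jump, the rest is abandoned.
    """
    if jump_slot is not None:
        return jump_slot + 1
    last = 0
    for i, op in enumerate(slots):
        if op != NOP:
            last = i
    return last + 1


def packet_cycles(slots, jump_slot=None):
    """One instruction fetch cycle plus one cycle per executed slot."""
    return 1 + executed_slots(slots, jump_slot)


def mips(slots, jump_slot=None):
    """Aggregate MIPS if this packet shape were repeated indefinitely."""
    n = executed_slots(slots, jump_slot)
    return CLOCK_MHZ * n / packet_cycles(slots, jump_slot)


if __name__ == "__main__":
    print(packet_cycles([1, 2, 3, 4, 5]), round(mips([1, 2, 3, 4, 5]), 1))
    print(packet_cycles([4, 0, 0, 0, 0]), mips([4, 0, 0, 0, 0]))
```

A full packet gives 5 instructions in 6 cycles, about 16.7 MIPS at 20 MHz, while a packet with one instruction and four trailing NOPs gives 1 instruction in 2 cycles, or 10 MIPS.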

Myron Plichota wrote:
> 
> This originally was bounced when the MISC list went "poof":
> 
> Dear MISCers,
> 
> Steamer16 has just successfully completed a field test of a machine vision
> application in an industrial environment. The handwired prototype is
> performing at the design limit of 20 MIPS with all of the anomalies
> encountered during system shakedown fixed. Even though 20 MIPS is not a big
> number by today's standards, it's amazing what can be done in real time by
> coupling software with a minimal peripheral set of parallel inputs and
> outputs and a 20 MHz free-running timer. This vindicates the original
> concept of a microprocessor being used to replace lots of dedicated
> hardware. It is extremely gratifying to be running a simple, low-cost
> homebuilt system with such performance and 100% reliability. Goodbye Z80,
> 8031, PIC, and other similar technology that was the previous economic and
> performance limit for maverick computer experimenters. I need to be slapped
> because I'm having so much fun and can't stop grinning.
> 
> The sport of CPLD and FPGA design using low-cost tools seems to me to be the
> 21st century equivalent of the early days of microprocessor hacking before
> the monopolies forsook the needs of experimenters and stampeded towards the
> big bu$ine$$ orientation that I have bemoaned ad nauseam in some of my
> previous postings. I encourage those of you who are pursuing your own CPU
> designs to see them through and publish news of the results via the MISC
> list or email direct to me.
> 
> Myron Plichota

Eric Laforest wrote:
> 
> On Fri, Apr 20, 2001 at 01:59:18PM +0000, Myron Plichota thus spake:
> > This originally was bounced when the MISC list went "poof":
> >
> > Dear MISCers,
> >
> > Steamer16 has just successfully completed a field test of a machine vision
> > application in an industrial environment. The handwired prototype is
> 
> Excellent!
> Good to know someone is succeeding!
> I'm curious as to how 'heavy' a machine vision task can be run in real-time
> by a 20 MIPS MISC chip...
> 
> Eric LaForest

Rick Hohensee wrote:
> 
> >
> > This originally was bounced when the MISC list went "poof":
> >
> > Dear MISCers,
> >
> > The sport of CPLD and FPGA design using low-cost tools seems to me to be the
> > 21st century equivalent of the early days of microprocessor hacking before
> > the monopolies forsook the needs of experimenters and stampeded towards the
> > big bu$ine$$ orientation that I have bemoaned ad nauseam in some of my
> > previous postings. I encourage those of you who are pursuing your own CPU
> > designs to see them through and publish news of the results via the MISC
> > list or email direct to me.
> >
> > Myron Plichota
> > ------------------------
> 
> Congratulations.
> For those that don't read comp.lang.forth, H3sm, Hohensee's 3-stack
> machine, now exists entirely as x86 assembly, and seems pretty snappy. The
> 80 or so primitives require about 3k of assembly. Your machine looks like
> it would require about, oh, 200 bytes, for a 32 bit subroutine-threaded
> VM, if I am in the ballpark as to what it involves.
> 
> Questions about Steamer16
> 
>         It's a one-stack MISC with 3 parameter stack cells?
>         how many gates/transistors on the device in question?
>         how much room is left?
>         how long did it take to write the VHDL?
>         you're using real good old SRAM with it?
>         what are its bus widths?
>         etc etc
> 
> MORE INFO!   :o)
> 
> Rick Hohensee
> www.clienux.com

4/22/01
Jeff Fox wrote:
> 
> Dear NOSC list readers:
> 
> I have completed the upload of all my videos of Dr. Ting's
> presentations and John Rible's VLSI design classes for SVFIG
> to the streaming video theater including the latest
> presentations by Dr. Ting from 4/14/01.  This is where Dr.
> Ting talks about the release of P8, P16, P24, and P32
> sources and some other stuff on CD-ROM.
> 
> Most of these videos were never on the web and some are
> new and were not available previously on CD-ROM.
> 
> Jeff Fox

4/24/01
Rick Hohensee wrote:
> 
> >
> > Rick Hohensee's questions about Steamer16:
> >
> > 1) It's a one-stack MISC with 3 parameter stack cells?
> >
> > Yes, that's all that would fit on the CY37128P84. I figured
> > this was enough to do something useful after a paper
> > evaluation using the quadratic solution as a benchmark
> > as it could be performed on a Hewlett-Packard RPN
> > calculator.
> >
> > 2) how many gates/transistors on the device in question?
> >
> > Cypress's datasheet doesn't use those metrics to quantify things.
> > The next heading sets the record straight.
> >
> > 3) how much room is left?
> >
> > The following is an excerpt from the report file generated by
> > Cypress's Warp2 VHDL compiler. There is not enough left over
> > to add more instructions or stack registers. I know because
> > I tried ;)
> >
> >   Information: Macrocell Utilization.
> >
> >                      Description        Used     Max
> >                  ______________________________________
> >                  | Dedicated Inputs   |    1  |    1  |
> >                  | Clock/Inputs       |    2  |    4  |
> >                  | I/O Macrocells     |   59  |   64  |
> >                  | Buried Macrocells  |   52  |   64  |
> >                  | PIM Input Connects |  242  |  312  |
> >                  ______________________________________
> >                                          356  /  445   = 80  %
> >
> >                                       Required     Max (Available)
> >           CLOCK/LATCH ENABLE signals     2           12
> >           Input REG/LATCH signals        0           69
> >           Input PIN signals              3            5
> >           Input PINs using I/O cells     0            0
> >           Output PIN signals            59           64
> >
> >           Total PIN signals             62           69
> >           Macrocells Used              111          128
> >           Unique Product Terms         476          640
> >
> > 4) how long did it take to write the VHDL?
> >
> > The initial implementation (~18 months ago) was written in 3 days.
> > After reviewing the boolean equations in the report file and
> > convincing myself that I expressed myself correctly, I spent 4
> > days running simulations. This was essentially a warmup exercise
> > with the then-unfamiliar tools prior to doing "serious" work for
> > a startup which went nowhere. The design sat in the can for a
> > year and I kept wondering whether it was worth realizing, due
> > to its obvious limitations and much sexier examples of silicon
> > that are out there. The CY37128P84 was chosen because it was the
> > biggest gun available that would fit in a socket I could deal
> > with, with a view to hand-wiring a prototype for the (unfunded)
> > aforementioned startup. I had inventory of a 60 pc. min. qty.
> > purchase all dressed up with nowhere to go, so I dusted off
> > the design files and convinced myself that it WAS worth taking
> > to completion.
> >
> > I wrote myself an assembler (in Forth, of course ;) and became
> > unhappy with the fact that the NOP, padding that was frequently
> > required between the last explicit instruction in a packet and a
> > following (packet-aligned) jump target label cost useless clock
> > cycles to execute, so I redesigned the instruction sequencer to
> > force a fetch cycle under those circumstances. While I was at it,
> > I made the master reset synchronous. This took 1 day, followed by
> > 4 days of fresh simulator sessions.
> >
> > All hell broke loose when I fired up the prototype! Series
> > termination on the clocks and high-speed strobes solved some of
> > the problems, but some of the test software crashed consistently.
> > Persistence and trial and error showed that the zgo, instruction
> > failed when a jump was taken in 2 out of 5 slots: inline literals
> > were being executed. It took me 2 days of staring at the VHDL code
> > for the instruction packet sequencer to figure out what the problem
> > was and fix it. I also changed the write strobe timing prior to
> > identifying the real problem and have kept it that way since. I
> > took a look at my test vectors and confirmed that they never
> > generated the scenario which caused the zgo, instruction to fail.
> > Oops!
> >
> > So in summary it took ~6 days of VHDL design or debugging and ~8
> > days of simulation (should have been more in retrospect) to bring
> > things to the current happy state of affairs.
> >
> > 5) you're using real good old SRAM with it?
> >
> > I'm using a pair of 25 nS 32Kx8 SRAMs liberated from the L2 cache
> > sockets on obsolete PC motherboards on their way to the dumpster. A15
> > goes directly to the /CE pins of the SRAMs, mapping them to the low 32K
> > cells.
> >
> > 6) what are its bus widths?
> >
> > Both the address and data busses are 16 bits. There is no byte
> > addressing capability. 2 early (R/W and W/R) and 2 late (/RD and /WR)
> > control signals are generated to obviate the need for external
> > decoding (and consequent delays) in most foreseeable cases. W/R goes
> > to the SRAM /OE pins and /WR goes to the /WE pins. The /WR strobe is
> > synchronous to the 40 MHz 2x master clock and pipelined such that it
> > pulses low when the 1x clock (derived from the 2x clock) goes low to
> > put data and address hold time well clear of the SRAM's 0 nS minimum.
> > The /RD strobe is combinatorially generated when the 20 MHz 1x clock
> > is low to assure data hold time during read cycles, but this signal
> > is not used on the prototype system. All of the internal registers are
> > synchronously clocked by the rising edge of the 1x clock.
> >
> > 7) etc etc
> >
> > The 20 MHz clock limit is due to the ripple-carry adder implementation.
> > BTW, the first VHDL module I wrote used the syntax "+" to specify the
> > behavior of the adder, and that _in_itself_ exceeded the capacity of the
> > chip. Apparently, a full lookahead-carry implementation was attempted.
> > Later on I noticed that there were radio buttons in the project options
> > dialog box: Goal <- area|speed which defaulted to speed. I had already
> > explicitly written the ripple-carry solution and haven't tried the "+"
> > syntax with area as the goal.
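For readers unfamiliar with the trade-off mentioned above, here is a bit-level Python sketch of a 16-bit ripple-carry adder. It is an illustration only, not a transcription of the Steamer16 VHDL; the function names are hypothetical and discarding the final carry is an assumption. Each stage must wait for the carry from the stage below it, so the worst-case delay grows linearly with the word width, whereas a lookahead-carry design computes carries in parallel at the cost of many more product terms.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add16(x, y):
    """Add two 16-bit values by chaining 16 full adders, lsb first.

    The carry out of bit 15 is discarded (assumed behavior for this sketch),
    so the result wraps modulo 2**16.
    """
    carry = 0
    result = 0
    for bit in range(16):
        s, carry = full_adder((x >> bit) & 1, (y >> bit) & 1, carry)
        result |= s << bit
    return result


if __name__ == "__main__":
    print(ripple_add16(1234, 4321))
    print(hex(ripple_add16(0xFFFF, 0x0001)))
```

Adding 0xFFFF and 1 is the worst case for this structure, since the carry generated at bit 0 has to ripple through all 16 stages before the result settles.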
> >
> > Interrupts, wait states, and bus sharing features are not implemented
> > due to fitting limitations. A 2nd CY37128 is used on my prototype
> > system as an I/O companion chip with 16 bits of parallel input, 16
> > bits of parallel output, a free-running 20 MHz 16-bit timer, and a
> > register which provides the 2/ function. It is self-decoded near the
> > top of memory. A cable to the host PC provides the JTAG interface
> > for burning the JEDEC fuse maps into the devices and downloading code
> > using the boundary scan registers to drive the target busses. The
> > JTAG code downloader is integrated with the assembler.
> >
> > The 5V operating current was measured at 440 mA on the prototype system
> > with 2x CY37128 (Steamer16 + I/O), 2x 25 nS SRAMs, and a 40 MHz 2x
> > clock oscillator module with the system out of reset and running a real
> > application. The situation could be improved by invoking the low-power
> > options of the CY37128, downgrading the 1x clock to ~10 MHz, and using
> > low-power 70 nS SRAMs, which could be battery backed up to retain code
> > and data during the powerdown condition.
> >
> > The code is not particularly space efficient despite the compact
> > instruction encoding due to the heavy incidence of inline literals,
> > the primitive instruction set, and NOP, padding statistics.
> >
> > The software tools are written in a bastard dialect of Forth-79 that I
> > have been using since 1990. The assembler proper is lean and mean.
> > Colon definitions in an include file are used to implement macros.
> >
> > Because Steamer16 is not a true Forth chip, and in fact lacks call and
> > return instructions, it is necessary to use a re-entrant stack frame
> > strategy I call ArF (don't ask unless you have a sense of humor) which
> > is a hybrid between Forth and C under the hood. A call is implemented
> > as a sequence of 7 instructions, a return is 5 instructions, and a read
> > or write indexed into the stack frame is 5 instructions. The coding
> > style that inevitably results can be considered bad Forth that
> > over-uses static variables and the PICK operator, but on the other
> > hand relieves the programmer of optimizing the order of operands on
> > the stack or resorting to the use of stack reordering operators. ArF
> > mandates preservation of the input arguments: the results are
> > appended to the list and the calling parent is solely responsible for
> > building up or tearing down the stack frame as in C, therefore Forth
> > operators like DUP, OVER, and R@ are not required. The on-chip
> > evaluation stack is nevertheless used for bursts of Forth-like
> > activity until it is appropriate to write an intermediate or final
> > result to the stack frame or a static variable. The fine-grained
> > subroutine factoring that is one of Forth's major strengths is not
> > as attractive as it is with a true Forth chip, but there are some
> > compensations and opportunities for optimization that are unique
> > to the Steamer16/ArF genre. I have experimented enough with the
> > software end of things to come to discover that agonizing over
> > shaving off a cycle or two from a subroutine is very seldom
> > worthwhile and sometimes, very surprisingly, retrograde. I believe
> > this is due to the instruction alignment statistics, which are
> > deterministic but difficult to exactly predict as one is writing
> > code. This is not to say that the programmer should not be
> > performance-conscious, but rather that the straightforward ArF macros
> > offer _pretty_good_ runtime performance. The more I use it the more I
> > like it, and it beats the snot out of any off-the-shelf CPU that I
> > have used so far, except for DSPs.
> 
> Exciting.
> 
> I'd heard a prominent Forther was thinking of doing a one-stack Forth, and
> I thought he was BSing me, but if the one stack was the parameter stack
> that would explain it. Your instruction counts for call/return are about
> proportional to how long one near call instruction actually takes on a 386
> anyway, since it's at some level doing the same things.
> 
> CLD DD Clear Direction Flag
> 
> Opcode    Instruction     Clocks   Description
> 
> FC        CLD             2        Clear direction flag; SI and DI
>                                    will increment during string
>                                    instructions
> 
> E8  cw    CALL rel16       7+m            Call near, displacement relative
>                                           to next instruction
>                            ^^^best case timing
> 
> Linux has 60% more calls than returns. That's probably a LOT of functions
> that are in fact inline code that's being called. The subroutine threaded
> H3sm has 3 times the calls as returns though, since all thread words are
> mostly calls. If I did hardware I'd be looking into what I was talking to
> Jan Coombs about almost always doing a call on each instruction
> fetch. Then you can maybe get the return stack activity in parallel.
> 
> Rick Hohensee
> www.clienux.com

4/25/01
Myron Plichota wrote:
> 
> I have my website online now at:
> 
>   http://www3.sympatico.ca/myron.plichota/
> 
> Zipped archives are there for:
> 
>   Steamer16 VHDL
>   DTC Forth with CORDIC sin/cos function for TI's C6x DSP family
>   optimized floating-point FIR and IIR filters in GNU C
> 
> I want to add a few enhancements to the software development tools
> before I release them and also provide reference schematics. I will post
> notifications as more info becomes available.
> 
> Myron Plichota

4/26/01
Myron Plichota wrote:
> 
> On Tue, 24 Apr 2001 21:17:11 -0400 (EDT), Rick Hohensee wrote:
> 
> 
> 
> First, a gentle reminder: please try to keep the size of postings to a
> minimum. My first reply on this thread was deleted by the server filter,
> and Martin was good enough to intervene (thanks Martin), but that was
> pushing my luck. In particular, I think that cutting out all but the
> relevant portions of the message being replied to will go a long way to
> conserving the server's resources.
> 
> > I'd heard a prominent Forther was thinking of doing a one-stack Forth, and
> > I thought he was BSing me, but if the one stack was the parameter stack
> > that would explain it. Your instruction counts for call/return are about
> > proportional to how long one near call instruction actually takes on a 386
> > anyway, since it's at some level doing the same things.
> 
> I once read somewhere that RISC design philosophy was to expose an
> orthogonal microcode-like instruction set to compiler optimization, and
> the MISC/NOSC machines we have seen so far do so as well, but in the
> Forth tradition of small well-factored subroutines, or in the case of
> Steamer, as comparatively memory-hungry assembler macros and
> coarser-grained subroutine threading. In any event, synthesis of a
> variety of addressing modes requires multiple instructions and clock
> cycles. It's interesting how the comparisons pan out with respect to
> CISCs like the x86. It costs you at one level or the other it seems.
> 
> > Linux has 60% more calls than returns. That's probably a LOT of functions
> > that are in fact inline code that's being called. The subroutine threaded
> > H3sm has 3 times the calls as returns though, since all thread words are
> > mostly calls. If I did hardware I'd be looking into what I was talking to
> > Jan Coombs about almost always doing a call on each instruction
> > fetch. Then you can maybe get the return stack activity in parallel.
> 
> I haven't subscribed to comp.lang.forth, so I'm in the dark about H3sm.
> Do you have a zipped archive you could send directly to me?
> 
> BTW, if there are any C compiler design gurus on the list, I am very
> interested in an appraisal of Steamer16 as a target for a tiny,
> integer-only flavor. I started on a Forth compiler, but code bloat and
> low performance set in very quickly and it turns out to be a poor
> marriage :(
> 
> Myron Plichota

Rick Hohensee wrote:
> 
> >
> > On Tue, 24 Apr 2001 21:17:11 -0400 (EDT), Rick Hohensee wrote:
> >
> > 
> >
> > I haven't subscribed to comp.lang.forth, so I'm in the dark about H3sm.
> > Do you have a zipped archive you could send directly to me?
> >
> 
> ftp://linux01.gwdg.de/pub/cLIeNUX/interim and the last hardcopy "Forth
> Dimensions"
> 
> > BTW, if there are any C compiler design gurus on the list, I am very
> > interested in an appraisal of Steamer16 as a target for a tiny,
> > integer-only flavor. I started on a Forth compiler, but code bloat and
> > low performance set in very quickly and it turns out to be a poor
> > marriage :(
> >
> Martin Richards' page, author of BCPL, predecessor of C, and Tripos,
> predecessor of AmigaDos.
> http://www.cl.cam.ac.uk/~mr/
> 
> I uncovered a thing called smallc from gatekeeper.dec.com. Honk if you
> can't find it.
>                         Small C version C3.0R1.1
>                               (SCC3)
> 
>                             Chris Lewis
> 
> Rick Hohensee
> www.clienux.com

5/27/01
Myron Plichota wrote:
> 
> http://www3.sympatico.ca/myron.plichota

6/5/01
Lonnie Reed wrote:
> 
> If you haven't seen this yet:
> http://www.eet.com/story/OEG20010604S0087
> 
> This looks like a great idea. Instead of a grid layout, use diagonal
> connections. It reduces power consumption, connection length/signal
> delay and increases performance.
> 
> If Chuck is working on OKAD III, he might want to look into this.
> 
> Lonnie

7/01/01
Jeff Fox wrote:
> 
> http://www.mindspring.com/~chipchuck
> 
> "How many chips could a chipchuck chuck if a
> chipchuck could chuck chips?"

Mark Sandford wrote:
> 
> How much would it cost to build a 25x chip?
> $14K for protos but how much for NRE?

7/03/01
Eric Laforest wrote:
> 
> Code is loaded into the on-chip 512w memory for execution...
> 
> This seems to imply that the internal and external RAM are in a flat,
> contiguous memory space.
> 
> I presume one simply copies the words one wants to use right now into
> internal RAM and then does a subroutine call to it?
> 
> ...or is the dictionary simply set up to place the shorter, more oft-called
> words on the chip?
> 
> Eric LaForest
> 

Jeff Fox wrote:
> 
> Eric Laforest wrote:
> 
> > Code is loaded into the on-chip 512w memory for execution...
> >
> > This seems to imply that the internal and external RAM are in a flat,
> > contiguous memory space.
> 
> The X18 chip and 25x chip are a little different in that
> X18 has a pinout that is a superset of a mirror image
> of a 512Kx18 cache SRAM.  25x is a superset of that.
> 
> So X18 has access to >1MB of external 4ns memory and
> access to internal 1ns DRAM and ROM.
> 
> 25x adds 24 X18 cores but without their external SRAM
> connection pins.  So they have limited memory.  They
> have register to register or memory to memory
> (I don't know which) communication links in rows and
> columns for a total of 180Gbps internal communication
> bandwidth or something like that.
> 
> Processors on the outside of the block have connections
> to I/O pins that can be programmed to do digital or
> analog I/O and whatever protocols will fit.  Chuck's
> ideas about software drivers for I/O are part of his
> idea of Forth.  You give up some speed in exchange
> for a wider range of I/O capability than with
> dedicated I/O hardware.  But with a 2400Mip core
> most things in the outside world look pretty slow.
> 
> It does not have one pool of memory with a 25 way
> bus arbitration unit or anything like that.
> 
> I don't know what mechanism Chuck uses to distinguish
> internal addresses from external since the chip is
> 18 bits wide and the external address bus is 18 bits.
> I would guess that it uses some paging mechanism
> but I couldn't find the information on the site yet.
> 
> The site is in progress.  There is nothing there on
> the CAD internals yet.  I know Chuck plans to add
> a lot of things.
> 
> He does not have time to do unpaid support for
> hobbyists wanting to play with ColorForth or
> his chip designs, but we can collect a list of
> things, like missing bits of documentation
> in our mail lists and pass our requests as
> a group for what we need on to Chuck.
> 
> Since the instruction set on X18 is basically the
> same as F21 with a little more information one
> could modify the simulators and emulators for
> F21 to do X18 and then later 25X.  Armed with
> those tools people could develop real code
> suitable for framing in ROM.
> 
> The best ideas for donated code routines could
> go into the ROM.  Chuck will have some interesting
> ideas about what to put into the ROM but he
> can be influenced by logic if someone else
> does identify code that deserves to be ROMmed.
> 
> > I presume one simply copies the words one wants to
> > use right now into internal RAM and then does a
> > subroutine call to it?
> 
> The processors are asynchronous and software must
> coordinate processes.  On X18 there is 4ns and
> 1ns memory just as on the F21 prototypes in .8u
> there are SRAM and DRAM and ROM busses.  The
> cache SRAM chips are not used as cache, there is
> no cache controller.  The software running on
> the CPU manages things by putting the things
> that need to run fast into the faster memory.
> Instead of the old external memories that
> provided 18ns SRAM and 30ns DRAM access these
> chips have 1ns internal DRAM and 4ns external SRAM.
> 
> > ...or is the dictionary simply set up to place
> > the shorter, more oft-called words on the chip?
> 
> Dictionary is software.  It does whatever.
> One would assume that time critical code and/or
> more often used code would be run on chip.
> 
> Remember that Chuck says that most programs fit
> into 1K.  He has room for 1.5K inlined Forth
> opcodes or .5K calls on chip or some mix in
> between.
> 
> Chuck has also said that we could scale the tiles
> on F21 down to .18u and modernize the memory busses
> if there was interest.  The increased speed and
> reduced power make it more scifi.  The 10G
> timer could become a 50G timer.  200M analog and
> faster bit banged analog and digital I/O etc.
> 
> One advantage of .8u on the old prototypes is that
> the chips could be made in the third world on
> "obsolete" fabs for almost nothing.  They have
> quoted prices for wafers that look like what we
> paid for die even in small quantities.  And 500Mips
> per node is sufficient for many problems.  The
> nodes are also bigger than 25x nodes but require
> external memory.  I guess Chuck could fit 20K words
> of memory onto an F21 in .18u without the die
> becoming too large.
> 
> 25x was Chuck's first multiprocessor to have
> multiple CPUs instead of a CPU and multiple I/O
> coprocessors.  He picked 5x5 to keep the chip
> tiny and cheap.  If someone wanted to pay for
> a big one with bigger node clusters that could
> also be done.  A large wafer could hold thousands
> of 2400 MIP X18 cores.
> 
> These designs are well suited to a class of
> computationally challenging problems.  The problems
> are real and attacking them today is very expensive.
> People are using things like machines with thousands
> of Pentium chips.  If you do a MIP/$ or MIP/W
> comparison you can see the idea.  They are not
> designed to run sluggish bloated popular software.

Eric Laforest wrote:
> 
> On Tue, Jul 03, 2001 at 12:37:48PM -0700, Jeff Fox thus spake:
> >
> > He does not have time to do unpaid support for
> > hobbyists wanting to play with ColorForth or
> > his chip designs, but we can collect a list of
> > things, like missing bits of documentation
> > in our mail lists and pass our requests as
> > a group for what we need on to Chuck.
> 
> Absolutely.
> 
> My first request is more details on the memory system. :)
> 
> Actually, is he planning to release the sources to his ColorForth/OKAD II
> systems on this site?
> 
> Eric LaForest

Jeff Fox wrote:
> 
> Eric Laforest wrote:
> > Actually, is he planning to release the sources
> > to his ColorForth/OKAD II
> > systems on this site?
> 
> ColorForth will get source and object files soon.
> 
> CAD will get tutorials, examples and component
> libraries since the links are there but no files yet.
> 
> As to the full five hundred lines of source to OKAD II
> and the full source to the chips themselves that is
> something else. He has wanted to get paid something
> for the twenty years of work he has invested into
> chip designs mostly without pay.  Chuck is past
> a reasonable retirement age given his family history.
> 
> He has discussed his option to put the CAD and
> chip sources into the public domain.  That might
> be a last ditch attempt to get someone to pay for
> consulting work.  But he has no hard plans to do that
> at the moment.  Personally I think it is a lot
> to ask. I think it would have more negatives than
> positives.
> 
> One problem I see with that is the one of
> where that would lead.  The main interest so far
> has not been from "talking chinese doll makers" as
> someone wrote in one of the other mail lists.
> The chips may just end up as weapons anyway but
> it has been an issue.
> 
> There was an interesting story on the news
> yesterday about Fort Lauderdale, Florida where
> the police have many surveillance cameras around
> town connected to a computer with face recognition
> software that does criminal records searches on
> people in a crowd. If you make the cost, size, and
> power consumption of that sort of technology zero
> you enable not just big brother but things like smart
> bullets and human hunting robots which are seriously
> funded projects that have wanted these chips.  And like
> I have said many times, I have many stories from the
> last ten years.  It gives another reading to the
> popular advice that we need to find a "killer app."
> 
> The idea of truly inexpensive computers with
> reasonable software that could enable and educate
> more than 2% of the world's population seems to be of
> no interest to anyone.  Most people in the lucrative
> computer industry prefer the shell game with the
> highest profit margins and are opposed in every way
> to that idea.  Some people who should know better do
> all they can to trivialize and distort what Chuck has
> done in order to protect their investments in time,
> money, and skills in conventional computer technology.
> 
> Very few people have invested time, money, or
> skills in support of what Chuck has tried to do.
> Most fans have only provided lip service and only
> to other fans.  I must admit that I expected
> some support from the Forth community rather than
> have them be the most opposed group and the most
> threatened by Chuck's ideas (ideas that are not
> already at least twenty years old that is. ;-)
> 
> One nice thing about Chuck getting his own website
> is that I could more easily get out of the loop
> altogether and avoid the kill (meaning insult, distort,
> trivialize, demonize, name call etc.) the messenger
> syndrome that has characterized the last decade
> where I was the only person making information public
> on this stuff.

Eric Laforest wrote:
> 
> On Tue, Jul 03, 2001 at 02:31:46PM -0700, Jeff Fox thus spake:
> > Eric Laforest wrote:
> > > Actually, is he planning to release the sources
> > > to his ColorForth/OKAD II
> > > systems on this site?
> >
> > ColorForth will get source and object files soon.
> 
> Very cool.
> 
> > As to the full five hundred lines of source to OKAD II
> > and the full source to the chips themselves that is
> > something else. He has wanted to get paid something
> > for the twenty years of work he has invested into
> > chip designs mostly without pay.  Chuck is past
> > a reasonable retirement age given his family history.
> >
> 
> Hmm.....~14000$US for a run of 25 packaged chips from MOSIS.
> This means ~560$/chip. (somewhat more really to cover Chuck's consulting fees)
> This is not an impossible sum for many people.
> Is there interest enough in 25 or more people to fund a run of x25/F21 chips?
> At that cost, a group of determined enthusiasts could fund yearly
> or half-yearly runs with rewards of helping Chuck, getting cool
> technology and furthering MISC as a whole.
> 
> >
> > One problem I see with that is the one of
> > where that would lead.  The main interest so far
> > has not been from "talking chinese doll makers" as
> > someone wrote in one of the other mail lists.
> > The chips may just end up as weapons anyway but
> > it has been an issue.
> 
> Indeed a difficult situation to avoid.
> 
> Eric LaForest
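
The per-chip figure quoted above can be checked with a minimal Python sketch. The run cost and chip count come from the message; the function name `cost_per_chip` is made up for illustration, and the even split ignores the extra for consulting fees that the message mentions.

```python
def cost_per_chip(run_cost_usd: float, chips_per_run: int) -> float:
    """Split the cost of one prototype run evenly across its chips."""
    if chips_per_run <= 0:
        raise ValueError("chips_per_run must be positive")
    return run_cost_usd / chips_per_run


# ~$14,000 for a run of 25 packaged chips, as quoted in the message.
per_chip = cost_per_chip(14_000, 25)
print(f"about ${per_chip:.0f} per chip before consulting fees")
```

With 25 backers each taking one chip, the share per person equals the per-chip cost; fewer backers would each have to cover more than one chip.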

7/04/01
Mark Sandford wrote:
> 
> Are there people to group fund such an effort?
> Exactly how much more than the 14K would it take
> for Chuck's time, using his $100/hr = $4000/week
> let's say he needs 3 months that comes out to $62K.
> Are there say 12 people willing to put up 5k each
> to help this thing along?  Chuck might be willing to
> work for less if it is a group rather than commercial
> effort, in which he would still retain all rights.
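
One reading that reproduces the $62K figure is the $14K run plus 12 weeks of consulting at $4000/week. The sketch below works that through. The 40-hour week and the 12-week reading of "3 months" are assumptions inferred from the numbers, not stated in the message, and the names are illustrative.

```python
HOURLY_RATE = 100      # dollars per hour, from the message
HOURS_PER_WEEK = 40    # assumption implied by $100/hr = $4000/week
WEEKS = 12             # assumption: "3 months" read as 12 weeks
RUN_COST = 14_000      # one prototype run, from the earlier message


def project_budget(run_cost: int, rate: int, hours: int, weeks: int) -> int:
    """Fab run plus consulting time, in dollars."""
    return run_cost + rate * hours * weeks


total = project_budget(RUN_COST, HOURLY_RATE, HOURS_PER_WEEK, WEEKS)
pledged = 12 * 5_000   # 12 people at $5K each
print(f"budget ${total}, pledged ${pledged}, shortfall ${total - pledged}")
```

Under these assumptions twelve $5K pledges fall $2K short of the total, so a thirteenth backer or a slightly reduced rate would be needed.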
> 
> I think that there are a lot of possibilities here,
> with silicon processing being an art rather than a
> science, the chip costs come more from the fact that
> not every transistor or wire is perfect so the fabs
> need to test every chip and toss those that don't pass
> for speed or functionality.  Some people, particularly
> at HP Labs, have done work on systems with known flaws
> where the system routes around the problems.  At 0.18um
> you can't expect more than something like 50% fully
> functional chips but if you are willing to call 20
> cpus good enough, then you can withstand a failure of 20%
> which increases yields greatly.  Of course if your one
> and only failure is in a main bus element or I/O block
> you're toast, but still, you beat the odds by a large
> margin.  This should be very attractive to consumer
> electronics companies, which end up avoiding test and
> building products and just doing a final test and
> tossing failures in the trash as they need to keep
> costs down and repair on small low cost devices isn't
> cost effective.
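
The yield argument above can be illustrated with a simple binomial model, sketched here in Python. This is a rough model under loud assumptions: defects hit each of the 25 processors independently, the 50% figure is treated as the chance that all 25 work, and shared blocks such as buses and I/O are ignored. None of this is a measured fab model.

```python
from math import comb


def per_core_yield(chip_yield: float, cores: int) -> float:
    """Per-core survival probability if all `cores` must work for the chip to pass."""
    return chip_yield ** (1.0 / cores)


def prob_at_least(good_needed: int, cores: int, p: float) -> float:
    """Binomial probability that at least `good_needed` of `cores` cores work."""
    return sum(
        comb(cores, k) * p**k * (1.0 - p) ** (cores - k)
        for k in range(good_needed, cores + 1)
    )


p = per_core_yield(0.5, 25)              # assumed: 50% of chips are fully functional
all_good = prob_at_least(25, 25, p)      # back to the 50% starting point
twenty_ok = prob_at_least(20, 25, p)     # chips usable if 20 of 25 cores work
print(f"per-core yield {p:.4f}, all 25 good {all_good:.2f}, 20+ good {twenty_ok:.4f}")
```

In this toy model almost every chip has at least 20 working processors, which is the sense in which tolerating failures "beats the odds"; a defect in a shared bus or I/O block would still kill the chip, as the message notes.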
> 
> The space people (NASA and satellite builders) should
> also be interested as they can have redundant
> processors and just switch to another when a gamma ray
> takes out one processor.
> 
> I would suggest that this be retargeted somewhat as 25
> processors seems a little overkill, 16 or 9 (assuming
> you like squared numbers) seems more reasonable and
> the SRAM at 4ns (250 MHz before timing margins
> on-chip), would need to get shared between 25
> processors.  Assuming they are doing similar things
> this leaves only an effective 10MHz per processor
> while they are running at 2400MHz, so unless the
> application is heavily, heavily inner loops they will
> spend a great amount of time twiddling their thumbs
> awaiting their turn on the bus.  Even running solid
> multiplies at 125M this still leaves a large margin
> for data transfers.  So firstly I'd trim down the
> number of processors and might suggest looking at
> pairing the processor with a x36 chip instead of the
> x18 to get two "18 bit words" per cycle and
> effectively running the memory at 500MHz x18.
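
The bandwidth figures in this paragraph can be reproduced with a short Python sketch. It assumes the external SRAM is shared evenly among the processors and that a x36 part simply doubles the effective word rate, as the message suggests; the function names are illustrative.

```python
def per_cpu_memory_mhz(sram_mhz: float, cpus: int) -> float:
    """Even share of the external memory rate for each processor."""
    return sram_mhz / cpus


def instructions_per_access(core_mhz: float, sram_mhz: float, cpus: int) -> float:
    """Instructions a core can run between its turns on the shared bus."""
    return core_mhz / per_cpu_memory_mhz(sram_mhz, cpus)


CORE_MHZ = 2400
for cpus in (25, 16, 9):
    for sram_mhz in (250, 500):   # x18 SRAM at 4ns, and the x36 "500MHz x18" case
        share = per_cpu_memory_mhz(sram_mhz, cpus)
        ratio = instructions_per_access(CORE_MHZ, sram_mhz, cpus)
        print(f"{cpus:2d} cpus, {sram_mhz} MHz SRAM: "
              f"{share:6.2f} MHz each, {ratio:6.1f} instructions per access")
```

For 25 processors on the 250 MHz SRAM this gives the 10 MHz per processor quoted above, or one external access for every 240 instructions executed at full speed.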
> 
> Another area that I might suggest a change is the
> memory per processor 384 words might be 1K words if
> the number of processors is trimmed down to 16 or 9 so
> you would be more likely to run without needing to
> load or store data as frequently.
> 
> I might be interested in contributing to such an
> effort, I would need to know more about Chuck's
> experience and how likely the first try is likely to
> work (Murphy's Law and all).  I bought one of the
> original P21 chips and I believe that those didn't
> function until the 8th run so this is never a slam
> dunk especially if 0.18 and TSMC are new to his
> techniques.

Jeff Fox wrote:
> 
> Mark Sandford wrote:
> > The space people (NASA and satellite builders) should
> > also be interested as they can have redundant
> > processors and just switch to another when a gamma ray
> > takes out one processor.
> 
> iTV came from NASA.  iTV did radiation testing on
> i21 and found it to be extremely resistant to
> ionizing radiation despite not being designed with
> rad-hard rules.  They also worked with the Air Force
> on processors for spacecraft.
> 
> > I would suggest that this be retargeted somewhat as 25
> > processors seems a little overkill, 16 or 9 (assuming
> > you like squared numbers) seems more reasonable and
> > the SRAM at 4ns (250 MHz before timing margins
> > on-chip), would need to get shared between 25
> > processors.  Assuming they are doing similar things
> > this leaves only an effective 10MHz per processor
> > while they are running at 2400MHz, so unless the
> > application is heavily, heavily inner loops they will
> > spend a great amount of time twiddling their thumbs
> > awaiting their turn on the bus.
> 
> Of course.  The same thing applies to workstation farms.
> All problems have a balance between node processing
> and node communication.  The design was not created
> for problems that are essentially serial and are
> limited by communication bandwidth or serial processing.
> 
> Instead this design is for computationally intense
> problems that can use 60,000 MIPS per $1 cluster chip
> and not for software or problems that would limit
> it to 250MIPS.  A single X18 is capable of 2400MIPS
> so why limit 25 of them to a total of 250MIPS?
> 
> The proper model for F21 or 25X is a workstation
> farm, but without the hardware and software overhead
> needed to put C or Unix on each node.  A very small,
> very cheap, Forth workstation farm.
> 
> > Even running solid
> > multiplies at 125M this still leaves a large margin
> > for data transfers.  So firstly I'd trim down the
> > number of processors and might suggest looking at
> > pairing the processor
> 
> Like P21, F21, i21, and others the X18 design was
> picked to reduce the prototyping cost and get a
> chip with pins that fit the prototyping constraints.
> So if someone has their own fab line and is not
> restricted by such constraints and is also not
> concerned with budget constraints the number of
> processors per die is completely variable, from
> 1 to thousands.  There is interest in thousands
> of processors per chip.
> 
> The width is also variable from 5 bits to whatever.
> Chuck's designs are in columns so scaling the
> width is mostly trivial.  Chuck said that making
> a P32 from a P21 was about a day's work in OKAD.
> 
> But the + and +* instructions' timing is proportional
> to bus width, so those opcodes would be slower with a
> wider bus.  Also the pin count and costs go up.  Pins
> are more expensive than silicon in high volume.  That
> is why a 60,000 MIP 25x can cost about the same thing
> as a 2400 MIP X18.
> 
> > with a x36 chip instead of the
> > x18 to get two "18 bit words" per cycle and
> > effectively running the memory at 500MHz x18.
> 
> It could be done, and still get 2400MIPS from
> the internal memories.  Larger amounts of
> internal memory could be put on larger more
> expensive chips if prototyping costs are not
> an issue.  But these have not been billion
> dollar type funding projects so far so
> things have been kept small to make it possible.
> 
> > Another area that I might suggest a change is the
> > memory per processor 384 words might be 1K words if
> > the number of processors is trimmed down to 16 or 9 so
> > you would be more likely to run without needing to
> > load or store data as frequently.
> 
> True.  Have you a particular application in mind where
> you have determined that twice as much on chip
> memory is needed?  I spent years doing that sort of
> thing to tweak F21 before it was fabbed.
> 
> I suggest that anyone with a particular idea simulate
> it extensively to be able to tweak the design to
> do what you really find best suited to your needs.
> Chuck is in the custom silicon business.  He can
> make it work in many ways depending on what the client
> wants.  It is a little like picking items from a
> menu.  Chuck would love to make many custom versions.
> But he would also really like to make a production
> run and get some chips into some product somewhere.
> It is sort of a key element that hasn't happened.
> 
> > I might be interested in contributing to such an
> > effort, I would need to know more about Chuck's
> > experience and how likely the first try is likely to
> > work (Murphy's Law and all).  I bought one of the
> > original P21 chips and I believe that those didn't
> > function until the 8th run so this is never a slam
> > dunk especially if 0.18 and TSMC are new to his
> > techniques.
> 
> It did take 8 tries to get P21 completely working.
> It had the thermal bug like all conventional chips
> but at only 100MHz in 1.2u Chuck didn't bump into
> it and didn't find it.  When he scaled down to .8u
> and went to 500MHz he discovered a bug in the
> transistor model.  There were almost thirty
> prototypes made at iTV and four by UltraTechnology.
> The modeling in OKAD got closer and closer to
> what the fabs actually produced.
> 
> The problem was that no one could say what the fabs
> would produce.  No one knew.  People would just
> repeat the mantra that it is just too complex to understand
> so you just have to trust in your half million dollar
> CAD software and accept that if it only tries to get
> within 1/10th of the potential speed in a given
> process that things will most likely work.  The problem
> there is that that software can add so much complexity
> to the design, and do such poor routing that even a
> 90% margin of error may not be enough.  So ultimately
> it is a trial and error process whether you use
> the half million dollar tools and aim for 10% or if
> you try to actually understand what the fab process
> will really produce and fine tune your own cad
> software to match it.
> 
> Even the people who wrote the half million dollar
> CAD software would usually just say that they
> only wrote 1% of it and only really understood 1%
> of it the way Chuck needed to understand 100%.
> Also by only aiming at 10% the potential they
> could live with a very fuzzy idea of what the
> fabs would actually produce.  Remember that
> it took hundreds of millions of man years of
> testing to find some Pentium bugs.
> 
> I was convinced that everything was working in the
> last .8u prototypes, the way OKAD predicted, but not
> all the bug fixes got put into the last designs prototyped.
> The last prototypes were made in 1998 then all funding was
> gone.  The move to .18u will require prototyping
> to get things fine tuned.  I doubt if things will
> work 100% the first time.  Not very likely.  But most
> of the problems with CAD have been worked out over
> the last decade.  The only proof will be chips that
> do exactly what OKAD predicts they will do.
> 
> The only one of the chips that worked 100% on the
> first try was ShBoom.  It was a funny story.  Chuck
> laid out the design, the software routed it and Chuck
> said, "This could never run.  The software is brain
> damaged and doesn't understand which circuits are
> critical for timing.  It must just create a list and
> go through it.  Look at this trace, it is the most
> important trace on the design but must have been one
> of the last ones routed because it goes all over the
> chip to get from here to here.  It needs to be shorter
> and straight.  The only solution is to lay out all
> the components by hand and hand wire all the sections
> together."
> 
> The engineers at OKI thought Chuck was nuts.  They
> said that his solution was impossible and simply could
> not ever work.  But Chuck did it, it worked 100% on
> the first try.  Chuck decided from that experience that
> he needed his own tools that did what he needed.  I
> think his explanations at his site of why his CAD tools
> work the way they do is very well written.
> 
> The design was stolen from Chuck and eventually found
> its way to Patriot Scientific where they spent a
> decade making changes and trying to get them to work.
> 
> Even with OKAD as evolved as it is, I would expect
> that more than one prototype fab would be needed to
> get things working 100%.  Also testing every transistor
> and combination of instruction etc. is a very
> involved process.  That is the sort of thing that
> Chuck expected from the client and owner of a chip.
> 
> I always thought that it made for an unusual programming
> challenge when you have no idea what will work or is
> working.  You can't count on anything and have to
> start almost from scratch each time unless it just
> happens to work with a subset of the problems you saw
> with the last suite of diagnostic software.
> 
> I thought it would be a fun programming challenge for
> Forth day.  Here is a simulated processor.  Here is
> the instruction set that it is designed to support.
> Write a program to do such and such to get X points.
> You get X extra credit points for each bug where
> the simulated processor that we give you does not
> do what this documentation says.  You also get
> points for a work around for each bug that you
> identify. You also get points for correctly
> documenting the details of the hardware bugs that
> you find.  The person with the most points at
> the end of the lunch hour wins.
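
The contest idea above amounts to comparing a documented instruction set against a simulator with injected faults. Below is a minimal Python sketch of that comparison. The three-instruction stack machine, its 16-bit cells, and the injected carry fault are all invented for illustration; they are not the F21, P21, or any real chip.

```python
MASK = 0xFFFF  # assumed 16-bit cells, chosen only for this toy machine


def add(s):
    b, a = s.pop(), s.pop()
    s.append((a + b) & MASK)


def dup(s):
    s.append(s[-1])


def swap(s):
    s[-1], s[-2] = s[-2], s[-1]


def faulty_add(s):
    """Injected hardware fault: the carry from bit 7 into bit 8 is lost."""
    b, a = s.pop(), s.pop()
    low = ((a & 0x00FF) + (b & 0x00FF)) & 0x00FF
    high = ((a & 0xFF00) + (b & 0xFF00)) & 0xFF00
    s.append(high | low)


SPEC = {"add": add, "dup": dup, "swap": swap}   # what the documentation says
CHIP = dict(SPEC, add=faulty_add)               # what the simulated chip does


def run(program, ops):
    """Integers are pushed as literals; strings name instructions."""
    stack = []
    for item in program:
        if isinstance(item, int):
            stack.append(item & MASK)
        else:
            ops[item](stack)
    return stack


def find_bugs(tests):
    """Return the test programs where chip and documentation disagree."""
    return [p for p in tests if run(p, SPEC) != run(p, CHIP)]


TESTS = [[1, 2, "add"], [0x00FF, 1, "add"], [3, "dup", "add"], [1, 2, "swap"]]
for program in find_bugs(TESTS):
    print(program, "documented:", run(program, SPEC), "observed:", run(program, CHIP))
```

Only the test that carries across the byte boundary exposes the fault, which illustrates the point made later in the message: small tests can pass on a chip that is still broken.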
> 
> This sort of programming is very different than
> what most people do.  You can't trust the chip hardware,
> you can't trust the board hardware, you can't trust
> the compiler, and the first few dozen things that
> you try simply may not work at all.  So you can't
> easily find software bugs by observing that your
> program didn't work.  The problem becomes nearly
> impossible if you introduce your own software bugs
> on top of external hardware bugs that are seen
> at a board level due to signal glitches or from
> internal hardware bugs in the instruction set
> or registers.  And the bugs that only appear
> once in every few billion executions of a given
> sequence of otherwise proper code are very very
> hard to find.  Most programmers have a hard time
> finding their own bugs when given solid chips,
> solid boards, solid operating systems, and solid
> compilers.  The bare metal programming of
> prototype chips is tricky business.

Rick Hohensee wrote:
> 
> 60000 MIPS might be too hard to believe. It might be an easier sell to
> talk about a one-cycle process-switch. 25 task processors. I don't know if
> routing that is easier or harder though. Make it 32 bit, with two 18 bit
> cache SRAMs, put it on a PCI card, call it a multimedia board, and don't
> tell them it doesn't need the x86.
> 
> Rick Hohensee

7/05/01
Jeff Fox wrote:
> 
> Rick Hohensee wrote:
> 
> > 60000 MIPS might be too hard to believe.
> 
> It is impressive for a $1 chip.  The MIPS numbers for
> one Pentium sized or wafer sized will be harder for
> people to believe.  The MIPS numbers for a Pentium
> sized version in .1u or smaller would be harder
> for people to believe.
> 
> Many people didn't believe that 100MIPS in 1.2u
> was possible or that 500MIPS in .8u was possible.
> After it was done it was pretty obvious that
> they didn't care either.  I try not to be
> concerned about the people who can't believe
> it or don't care.
> 
> > It might be an easier sell to talk about a
> > one-cycle process-switch.
> 
> Have you tried to sell the idea?  Has there been
> serious interest in a super fast task switching
> processor with a 0.4ns task switch?
> 
> > 25 task processors. I don't know if
> > routing that is easier or harder though.
> 
> A 25 way bus arbitration unit would be more
> complex than what Chuck is doing and would
> keep 24 of 25 processors shut down at any
> given time.  Given that Chuck has gone to
> dynamic logic, processors would lose all their
> contents if shut down for very long.  So
> instead of 4% throughput it would be a tiny bit
> lower due to the extra overhead to keep all
> shut down processors alive waiting for a task
> switch.
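
The 4% figure follows from running one processor out of 25 at a time. A minimal Python sketch of that arithmetic follows. The 1% keep-alive overhead is a made-up placeholder for the "tiny bit lower" remark, not a number from the message, and the function name is illustrative.

```python
def throughput_fraction(cpus: int, active: int = 1, overhead: float = 0.0) -> float:
    """Fraction of peak aggregate throughput when only `active` CPUs run.

    `overhead` is the assumed share of the active CPU's time spent keeping
    the idle dynamic-logic CPUs alive (hypothetical value).
    """
    return (active / cpus) * (1.0 - overhead)


PEAK_MIPS = 25 * 2400                       # 60,000 MIPS with all 25 running
ideal = throughput_fraction(25)             # one task processor active
with_refresh = throughput_fraction(25, overhead=0.01)
print(f"ideal {ideal:.1%} of {PEAK_MIPS} MIPS, "
      f"with 1% keep-alive overhead {with_refresh:.2%}")
```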
> 
> > Make it 32 bit, with two 18 bit cache SRAMs,
> 
> Twice the number of pins and some multiple of the
> development cost of course.
> 
> > put it on a PCI card, call it a multimedia board,
> > and don't tell them it doesn't need the x86.
> 
> Making a product that uses a new chip is
> a completely different thing than designing or
> making the chip.  That can require orders of
> magnitude higher budgets.  Who or what company
> is it that you are suggesting should develop this
> PCI card product with the 32 bit chip? What
> do you think would be a good PC application
> for the product to target?  It certainly could
> be done and it might be a good idea.

t wrote:
> 
> As Jeff Fox has pointed out in previous posts, extolling the virtues of MISC
> Forth on these email lists is preaching to the choir.
> 
> What we have is a failure to communicate between tech (bleeding edge MISC
> Forth) and finance/marketing/mgmt (FMM).
> 
> We see the nominal 6 orders of magnitude improvement in code/silicon but
> have failed to show FMM how it is to their benefit to adopt it. Since we
> have the ideas clearly in our minds, it is our responsibility to convey it
> to them in _their_ terms.
> 
> Treat it as an engineering communications problem. Define the parameter
> space and their communications protocols and find the optimum fit. Apply the
> same brutal efficiency to convey this advantage as is demonstrated in MISC.
> E.g. see Chuck's Floppy I/O:
> http://www.mindspring.com/~chipchuck/ide.html
> for a working example of brutal efficiency, 5 defs for disk I/O! Can we do
> the same in FMM speak?
> 
> For a simple example, a meme (a self replicating concept expressed in a
> simple phrase) would set the context. "A million times more efficient" seems
> like an obvious one to me, but this may be too much for FMM to accept.
> Repetition can get it accepted, as demonstrated by the mindless ads we all
> know and love, which means the MISC community has to go out and proselytize.
> Devise other more credible memes to spread before it.
> 
> Get involved in the Open Source Community. One post to Slashdot is
> 'priceless'. Organize a submission campaign to get Chuck's web site exposed.
> Get the MISC advantage out into the world. In a word, ADVERTISE. Just be
> covert about it, unless you have money to throw around.
> 
> To restate the situation: this is a communications/protocol mismatch. Figure
> out the motivation for Finance, Marketing and Management decisions and
> provide the proper stimulus.
> 
> But that's just my opinion.
> 
> Terry Loveall

Rick Hohensee wrote:
> 
> >
> > Rick Hohensee wrote:
> >
> > > 60000 MIPS might be too hard to believe.
> >
> > It is impressive for a $1 chip.  The MIPS numbers for
> > one Pentium sized or wafer sized will be harder for
> > people to believe.  The MIPS numbers for a Pentium
> > sized version in .1u or smaller would be harder
> > for people to believe.
> >
> > Many people didn't believe that 100MIPS in 1.2u
> > was possible or that 500MIPS in .8u was possible.
> > After it was done it was pretty obvious that
> > they didn't care either.  I try not to be
> > concerned about the people who can't believe
> > it or don't care.
> >
> > > It might be an easier sell to talk about a
> > > one-cycle process-switch.
> >
> > Have you tried to sell the idea?  Has there been
> 
> No, but it's a transparent drop-in to existing stuff on the scale of
> things under discussion here.
> 
> > serious interest in a super fast task switching
> > processor with a 0.4ns task switch?
> 
> >
> > > 25 task processors. I don't know if
> > > routing that is easier or harder though.
> >
> > A 25 way bus arbitration unit would be more
> > complex than what Chuck is doing and would
> > keep 24 of 25 processors shut down at any
> > given time.  Given that Chuck has gone to
> 
> I assume the only-one-active aspect.
> 
> > dynamic logic, processors would lose all their
> > contents if shut down for very long.  So
> > instead of 4% throughput it would be a tiny bit
> > lower due to the extra overhead to keep all
> > shut down processors alive waiting for a task
> > switch.
> >
> 
> Throughput is 100% of "keep the pins busy" and 4% of "keep the silicon
> busy", has lower current draw? , lower exotherm? , and I don't think
> you'll get near 100% utilization of 25 engines. Clustering might scale
> close to that. SMP doesn't.
> 
> > > Make it 32 bit, with two 18 bit cache SRAMs,
> >
> > Twice the number of pins and some multiple of the
> > development cost of course.
> >
> > > put it on a PCI card, call it a multimedia board,
> > > and don't tell them it doesn't need the x86.
> >
> > Making a product that uses a new chip is
> > a completely different thing than designing or
> > making the chip.  That can require orders of
> > magnitude higher budgets.  Who or what company
> > is it that you are suggesting should develop this
> > PCI card product with the 32 bit chip? What
> > do you think would be a good PC application
> > for the product to target?  It certainly could
> > be done and it might be a good idea.
> 
> I dono.  :o)
> 
> Rick Hohensee

7/06/01
Mark Sandford wrote:
> 
> Jeff Fox wrote:
> >Mark Sandford wrote:
> -- Space/NASA stuff deleted for length considerations
> 
> >> I would suggest that this be retargeted somewhat as 25
> >> processor seems a little overkill, 16 or 9 (assuming
> >> you like squared numbers) seems more reasonable and
> >> the SRAM at 4ns (250 MHz before timing margins
> >> on-chip), would need to get shared between 25
> >> processors.  Assuming they are doing similar things
> >> this leaves only an effective 10MHz per processor
> >> while they are running at 2400MHz, so unless the
> >> application is heavily, heavily inner loops they will
> >> spend a great amount of time twiddling their thumbs
> >> awaiting their turn on the bus.
> >
> >Of course.  The same thing applies to workstation farms.
> >All problems have a balance between node processing
> >and node communication.  The design was not created
> >for problems that are essentially serial and are
> >limited by communication bandwidth or serial processing.
> >
> >Instead this design is for computationally intense
> >problems that can use 60,000 MIPS per $1 cluster chip
> >and not for software or problems that would limit
> >it to 250MIPS.  A single X18 is capable of 2400MIPS
> >so why limit 25 of them to a total of 250MIPS?
> >
> >The proper model for F21 or 25X is a workstation
> >farm, but without the hardware and software overhead
> >needed to put C or Unix on each node.  A very small,
> >very cheap, Forth workstation farm.
> 
> Agreed, but a chip (processor farm) that can't do a
> significant/interesting demo isn't much of a technology
> demonstration.  There have been many instances of this in
> the past: if you have to wait for the demo and then wait
> for an implementation that does something real, people
> lose interest.  So you can say that this chip will only
> work in one class of problems, but if those problems
> aren't of interest then the whole technology gets
> dismissed.  What is described above is the classic
> problem, and one that has plagued the CPU industry for
> years.  This has become a main mantra of mine: a system
> isn't limited nearly as much by MIPS as by memory
> bandwidth, and as CPU speeds increase at a rate faster
> than memory speeds increase this problem grows.  The
> classic case is the Sieve, which used to be a speed test,
> but as processor speeds increased beyond what memory
> could provide the test became useless.  As such, while
> processor designs now have faster clocks every year,
> performance is dominated by cache size and design.  I
> understand that part of the MISC concept is that Machine
> Forth is that much smaller and thus faster than
> traditional Bloatware, but if the chip can only run very
> small routines, code or data must be loaded and stored
> and the speed of the processor is limited by the
> available bandwidth.
> As you mentioned, workstation farms are bandwidth
> limited (with fast, wide memory and large caches, with
> one, two or four processors), so how is a much faster
> set of 25 processors supposed to survive?  The technology
> could be proven more effectively with a better match
> between memory bandwidth and bandwidth requirement.  This
> can be addressed with faster, wider external memories,
> and more on-chip memory such that more routines can be
> stored on-chip, reducing the program load portion of the
> memory bandwidth equation.  60,000 MIPS that can't be
> used is worthless, 20000 MIPS (9 processors) that can be
> used is worthwhile.  If there isn't enough bandwidth or
> the requirements can't be reduced the 60,000 MIPS don't
> have value.
> 
> A 36bit chip helps bandwidth, while keeping the size
> small and one chip, and more on-chip memory helps reduce
> requirements by having more on-chip code.  My suggestions
> are aimed at making the demonstration chip more viable.
> Are there really any 60,000 MIPs applications that run in
> 384 words and require less than 250Mwords of data
> bandwidth? I can't think of any, and without a compelling
> application, no matter how powerful, this technology will
> go nowhere.  We as engineers can often think of many
> things that could be done but as much as we hate to admit
> it, if nobody wants or can use what you develop, it's
> nothing more than a paper-weight.
> 
> I have a strong belief that the future of processors
> will be dominated by the intelligent RAM concept, where
> you take the relatively small CPU and put it inside the
> RAM which can then be very wide, 128 or 256 bits, and
> center the chip on the memory availability which will be
> the limiting factor anyway.  The old "if Muhammad won't
> go to the mountain, bring the mountain to him" concept:
> it sounds backwards but you need to overcome your
> problems via the simplest route.
> 
> >
> >> Even running solid
> >> multiplies at 125M this still leaves a large margin
> >> for data transfers.  So firstly I'd trim down the
> >> number of processors and might suggest looking at
> >> pairing the processor
> >
> >Like P21, F21, i21, and others the X18 design was
> >picked to reduce the prototyping cost and get a
> >chip with pins that fit the prototyping constraints.
> >So if someone has their own fab line and is not
> >restricted by such constraints and is also not
> >concerned with budget constraints the number of
> >processors per die is completely variable, from
> >1 to thousands.  There is interest in thousands
> >of processors per chip.
> >
> >The width is also variable from 5 bits to whatever.
> >Chuck's designs are in columns so scaling the
> >width is mostly trivial.  Chuck said that making
> >a P32 from a P21 was about a day's work in OKAD.
> >
> >But the + and +* instructions' timing is proportional
> >to bus width, so those opcodes would be slower with a
> >wider bus.  Also the pin count and costs go up.  Pins
> >are more expensive than silicon in high volume.  That
> >is why a 60,000 MIP 25x can cost about the same thing
> >as a 2400 MIP X18.
> >
> >> with a x36 chip instead of the
> >> x18 to get two "18 bit words" per cycle and
> >> effectively running the memory at 500MHz x18.
> >
> >It could be done, and still get 2400MIPS from
> >the internal memories.  Larger amounts of
> >internal memory could be put on larger more
> >expensive chips if prototyping costs are not
> >an issue.  But these have not been billion
> >dollar type funding projects so far so
> >things have been kept small to make it possible.
> >
> >> Another area that I might suggest a change is the
> >> memory per processor 384 words might be 1K words if
> >> the number of processors is trimmed down to 16 or 9 so
> >> you would be more likely to run without needing to
> >> load or store data as frequently.
> >
> >True.  Have you a particular application in mind where
> >you have determined that twice as much on chip
> >memory is needed?  I spent years doing that sort of
> >thing to tweak F21 before it was fabbed.
> >
> >I suggest that anyone with a particular idea simulate
> >it extensively to be able to tweak the design to
> >do what you really find best suited to your needs.
> >Chuck is in the custom silicon business.  He can
> >make it work in many ways depending on what the client
> >wants.  It is a little like picking items from a
> >menu.  Chuck would love to make many custom versions.
> >But he would also really like to make a production
> >run and get some chips into some product somewhere.
> >It is sort of a key element that hasn't happened.
> >
> >> I might be interested in contributing to such an
> >> effort, I would need to know more about Chuck's
> >> experience and how likely the first try is likely to
> >> work (Murphy's Law and all).  I bought one of the
> >> original P21 chips and I believe that those didn't
> >> function until the 8th run so this is never a slam
> >> dunk especially if 0.18 and TSMC are new to his
> >> techniques.
> >
> >It did take 8 tries to get P21 completely working.
> >It had the thermal bug like all conventional chips
> >but at only 100MHz in 1.2u Chuck didn't bump into
> >it and didn't find it.  When he scaled down to .8u
> >and went to 500MHz he discovered a bug in the
> >transistor model.  There were almost thirty
> >prototypes made at iTV and four by UltraTechnology.
> >The modeling in OKAD got closer and closer to
> >what the fabs actually produced.
> 
> -- other Chip history comments deleted for space
> 
> It seems a little misleading to say that the prototyping
> cost with Mosis is $14K when it may take 2, 4 or even
> 8 tries to get things working.  If it really takes 8 tries,
> the prototyping cost is $112K and 32 months, which doesn't
> sound that attractive.  Chuck's models and thus experience
> have been (as far as I know) at 0.8um, and while his software
> may be getting better he will have a whole new set of issues
> to deal with as the geometry gets smaller.  This transition
> has been pretty difficult for the traditional CAD software
> vendors.  The term deep sub-micron refers to the problems
> that are seen as geometries drop below 0.3um and the
> gate delays that defined performance historically stop
> being dominant.  At 0.35um gate delays rule, and wire
> delays can be ignored.  At 0.25um gate delays and wire
> delays are near equal and both must be considered.  At
> 0.18um wires dominate; gate delays can't be ignored,
> but placement and thus wire lengths now become the
> determining factor.  As Chuck's transistors are faster and he
> isn't playing the safe must-work technology game that the
> traditional EDA firms are, he will see these issues in a
> different fashion, but these problems will still exist and
> their nature will change with geometries.  So his software may
> have improved with Chuck's understanding of the issues, but he
> will need multiple tries to calibrate his technology when
> operating with his new geometries.
> 
> Given the above, his best attack may be to put the
> processor design to the side for a moment and build
> a test chip with various transistor and gate designs
> and use this to calibrate his designs before trying
> a new processor on a new technology.  He could try
> various parameters and find either which line up with
> his models or tune his models to work with the given
> transistors.  Once his models are correct, getting a
> processor to work should be much easier (Murphy's
> Law still applies, unfortunately).
> 
> This said, while I would like to see Chuck succeed,
> it doesn't seem like it would be easy to find investors
> to contribute to a technology that requires significant
> tuning through multiple iterations to work.  The MISC
> ideas are very powerful and it seems that
> 

7/07/01
Jeff Fox wrote:
> 
> Mark Sandford wrote:
> > Agreed, but a chip (processor farm), that can't do a
> > significant/interesting demo, isn't much of a
> > technology demonstration.
> 
> Can't?  I am curious why you say that.
> 
> But from what I have seen the demos that people want
> to see are usually moronic and have nothing to do
> with what chips are good for.
> 
> Compression and decompression of data streams in
> realtime is pretty much an open ended problem,
> things like protein folding, gene sorting, simulations
> and problem modeling, AI, and a lot of other things
> that need computing power are not the sort of things
> the investors want to see.  They want to see a
> dancing baby doing the latest popular dance.  Then
> they don't pay for the demo and don't invest anyway.
> 
> > There have been many instances of this in the
> > past: if you have to wait for the demo and then wait
> > for an implementation that does something real, people
> > lose interest.  So you can say that this chip will only
> > work in one class of problems, but if those problems
> > aren't of interest then the whole technology gets
> > dismissed.
> 
> True.  I think the real problem there is that the only
> problem that is of interest to most people is how to
> do anything while carrying a 99.9% overhead built
> into their PC.  They are only concerned with how to
> get a PC to do much of anything while it is hamstrung
> with terrible hardware and software overhead for
> backwards compatibility reasons.  Most people think
> that is the only real problem worth addressing, how
> to get a few percent increase while carrying the
> excess overhead of PC hardware or popular software and
> few are even willing to consider starting by simply
> removing the overhead and starting with a clean
> slate to get a 1000x improvement.
> 
> > What is described above is the classic problem, and
> > one that has plagued the CPU industry for years.  This
> > has become a main mantra of mine: a system isn't limited
> > nearly as much by MIPS as by memory bandwidth, and
> 
> Very true.  And by the programs being 100 times larger
> than they need to be.  The overhead is built into the
> systems to create the artificial problem that can
> be improved in little steps for marketing purposes.
> The easiest problems to solve are these sorts of
> artificial problems, but they are what drives the
> industry.
> 
> > as CPU speeds increase at a rate faster than memory
> > speeds increase this problem grows.  The classic
> > case is the Sieve which used to be a speed test
> > but as processor speeds increased beyond what
> > memory could provide the test became useless.  As
> > such, while processor designs now have faster
> > clocks every year, performance is dominated
> > by cache size and design.  I understand that part of
> > the MISC concept is that Machine Forth is that much
> > smaller and thus faster than traditional Bloatware,
> > but if the chip can only run very small routines,
> > code or data must be loaded and stored and the speed
> > of the processor is limited by the available
> > bandwidth.
> 
> Most programs only need a little memory for code.
> If you have lots of memory you can run larger programs.
> 
> If small programs need megabytes of code then large
> programs are not possible.  You kind of have it backwards.
> The problem with 99.9% overhead is that it limits the
> machines to only trivial problems.  The idea of low
> overhead is to be able to solve serious problems.
> Anyone can solve trivial problems, but for marketing
> reasons the solutions are bloated up to fill the
> machine and require hardware and software upgrades
> to even do trivial things.
> 
> Look at the requirement that 80386 and 68020 have
> been classified as not powerful enough to keep up
> with a fast typist. ;-)  I read in c.l.f last year
> that it was only recently with >500MHz 32 bit deeply
> pipelined CPUs and sophisticated optimizing native
> code compilers that they were able to solve the
> same problems that they could solve twenty years
> ago with 5MHz 8 bit machines running threaded Forth.
> To me this says that in twenty years they have
> more or less canceled out with hardware and software
> the 99.9% overhead that was introduced along the way.
> 
> The faster peripherals and larger storage and bigger
> displays are the big difference.  The 1000x increase
> in processing power is more or less canceled out by
> a similar increase in processing overhead.  SUVs
> get better mileage than they used to also.  The
> improvements in the technology are used to cancel
> out the introduced overhead to keep profit margins
> high and give the consumers the impression that
> things are getting better.
> 
> > As you mentioned workstation farms are bandwidth
> > limited (with fast, wide memory and large caches, with
> > one, two or four processors), how is a much faster
> > set of 25 processors supposed to survive?
> 
> Sometimes the overhead is such a joke that I can't
> believe it doesn't wave a red flag to more people.  I
> listened to a lot of presentations at the Parallel
> Processing Connection over the years.  When people
> would say that they needed X megabytes on each node
> for overhead or X gigabytes total overhead to run
> a hello world program I always found it simply amazing.
> 
> > The technology
> > could be proven more effectively with a better match
> > between memory bandwidth and bandwidth requirement.  This
> > can be addressed with faster, wider external memories, and
> > more on-chip memory so that more routines
> > can be stored on-chip, reducing the program load
> > portion of the memory bandwidth equation.
> 
> This is the classic image of parallel processing where
> they see node communications as the limiting factor and
> thus want the biggest nodes with biggest processor and
> biggest caches possible to reduce the level of
> parallelism.  But a lot of research over the last
> few decades has been into how biological systems can
> do so many things so well that these machines can't.
> The answer is lots more smaller nodes.
> 
> Instead of a single 1000MHz processor with a huge
> cache (that is dwarfed by the size of the software
> overhead required) and a huge amount of memory, a
> design optimized to carry the marketing-introduced
> overhead, the same number of transistors can
> be 1000x more efficient on problems that are
> parallel.
> 
> Almost all problems, certainly almost all interesting
> problems, are embarrassingly parallel.  The only
> problems that are not are the ones we artificially
> created for ourselves in our antiquated serial
> computers with absurd computational overhead.
> 
> Humans don't look like Pentiums, they have 2*10^11
> processing nodes.  They don't run Unix or Windows.
> 
> > 60,000 MIPS that can't be used is worthless,
> 
> If it is considered useless it may never be made.
> If people keep repeating that it is useless other
> people will keep thinking it is useless.  If none
> are ever made the only value will be the educational
> value to the few people who study the good ideas
> that are there.
> 
> Some of the most brilliant people I have met love
> the idea of cheap chips with millions of mips.  But
> convincing people with money is a more difficult
> problem.  Convincing most people seems to simply
> be a matter of showing them that it has become
> mainstream.  They equate good idea with mainstream
> pure and simple.  Followers not leaders.
> 
> > that can be used is worth while.  If there isn't
> > enough bandwidth or the requirements can't be reduced
> > the 60,000 MIPS don't have value.
> 
> 180 billion bits per second bandwidth between nodes, and
> 1,200 billion bits per second memory bandwidth in a $1
> chip makes a $1000 PC look pretty sick.  But you have
> to compare 100 25x to a PC to get the picture.  Does
> your PC have 18,000 billion bps network and 120,000
> billion bps memory bandwidth?  That hasn't stopped it
> from being marketed.
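
The scaling in the paragraph above is plain multiplication, and a short Python sketch makes it easy to check. The per-chip figures (180 and 1,200 billion bits per second) and the count of 100 chips come from the post; the function and constant names are made up for illustration, and linear scaling across chips is the post's own assumption.

```python
# Per-chip figures quoted in the post, in billions of bits per second (Gbps).
NODE_BANDWIDTH_GBPS = 180      # bandwidth between nodes on one chip
MEMORY_BANDWIDTH_GBPS = 1_200  # memory bandwidth on one chip


def aggregate_bandwidth(chips: int) -> tuple[int, int]:
    """Return (node, memory) bandwidth in Gbps for `chips` chips.

    Assumes bandwidth simply adds up across chips, as the post does.
    """
    return chips * NODE_BANDWIDTH_GBPS, chips * MEMORY_BANDWIDTH_GBPS


if __name__ == "__main__":
    node, memory = aggregate_bandwidth(100)
    print(f"100 chips: {node:,} Gbps between nodes, {memory:,} Gbps to memory")
```

With 100 chips this gives the 18,000 and 120,000 billion bps figures used in the comparison.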
> 
> > A 36bit chip helps bandwidth, while keeping
> > the size small and one chip, and more on-chip memory
> > helps reduce requirements by having more on-chip code.
> > My suggestions are aimed at making
> > the demonstration chip more viable.
> 
> Not really. It cuts it by at least a factor of 2.  It
> would be useful if the idea is that you have to carry
> more overhead on each node.
> 
> When I brought the idea of parallel processing to Chuck
> more than ten years ago he was slow to embrace it. It
> took him time to understand and appreciate the issues.
> 
> When he brought his ideas of Forth and MISC designs to
> me it took me time to understand and appreciate the
> issues.  For instance I just didn't understand it
> when he said, "Most programs fit in one K."
> 
> I didn't understand because I was picturing programs
> with overhead built in for marketing purposes.  After
> watching Chuck for years I began to see that with
> his approach most programs fit in one K or less.
> 
> Programs that other people felt required 10 megabytes
> became 1K for Chuck.  His VLSI CAD software is only
> 500 lines of code.  He doesn't need megabytes to do
> a hello world program.
> 
> > Are there really any ...
> 
> Yes.  Most problems, and most programs.  But most
> problems are beyond the machines with artificial
> self-imposed problems to solve so most people have
> never looked at how they could be solved.
> 
> > We as engineers can often think of many things
> > that could be done but as much as we hate to admit it,
> > if nobody wants or can use what you develop, it's nothing
> > more than a paper-weight.
> 
> That is what the people who hate it, or are threatened
> by it, or want to see it fail have kept repeating
> for the last decade.  But there have been a few hundred
> people who have been influenced by the good ideas and
> say it has been a benefit to them.  So even if no chips
> get made, the ideas have been recognized as good
> ideas by more people than you might realize.
> 
> I am always amazed by the profiles of the people
> downloading stuff from my site.  It is popular with
> Intel, it is popular with the US Gov, it is popular
> with NASA.  And I see the ideas spreading even if
> our chips are not being produced by anyone.
> 
> But there still are people chanting that it is
> worthless or bad.  It seems that the biggest resistance
> comes from the people who feel threatened by change.  The
> mainframe types said all the same things about micros
> in the old days.  Worthless toys, not real computers,
> nothing more than paperweights that will never amount
> to anything but a curiosity.  I have been hearing
> that for over thirty years now.
> 
> > I have a strong belief that the future of processors
> > will be dominated by the intelligent RAM concept,
> 
> I like that idea too.  I have wanted to use Chuck's
> CAD technology to make cheap content addressable RAM.
> But we would like to sell something to get funding first.
> 
> > where you take the relatively small
> > CPU and put it inside the RAM, which can then be very
> > wide, 128 or 256 bits,
> > and center the chip on the memory availability, which
> > will be the limiting factor anyway.  The old "if Muhammad
> > won't go to the mountain, bring the mountain to him"
> > concept: it sounds backwards but you need to
> > overcome your problems via the simplest route.
> 
> The idea of dropping MISC processors into a corner
> of conventional memory chips, and being able to access
> 1000 words in parallel at once has appealed to a
> lot of people.   When iTV had large well funded
> corporate partners in Asia who were manufacturing
> the memory chips that we all use, those companies
> wanted to do some of that.  Then the Asian economies
> collapsed and the projects died.
> 
> > It seems a little misleading to say that the prototyping
> > cost with Mosis is $14K when it may take 2, 4 or even
> > 8 tries to get things working.  If it really takes 8
> > tries the prototyping cost is $112K and 32 Months, this
> > doesn't sound that attractive.  Chuck's models and thus
> > experience have been (as far as I know) at 0.8um and while
> > his software may be getting better he will have a whole
> > new set of issues to deal with as the geometry gets smaller.
> 
> This is all very true.  But any further work rides on
> the work already done and the fab runs that other people,
> such as I, have already paid for.  As all the CAD problems
> seem to have been solved a few years ago, Chuck's optimism
> may not be overly optimistic and my pessimism may be
> overly pessimistic.
> 
> But what you say isn't quite right regarding the
> constraints.  If you can only afford the lowest budgets
> then you have a 4 month turn around.  Pay more and get
> a 4 day turn around.  If you want the project to be
> completed in 2 months instead of 32 months that
> is really just a budget issue.  Professional paths
> are more expensive than paths funded on hobby budgets.
> Still with mostly hobby budgets we have kept up
> with or passed the companies spending billions of
> dollars on each round of chip development.
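
The cost and schedule figures traded in this exchange follow from two multiplications, sketched here in Python as a sanity check. The $14K per run, the 4 month and 4 day turnarounds, and the count of 8 tries are taken from the thread; the names are illustrative, and the price of the faster turnaround is not given in the thread, so it is left out.

```python
COST_PER_RUN_USD = 14_000    # prototyping cost per run, as quoted in the thread
SLOW_TURNAROUND_MONTHS = 4   # lowest-budget turnaround, as quoted
FAST_TURNAROUND_DAYS = 4     # faster turnaround, as quoted (its price is not stated)


def prototyping_totals(tries: int) -> dict[str, int]:
    """Totals for `tries` fab runs, counting fab turnaround time only."""
    return {
        "cost_usd_at_low_budget_rate": tries * COST_PER_RUN_USD,
        "months_at_slow_turnaround": tries * SLOW_TURNAROUND_MONTHS,
        "fab_days_at_fast_turnaround": tries * FAST_TURNAROUND_DAYS,
    }


if __name__ == "__main__":
    for tries in (2, 4, 8):
        print(tries, prototyping_totals(tries))
```

For 8 tries this reproduces the $112K and 32 months cited above. At the 4 day turnaround the fab time alone comes to 32 days; the thread does not break down how the rest of a 2 month schedule would be spent.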
> 
> The problem is always that if you say you can do
> 100 times better on 100 times lower budget you
> will be asked to do 1000 times better on a 1,000,000
> times lower budget.  Then when you do that they
> just say they don't care anyway.
> 
> One thing that appealed to me ten years ago was that Chuck's
> approach solved the big problem that other people are
> now struggling with.  Scale.  With Chuck's tiled approach
> and hand layout, and simulation that takes transistor
> size, load, path length, and temperature effects into
> account to get the tiled design right, designs scale almost
> without effort.  Problem solved.
> 
> With a schematic or high level functionality description
> and reliance on automated tools to place and route
> they never have any idea what to expect until the last
> minute and if they change the scale they have to start
> over from scratch.  This is the major difference between
> Chuck's approach and other people's approach to CAD,
> they must have schematic capture and trust in tools
> while Chuck doesn't need or want it.
> 
> > This transition has been pretty difficult for the
> > traditional CAD software vendors.  The term deep sub-micron
> > refers to the problems that are seen as geometries drop
> > below 0.3um and the gate delays that defined performance
> > historically stop being dominant.  At 0.35um gate delays
> > rule, and wire delays can be ignored. At 0.25um gate delays
> > and wire delays are near equal and both must be considered.
> > At 0.18um wires dominate and gate delays can't be ignored
> > but placement and thus wire lengths now become the
> > determining factor.
> 
> Exactly!  That is why it was the first problem that Chuck
> solved ten years ago.
> 
> > As Chuck's transistors are faster and he isn't playing
> > the safe must work technology game that the traditional
> > EDA firms are he will see these issues in a different
> > fashion but still these problems will exist and the nature
> > will change with geometries.  So his software may have
> > improved with Chuck's understanding of the issues but he will
> > need multiple tries to calibrate his technology when
> > operating with his new geometries.
> 
> Yes, he still has to make chips and see what happens the
> same as everyone else.  But instead of billions per new
> chip the costs are much lower.  If you reduce the costs
> by a factor of 1000 he can do it 10 times faster.  If
> you reduce the funding by 1000000 he can do it about
> as fast but it is more work.  And we do get tired
> of doing it that way.
> 
> > Given the above his best attack may be to put the
> > processor design to the side for a moment and build
> > a test chip with various transistor and gate designs
> > and use this to calibrate his designs before trying
> > a new processor on a new technology.  He could try
> > various parameters and find either which line up with
> > his models or tune his models to work with the given
> > transistors.  Once his models are correct getting a
> > processor to work should be much easier (Murphy's
> > Law still applies unfortunately).
> 
> Yes.  Chuck's doing that was essential to solve the
> industry wide thermal bug in the transistor models.
> The details were fascinating but proprietary.
> 
> > This said, while I would like to see Chuck succeed,
> > it doesn't seem like it would be easy to find investors
> > to contribute to a technology that requires significant
> > tuning through multiple iterations to work.  The MISC
> > ideas are very powerful and it seems that
> 
> Very true.  But don't kid yourself that Pentium
> or Alpha designs don't require significant tuning or that
> some billion dollar efforts don't just get written
> off as development costs for designs that didn't work
> at all.  They just pick up the pieces and try again.

Jeff Fox wrote:
> 
> I wanted to add that Chuck isn't locked into
> parallel designs.  He would be happy to make
> a 32 bit chip with whatever custom features you
> want, or a 64 bit chip, or a 128 bit chip or ...
> if someone wants to pay for the work.  He
> would be even more interested if they were
> serious about manufacturing them.
> 
> He could make more on royalties than on design
> payments if someone has a good idea and it was
> more than just advice for how other people
> should spend their money.
> 
> He is in the custom silicon business, and has
> some unusual tools and skills.  Bring him the
> ideas and the funding and something can happen.
> 
> His interest in SMP has been because there has
> been outside interest in SMP and efforts to
> generate funding.  If anyone has ideas that they
> think are better all they have to do is make him
> an offer that he can't refuse.
> 
> His CAD work is much faster now with the new tools.
> So he needs more clients with more ideas and more
> funding.
> 
> If anyone has ideas, it really doesn't matter too
> much if I like their ideas or not.  Everyone has their
> own opinions.  Mine are mine. Yours are yours.  Maybe
> someone will have some winning ideas sometime.

7/08/01
Myron Plichota wrote:
> 
> As a longtime Forth programmer and hardware designer, and shorter time
> MISC -> NOSC subscriber, I've seen many instances where beautifully
> simple ideas languish simply because they fly in the face of current
> fashion trends in the computer industry. The creative mavericks have
> been paying the price for over a decade; the politically savvy
> opportunists have yet to discover how they have assiduously played into
> the hands of the monopolies and will soon be hard up for decent paying
> jobs, thanks to a general stagnation in the industry, nightmarishly
> complicated industry standards that require mega budgets to even
> consider attempting, offshore software sweatshops, etc. The glory days
> are over unless you work for the monopolies, and not all of us can stand
> that kind of corporate culture in the first place.
> 
> It has taken me a long time to appreciate some of the finer points and
> the overall consistency in the evolution of the chips, silicon design
> tools, and Forth compilers developed by Chuck Moore, Jeff Fox, and Dr.
> Ting. These guys have my total admiration for their persistence and
> purity of vision in rejecting the emperor's new clothes in favor of
> putting the programmer back in the driver's seat. It's great news to me
> that Chuck now has his own website, which must be a considerable relief
> to Jeff Fox. Thanks Jeff, for all the effort you have put into making
> the saga available to all who are interested enough to actually read
> through it.
> 
> The recent flurry of NOSC/MForth/CForth postings illustrates what I see
> to be a culture clash. Good intentions are apparent, but differences of
> opinion as far as how to gain the notoriety required for funding the
> ongoing development work abound. I assume these are due to the widely
> varying backgrounds and age groups that we encompass.
> 
> IMHO, those who advocate porting  to a MISC
> implementation are missing a key point behind the MISC initiative, i.e.
> an OS typically attempts to be all things to all people, and becomes
> very ugly very quickly. My take on this (and I have seen this put into
> practice many times) is that what is needed are simple interface
> routines to perform the I/O functions (keyboard, mouse, video, mass
> storage, datacomm, etc.) rather than all-singin'/all-dancin' hardware
> abstraction monstrosities that pretend that everything is a file. So
> much for OSs.
> 
> I'll go farther here in stating that with enough CPU performance, which
> appears to be abundant enough even with my silly little 20 MIPS
> Steamer16, you can use software to replace dedicated hardware for an
> amazing variety of realtime applications. Sure, you need at least a
> free-running timer and parallel ports, and judicious analog and digital
> hardware interface assists, but the minimalist approach keeps you in
> control of the procurement/availability/obsolescence issues and
> allows you to make tradeoffs that are simply denied by the menagerie of
> the silicon that is quasi-available out there. I definitely think that
> Chuck's current concept of an array of identical processors is superior
> to the dedicated coprocessor approach taken in the past. The faster the
> CPU is, the less reason to complicate the design with dedicated
> hardware. If the CPU design is debugged, then all of them in the array
> are as a natural consequence. In a multiprocessor design, you can even
> do away with interrupts by roadmapping which processor is responsible
> for what task and the interprocessor communication architecture. Why
> bother involving a '765 floppy disk controller, 16550 UART, USB
> controller, etc. when you can talk directly to the interface cable
> buffers? Software is easy to fix, hardware takes another design
> iteration and fab run.
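
As a rough illustration of the software-instead-of-UART idea above, here is a minimal Python sketch. It is not Steamer16 code. The 20 MIPS figure is the peak number from the post; the common 8N1 serial framing (one start bit, eight data bits sent least significant bit first, one stop bit) and the 9600 baud rate are assumptions chosen for the example, and all names are hypothetical. Driving the actual pin and timing each bit are left out.

```python
CPU_INSTRUCTIONS_PER_SECOND = 20_000_000  # peak 20 MIPS figure from the post


def instructions_per_bit(baud: int) -> int:
    """Whole instructions available during one serial bit time at `baud`."""
    return CPU_INSTRUCTIONS_PER_SECOND // baud


def frame_8n1(byte: int) -> list[int]:
    """Line levels for one byte: start bit (0), 8 data bits LSB first, stop bit (1)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0..255")
    data_bits = [(byte >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]


if __name__ == "__main__":
    print("instruction budget per bit at 9600 baud:", instructions_per_bit(9600))
    print("line levels for 'A':", frame_8n1(ord("A")))
```

The point of the budget calculation is that at modest baud rates a CPU in this class has thousands of instruction slots per bit, which leaves plenty of room for a software loop to shift the bits out itself.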
> 
> Another good example of solving a serious system-level issue is Chuck's
> idea of establishing pin locations such that a 4 ns memory chip can be
> mounted on the opposite side of the PCB with no more than 1 cm of trace
> length pin to pin. The best solution to many problems is to sidestep them
> in the first place by changing the rules of the game. In this case,
> signal integrity is reconciled with simple, low-cost PCB fab technology.
> 
> Maybe the best bet for funding is to approach universities and research
> foundations, rather than bored and greedy venture capitalists. Or maybe
> we should all buy lottery tickets and pledge the winnings. I agree with
> Jeff that wooing the mainstream with me-tooisms is a waste of time. A
> unique niche based on nya-nya minimalism is the natural arena for Forth
> chips to shine in. The computer industry as a whole is now decadent and
> needs a slap in the face. The biggest pair of balls is way too high up
> in the clouds to swing a decent kick at unfortunately, just ask the US
> Dept. of Justice.
> 
> Don't think of the MISC chips as being a PC-incompatible replacement
> that needs a plethora of SVGA PCI card drivers, etc. Think of them as an
> opportunity to expose the fraud behind the shortsighted standard
> products that get flogged to us this year and discontinued the next in
> the guise of progress. And of course if you are a "nobody", you'll never
> get the information necessary to write your own driver. Even supposing
> you do, it will be necessary to do it all over again when the card is
> discontinued. Far better to put a stake in the ground and design your
> own high-resolution color display subsystem with whatever features and
> hardware/software tradeoffs you see fit, for example.
> 
> I hate to see quibbling over which geometry results in what mind-numbing
> MIPS figure. I take it for granted that sophisticated designs are
> subject to certain statistics as the fat lady sings. I just want to hear
> her perform, even on an off day.
> 
> It's reverse heresy to be conservative on these mailing lists for
> heretics ;)
> 
> Myron Plichota

Jeff Fox wrote:
> 
> Myron,
> 
> I like the way you phrase things.  I will have to
> save your first paragraph as a sort of manifesto
> to show to other people to frame things.
> 
> > It's great news to me that Chuck now has his own website,
> > which must be a considerable relief to Jeff Fox.
> 
> You can say that again.  I am so pleased to read Chuck's
> wonderful explanations of so many of his ideas in a
> writing style that reflects his focused programming
> and system design.
> 
> After a decade of what I describe as the kill the
> messenger syndrome, where about a dozen specific
> people could always be counted on to post counter
> information to anything I said about what we
> were working on, and a few who chose to repeatedly
> characterize me as a liar, cultist, snake oil
> salesman, naive ignorant fool or a list of
> other names, I realize that I have at times lashed
> out even at those who wanted to get involved or
> provide support but didn't seem to know how.
> 
> I do plan to take a step or two back and let
> the dust settle a little now that Chuck has
> done such a good job of explaining his current
> work.  I still think that understanding the
> history of his work since he left Forth Inc.
> and the logic in making major changes to his
> Forth about every five years makes it all
> make much more sense.  But in providing that
> information my web site simply has too much
> stuff and isn't as simple and clear as Chuck's
> new site.  I have a meg of html files alone!
> I was able to read all the files at Chuck's
> site more than once fairly quickly.
> 
> > The recent flurry of NOSC/MForth/CForth postings illustrates
> > what I see to be a culture clash. Good intentions are apparent,
> > but differences of opinion as far as how to gain the notoriety
> > required for funding the ongoing development work abound. I
> > assume these are due to the widely varying backgrounds and
> > age groups that we encompass.
> 
> Yes, very true.  I used to be famous for my patience.  Over
> the last few years it has worn rather thin or at times
> been worn away altogether.
> 
> > IMHO, those who advocate porting 
> > to a MISC implementation are missing a key point behind
> > the MISC initiative, i.e. an OS typically attempts to be
> > all things to all people, and becomes very ugly very quickly.
> 
> Yes.  I tried to address this idea with my "Low Fat
> Computing" theme.  Chuck's ideas are not minimalism
> for the sake of minimalism, as many people have claimed,
> or even Ting's idea of minimalism as the key to what
> he calls Forth enlightenment, but simply good
> computer health instead of the unhealthy digital
> bloated fast food that is marketed everywhere that one
> turns.
> 
> > My take on this (and I have seen this put into
> > practice many times) is that what is needed are simple
> > interface routines to perform the I/O functions (keyboard,
> > mouse, video, mass storage, datacomm, etc.) rather than
> > all-singin'/all-dancin' hardware abstraction monstrosities
> > that pretend that everything is a file. So much for OSs.
> 
> Chuck has said that OS is a dirty word.  On his new
> web site he just says it is an obsolete concept.  It is
> part of the fat, that provides the artificial problem
> that can appear to be improved for marketing purposes
> while all the time more problems are slipped in behind
> the scenes.  (63,000 estimated bugs below your code?)
> 
> It is also the legacy of Forth.  In the old days Forth
> showed its superiority by making it easy to have custom
> hardware drivers that provided improved programmer
> productivity and improved system performance.  This
> more or less ended with Chuck's PolyForth when he
> left Forth Inc. and they abandoned the legacy of
> Forth and began marketing it as a layer on top of
> popular operating systems.  As they admit the
> constraints to do that dwarfed the constraints to
> do Forth itself.
> 
> But with a generation of programmers who never saw
> anything but stuff layered on top of popular OSes
> they could not picture Forth in any other way
> except in its most ancient forms.  "Modern Forth"
> is, according to them, a layer on top of everything
> else embracing all the things that Chuck invented
> Forth to avoid.
> 
> The real problem with this, for me, was that
> there have been many people who have been posting
> information about how Chuck's chips are really
> 1000x slower than what we said because they
> simply cannot imagine a computer without all
> the conventional stuff built in.
> 
> The best examples I could find were the popular
> Unix benchmarks designed to show system performance.
> The rationale behind these programs was to take
> simple problems and add the overhead that is typical
> in Unix programs.  Simple computations were intentionally
> made more complex by adding Unix system calls and
> forks and task switches and file access and all the
> overhead that is typical in other programs layered
> on top of all those standard services.  So they are a
> fair way to estimate how a given Unix system will run
> typical Unix programs with typical overhead.  They
> show how well systems can carry all the overhead
> in hardware and software needed to do all that.
> 
> They are the opposite of Chuck's idea of making
> a system with minimal fat, minimal waste, minimal
> unneeded overhead.  They show how well giant chips
> with giant memory spaces can deal with giant overhead
> on otherwise simple problems.  So I wrote web pages
> saying that anyone who suggests that these benchmarks
> are a good way to evaluate Chuck's Forth chips
> designed to run Chuck's Forth software is either
> completely ignorant of the whole idea in the first
> place or is deliberately picking the most deceptive
> idea for a benchmark that they could find.
> 
> I was then amazed at how many well known people in
> the Forth community did exactly that and repeatedly
> posted information about how these chips are really
> 1000 times slower than what we say based on their
> estimates of how well they would run their favorite
> benchmarks for their favorite OS compiled with their
> favorite C compiler tuned for their favorite huge
> chips.  In other words, they don't carry fat well.
> 
> Chuck said that a generation of people could not
> separate the idea of the abstractions from the
> actual computer.  They didn't understand that
> computer hardware and computer software could be
> simple and logical.  They thought that hardware,
> hardware design, OS internals, and compilers
> were simply beyond the realm of mortals and
> could not distinguish between the layers of
> abstractions introduced to tame these introduced
> problems and the real computer below.  He
> said that he wanted to "Dispel the User Illusion"
> that a computer is Windows or Unix.
> 
> > I'll go farther here in stating that with enough
> > CPU performance, which appears to be abundant enough
> > even with my silly little 20 MIPS Steamer16, you can
> > use software to replace dedicated hardware for an
> > amazing variety of realtime applications.
> 
> This is something that I didn't understand even after
> a decade of working with the first few generations of
> PCs and I got my first PC in 74.  I began to understand
> the concept that you just described when I got a Novix
> Forth kit from this guy named Charles Moore in 86.  It
> came with a lab workbook that was very much like my
> lab workbook in Electronics Class in the physics
> department in college.  It was full of tiny simple
> programs that could allow the programmer (Chuck) to
> trade software for hardware.
> 
> It is essential to understanding Chuck's whole
> approach.   On F21 the parallel port can be treated
> as 12 extra analog I/O lines with software.  Those
> are digital lines, but can be programmed to do
> analog.  They are limited to a few megahertz of
> analog however while a few transistors specialized
> to do analog on two pins provided 40MSPS.
> 
> The dedicated analog hardware is remarkably simple
> and clean.  On a garden variety microcontroller
> a timer counts down and fires off an interrupt.
> The CPU then stops what it is doing and executes
> an ISR.  The ISR saves registers to memory then
> monkeys with A/D or D/A hardware.  Eventually
> it inputs or outputs a sample.  Then context is
> restored and the CPU goes back to doing what it
> was doing.  The result of all of that is that many
> many memory cycles are needed for each sample.
> On Chuck's design you can bit bang analog samples
> on digital pins with a little overhead.  With his
> analog coprocessor it has its own timer, its own
> DMA to memory and needs one memory cycle per sample.
> It is 100 times more efficient than conventional
> chips at doing what it does.
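
The comparison above can be sketched as a count of memory cycles per sample. This is only a rough model: the function name and the counts passed to it are placeholders chosen for illustration, not measurements of any real microcontroller or of F21. The one figure taken from the message is the single memory cycle per sample for the analog coprocessor.

```python
def isr_memory_cycles(saved_registers, isr_instruction_fetches, device_accesses):
    """Memory cycles per sample when a timer interrupt drives an ISR.

    Models the sequence described in the message: save registers,
    run the ISR body, touch the A/D or D/A hardware, restore registers.
    """
    save = saved_registers          # one write per register saved
    restore = saved_registers       # one read per register restored
    return save + isr_instruction_fetches + device_accesses + restore


# From the message: the analog coprocessor "needs one memory cycle per sample".
DMA_CYCLES_PER_SAMPLE = 1

# Placeholder counts (assumptions, not data).
isr_cycles = isr_memory_cycles(
    saved_registers=8,
    isr_instruction_fetches=30,
    device_accesses=2,
)
overhead_ratio = isr_cycles / DMA_CYCLES_PER_SAMPLE

print(f"ISR path: {isr_cycles} memory cycles per sample")
print(f"DMA path: {DMA_CYCLES_PER_SAMPLE} memory cycle per sample")
print(f"ratio:    {overhead_ratio:.0f}x")
```

With different placeholder counts the ratio changes, but the structure of the argument stays the same: every term in the ISR path is overhead that the coprocessor path does not pay.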
> 
> Still, I can't say how frustrated I was over all
> the years when people would write to me and say,
> "We are considering using F21, we currently use
> an 8051.  But our 8051 can do 20K analog samples
> per second and we are not convinced that the
> F21 could keep up with it."  (Couldn't they read the
> 20M or 40M numbers?  How could they miss that this
> chip does everything 1000x faster than an 8051?)
> 
> Understanding what the chip could do with a little
> programming required a little background that
> apparently a new generation of hardware and software
> engineers just didn't have.  They could read the
> numbers on the box of a soundcard that they plugged
> into a PC and that was about it.
> 
> The same thing applied to what you could do with
> a clever video coprocessor with instructions for
> windows acceleration, or a clever analog coprocessor
> that could be clocked for substantial oversampling,
> or a clever programmable active message passing
> piece of network hardware.  The ideas were apparently
> just too far ahead of their time to be understood
> without a very broad and in depth background. Some
> of the ideas were new or fresh out of research.
> 
> I can see how the people working with FPGA today
> could get these ideas while other people don't.
> They can see how specialized and optimized hardware
> will always be fastest but has no flexibility.  That
> programmable I/O coprocessors provide a compromise
> between hardware and programmability and that
> even simple hardware lets a clever programmer
> trade software for hardware.
> 
> Without the layers of generic abstraction a
> simple processor like the steamer can be quite
> powerful.  With a clever programmer who understands
> how to trade software for hardware it can be
> remarkably flexible.  I learned a lot from Bill
> Muench and John Rible about that stuff too.
> 
> Chuck's current designs have no coprocessors but
> are fast enough that a few parallel port pins
> can be programmed to perform almost any hardware
> functions imaginable.  With a bunch of Forth processors
> on chip some can be specialized I/O coprocessors
> in a given design.  Chuck's IDE driver in a few lines
> of code is a good example.
> 
> His current chips are fast enough to even bit bang
> analog at very high speeds on these pins.  I still
> doubt if people who don't have a certain background
> or experience really understand these critical concepts.
> It isn't like PC where everyone assumes certain specialized
> complex hardware that is then crippled by layers
> of abstractions in generic software layers designed
> to make them all look the same.
> 
> On Chuck's new website he explains this in a couple
> of sentences.  He says that these layers of abstraction
> that try to make all these different computers look
> just the same must lower them all to the lowest possible
> common denominator.  He says, accept that they are
> different.
> 
> I tried to explain this as the spirit of MachineForth,
> face the actual machine.  Don't think the User Illusion
> is the reality.  A computer is not Windows or Unix.
> 
> But the most visible people in the Forth community are
> marketing the abstraction idea, marketing the portable
> abstraction approach.  The rest of the community is
> buying it.  Neither those selling or buying want anyone
> to point out that the emperor is not wearing any clothes.
> 
> So if anyone tried to explain how the legacy of Forth
> was something else, or how Chuck's idea of Forth was
> something else a huge number of people would distort
> and trivialize any argument that you made or call
> you a liar, cultist, snake oil salesman etc. I found
> this situation maddening at times.  I am so pleased
> that Chuck created his own website. I can't tell you.
> 
> > Sure, you need at least a
> > free-running timer and parallel ports, and judicious analog
> > and digital hardware interface assists, but the minimalist
> > approach keeps you in control of the
> > procurement/availability/obsolescence issues and
> > allows you to make tradeoffs that are simply denied
> > by the menagerie of the silicon that is quasi-available
> > out there. I definitely think that Chuck's current concept
> > of an array of identical processors is superior
> > to the dedicated coprocessor approach taken in the past.
> > The faster the CPU is, the less reason to complicate the
> > design with dedicated hardware. If the CPU design is debugged,
> > then all of them in the array are as a natural consequence. In
> > a multiprocessor design, you can even do away with interrupts
> > by roadmapping which processor is responsible for what task
> > and the interprocessor communication architecture.
> 
> You have followed the history and understand the concepts.
> You put it well.
> 
> > Why bother involving a '765 floppy disk controller, 16550 UART,
> > USB controller, etc. when you can talk directly to the
> > interface cable buffers?
> 
> Right.  The only time you want dedicated hardware is when
> you want to push the performance to the bleeding edge.
> If you want multiple gigabit self-routing network
> coprocessors it only takes a little extra hardware
> like what I designed for F21 and that Chuck has included
> on the x25 chip. That could have been done with the same
> parallel port pins that are programmable and run off
> chip, but it gives you the extra inter-node bandwidth
> that you want in a multiprocessor.
> 
> > Software is easy to fix, hardware takes another design
> > iteration and fab run.
> 
> Very true.  The problem is that software for most people
> is only the very top layer of abstraction.  They want to
> be as isolated from the hardware as possible.  The
> result is predictable.
> 
> We are told that only recently have 500MHz 32 bit
> deeply pipelined processors running the most sophisticated
> native code optimizing Forth compilers (on top of
> you know what) been able to meet the performance levels
> set by 5MHz 8 bit simple processors running simple
> threaded Forths (written by Chuck) twenty years ago.
> Apparently some truth occasionally slips through the
> marketing hype.
> 
> Meanwhile Chuck came up with an incredibly simple
> native code optimizing compiler for Forth by designing
> hardware to make it incredibly simple.  The approach
> is even simple when applied to a Pentium.  And this
> was ten years ago, now he has removed the last
> remaining syntax in the language and many antiquated
> words with his ColorForth approach.
> 
> > Another good example of solving a serious system-level
> > issue is Chuck's idea of establishing pin locations such
> > that a 4 ns memory chip can be mounted on the opposite
> > side of the PCB with no more than 1 cm of trace
> > length pin to pin. The best solution to many problems is
> > to sidestep them in the first place by changing the rules
> > of the game. In this case, signal integrity is reconciled
> > with simple, low-cost PCB fab technology.
> 
> Yes, yes, yes.  After he figured out how to make the CPU
> and I/O coprocessors cost a few cents the costs of an
> extra layer in a PCB or an extra square inch of board
> space became significant.  So he concentrated on improving
> pinouts to make PCB cheaper and faster.  I am so pleased
> that someone has noticed and understands the ideas.  Too
> bad you are not one of the people with money. ;-)
> 
> > Maybe the best bet for funding is to approach universities
> > and research foundations, rather than bored and greedy venture
> > capitalists. Or maybe we should all buy lottery tickets and
> > pledge the winnings.
> 
> Perhaps.  I got bored with the Universities because it
> looked to me like they didn't want to teach anything that
> was not already so mainstream as to be obsolete.  They
> do have some good research in places however.
> 
> But stuff similar to the active message passing network
> hardware that came out of research at UC Berkeley or
> that went into F21 is yet to appear in mainstream technology.
> There is still a big gap.  From what I see they are teaching
> that computers are Windows or Unix to most people.  But maybe
> you are right that there is something there to use.
> 
> > I agree with Jeff that wooing the mainstream with me-tooisms
> > is a waste of time. A unique niche based on nya-nya minimalism
> > is the natural arena for Forth chips to shine in. The computer
> > industry as a whole is now decadent and needs a slap in the face.
> > The biggest pair of balls is way too high up in the clouds to
> > swing a decent kick at unfortunately, just ask the US
> > Dept. of Justice.
> 
> Don't get me started.  We have the best government that money
> can buy.
> 
> > Don't think of the MISC chips as being a PC-incompatible
> > replacement that needs a plethora of SVGA PCI card drivers,
> > etc.
> 
> Hear, hear!
> 
> > Think of them as an opportunity to expose the fraud
> > behind the shortsighted standard products that get
> > flogged to us this year and discontinued the next in
> > the guise of progress. And of course if you are a "nobody",
> > you'll never get the information necessary to write your
> > own driver. Even supposing you do, it will be necessary
> > to do it all over again when the card is discontinued. Far
> > better to put a stake in the ground and design your
> > own high-resolution color display subsystem with whatever
> > features and hardware/software tradeoffs you see fit,
> > for example.
> 
> You can get a lot of hate mail for writing things like
> that in the wrong place.  I hope the NOSC mail list
> is not the wrong place to say it.  I have learned
> over the years that it is very dangerous to point
> out to people the things that they want to hide
> from themselves or are just not yet ready to understand.
> 
> At first I had no concerns about threatening anyone
> with new ideas. I mean the world is big, there are
> countless billions being made, we are just a few
> guys with a garage type business.  Not a global
> threat!
> 
> Many people saw us as some big threat to the industry.
> And few people knew about real threats except for
> some science fiction writers.
> 
> > I hate to see quibbling over which geometry results
> > in what mind-numbing MIPS figure. I take it for granted
> > that sophisticated designs are subject to certain
> > statistics as the fat lady sings. I just want to hear
> > her perform, even on an off day.
> 
> I apologize for any quibbling and for exploding sometimes
> when my short fuses get lit.  I do overreact sometimes
> after being accused so many times of being a lying mindless
> sycophant who worships Chuck Moore, thinks he is God, and
> is attempting to form a cult.  I have also lost almost
> all patience at being told what "I" should do even when
> it is well intentioned and not meant to be critical or
> insulting.
> 
> In retrospect I think that I am not very well suited
> to the computer industry where almost everyone lies
> all the time and has to to earn a living.
> 
> What is the difference between a car salesman and a
> computer salesman?  The car salesman knows when they
> are lying to you.
> 
> > It's reverse heresy to be conservative on these mailing lists for
> > heretics ;)
> 
> I am sure that there is some appeal to being a reverse
> heretic and defending the status quo.
> 
> I have just lost most of my patience when, in a mail list
> named MachineForth, which I assumed was for people to ask
> and answer questions about programming Chuck's chips,
> the subjects are lectures about how we will fail or
> how Chuck's chips are only good for doorstops or
> whatever. I thought the lists were for people who
> were interested in this stuff, not another place
> for people to insult us.
> 
> Well I plan to take a vacation for a while and will
> not be dealing with email, mail lists, usenet, etc.
> later in July and August.  So take any rumors that
> you read about my absence with a grain of salt.
> Maybe my level of acceptance or interest will go up
> if I cool down a bit.

7/09/01
Jeff Fox wrote:
> 
> Mark Sandford wrote:
> > This is true, I have also had a fair portion of career
> > spent in parallel processing and the systems are
> > generally designed to be 1000 workstations rather than
> > 10,000 efficient processors.
> 
> Oh, were it only a 10/1 ratio.  But the ratio is from
> 1000/1 to 10000/1.  So rather than compare 1 1000MHz
> workstation to 10 2400MHz processors based on
> transistor count, cost, or power consumption you
> have to use somewhere between 1000/1 and 10000/1.  So
> the equivalent of the 1000 workstation multiprocessor
> is a 1,000,000 or 10,000,000 node MISC design.  I
> think it is really hard for people to really picture a
> 24 billion MIPS computer very clearly.
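
The arithmetic behind those figures can be checked with a short sketch. The per-node rate of 2400 MIPS is an inference, not something the message states outright: it assumes one instruction per clock at the quoted 2400 MHz, which is what makes the 24 billion figure come out. The names below are illustrative.

```python
WORKSTATIONS = 1000                      # the "1000 workstation multiprocessor"
NODES_PER_WORKSTATION = (1_000, 10_000)  # the 1000/1 to 10000/1 ratio
MIPS_PER_NODE = 2400                     # assumption: 1 instruction per clock at 2400 MHz


def misc_equivalent(workstations, ratio, mips_per_node):
    """Return (node count, aggregate peak MIPS) for a MISC design that
    replaces each workstation with `ratio` small processors."""
    nodes = workstations * ratio
    return nodes, nodes * mips_per_node


results = [
    misc_equivalent(WORKSTATIONS, ratio, MIPS_PER_NODE)
    for ratio in NODES_PER_WORKSTATION
]

for nodes, mips in results:
    print(f"{nodes:>12,} nodes -> {mips:>16,} peak MIPS")
```

Under that assumption the upper end of the range, 10,000,000 nodes, gives 24,000,000,000 peak MIPS, matching the "24 billion" in the message.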
> 
> But the whole idea is only possible with Chuck's
> ideas on programming which came first.  Without
> those ideas you need 1000x more of everything to do
> trivial things like a printf.  Printf exposes
> the antiquated and poor idea that everything
> is a file.
> 
> > They have a full OS that
> > takes multiple megabytes to handle communications, a
> > big portion of which is getting "printf" statements
> > from this processor out to a control console.
> 
> Exactly.
> 
> Remember that Chuck's interest in hardware began
> because he felt that that was the remaining problem.
> He had invented Forth and mastered it which allowed
> him to write smaller, more efficient and faster
> solutions and be more productive.
> 
> Given that as he put it, "the software problem was
> solved" the remaining problems mostly had to do
> with the hardware causing most of the problems.
> The hardware had started to fight us at every
> turn.  What other people wanted to put into the
> hardware to support the overhead they put into
> their software made doing Forth many times more
> complicated than it needed to be.  Thus was born
> the idea of hardware tuned to do Forth.
> 
> > I spent
> > a large portion of my career using the Inmos
> > Transputer, now defunct as it didn't fit people's ideas
> > of what a processor should be but it followed many of
> > the MISC concepts.
> 
> And the Transputer was a strong influence on my
> ideas of parallel hardware, parallel programming,
> and what sort of custom CPU I would want.  I brought
> the idea to Chuck eleven years ago.  But the people
> in parallel processing considered Unix a joke and
> wanted big industrial strength operating systems.
> So they really didn't get the idea of Forth or
> small processors at all.  The Forth people didn't
> get the idea of parallelism at all.  A dozen years
> later a few people are beginning to get the ideas.
> 
> > Interested parties should scour
> > the web for information as it is a very powerful and
> > good model of what a minimalist processor should look
> > like.  It used byte codes and had a 3 element stack
> 
> Yes, but not dual stack, not Forth, not tuned for a
> trivially simple second generation optimizing native
> code Forth compiler or tuned for instructions smaller
> than bytes.  It was tuned for Occam.
> 
> P21 is about as minimal as you can get for Forth.
> Forth has two stacks, it is hard to put that into
> three cells.  Once you use those three cells as
> stack pointers to memory it looks like a clumsy
> version of a conventional processor.
> 
> > architecture, it was designed to be programmed in a
> > high level language called OCCAM for which most
> > operations turned straight into byte codes so even
> > though it looked like a high level language (like C or
> > Pascal) it ran like assembly.  It also had 4
> > communications links that could tie to 4 other
> > processors and create a "computing surface".  It is
> > very low in transistor count and even contained a
> > process scheduler in hardware so that no OS was ever
> > required.
> 
> Yes, of course.  I knew it well.  But because of where
> it came from it also needed special support chips that
> didn't quite fit in with commodity parts.  So engineers
> had problems with it.  And people who were not already
> using Occam didn't want a new language.  I had thought
> a dozen years ago that there was already an established
> community of people happy with programming in Forth
> and didn't anticipate how they would change to something
> entirely different, ANS Forth, over that fifteen year
> period.
> 
> Occam and the transputer were tightly coupled.  I don't
> know which came first.  A transputer could run other
> things and Occam could run on other things but then
> you lost the synergy.  We wanted that kind of synergy
> but for Forth.  Particularly the exemplary style
> of Forth practiced by Mr. Moore.
> 
> > It fit with many if not all the MISC
> > concepts and the code was sufficiently small that in
> > many cases the processor could do many significant
> > functions completely from its 4Kbytes of on-chip
> > memory.
> 
> Yes, many but not all.  Occam isn't Forth.  But
> Forth can include the features of Occam in a few
> lines of Forth code.
> 
> > Many people misused it and added OS's and
> > large amounts of external ram but you could build very
> > powerful systems efficiently if you took simplicity as
> > a design goal.  There is much to be learned from this
> > processor and hopefully the 25x will use some of these
> > concepts.
> 
> The research that led to 25x was built on top of the
> Occam and Linda concepts and transputer legacy that I
> brought to the table mixed with Chuck's MISC and Forth
> ideas.  That is ancient history now.
> 
> > that bandwidth issues often dominate and limit
> > processing power so make sure that your processing
> > power has sufficient bandwidth support, so it doesn't
> > spend all its time waiting for data.
> 
> Of course.  We looked at workstation farms in the
> old days and the communication between nodes was
> most often the bottleneck because it was built on
> top of hardware designed to compete with slow disk
> drives, and on top of layers of OS software designed
> to support the file paradigm.  That is why Chuck
> designed gigabit self-routing network interfaces
> that required zero memory bandwidth most of the
> time as on F21 or the multiple gigabit serial
> links on one P32 to support fiberchannel and
> network routing and protocol translation on $1
> chips instead of $10,000 boxes.  We did extensive
> simulations for years on the variations of the
> instruction set, memory space, and interconnection
> options.
> 
> Not all things are best for all problems.  We
> didn't want a universal general purpose solution
> that only operated at very low efficiency most of
> the time, wasting 99.99% of its power most of the
> time. So for real code, and real problems that
> interested us we picked some designs and tuned things.
> 
> Many people have complained that it isn't the
> way they would tune their own design.  Fine.
> Please, do your own design.  Do the research,
> tune it for what you want.  Not everyone wants
> the same thing.
> 
> There is no such thing as a general purpose solution
> that gets maximum efficiency on everything.  There
> are many general purpose designs, or at least that
> was the goal of many designs.  Our goal was to get
> 1000x better efficiency by narrowing the focus.
> 
> If it isn't your focus then tune your design
> differently.  Chuck is in the custom silicon business.
> Bring your ideas and funding and do your thing.  It
> is easier to be critical of other people's focus
> than to have one of your own or to expose it.
> 
> > Maybe my comments
> > are related to my current work which is developing a
> > chip for voice processing that has 9 DSPs running at a
> > relatively high speed (10% of Chuck's 25x speed) and
> > we are bandwidth not MIPS limited so I put this
> > forward and caution that many people underestimate
> > their bandwidth needs and get burned in the long run.
> 
> Yes.  And few people invented their own language and
> worked exclusively with it for twenty or thirty years
> before building hardware based on their understanding
> of how the language worked.  You didn't get to tune
> the instruction set on your chips, so you could not
> shrink the code by a factor of 100 and reduce your
> memory bandwidth needs.  Maybe you didn't spend
> a few years simulating the efficiency of different
> design options to create a target architecture
> tuned to your problem.  If you used off the shelf
> DSP then you had the problem that led Chuck to
> want to design hardware in the first place, that
> if someone else makes the hardware design decisions
> they are not likely a close match to your software
> plans.  This leads to extra layers of software,
> more work programming, reduced hardware and
> software and programmer efficiency.  And it also
> leads to increased memory bandwidth requirements.
> 
> I like to say that at the software level any program
> can be represented by one bit on the right hardware
> or zero bits if the hardware only runs one program.
> Custom hardware people, or programmable hardware
> people often take solutions described in software
> and compile them directly to logic gates.  There
> is really no hard line between hardware and software
> in that sense.
> 
> But most programmers use off the shelf hardware
> where someone else made the hardware decisions and
> they have to add software on top to fill in the gap.
> Most programmers use off the shelf OS and compilers
> where someone else made the software decisions and
> they have to add software on top to fill in the gap.
> So there are extra layers of hardware that are not
> only not needed but get in the way and extra layers
> of OS software that are not only not needed but get
> in the way and more extra layers of software that
> _are_ needed to get around those other extra layers of
> hardware and software.  Bloat becomes inevitable and
> things like memory bandwidth requirements go up and
> performance and programmer efficiency go down.
> 
> The effects of the synergy of highly tuned hardware
> and software, and of Forth hardware and software, are
> very difficult for most programmers to grasp because
> to them software is all about those extra layers that
> they deal with for a living.  They just have no
> experience with highly tuned efficient hardware and
> software and how the line between hardware and
> software isn't hard at all in such systems.  They
> have a hard time stepping out of the world where
> they live where hardware is hard, and megabytes
> of software is hard, and only some software is
> slightly softer.
> 
> In our world hardware is soft and the synergy between
> highly tuned custom hardware and highly tuned custom
> software changes the whole picture completely.
> 
> > You can never have too many friends, money or
> > bandwidth.
> 
> I don't know about that.  I know a lot of people who
> have way too much money for their own good or anyone
> else's.  I don't think it really helps them. It just
> makes them lazy, greedy, arrogant, and mean spirited.
> They worry about getting more money not being a better
> person or helping anyone else or doing good with their
> life.
> 
> Even with friends, quality is everything and quantity
> accounts for little.  Evil people can find plenty of
> other evil people to be their friends.  Much better
> to find one good person to be your friend.
> 
> I think there is a sense of satisfaction that comes
> from doing more with less.  Being able to meet your
> goals without squandering resources and making other
> people slave or starve or breathe excessive levels of
> your smoke. But we live in a culture that promotes the
> idea of conspicuous consumption and wretched excess as the
> ideal and fosters the illusion that more is always better.
> 
> Many people, even in the Forth community, say they
> want the "ideal" computer and describe it as an
> infinite register machine. I laugh and say that is
> exactly the problem.  They want infinite waste.
> 
> I think infinite registers means infinite addressing
> width, and infinite registers times infinite width
> is infinite hardware which will require infinite
> cost, infinitely large programs and infinite
> time to even decode one instruction.  The idea is
> simply impossible, but if everything in the universe
> was converted into the best approximation of their
> infinite waste machine they could meet their goal
> of selfishly and stupidly destroying the universe
> to make one infinitely useless and infinitely slow
> computer for them.
> 
> I always found Math and Physics to have a form
> of beauty that rivaled any painting or symphony.
> The beauty of simple yet powerful ideas.  I liked
> the elegance and beauty of the shortest and simplest
> solution, not the ugliness of the biggest or
> most complex solution to a problem. This is why I
> was attracted to the best example of elegant beauty
> that I have found in the world of computing, Chuck
> Moore's ideas about Forth.
> 
> I know they are not the ultimate ideas, but computers
> are still a very young idea.  They are just the best
> ideas I have found in that field.

Mark Sandford wrote:
> 
> --- Jeff Fox  wrote:
> > Mark Sandford wrote:
> > > Agreed, but a chip (processor farm), that can't do a
> > > significant/interesting demo, isn't much of a
> > > technology demonstration.
> >
> > Can't?  I am curious why you say that.
> >
> > But from what I have seen the demos that people want
> > to see are usually moronic and have nothing to do
> > with what chips are good for.
> >
> > Compression and decompression of data streams in
> > realtime is pretty much an open ended problem,
> > things like protein folding, gene sorting, simulations
> > and problem modeling, AI, and a lot of other things
> > that need computing power are not the sort of things
> > the investors want to see.  They want to see a
> > dancing baby doing the latest popular dance.  Then
> > they don't pay for the demo and don't invest anyway.
> 
> Ok, let me rephrase.  Assuming the 25x is built what
> problems do you see it solving and is this specific
> configuration the best answer?  It appears to me that
> Chuck put forth the 25x because that's what fits in 7
> sq mm, which is the minimum size for MOSIS at 0.18;
> that's fine. The next question is, is that the best
> configuration or does the code to be run need more than
> 384 "words", if it fits then this is the right answer,
> if it doesn't you have the option of paging in code or
> increasing memory size if the code is only a little
> bigger, then increasing the on-chip memory is the right
> answer.  If it is a lot bigger then more on-chip is
> infeasible and paging will be required, it's fine either
> way as long as you understand the trade-offs.  Some of
> my concerns came with 25 processors feeding off a
> single memory chip, if the processors are constantly
> paging they don't get nearly the possible amount of
> work done.
> 
> As you correctly point out, most code is overly
> bloated and inefficient and the HW industry has
> accepted this bloat and turned to L1, L2 and L3 caches
> to overcome the poor programming.  You are also
> correct in pointing out that small efficient code
> requires significantly smaller bandwidth.  The
> bandwidth requirements come in two parts, the program
> (instruction) and data areas, efficient coding
> reduces instruction requirements but data is data and
> can be reduced by efficient design to some extent,
> but generally this will not change much.  With the 25
> processors and a single memory then, assuming each is
> doing similar work they will have similar requirements
> and thus get equal portions of the available
> bandwidth.  If the external memory is capable of 250
> Mwords/Sec, then each processor could use no more than
> 10 Mwords/sec.  For some applications this is fine for
> others this is not and the processors could be idle
> much of the time.  For applications like AI, this
> probably works out fine, for others the processors
> starve.
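
The bandwidth split described above can be sketched as follows. The 250 Mwords/sec and 25 processors come from the message; the per-processor demand figures and the simple stall model (a processor runs only as fast as its share of memory bandwidth allows) are assumptions made for illustration.

```python
def per_processor_share(total_mwords_per_sec, processors):
    """Equal split of external memory bandwidth among the processors."""
    return total_mwords_per_sec / processors


def busy_fraction(demand_mwords_per_sec, share_mwords_per_sec):
    """Fraction of time a processor does useful work, assuming it stalls
    whenever it wants more words than its share can deliver."""
    if demand_mwords_per_sec <= 0:
        return 1.0  # runs entirely from on-chip memory
    return min(1.0, share_mwords_per_sec / demand_mwords_per_sec)


share = per_processor_share(250, 25)  # figures from the message

# Hypothetical workloads: one that fits within its share, one that does not.
light = busy_fraction(5, share)
heavy = busy_fraction(40, share)

print(f"share per processor: {share} Mwords/sec")
print(f"light workload busy: {light:.0%}")
print(f"heavy workload busy: {heavy:.0%}")
```

A workload needing 5 Mwords/sec per processor runs unhindered, while one needing 40 Mwords/sec spends three quarters of its time waiting, which is the starvation case raised in the message.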
> 
> -- stuff deleted
> 
> > > What is described above is the classic problem, and
> > > one that has plagued the CPU industry for years.  This
> > > has become a main mantra of mine, a system isn't limited
> > > nearly as much by MIPS as by memory bandwidth, and
> >
> > Very true.  And by the programs being 100 times larger
> > than they need to be.  The overhead is built into the
> > systems to create the artificial problem that can
> > be improved in little steps for marketing purposes.
> > The easiest problems to solve are these sorts of
> > artificial problems, but they are what drives the
> > industry.
> 
> Agreed
> 
> -- stuff deleted
> > Instead of a single 1000Mhz processor with a huge
> > cache (that is dwarfed by the size of the software
> > overhead required) and a huge amount of memory, a
> > design optimized to carry the marketing-introduced
> > overhead, the same number of transistors can
> > be 1000x more efficient on problems that are
> > parallel.
> >
> > Almost all problems, certainly almost all
> > interesting
> > problems, are embarrassingly parallel.  The only
> > problems that are not are the ones we artificially
> > created for ourselves in our antiquated serial
> > computers with absurd computational overhead.
> >
> > Humans don't look like Pentiums, they have 2*10^11
> > processing nodes.  They don't run Unix or Windows.
> >
> 
> This is true, I have also had a fair portion of career
> spent in parallel processing and the systems are
> generally designed to be 1000 workstations rather than
> 10,000 efficient processors.  They have a full OS that
> takes multiple megabytes to handle communications, a
> big portion of which is getting "printf" statements
> from this processor out to a control console.  I spent
> a large portion of my career using the Inmos
> Transputer, now defunct as it didn't fit people's ideas
> of what a processor should be but it followed many of
> the MISC concepts.  Interested parties should scour
> the web for information as it is a very powerful and
> good model of what a minimalist processor should look
> like.  It used byte codes and had a 3 element stack
> architecture, it was designed to be programmed in a
> high level language called OCCAM for which most
> operations turned straight into byte codes so even
> though it looked like a high level language (like C or
> Pascal) it ran like assembly.  It also had 4
> communications links that could tie to 4 other
> processors and create a "computing surface".  It is
> very low in transistor count and even contained a
> process scheduler in hardware so that no OS was ever
> required.  It fit with many if not all the MISC
> concepts and the code was sufficiently small that in
> many cases the processor could do many significant
> functions completely from its 4Kbytes of on-chip
> memory.  Many people misused it and added OS's and
> large amounts of external RAM but you could build very
> powerful systems efficiently if you took simplicity as
> a design goal.  There is much to be learned from this
> processor and hopefully the 25x will use some of these
> concepts.
> 
> > > 60,000 MIPS that can't be used is worthless,
> >
> > If it is considered useless it may never be made.
> > If people keep repeating that it is useless other
> > people will keep thinking it is useless.  If none
> > are ever made the only value will be the educational
> > value to the few people who study the good ideas
> > that are there.
> >
> > Some of the most brilliant people I have met love
> > the idea of cheap chips with millions of mips.  But
> > convincing people with money is a more difficult
> > problem.  Convincing most people seems to simply
> > be a matter of showing them that it has become
> > mainstream.  They equate good idea with mainstream
> 
> I'm not trying to say that this is a bad idea, just
> that bandwidth issues often dominate and limit
> processing power so make sure that your processing
> power has sufficient bandwidth support, so it doesn't
> spend all its time waiting for data.  Maybe my comments
> are related to my current work which is developing a
> chip for voice processing that has 9 DSPs running at a
> relatively high speed (10% of Chuck's 25x speed) and
> we are bandwidth not MIPS limited so I put this
> forward and caution that many people underestimate
> their bandwidth needs and get burned in the long run.
> You can never have too many friends, money or
> bandwidth.
> 
> Thanks - mark

7/20/01
List-Admin@chaossolutions.org wrote:
> 
> Hello.
> 
> There may be some disruption over the next 2-3 days, so if you get bounces
> etc. wait until after the weekend.
> 
> We have set up our own DNS servers but are relying on our provider to
> change their records and provide delegation.
> 
> It could go cleanly, but it's out of our hands and DNS can be problematic
> due to propagation etc.
> 
> Regards...Martin
> 

7/25/01
List-Admin@chaossolutions.org wrote:
> 
> Hello.
> 
> When this list was started in March, there was not much of a description of
> the scope of the list.  So here is an update.
> 
> The NOSC   (No Operand Set Computers)   mailing list is for discussions of
> the design of Forth CPU, No Operand Set Computers, ie. Zero Operand or
> Stack Machines.
> 
> Please do not stray off the narrow focus of this mailing list.
> 
> There are plenty of other sources for Forth
> information   http://www.forth.org   for example.
> 
> Have fun, ask relevant questions.  There is some further information on
> NOSC at :-
> 
> http://www.ultratechnology.com
> 
> May the Forth be with you...always
> 

7/28/01
Eric Laforest wrote:
> 
> Forgive the cross-post, but this is more germane to the NOSC list.
> Further replies should go there.
> 
> On Fri, Jul 27, 2001 at 05:49:17PM -0400, Jecel Assumpcao Jr thus spake:
> 
> >
> > But I noticed that the people for whom I am doing this project have
> > more ambitious plans for the future, so I suggested that we might do
> > better with a MISC in FPGA instead. A 15K gate Spartan 2 costs $7, so
> > even with an external Flash memory it is half the price of the
> > ATmega103 we now have. A quick test of the free Xilinx tools with Dr.
> > Ting's nice P16 VHDL design resulted in a 25K layout running at a
> > surprising 50 MHz for the slowest speed grade chip. I am sure that a
> > design that used RAM for the stacks instead of individual flip flops
> > would be much smaller.
> 
> With the help of a couple of people I have a preliminary version of
> a stack computer that is not unlike the F21 CPU core.
> 
> Summary:
> Machine Forth variant, Data/Address/Return stacks (16 deep each),
> 3 instructions/word, 32Kword codespace, 96Kw data space,
> ( both can be fetched/stored from/to)
> ...and it's 17 bits wide.
> (3 5-bit instructions or a 15-bit address for 2-bit JMP/JMP0/CALL)
> Memory and CPU run at same speed.
> (couldn't figure out how to do the F21-style pre-fetch)
> If last slot is not an I/O instruction, the next word
> is prefetched then, else in next 'dummy' cycle.
> All instructions and JMPs/CALLs/RETs are single-cycle.
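
One way to read the 17-bit word format in the summary above is sketched
below in Python. The post gives only the field sizes (three 5-bit
instruction slots, or a 15-bit address with a 2-bit JMP/JMP0/CALL
selector). The bit positions, the tag values, and the use of tag 0 to mean
"three instruction slots" are all assumptions made for illustration and
may not match the actual Verilog design.

```python
WORD_BITS = 17
PAYLOAD_BITS = 15
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1   # 0x7FFF, matches the 32Kword codespace

# Hypothetical tag assignments; the post does not give the real values.
TAG_OPS, TAG_JMP, TAG_JMP0, TAG_CALL = 0, 1, 2, 3
BRANCH_NAMES = {TAG_JMP: "JMP", TAG_JMP0: "JMP0", TAG_CALL: "CALL"}


def pack_ops(slot0: int, slot1: int, slot2: int) -> int:
    """Pack three 5-bit opcodes into one word (assumed layout: slot0 highest)."""
    for op in (slot0, slot1, slot2):
        if not 0 <= op < 32:
            raise ValueError("opcode must fit in 5 bits")
    return (TAG_OPS << PAYLOAD_BITS) | (slot0 << 10) | (slot1 << 5) | slot2


def pack_branch(tag: int, address: int) -> int:
    """Pack a JMP/JMP0/CALL tag with a 15-bit target address."""
    if tag not in BRANCH_NAMES:
        raise ValueError("tag must be TAG_JMP, TAG_JMP0 or TAG_CALL")
    if not 0 <= address <= PAYLOAD_MASK:
        raise ValueError("address must fit in 15 bits")
    return (tag << PAYLOAD_BITS) | address


def unpack(word: int):
    """Return ("ops", (s0, s1, s2)) or (branch_name, address)."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("word must fit in 17 bits")
    tag = word >> PAYLOAD_BITS
    payload = word & PAYLOAD_MASK
    if tag == TAG_OPS:
        return "ops", ((payload >> 10) & 31, (payload >> 5) & 31, payload & 31)
    return BRANCH_NAMES[tag], payload
```

Whatever the real layout is, the arithmetic is the same: 2 + 15 = 17 bits,
and a 15-bit address reaches exactly the 32Kword code space mentioned in
the summary.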
> 
> It's coded in Verilog.
> It synthesizes into ~15K gates (1100 LUTs or 1158 slices) on a
> Virtex 50-E (XCV50E-6, the slowest speed grade).
> Predicted speed is currently ~50MHz.
> 
> Things to be done:
> (feedback is welcome)
> 
> Add interrupts (how many? 4?)
> Add I/O lines (~37 enough? think of LCDs and IDE drives)
> Clean up code for speed and size...and embarrassing mistakes. :)
> Add serial EEPROM boot-loader (In-System-Programmable?)
> Add some simple form of UARTs for interfacing to user and to
> other identical systems.
> (The Virtex II FPGAs have built-in differential signalling, which
> would be a very nice feature for inter-system communications.)
> 
> The current idea is to mount the FPGA+SRAM+FPGA EEPROM+FORTH EEPROM
> onto a single little board with the IO/IRQ/serial lines brought out
> to a connector.  All peripheral interfacing is done through the
> IO/IRQ lines to eliminate many of the headaches of interfacing
> slow devices to a fast memory bus.
> (At 50+MHz, most peripherals can't be used easily)
> This goes with the idea that one board does only a few functions
> at most if not only one (like behaving like an LCD or IDE
> interface and controller) with only the bare minimum interfacing
> hardware needed.
> 
> A larger system would be composed of several of these boards, each
> running the core Forth and whatever code is needed to do its tasks.
> (One for the LCD, one for the IDE drive, one for ethernet,
> one for a few serial ports, one for sound, etc...)
> The user gets his/her own board to act as interface.
> All boards talk through some point-to-point or crossbar system.
> 
> Instead of designing one chip with intelligent co-processors like
> the F21, I want to have as many peripheral functions implemented
> in Forth on identical hardware.
> 
> This is just a quick overview of something I've been working on for
> a while.  I would like feedback from the list members as there
> is likely some really useful feature I've omitted or implemented
> in a non-optimal way.  I try to keep it KISS unless it saves
> on hardware and wiring or software speed/efficiency.
> 
> It's late and I must go sleep, hence please forgive the lack of
> coherency if any. :)
> 
> Eric LaForest

(archive in progress)
Page Created 07/29/01