In normal operation there will be layers of network support software. At a low level a processor must act as network master and control the flow, error correction, and housekeeping of the network. The network master will assign network SOM and EOM addresses on the network and control which units are allowed to write at a given time.
A low level software network operation will be to mark a section of memory for a DMA transfer, prepare the data, and queue up the request to send. When the master gives this unit permission to send the message, the CPU will set the serial/network coprocessor control bits to write rather than read, and initiate a serial transmission that will interrupt the CPU of both the sending and receiving units when it finishes.
In normal operation the serial/network coprocessor will be reading and echoing serial data, but it will not be making any memory access. When it sees its own SOM it begins writing from the serial port to memory. When it sees its own EOM it stops writing and interrupts the CPU. The CPU interrupt code will reset the serial/network interrupt bit, process the serial data for error control, and reset the serial/network coprocessor.
At a high level of network software control there will be the words RX( and G!. RX( remotely executes a word by queueing up its execution vector for a remote CPU interrupt somewhere on the network. G! queues up a write to a location in the global memory, the distributed shared memory. At the hardware level these operations are supported by CPU interrupts and DMA transfers across the network.
The serial/network coprocessor on the initial F21 prototypes will use the same clock as the analog and video coprocessors. A later version may use a separate pin for network clock input. There is an 11-bit counter that counts down the timing from the input clock. If a 14 MHz crystal is being used to generate NTSC video timing for the video processor, this counter would provide a range from 14 MHz down to 7 kHz for the timing of the bits on the serial/network coprocessor. The upper limit for the video clock on the initial F21 prototype is 20 MHz, and tests will be made to see how fast the network interface can be clocked on F21. The internals of the serial/network coprocessor limit it to about 1 Gbps, so the network clock signal and network i/o signal quality will determine the upper limit for serial/network operation.
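The range quoted above can be checked with a little arithmetic. The sketch below is only illustrative: it assumes the divisor runs from 1 to 2^11 and that the "14 MHz" crystal is the common 14.31818 MHz NTSC part. Neither detail is stated in the text, and the function name is hypothetical.

```python
COUNTER_BITS = 11  # width of the countdown counter, from the text


def bit_rate_range(clock_hz):
    """Return (fastest, slowest) bit rates for the countdown counter.

    Assumption: the divisor can be anything from 1 to 2**11. The exact
    mapping from the Cn count field to a divisor is not given in the text.
    """
    max_divisor = 2 ** COUNTER_BITS
    return clock_hz, clock_hz / max_divisor


# Assumed crystal: 14.31818 MHz (four times the NTSC colorburst frequency).
fast, slow = bit_rate_range(14_318_180)
print(f"{fast / 1e6:.2f} MHz down to {slow / 1e3:.2f} kHz")
```

Dividing by 2048 gives a slowest rate just under 7 kHz, which agrees with the figure in the text.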
Name     abbr  size  address_pattern  function
Buffer         20    (internal)       Buffer memory
Data           20    (internal)       Shift
Address        21    (internal)       Address memory
Config   Cn    20    1E8000           Specify configuration options
Match    SOM   20    1E4000           Start Of Message pattern
Match    EOM   20    1E2000           End Of Message pattern

Three of these registers (Cn, SOM and EOM) can be read or written by the CPU by simply accessing that location in memory. The Buffer and Data registers are internal only, and are not set or read by the CPU. The CPU will also set the Address register in the serial/network coprocessor by executing a routine shown later that causes the serial/network coprocessor to grab an address from the CPU and place it in this register.
Addressable registers have the following formats:
bit  19 . . . . . . . . . . . . . . . . . . 0
SOM   m m m m g g g g m m m m m m m m m m m m
EOM   m m m m m m m m m m m m m m m m m m m m
Cn    s o - - c c c c c c c c c c c b b a - x
      - 0 - - - - - - - - - - - - - - - - - -   power-up (off)

m     bit must match
gggg  needn't match if gggg = 0
s     1  send input onward       Normal operation
      0  sink input              Master unit echoes nothing
o     0  coprocessor off         (power-up)
      1  coprocessor on
x     0  transmit, read memory   CPU sets this to send to network
      1  receive, write memory   Normal operation, read from network
c-c   clock rate                 count down timing clock
bb    0  20 bits/word            only mode implemented on prototype
      1  16                      (not implemented)
      2  8                       (not implemented)
a     address from Stack Processor, used by CPU to set address register
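To make the Cn layout concrete, here is a minimal Python sketch that packs and unpacks the fields. The bit positions are read off the Cn row above (s = bit 19, o = bit 18, c-c = bits 15..5, bb = bits 4..3, a = bit 2, x = bit 0). The function and field names are hypothetical, not part of any F21 software.

```python
# Bit positions read off the Cn format row (bit 19 is the leftmost column).
S_BIT, O_BIT, A_BIT, X_BIT = 19, 18, 2, 0
CLOCK_SHIFT, CLOCK_MASK = 5, 0x7FF   # c-c field, bits 15..5 (11 bits)
WORD_SHIFT, WORD_MASK = 3, 0x3       # bb field, bits 4..3


def pack_cn(send_onward, on, clock, word_mode, latch_addr, receive):
    """Build a 20-bit Cn value from its fields (names are illustrative)."""
    if not 0 <= clock <= CLOCK_MASK:
        raise ValueError("clock count must fit in 11 bits")
    if not 0 <= word_mode <= WORD_MASK:
        raise ValueError("word mode must fit in 2 bits")
    return (
        (int(send_onward) << S_BIT)
        | (int(on) << O_BIT)
        | (clock << CLOCK_SHIFT)
        | (word_mode << WORD_SHIFT)
        | (int(latch_addr) << A_BIT)
        | (int(receive) << X_BIT)
    )


def unpack_cn(value):
    """Split a 20-bit Cn value back into its fields."""
    return {
        "send_onward": bool((value >> S_BIT) & 1),
        "on": bool((value >> O_BIT) & 1),
        "clock": (value >> CLOCK_SHIFT) & CLOCK_MASK,
        "word_mode": (value >> WORD_SHIFT) & WORD_MASK,
        "latch_addr": bool((value >> A_BIT) & 1),
        "receive": bool((value >> X_BIT) & 1),
    }


# Normal operation: coprocessor on, echoing onward, receiving, 20-bit words.
normal = pack_cn(True, True, 0, 0, False, True)
print(f"{normal:05X}")
```

With a clock count of zero, the normal-operation value comes out as C0001: bits 19, 18 and 0 set.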
Bits are shifted into the Data register every tick. If receiving, Data is transferred to Buffer once per word (the word length selected by bb, 20 ticks on the prototype) and written to memory. If transmitting, Buffer is read from memory and transferred to Data.
If transmitting, the output pin is connected to Data bit 0. If receiving onward (x is 1, s is 1), it's connected to bit 19. This passes the input through with a one-bit delay for re-clocking. If receiving sinking (x is 1, s is 0), it's alternately connected to bits 0 and 1 of EOM.
There must be sufficient transitions in serial input to maintain clock synchronization (TBD, probably every 5-10 bits). The counter is reset at every transition. This synchronizes independent clocks to within a tick (70 ns @ 14 MHz). Bits are right-shifted into bit 20 and out of bit 0.
Addresses do not increment beyond 10 bits. That is, 0FFFFF increments to 0FFC00. Thus network I/O cycles within a DRAM page.
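The wrap-around can be modelled in a couple of lines. This is a sketch of the behaviour described above, not the hardware logic itself, and the function name is hypothetical.

```python
PAGE_BITS = 10                      # only the low 10 address bits count up
PAGE_MASK = (1 << PAGE_BITS) - 1    # 0x3FF


def next_net_address(addr):
    """Increment the low 10 bits of addr and leave the upper bits alone."""
    page = addr & ~PAGE_MASK
    offset = (addr + 1) & PAGE_MASK
    return page | offset


print(f"{next_net_address(0x0FFFFF):06X}")
```

The last address of a page wraps to the first address of the same page, so a transfer buffer is at most 1024 words long before it overwrites itself.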
Data is continuously compared with the Match registers. If receiving, when SOM is found, the length counter is reset (indicating a word align) and data starts being recorded. When EOM is found, the coprocessor is halted (Cn18 reset) and the Stack processor interrupted. EOM terminates both received and transmitted messages.
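The receive sequence can be sketched as a small bit-level simulation. Several details here are assumptions rather than documented behaviour: words travel bit 0 first, new bits enter a 20-bit Data register at bit 19, SOM is compared for exact equality (the group/unit wildcard is ignored), EOM is only checked once recording has started, and the EOM word itself is not written to memory. All names are hypothetical.

```python
WORD_BITS = 20


def word_to_bits(word):
    """Serialize a 20-bit word, bit 0 first, as it would leave Data."""
    return [(word >> i) & 1 for i in range(WORD_BITS)]


def receive(bits, som, eom):
    """Return (words written to memory, True if EOM halted the receiver)."""
    data = 0            # Data shift register
    recording = False   # set once SOM has been seen
    ticks = 0           # length counter, reset at SOM (word align)
    memory = []
    for bit in bits:
        # Right shift: new bit enters at the top, old bit 0 falls out.
        data = (data >> 1) | (bit << (WORD_BITS - 1))
        if not recording:
            if data == som:
                recording, ticks = True, 0
            continue
        ticks += 1
        if data == eom:
            return memory, True      # halt and interrupt the CPU
        if ticks == WORD_BITS:
            memory.append(data)      # Data -> Buffer -> memory
            ticks = 0
    return memory, False


SOM, EOM = 0x80000, 0x00000   # start-bit / stop-bits style patterns
stream = ([0] * 5 + word_to_bits(SOM) + word_to_bits(0x12345)
          + word_to_bits(0xABCDE) + word_to_bits(EOM))
words, halted = receive(stream, SOM, EOM)
print([f"{w:05X}" for w in words], halted)
```

In this run the first 1 bit after an idle line completes the SOM pattern, two payload words are stored, and the run of 20 zeros that follows halts the receiver. The payload was chosen so that no 20-bit window inside it equals EOM. A real message has to guarantee the same thing.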
SOM has 8 bits intended to be a port address. The 4 high-order bits specify a group, and the 4 low-order bits a unit. If the unit bits are 0, the data is recognized as broadcast to the entire group. Examples:
13800 - recognized only by unit 13800
70AAA - recognized by units 71AAA - 7FAAA
80000 - recognized by units 80000 - 8F000 (start bit)

SOM (and EOM) must be a pattern that will not occur within a message. Since high-speed transmissions must have sufficient transitions to maintain clock sync, an illegal string of 0s (or 1s) is one possibility. Better is a reserved pattern that preserves sync. If SOM is 80000 and EOM is 00000, the effect is of asynchronous start and stop bits, though there must be 20 stop bits between messages.
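The group and unit matching rule can be written as a short predicate. This is a sketch under the assumption that the four g bits (bits 15..12) are the unit field and that a zero unit field in the incoming pattern turns those four bits into don't-cares. The function name is hypothetical.

```python
WORD_MASK = 0xFFFFF   # 20-bit patterns
UNIT_MASK = 0x0F000   # the gggg field, bits 15..12


def som_matches(incoming, som_register):
    """True if an incoming pattern is recognized by a unit's SOM register."""
    must_match = WORD_MASK
    if (incoming & UNIT_MASK) == 0:
        # Unit field of zero: broadcast, so the g bits need not match.
        must_match &= ~UNIT_MASK
    return (incoming & must_match) == (som_register & must_match)


print(som_matches(0x13800, 0x13800))   # addressed to one unit
print(som_matches(0x70AAA, 0x7FAAA))   # broadcast to a group
print(som_matches(0x70AAA, 0x81AAA))   # different group
```

The three calls correspond to the examples above: a full address reaches exactly one unit, and a pattern with a zero unit field reaches every unit whose remaining bits agree.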
At power-up, the coprocessor is off and its registers are undefined. Setting sinking receive will allow a message that sets a unique port address. A future chip might power up with receive on and default SOM, EOM and Address. This would permit booting from the net.
The Stack Processor sets the rate and control bits in Cn. If Cn2 is one, the next address the SP provides will be incremented and latched into Address. Thus the SP must execute the code
    : SET-NET-ADDRESS ( a -- )
       17FFF p com a! nop   \ Cn to A
       @a 4 # -or nop       \ set get address bit
       !a a! @a drop        \ put address on bus
       ;                    \ for network coprocessor

with both !a and @a in the same word, to set Address. After the @a, Cn2 is automatically set to zero. The next network transfer will be at the word after the SP address.
The pattern 17FFF, when complemented, makes the pattern 1E8000, which is the address of the network configuration register Cn. The _a_ bit (Cn2, mask value 4) is changed to force the serial/network coprocessor to prepare to latch the next address used by the CPU. The CPU then sets the CPU _a_ register with the a! instruction and reads from that location in memory with the @a instruction. The !a and @a must be in the same word so that the serial/network coprocessor will properly latch the address intended for use as the network data transfer buffer.
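The address arithmetic in this routine is easy to verify. The sketch below assumes a 21-bit address width (the size of the Address register) and assumes that -or is exclusive-or, so that it flips the bit with value 4 (Cn2). The names are hypothetical.

```python
ADDRESS_MASK = 0x1FFFFF   # 21-bit addresses, as for the Address register
A_BIT_VALUE = 4           # Cn2, the "get address" bit


def complement21(value):
    """One's complement within 21 bits, like com on a 21-bit cell."""
    return ~value & ADDRESS_MASK


def request_address_latch(cn_value):
    """Flip Cn2, assuming -or is exclusive-or."""
    return cn_value ^ A_BIT_VALUE


cn_address = complement21(0x17FFF)
print(f"{cn_address:06X}")
print(f"{request_address_latch(0xC0001):05X}")
```

Complementing 17FFF yields 1E8000, the Cn address. Flipping Cn2 on a configuration value that has it clear sets the bit and leaves every other field alone.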
The serial/network coprocessor on the F21 is a hardware device designed to provide hardware support for Direct Memory Access transfers and remote CPU interrupts in an F21 multiprocessor. The unit is essentially a serial shifter with a DMA and CPU interrupt unit. The only instructions it recognizes are control bits in the serial/network coprocessor configuration register, and the Start of Message and End of Message tokens in the serial bit stream.
In normal operation the unit is continuously scanning the serial bit stream for the instructions to perform a download from the serial stream to memory, or to stop such a transfer and interrupt the CPU. It does not read any instructions from the memory bus to do this. It only makes use of the memory bus for DMA transfers. Thus there is normally no overhead involved in having it running. It will simply echo incoming data to its output. If it sees a SOM in the data stream that matches the one in its own SOM register it begins a DMA transfer, and if it sees its own EOM it stops the transfer and performs a CPU interrupt. If the CPU sets the transmit bit in the configuration register it will read from memory and write to the output of the serial/network coprocessor.
If a transfer is made into memory, it will require a certain number of memory write operations, but the serial/network coprocessor does not read instructions from memory, and if it is not performing a DMA to or from memory it will add no memory access overhead.
The F21 chip has two pins labeled Si and So; these are serial input and serial output. The serial/network coprocessor actually uses three pins, since it also shares the CLK signal with the video and analog coprocessors. The coprocessor processes instructions in the serial data stream and provides DMA and CPU interrupt functions in hardware.
The F21 serial/network coprocessor performs serialization, DMA, and clocking, and recognizes its instructions, with its own hardware and does not require the use of the CPU. The serial/network coprocessor can interrupt the CPU. This happens after the End of Message pattern turns off the DMA transfer. The interaction between the CPU and the serial/network coprocessor is programmable and determines what happens on software layers.
The serial/network coprocessor provides all the hardware needed to create a ring topology network. But the single input and output on the serial/network coprocessor does not limit the F21 network, or even the serial/network coprocessor configuration to a ring. With an amplifier the serial output of one unit can be fed into any number of inputs. With the addition of an OR gate any number of outputs could be fed into one input. A ring is one topology. There is only one input and one output on each F21, and it is not bi-directional.
Whatever topology is used, software is required for operation of a network. Among the operations in the network software are configuration and initialization of nodes, administration of the operating network, error detection and correction, and DMA and interrupt services. The network software will be layered to provide more complex protocols and services.
An F21 network is controlled by software and is not limited to the serial/network coprocessor. Any node can be a bridge to another ring of F21s or to an external network. The on-chip parallel port provides an easy way to expand an F21 network with bridges between multiple rings.
The performance of the serial/network coprocessor will be limited by the quality of the data and clock signals. With an 11-bit counter on the input CLK, a wide frequency range is available for a given clock. Chips are designed to connect the output of one chip to the input of another chip with wire, but any medium could be used to move the signal from one chip to another: amplified lines, optical fiber, IR, and so on. There will be a limit to the distance between units with a wire-only interconnect, and it will be related to speed. Lower speeds should work over longer unamplified wires.
Ultra Technology provides the technical information on the hardware operation of the serial/network coprocessor for those who want to program this unit. Ultra Technology will also provide software to use the serial/network coprocessors on F21 to support a network. Low level software routines will be part of the OS code in the boot ROM. The initial F21 must boot from slow 8-bit RAM/ROM memory, as the bits in the Cn serial/network coprocessor configuration register do not boot up with the network live and ready to do DMA and interrupts.
The first network software Ultra Technology will provide will support a ring. A master processor will assign addresses to each node and will arbitrate write control on the ring. This ring can operate in one of two modes.
The first mode is the general purpose networked Distributed Shared Memory mode. In this DSM mode there will usually be only one processor writing to the network at any time. The nodes are time sharing write access to the network. If the number of nodes is N and the network is being operated at B bits per second, then each processor will only be able to use B/N bandwidth. The network software provides collision avoidance, error detection and correction, atomic write access to DSM, and the ability to remotely process an execution vector. One feature that can increase the efficiency of operation in this mode is the group and unit SOM bit assignments. This feature permits a DMA transfer and CPU interrupt to be broadcast to a group of nodes in a single message.
The second mode of operation of the ring is pipeline mode. In this mode all of the odd or even nodes in the ring would send at the same time, but only to the next unit on the ring. For some applications it is more efficient to use the extra network bandwidth in this mode. Where data is piped between execution processes this mode can provide greater network throughput. The Occam-like Parallel Channel wordset can be implemented in the DSM mode, but should also be able to take advantage of pipeline mode network operation. In this mode of operation N/2 nodes each send at B bits per second at once, so the total network bandwidth becomes BN/2.
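The two bandwidth figures can be compared directly. The sketch below simply evaluates B/N and BN/2. The 10 Mbps rate and 8-node ring are made-up numbers for illustration, and the function names are hypothetical.

```python
def dsm_share(bits_per_second, nodes):
    """Per-node bandwidth when nodes take turns writing (DSM mode): B/N."""
    return bits_per_second / nodes


def pipeline_total(bits_per_second, nodes):
    """Total bandwidth when every other node sends at once: B*N/2."""
    return bits_per_second * nodes / 2


B, N = 10_000_000, 8   # hypothetical: 10 Mbps links, 8 nodes
print(dsm_share(B, N))       # bits per second available to each node
print(pipeline_total(B, N))  # bits per second moved by the whole ring
```

For these example numbers each node gets 1.25 Mbps in DSM mode, while the ring as a whole moves 40 Mbps in pipeline mode, at the cost of each sender reaching only its immediate neighbour.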
Various hardware and software control layers will be provided in network software. At the highest levels of parallel code the network details are not visible. The programmer or compiler can specify sections of code to run in parallel configured automatically at run time for dynamic load balancing.
But of course the details of the operation of the hardware are provided for those who wish to have total control over the run time hardware.