P21Forth 1.02 Assembler
The P21Forth system offers the programmer the ability to write executable routines in the ANS-compliant high level compiler, using colon and other high level defining words. Alternatively, P21Forth also offers the programmer a built-in Forth assembler for the MuP21. Since the assembly language of the MuP21 is based on Forth, it is easy to learn and use.
To define a new word in assembler, use the Forth word CODE. Like colon, CODE takes a name from the input stream as the name of the new word being defined. Words in assembler normally end with the next function, which passes control to the next Forth word.
The following sequence:
CODE MYCODE ( -- )
next
END-CODE
will define a new word in assembler called MYCODE. This shows how CODE, END-CODE, and next are used.
The MuP21 microprocessor has two small on-chip stacks in hardware. The data stack on MuP21 is 6 cells deep, and the return stack on MuP21 is 4 cells deep. There is also a register that is used for memory addressing, called the `A' register.
MuP21 accesses 20 bit wide cells of memory. These 20 bit wide cells can contain data or instructions. MuP21 only has 24 instructions, so these instructions may be represented with only 5 bits each. Thus a 20 bit cell in memory can contain up to four MuP21 instructions. It is thus possible for the CPU to execute these instructions up to four times faster than it can access memory. Assembler routines in MuP21 are normally written to show how the instructions are packed into words in memory for clarity.
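The packing described above can be modelled off-target. The following is a minimal Python sketch, not part of P21Forth. The four slot masks and the AAAAA pattern are copied from the ASM.FOX listing in this document. The function names, and the ordering of opcode bits within a slot, are assumptions inferred from the instruction patterns in that listing (for example COM, opcode 10 hex, is assembled from the pattern 6A82A).

```python
# Off-target model of how four 5-bit MuP21 opcodes share one 20-bit cell.
# SLOT_MASKS and SCRAMBLE come from the ASM.FOX listing (MASK table, P,).
# spread(), pattern() and pack() are hypothetical helper names.
SLOT_MASKS = (0xAA800, 0x55400, 0x0032A, 0x000D5)  # slots 0..3
SCRAMBLE = 0xAAAAA                                  # patterns are XORed with this


def spread(opcode, mask):
    """Put the 5 opcode bits, MSB first, on the set bits of mask (high to low)."""
    positions = [b for b in range(19, -1, -1) if (mask >> b) & 1]
    word = 0
    for i, pos in enumerate(positions):
        if (opcode >> (4 - i)) & 1:
            word |= 1 << pos
    return word


def pattern(opcode):
    """20-bit pattern holding the same opcode in all four slots."""
    plain = 0
    for mask in SLOT_MASKS:
        plain |= spread(opcode, mask)
    return plain ^ SCRAMBLE


def pack(opcodes):
    """Combine up to four opcodes into one cell, one slot each (like ,I)."""
    cell = 0
    for slot, opcode in enumerate(opcodes):
        cell |= pattern(opcode) & SLOT_MASKS[slot]
    return cell


if __name__ == "__main__":
    # A PUSH A! @A+ : the first cell of the DUP example
    print(hex(pack([0x19, 0x1C, 0x1D, 0x09])))
```

Under these assumptions pattern(0x1E) reproduces the NOP pattern 55956 from the listing, and packing four NOPs gives the same value, because the four masks together cover all 20 bits.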
Since P21Forth must support stacks larger than the hardware stacks provided on MuP21, P21Forth maintains its stacks in memory, as programs on more conventional processors do.
At the start and end of all words in P21Forth there are three registers on MuP21 which must be preserved. The `A' register will always hold the ``interpreter pointer'' (IP) in Forth. The top of the data stack register will always hold the ``data stack pointer'' (SP). And the top of the return stack register will always hold the ``return stack pointer'' (RP). These registers must contain these things at the boundaries of a word, although they are manipulated by the internals of assembler words in P21Forth.
The data and return stacks in memory in P21Forth are designed to grow upward. Each time an item is added to a stack the pointer to memory is incremented. Each time an item is removed from a stack the pointer to memory is decremented.
The P21Forth word DUP does two things. It duplicates the top item on the Forth data stack in memory, incrementing the data stack pointer in the process, and then advances to the next word in Forth. Here is a definition to do this:
\ CODE DUP ( n -- n n )
\ At the start of this word A=IP, T=SP, and R=RP.
\ ( n -- n n ) is a stack diagram showing an item being duplicated.
CODE DUP ( n -- n n ) \ create a new word in assembler called DUP
A PUSH A! @A+ \ these four instructions assemble one MuP21 memory cell
\ A PUSH gets the IP from the A register and stores it on the
\ on-chip return stack. Then A! moves SP into the A register.
\ @A+ fetches the top item from the P21Forth stack in memory,
\ places it on the top of the MuP21 hardware stack, and
\ increments the A register (SP).
!A A POP NOP \ the data stack pointer (SP) has been incremented, so the !A
\ instruction stores a copy of what was the top of the memory
\ stack into the new top of that stack. The A instruction then gets
\ a copy of the new data stack pointer and places it on the
\ top of the MuP21 hardware data stack where SP should be left.
\ POP gets the IP from the hardware return stack, and
\ then the A! instruction puts the IP back into the A register.
A! next \ go to the next Forth word
END-CODE \ end this assembler definition
Of course the comments are not needed to make this
definition work. This example is intended to show how there
are three registers that must hold certain things at the
start and end of each word written in assembler.
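The register bookkeeping in DUP can be checked by tracing it by hand or with a small model. The following Python sketch is an illustration only, written under stated assumptions: the hardware stacks are modelled as unbounded lists (the real ones hold 6 and 4 cells), memory is a dictionary, the final next is left out, and the name run_dup is hypothetical. Each line is labelled with the mnemonic from the instruction table that it imitates.

```python
# Trace of the DUP definition: on entry A=IP, T=SP, R=RP; the same must
# hold on exit, with SP one higher and the top memory item duplicated.
# run_dup is a hypothetical name; this is a model, not P21Forth code.
def run_dup(mem, ip, sp, rp):
    a = ip          # A register holds the interpreter pointer
    data = [sp]     # hardware data stack, top of stack is the last element
    ret = [rp]      # hardware return stack, top of stack is the last element

    # cell 1: A PUSH A! @A+
    data.append(a)           # A    copy IP to the data stack
    ret.append(data.pop())   # PUSH save IP on the return stack
    a = data.pop()           # A!   A now holds SP
    data.append(mem[a])      # @A+  fetch the top memory-stack item ...
    a += 1                   #      ... and increment A (the new SP)

    # cell 2: !A A POP NOP
    mem[a] = data.pop()      # !A   store the copy at the new top
    data.append(a)           # A    leave the new SP on the data stack
    data.append(ret.pop())   # POP  bring IP back from the return stack
    #                          NOP  fills the last slot

    # cell 3: A! (followed by next, not modelled here)
    a = data.pop()           # A!   A holds IP again
    return a, data, ret


if __name__ == "__main__":
    memory = {0x200: 42}
    print(run_dup(memory, ip=0x100, sp=0x200, rp=0x300), memory)
```

In this model A ends up holding IP again, the data stack holds only SP+1, the return stack holds only RP, and the memory cell at SP+1 holds a copy of the cell at SP.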
Code  Name      Function

Transfer Instructions
00    JUMP      Jump to 10 bit address in the lower 10
                bits of the current word. Must be the
                first or second instruction in a word
01    ;'        Subroutine return. (pop the address
                from the top of the return stack and
                jump to it)
02    T=0       Jump if T=0
03    C=0       Jump if carry is reset
04    CALL      Subroutine call. (push the address of
                the next location in memory to the
                return stack, and jump to the 10 bit
                address in the lower 10 bits of the
                current word.)
05    reserved
06    reserved
07    reserved

Memory Access Instructions
08    reserved
09    @A+       fetch a value from memory pointed to by
                the A register, place it on the top of
                the data stack, and increment A
0A    #         fetch the next cell from memory as a
                literal and place it on the top of the
                data stack
0B    @A        fetch a value from memory pointed to by
                the A register and place it on the top
                of the data stack
0C    reserved
0D    !A+       remove the item in the top of data stack
                and store it into memory pointed to by
                the A register, increment A
0E    reserved
0F    !A        remove the item in the top of data stack
                and store it into memory pointed to by
                the A register

ALU Instructions
10    COM       complement all 21 bits in T (top of data
                stack)
11    2*        shift T left 1 bit (the bottom bit
                becomes 0)
12    2/        shift T right 1 bit (the top two bits
                remain unchanged)
13    +*        Add the second item on the data stack to
                the top item without removing the second
                item, if the least significant bit of T
                is 1
14    XOR       remove the top two items from the data
                stack and replace them with the result
                of logically exclusive-oring them
                together
15    AND       remove the top two items from the data
                stack and replace them with the result
                of logically and-ing them together
16    reserved
17    +         remove the top two items from the data
                stack and replace them with the result
                of adding them together

Register Instructions
18    POP       move one item from the return stack to
                the data stack
19    A         copy the contents of the A register to
                the top of stack
1A    DUP       push a copy of the top of stack onto the
                data stack
1B    OVER      copy the second item on the data stack
                and make it the new top of the data
                stack
1C    PUSH      move one item from the data stack to the
                return stack
1D    A!        move the top of stack to the A register
1E    NOP       null operation (delay 10ns)
1F    DROP      discard the item on the top of the data
                stack
The P21Forth assembler provides structured flow control.
IF, ELSE, and THEN can be used just as they would in high
level Forth code. However it should be noted that the IF in
the assembler does not remove the flag from the data stack
as does the standard high level IF. Chuck has also
introduced a similar operation -IF. -IF compiles a C=0
instruction and therefore tests for carry. -IF will execute
the code that follows if carry is set, or it will jump to
the ELSE or THEN if carry is not set. BEGIN, WHILE, UNTIL,
and REPEAT are also supported in the assembler. Chuck has
also introduced the -UNTIL which compiles a C=0 and loops
until there is carry.
The next word assembles three opcodes that perform the advance to the next Forth word. This is known as the Forth inner interpreter. next assembles @A+ PUSH ; The @A+ fetches the next Forth word pointed to by the A register (IP) and increments the IP. Then the PUSH ; sequence pushes the address to the return stack and then `returns' to that address to execute the next word.
MuP21 is designed to match the hardware on the DRAM chips, which have 1K sized pages. Two addresses are on the same page if they have the same upper ten bits. Care should be taken to ensure that words written in assembler do not contain jumps or calls that are expected to go to a different page. A jump or call instruction carries only a 10 bit address, so it cannot reach a different page of memory directly. A sequence like PUSH ; is needed to jump to an off page location.
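The page rule is easy to check before committing a routine to a given address. The following Python sketch is only an illustration: the helper names are hypothetical, and it assumes that the page number is simply the address with its low ten bits removed.

```python
# Page test for MuP21 addresses: a page is 1K (2**10) cells.
# same_page() and needs_push_return() are hypothetical helper names.
PAGE_BITS = 10


def same_page(addr_a, addr_b):
    """True if both addresses agree in every bit above the low ten."""
    return (addr_a >> PAGE_BITS) == (addr_b >> PAGE_BITS)


def needs_push_return(source, target):
    """True if target is off page, so PUSH ; must be used instead of JUMP or CALL."""
    return not same_page(source, target)


if __name__ == "__main__":
    print(same_page(0x400, 0x7FF))          # both in the second 1K page
    print(needs_push_return(0x3FF, 0x400))  # crosses a page boundary
```

A jump from 3FF to 400 crosses a page boundary even though the two addresses are adjacent, which is the case most easily overlooked when a definition grows.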
There are several things to remember when coding math on the MuP21 microprocessor in assembler. Items read from memory are only 20 bits, but the CPU registers and math operations are 21 bits. The most significant bit is both the carry bit and a valid address bit to memory. If the most significant bit (carry) is set in an address used for a memory reference, then the SRAM memory will be addressed. Addresses 0-FFFFF are in DRAM, but addresses from 100000 up are in SRAM.
This means that if you load a 20 bit -1 (FFFFF) from memory and add it to 1 you will get 100000, which is not the same as 0. If you add 1 to a 21 bit -1 (1FFFFF) then the result will be 0 because carry will be reset. Since you cannot store a 21 bit number in memory directly, it is done by complementing the number with COM and then storing it into memory. When it is fetched, COM is used again to restore the lower 20 bits and to set the carry bit. Since -1 is often used to decrement numbers (MuP21 does not have any auto-decrement instructions), there is a faster way to generate a 21 bit -1 than to load a literal 0 and execute a COM instruction. The instruction sequence DUP DUP XOR COM generates a 21 bit -1 more quickly, but it also uses one extra location on the data stack.
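The arithmetic above can be reproduced with a few lines of Python. This is a sketch under simple assumptions: registers are modelled as 21 bit unsigned values, memory keeps only the low 20 bits, and the function names are hypothetical.

```python
# 21-bit register arithmetic versus 20-bit memory cells on MuP21.
# add21(), com() and to_memory() are hypothetical helper names.
MASK21 = 0x1FFFFF   # registers and ALU results are 21 bits wide
MASK20 = 0x0FFFFF   # a memory cell keeps only 20 bits


def add21(a, b):
    """The + instruction: the sum truncated to 21 bits."""
    return (a + b) & MASK21


def com(t):
    """The COM instruction: complement all 21 bits of T."""
    return t ^ MASK21


def to_memory(t):
    """Storing to memory drops the 21st (carry) bit."""
    return t & MASK20


if __name__ == "__main__":
    print(hex(add21(0xFFFFF, 1)))     # 20 bit -1 plus 1
    print(hex(add21(0x1FFFFF, 1)))    # 21 bit -1 plus 1
    stored = to_memory(com(0x1FFFFF)) # COM, then store
    print(hex(com(stored)))           # fetch, then COM again
    t = 0x12345
    print(hex(com(t ^ t)))            # DUP DUP XOR COM
```

The COM round trip works because the complement of a number with the carry bit set has the carry bit clear, so nothing is lost when it is stored in a 20 bit cell.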
The MuP21 uses a ripple carry mechanism on the + and +* instructions. The carry in the add will move upward through eight bits in the time of a single instruction. This means that the result of adding 1 to 1 is ready in one instruction time, as is the result of adding 127 to 127. But adding 1 to -1 would require carry to move through 20 bits in the process of the add, and this takes longer than one instruction time. To compensate for this, a NOP or two may be needed before the + or +* instruction. There will be no need for a NOP if the + or +* is the first instruction in a word of DRAM. The extra delay needed to fetch the word containing the + or +* in the first instruction from DRAM will ensure that there is sufficient time for a correct result from the addition.
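To get a feel for which additions are at risk, the length of the carry chain can be estimated off-target. The following Python sketch is a rough model and not timing data for the chip: it assumes a carry is generated where both operand bits are 1 and ripples through each following position where exactly one operand bit is 1. The function names are hypothetical, and the limit of eight bits is taken from the paragraph above.

```python
# Rough estimate of how far a carry ripples in a 21-bit add.
# longest_carry_ripple() and may_need_nop() are hypothetical names.
BITS_PER_INSTRUCTION = 8   # from the text: eight bits per instruction time


def longest_carry_ripple(a, b, bits=21):
    """Longest run of bit positions that one carry has to pass through."""
    longest = run = 0
    carry = False
    for i in range(bits):
        x = (a >> i) & 1
        y = (b >> i) & 1
        if x and y:                # a new carry is generated here
            carry, run = True, 0
        elif (x or y) and carry:   # the incoming carry ripples through
            run += 1
            longest = max(longest, run)
        else:                      # the carry stops (or there was none)
            carry, run = False, 0
    return longest


def may_need_nop(a, b):
    """True if the carry chain is longer than one instruction time allows."""
    return longest_carry_ripple(a, b) > BITS_PER_INSTRUCTION


if __name__ == "__main__":
    print(longest_carry_ripple(1, 1))
    print(longest_carry_ripple(127, 127))
    print(longest_carry_ripple(1, 0x1FFFFF))
```

In this model 1 plus 1 and 127 plus 127 involve no ripple at all, because every carry is generated locally, while 1 plus a 21 bit -1 ripples through 20 positions, matching the cases described in the text.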
The amount and nature of memory access will generally be the limiting factor in the speed of execution of MuP21 programs. DRAMs can access memory on the same page in about 55ns, but memory accesses to a different page will take 150ns. It is therefore very important to try to keep critical routines to one page of memory and, if possible, to let them manipulate data on the same page as the code. For this reason the default data and return stacks in P21Forth are on the same page of memory as the most frequently used words in the Forth kernel.
Chuck's code and Dr. Ting's code in the OK Operating System and the code in the OKAD application are very good examples of techniques to get the most speed from MuP21 assembler.
\ ASM.FOX Chuck Moore's 20 bit assembler for MuP21
\ modified for P21Forth Jeff Fox 10/6/94
HEX VOCABULARY ASM \ create the wordlist for the assembler
: ASSEMBLER
ALSO ASM ; \ ASSEMBLER adds ASM to the search order
ASSEMBLER DEFINITIONS
: END-CODE \ get out of assembler
PREVIOUS
DEFINITIONS ; \ and put definitions wherever that is
VARIABLE HI \ pointer to current slot
VARIABLE HW \ pointer to current word under assembly
: ALIGN ( -- ) \ 0 1 2 3 .. 4
4 HI ! ; \ force slot pointer to overflow
: ORG ( a -- ) \ ORG to an address
DUP . CR DP !
ALIGN ; \ DP is the eForth CODE POINTER H in OK
CREATE MASK ( -- a ) \ 4 masks for 4 slots scrambled bits
AA800 , 55400 ,
32A , D5 , \ 1 CELL per mask on MuP21
\ compile pattern
: P, ( n -- )
AAAAA XOR , ; \ Patterns must be xored AAAAA
: #, ( n -- ) \ compile number
, ; \ Numbers are normal on MuP21
: ,W ( mask -- ) \ OR the masked bits into the current word
HW @ @ OR HW @ ! ;
: ,I ( inst -- ) \ assemble instruction in one slot
HI @ \ check slot pointer in HI
4 AND \ overflow?
IF \ so align slot pointer
0 HI ! \ clear HI
DP @ HW ! \ point HW to current location of DP
0 , THEN \ move to next clear location
HI @ \ HI points to current slot 0-3
MASK + \ add offset to start of MASK table
@ AND \ AND in the mask bits
,W \ assemble instruction to current slot
1 HI +! ; \ bump slot pointer by 1 CELL
: INST ( n -- ) \ defining word
CREATE ,
DOES> @ ,I ; \ Chuck's CONSTANT DOES> is not ANSI
6A82A INST COM \ com com com com
55956 INST NOP \ nop nop nop nop
: JPI ( n -- ) \ assembler jump instruction
CREATE , \ Chuck's CONSTANT DOES> isn't ANSI
DOES> @ ( -- a )
BEGIN HI @ 2 AND \ skip slots 2 and 3
WHILE NOP \ by assembling NOPs
REPEAT \ then
,I \ assemble the branch instruction
3FF AND 3FF XOR
,W ALIGN ; \ assemble 10 bit address in slots 2 & 3
: BEGIN ( -- n ) \ start a loop structure , leave addr
BEGIN HI @ 4 AND \ check for word boundary
0= WHILE NOP \ assemble NOPs if needed
REPEAT DP @ ;
: # ( n -- ) \ assemble a literal
99BE6. ,I , ; \ assemble the # instruction, then the literal n
: -# ( n -- ) \ assemble a 21bit negative
FFFFF XOR # COM ; \ complement then assemble lit & add COM
: P ( n -- ) \ assemble a pattern as literal
AAAAA XOR # ;
: -P ( n -- ) \ assemble a complement pattern literal
55555 XOR # ;
AAAAA JPI JUMP A9AA6 JPI T=0
A96A5 JPI C=0 A6A9A JPI CALL
A9AA6 JPI UNTIL A96A5 JPI -UNTIL
: IF 3FF T=0 HW @ ;
: -IF 3FF C=0 HW @ ;
: SKIP 3FF JUMP HW @ ;
: THEN DUP >R >R
BEGIN 3FF AND 3FF XOR R> @ XOR R> ! ;
: ELSE SKIP SWAP THEN ;
: WHILE IF SWAP ;
: REPEAT JUMP THEN ;
9A7E9 INST @A+ 997E5 INST @A
967D9 INST !A+ 957D5 INST !A
6A429 INST 2* 69826 INST 2/
69425 INST +*
6681A INST XOR 66419 INST AND
65415 INST +
5A96A INST POP 5A569 INST A
59966 INST DUP 59565 INST OVER
5695A INST PUSH 56559 INST A!
55555 INST DROP
AA6A9 INST ;'
: next \ next macro in eForth assembler
@A+ PUSH ;' ALIGN ; \ compiles @A+ PUSH ; and ALIGNs
PREVIOUS DEFINITIONS
ALSO ASM \ CODE is a Forth word
: CODE ( -- )
HERE HEAD, REVEAL \ create header in eForth for HERE
HERE HW ! ALIGN \ start assembly at HERE
ASSEMBLER
DEFINITIONS ; \ any more definitions go into ASM
PREVIOUS