www.jupiter-ace.co.uk

Previous Page > Index of General Forth Information > Going Forth Again.


Computing ToDay May 83 - Going Forth Again


Computing ToDay May 83 - Going Forth Again page 45
Series: More on Forth
GOING FORTH

D S Peckett
GOING FORTH
AGAIN
With the advent of the first micro to have FORTH as its standard language we delve deeper into
this language.

Given R(i+ 1), we can then produce a number in the range 0 -(n-1), where (0 = n <65536), by:

Rn=INT(R(i+1)*n/65536)
(2)

and the more usual range of 1 - n inclusive by:

Rn=INT(R(i+1)*n/65536)+1
(3)

We are almost there, apart from one snag - FORTH normally treats numbers as 16-bit signed integers; ie they lie in the range - 32768 to + 32767. Although, as we will see, we can do intermediate unsigned arithmetic (number range 0 - 65535), the end result is always signed. Equation (1) will therefore produce negative random numbers half the time. Although we can get around this by:

R(i+1)=ABS(1509*R(i)+41) MOD 65536
(1a)

we end up with a random number in the range 0 - 32767. Equation (3) must therefore become:

Rn=INT(R(i+1)*n/32768)+1
(3a)

and n is limited to the range 0 - 32767.
My offering to do these jobs in FORTH is Screen 101 (Listing 1). This contains three main elements:


a. RSEED is defined as a variable, and is used to hold the last number generated. If you are writing in fig-FORTH (or in many versions of -79), the definition should be [0 VARIABLE RSEED ].

b. RANDOM generates the random number in the range 0 - 32767, saving the unsigned version (0 - 65535) in RSEED for future use. The number, which is R(i+ 1) in equation (la), is left at top-of-stack (TOS).


c. RND expects to read 'n' at TOS, and outputs a random number in the range 1 - n at TOS.

Both RANDOM and RND use the word [U * ], which is a standard FORTH word. to perform the
A
t the beginning of last year, I wrote a short series for Computing Today which introduced the language FORTH. At that time, the language had not received a great deal of exposure in this country but, over the last 12 months, we have seen an explosion of interest in it. FORTH packages are now freely available for virtually all micros and there is even one machine, the Jupiter Ace, which has it as its standard language. I like to think that my earlier articles had something to do with the boom, but I'm sure that they didn't.
The time has therefore come to delve a little deeper into this unusual language, and this is the first of two articles intended to do just that. This month, I will investigate how to add features which are missing from most FORTHs, such as random number generators and arrays. The article will also take in the definition of new compiling words and generally probe fairly deeply into some of the language's wierder features. Let me state right now, therefore, that this article is NOT for the FORTH tyro - it assumes a certain familiarity with the language. If you don't have that, may I recommend that you read my articles in the Jan-Apr 82 issues of Computing Today; I'm sure that the back numbers department would be pleased to help you.
You may well know that two FORTH 'standards' exist - in this article, all my programming examples will be written in FORTH-79, as implemented by, for example, Acornsoft's BBC FORTH, and on the Jupiter Ace. I will highlight, where necessary, differences between this and fig-FORTH, which appears on such micros as the VIC-20, the ATOM and the Atari 800.
Remember, though, that since FORTH is a re-definable language, it is relatively easy to make fig-look like -79, and vice-versa. Finally, a convention. Since a FORTH word can be almost any combination of characters (other than spaces), there may be times when I need to make them stand out from the text. If so, I will enclose them in square brackets - The brackets are not part of the word.

RANDOM NUMBERS

A curious omission from virtually all FORTH systems is a random number generator. This looks doubly odd when one considers that FORTH's speed makes it ideally suited to applications which demand a lot of random numbers, such as games and Monte-Carlo modelling. However, it is quite easy to extend the language to include random numbers.
Before we can do that, though, how do we generate them in the first place? There are many ways, but one of the simplest, which is good enough for most purposes although it might not please a statistician, uses the equation:

R(i+1)=(1509*R(i)+41) MOD 55536
(1)

This generates a pseudo-random number (R(i+ 1)), in the range 0-65535, using the previous number (R(i)) as its starting point.
0: ( Create a Forth random number generator Screen 101)
1:
2: VARIABLE RSEED ( Used to hold seed /last number)
4: ( Generate a random number from 0->32767 )
5: : RANDOM RSEED @ 1509 U* ( DP unsigned multiply)
6: 41. D+ ( DP add)
7: DROP ( Apply unsigned MOD 65536)
8: DUP RSEED ! ABS ; ( Save it, and leave final)
9:
10: ( Generate a random number in range 1->n. Syntax: "n RND" )
11: : RND RANDOM U* 32768 U/MOD SWAP DROP 1+ ;
12:
13:
14:
15: 
Listing 1. Creating a FORTH random number generator.

Computing ToDay May 83 - Going Forth Again page 46
Series: More on Forth
unsigned multiplication of the two unsigned numbers at TOS, leaving a DP (double-precision -four bytes, with the most significant bytes nearest TOS) unsigned answer. RANDOM's [DROP]at line 7 removes the two high bytes of the DP number, effectively performing 'MOD 65536'. The [41. D+] at line 6 performs a DP addition of two DP numbers -putting a decimal point anywhere in a number going onto the stack automatically treats it as a DP integer.
The [U/MOD] in RND is a FORTH-79 word which takes an unsigned DP number at second on the stack (20S) and divides it by the unsigned 16-bit number at TOS. The remainder (ie the modulus) is output at 20S and the quotient at TOS. The fig-FORTH word [U/] has an identical action.
The only other thing we need is a starting value for R(0) - if it always has the same value (eg 0), the same sequence of 'random' numbers will occur. Occasionally, this may be what you need but, more normally, it will be a confounded nuisance. We therefore need to emulate BASIC's 'RANDOMIZE'.
If your computer is Z-80-based, the micro's 'refresh register' (R) can be used to provide a sort of eight-bit random number. The standard assembly code for the job would be:

RANDOMISE LD A,R
LD L,A
LD H,0
LD (RSEED),HL
RET

However, the detail of implementing this will depend on your FORTH system's assembler. The easiest way would be to get the contents of R on TOS, and then move this to RSEED by, for example:

CODE RI R A LD 0 H LD A L LD PSH
: RANDOMISE RI RSEED ! ;

Not all FORTHs incorporate an assembler, however, but we can get around the problem with the [CREATE] defining word. When used in the [CREATE name] form, it makes a dictionary entry for the word [name ], with no specific action associated with it. CREATE can be used to set up a word, and the [, ] (and, in fig, the 'character' equivalent [C, ]) words used to POKE hand-assembled code into the dictionary:

HEX
CREATE RI 47ED , 26 , C36F , <nnnn> ,
DECIMAL

In this, <nnnn> is the address of the FORTH operating system routine which pushes HL onto the
stack and returns control to the interpreter.
Things are not quite so easy in a 6502-based system, because that chip does not have anything like the Z-80's R-register. Sometimes, though, there will be a real-time clock in the computer, and you can read the lowest bytes of this as a source of one-off random numbers:

: RANDOMISE clock @ RSEED ! ;

Failing everything else, how about:

: RANDOMISE CR ." PRESS ANY 2 KEYS"
KEY KEY * RSEED ! CR ;

EXTENDING FORTH

In my earlier articles, and so far in this one, we have only skimmed the surface of what FORTH can and cannot do. Let us now have a look at the somewhat esoteric subject of defining words. A FORTH system can be thought of as working on four levels:

a. Level 1. Using existing words to do their job; for example:

17 DUP DUP * * .

prints the cube of 17, but adds nothing to the language and is forgotten as soon as it is executed. b. Level 2. Using the standard defining words to add to the system. For instance:

: PRINTCUBE DUP DUP * * . ;

adds the new word PRINTCUBE to the system. In this case, the defining words(s) is [: ...; ], but other common defining words are VARIABLE, CONSTANT and CREATE. Level 2 is the normally-used level in most applications and lies at the very heart of the FORTH concept.

c. Level 3. Adding new defining words. It is possible to design new defining words which can be used at Level 2 to create whole new families of FORTH words which may, in turn, be used directly at Levels 1 and 2.

d. Level 4. Using FORTH itself to create totally new, but normally FORTH-like, languages. This is termed metaFORTH and is beyond the understanding of anyone but academics and 12 year old schoolboys.

For the rest of this article I would like to concentrate on Level 3 operations but, first, a look at what is going on during Level 1
and 2 operations.
FORTH is unusual (surely not?!) in that is both a compiled and an interpreted language. Level 1 operations show it working in its interpreted mode - a word is read, identified and then acted upon immediately. This job is done by the system's 'outer interpreter', which identifies the word, and the 'inner interpreter', which actually executes it. Before any of this is possible, however, the word must be compiled (Level 2) into a form which the inner interpreter can handle. The compilation process generates the nested pointers to successively simpler routines which make up the 'indirect threaded code' of the normal FORTH dictionary.
That compilation is the job of FORTH's defining words. Although it is easy to think of the language as having a single compiler (like COBOL, Pascal, et al), it actually has a whole series of microcompilers, each associated with specific defining words. When you use [:...; ], a totally different compiler from the one which VARIABLE, say, uses is called up.
These microcompilers are themselves written in FORTH, and the act of creating them is a Level 3 operation. The usual way of forming a new defining word in FORTH-79 is the [CREATE .. . DOES> ... ] structure while, in fig-FORTH, [<BUILDS DOSE>...].is effectively identical and may be used in all the following examples. You should also note that some FORTH-79 systems provide [ <BUILDS...DOES> ...]. For the real FORTH experts who may read this, I know that fig and -79 systems handle [DOES> ] differently, but this is invisible unless you are probing the darkest depths of the dictionary.

JOB OF
[CREATE . . . DOES> . .]

This structure allows us to define whole groups of new FORTH words which behave in an identical manner to each other, and differently from any other word. If we only want to create one word with the new behaviour, then [:...; ] will do the job but, for a whole family, use a new defining word.
As an example, it would not be difficult to set up a variable by way of a colon definition - however, it is much neater to use VARIABLE, saving space and, probably, making the program easier to follow. A FORTH variable is just a special type of word 

Computing ToDay May 83 - Going Forth Again page 47
GOING FORTH
special type of word which behaves in a specific way (ie it has two bytes to receive data and, when used, puts its address at TOS); since it is a special type, it gets its own defining word.
The [CREATE ...DOES> ... ] pair is used as below:

: name CREATE compile-time action
DOES> run-time action ;

That colon definition will produce the defining word [name]. Whenever [name] is used to define [word], the compile-time action takes place to set up the dictionary entry. Later, when [word] is used, the run-time action part of the definition occurs.
An example would probably be a good idea at this point. The FORTH-79 defining word VARIABLE is used as [VARIABLE vname] to create a 2-byte entry called [vname] which, on being used, puts its address at TOS. VARIABLE could be defined by:

: VARIABLE CREATE 2 ALLOT DOES> ;

The compile-time action is to reserve two bytes in the directory, and there is no run-time action in this case. Generally, the very act of using any word puts its address at TOS; the word's run-time action then manipulates that address. In the case of variables, all we want is their address at TOS.
The corresponding fig-FORTH definition (and the one in many -79 implementations) uses VARIABLE as [n VARIABLE vword] to initialize vword to the value 'n'. This has the definition (fig, remember):

: VARIABLE <BUILDS , DOES> ;

where the [,] puts 'n' into the dictionary.
Let's now think of a new sort of variable - one which puts its value at 20S and its address at TOS, whenever used. It will be set up, using the defining word PVARIABLE, by [n PVARIABLE pyword].
To define PVARIABLE:

: PVARIABLE CREATE , DOES>
DUP @ SWAP ;

This time, whenever Epywordl is executed, it has the run-time action [DUP @ SWAP] to put its value, as well as its address, on the stack.

ARRAYS IN FORTH

Those were fairly trivial examples. We will now take a look at a classic use of [CREATE
DOES> . . .], which is to produce words which can define arrays, a data structure oddly missing from FORTH.
We will take two basic cases -a single-dimension (1-D) array, and a two-dimensional (2-D) array - but the method can, naturally, be extended to any number of dimensions. The first case will allow us to create an array [name] with (n+ 1) cells (0 to n) by using [n ARRAY name], subsequently putting the address of cell 'p' at TOS with [p name ].
The 2-D array, of size (x+ 1) *(y+ 1), will be set up by [x y 2ARRAY 2 name], with [p q 2name] putting the address of cell (p,q) on TOS.
Note that, in both these cases, the subscripts start from zero and go up to 'x' or 'y' as appropriate. The technique is therefore analogous to BASIC's 'DIM name(x), 2name(x,y)', in that the dimension(s) define the highest permitted subscript(s).
The two new defining words are set up in Screen 102 (Listing 2). The compile-time behaviour of ARRAY is simple - it merely reserves 2 *(x+ 1) bytes in the dictionary for the array. Its runtime behaviour, after DOES>, is nearly as simple, as long as you remember that executing [p name] will, before DOES> gets to work, leave 'p' at 20S and the address of [name l's zero cell at TOS. These two are simply swapped, 'p' is doubled, and the result is added to the base address to give the desired address. It's as simple as that.
At first glance, 2ARRAY is rather more complex, but it is not really. At compile-time, before any space is reserved for the array itself, the value of (y+ 1) is saved at the start of [2namel's dictionary entry - it will be needed at execution time. The system then reserves an additional 2 *(x+ 1) *(y+ 1) bytes to hold the array.
At rug-time, the first action is to save a copy of the address of [2name] (actually, this is the
address where (y+ 1) is saved) on the stack for later use. The system then extracts the location
of cell (p,q) from the formula:

(address of [name] )+2+2*(p*(y+1)+q)

This formula is necessary because data is stored, from low memory, in the sequence:

(0,0),(0,1)...(0,y),(1,0),(1,1)...
(1,y)...(x-1,y),(x,0)...(x,y)

You should now be able to see why we had to save (y+ 1) at compile-time. If you are still a little confused, try sketching out what is on the stack as every word in lines 12-14 of Screen 102 is executed.
Note two things about these two new defining words:

a. They do not initialize an array's contents when it is created.
b. They do no checking of subscript limits.

The first point may or may not be important to you - in any case, it is fairly simple to remedy at compile-time (how?). The second point is consistent with FORTH's philosophy of simple, high-speed, code but could cause real problems. For instance: [50 20 name H when [name] had been defined as a 15-element array, would hopelessly corrupt the system dictionary by setting a pair of bytes outside the array's bounds to the value 50, At best the result would be confusing, but it would more likely be catastrophic.
However, why not define new defining words ARRAYCHK and 2ARRAYCHK which act exactly like ARRAY and 2ARRAY, with the addition that they check subscripts for validity before doing anything else at run-time?
Screen 103 (Listing 3) does just that for ARRAYCHK. The compile-time action is very similar to that of ARRAY, but also saves (x+ 1) at the start of the dictionary entry for use in subscript checking.
At run-time, however, the behaviour is much more complex. First of all, the top two items on the stack ('p' and the address of [name] ) are duplicated; having  
0 : ( Defining words for non-checking arrays      Screen 102       )
 1: 
 2: ( Set UP a 1-dimensional array, with cells 0->n                )
 3: (          Create with "n ARRAY name"                          )
 4: (          Get address of cell "P" with "p name"               )
 5: : ARRAY  CREATE 1+ 2 * ALLOT      ( Save space in dict.        )
 6:          DOES> SWAP 2 * + ;    ( Get address of cell "P"       )
 8: ( Set UP a 2-dimensional array, with cells 0->x, 0->Y          )
 9: (          Create with "x y 2ARRAY 2name"                      )
10: (          Get address of cell "p,q" with  "P q 2name"         )
11: : 2ARRAY CREATE 1+ DUP  SWAP 1+ * 2 * ALLOT  (   Save space    )
12:            DOES> DUP  >R  (Get address of start of array       )
13:            @ ROT * + 2 *        (Get offset of "p,q"           )
14:            R>.+  2+ ;        (and work out its address         )
15:
Listing 2. Setting up new defining words for non-checking arrays.

Computing ToDay May 83 - Going Forth Again page 48
Series: More on Forth
GOING FORTH
stack ('p' and the address of [name]) are duplicated; having done that, (x+ 1) is pulled out of the dictionary entry and checks are made that 'p' is less than zero, and not more than 'x' (line 12). If the subscript is OK, line 11 extracts the address of cell 'p'. If, however, a fault occurs, an error message is printed; ABORT then clears the calculation and return stacks and shuts the system down. As you see, ERMESS uses the values of 'p' and [name ]'s address left on the stack to show the fault in detail.
In essence, 2ARRAYCHK, which is defined in Screens 104 and 105 (Listing 4), behaves in just the same way but, inevitably, it is rather more complex. Screen 104 simply sets up the error messages, which report x- and y-subscript errors separately. In Screen 105, 2ARRAYCHK itself is defined. Its compile-time behaviour, defined in lines 2 and 3, is straightforward; (x+ 1) and (y+ 1) are saved at the beginning of [2name ]'s dictionary entry, and 2 *(x+ 1) *(y+ 1) additional bytes are reserved for the data which will fill the array.
Lines 4-11 of Screen 105 define the run-time behaviour. First of all, a copy of [2name ]'s start address is saved for future use, and then 'p' is checked to ensure that it lies in the range 0-x inclusive. If it does not, an error message (1ERMESS) is invoked at line 10 and the system shuts down. If the x-subscript is OK, line 6 recovers (y+ 1) from the dictionary entry and tests that 'q' is in the range 0-y inclusive. A failure produces the 2ERMESS error message and, again, shuts the program down.
Finally, if both subscripts are valid, line 7 puts the address of cell (p,q) on TOS, using the formula:

(start address of [2name] )+4
+ 2*(p*(y+1)+q)

Simple, isn't it? As before, if you cannot quite follow part of the coding, try writing down the stack contents at every stage.
Now we come on to one of FORTH's most dramatic benefits. If a program uses arrays a great deal, the limit checking of ARRAYCHK, 2ARRAYCHK and their ilk will obviously slow it down considerably; furthermore, once the program is fully debugged, the checks are normally redundant. On the other hand, it is vital for your peace of mind to check the subscripts during program development.
The remedy is both elegant and simple. At the start of program development, define ARRAY and 2ARRAY - which will have to be specified at the start of the program anyway - but use the checking code of ARRAYCHK and 2ARRAYCHK. Write and debug the program in the secure knowledge that the system will tell you if a subscript is wrong.
When the program is finally debugged, go back to its start and edit ARRAY and 2ARRAY to the simpler, non-checking, form. Having done that, re-compile the program and all array subscript checking will be eliminated with no further action on your part. Simply by altering the definitions at the start, the whole program behaviour can be altered, since you can define any word to act in any way you like. That change will affect all subsequent definitions which use that word, with no further action on your part. With FORTH, you really can have your cake and eat it.


CONCLUSION
In this article, I have covered two main aspects of using FORTH. First of all, and by way of warming up, I have shown how a random number generator can be added to almost any system in order to increase its usefulness.
I then considered in some detail how to influence the way in which a FORTH system works, by adding completely new classes of words to it, using the [CREATE ...
DOES> ... ] (or [ <BUILDS... DOES> ... ] in fig-FORTH) structure. I used the classical demonstration of 1-D and 2-D arrays, but there is no reason why the same approach cannot be used for more complex data structures, or for anything else you may think of. Although FORTH does very little error checking at run-time (certain implementations, such as the Jupiter Ace, can, however, do quite a lot) it is relatively straightforward to add such checking to suit your own needs. Once they have done their job, they can be very simply removed, with benefit to run-time, etc.
Next month, I will survey the state of the FORTH market .
 0: ( Defining word to create a 1-D array; and                Screen 103 )
 1: ( to check subscript bounds for validity                             )
 2: ( Use it exactly like "ARRAY"                                        )
 4: ( Check that 0 <= 208 < TOS                                          )
 5  : +< OVER > SWAP -1 > AND ;                                          )
 6:
 7: ( Error message for a faulty subscript                               )
 8: : ERMESS CR SWAP ." INDEX OF" . ." NOT PERMITTED. RANGE: 0 -"        )
 9:          @ 1- . CR ;                                                 )
10: ( Now we can define the defining word                                )
11: : ARRAYCHK CREATE 1+ DUP , 2 * ALLOT          ( Save size, make room )
12:            DOES> OVER OVER @ +<              ( Not greater than max? )
13:                    IF SWAP 2 * + 2+              ( Ok: - get address )
14:                    ELSE ERMESS ABORT         ( otherwise print error )
15:                    THEN ;                                            )
Listing 3. Defining words for creating and checking 1-D arrays
 0: ( Defining word to create a 2-D array, and                 Screen 104)
 1: ( to check both its subscripts for validity.                         )
 2: 
 3: ( use it exactly like "2ARRAY"                                       )
 4: 
 5: ( Error messages for faulty subscripts                               )
 6: ( First of all, the common part of the messages                      )
 7: : PRTERR ." -INDEX OF" . . " NOT PERMITTED. RANGE: 0 -" ;
 9: ( Out-of-range X-index, disregarding Y-index                         )
10: : 1ERMESS  CR ." X" PRTERR @ 1- . CR ;
11: 
12: ( Now, 1.n out-of-range Y-index, if X is OK:                         )
13: : 2ERMESS  CR  ." Y" PRTERR DROP 2+ @  1- . CR ;
14: 
15:                                                               105 LOAD 

 0: ( Now go on to actually set up the defining word           Screen 105)
 1:
 2: : 2ARRAYCHK  CREATE OVER 1+ , 1+ DUP ,           ( Save Xmax and Ymax)
 3:                     SWAP 1+ * 2* ALLOT                   ( Save space)
 4:             DOES> DUP >R                          ( Save base address)
 5:                      ROT OVER 0VER SWAP @ +<          ( Check X-index)
 6: ( OK - so Y-index)    IF ROT R> DUP >R 2+ @ 0VER SWAP +<
 7: ( Get address of cell)    IF SWAP R> 2+ @ * + 2 * 4 + + 
 8:                           ELSE 2ERMESS ABORT       ( Y-index oversize)
 9:                          THEN
10:                     ELSE lERMESS ABORT             ( X-index oversize)
11:                     THEN ;
12: 
13: 
14: 
15:
Listing 4. Defining words for creating and checking 2-D arrays.



Computing ToDay May 1983 cover

Computing ToDay May 1983 page 45    Computing ToDay May 1983 page 46

Computing ToDay May 1983 page 47    Computing ToDay May 1983 page 48


Valid HTML 4.01!