As you learn Forth, it learns
Like any computer language, Forth has a vocabulary, but this vocabulary can be broadened by the programmer. Each new word added is defined in terms of older words already known to the system. Thus, increasingly powerful words can be formed in hierarchical fashion. In this sense, Forth bears a strong resemblance to human knowledge.
As we all know, the single most popular language today is Basic. The reason is not so much that Basic is a powerful programming language as that it is a complete programming system. A Basic program can be written and made to work in a fraction of the time needed with compiled languages such as Fortran or Pascal.
The big problem is that the system never learns anything from your Basic program. Each program line has to be interpreted by the system and if a line is encountered again and again, it must be interpreted again and again. Thus programming speed is obtained at the expense of execution speed.
Forth, on the other hand, is both an interpreter and a compiler - but not the kind of compiler which produces massive amounts of machine code. Each word compiled into Forth is, in fact, a compact list of subroutines. Each of these subroutines will be a list of more subroutines and so on, down the tree, until machine code is encountered.
The resulting structure, called "threaded code", is not as fast as pure machine code, but it is constructed quickly and easily. Once you are familiar with Forth, it is just as quick to write as Basic and just as easy to debug, yet it runs 10 to 20 times as fast.
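The following is a minimal sketch in Python, not Forth, that makes the idea of threaded code concrete. A word is a list of references to other words, and the chain bottoms out in primitives. The names DOUBLE, QUADRUPLE, dup and plus are hypothetical. Python functions stand in for machine-code routines, and Python recursion stands in for the return stack that a real threaded interpreter keeps. It illustrates the structure only, not the speed.

```python
from typing import Callable, List

Stack = List[int]
Primitive = Callable[[Stack], None]


class Colon:
    """A high-level word: nothing but a list of references to older words."""

    def __init__(self, name: str, body: list) -> None:
        self.name = name
        self.body = body  # each item is a Primitive or another Colon


def execute(word, stack: Stack) -> None:
    """Follow the thread down the tree until primitives are reached.

    Python recursion stands in for the return stack a real TIL keeps."""
    if isinstance(word, Colon):
        for inner in word.body:
            execute(inner, stack)
    else:
        word(stack)  # a primitive, standing in for a machine-code routine


# Hypothetical primitives (in a real Forth these are short machine-code routines).
def dup(stack: Stack) -> None:
    stack.append(stack[-1])


def plus(stack: Stack) -> None:
    b = stack.pop()  # top of stack
    a = stack.pop()  # second entry
    stack.append(a + b)


# Each new word is defined only in terms of words that already exist.
DOUBLE = Colon("DOUBLE", [dup, plus])
QUADRUPLE = Colon("QUADRUPLE", [DOUBLE, DOUBLE])

stack: Stack = [3]
execute(QUADRUPLE, stack)
print(stack)  # [12]
```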
Basic does not stand alone in its class; there have been similar languages such as Focal. Likewise, there are other self-extending languages like Forth which do not necessarily use exactly the same words or the same syntax. They are called Threaded Interpretive Languages, or TILs. Forth is the original threaded language and was developed by Charles Moore.
The impetus to investigate Forth arose from the August 1980 issue of Byte which was completely dedicated to articles about the language. I understand that particular issue of Byte is completely sold out and highly sought-after. I then whetted my appetite with the Pet version of Forth called FullForth.
The quintessence of Forth is that it learns from its master - Forth is extensible. David Sands reports.
Keen to try something more serious, I obtained a copy of Stackworks Forth, called SL5, running under CP/M for North Star. This Forth is some 10 times faster than North Star Basic. For example, the quick test:
10 FOR X = 0 to 10000: NEXT
takes about eight seconds in North Star Basic. The corresponding Forth code is:
DECIMAL
: TEST 10000 0 DO LOOP ;
This runs in about 0.8 seconds in SL5.
As the full potential of Forth became apparent, I also became dissatisfied with my Stackworks SL5. One of the major advantages Forth will bring to the microcomputer industry is the ability to write ROMable code for such applications as automatic testing or process control. Such code can be written quickly, and therefore cheaply, without undue run-time penalty.
However, Stackworks has obviously had some difficulty in making its Forth produce ROMable code. The main obstacle was CP/M, under which SL5 runs. I decided I needed a Forth to run under North Star DOS. This may surprise CP/M enthusiasts, but North Star DOS, although primitive compared with CP/M, has many advantages:
• The sophistication of CP/M is achieved with more code and more disc activity, and so CP/M takes longer to run. Equivalent disc activity under North Star DOS is approximately twice as fast.
• With North Star DOS, it is possible to take snapshots of any part of memory and put them on any part of the disc, and vice versa. This is not possible with CP/M, where one is forced to use the transient program area and the indexed, self-arranging disc allocation.
• With North Star DOS, it is also possible to keep more than one program in RAM at the same time and to jump back and forth between them without unnecessary disc activity.
• Because CP/M requires a transient program area at 100 Hex and a small jump table in the first page of memory, it is very difficult to write ROMable code which starts from zero. Z-80 and 8080 microprocessors start executing at address zero after a reset, so it is most important to have all ROMable code at the beginning of memory space.
• North Star DOS can be relocated anywhere in memory so as not to interfere with the newly generated system.
• Stackworks Forth uses the CP/M text editor. This editor is far from brilliant, and the text produced must be stored on disc and Forth reloaded before the text can be tested. When one is working interactively with a computer, perhaps trying to debug something tricky, these delays are intolerable. It is essential to have rapid communication between the program-generation and the program-testing parts of a system; otherwise you might as well stay with Basic.
In addition to all these reasons was the nagging awareness that Stackworks SL5, CP/M and indeed North Star Basic are all written in 8080 code and so fail to use some of the powerful features of the Z-80 microprocessor. So, when I laid my hands on a copy of the Byte book Threaded Interpretive Languages by R G Loeliger, I had no excuse not to write my own version of Forth to run under North Star DOS.
This I accomplished in a matter of days using Allen Ashley's excellent PDS assembly-language development system, which includes a superb text editor and macro assembler, and a debugging utility which must be the finest available anywhere for the Z-80 microprocessor. The system also runs under North Star DOS.
Loeliger clearly had no idea how to write arithmetic routines - particularly division - so I borrowed some very neat arithmetic from Tiny Basic. I also changed many routines to comply with popular Forth practice and added some of the features of Stackworks SL5.
I would not advise anyone faint-hearted to tackle the same task, but one advantage of being left completely at sea is that you are forced to understand fully what you are doing. The final result of all these labours is an implementation of Forth in true Z-80 code, using all the Z-80 registers.
It runs twice as fast as Stackworks Forth, performing the quick test in 0.4 seconds, which is 20 times faster than North Star Basic. Disc access is also quicker: between two and three times as fast as under CP/M SL5. On top of all that, the editor (Allen Ashley's text editor from the PDS development system), Forth, DOS and a machine-code monitor can all be co-resident in the computer at the same time.
I can build a program in the editor, jump quickly to the threaded interpreter to test it, and then jump straight back into the text editor to make alterations or additions, finding the text completely intact and all text pointers in their original positions.
The borderline between the Forth program area and RAM area is clearly defined and the finished, extended Forth program can be saved en bloc on disc and impressed in EPROM at a moment's notice.
Practical Computing August 1981 page 92
If you know only Basic, there are two fundamentally new concepts to get to grips with.
All data is handled on a data stack. That is not to say that there are no variables; there can be any number, with any names. But to move a value from one variable to another or to perform arithmetic, data must first be moved to the stack.
Arithmetic, relational operators and all activity using data must be written in reverse Polish notation.
Here are some examples.
12 OK
Now the value 12 is on the stack.
. 12 OK
The user presses Return after the full stop, which means "print the value on top of the stack and remove it from the stack". Thus if we try to print again, we obtain
. STACK UNDERFLOW ABORT
This error occurred because we tried to print something from the top of the stack when there was nothing there - the value 12 had already been removed by the previous print operation. Let us now do some arithmetic:
12 OK
The value 12 is on the stack.
7 OK
Now the value 7 is on the stack and the previous value 12 has been pushed down to second position on the stack.
* OK
Now the 7 and the 12 have been multiplied together, and both have been replaced by the answer, which occupies the top and only position on the stack.
. 84 OK
The user pressed Return after the full stop; this popped the value off the top of the stack and printed it, leaving the stack empty. All of that could have been written on one line as follows:
12 7 * . 84 OK
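The stack behaviour in these examples can be modelled in a few lines of Python. This is a sketch under simplifying assumptions. Only integers and the words +, -, * and . are handled, and an abort simply empties the stack. The class name Calculator is hypothetical.

```python
import operator
from typing import Callable, Dict, List

OPERATORS: Dict[str, Callable[[int, int], int]] = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
}


class StackUnderflow(Exception):
    """Raised when a word tries to take a value from an empty stack."""


class Calculator:
    """Toy model of Forth's interactive data stack and reverse Polish notation."""

    def __init__(self) -> None:
        self.stack: List[int] = []

    def pop(self) -> int:
        if not self.stack:
            raise StackUnderflow()
        return self.stack.pop()

    def run(self, line: str) -> str:
        """Interpret one typed line and return what the system would display."""
        shown: List[str] = []
        try:
            for token in line.split():
                if token == ".":
                    shown.append(str(self.pop()))  # print and remove the top value
                elif token in OPERATORS:
                    b = self.pop()                 # top of stack
                    a = self.pop()                 # second entry
                    self.stack.append(OPERATORS[token](a, b))
                else:
                    self.stack.append(int(token))  # numbers are simply pushed
        except StackUnderflow:
            self.stack.clear()                     # an abort empties the stack
            return "STACK UNDERFLOW ABORT"
        return " ".join(shown + ["OK"])


calc = Calculator()
print(calc.run("12"))        # OK
print(calc.run("."))         # 12 OK
print(calc.run("."))         # STACK UNDERFLOW ABORT
print(calc.run("12 7 * ."))  # 84 OK
```

The operand order follows the stack: for "-", the second entry is the left operand and the top of stack is the right one, so 10 3 - leaves 7.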
As you can see, this is reverse Polish notation. In this example, the asterisk, meaning multiply, and the full stop, meaning print, are words. There may be many words in a threaded interpretive language. These are classified into various types:
Terminal words, including the full stop for print; CR for carriage return; KEY, which waits for the user to press a key and puts its ASCII value on to the stack, roughly equivalent to the Pet's GET statement; and EMIT, which sends the value on the top of the stack to the screen as an ASCII character.
Stack words, including arithmetic operators and relational operators such as >, < and =; DUP, which duplicates what is on the top of the stack; and SWAP, which exchanges the top two values on the stack.
System variables and commands, such as DP, the dictionary pointer, which is a variable pointing to the first free memory location available; and DECIMAL and HEX, which set the number base to 10 or 16 respectively.
Defining words: these are the words used by the programmer to create new words out of combinations of words which already exist.
When a new word is created, all the words used in its definition must be looked up in the dictionary. The addresses of the code associated with these words are then incorporated into the code of the new word. The paramount defining word is the colon. Let us look at the following example of a simple but useful new word.
We have seen how the print command removes the value on the top of the stack to print it. Suppose we want to print the value on the top of the stack without removing it; then we would have to duplicate it first, as follows:
DUP .
and every time we wanted to see what is on top of the stack we would have to write the same thing. To save time, let us define a new word P to duplicate and print the top of the stack:
: P DUP . ; OK
The semicolon marks the end of the definition. You can put as much code as you want between the colon and the semicolon. If it does not fit on one line, you can use more than one line; in fact, it is good practice to break a definition up into smaller parts on separate lines. Each line is compiled into your new definition, and compiling does not stop until you finally type the semicolon.
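Here is a rough Python model of what the colon does. Each word in the definition is looked up once, when the definition is compiled, and a reference to it (rather than its name) is stored in the new word's body. Numbers become small words that push their value. The sketch assumes the new word's name follows the colon on the same line. Like the real thing, it keeps compiling across lines until it meets the semicolon. MiniForth and its three primitives are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple, Union

# A word is either a primitive (a Python callable) or a compiled body (a list of words).
Word = Union[Callable[[], None], list]


class MiniForth:
    """Toy model of colon compilation: names are looked up once, at compile time."""

    def __init__(self) -> None:
        self.stack: List[int] = []
        self.printed: List[int] = []
        self.compiling: Optional[Tuple[str, list]] = None  # (name, body so far)
        self.dictionary: Dict[str, Word] = {
            "DUP": lambda: self.stack.append(self.stack[-1]),
            "*": lambda: self.stack.append(self.stack.pop() * self.stack.pop()),
            ".": lambda: self.printed.append(self.stack.pop()),
        }

    def lookup(self, token: str) -> Word:
        """Find a word; a number becomes a tiny word that pushes its value."""
        if token in self.dictionary:
            return self.dictionary[token]
        value = int(token)  # a ValueError here plays the part of "word not found"
        return lambda: self.stack.append(value)

    def execute(self, word: Word) -> None:
        if isinstance(word, list):  # a compiled definition: run each reference
            for inner in word:
                self.execute(inner)
        else:
            word()

    def interpret(self, line: str) -> None:
        tokens = iter(line.split())
        for token in tokens:
            if self.compiling is not None:
                if token == ";":  # end of the definition: enter it in the dictionary
                    name, body = self.compiling
                    self.dictionary[name] = body
                    self.compiling = None
                else:             # compile a reference, not the name
                    self.compiling[1].append(self.lookup(token))
            elif token == ":":
                self.compiling = (next(tokens), [])  # assumes the name is on the same line
            else:
                self.execute(self.lookup(token))


forth = MiniForth()
forth.interpret("12 : P DUP . ;")
forth.interpret("P")              # prints 12 and leaves 12 on the stack
forth.interpret(": SQUARE DUP")   # a definition may run over several lines
forth.interpret("* ;")
forth.interpret("7 SQUARE .")
print(forth.printed, forth.stack)  # [12, 49] [12]
```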
Constants are created with the word CONSTANT:
12 CONSTANT FRED OK
creates a constant called Fred with the value 12. To use the constant, simply type its name and its value goes on to the stack, for example:
FRED 5 * . 60 OK
Variables are created in a similar fashion, e.g.,
8 VARIABLE BERT OK
This creates the variable Bert with an initial value of 8. When Bert is used, only the address of the variable is put on to the stack, and a second word, @ (fetch), must be used to change this into the actual value.
BERT OK
puts the address of Bert on top of stack (TOS).
BERT @ OK
puts the value contained in Bert on top of stack.
If Bert were, for example, a pointer to a memory location you would have to use the @ sign twice:
BERT @ @ OK
To store something in your variable, the exclamation mark is used:
5 BERT ! OK
Here 5 is pushed on to the top of stack (TOS), and then BERT pushes the address of the variable Bert, moving the value 5 down one level. The exclamation mark stores the second stack entry at the address on the top of the stack, removing both from the stack.
If Bert were, for example, a pointer to a memory location, then to store 5 in the actual memory location you would have to write
5 BERT @ ! OK
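The difference between a constant, which pushes its value, and a variable, which pushes only its address, can be modelled in Python with memory held as a table of 16-bit cells. This sketch follows the article's fig-Forth style, in which VARIABLE takes an initial value from the stack. In Forth-83 and later standards VARIABLE takes no initial value, so one would write VARIABLE BERT 8 BERT ! instead. The starting address 0x1000 and the class name are hypothetical.

```python
from typing import Dict, List


class VarForth:
    """Toy model of fig-Forth style CONSTANT, VARIABLE, @ (fetch) and ! (store)."""

    CELL = 2  # bytes per 16-bit cell

    def __init__(self) -> None:
        self.stack: List[int] = []
        self.memory: Dict[int, int] = {}  # address -> cell contents
        self.dp = 0x1000                  # hypothetical first free address
        self.names: Dict[str, int] = {}   # name -> constant's value or variable's address

    def interpret(self, line: str) -> None:
        tokens = iter(line.split())
        for token in tokens:
            if token in ("CONSTANT", "VARIABLE"):
                name = next(tokens)
                initial = self.stack.pop()
                if token == "CONSTANT":
                    self.names[name] = initial
                else:
                    self.memory[self.dp] = initial  # reserve one cell
                    self.names[name] = self.dp
                    self.dp += self.CELL
            elif token == "@":  # ( address -- value )
                self.stack.append(self.memory[self.stack.pop()])
            elif token == "!":  # ( value address -- )
                address = self.stack.pop()
                self.memory[address] = self.stack.pop()
            elif token in self.names:
                # A constant pushes its value; a variable pushes only its address.
                self.stack.append(self.names[token])
            else:
                self.stack.append(int(token))


f = VarForth()
f.interpret("12 CONSTANT FRED 8 VARIABLE BERT")
f.interpret("FRED BERT @")      # stack: [12, 8]
f.interpret("5 BERT ! BERT @")  # stack: [12, 8, 5]
print(f.stack)
```

The same model handles the pointer case: if BERT holds the address of another variable, BERT @ @ fetches through it and 5 BERT @ ! stores through it.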
Once the use of @ to get and ! to put has become habitual, programming becomes more fluent, and fewer variables are required than in other languages. There is a distinction between variables which have a specific meaning throughout a program and variables which are used for scratch-pad calculations only, such as the odd Q1, Q2, X, Y, etc., which creep into parts of Basic programs.
These scratch variables are not required in Forth because all calculations and operations take place on the stack. Such operations are very much quicker than looking up variables in a symbol table. Even when variables are used in Forth, they are compiled into code blocks as explicit addresses, eliminating any further need for a symbol table.
One further compiler directive which must be mentioned is FORGET. There are no line numbers to shuffle, and no way of knowing whether the word you want to delete has been incorporated into higher-level words. So FORGET (name of word) deletes not only that word but all words added after it. If you created Fred, Bert and then Bill, FORGET FRED would also delete Bert and Bill.
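FORGET behaves this way because the dictionary is kept in order of creation. Any later word may have compiled a reference to the forgotten one, so everything from that point on must go. In a real system the dictionary pointer DP is also reset to reclaim the memory. The Python sketch below models only the ordering rule, and the Dictionary class is hypothetical. It also searches newest first, as a Forth dictionary does, so that a redefined name hides the older one.

```python
from typing import List, Optional, Tuple


class Dictionary:
    """Entries in creation order; searched newest first, like a Forth dictionary."""

    def __init__(self) -> None:
        self.entries: List[Tuple[str, object]] = []

    def define(self, name: str, code: object) -> None:
        self.entries.append((name, code))

    def find(self, name: str) -> Optional[object]:
        for entry_name, code in reversed(self.entries):  # newest definition wins
            if entry_name == name:
                return code
        return None

    def forget(self, name: str) -> None:
        """Delete the newest entry called name and every entry made after it."""
        for i in range(len(self.entries) - 1, -1, -1):
            if self.entries[i][0] == name:
                del self.entries[i:]
                return
        raise KeyError(name)

    def names(self) -> List[str]:
        return [entry_name for entry_name, _ in self.entries]


d = Dictionary()
for word in ("DUP", "FRED", "BERT", "BILL"):
    d.define(word, f"code for {word}")  # placeholders for compiled code
d.forget("FRED")
print(d.names())  # ['DUP']
```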
All branches and loops in Forth must be conditional, as in Pascal and other structured languages; Goto does not exist. As always, the conditions for loops and branches must precede the instruction, in reverse Polish notation. For example, the syntax for IF is as follows:
(condition) IF (function) THEN
or
(condition) IF (function if true)
ELSE (function if false) THEN
Counting loops, the equivalent of FOR-NEXT in Basic, are accomplished with the DO loop, whose syntax is:
(upper limit) (starting value) DO (function) LOOP
The loop index runs from the starting value up to one less than the upper limit.
The indefinite loop with a condition for leaving is BEGIN ... END, with the syntax:
BEGIN (function) (condition for ending) END
Notice that the test is performed at the end of the loop. If the test is required at the beginning of the loop, the following syntax is used:
BEGIN (condition for staying in the loop) WHILE (function) REPEAT
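The three loop forms can be compared with Python equivalents. In this sketch each function returns what the Forth phrase in its docstring would print. Those Forth phrases are illustrative examples, not taken from the article. One consequence of the DO limit rule is that 10000 0 DO LOOP goes round 10000 times, while Basic's FOR X = 0 TO 10000 goes round 10001 times. The word END is the fig-Forth name for what standard Forth calls UNTIL.

```python
from typing import List


def do_loop(limit: int, start: int) -> List[int]:
    """(upper limit) (starting value) DO I . LOOP
    The index runs from start up to limit - 1; this sketch assumes start < limit."""
    return list(range(start, limit))


def begin_end(n: int) -> List[int]:
    """BEGIN DUP . 1- DUP 1 < END DROP
    The test follows the body, so the body always runs at least once."""
    printed = []
    while True:
        printed.append(n)  # DUP .
        n -= 1             # 1-
        if n < 1:          # DUP 1 < END
            break
    return printed


def begin_while_repeat(n: int) -> List[int]:
    """BEGIN DUP 0 > WHILE DUP . 1- REPEAT DROP
    The test comes first, so the body may not run at all."""
    printed = []
    while n > 0:           # DUP 0 > WHILE
        printed.append(n)  # DUP .
        n -= 1             # 1-
    return printed         # REPEAT jumps back to BEGIN


print(len(do_loop(10000, 0)))               # 10000
print(begin_end(0), begin_while_repeat(0))  # [0] []
```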
To clarify all these different forms of syntax, here is an example definition which stores keystrokes in a disc buffer at 4000 Hex in memory:
HEX
0 VARIABLE INPTR ( POINTER INTO THE BUFFER)
: INBUF 4000 INPTR ! ( SET THE POINTER TO THE START OF THE BUFFER)
BEGIN KEY DUP EMIT ( GET THE KEY & ECHO IT)
DUP 0D <> WHILE ( WHILE IT ISN'T A CR)
DUP 8 = IF ( IS IT A BACKSPACE?)
DROP 20 EMIT 8 EMIT ( IF SO, RUB OUT ONE CHARACTER)
-1 INPTR +! ( AND BACK UP THE POINTER)
ELSE ( NOT A BACKSPACE)
INPTR @ C! ( STORE THE CHARACTER)
1 INPTR +! ( AND INCREMENT THE POINTER)
THEN REPEAT ( AND DO IT AGAIN)
DROP ( DISCARD THE CR LEFT ON THE STACK)
3 INPTR @ C! ; ( FINALLY STORE AN ETX CHARACTER) OK
DOCUMENTATION AND DISC ACCESS
Because what you type is compiled immediately into the language, there is no facility to extract from the language afterwards an equivalent of what you originally typed. The language itself grows and grows and meanwhile, all that you have typed has long since scrolled off the top of the screen.
Forth is, therefore, not self-documenting, and it is essential to keep careful notes or to arrange for your printer to print everything that appears on the screen. Better still, keep all your text in a disc file before compiling.
Normally, Forth disc space is divided into segments called screens. Each screen, as its name implies, can be viewed in its entirety on the computer screen, and can be individually edited and re-saved on to disc. In a file-orientated system, named files are used instead, and files can be of any length.
The screen or file contains text which, when loaded, is read by the system exactly as if it had been typed by the programmer. Each screen or file can end with a command to load another screen or file. For storage of data, disc space is treated as virtual memory; that is, it is treated as a single continuous random-access file.
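Here is a rough Python model of disc-as-virtual-memory. Any byte address on the disc maps to a block number and an offset. The block is read into a buffer only when needed, and a modified buffer is written back before the buffer is reused. The model is loosely based on the BLOCK, UPDATE and FLUSH words of later Forth standards. The single buffer and the 1024-byte block size (the standard Forth figure) are assumptions, and the North Star system described here may differ.

```python
from typing import Optional

BLOCK_SIZE = 1024  # bytes per block (screen): an assumption, see above


class BlockDisc:
    """Treat a disc image as one continuous random-access store,
    cached one block at a time in a single buffer."""

    def __init__(self, image: bytearray) -> None:
        self.image = image                 # stands in for the physical disc
        self.buffer = bytearray(BLOCK_SIZE)
        self.loaded: Optional[int] = None  # block number now in the buffer
        self.dirty = False                 # has the buffer been modified?

    def flush(self) -> None:
        """Write the buffer back to disc if it has been modified."""
        if self.dirty and self.loaded is not None:
            start = self.loaded * BLOCK_SIZE
            self.image[start:start + BLOCK_SIZE] = self.buffer
            self.dirty = False

    def block(self, n: int) -> bytearray:
        """Return block n, reading it from disc only if it is not already loaded."""
        if n != self.loaded:
            self.flush()
            start = n * BLOCK_SIZE
            self.buffer[:] = self.image[start:start + BLOCK_SIZE]
            self.loaded = n
        return self.buffer

    def update(self) -> None:
        """Mark the current buffer as modified (compare Forth's UPDATE)."""
        self.dirty = True


def fetch_byte(disc: BlockDisc, address: int) -> int:
    """Virtual memory: a byte address anywhere on the disc -> (block, offset)."""
    n, offset = divmod(address, BLOCK_SIZE)
    return disc.block(n)[offset]


def store_byte(disc: BlockDisc, address: int, value: int) -> None:
    n, offset = divmod(address, BLOCK_SIZE)
    disc.block(n)[offset] = value
    disc.update()


disc = BlockDisc(bytearray(4 * BLOCK_SIZE))  # a hypothetical four-block disc
store_byte(disc, 3 * BLOCK_SIZE + 5, 9)      # lands in block 3, in the buffer only
disc.block(0)                                # switching blocks writes block 3 back
print(disc.image[3 * BLOCK_SIZE + 5])        # 9
```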
The quick tests mentioned earlier were perhaps a little unfair, because poor old Basic has to do all its counting in floating-point arithmetic, whereas Forth works with 16-bit integers. In the great majority of Basic programs for business applications, games, graphics, etc., most of the For loops are integer counts anyway, and most of them have a step size of one.
Screen addressing, disc addressing, computation with dates, alphabetic sorting, etc., could all be managed comfortably with integer arithmetic and would run much more quickly. Even so, Forth still runs faster than integer Basic.
Where floating-point computation is required, high-level code must be introduced. FullForth for Pet has the appropriate coding on disc screens. New words, for example F+ and F*, perform the floating-point arithmetic, which is carried out solely on variables whose addresses are put on to the stack, because a floating-point value needs more than the two bytes of a stack entry.
Up to this point, 8080-style 16-bit arithmetic has been very useful but, from now on, it is processors like the 6502 that perform better. Having defined floating-point add, subtract, multiply and divide, we can go on to transcendental functions, which are computed using series expansions.
First, we define the factorial, then the power series, exponential and logarithms. From there, we can go on to complete sines, cosines and tangents, which in turn use powers and factorials in their series.
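The series involved can be sketched in Python, using ordinary floats rather than a Forth floating-point package. Each function sums a fixed number of terms of the standard expansion. For the logarithm it uses ln x = 2(y + y^3/3 + y^5/5 + ...) with y = (x - 1)/(x + 1), which converges for any positive x. The term counts are assumptions, and a practical package would also reduce large arguments into a small range first.

```python
import math


def factorial(n: int) -> int:
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


def power(x: float, n: int) -> float:
    """x to a non-negative integer power by repeated multiplication."""
    result = 1.0
    for _ in range(n):
        result *= x
    return result


def exp_series(x: float, terms: int = 20) -> float:
    """e^x = 1 + x + x^2/2! + x^3/3! + ..."""
    return sum(power(x, n) / factorial(n) for n in range(terms))


def ln_series(x: float, terms: int = 30) -> float:
    """ln x = 2(y + y^3/3 + y^5/5 + ...), y = (x - 1)/(x + 1), for x > 0."""
    if x <= 0:
        raise ValueError("logarithm needs a positive argument")
    y = (x - 1) / (x + 1)
    return 2 * sum(power(y, 2 * k + 1) / (2 * k + 1) for k in range(terms))


def sin_series(x: float, terms: int = 10) -> float:
    """sin x = x - x^3/3! + x^5/5! - ..."""
    return sum((-1) ** k * power(x, 2 * k + 1) / factorial(2 * k + 1) for k in range(terms))


def cos_series(x: float, terms: int = 10) -> float:
    """cos x = 1 - x^2/2! + x^4/4! - ..."""
    return sum((-1) ** k * power(x, 2 * k) / factorial(2 * k) for k in range(terms))


def tan_series(x: float, terms: int = 10) -> float:
    return sin_series(x, terms) / cos_series(x, terms)


# Compare with the library functions for small arguments.
for value in (0.5, 1.0):
    print(value, sin_series(value), math.sin(value))
```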
The most common source of error is misuse of the stack. For example, adding more than you remove will cumulatively fill the stack until overflow occurs. Likewise, occasionally removing from the stack one value more than has been put on will eventually result in stack underflow.
In immediate mode, stack underflow is detected and an error message is given, but in compiled mode, stack errors can be fatal. After such an error you will probably have to re-boot your entire system.
When writing a TIL, it is easy to see why the temptation to guard automatically against this error should be resisted. Forth relies on its stack words for its high-speed computation, so if you built stack safeguards into every little stack word, you could easily slow the system down by some 50 percent.
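The trade-off can be illustrated with a small Python model. The stack primitives do no checking at all, and the outer interpreter checks the stack depth once, at the end of each line typed in immediate mode. The memory layout (four cells of other data below the stack) and the class name are hypothetical. The last example shows how an underflow can go undetected and overwrite the memory below the stack, which is why such errors in compiled code can be fatal.

```python
from typing import List


class TinyTIL:
    """Stack primitives with no bounds checks; depth is checked once per line."""

    BASE = 4  # cells 0-3 hold other data (hypothetical); the stack starts at cell 4

    def __init__(self) -> None:
        self.mem: List[int] = [111, 222, 333, 444] + [0] * 16
        self.sp = self.BASE  # index of the next free stack cell
        self.printed: List[int] = []

    # Primitives: fast because they never check the stack pointer.
    def push(self, value: int) -> None:
        self.mem[self.sp] = value
        self.sp += 1

    def pop(self) -> int:
        self.sp -= 1
        return self.mem[self.sp]

    def interpret_line(self, line: str) -> str:
        for token in line.split():
            if token == ".":
                self.printed.append(self.pop())
            elif token == "DROP":
                self.pop()
            elif token == "DUP":
                self.push(self.mem[self.sp - 1])
            else:
                self.push(int(token))
        # Immediate mode: a single depth check after the whole line.
        if self.sp < self.BASE:
            self.sp = self.BASE
            return "STACK UNDERFLOW ABORT"
        return "OK"


til = TinyTIL()
print(til.interpret_line("5 ."))     # OK, after printing 5
print(til.interpret_line("."))       # STACK UNDERFLOW ABORT, after printing junk (444)
print(til.interpret_line("DROP 7"))  # OK, yet cell 3 has been overwritten with 7
print(til.mem[:4])                   # [111, 222, 333, 7]
```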
Other ways of crashing the system are to load a screen or disc file which has no end-of-file marker; to load disc data into the middle of Forth; to MOVE a data block from one part of memory into the middle of Forth; to misuse a constant or a variable in such a way as to store data in the Forth code; or to misuse or upset special system variables such as DP, the dictionary pointer.
All the software mentioned in this article can be obtained from Intelligent