2007-10-28 / 19:59 / dave


From the wikipedia entry

Forth parsing is simple, as it has no explicit grammar. The interpreter reads a line of input from the user input device, which is then parsed for a word using spaces as a delimiter; some systems recognise additional whitespace characters. When the interpreter finds a word, it tries to look the word up in the dictionary. If the word is found, the interpreter executes the code associated with the word, and then returns to parse the rest of the input stream. If the word isn’t found, the word is assumed to be a number, and an attempt is made to convert it into a number and push it on the stack; if successful, the interpreter continues parsing the input stream. Otherwise, if both the lookup and number conversion fails, the interpreter prints the word followed by an error message indicating the word is not recognised, flushes the input stream, and waits for new user input.
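
To make that loop concrete, here is a rough Python sketch of it. This is only a toy model: the names interpret, words, add and dup are mine, not taken from any Forth system, and a real interpreter reads from an input buffer rather than splitting a string.

```python
def interpret(line, dictionary, stack):
    """Toy outer interpreter: look each word up, otherwise try it as a number."""
    for token in line.split():            # whitespace is the only delimiter
        if token in dictionary:
            dictionary[token](stack)      # found: execute the word's code
            continue
        try:
            stack.append(int(token))      # not found: try to convert it to a number
        except ValueError:
            print(f"{token} ?")           # neither: report the word...
            return False                  # ...and abandon the rest of the line
    return True


def add(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)


def dup(stack):
    stack.append(stack[-1])


words = {"+": add, "DUP": dup}            # hypothetical two-word dictionary
stack = []
interpret("2 3 + DUP +", words, stack)    # leaves [10] on the stack
interpret("2 FROB 7", words, stack)       # prints "FROB ?"; the 7 is never reached
```

Note that there is no grammar anywhere in this: the whole "parser" is the call to split().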

From Richard Jones’ sometimes minimal FORTH compiler assembly file

[Historical note: If the execution model that FORTH uses looks strange from the following paragraphs, then it was motivated entirely by the need to save memory on early computers. This code compression isn't so important now when our machines have more memory in their L1 caches than those early computers had in total, but the execution model still has some useful properties...

One interesting consequence of using a linked list is that you can redefine words, and a newer definition of a word overrides an older one. This is an important concept in FORTH because it means that any word (even "built-in" or "standard" words) can be overridden with a new definition, either to enhance it, to make it faster or even to disable it. However because of the way that FORTH words get compiled, which you'll understand below, words defined using the old definition of a word continue to use the old definition. Only words defined after the new definition use the new definition.
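
The shadowing behaviour described there can be modelled in a few lines of Python. This is a sketch under my own assumptions, not Jones' layout: Entry, Dictionary, compile and run are made-up names, and a real Forth stores addresses in memory rather than Python objects. The point it shows is that a compiled word holds references to the entries found at definition time, so a later redefinition only affects later lookups.

```python
class Entry:
    """One dictionary entry; `link` points at the previously defined entry."""
    def __init__(self, name, body, link):
        self.name, self.body, self.link = name, body, link


class Dictionary:
    def __init__(self):
        self.latest = None                    # head of the list is the newest word

    def define(self, name, body):
        self.latest = Entry(name, body, self.latest)
        return self.latest

    def find(self, name):
        entry = self.latest
        while entry is not None:              # newest first, so redefinitions shadow
            if entry.name == name:
                return entry
            entry = entry.link
        return None

    def compile(self, name, source):
        # Resolve every name now and store the entries themselves (early binding).
        body = []
        for word in source.split():
            entry = self.find(word)
            if entry is None:
                raise KeyError(word)
            body.append(entry)
        return self.define(name, body)


def run(entry, out):
    if callable(entry.body):
        entry.body(out)                       # primitive: just call it
    else:
        for callee in entry.body:             # compiled word: run each stored entry
            run(callee, out)


d = Dictionary()
d.define("GREET", lambda out: out.append("hello"))
d.compile("TWICE", "GREET GREET")             # binds to the first GREET
d.define("GREET", lambda out: out.append("goodbye"))

out = []
run(d.find("TWICE"), out)                     # still uses the old GREET
run(d.find("GREET"), out)                     # lookup now finds the new GREET
```

Nothing is ever overwritten: the old GREET is still in the list, just hidden from find() by the newer entry in front of it.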

From Jones' Forth file

\ Now we can use [ and ] to insert literals which are calculated at compile time.  (Recall that
\ [ and ] are the FORTH words which switch into and out of immediate mode.)
\ Within definitions, use [ ... ] LITERAL anywhere that '...' is a constant expression which you
\ would rather only compute once (at compile time, rather than calculating it each time your word runs).
: ':'
	[		\ go into immediate mode (temporarily)
	CHAR :		\ push the number 58 (ASCII code of colon) on the parameter stack
	]		\ go back to compile mode
	LITERAL		\ compile LIT 58 as the definition of ':' word
;
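
To convince myself what [ ... ] LITERAL buys you, here is a toy Python model of a compiler with the two modes. Everything in it is an assumption for illustration: compile_body is a made-up name, the ("LIT", n) and ("CALL", name) tuples stand in for compiled code, and only CHAR, * and integers are understood in immediate mode.

```python
def compile_body(source):
    """Toy compiler: turn the body of a definition into a list of instructions."""
    code, stack, compiling = [], [], True
    tokens = iter(source.split())
    for tok in tokens:
        if tok == "[":
            compiling = False                     # switch to immediate mode
        elif tok == "]":
            compiling = True                      # back to compile mode
        elif tok == "LITERAL":
            code.append(("LIT", stack.pop()))     # compile the top of stack as a constant
        elif compiling:
            if tok.lstrip("-").isdigit():
                code.append(("LIT", int(tok)))    # number: compile it as a literal
            else:
                code.append(("CALL", tok))        # word: compile a call, run it later
        elif tok == "CHAR":
            stack.append(ord(next(tokens)[0]))    # immediate: push the next word's first char
        elif tok == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)                   # immediate: multiply right now
        else:
            stack.append(int(tok))                # immediate: push a number
    return code


colon = compile_body("[ CHAR : ] LITERAL")        # a single LIT 58
folded = compile_body("[ 60 60 * ] LITERAL *")    # 60*60 is done once, at compile time
unfolded = compile_body("60 60 * *")              # both multiplications happen on every run
```

The second and third calls compile the same computation, but the folded version carries one precomputed literal where the unfolded one carries two literals and an extra call.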

Forth's design seems driven by efficiency. But as the quote says, that gives it some interesting properties.

The lack of grammar makes parsing trivial but doesn't seem to limit expressiveness.

The [ and ] words are particularly pretty. They allow Jones' trick--inserting character literals at compile time--and also let you peek inside a word while you're defining it, using something that looks like syntax but is really just two ordinary words. The same goes for ( ... ) comments, where ( is itself a word that skips the input up to the closing ).

The uniform syntax and stack-based operations also make for compact code. I guess the same can be said for Lisp/Scheme. Also like Lisp, the philosophy of Forth seems to be "here is a small set of orthogonal tools, use them to quickly shape your own universe!"

Implementing the dictionary as a linked list provides an elegant way to allow extension without breaking existing words. The unfortunate downside is that you can't use "virtual" words: have old words call new code. That keeps us from defining a debugging version of a word and watching output then deleting it and reverting to the old definition.

I played around with manually hacking a word's memory to change the call from an old word (MAX) to a new one (MAXD, MAX with debugging output added). The memory looked right when dumped, but the new code didn't seem to be called. Still, Forth is so low-level that I can imagine a Forth hacker coming up with a word--VIRTUAL?--that would make words late-binding.
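
What I have in mind for late binding is just one extra level of indirection, which is easy to sketch in Python even if I couldn't get it working in Forth. This is a toy model, not Forth: compile_early, compile_late, BIGGER-EARLY and BIGGER-LATE are hypothetical names, a plain dict stands in for the dictionary, and the debugging version records its arguments in a list instead of printing them.

```python
words = {}      # name -> callable; a stand-in for the Forth dictionary
trace = []      # what the debugging version of MAX has seen


def compile_early(source):
    """Capture the current definitions, as a normal colon definition does."""
    bound = [words[name] for name in source.split()]

    def run(stack):
        for word in bound:
            word(stack)
    return run


def compile_late(source):
    """Hypothetical VIRTUAL-style word: keep the names, look them up on every call."""
    names = source.split()

    def run(stack):
        for name in names:
            words[name](stack)
    return run


def max_(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a if a > b else b)


def maxd(stack):
    trace.append((stack[-2], stack[-1]))    # debugging output
    max_(stack)


words["MAX"] = max_
words["BIGGER-EARLY"] = compile_early("MAX")
words["BIGGER-LATE"] = compile_late("MAX")

words["MAX"] = maxd                 # swap in the debugging version (MAXD)

s = [3, 7]
words["BIGGER-EARLY"](s)            # silent: still bound to the original MAX
s.append(5)
words["BIGGER-LATE"](s)             # traced: picks up the debugging version
words["MAX"] = max_                 # revert; BIGGER-LATE goes quiet again
```

The cost is a lookup on every call, which is presumably why Forth binds early by default. Deferred words (DEFER and IS in the Forth 2012 standard) are built on a similar indirection.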

Some quick googling turned up a few articles:

But I didn't read any in depth, mostly because for me this is

The end

So Forth is pretty neat, but I don't think I'll be using it anytime soon. Its niche seems to be embedded programming or code micro-management. Neither is part of making interactive SVG graphs, which is what I'm supposed to be doing.

If I wanted to do any more, I'd probably take a look at Factor, which claims to be heavily influenced by Forth.