2007-11-13 / 13:10 /


Heard from the man himself:

from	Richard Jones
to	Nick Danger
date	Nov 12, 2007 6:22 AM

1- is a separate word in some FORTHs because it saves memory and can
be faster.

If your word contains '1 -' then that is represented in the compiled
description of the word as:

       LIT     1       -

say, 3 cells * 4 bytes = 12 bytes on a 32-bit processor.

But 1- is a single cell, so 4 bytes.
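
(Editorial sketch, not part of Rich's message.) The arithmetic above can be checked with a few lines of Python. The 4-byte cell size is the 32-bit assumption from the message; `compiled_size` and the list representation are hypothetical names used only for illustration.

```python
CELL_BYTES = 4  # assumption: 32-bit cells, as in the message

def compiled_size(cells):
    """Bytes used by a compiled word body: one cell per entry."""
    return len(cells) * CELL_BYTES

with_literal = ["LIT", 1, "-"]   # compiled form of  1 -
with_primitive = ["1-"]          # compiled form of  1-

print(compiled_size(with_literal))    # 12
print(compiled_size(with_primitive))  # 4
```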

The performance story is more complicated, but it's probably going to
be faster: LIT is an assembler primitive which pushes the next cell
(here the literal 1) on the stack, then - is another assembler
primitive which subtracts the
top two stack entries.  If 1- is written as an assembler primitive
then it can be just a single assembler instruction to decrement the
top of stack.  Furthermore because of the way that branch prediction
tables are implemented in modern processors it makes sense to reduce
the total number of FORTH words executed in your program, since modern
processors aren't designed to predict the returning indirect jmp from
a FORTH word[1].
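
(Editorial sketch, not part of Rich's message.) A toy inner interpreter makes the "fewer words executed" point concrete. This is plain Python, not real threaded code, and the function name `run` and the string opcodes are assumptions. It only counts how many primitives get dispatched, which stands in for the number of indirect jumps a real FORTH would take.

```python
def run(cells, stack):
    """Execute a compiled cell list; return how many primitives were dispatched."""
    ip = 0          # index of the next cell to execute
    dispatched = 0
    while ip < len(cells):
        op = cells[ip]
        ip += 1
        dispatched += 1
        if op == "LIT":
            stack.append(cells[ip])  # push the cell following LIT
            ip += 1                  # and skip over it
        elif op == "-":
            b = stack.pop()
            a = stack.pop()
            stack.append(a - b)
        elif op == "1-":
            stack[-1] -= 1           # decrement top of stack in place
        else:
            raise ValueError(f"unknown primitive: {op!r}")
    return dispatched

s1 = [10]
n1 = run(["LIT", 1, "-"], s1)   # two dispatches: LIT, then -
s2 = [10]
n2 = run(["1-"], s2)            # one dispatch
print(s1, n1)  # [9] 2
print(s2, n2)  # [9] 1
```

Both sequences leave the same value on the stack; the second gets there with half as many dispatches.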

On the other hand if you decide to go over the top with these
optimized primitives (1-, 2-, 3-, 4-, ..., 1+, 2+, 3+, 4+, etc.) there
is a danger that your code will get larger than the L1 or L2 cache
which can result in terrible performance penalties.  Or in the
embedded space your code might become larger than available RAM, so it
doesn't run at all.

Similarly in early compilers like FIG FORTH, the smaller numbers (1 2 3)
were represented as words written in assembler, and the reason again
is for space (4 bytes vs 8 bytes) and possibly for speed.  One
interesting thing about writing these small numbers as words is that
it doesn't require any code changes (unlike 1 - vs 1- where you
obviously need to change your code to realise the benefit of the
"optimised" 1- word).
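
(Editorial sketch, not part of Rich's message.) The reason no source change is needed is that a FORTH compiler typically looks a token up in the dictionary before trying to parse it as a number. The Python below imitates that order; `compile_token` and the set-based dictionaries are hypothetical, and real compilers emit codeword pointers rather than strings.

```python
def compile_token(token, dictionary):
    """Return the cells a toy compiler would emit for one source token."""
    if token in dictionary:          # dictionary lookup comes first
        return [token]               # one cell: pointer to the word
    return ["LIT", int(token)]       # otherwise a literal: two cells

plain = {"-"}                        # no number words defined
fig_like = {"-", "1", "2", "3"}      # small numbers defined as words

source = "1 -".split()
print([c for t in source for c in compile_token(t, plain)])     # ['LIT', 1, '-']
print([c for t in source for c in compile_token(t, fig_like)])  # ['1', '-']
```

The same source text `1 -` compiles to three cells in the first case and two in the second.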

Rich.

[1] But it would be possible to modify them to do so, assuming that
FORTH suddenly became really popular and processor designers decided
to optimize for it -- i.e. don't hold your breath:
http://www.complang.tuwien.ac.at/papers/ertl%26gregg03jilp.ps.gz .

--
Richard Jones
Red Hat

from	Nick Danger
to	Richard Jones
date	Nov 13, 2007 6:57 AM


Hi Rich,

Thanks for the info.  I did find out that "1-" is a unique word by
playing around (& confirmed by reading your compiler code), but hadn't
considered the space factors.  You mention FIG making small #'s into
words for a space savings.  Is this savings simply because the
compiler doesn't insert LIT?

Do you mind if I post your message as an update to the blog post?


Cheers,
Dave

from	Richard Jones
to	Nick Danger
date	Nov 13, 2007 7:19 AM

On Tue, Nov 13, 2007 at 06:57:06AM -0500, Nick Danger wrote:
> Thanks for the info.  I did find out that "1-" is a unique word by
> playing around (& confirmed by reading your compiler code), but hadn't
> considered the space factors.  You mention FIG making small #'s into
> words for a space savings.  Is this savings simply because the
> compiler doesn't insert LIT?

Yes.  Imagine a 32-bit processor, and say that the pointer to the
codeword of LIT is 0x1234, and the pointer to the codeword of 1 (the
word, not the number) is 0x4320.  Then the compiled representations
are:

       0x1234  0x1
       LIT     1

and:

       0x4320
       1
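
(Editorial sketch, not part of Rich's message.) The two layouts can be written out as raw bytes. The addresses 0x1234 and 0x4320 are the made-up ones from the message; little-endian byte order is an extra assumption made here.

```python
import struct

LIT_CODEWORD = 0x1234   # made-up address from the message
ONE_CODEWORD = 0x4320   # made-up address from the message

# "<I" = one little-endian unsigned 32-bit cell (byte order is an assumption)
literal_form = struct.pack("<II", LIT_CODEWORD, 1)  # LIT followed by the value 1
word_form = struct.pack("<I", ONE_CODEWORD)         # just the word 1

print(len(literal_form), literal_form.hex())  # 8 3412000001000000
print(len(word_form), word_form.hex())        # 4 20430000
```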

You can also do evilness like defining the word 1 to push 2 on the
stack and cackling insanely while programmers try to work out what is
going on.
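
(Editorial sketch, not part of Rich's message.) The prank works because the word lookup wins over number parsing. A toy interpreter in Python shows the effect; `interpret` and the dict-of-functions dictionary are hypothetical stand-ins for a real FORTH dictionary.

```python
def interpret(source, dictionary):
    """Run tokens against a stack: words first, numbers as a fallback."""
    stack = []
    for token in source.split():
        if token in dictionary:
            dictionary[token](stack)
        else:
            stack.append(int(token))
    return stack

def add(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)

honest = {"+": add}
evil = {"+": add, "1": lambda stack: stack.append(2)}  # the word 1 pushes 2

print(interpret("1 1 +", honest))  # [2]
print(interpret("1 1 +", evil))    # [4]
```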

> Do you mind if I post your message as an update to the blog post?

No problem.

Rich.