Adventures in Forth, part 6 (email from Richard Jones)
2007-11-13 / 13:10 / dave
from Richard Jones
to Nick Danger
date Nov 12, 2007 6:22 AM
1- is a separate word in some FORTHs because it saves memory and can
be faster.
If your word contains '1 -' then the compiled definition of the word
contains three cells:
LIT 1 -
which is 3 * 4 = 12 bytes on a 32 bit processor.
But 1- compiles to a single cell, so 4 bytes.
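The cell counts above can be checked with a toy model. This is only a sketch in Python, not how any real FORTH is implemented: the names compile_words and CELL_BYTES are made up for illustration, and a "cell" is just a list entry assumed to occupy 4 bytes.

```python
CELL_BYTES = 4  # assumption: 32-bit cells, as in the email


def compile_words(source, primitives):
    """Compile whitespace-separated tokens into a list of cells.

    A token found in `primitives` compiles to one cell (standing in for a
    pointer to that word). Any other token is treated as a number and
    compiles to two cells: LIT followed by the value.
    """
    cells = []
    for token in source.split():
        if token in primitives:
            cells.append(token)
        else:
            cells.extend(["LIT", int(token)])
    return cells


primitives = {"-", "1-"}  # hypothetical dictionary with both words defined
long_form = compile_words("1 -", primitives)
short_form = compile_words("1-", primitives)

print(long_form, len(long_form) * CELL_BYTES)    # ['LIT', 1, '-'] 12
print(short_form, len(short_form) * CELL_BYTES)  # ['1-'] 4
```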
The performance story is more complicated, but it's probably going to
be faster: LIT is an assembler primitive which pushes the next cell of
the compiled definition onto the stack, then - is another assembler
primitive which subtracts the top stack entry from the one below it.
If 1- is written as an assembler primitive then it can be just a
single assembler instruction to decrement the top of stack.
Furthermore because of the way that branch prediction
tables are implemented in modern processors it makes sense to reduce
the total number of FORTH words executed in your program, since modern
processors aren't designed to predict the returning indirect jmp from
a FORTH word[1].
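To make the "fewer words executed" point concrete, here is a rough Python sketch of an inner interpreter that counts how many primitives it dispatches. The function name run and the string-based cells are assumptions made for illustration; a real FORTH dispatches through indirect jumps in assembler, which is exactly the part this model cannot show.

```python
def run(cells, stack):
    """Execute a compiled cell list and return the number of dispatches.

    Each dispatch stands in for one indirect jump in a real threaded
    interpreter. Only LIT, - and 1- are modelled here.
    """
    dispatches = 0
    ip = 0  # index of the next cell to execute
    while ip < len(cells):
        word = cells[ip]
        ip += 1
        dispatches += 1
        if word == "LIT":
            stack.append(cells[ip])  # push the cell that follows LIT
            ip += 1                  # and skip over it
        elif word == "-":
            top = stack.pop()
            below = stack.pop()
            stack.append(below - top)
        elif word == "1-":
            stack[-1] -= 1           # decrement top of stack in place
        else:
            raise ValueError(f"unknown word: {word!r}")
    return dispatches


stack_a = [10]
count_a = run(["LIT", 1, "-"], stack_a)  # two dispatches: LIT and -

stack_b = [10]
count_b = run(["1-"], stack_b)           # one dispatch: 1-

print(count_a, stack_a)
print(count_b, stack_b)
```

Both versions leave 9 on the stack; the second gets there with half as many dispatches.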
On the other hand if you decide to go over the top with these
optimized primitives (1-, 2-, 3-, 4-, ..., 1+, 2+, 3+, 4+, etc.) there
is a danger that your code will get larger than the L1 or L2 cache
which can result in terrible performance penalties. Or in the
embedded space your code might become larger than available RAM, so it
doesn't run at all.
Similarly in early compilers like FIG FORTH, the smaller numbers (1 2 3)
were represented as words written in assembler, and the reason again
is for space (4 bytes vs 8 bytes) and possibly for speed. One
interesting thing about writing these small numbers as words is that
it doesn't require any code changes (unlike 1 - vs 1- where you
obviously need to change your code to realise the benefit of the
"optimised" 1- word).
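The "no code changes" point can be sketched the same way. The following Python model assumes, as FORTH interpreters generally do, that a token is looked up in the dictionary first and only parsed as a number if no word of that name exists. The names compile_words and CELL_BYTES are hypothetical and the 4-byte cell is an assumption carried over from the text.

```python
CELL_BYTES = 4  # assumption: 32-bit cells


def compile_words(source, dictionary):
    """Compile tokens to cells, checking the dictionary before parsing numbers."""
    cells = []
    for token in source.split():
        if token in dictionary:
            cells.append(token)                # one cell: pointer to the word
        else:
            cells.extend(["LIT", int(token)])  # two cells: LIT plus the value
    return cells


source = "1 2 +"  # the same source text is compiled twice

plain = compile_words(source, {"+"})
with_number_words = compile_words(source, {"+", "1", "2", "3"})

print(plain, len(plain) * CELL_BYTES)
print(with_number_words, len(with_number_words) * CELL_BYTES)
```

With only + in the dictionary the source compiles to 5 cells (20 bytes); once 1, 2 and 3 exist as words the identical source compiles to 3 cells (12 bytes).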
Rich.
[1] But it would be possible to modify them to do so, assuming that
FORTH suddenly became really popular and processor designers decided
to optimize for it -- i.e. don't hold your breath:
http://www.complang.tuwien.ac.at/papers/ertl%26gregg03jilp.ps.gz .
--
Richard Jones
Red Hat
from Nick Danger
to Richard Jones
date Nov 13, 2007 6:57 AM
Hi Rich,
Thanks for the info. I did find out that "1-" is a unique word by
playing around (& confirmed by reading your compiler code), but hadn't
considered the space factors. You mention FIG making small #'s into
words for a space savings. Is this savings simply because the
compiler doesn't insert LIT?
Do you mind if I post your message as an update to the blog post?
Cheers,
Dave
from Richard Jones
to Nick Danger
date Nov 13, 2007 7:19 AM
On Tue, Nov 13, 2007 at 06:57:06AM -0500, Nick Danger wrote:
> Thanks for the info. I did find out that "1-" is a unique word by
> playing around (& confirmed by reading your compiler code), but hadn't
> considered the space factors. You mention FIG making small #'s into
> words for a space savings. Is this savings simply because the
> compiler doesn't insert LIT?
Yes. Imagine a 32 bit processor, and say that the pointer to the
codeword of LIT is 0x1234, and the pointer to the codeword of 1 (the
word, not the number) is 0x4320. Then the compiled representations
are:
0x1234 0x1
LIT    1
and:
0x4320
1
You can also do evilness like defining the word 1 to push 2 on the
stack and cackling insanely while programmers try to work out what is
going on.
> Do you mind if I post your message as an update to the blog post?
No problem.
Rich.
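The "evilness" is easy to reproduce in a toy model. This Python sketch is an assumption-laden stand-in for a real FORTH dictionary: make_dictionary and interpret are invented names, and words are plain Python functions that operate on a list used as the stack.

```python
def make_dictionary():
    """Return a tiny dictionary where 1 is an ordinary word that pushes 1."""
    return {
        "+": lambda st: st.append(st.pop() + st.pop()),
        "1": lambda st: st.append(1),
    }


def interpret(source, dictionary):
    """Run each token: dictionary words first, numeric literals otherwise."""
    stack = []
    for token in source.split():
        if token in dictionary:
            dictionary[token](stack)
        else:
            stack.append(int(token))
    return stack


words = make_dictionary()
honest = interpret("1 1 +", words)

# Redefine the word 1 so that it pushes 2 instead.
words["1"] = lambda st: st.append(2)
evil = interpret("1 1 +", words)

print(honest)  # [2]
print(evil)    # [4]
```

Because the dictionary is consulted before number parsing, the source text "1 1 +" never changes, yet its result does.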
