2007-11-13 / 13:10 / dave
from Richard Jones
to Nick Danger
date Nov 12, 2007 6:22 AM

1- is a separate word in some FORTHs because it saves memory and can be faster. If your word contains '1 -' then that is represented in the compiled description of the word as:

    LIT 1 -

say, 4 * 3 = 12 bytes on a 32 bit processor. But 1- is a single cell, so 4 bytes.

The performance story is more complicated, but it's probably going to be faster: LIT is an assembler primitive which pushes the next word on the stack, then - is another assembler primitive which subtracts the top two stack entries. If 1- is written as an assembler primitive then it can be just a single assembler instruction to decrement the top of stack.

Furthermore because of the way that branch prediction tables are implemented in modern processors it makes sense to reduce the total number of FORTH words executed in your program, since modern processors aren't designed to predict the returning indirect jmp from a FORTH word [1].

On the other hand if you decide to go over the top with these optimized primitives (1-, 2-, 3-, 4-, ..., 1+, 2+, 3+, 4+, etc.) there is a danger that your code will get larger than the L1 or L2 cache which can result in terrible performance penalties. Or in the embedded space your code might become larger than available RAM, so it doesn't run at all.

Similarly in early compilers like FIG FORTH, the smaller numbers (1 2 3) were represented as words written in assembler, and the reason again is for space (4 bytes vs 8 bytes) and possibly for speed.

One interesting thing about writing these small numbers as words is that it doesn't require any code changes (unlike 1 - vs 1- where you obviously need to change your code to realise the benefit of the "optimised" 1- word).

Rich.

[1] But it would be possible to modify them to do so, assuming that FORTH suddenly became really popular and processor designers decided to optimize for it -- ie. don't hold your breath:
http://www.complang.tuwien.ac.at/papers/ertl%26gregg03jilp.ps.gz
-- Richard Jones Red Hat
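
The size and dispatch-count argument in the message above can be illustrated with a toy model. This is only a sketch in Python, not how any real FORTH is implemented: the names `run`, `size_in_bytes`, `lit`, `minus` and `one_minus` are hypothetical, a "thread" is modelled as a plain list of cells, and a 4-byte cell is assumed as in the message.

```python
CELL_BYTES = 4  # assumption: 32-bit cells, as in the message above

def lit(vm):
    # LIT: push the cell that follows it in the thread, then skip over it
    vm["stack"].append(vm["thread"][vm["ip"]])
    vm["ip"] += 1

def minus(vm):
    # -: subtract the top stack entry from the one beneath it
    b = vm["stack"].pop()
    a = vm["stack"].pop()
    vm["stack"].append(a - b)

def one_minus(vm):
    # 1-: decrement the top of stack in place
    vm["stack"][-1] -= 1

def run(thread, stack):
    """Execute a thread; return (final stack, number of primitives dispatched)."""
    vm = {"thread": thread, "ip": 0, "stack": list(stack)}
    dispatches = 0
    while vm["ip"] < len(thread):
        prim = thread[vm["ip"]]
        vm["ip"] += 1
        prim(vm)          # one dispatch per primitive executed
        dispatches += 1
    return vm["stack"], dispatches

def size_in_bytes(thread):
    # every entry in the thread, word or inline literal, occupies one cell
    return len(thread) * CELL_BYTES

long_form = [lit, 1, minus]   # compiled form of: 1 -
short_form = [one_minus]      # compiled form of: 1-

print(size_in_bytes(long_form), run(long_form, [10]))
print(size_in_bytes(short_form), run(short_form, [10]))
```

Both threads leave the same value on the stack, but the first occupies three cells and dispatches two primitives while the second occupies one cell and dispatches one. The model says nothing about cache or branch-prediction effects, which depend on the real hardware.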
from Nick Danger
to Richard Jones
date Nov 13, 2007 6:57 AM

Hi Rich,

Thanks for the info. I did find out that "1-" is a unique word by playing around (& confirmed by reading your compiler code), but hadn't considered the space factors. You mention FIG making small #'s into words for a space savings. Is this savings simply because the compiler doesn't insert LIT?

Do you mind if I post your message as an update to the blog post?

Cheers,
Dave
from Richard Jones
to Nick Danger
date Nov 13, 2007 7:19 AM

On Tue, Nov 13, 2007 at 06:57:06AM -0500, Nick Danger wrote:
> Thanks for the info. I did find out that "1-" is a unique word by
> playing around (& confirmed by reading your compiler code), but hadn't
> considered the space factors. You mention FIG making small #'s into
> words for a space savings. Is this savings simply because the
> compiler doesn't insert LIT?

Yes. Imagine a 32 bit processor, and say that the pointer to the codeword of LIT is 0x1234, and the pointer to the codeword of 1 (the word, not the number) is 0x4320. Then the compiled representations are:

    0x1234 0x1      LIT 1

and:

    0x4320          1

You can also do evilness like defining the word 1 to push 2 on the stack and cackling insanely while programmers try to work out what is going on.

> Do you mind if I post your message as an update to the blog post?

No problem.

Rich.
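
The two compiled representations in the reply above, and the prank of redefining 1, can be mimicked with a small sketch. It is written in Python purely for illustration. The addresses 0x1234 and 0x4320 come from the reply; everything else (`compile_token`, `execute`, the dictionary and primitive tables) is a hypothetical stand-in and not the structure of FIG FORTH or any other real compiler.

```python
LIT_ADDR = 0x1234  # codeword address of LIT, taken from the example above
ONE_ADDR = 0x4320  # codeword address of the word 1, taken from the example above
CELL_BYTES = 4     # assumption: 32-bit cells

def compile_token(token, dictionary):
    """Return the list of cells one source token compiles to (hypothetical helper)."""
    if token in dictionary:
        # A dictionary word compiles to a single cell: its codeword address.
        return [dictionary[token]]
    # Otherwise treat the token as a number: LIT followed by the value.
    return [LIT_ADDR, int(token, 0)]

def execute(cells, primitives):
    """Run compiled cells against a table of primitive behaviours; return the stack."""
    stack, ip = [], 0
    while ip < len(cells):
        cell = cells[ip]
        ip += 1
        if cell == LIT_ADDR:
            stack.append(cells[ip])  # push the inline literal and skip it
            ip += 1
        else:
            primitives[cell](stack)
    return stack

without_word = {}            # no word named 1, so "1" compiles as a literal
with_word = {"1": ONE_ADDR}  # FIG-style: 1 is a word in the dictionary

as_literal = compile_token("1", without_word)
as_word = compile_token("1", with_word)

honest = {ONE_ADDR: lambda stack: stack.append(1)}
evil = {ONE_ADDR: lambda stack: stack.append(2)}  # the prank: 1 pushes 2

print([hex(c) for c in as_literal], len(as_literal) * CELL_BYTES)
print([hex(c) for c in as_word], len(as_word) * CELL_BYTES)
print(execute(as_word, honest), execute(as_word, evil))
```

The source token is the same in both cases; only the dictionary differs, which is why the saving needs no changes to the program text. The same property is what makes the prank possible, since the compiled cell 0x4320 carries no hint of what value it will push.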