Adventures in Forth, part 4 (poking around the stack)
2007-10-28 / 13:51 / dave
Finally made it through Jones’ Forth compiler, both the assembly and the Forth. It was surprisingly accessible, given my complete lack of x86 assembly knowledge.
I’ll type up some general thoughts later, but here are a few functions I wrote to look at the stack while playing around. I don’t claim these functions are Forthonic* or even useful; an equivalent may already exist:
(In gforth v0.6.2. Bold indicates input, normal text is output.)
( empty stack and print all values top-down ) ok
: empty_stack ( -- clears stack ) begin depth 0> while . repeat ; ok
( works on empty stack ) ok
empty_stack ok
( prints stack ) ok
1 2 3 4 5 empty_stack 5 4 3 2 1 ok
( stack is now empty ) ok
. 2142846968
*the terminal*:12: Stack underflow
.
^
Backtrace:
( print all values on stack bottom-up which was easier for me to read ) ok
: print_stack ( -- ) depth begin dup 0> while dup pick . 1- repeat drop ; ok
( works on empty stack ) ok
print_stack ok
( prints non-empty stack ) ok
1 2 3 4 5 print_stack 1 2 3 4 5 ok
( stack has not been emptied ) ok
print_stack 1 2 3 4 5 ok
I have to admit, these were pretty fun to write. Using the stack as an implicit variable store also makes print_stack a pretty useful debugging tool:
( max with debug info ) ok
: maxd ( x y -- x or y ) 2dup > print_stack if drop else nip then ; ok
3 4 maxd 3 4 0 ok
empty_stack 4 ok
4 3 maxd 4 3 -1 ok
So we can check that > is working! Ok, so that’s not very useful, but since variables are “always” (as far as I know) on the stack, we can move print_stack around inside a definition and see the intermediate values at each step.
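On the question of whether an equivalent already exists: the standard word .s (from the optional programming-tools word set) prints the stack without consuming it, so it can stand in for print_stack. Here is a minimal sketch; maxd2 is a made-up name for a variant of maxd, and the exact format .s prints in differs between Forth systems.

```forth
\ maxd2 is a hypothetical variant of maxd that calls the standard
\ word .s instead of print_stack. .s leaves the stack untouched.
: maxd2 ( x y -- max )
  2dup >          \ x y flag
  .s              \ show x, y and the flag without consuming them
  if drop else nip then ;

3 4 maxd2 .       \ .s shows the stack inside maxd2, then . prints 4
```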
Looking at words
Forth’s [ and ] operators switch between immediate (interpreter) and compile mode: [ drops into immediate mode and ] returns to compile mode. What if we combine them with print_stack?
( switch to hex mode ) ok
hex ok
: test [ print_stack ] ; 0 7F30D3EC 7F30D3F8 0 ok
What the heck are those numbers? : starts a word definition (and enters compile mode), and test specifies the name. [ switches to immediate mode, so print_stack prints the stack while we’re in the middle of defining the word. Finally, ] ; switches back to compile mode and ends the word.
So 0 7F30D3EC 7F30D3F8 0 is placed on the stack at the start of a word definition. The second and third numbers change each time:
: test [ print_stack ] ; 0 7F30D404 7F30D410 0 redefined test ok
: test [ print_stack ] ; 0 7F30D41C 7F30D428 0 redefined test ok
So they look like memory addresses allocated for the new word. latest puts the memory address of the most recently defined word on the stack.
: test [ print_stack ] ; 0 7F30D4FC 7F30D508 0 7F30D4FC ok
latest 32 dump
7F30D4FC: CC D4 30 7F 04 00 00 80 - 74 65 73 74 B4 10 40 00 ..0.....test..@.
7F30D50C: 00 00 00 00 1B E8 AF 7F - 00 00 00 00 00 00 00 00 ................
7F30D51C: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
7F30D52C: 00 00 - ..
ok
If we compare this to Jones’ diagram of Forth words:
| LINK POINTER | LENGTH/ | NAME    | DEFINITION...
|              | FLAGS   |         |
+  (4 bytes)   +  byte   + n bytes +
The memory contains a link pointer (7F30D4CC; don’t forget x86 is little endian), the length (4, followed by 00 00 80, which may be padding or flag bits), and the name (“test”). The third number printed by [ print_stack ], 7F30D508, is the address just after the name, and the value stored there is 004010B4. I’m not sure exactly what this is, but if we make a few more functions:
: test2 [ print_stack ] ; latest . 0 7F30D514 7F30D528 0 7F30D514 ok
latest 32 dump
7F30D514: FC D4 30 7F 05 00 00 80 - 74 65 73 74 32 20 20 20 ..0.....test2
7F30D524: 20 20 20 20 B4 10 40 00 - 00 00 00 00 37 E8 AF 7F ..@.....7...
7F30D534: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
7F30D544: 00 00 - ..
ok
: test3 [ print_stack ] test2 test ; latest . 0 7F30D558 7F30D568 0 7F30D558 ok
latest 32 dump
7F30D558: 14 D5 30 7F 05 00 00 80 - 74 65 73 74 33 20 20 20 ..0.....test3
7F30D568: B4 10 40 00 00 00 00 00 - 7B E8 AF 7F 30 D5 30 7F ..@.....{...0.0.
7F30D578: A3 E8 AF 7F 10 D5 30 7F - CB E8 AF 7F 00 00 00 00 ......0.........
7F30D588: 00 00 - ..
ok
Every word’s name is followed by the same value, 004010B4, so it looks like a name terminator or the start of the definition. Jones talks about DOCOL, the Forth “interpreter” that sets up memory before running the body of the function. That could be what this is, but I’m not sure.
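Two of the values in the test3 dump stand out. The first, 7F30D530 (stored as 30 D5 30 7F), is 8 bytes past test2’s 004010B4 cell at 7F30D528, so it sits just past the test2 header, at the start of its definition. The second, 7F30D510 (stored as 10 D5 30 7F), is the same distance past test’s 004010B4 cell at 7F30D508, so it is the start of the test definition. So a Forth word is a header and then a series of pointers to other Forth words.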
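The header fields can also be read directly instead of being picked out of a dump by eye. This is only a sketch: .header is a made-up name, and it assumes the layout seen in the gforth 0.6.2 dumps above (one cell of link pointer, one cell whose low byte is the name length, then the name), on a little-endian machine. Other Forths, and other gforth versions, may lay headers out differently.

```forth
\ .header is a hypothetical helper, not a gforth word.
\ Assumed layout: link cell, count cell (low byte = length), name.
: .header ( hdr -- )
  dup @ u.                   \ link pointer: address of the previous header
  dup cell+ c@ .             \ name length, taken from the low byte
  dup 2 cells +              \ hdr name-addr
  swap cell+ c@              \ name-addr len
  type ;                     \ print the name

hex
: test ;
latest .header               \ link address, then the length and name
```

If the layout assumption holds, the last line should print the link address followed by 4 and test.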
So, what’s your point?
Well, nothing really. Forth’s code structure is described in Jones’ writeup, so I didn’t learn anything novel. But it is a neat interactive way to poke around and see what Forth is doing internally.
* “FORTH is case-sensitive. Use capslock!” I guess using lowercase is starting off pretty non-Forthonic.
PS: One more stupid trick: since the start of every Forth word is a pointer to the previous word (the dictionary is a linked list), you can use latest and @ (dereference) to step backwards through the dictionary:
: test ; ok
: test2 ; ok
: test3 ; ok
latest 16 dump
7F30D2EC: CC D2 30 7F 05 00 00 80 - 74 65 73 74 33 20 20 20 ..0.....test3
ok
latest @ 16 dump
7F30D2CC: B0 D2 30 7F 05 00 00 80 - 74 65 73 74 32 20 20 20 ..0.....test2
ok
latest @ @ 16 dump
7F30D2B0: A0 CF 30 7F 04 00 00 80 - 74 65 73 74 20 20 20 20 ..0.....test
ok
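Typing a growing chain of @ gets old quickly, so the same walk can be wrapped in a loop. This is a rough sketch, not a tested tool: walk and show-name are made-up names, and the code assumes the header layout visible in the dumps (link cell, then a count cell whose low byte is the name length, then the name) on little-endian gforth 0.6.2.

```forth
\ show-name and walk are hypothetical helpers, not gforth words.
: show-name ( hdr -- )
  dup 2 cells +              \ hdr name-addr
  swap cell+ c@              \ name-addr len (low byte of the count cell)
  type ;

: walk ( n -- )              \ print the names of the n newest words
  latest swap 0 ?do
    dup 0= if leave then     \ stop if the chain of links runs out
    dup cr show-name
    @                        \ follow the link to the previous header
  loop drop ;

: test ;  : test2 ;  : test3 ;
3 walk
```

Define walk before the words you want listed, since latest is read when walk runs. Under the layout assumption, 3 walk should list test3, test2 and test, newest first.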
