Write You a Forth, 0x07 ----------------------- :date: 2018-03-01 19:31 :tags: wyaf, forth At this point, I've finished most of the nucleus layer. All that's left to implement are ``EXIT``, ``I``, and ``J`` --- the first requires better execution support, which I'll talk about at the end. The other two, I'm not so sure about yet. However, I made some large changes, so let's dive in. Here's the new Linux definitions file:: #ifndef __KF_LINUX_DEFS_H__ #define __KF_LINUX_DEFS_H__ #include #include typedef int32_t KF_INT; typedef uint32_t KF_UINT; typedef int64_t KF_LONG; typedef uintptr_t KF_ADDR; constexpr uint8_t STACK_SIZE = 128; constexpr size_t ARENA_SIZE = 65535; #endif I've also updated the main ``defs.h`` file to move some constants there:: #ifndef __KF_DEFS_H__ #define __KF_DEFS_H__ #ifdef __linux__ #include "linux/defs.h" #else typedef int KF_INT; typedef long KF_LONG; constexpr uint8_t STACK_SIZE = 16; #endif constexpr size_t MAX_TOKEN_LENGTH = 16; constexpr size_t dshift = (sizeof(KF_INT) * 8) - 1; static inline KF_INT mask(size_t bits) { KF_INT m = 0; for (size_t i = 0; i < bits; i++) { m += 1 << i; } return m; } #endif // __KF_DEFS_H__ Addresses ^^^^^^^^^ The first major change is the addition of the ``KF_ADDR`` type. This is needed to implement the memory manipulation words. I've added some additional utility functions for pushing and popping addresses from the data stack; they're stored as double numbers:: static bool pop_addr(System *sys, KF_ADDR *a) { KF_LONG b; if (!pop_long(sys, &b)) { // Status is already set. return false; } *a = static_cast(b); sys->status = STATUS_OK; return true; } static bool push_addr(System *sys, KF_ADDR a) { KF_LONG b = static_cast(a); if (!push_long(sys, b)) { // Status is already set. return false; } sys->status = STATUS_OK; return true; } Now I can actually implement ``!`` and so forth:: static bool store(System *sys) { KF_ADDR a = 0; // address KF_INT b = 0; // value KF_LONG c = 0; // temporary if (!pop_long(sys, &c)) { sys->status = STATUS_STACK_UNDERFLOW; return false; } a = static_cast(c); if (!sys->dstack.pop(&b)) { sys->status = STATUS_STACK_UNDERFLOW; return false; } *((KF_INT *)a) = b; sys->status = STATUS_OK; return true; } There's definitely a sense of finangling here. The return stack ^^^^^^^^^^^^^^^^ The ``>R`` series of words requires a return stack, so I've added a ``Stack`` field to the ``System`` structure. The address stack manipulation functions I introduced earlier only operate on the data stack, so these require some extra verbosity; for example:: static bool to_r(System *sys) { KF_INT a; if (!sys->dstack.pop(&a)) { sys->status = STATUS_STACK_UNDERFLOW; return false; } if (!sys->rstack.push(static_cast(a))) { sys->status = STATUS_RSTACK_OVERFLOW; return false; } sys->status = STATUS_OK; return true; } Adding the ``rstack`` field also required adding return stack over- and underflow status codes. The arena ^^^^^^^^^ As I was reading through the words left to implement, I found I'd have to implement ``COUNT``. This provides some support for counted strings, which are implemented as a byte array where the first byte is the length of the string. In my mind, this has two implications: 1. There needs to be some area of user memory that's available for storing strings and the like. I've termed this the arena, and it's a field in the ``System`` structure now. 2. There needs to be a Word type for addresses. So now I have this definition for the ``System`` structure:: typedef struct _System { Stack dstack; Stack rstack; IO *interface; Word *dict; SYS_STATUS status; uint8_t arena[ARENA_SIZE]; } System; The ``Address`` type seems like it's easy enough to implement:: class Address : public Word { public: ~Address() {}; Address(const char *name, size_t namelen, Word *head, KF_ADDR addr); bool eval(System *); Word *next(void); bool match(struct Token *); void getname(char *, size_t *); private: char name[MAX_TOKEN_LENGTH]; size_t namelen; Word *prev; KF_ADDR addr; }; And the implementation:: Address::Address(const char *name, size_t namelen, Word *head, KF_ADDR addr) : prev(head), addr(addr) { memcpy(this->name, name, namelen); this->namelen = namelen; } bool Address::eval(System *sys) { KF_INT a; a = static_cast(this->addr & mask(dshift)); if (!sys->dstack.push(a)) { return false; } a = static_cast((this->addr >> dshift) & mask(dshift)); if (!sys->dstack.push(a)) { return false; } return true; } Word * Address::next(void) { return this->prev; } bool Address::match(struct Token *token) { return match_token(this->name, this->namelen, token->token, token->length); } void Address::getname(char *buf, size_t *buflen) { memcpy(buf, this->name, this->namelen); *buflen = namelen; } It's kind of cool to see this at work:: $ ./kforth kforth interpreter ? arena drop 2+ 0 @ . 0 ok. ? arena drop 2+ 0 4 rot rot ! . stack underflow (error code 2). ? arena drop 2+ 0 @ . 4 ok. Unsigned numbers ^^^^^^^^^^^^^^^^ This is really just a bunch of casting:: static bool u_dot(System *sys) { KF_INT a; KF_UINT b; if (!sys->dstack.pop(&a)) { sys->status = STATUS_STACK_UNDERFLOW; return false; } b = static_cast(a); write_unum(sys->interface, b); sys->interface->newline(); sys->status = STATUS_OK; return true; } Execute ^^^^^^^ Implementing ``execute`` was fun, but it begins to highlight the limits of my approach so far. EXECUTE addr -- 79 The word definition indicated by addr is executed. An error condition exists if addr is not a compilation address For example:: (gdb) break 83 Breakpoint 1 at 0x4077cf: file kforth.cc, line 83. (gdb) run Starting program: /home/kyle/code/kforth/kforth Breakpoint 1, main () at kforth.cc:83 83 Console interface; (gdb) p sys.dict->next()->next()->next()->next() $1 = (Word *) 0x7e45b0 (gdb) p (Builtin) *sys.dict->next()->next()->next()->next() $2 = { = {_vptr$Word = 0x55f220 }, name = "+", '\000' , namelen = 1, prev = 0x7e4570, fun = 0x406eb0 } (gdb) p/u 0x7e45b0 $3 = 8275376 (gdb) c Continuing. kforth interpreter ? 2 3 8275376 0 execute . executing word: + 5 ok. In case the ``gdb`` example wasn't clear, I printed the address of the fourth entry in the dictionary, which happens to be ``+``. I push the numbers 2 and 3 onto the stack, then push the address of ``+`` on the stack, then call execute. As the dot function shows, it executes correctly, pushing the resulting 5 onto the stack. Which leads me to the next section, wherein I need to rethink the execution model. The execution model ^^^^^^^^^^^^^^^^^^^ In most of the Forth implementations I've, the dictionary is a list of contiguous pointers to words. That is, something like:: Word *dict[ARRAY_SIZE] = { 0 }; dict[0] = new Builtin((const char *)"+", 1, add); dict[1] = new Builtin((const char *)"-", 1, sub); And so forth. Or, maybe, :: Word dict[ARRAY_SIZE] = { Builtin((const char *)"+", 1, add), Builtin((const char *)"-", 1, sub) }; So some questions: + How big should this array be? + How do I handle different word types? + How do I transfer execution to functions? I'm thinking something like: + the parser looks up a word, and pushes the parser function's address onto the return stack. + the parser jumps to the word's function pointer and executes it. + the function pointer jumps back to the last address on the return stack. The second step could involve chaining multiple functions in there. I don't know how to transfer execution to a random address in memory (maybe ``setjmp`` and ``longjmp``), or how I'm going to push the current word's address onto the stack. I guess include some sort of additional fields in the system type. This starts to jump into the realm of an operating system or virtual machine; the OS approach makes more sense for embedded system.