Write You a Forth, 0x08
-----------------------

:date: 2018-03-05 21:42
:tags: wyaf, forth

After reading some more in Threaded Interpreted Languages (TIL_ from now
on), I've decided to start over.

.. _TIL: http://wiki.c2.com/?ThreadedInterpretiveLanguage

Some design choices that didn't really work out:

+ the system structure
+ not making it easier to test building for different platforms
+ my linked list approach to the dictionary
+ my class-based approach to words

I get the distinct feeling that I could (maybe should) be doing this in
C99, so I think I'll switch to that.

The new design
^^^^^^^^^^^^^^

I'll need to provide a few initial pieces:

1. eval.c
2. stack.c
3. the platform parts

I'll skip the parser at first and hand hack some things, then try to port
over my I/O layer from before. Also, talking to Steve got me to think
about doing this in C99, because a lot of the fun I've had with computers
in the past involved hacking on C projects. So, C99 it is.

Platforms
^^^^^^^^^

I've elected to set a new define type, ``PLATFORM_$PLATFORM``. The
Makefile sets this, so it's easier now to test building for different
platforms.
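As a rough illustration of how code can consume such a define, here's a small sketch. The ``kf_platform_name`` helper and the ``KF_PLATFORM_NAME`` macro are made up for this example and aren't part of the real code; it only assumes the build passes something like ``-DPLATFORM_pc`` to the compiler.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the original code: report which
 * platform the build selected. PLATFORM_pc is assumed to come from the
 * compiler command line (e.g. -DPLATFORM_pc); any other build falls
 * through to the default platform.
 */
#ifdef PLATFORM_pc
#define KF_PLATFORM_NAME "pc"
#else
#define KF_PLATFORM_NAME "default"
#endif

const char *kf_platform_name(void)
{
    static const char name[] = KF_PLATFORM_NAME;

    /* a platform name should never be the empty string */
    assert(sizeof(name) > 1);
    return name;
}
```

Building the same file with and without ``-DPLATFORM_pc`` is a cheap way to confirm that the Makefile is passing the define through.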
Here are the current top-level definitions::

    #ifndef __KF_DEFS_H__
    #define __KF_DEFS_H__

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #ifdef PLATFORM_pc
    #include "pc/defs.h"
    #else
    #include "default/defs.h"
    #endif

    #endif /* __KF_DEFS_H__ */

The ``pc/defs.h`` header::

    #ifndef __KF_PC_DEFS_H__
    #define __KF_PC_DEFS_H__

    typedef int32_t KF_INT;
    typedef uintptr_t KF_ADDR;

    static const size_t DSTACK_SIZE = 65535;
    static const size_t RSTACK_SIZE = 65535;
    static const size_t DICT_SIZE = 65535;

    #endif /* __KF_PC_DEFS_H__ */

The new stack
^^^^^^^^^^^^^

I'll start with a much simplified stack interface::

    #ifndef __KF_STACK_H__
    #define __KF_STACK_H__

    /* data stack interaction */
    bool   dstack_pop(KF_INT *);
    bool   dstack_push(KF_INT);
    bool   dstack_get(size_t, KF_INT *);
    size_t dstack_size(void);
    void   dstack_clear(void);

    /* return stack interaction */
    bool   rstack_pop(KF_ADDR *);
    bool   rstack_push(KF_ADDR);
    bool   rstack_get(size_t, KF_ADDR *);
    size_t rstack_size(void);
    void   rstack_clear(void);

    #endif /* __KF_STACK_H__ */

The implementation is simple enough; the ``rstack`` interface is similar
enough to the ``dstack`` that I'll just show the first::

    #include "defs.h"
    #include "stack.h"

    static KF_INT dstack[DSTACK_SIZE] = {0};
    static size_t dstack_len = 0;

    /* pop the top of the stack into *a; fails on an empty stack */
    bool dstack_pop(KF_INT *a)
    {
        if (dstack_len == 0) {
            return false;
        }

        *a = dstack[--dstack_len];
        return true;
    }

    /* push a onto the stack; fails when the stack is full */
    bool dstack_push(KF_INT a)
    {
        if (dstack_len == DSTACK_SIZE) {
            return false;
        }

        dstack[dstack_len++] = a;
        return true;
    }

    /* copy the i-th entry from the top (0 is the top) into *a */
    bool dstack_get(size_t i, KF_INT *a)
    {
        if (i >= dstack_len) {
            return false;
        }

        *a = dstack[dstack_len - i - 1];
        return true;
    }

    size_t dstack_size(void)
    {
        return dstack_len;
    }

    void dstack_clear(void)
    {
        dstack_len = 0;
    }

Words
^^^^^

Reading TIL has given me some new ideas on how to implement words::

    #ifndef __KF_WORD_H__
    #define __KF_WORD_H__

    /*
     * Every word in the dictionary starts with a header:
     *     uint8_t   length;
     *     uint8_t   flags;
     *     char     *name;
     *     uintptr_t next;
     *
     * The body looks like the following:
     *     uintptr_t codeword;
     *     uintptr_t body[];
     *
     * The codeword is the interpreter for the body. This is defined in
     * eval.c. Note that a native (or builtin function) has only a single
     * body element.
     *
     * The body of a native word points to a function that's compiled in already.
     */

    /*
     * store_native writes a new dictionary entry for a native-compiled
     * function.
     */
    void store_native(uint8_t *, const char *, const uint8_t, void(*)(void));

    /*
     * match_word returns true if the current dictionary entry matches the
     * token being searched for.
     */
    bool match_word(uint8_t *, const char *, const uint8_t);

    /*
     * word_link returns the offset to the next word.
     */
    size_t word_link(uint8_t *);

    /*
     * word_body returns the offset from the start of the entry to its
     * body, which begins with the codeword.
     */
    size_t word_body(uint8_t *);

    #endif /* __KF_WORD_H__ */

The codeword is the big changer here. I've put a native evaluator and a
codeword executor in the ``eval`` files::

    #ifndef __KF_EVAL_H__
    #define __KF_EVAL_H__

    #include "defs.h"

    /*
     * cwexec is the codeword executor. It assumes that the uintptr_t
     * passed into it points to the correct executor (e.g. nexec),
     * which is called with the next address.
     */
    void cwexec(uintptr_t);

    /*
     * nexec is the native executor.
     *
     * It should take a uintptr_t containing the address of a code block
     * and will execute the function starting there. The function should
     * have the signature void(*target)(void) - a function returning
     * nothing and taking no arguments.
     */
    void nexec(uintptr_t);

    static const uintptr_t nexec_p = (uintptr_t)&nexec;

    #endif /* __KF_EVAL_H__ */

The implementations of these are short::

    #include "defs.h"
    #include "eval.h"

    #include <string.h>

``nexec`` just casts its target to a void function and calls it.

::

    void nexec(uintptr_t target)
    {
        ((void(*)(void))target)();
    }

``cwexec`` is the magic part: it reads a pair of addresses; the first is
the executor, and the next is the start of the code body. In the case of
native execution, this is a pointer to a function.
::

    void cwexec(uintptr_t entry)
    {
        uintptr_t target = 0;
        uintptr_t codeword = 0;

        /* the entry is a pair of cells: the codeword, then the body */
        memcpy(&codeword, (void *)entry, sizeof(uintptr_t));
        memcpy(&target, (void *)(entry + sizeof(uintptr_t)), sizeof(uintptr_t));
        ((void(*)(uintptr_t))codeword)(target);
    }

So I wrote a quick test program to check these out::

    #include "defs.h"
    #include "eval.h"

    #include <stdio.h>
    #include <string.h>

    static void hello(void)
    {
        printf("hello, world\n");
    }

    int main(void)
    {
        uintptr_t target = (uintptr_t)hello;

        /* call the function directly through the native executor */
        nexec(target);

        /* lay out a codeword/body pair by hand and run it through cwexec */
        uint8_t   arena[32] = { 0 };
        uintptr_t arena_p = (uintptr_t)arena;

        memcpy(arena, (void *)&nexec_p, sizeof(nexec_p));
        memcpy(arena + sizeof(nexec_p), (void *)&target, sizeof(target));
        cwexec(arena_p);
    }

But does it work? ::

    $ gcc -o eval_test eval_test.c eval.o
    $ ./eval_test
    hello, world
    hello, world

What magic is this?

Now I need to write a couple of functions to make this easier::

    #include "defs.h"
    #include "eval.h"
    #include "word.h"

    #include <string.h>

    static uint8_t dict[DICT_SIZE] = {0};
    static size_t  last = 0;

The first two functions operate on the internal dict and are intended to
be used to maintain it. The first adds a new word to the dictionary, and
the second attempts to look up a word by name and execute it::

    void append_native_word(const char *name, const uint8_t len, void(*target)(void))
    {
        store_native(dict+last, name, len, target);
        /* advance past the new entry so the next word doesn't overwrite it */
        last += word_link(dict+last);
    }

    bool execute(const char *name, const uint8_t len)
    {
        size_t offset = 0;
        size_t link = 0;
        size_t body = 0;

        while (true) {
            if (!match_word(dict+offset, name, len)) {
                /* a link of 0 marks the end of the dictionary */
                if ((link = word_link(dict+offset)) == 0) {
                    return false;
                }
                /* links are relative, so accumulate them */
                offset += link;
                continue;
            }

            body = word_body(dict+offset);
            cwexec((uintptr_t)(dict + body + offset));
            return true;
        }
    }

Actually, now that I think about it, maybe I should also add in a function
to return a uintptr_t to the word, too. Should this point to the header or
to the body? My first instinct is to point to the header and have the
caller (me) use ``word_body`` to get the actual body. That being said,
however, we already have the useful information from the header (namely,
the name and length); the link is only useful for the search phase.
Following this logic means that ``lookup`` will return a pointer to the
body. So say we all::

    bool lookup(const char *name, const uint8_t len, uintptr_t *ptr)
    {
        size_t offset = 0;
        size_t link = 0;
        size_t body = 0;

        while (true) {
            if (!match_word(dict+offset, name, len)) {
                /* a link of 0 marks the end of the dictionary */
                if ((link = word_link(dict+offset)) == 0) {
                    return false;
                }
                offset += link;
                continue;
            }

            body = word_body(dict+offset);
            *ptr = (uintptr_t)(dict + offset + body);
            return true;
        }
    }

The rest of the functions in the header (all of which are publicly
visible) are made available for use later. Maybe (but let's be honest,
probably not) I'll go back later and make these functions private.

The first such function stores a native (built-in) word. This is what
``append_native_word`` is built around::

    void store_native(uint8_t *entry, const char *name, const uint8_t len,
                      void(*target)(void))
    {
        uintptr_t target_p = (uintptr_t)target;
        /* the body starts after length, flags, name, and link */
        size_t    body = 2 + len + sizeof(size_t);
        /* the link spans the whole entry: header, codeword, function pointer */
        size_t    link = body + (2 * sizeof(uintptr_t));

        /* write the header */
        entry[0] = len;
        entry[1] = 0; // flags aren't used yet
        memcpy(entry+2, name, len);
        memcpy(entry+2+len, &link, sizeof(link));

        /* write the native executor codeword and the function pointer */
        memcpy(entry + body, (uint8_t *)(&nexec_p), sizeof(uintptr_t));
        memcpy(entry + body + sizeof(uintptr_t), (uint8_t *)(&target_p),
               sizeof(uintptr_t));
    }

The rest of the functions are utility functions. ``match_word`` is used
to... match words::

    bool match_word(uint8_t *entry, const char *name, const uint8_t len)
    {
        if (entry[0] != len) {
            return false;
        }

        if (memcmp(entry+2, name, len) != 0) {
            return false;
        }

        return true;
    }

Finally, ``word_link`` returns the offset to the next word (e.g. so as to
be able to do ``entry+offset``) and ``word_body`` returns the offset to
the body of the word::

    size_t word_link(uint8_t *entry)
    {
        size_t link;

        /* a zero length byte means there is no entry here */
        if (entry[0] == 0) {
            return 0;
        }

        memcpy(&link, entry+2+entry[0], sizeof(link));
        return link;
    }

    size_t word_body(uint8_t *entry)
    {
        return 2 + entry[0] + sizeof(size_t);
    }

That about wraps up this chunk of work. Next up, maybe start porting
builtins? I also need to rewrite the parser and I/O layer.
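To sanity-check the entry layout described above (length, flags, name, link, codeword, body), here's a minimal, self-contained sketch that stores native words in a small byte arena and then finds and runs one by name. All of the ``demo_*`` names are made up for this example and aren't part of the real code. It also assumes a function pointer survives a round trip through ``uintptr_t``, which holds on common desktop platforms but isn't guaranteed by the C standard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical demo_* names: a stand-in for the real dictionary code. */
static uint8_t demo_dict[256]; /* zeroed; a length byte of 0 ends the walk */
static size_t  demo_last = 0;  /* offset of the first free byte */
static int     demo_calls = 0; /* how many times the demo word has run */

void demo_hello(void)
{
    demo_calls++;
}

/* Native executor: treat the body cell as a void(void) function. */
void demo_nexec(uintptr_t target)
{
    ((void (*)(void))target)();
}

/* Append [len][flags][name][link][codeword][body] at demo_last. */
bool demo_append(const char *name, uint8_t len, void (*fn)(void))
{
    uintptr_t codeword = (uintptr_t)demo_nexec;
    uintptr_t body = (uintptr_t)fn;
    size_t    body_off = 2 + (size_t)len + sizeof(size_t);
    size_t    link = body_off + 2 * sizeof(uintptr_t);
    uint8_t  *entry = demo_dict + demo_last;

    assert(name != NULL && fn != NULL);

    /* keep at least one zero byte after the entry as the end marker */
    if (len == 0 || demo_last + link >= sizeof(demo_dict)) {
        return false;
    }

    entry[0] = len;
    entry[1] = 0; /* flags: unused in this sketch */
    memcpy(entry + 2, name, len);
    memcpy(entry + 2 + len, &link, sizeof(link));
    memcpy(entry + body_off, &codeword, sizeof(codeword));
    memcpy(entry + body_off + sizeof(codeword), &body, sizeof(body));

    demo_last += link;
    return true;
}

/* Walk the entries; on a name match, call the codeword with the body cell. */
bool demo_execute(const char *name, uint8_t len)
{
    size_t offset = 0;

    while (demo_dict[offset] != 0) {
        uint8_t *entry = demo_dict + offset;
        size_t   body_off = 2 + (size_t)entry[0] + sizeof(size_t);
        size_t   link = 0;

        memcpy(&link, entry + 2 + entry[0], sizeof(link));
        if (entry[0] == len && memcmp(entry + 2, name, len) == 0) {
            uintptr_t codeword = 0;
            uintptr_t body = 0;

            memcpy(&codeword, entry + body_off, sizeof(codeword));
            memcpy(&body, entry + body_off + sizeof(codeword), sizeof(body));
            ((void (*)(uintptr_t))codeword)(body);
            return true;
        }
        offset += link; /* links are relative to the current entry */
    }

    return false;
}
```

Appending a word with ``demo_append("hello", 5, demo_hello)`` and then calling ``demo_execute("hello", 5)`` should bump ``demo_calls`` by one, while looking up a name that was never stored returns ``false``. The sketch copies every multi-byte field with ``memcpy`` because the cells sit at unaligned offsets inside the byte array.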