sandbox/doc/part-0x08.rst

Write You a Forth, 0x08
-----------------------

:date: 2018-03-05 21:42
:tags: wyaf, forth

After reading some more in Threaded Interpreted Languages (TIL_ from now on),
I've decided to start over.

.. _TIL: http://wiki.c2.com/?ThreadedInterpretiveLanguage

Some design choices that didn't really work out:

+ the system structure
+ not making it easier to test building for different platforms
+ my linked list approach to the dictionary
+ my class-based approach to words

I get the distinct feeling that I could (maybe should) be doing this in C99, so
I think I'll switch to that.

The new design
^^^^^^^^^^^^^^

I'll need to provide a few initial pieces:

1. eval.c
2. stack.c
3. the platform parts

I'll skip the parser at first and hand hack some things, then try to
port over my I/O layer from before.

Also, talking to Steve got me to think about doing this in C99, because
a lot of the fun I've had with computers in the past involved hacking
on C projects. So, C99 it is.


Platforms
^^^^^^^^^

I've elected to set a new define type, ``PLATFORM_$PLATFORM``. The Makefile
sets this, so it's easier now to test building for different platforms.
Here's the current top-level definitions::

        #ifndef __KF_DEFS_H__
        #define __KF_DEFS_H__

        #include <stdbool.h>
        #include <stddef.h>
        #include <stdint.h>

        #ifdef PLATFORM_pc
        #include "pc/defs.h"
        #else
        #include "default/defs.h"
        #endif

The ``pc/defs.h`` header::

        #ifndef __KF_PC_DEFS_H__
        #define __KF_PC_DEFS_H__

        typedef int32_t	KF_INT;
        typedef uintptr_t KF_ADDR;

        static const size_t	DSTACK_SIZE = 65535;
        static const size_t	RSTACK_SIZE = 65535;
        static const size_t	DICT_SIZE   = 65535;

        #endif /* __KF_PC_DEFS_H__ */

        #endif /* __KF_DEFS_H__ */

The new stack
^^^^^^^^^^^^^

I'll start with a much simplified stack interface::

        #ifndef __KF_STACK_H__
        #define __KF_STACK_H__

        /* data stack interaction */
        bool	dstack_pop(KF_INT *);
        bool	dstack_push(KF_INT);
        bool	dstack_get(size_t, KF_INT *);
        size_t	dstack_size(void);
        void	dstack_clear(void);

        /* return stack interaction */
        bool	rstack_pop(KF_ADDR *);
        bool	rstack_push(KF_ADDR);
        bool	rstack_get(size_t, KF_ADDR *);
        size_t	rstack_size(void);
        void	rstack_clear(void);

        #endif /* __KF_STACK_H__ */

The implementation is simple enough; the ``rstack`` interface is similar
enough to the ``dstack`` that I'll just show the first::

        #include "defs.h"
        #include "stack.h"

        static KF_INT	dstack[DSTACK_SIZE] = {0};
        static size_t	dstack_len = 0;

        bool
        dstack_pop(KF_INT *a)
        {
        	if (dstack_len == 0) {
        		return false;
        	}

        	*a = dstack[--dstack_len];
        	return true;
        }

        bool
        dstack_push(KF_INT a)
        {
        	if (dstack_len == DSTACK_SIZE) {
        		return false;
        	}

        	dstack[dstack_len++] = a;
        	return true;
        }

        bool
        dstack_get(size_t i, KF_INT *a)
        {
        	if (i >= dstack_len) {
        		return false;
        	}

        	*a = dstack[dstack_len - i - 1];
        	return true;
        }

        size_t
        dstack_size()
        {
        	return dstack_len;
        }

        void
        dstack_clear()
        {
        	dstack_len = 0;
        }

Words
^^^^^

Reading TIL has given me some new ideas on how to implement words::

        #ifndef __KF_WORD_H__
        #define __KF_WORD_H__

        /*
         * Every word in the dictionary starts with a header:
         * uint8_t	 length;
         * uint8_t	 flags;
         * char		*name;
         * uintptr_t	 next;
         *
         * The body looks like the following:
         * uintptr_t	 codeword;
         * uintptr_t	 body[];
         *
         * The codeword is the interpreter for the body. This is defined in
         * eval.c. Note that a native (or builtin function) has only a single
         * body element.
         *
         * The body of a native word points to a function that's compiled in already.
         */


        /*
         * store_native writes a new dictionary entry for a native-compiled
         * function.
         */
        void	store_native(uint8_t *, const char *, const uint8_t, void(*)(void));

        /*
         * match_word returns true if the current dictionary entry matches the
         * token being searched for.
         */
        bool	match_word(uint8_t *, const char *, const uint8_t);

        /*
         * word_link returns the offset to the next word.
         */
        size_t	word_link(uint8_t *);

        size_t	word_body(uint8_t *);

        #endif /* __KF_WORD_H__ */

The codeword is the big changer here. I've put a native evaluator and
a codeword executor in the ``eval`` files::

        #ifndef __KF_EVAL_H__
        #define __KF_EVAL_H__

        #include "defs.h"

        /*
         * cwexec is the codeword executor. It assumes that the uintptr_t
         * passed into it points to the correct executor (e.g. nexec),
         * which is called with the next address.
         */
        void	cwexec(uintptr_t);


        /*
         * nexec is the native executor.
         *
         * It should take a uintptr_t containing the address of a code block
         * and will execute the function starting there. The function should
         * half the signature void(*target)(void) - a function returning
         * nothing and taking no arguments.
         */
        void	nexec(uintptr_t);

        static const uintptr_t	nexec_p = (uintptr_t)&nexec;


        #endif /* __KF_EVAL_H__ */

The implementations of these are short::

        #include "defs.h"
        #include "eval.h"

        #include <string.h>

``nexec`` just casts its target to a void function and calls it.

::

        void
        nexec(uintptr_t target)
        {
        	((void(*)(void))target)();
        }

``cwexec`` is the magic part: it reads a pair of addresses; the first
is the executor, and the next is the start of the code body. In the
case of native execution, this is a pointer to a function.

::

        void
        cwexec(uintptr_t entry)
        {
        	uintptr_t	target = 0;
        	uintptr_t	codeword = 0;

        	memcpy(&codeword, (void *)entry, sizeof(uintptr_t));
        	memcpy(&target, (void *)(entry + sizeof(uintptr_t)), sizeof(uintptr_t));
        	((void(*)(uintptr_t))codeword)(target);
        }


So I wrote a quick test program to check these out::

        #include "defs.h"
        #include "eval.h"
        #include <stdio.h>
        #include <string.h>

        static void
        hello(void)
        {
                printf("hello, world\n");
        }

        int
        main(void)
        {
                uintptr_t       target = (uintptr_t)hello;

                nexec(hello);

                uint8_t arena[32] = { 0 };
                uintptr_t arena_p = (uintptr_t)arena;

                memcpy(arena, (void *)&nexec_p, sizeof(nexec_p));
                memcpy(arena + sizeof(nexec_p), (void *)&target, sizeof(target));

                cwexec(arena_p);
        }

But does it work?

::

        $ gcc -o eval_test eval_test.c eval.o
        $ ./eval_test
        hello, world
        hello, world

What magic is this?

Now I need to write a couple functions to make this easier::

        #include "defs.h"
        #include "eval.h"
        #include "word.h"

        #include <string.h>

        static uint8_t	dict[DICT_SIZE] = {0};
        static size_t	last = 0;

The first two functions will operate on the internal dict, and are
intended to be used to maintain the internal dictionary. The first
adds a new word to the dictionary, and the second attempts to look
up a word by name and execute it::

        void
        append_native_word(const char *name, const uint8_t len, void(*target)(void))
        {
        	store_native(dict+last, name, len, target);
        }

        bool
        execute(const char *name, const uint8_t len)
        {
        	size_t	offset = 0;
        	size_t	body = 0;
        	while (true) {
        		if (!match_word(dict+offset, name, len)) {
        			if ((offset = word_link(dict+offset)) == 0) {
        				return false;
        			}
        			continue;
        		}

        		body = word_body(dict+offset);
        		cwexec(dict + body + offset);
        		return true;
        	}
        }

Actually, now that I think about it, maybe I should also add in a function
to return a uintptr_t to the word, too. Should this point to the header or
to the body? My first instinct is to point to the header and have the caller
(me) use ``word_body`` to get the actual body. That being said, however,
we already have the useful information from the header (namely, the name and
length); the link is only useful for the search phase. Following this logic
means that ``lookup`` will return a pointer to the body. So say we all::

        bool
        lookup(const char *name, const uint8_t len, uintptr_t *ptr)
        {
        	size_t	offset = 0;
        	size_t	body = 0;
        	while (true) {
        		if (!match_word(dict+offset, name, len)) {
        			if ((offset = word_link(dict+offset)) == 0) {
        				return false;
        			}
        			continue;
        		}

        		body = word_body(dict+offset);
        		*ptr = (uintptr_t)(dict + offset + body);
        		return true;
        	}

        }

The rest of the functions in the header (all of which are publicly
visible) are made available for use later. Maybe (but let's be honest,
probably not) I'll go back later and make these functions private.

The first such function stores a native (built-in) word. This is what
``append_native_word`` is built around::

        void
        store_native(uint8_t *entry, const char *name, const uint8_t len, void(*target)(void))
        {
        	uintptr_t	target_p = (uintptr_t)target;
        	size_t		link = 2 + len + (2 * sizeof(uintptr_t));

        	/* write the header */
        	entry[0] = len;
        	entry[1] = 0; // flags aren't used yet
        	memcpy(entry+2, name, len);
        	memcpy(entry+2+len, &link, sizeof(link));

        	/* write the native executor codeword and the function pointer */
        	memcpy(entry, (uint8_t *)(&nexec_p), sizeof(uintptr_t));
        	memcpy(entry + sizeof(uintptr_t), (uint8_t *)(&target_p), sizeof(uintptr_t));
        }

The rest of the functions are utility functions. ``match_word`` is used
to... match words::

        bool
        match_word(uint8_t *entry, const char *name, const uint8_t len)
        {
        	if (entry[0] != len) {
        		return false;
        	}

        	if (memcmp(entry+2, name, len) != 0) {
        		return false;
        	}

        	return true;
        }

Finally, ``word_link`` returns the offset to the next function (e.g. so
as to be able to do ``entry+offset``) and ``word_body`` returns the offset
to the body of the word::

        size_t
        word_link(uint8_t *entry)
        {
        	size_t	link;

        	if (entry[0] == 0) {
        		return 0;
        	}
        	memcpy(&link, entry+2+entry[0], sizeof(link));
        	return link;
        }

        size_t
        word_body(uint8_t *entry)
        {
        	return 2 + entry[0] + sizeof(size_t);
        }


That about wraps up this chunk of work. Next to maybe start porting builtins? I
also need to rewrite the parser and I/O layer.