sandbox/misc/kforth/doc/part-0x05.rst

429 lines
13 KiB
ReStructuredText
Raw Blame History

Write You a Forth, 0x05
-----------------------
:date: 2018-02-27 08:06
:tags: wyaf, forth
NB: Today's update was pretty large, so I don't show all of the code; this is
what ``git`` is for.
Today I need to start actually doing things with tokens. This requires two
things:
1. Some idea of what a word is, and
2. A dictionary of words
I started taking some notes on this previously, and I think there are a few
kinds of words that are possible:
1. Numbers (e.g. defining a variable)
2. Built-in functions
3. Lambda functions (that is, user-defined functions).
Stage 1 really only needs to incorporate #2, so that's what I'll focus on for
now. However, to prepare for the future, I'm going to define a ``Word`` base
class and inherit from there. This interface is going to need to be
stack-aware, so what I've done is define a ``System`` struct in ``system.h``::
#ifndef __KF_CORE_H__
#define __KF_CORE_H__
#include "defs.h"
#include "stack.h"
typedef struct _System {
Stack<KF_INT> dstack;
IO *interface;
} System;
#endif // __KF_CORE_H__
This will let me later add in support for the return stack and other things
that might be useful. Other ideas: adding something like an ``errno``
equivalent to indicate the last error, and storing a dictionary of words. This
will need some restructuring, though. I've already moved the I/O into the
system as well. This took some finangling in ``kforth.cc``; I'm eliding the
diff here because it's so long, but it's basically a ``sed -i -e
's/interface./sys->interface.``.
The Word interface
^^^^^^^^^^^^^^^^^^
Now I can start defining a Word.h. Maybe this is a case of when you have an
object-oriented language, everything looks like a class, but I decided to
design an abstract class for a Word and implement the first concrete class,
**Builtin**. What I came up with was::
class Word {
public:
virtual ~Word() {};
The *eval* method takes a ``System`` structure and executes some function.
::
virtual bool eval(System *) = 0;
The dictionary is a linked list, so next is used to traverse the list.
::
virtual Word *next(void) = 0;
The ``match`` method is used to determine whether this is the word being
referred to.
::
virtual bool match(struct Token *) = 0;
Finally, ``getname`` will fill in a ``char[MAX_TOKEN_SIZE]`` buffer with the
word's name.
::
virtual void getname(char *, size_t *) = 0;
};
With the interface defined, I can implement ``Builtins`` (I've elided the
common interface from the listing below to focus on the implementation)::
class Builtin : public Word {
public:
~Builtin() {};
Builtin(const char *name, size_t namelen, Word *head, bool (*fun)(System *));
private:
char name[MAX_TOKEN_LENGTH];
size_t namelen;
Word *prev;
bool (*fun)(System *);
};
The ``bool`` works as a first pass, but I think I'm going to have add some
notion of system conditions later on to denote why execution failed. One thing
that both ``pforth`` and ``gforth`` do that I don't yet do is to clear the
stack when there's an execution failure. At least, they clear the stack with an
unrecognised word. The implementation is pretty trivial::
#include "defs.h"
#include "parser.h"
#include "system.h"
#include "word.h"
#include <string.h>
Builtin::Builtin(const char *name, size_t namelen, Word *head, bool (*target)(System *))
: prev(head), fun(target)
{
memcpy(this->name, name, namelen);
this->namelen = namelen;
}
bool
Builtin::eval(System *sys)
{
return this->fun(sys);
}
Word *
Builtin::next()
{
return this->prev;
}
bool
Builtin::match(struct Token *token)
{
return match_token(this->name, this->namelen, token->token, token->length);
}
void
Builtin::getname(char *buf, size_t *buflen)
{
memcpy(buf, this->name, this->namelen);
*buflen = namelen;
}
Right. Now to do something with this.
The system dictionary
^^^^^^^^^^^^^^^^^^^^^
The dictionary's interface is minimal::
// dict.h
#ifndef __KF_DICT_H__
#define __KF_DICT_H__
#include "defs.h"
#include "parser.h"
#include "system.h"
#include "word.h"
typedef enum _LOOKUP_ : uint8_t {
LOOKUP_OK = 0, // Lookup executed properly.
LOOKUP_NOTFOUND = 1, // The token isn't in the dictionary.
LOOKUP_FAILED = 2 // The word failed to execute.
} LOOKUP;
void init_dict(System *);
LOOKUP lookup(struct Token *, System *);
#endif // __KF_DICT_H__
There's a modicum of differentiation between "everything worked" and "no it
didn't," and by that I mean the lookup can tell you if the word wasn't found
or if there was a problem executing it.
I added a ``struct Word *dict`` field to the ``System`` struct, since we're
operating on these anyways. I guess it's best to start with the lookup function
first so that when I started adding builtins later it'll be easy to just
recompile and use them.
::
LOOKUP
lookup(struct Token *token, System *sys)
{
Word *cursor = sys->dict;
KF_INT n;
I seem to recall from *Programming a Problem-Oriented Language* that Chuck
Moore advocated checking whether a token was a number before looking it up
in the dictionary, so that's the approach I'll take::
if (parse_num(token, &n)) {
if (sys->dstack.push(n)) {
return LOOKUP_OK;
}
return LOOKUP_FAILED;
}
The remainder is pretty much bog-standard linked list traversal::
while (cursor != nullptr) {
if (cursor->match(token)) {
if (cursor->eval(sys)) {
return LOOKUP_OK;
}
return LOOKUP_FAILED;
}
cursor = cursor->next();
}
return LOOKUP_NOTFOUND;
}
This needs to get hooked up into the interpreter now; this is going to require
some reworking of the ``parser`` function::
static bool
parser(const char *buf, const size_t buflen)
{
static size_t offset = 0;
static struct Token token;
static PARSE_RESULT result = PARSE_FAIL;
static LOOKUP lresult = LOOKUP_FAILED;
static bool stop = false;
offset = 0;
// reset token
token.token = nullptr;
token.length = 0;
while ((result = parse_next(buf, buflen, &offset, &token)) == PARSE_OK) {
lresult = lookup(&token, &sys);
switch (lresult) {
case LOOKUP_OK:
continue;
case LOOKUP_NOTFOUND:
sys.interface->wrln((char *)"word not found", 15);
stop = true;
break;
case LOOKUP_FAILED:
sys.interface->wrln((char *)"execution failed", 17);
stop = true;
break;
default:
sys.interface->wrln((char *)"*** the world is broken ***", 27);
exit(1);
}
if (stop) {
stop = false;
break;
}
}
switch (result) {
case PARSE_OK:
return false;
case PARSE_EOB:
sys.interface->wrbuf(ok, 4);
return true;
case PARSE_LEN:
sys.interface->wrln((char *)"parse error: token too long", 27);
return false;
case PARSE_FAIL:
sys.interface->wrln((char *)"parser failure", 14);
return false;
default:
sys.interface->wrln((char *)"*** the world is broken ***", 27);
exit(1);
}
}
Now I feel like I'm at the part where I can start adding in functionality. The
easiest first builtin: addition. Almost impossible to screw this up, right?
::
static bool
add(System *sys)
{
KF_INT a = 0;
KF_INT b = 0;
if (!sys->dstack.pop(&a)) {
return false;
}
if (!sys->dstack.pop(&b)) {
return false;
}
b += a;
return sys->dstack.push(b);
}
Now this needs to go into the ``init_dict`` function::
void
init_dict(System *sys)
{
sys->dict = nullptr;
sys->dict = new Builtin((const char *)"+", 1, sys->dict, add);
}
And this needs to get added into the ``main`` function::
int
main(void)
{
init_dict(&sys);
#ifdef __linux__
Console interface;
sys.interface = &interface;
#endif
sys.interface->wrbuf(banner, bannerlen);
interpreter();
return 0;
}
The moment of truth
^^^^^^^^^^^^^^^^^^^
Hold on to your pants, let's see what breaks::
$ ./kforth
kforth interpreter
<>
? 2 3 +
ok.
<5>
Oh hey, look, it actually works. Time to add a few more definitions for good
measure:
+ the basic arithmetic operators `-`, `*`, `/`
+ the classic `SWAP` and `ROT` words
+ `DEFINITIONS` to look at all the definitions in the language
These are all pretty simple, fortunately. A few things that tripped me up,
though:
+ The *a* and *b* names kind of threw me off because I fill *a* first. This
means it's the last number on the stack; this didn't matter for addition,
but in subtraction, it means I had to be careful to do ``b -= a`` rather
than the other way around.
+ pforth and gfortran both support case insensitive matching, so I had to
modify the token matcher::
bool
match_token(const char *a, const size_t alen,
const char *b, const size_t blen)
{
if (alen != blen) {
return false;
}
for (size_t i = 0; i < alen; i++) {
if (a[i] == b[i]) {
continue;
}
if (!isalpha(a[i]) || !isalpha(b[i])) {
return false;
}
The XOR by 0x20 is just a neat trick for inverting the case of a letter.
::
if ((a[i] ^ 0x20) == b[i]) {
continue;
}
if (a[i] == (b[i] ^ 0x20)) {
continue;
}
return false;
}
return true;
}
+ I forgot to include the case for ``PARSE_OK`` in the result switch statement
in the ``parser`` function, so I could get a line of code evaluated but then
it'd die with "the world is broken."
+ When I tried doing some division, I ran into some weird issues::
$ ./kforth
kforth interpreter
<>
? 2 5040 /
ok.
<<3C><>>
It turns out that in ``write_num``, the case where *n = 0* results in nothing
happening, and therefore the buffer just being written. This is a dirty thing,
but I edge cased this::
$ git diff io.cc
diff --git a/io.cc b/io.cc
index 77e0e2a..a86156b 100644
--- a/io.cc
+++ b/io.cc
@@ -24,6 +24,10 @@ write_num(IO *interface, KF_INT n)
n++;
}
}
+ else if (n == 0) {
+ interface->wrch('0');
+ return;
+ }
while (n != 0) {
char ch = (n % 10) + '0';
May the compiler have mercy on my soul and whatnot.
For you sports fans keeping track at home, here's the classic bugs I've
introduced so far:
1. bounds overrun
2. missing case in a switch statement
But now here I am with the interpreter in good shape. Now I can start
implementing the builtins in earnest!
As before, see the tag `part-0x05 <https://github.com/kisom/kforth/tree/part-0x05>`_.