Fix up bad git subtree import.

2018-06-11 09:39:27 -07:00
parent e7c4c5ba49
commit 6ad979d28f
86 changed files with 1 additions and 0 deletions


@@ -0,0 +1,299 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"import actions, kb, sample"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## AQE: A Query Engine\n",
"\n",
"This is an implementation of a knowledge base, hacked together in Python\n",
"3 for now to quickly iterate on ideas. (It won't work in Python 2: for\n",
"example, `sample.py` uses `random.choices` and `base64.decodebytes`, which\n",
"are Python 3 only.)\n",
"\n",
"The `KnowledgeBase` is a repository of facts."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"skb = sample.load()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A fact is a tuple: (relationship, subject, object). `object` is admittedly a terrible name (and is subject to change) but it's what I came up with and what I'm working with for now."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[('is', 'cbr600', 'Driver'),\n",
" ('at', 'cbr600', 'oakland'),\n",
" ('at', 'airliner', 'denver'),\n",
" ('is', 'oakland', 'Airport'),\n",
" ('is', 'airliner', 'Flyer'),\n",
" ('is', 'oakland', 'City'),\n",
" ('is', 'trooper', 'Driver'),\n",
" ('is', 'denver', 'City'),\n",
" ('is', 'denver', 'Airport')]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"skb.facts()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A KB can be told a fact with the `tell` method."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"skb.tell(('is', 'san francisco', 'cool'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Similarly, the KB can be told that a fact is no longer true with the `retract` method."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"skb.retract(('is', 'san francisco', 'cool'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The KB can be queried about the facts it has. There are two types of queries. The first is done with a full fact, and represents the question \"Is this fact true?\""
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[('is', 'oakland', 'City')]\n",
"[]\n"
]
}
],
"source": [
"print(skb.ask(('is', 'oakland', 'City')))\n",
"print(skb.ask(('is', 'cbr600', 'City')))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A query returns a list of facts; the empty list means no facts were found. This might seem an odd way to answer the first question: a fact that isn't known yields an empty list, and a known fact yields a list containing just that fact. The reason for doing it this way is to support the second type of question: \"What are the facts for which this query is valid?\" This is done by providing a `None` value to *either* the subject or object. (Eventually, I'll get around to adding support for empty relationships too...)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[('is', 'oakland', 'City'), ('is', 'denver', 'City')]\n"
]
}
],
"source": [
"print(skb.ask(('is', None, 'City')))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another thing the KB can do is provide some basic substitution using the `subst` method. It takes a fact template, a subject, and an object, and returns a fact (without making any statement as to the validity of the fact). The subject and object positions of the template can hold one of several values:\n",
"\n",
"+ `None`: the corresponding argument (subject or object) is substituted into the fact.\n",
"+ `?subject`: substitutes the subject.\n",
"+ `?object`: substitutes the object.\n",
"+ `?current`: the value currently stored in the KB for this position is looked up; this only works with singleton facts, where exactly one fact matches.\n",
"+ `?any`: the position is set to `None`, which `ask` treats as a wildcard.\n",
"\n",
"Some examples should clarify this."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"('is', 'oakland', 'City')\n",
"('at', 'cbr600', 'oakland')\n",
"('is', 'City', 'oakland')\n"
]
}
],
"source": [
"print(skb.subst(('is', None, 'City'), 'oakland', None))\n",
"print(skb.subst(('at', '?subject', '?current'), 'cbr600', None))\n",
"print(skb.subst(('is', '?object', '?subject'), 'oakland', 'City'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To understand `subst`, it's useful to note that it was written to support actions.\n",
"\n",
"Actions are initialised with a list of positive preconditions (facts that must be valid for the action to be performed), a list of negative preconditions (facts that must not be valid for the action to be performed), a list of retractions, and a list of updates.\n",
"\n",
"To illustrate this, here's a small example of airplanes and airports."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"airport_kb = kb.from_facts([\n",
" ('is', 'N29EO', 'Plane'),\n",
" ('at', 'N29EO', 'dia'),\n",
" ('is', 'N10IV', 'Plane'),\n",
" ('at', 'N10IV', 'oak'),\n",
" ('is', 'N33FR', 'Plane'),\n",
" ('at', 'N33FR', 'lga'),\n",
" ('is', 'dia', 'Airport'),\n",
" ('is', 'lga', 'Airport'),\n",
" ('is', 'oak', 'Airport'),\n",
"])\n",
"\n",
"fly = actions.Action(\n",
" [('is', '?subject', 'Plane'), ('is', '?object', 'Airport')], # Positive preconditions.\n",
" [('at', '?subject', '?object'),], # Negative preconditions.\n",
" [('at', '?subject', '?current'),], # Retractions.\n",
" [('at', '?subject', '?object')]) # Updates."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For a `fly` action to be performed, there are a few facts we should make sure are true:\n",
"\n",
"1. The subject of the action is a `Plane`, and\n",
"2. The object of the action is an `Airport`.\n",
"\n",
"We should make sure that the subject isn't currently at our target airport.\n",
"\n",
"If these hold, we can perform the action. The retraction says that the subject is no longer at the airport it was at before the action, and the KB is updated to say that the plane is at a new airport."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Before flying, is N10IV at LGA? []\n",
"Before flying, is N10IV at OAK? [('at', 'N10IV', 'oak')]\n",
"After flying, is N10IV at LGA? [('at', 'N10IV', 'lga')]\n",
"After flying, is N10IV at OAK? []\n"
]
}
],
"source": [
"print('Before flying, is N10IV at LGA? ', airport_kb.ask(('at', 'N10IV', 'lga')))\n",
"print('Before flying, is N10IV at OAK? ', airport_kb.ask(('at', 'N10IV', 'oak')))\n",
"\n",
"new_airport_kb = fly.perform(airport_kb, 'N10IV', 'lga')\n",
"\n",
"print('After flying, is N10IV at LGA? ', new_airport_kb.ask(('at', 'N10IV', 'lga')))\n",
"print('After flying, is N10IV at OAK? ', new_airport_kb.ask(('at', 'N10IV', 'oak')))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There's more work to be done, but this represents a solid night of putting the plan into action based on what I'd learned from the AI nanodegree. I've got a bigger vision for what I want to do with this, but it's nice to have a baseline to reason about."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

61
misc/aqe/NOTES.txt Normal file

@@ -0,0 +1,61 @@
Inference example:
Given
airport_kb = kb.from_facts([
('is', 'N29EO', 'Plane'),
('at', 'N29EO', 'dia'),
('is', 'N10IV', 'Plane'),
('at', 'N10IV', 'oak'),
('is', 'N33FR', 'Plane'),
('at', 'N33FR', 'lga'),
('is', 'dia', 'Airport'),
('is', 'lga', 'Airport'),
('is', 'oak', 'Airport'),
])
fly = actions.Action(
[('is', '?subject', 'Plane'), ('is', '?object', 'Airport')],
[('at', '?subject', '?object'),],
[('at', '?subject', '?current'),],
[('at', '?subject', '?object')])
Should be able to do something like:
> infer(airport_kb, [fly], ('at', 'N10IV', 'lga'))
('fly', 'N10IV', 'lga')
------------------------------------------------------------------------------
Inference search example:
airport_kb = [
('is', 'N29EO', 'Plane'),
('at', 'N29EO', 'dia'),
('is', 'N10IV', 'Plane'),
('at', 'N10IV', 'oak'),
('is', 'N33FR', 'Plane'),
('at', 'N33FR', 'lga'),
('is', '1Z12345E0205271688', 'Package'),
('at', '1Z12345E0205271688', 'dia'),
('is', '1Z12345E6605272234', 'Package'),
('at', '1Z12345E6605272234', 'dia'),
('is', '1Z12345E0305271640', 'Package'),
('at', '1Z12345E0305271640', 'oak'),
('is', '1Z12345E1305277940', 'Package'),
('at', '1Z12345E1305277940', 'lga'),
('is', '1Z12345E6205277936', 'Package'),
('at', '1Z12345E6205277936', 'lga'),
('is', 'dia', 'Airport'),
('is', 'lga', 'Airport'),
('is', 'oak', 'Airport'),
]
fly = actions.Action(
[('is', '?subject', 'Plane'), ('is', '?object', 'Airport')],
[('at', '?subject', '?object'),],
[('at', '?subject', '?current'),],
[('at', '?subject', '?object')])
Trying to define load which requires that both package and airplane are
at the same place: how can this be expressed?

44
misc/aqe/README.md Normal file

@@ -0,0 +1,44 @@
## AQE: A Query Engine

This is an implementation of a knowledge base, hacked together in Python
3 for now to quickly iterate on ideas. (It won't work in Python 2: for
example, `sample.py` uses `random.choices` and `base64.decodebytes`, which
are Python 3 only.)

There are a few key points:

+ A `KnowledgeBase` contains facts.
+ A fact is a tuple: (relationship, subject, object). For example,
  `('is', 'sky', 'blue')`.
+ A `KnowledgeBase` has three core methods: `ask`, `retract`, and `tell`.
  + The `ask` method queries the `KnowledgeBase` to ascertain whether
    a fact is true. Either the subject or the object may be `None`,
    in which case all matching facts are returned.
  + The `retract` method tells the `KnowledgeBase` that the fact is
    no longer true. If it's rainy, we might retract our fact about the
    sky being blue.
  + The `tell` method tells the `KnowledgeBase` that the fact is
    now true. For example, if it's rainy (and we've retracted the previous
    'sky is blue' fact), we might tell the `KnowledgeBase` that
    `('is', 'sky', 'grey')`.
+ A `KnowledgeBase` can also perform substitutions with `subst`.
+ An action contains positive and negative preconditions, retractions,
  and updates. The positive condition list contains facts that must
  be true in the knowledge base, and the negative condition list contains
  facts that must be false. If these preconditions hold, the retractions
  are applied, followed by the updates.
  + See `test_actions.py` for an example.
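
As a rough, self-contained illustration of the `tell`/`retract`/`ask` semantics above, here is a sketch that keeps facts in a plain Python set. `TinyKB` is a hypothetical name, and the linear scan in `ask` is a simplification; `kb.py` instead keeps per-relationship subject and object indexes.

```python
class TinyKB:
    """Hypothetical, simplified stand-in for KnowledgeBase: facts live in a set."""

    def __init__(self):
        self.facts = set()

    def tell(self, fact):
        # Record (relationship, subject, object) as true.
        self.facts.add(fact)

    def retract(self, fact):
        # Forgetting a fact we never knew is not an error.
        self.facts.discard(fact)

    def ask(self, query):
        # None in the subject or object position matches anything.
        relationship, subject, obj = query
        return sorted(f for f in self.facts
                      if f[0] == relationship
                      and subject in (None, f[1])
                      and obj in (None, f[2]))


sky_kb = TinyKB()
sky_kb.tell(('is', 'sky', 'blue'))
print(sky_kb.ask(('is', 'sky', None)))    # [('is', 'sky', 'blue')]
sky_kb.retract(('is', 'sky', 'blue'))     # it started raining
sky_kb.tell(('is', 'sky', 'grey'))
print(sky_kb.ask(('is', 'sky', 'blue')))  # []
print(sky_kb.ask(('is', None, 'grey')))   # [('is', 'sky', 'grey')]
```

Returning a list rather than a boolean is what lets the same call answer both "is this true?" and "what matches this pattern?".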
### Limitations
+ Singleton facts aren't supported; that is, there is no way to make a
  `KnowledgeBase` assert that a relationship maps each subject to only one
  object. For example, the `KnowledgeBase` will admit that
  `('is', 'schrödingers cat', 'alive')` and
  `('is', 'schrödingers cat', 'dead')` are both true simultaneously.
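
One way to approximate singleton behaviour with the existing `tell`/`retract` operations is to retract every current object for the (relationship, subject) pair before telling the new fact. The sketch below does this over a plain Python set; `tell_singleton` and `ask_objects` are hypothetical helpers. A real version would need to know which relationships are singletons (a plane's `at` location is, but `is` generally isn't, since a cat can be both `alive` and a `Cat`), so the sketch applies it to the cat example only to mirror the text above.

```python
def ask_objects(facts, relationship, subject):
    """All objects currently paired with (relationship, subject)."""
    return {o for (r, s, o) in facts if r == relationship and s == subject}


def tell_singleton(facts, fact):
    """Hypothetical helper: make fact the only object for its
    (relationship, subject) pair by retracting the others first."""
    relationship, subject, _ = fact
    # ask_objects returns a new set, so discarding while looping is safe.
    for old in ask_objects(facts, relationship, subject):
        facts.discard((relationship, subject, old))
    facts.add(fact)


facts = set()
# Plain tells admit both states at once: the limitation described above.
facts.add(('is', 'schrödingers cat', 'alive'))
facts.add(('is', 'schrödingers cat', 'dead'))
print(sorted(ask_objects(facts, 'is', 'schrödingers cat')))  # ['alive', 'dead']

# The singleton-style tell keeps only the newest object.
tell_singleton(facts, ('is', 'schrödingers cat', 'dead'))
print(sorted(ask_objects(facts, 'is', 'schrödingers cat')))  # ['dead']
```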
### TODO
+ Inference: given a list of actions, how to go from one state to
another. The first step would be single-step, then integrating
a search into the inference.
+ Rewrite in C++?
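
For the single-step part of the inference TODO, one possible shape is a brute-force search: try every action with every ordered (subject, object) pair of known entities and return the first one whose result contains the goal. This is a sketch under assumptions, not existing AQE code: `infer`, `fill`, and `apply_action` are hypothetical names, actions are plain 4-tuples of templates (positive preconditions, negative preconditions, retractions, updates) instead of `actions.Action`, and they are passed as a name-to-action dict because `Action` carries no name. Only `?current` in the object position is handled.

```python
import itertools


def fill(template, subject, obj, facts):
    """Resolve ?subject/?object, then ?current in the object position."""
    relationship, s, o = template
    names = {'?subject': subject, '?object': obj}
    s, o = names.get(s, s), names.get(o, o)
    if o == '?current':
        matches = [f[2] for f in facts if f[0] == relationship and f[1] == s]
        if len(matches) != 1:
            return None  # not a singleton: treat the template as unresolvable
        o = matches[0]
    return (relationship, s, o)


def apply_action(action, facts, subject, obj):
    """Return the resulting fact set, or None if a precondition fails."""
    pos, neg, retracts, updates = action
    if any(fill(t, subject, obj, facts) not in facts for t in pos):
        return None
    if any(fill(t, subject, obj, facts) in facts for t in neg):
        return None
    result = set(facts)
    result -= {fill(t, subject, obj, facts) for t in retracts}
    result |= {fill(t, subject, obj, facts) for t in updates} - {None}
    return result


def infer(facts, actions, goal):
    """Hypothetical single-step search: the first (name, subject, object)
    whose action makes goal true, or None if no single action does."""
    entities = sorted({f[1] for f in facts} | {f[2] for f in facts})
    for name, action in sorted(actions.items()):
        for subject, obj in itertools.permutations(entities, 2):
            result = apply_action(action, facts, subject, obj)
            if result is not None and goal in result:
                return (name, subject, obj)
    return None


airport_facts = {
    ('is', 'N29EO', 'Plane'), ('at', 'N29EO', 'dia'),
    ('is', 'N10IV', 'Plane'), ('at', 'N10IV', 'oak'),
    ('is', 'N33FR', 'Plane'), ('at', 'N33FR', 'lga'),
    ('is', 'dia', 'Airport'), ('is', 'lga', 'Airport'), ('is', 'oak', 'Airport'),
}
fly = ([('is', '?subject', 'Plane'), ('is', '?object', 'Airport')],
       [('at', '?subject', '?object')],
       [('at', '?subject', '?current')],
       [('at', '?subject', '?object')])

print(infer(airport_facts, {'fly': fly}, ('at', 'N10IV', 'lga')))  # ('fly', 'N10IV', 'lga')
```

The search is quadratic in the number of entities per action, which is fine for a toy KB; integrating a real search (the second half of the TODO) would mean calling this repeatedly on the resulting states.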

0
misc/aqe/__init__.py Normal file

33
misc/aqe/actions.py Normal file

@@ -0,0 +1,33 @@
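import copy
import logging


class Action:
    """An action with preconditions, retractions, and updates.

    Each list holds fact templates that are resolved with
    KnowledgeBase.subst() against the subject and object the action is
    performed on.
    """

    def __init__(self, pos_precond, neg_precond, retracts, updates):
        # Deep-copy the templates so later changes by the caller can't
        # alter the action.
        self.pos_precond = copy.deepcopy(pos_precond)
        self.neg_precond = copy.deepcopy(neg_precond)
        self.retracts = copy.deepcopy(retracts)
        self.updates = copy.deepcopy(updates)

    def satisfied(self, kb, subject, obj):
        """Return True if every precondition holds in kb."""
        for template in self.pos_precond:
            fact = kb.subst(template, subject, obj)
            if not kb.ask(fact):
                logging.warning('{} is not valid in the current knowledgebase'.format(fact))
                return False
        for template in self.neg_precond:
            fact = kb.subst(template, subject, obj)
            if kb.ask(fact):
                logging.warning('{} is valid in the current knowledgebase'.format(fact))
                return False
        return True

    def perform(self, kb, subject, obj):
        """Return a new knowledge base with the action applied, or None if
        the preconditions don't hold. kb itself is never modified."""
        if not self.satisfied(kb, subject, obj):
            return None
        kbprime = copy.deepcopy(kb)
        # Templates are resolved against the original kb so that ?current
        # still refers to the pre-action state.
        for retraction in self.retracts:
            kbprime.retract(kb.subst(retraction, subject, obj))
        for update in self.updates:
            kbprime.tell(kb.subst(update, subject, obj))
        return kbprime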

File diff suppressed because one or more lines are too long

153
misc/aqe/example.md Normal file

@@ -0,0 +1,153 @@
```python
import actions, kb, sample
```
## AQE: A Query Engine

This is an implementation of a knowledge base, hacked together in Python
3 for now to quickly iterate on ideas. (It won't work in Python 2: for
example, `sample.py` uses `random.choices` and `base64.decodebytes`, which
are Python 3 only.)

The `KnowledgeBase` is a repository of facts.
```python
skb = sample.load()
```
A fact is a tuple: (relationship, subject, object). `object` is admittedly a terrible name (and is subject to change) but it's what I came up with and what I'm working with for now.
```python
skb.facts()
```
```
[('is', 'cbr600', 'Driver'),
 ('at', 'cbr600', 'oakland'),
 ('at', 'airliner', 'denver'),
 ('is', 'oakland', 'Airport'),
 ('is', 'airliner', 'Flyer'),
 ('is', 'oakland', 'City'),
 ('is', 'trooper', 'Driver'),
 ('is', 'denver', 'City'),
 ('is', 'denver', 'Airport')]
```
A KB can be told a fact with the `tell` method.
```python
skb.tell(('is', 'san francisco', 'cool'))
```
Similarly, the KB can be told that a fact is no longer true with the `retract` method.
```python
skb.retract(('is', 'san francisco', 'cool'))
```
The KB can be queried about the facts it has. There are two types of queries. The first is done with a full fact, and represents the question "Is this fact true?"
```python
print(skb.ask(('is', 'oakland', 'City')))
print(skb.ask(('is', 'cbr600', 'City')))
```
```
[('is', 'oakland', 'City')]
[]
```
A query returns a list of facts; the empty list means no facts were found. This might seem an odd way to answer the first question: a fact that isn't known yields an empty list, and a known fact yields a list containing just that fact. The reason for doing it this way is to support the second type of question: "What are the facts for which this query is valid?" This is done by providing a `None` value to *either* the subject or object. (Eventually, I'll get around to adding support for empty relationships too...)
```python
print(skb.ask(('is', None, 'City')))
```
```
[('is', 'oakland', 'City'), ('is', 'denver', 'City')]
```
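
Wildcard queries are cheap because `kb.py` keeps, for each relationship, a map from subjects to their objects and a map from objects to their subjects, so a `None` in either position becomes a single dictionary lookup rather than a scan. Here is a minimal stand-alone sketch of that layout; the module-level `index`, `tell`, and `ask` are simplified stand-ins (note the keyword-argument signature of `ask`, which differs from `KnowledgeBase.ask`), and results are sorted only to make the output deterministic.

```python
from collections import defaultdict

# Simplified stand-in for the layout kb.py uses: for each relationship, one
# index from subject to its objects and one from object to its subjects.
index = defaultdict(lambda: {'subjects': defaultdict(set),
                             'objects': defaultdict(set)})


def tell(fact):
    relationship, subject, obj = fact
    index[relationship]['subjects'][subject].add(obj)
    index[relationship]['objects'][obj].add(subject)


def ask(relationship, subject=None, obj=None):
    """Answer a query with at most one wildcard via dictionary lookups."""
    rel = index.get(relationship)  # .get() does not create a default entry
    if rel is None:
        return []
    if subject is not None and obj is not None:
        # Full fact: a membership test in the subject's object set.
        found = obj in rel['subjects'].get(subject, ())
        return [(relationship, subject, obj)] if found else []
    if subject is not None:
        # ('is', 'oakland', None): everything oakland "is".
        return [(relationship, subject, o)
                for o in sorted(rel['subjects'].get(subject, ()))]
    if obj is not None:
        # ('is', None, 'City'): everything that "is" a City.
        return [(relationship, s, obj)
                for s in sorted(rel['objects'].get(obj, ()))]
    raise ValueError('give a subject, an object, or both')


for fact in [('is', 'oakland', 'City'), ('is', 'denver', 'City'),
             ('is', 'oakland', 'Airport')]:
    tell(fact)

print(ask('is', obj='City'))         # [('is', 'denver', 'City'), ('is', 'oakland', 'City')]
print(ask('is', subject='oakland'))  # [('is', 'oakland', 'Airport'), ('is', 'oakland', 'City')]
print(ask('at', subject='oakland'))  # []
```

The price of the two indexes is that every `tell` and `retract` must update both, which is exactly what `is_consistent` checks.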
Another thing the KB can do is provide some basic substitution using the `subst` method. It takes a fact template, a subject, and an object, and returns a fact (without making any statement as to the validity of the fact). The subject and object positions of the template can hold one of several values:

+ `None`: the corresponding argument (subject or object) is substituted into the fact.
+ `?subject`: substitutes the subject.
+ `?object`: substitutes the object.
+ `?current`: the value currently stored in the KB for this position is looked up; this only works with singleton facts, where exactly one fact matches.
+ `?any`: the position is set to `None`, which `ask` treats as a wildcard.

Some examples should clarify this.
```python
print(skb.subst(('is', None, 'City'), 'oakland', None))
print(skb.subst(('at', '?subject', '?current'), 'cbr600', None))
print(skb.subst(('is', '?object', '?subject'), 'oakland', 'City'))
```
```
('is', 'oakland', 'City')
('at', 'cbr600', 'oakland')
('is', 'City', 'oakland')
```
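
The order in which `subst` resolves a position matters: a `None` is replaced by the positional argument first, then the `?subject`/`?object`/`?any` placeholders are filled, and only then is `?current` looked up against the stored facts. The stand-alone sketch below mirrors that order over a plain set of facts; it is a simplified stand-in for `KnowledgeBase.subst` (which takes no `facts` argument because it queries itself), not the method itself.

```python
def subst(fact, subject, obj, facts):
    """Simplified stand-in for KnowledgeBase.subst over a plain set of facts."""
    relationship, s, o = fact

    def fill(value, default):
        if value is None:
            return default  # None takes the positional argument
        placeholders = {'?subject': subject, '?object': obj, '?any': None}
        return placeholders.get(value, value)

    s, o = fill(s, subject), fill(o, obj)
    # ?current is resolved last, against the already-filled other position.
    if s == '?current':
        matches = [f for f in facts if f[0] == relationship and f[2] == o]
        assert len(matches) == 1, '?current needs exactly one matching fact'
        s = matches[0][1]
    elif o == '?current':
        matches = [f for f in facts if f[0] == relationship and f[1] == s]
        assert len(matches) == 1, '?current needs exactly one matching fact'
        o = matches[0][2]
    return (relationship, s, o)


facts = {('at', 'cbr600', 'oakland'), ('is', 'oakland', 'City')}
print(subst(('is', None, 'City'), 'oakland', None, facts))             # ('is', 'oakland', 'City')
print(subst(('at', '?subject', '?current'), 'cbr600', None, facts))    # ('at', 'cbr600', 'oakland')
print(subst(('is', '?object', '?subject'), 'oakland', 'City', facts))  # ('is', 'City', 'oakland')
print(subst(('is', '?any', 'City'), None, None, facts))                # ('is', None, 'City')
```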
To understand `subst`, it's useful to note that it was written to support actions.

Actions are initialised with a list of positive preconditions (facts that must be valid for the action to be performed), a list of negative preconditions (facts that must not be valid for the action to be performed), a list of retractions, and a list of updates.

To illustrate this, here's a small example of airplanes and airports.
```python
airport_kb = kb.from_facts([
('is', 'N29EO', 'Plane'),
('at', 'N29EO', 'dia'),
('is', 'N10IV', 'Plane'),
('at', 'N10IV', 'oak'),
('is', 'N33FR', 'Plane'),
('at', 'N33FR', 'lga'),
('is', 'dia', 'Airport'),
('is', 'lga', 'Airport'),
('is', 'oak', 'Airport'),
])
fly = actions.Action(
[('is', '?subject', 'Plane'), ('is', '?object', 'Airport')], # Positive preconditions.
[('at', '?subject', '?object'),], # Negative preconditions.
[('at', '?subject', '?current'),], # Retractions.
[('at', '?subject', '?object')]) # Updates.
```
For a `fly` action to be performed, there are a few facts we should make sure are true:

1. The subject of the action is a `Plane`, and
2. The object of the action is an `Airport`.

We should also make sure that the subject isn't already at the target airport.

If these hold, we can perform the action. The retraction says that the subject is no longer at the airport it was at before the action, and the update says that the plane is now at the new airport.
```python
print('Before flying, is N10IV at LGA? ', airport_kb.ask(('at', 'N10IV', 'lga')))
print('Before flying, is N10IV at OAK? ', airport_kb.ask(('at', 'N10IV', 'oak')))
new_airport_kb = fly.perform(airport_kb, 'N10IV', 'lga')
print('After flying, is N10IV at LGA? ', new_airport_kb.ask(('at', 'N10IV', 'lga')))
print('After flying, is N10IV at OAK? ', new_airport_kb.ask(('at', 'N10IV', 'oak')))
```
```
Before flying, is N10IV at LGA? []
Before flying, is N10IV at OAK? [('at', 'N10IV', 'oak')]
After flying, is N10IV at LGA? [('at', 'N10IV', 'lga')]
After flying, is N10IV at OAK? []
```
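
To make the order of operations in `perform` concrete, here is a stand-alone sketch of what performing an action amounts to, written over frozensets rather than `KnowledgeBase` objects. `fill` and `perform` are hypothetical helpers, the action is a plain 4-tuple of templates, and only `?current` in the object position is handled, which is all `fly` needs. It also exercises the two failure paths the preconditions guard against.

```python
def fill(template, subject, obj, facts):
    """Resolve ?subject/?object, and ?current in the object position."""
    relationship, s, o = template
    names = {'?subject': subject, '?object': obj}
    s, o = names.get(s, s), names.get(o, o)
    if o == '?current':
        matches = [f[2] for f in facts if f[0] == relationship and f[1] == s]
        assert len(matches) == 1, '?current needs exactly one matching fact'
        o = matches[0]
    return (relationship, s, o)


def perform(action, facts, subject, obj):
    """Return a new frozenset of facts, or None if a precondition fails.

    `facts` is never modified. Retracting and then adding is equivalent to
    (facts - removed) | added, even if a fact appears in both lists.
    """
    pos, neg, retracts, updates = action
    if not all(fill(t, subject, obj, facts) in facts for t in pos):
        return None
    if any(fill(t, subject, obj, facts) in facts for t in neg):
        return None
    removed = {fill(t, subject, obj, facts) for t in retracts}
    added = {fill(t, subject, obj, facts) for t in updates}
    return (frozenset(facts) - removed) | added


fly = ([('is', '?subject', 'Plane'), ('is', '?object', 'Airport')],
       [('at', '?subject', '?object')],
       [('at', '?subject', '?current')],
       [('at', '?subject', '?object')])

facts = frozenset({('is', 'N10IV', 'Plane'), ('at', 'N10IV', 'oak'),
                   ('is', 'oak', 'Airport'), ('is', 'lga', 'Airport')})

after = perform(fly, facts, 'N10IV', 'lga')
print(('at', 'N10IV', 'lga') in after, ('at', 'N10IV', 'oak') in after)  # True False
print(perform(fly, facts, 'N10IV', 'oak'))    # None: already at oak
print(perform(fly, facts, 'N10IV', 'N10IV'))  # None: N10IV is not an Airport
print(('at', 'N10IV', 'oak') in facts)        # True: the original is untouched
```

Returning a new set instead of mutating the input mirrors `Action.perform`, which deep-copies the KB; that is what would let a search try many actions from the same starting state.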
There's more work to be done, but this represents a solid night of putting the plan into action based on what I'd learned from the AI nanodegree. I've got a bigger vision for what I want to do with this, but it's nice to have a baseline to reason about.

159
misc/aqe/kb.py Normal file

@@ -0,0 +1,159 @@
"""
AQE: A Query Engine

This is a proof of concept of a baseline query engine for AI work.
"""


class InvalidQuery(Exception):
    pass


class Inconsistency(Exception):
    def __init__(self, fact):
        super().__init__(fact)
        self.fact = fact

    def __str__(self):
        return 'Inconsistency: {}'.format(self.fact)


class KnowledgeBase:
    def __init__(self):
        # TODO(kyle): support loading an initial set of facts.
        # __kb__ indexes facts both ways:
        #   relationship -> {'subjects': {subject: {obj, ...}},
        #                    'objects': {obj: {subject, ...}}}
        # __facts__ is the flat set of (relationship, subject, obj) tuples.
        self.__kb__ = {}
        self.__facts__ = set()

    def tell(self, fact):
        relationship, subject, obj = fact
        # NB: in the future, these assertions may not need to be true; there
        # might be space in the world for "fuzzy" facts.
        assert relationship
        assert subject
        assert obj
        if relationship not in self.__kb__:
            self.__kb__[relationship] = {'subjects': {}, 'objects': {}}
        if subject not in self.__kb__[relationship]['subjects']:
            self.__kb__[relationship]['subjects'][subject] = set()
        self.__kb__[relationship]['subjects'][subject].add(obj)
        if obj not in self.__kb__[relationship]['objects']:
            self.__kb__[relationship]['objects'][obj] = set()
        self.__kb__[relationship]['objects'][obj].add(subject)
        self.__facts__.add((relationship, subject, obj))

    def retract(self, fact):
        relationship, subject, obj = fact
        # For now, these assertions are required. In the future, it would be
        # interesting to say something to the effect of "forget everything you
        # know about X".
        assert relationship
        assert subject
        assert obj
        # TODO(kyle): answer existential question: if I delete all the objects
        # from a subject (or vice versa), should that subject/object be kept or
        # removed entirely? This is the difference between "I have no concept
        # of X" and "I am aware that X exists but I don't know anything about it".
        # For now, I'm electing to keep the entry.
        #
        # Similarly, if the relationship is empty, we could make the argument
        # for removing it --- at the expense of now saying that we have no
        # concept of this relationship.
        try:
            self.__kb__[relationship]['subjects'][subject].remove(obj)
            self.__kb__[relationship]['objects'][obj].remove(subject)
            self.__facts__.remove((relationship, subject, obj))
        except KeyError:
            # Being told to forget something about something you don't know
            # isn't an error.
            pass

    def ask(self, fact):
        """Return the list of known facts matching fact.

        A full fact yields [fact] or []. A None subject or object acts as a
        wildcard; unknown relationships, subjects, or objects yield [].
        Raises InvalidQuery if both the subject and the object are None.
        """
        relationship, subject, obj = fact
        # A future milestone will remove this requirement to support free
        # variables.
        assert relationship
        if subject and obj:
            if (relationship, subject, obj) in self.__facts__:
                return [(relationship, subject, obj)]
            return []
        index = self.__kb__.get(relationship, {'subjects': {}, 'objects': {}})
        if subject:
            return [(relationship, subject, _obj) for _obj
                    in index['subjects'].get(subject, ())]
        if obj:
            return [(relationship, _subject, obj) for _subject
                    in index['objects'].get(obj, ())]
        raise InvalidQuery('either the subject or the object must be given')

    def facts(self):
        return list(self.__facts__)

    def is_consistent(self):
        """Check that the fact set and both indexes agree.

        Returns True, or raises Inconsistency naming the offending fact.
        """
        for fact in self.__facts__:
            relationship, subject, obj = fact
            index = self.__kb__.get(relationship)
            if index is None:
                raise Inconsistency(fact)
            if obj not in index['subjects'].get(subject, ()):
                raise Inconsistency(fact)
            if subject not in index['objects'].get(obj, ()):
                raise Inconsistency(fact)
        for relationship, index in self.__kb__.items():
            for subject, objs in index['subjects'].items():
                for obj in objs:
                    if (relationship, subject, obj) not in self.__facts__:
                        raise Inconsistency((relationship, subject, obj))
            for obj, subjects in index['objects'].items():
                for subject in subjects:
                    if (relationship, subject, obj) not in self.__facts__:
                        raise Inconsistency((relationship, subject, obj))
        return True

    def __len__(self):
        return len(self.__facts__)

    def subst(self, fact, subject, obj):
        """Resolve the placeholders in a fact template.

        None takes the positional argument; '?subject' and '?object' take the
        subject or object argument; '?any' becomes None (a wildcard for ask);
        '?current' is looked up in the KB and must match exactly one fact.
        """
        relationship, _subject, _obj = fact
        if _subject is None:
            _subject = subject
        if _subject == '?any':
            _subject = None
        elif _subject == '?subject':
            _subject = subject
        elif _subject == '?object':
            _subject = obj
        if _obj is None:
            _obj = obj
        if _obj == '?any':
            _obj = None
        elif _obj == '?subject':
            _obj = subject
        elif _obj == '?object':
            _obj = obj
        if _subject == '?current':
            possibilities = self.ask((relationship, None, _obj))
            assert len(possibilities) == 1
            _, _subject, _ = possibilities[0]
        elif _obj == '?current':
            # Look up with the already-resolved subject, not the raw argument.
            possibilities = self.ask((relationship, _subject, None))
            assert len(possibilities) == 1
            _, _, _obj = possibilities[0]
        return (relationship, _subject, _obj)


def from_facts(facts):
    """Build a KnowledgeBase from an iterable of facts."""
    kb = KnowledgeBase()
    for fact in facts:
        kb.tell(fact)
    return kb

47
misc/aqe/sample.py Normal file

@@ -0,0 +1,47 @@
import base64
import itertools
import json
import pickle
import random

import kb

# Sample facts: a pickled (protocol 3) list of fact tuples, base64-encoded.
FACTS = """
gANdcQAoWAIAAABpc3EBWAgAAABhaXJsaW5lcnECWAUAAABGbHllcnEDh3EEaAFYBwAAAG9ha2xh
bmRxBVgHAAAAQWlycG9ydHEGh3EHaAFoBVgEAAAAQ2l0eXEIh3EJaAFYBgAAAGRlbnZlcnEKaAaH
cQtoAWgKaAiHcQxoAVgGAAAAY2JyNjAwcQ1YBgAAAERyaXZlcnEOh3EPaAFYBwAAAHRyb29wZXJx
EGgOh3ERWAIAAABhdHESaAJoCodxE2gSaA1oBYdxFGUu
"""


def load():
    """Return a KnowledgeBase populated with the sample facts."""
    facts = pickle.loads(base64.decodebytes(FACTS.encode('ascii')))
    skb = kb.KnowledgeBase()
    for fact in facts:
        skb.tell(fact)
    return skb


def load_facts(corpus_path='data/corpus.json', is_count=1000000):
    """Return a set of facts drawn from a JSON corpus.

    Random ('is', noun, adjective) pairs are sampled (with replacement) from
    the corpus's nouns and adjectives, and every city becomes
    ('is', city, 'City').
    """
    facts = set()
    with open(corpus_path) as corpus_file:
        corpus = json.load(corpus_file)
    if 'nouns' in corpus and 'adjectives' in corpus:
        perms = list(itertools.product(corpus['nouns'],
                                       corpus['adjectives']))
        if len(perms) < is_count:
            is_count = len(perms) - 1
        pool = random.choices(perms, k=is_count)
        for noun, adjective in pool:
            facts.add(('is', noun, adjective))
    if 'cities' in corpus:
        for city in corpus['cities']:
            facts.add(('is', city, 'City'))
    return facts


def generate_tail_number():
    """Return a random tail number: 'N', two digits, then two letters."""
    letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    tailno = 'N' + str(random.randint(10, 99))
    tailno += random.choice(letters)
    tailno += random.choice(letters)
    return tailno

67
misc/aqe/test_actions.py Normal file

@@ -0,0 +1,67 @@
import unittest

import actions
import kb

INITIAL_FACTS = [
    ('is', 'N29EO', 'Plane'),
    ('at', 'N29EO', 'dia'),
    ('is', 'N10IV', 'Plane'),
    ('at', 'N10IV', 'oak'),
    ('is', 'N33FR', 'Plane'),
    ('at', 'N33FR', 'lga'),
    ('is', '1Z12345E0205271688', 'Package'),
    ('at', '1Z12345E0205271688', 'dia'),
    ('is', '1Z12345E6605272234', 'Package'),
    ('at', '1Z12345E6605272234', 'dia'),
    ('is', '1Z12345E0305271640', 'Package'),
    ('at', '1Z12345E0305271640', 'oak'),
    ('is', '1Z12345E1305277940', 'Package'),
    ('at', '1Z12345E1305277940', 'lga'),
    ('is', '1Z12345E6205277936', 'Package'),
    ('at', '1Z12345E6205277936', 'lga'),
    ('is', 'dia', 'Airport'),
    ('is', 'lga', 'Airport'),
    ('is', 'oak', 'Airport'),
]

FLY_POS_PRECONDS = [
    ('is', '?subject', 'Plane'),
    ('is', '?object', 'Airport'),
]

FLY_NEG_PRECONDS = [
    ('at', '?subject', '?object'),
]

FLY_RETRACTIONS = [
    ('at', '?subject', '?current'),
]

FLY_UPDATES = [
    ('at', '?subject', '?object'),
]

fly = actions.Action(FLY_POS_PRECONDS, FLY_NEG_PRECONDS,
                     FLY_RETRACTIONS, FLY_UPDATES)


class ActionTestSuite(unittest.TestCase):
    def setUp(self):
        self.kb = kb.from_facts(INITIAL_FACTS)

    def test_a_flight(self):
        self.assertTrue(self.kb.ask(('at', 'N10IV', 'oak')))
        self.assertFalse(self.kb.ask(('at', 'N10IV', 'lga')))
        shadow = fly.perform(self.kb, 'N10IV', 'lga')
        # perform() returns None when the preconditions don't hold.
        self.assertIsNotNone(shadow)
        # Shadow should reflect the updates and retractions.
        self.assertTrue(shadow.ask(('at', 'N10IV', 'lga')))
        self.assertFalse(shadow.ask(('at', 'N10IV', 'oak')))
        # The original shouldn't be touched.
        self.assertTrue(self.kb.ask(('at', 'N10IV', 'oak')))
        self.assertFalse(self.kb.ask(('at', 'N10IV', 'lga')))

59
misc/aqe/test_kb.py Normal file

@@ -0,0 +1,59 @@
import copy
import random
import unittest

import kb
import sample


class KnowledgeBaseTestSuite(unittest.TestCase):
    def setUp(self):
        self.kb = sample.load()

    def test_a_sanity_check(self):
        self.assertTrue(self.kb.is_consistent())
        for fact in self.kb.facts():
            self.assertTrue(self.kb.ask(fact))

    def test_tell(self):
        new_fact = ('is', 'berkeley', 'City')
        # Make sure it's not something we already know.
        self.assertFalse(self.kb.ask(new_fact))
        self.kb.tell(new_fact)
        answer = self.kb.ask(new_fact)
        self.assertListEqual(answer, [new_fact])

    def test_inconsistency(self):
        badkb = copy.deepcopy(self.kb)
        badfact = random.choice(badkb.facts())
        relationship, subject, obj = badfact
        # Muck with the subjects index.
        badkb.__kb__[relationship]['subjects'][subject].remove(obj)
        with self.assertRaises(kb.Inconsistency):
            badkb.is_consistent()
        # Muck with the objects index.
        badkb = copy.deepcopy(self.kb)
        badkb.__kb__[relationship]['objects'][obj].remove(subject)
        with self.assertRaises(kb.Inconsistency):
            badkb.is_consistent()
        # Muck with the fact set.
        badkb = copy.deepcopy(self.kb)
        badkb.__facts__.remove(badfact)
        with self.assertRaises(kb.Inconsistency):
            badkb.is_consistent()
        # Inject false data into the subjects index.
        badkb = copy.deepcopy(self.kb)
        badkb.__kb__[relationship]['subjects'][subject].add('false memory')
        with self.assertRaises(kb.Inconsistency):
            badkb.is_consistent()
        # Inject false data into the objects index.
        badkb = copy.deepcopy(self.kb)
        badkb.__kb__[relationship]['objects'][obj].add('false memory')
        with self.assertRaises(kb.Inconsistency):
            badkb.is_consistent()

8
misc/aqe/util.py Normal file

@@ -0,0 +1,8 @@
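import random


def generate_tail_number():
    """Return a random tail number: 'N', two digits, then two letters."""
    letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    # randint returns an int, so convert it before concatenating.
    tailno = 'N' + str(random.randint(10, 99))
    tailno += random.choice(letters)
    tailno += random.choice(letters)
    return tailno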


@@ -0,0 +1,12 @@
#ifndef __KF_DEFAULT_DEFS_H__
#define __KF_DEFAULT_DEFS_H__
typedef int KF_INT;
typedef uintptr_t KF_ADDR;
static const size_t DSTACK_SIZE = 12;
static const size_t RSTACK_SIZE = 12;
static const size_t DICT_SIZE = 4096;
#endif /* __KF_DEFAULT_DEFS_H__ */

15
misc/kforth/defs.h Normal file

@@ -0,0 +1,15 @@
#ifndef __KF_DEFS_H__
#define __KF_DEFS_H__
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#ifdef PLATFORM_pc
#include "pc/defs.h"
#else
#include "default/defs.h"
#endif
#endif /* __KF_DEFS_H__ */

216
misc/kforth/doc/Makefile Normal file

@@ -0,0 +1,216 @@
# Makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
PAPER =
BUILDDIR = _build
# User-friendly check for sphinx-build
ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
endif
# Internal variables.
PAPEROPT_a4 = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
# the i18n builder cannot share the environment and doctrees with the others
I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
.PHONY: help
help:
	@echo "Please use \`make <target>' where <target> is one of"
	@echo " html to make standalone HTML files"
	@echo " dirhtml to make HTML files named index.html in directories"
	@echo " singlehtml to make a single large HTML file"
	@echo " pickle to make pickle files"
	@echo " json to make JSON files"
	@echo " htmlhelp to make HTML files and a HTML help project"
	@echo " qthelp to make HTML files and a qthelp project"
	@echo " applehelp to make an Apple Help Book"
	@echo " devhelp to make HTML files and a Devhelp project"
	@echo " epub to make an epub"
	@echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
	@echo " latexpdf to make LaTeX files and run them through pdflatex"
	@echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx"
	@echo " text to make text files"
	@echo " man to make manual pages"
	@echo " texinfo to make Texinfo files"
	@echo " info to make Texinfo files and run them through makeinfo"
	@echo " gettext to make PO message catalogs"
	@echo " changes to make an overview of all changed/added/deprecated items"
	@echo " xml to make Docutils-native XML files"
	@echo " pseudoxml to make pseudoxml-XML files for display purposes"
	@echo " linkcheck to check all external links for integrity"
	@echo " doctest to run all doctests embedded in the documentation (if enabled)"
	@echo " coverage to run coverage check of the documentation (if enabled)"

.PHONY: clean
clean:
	rm -rf $(BUILDDIR)/*

.PHONY: html
html:
	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."

.PHONY: dirhtml
dirhtml:
	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."

.PHONY: singlehtml
singlehtml:
	$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
	@echo
	@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."

.PHONY: pickle
pickle:
	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
	@echo
	@echo "Build finished; now you can process the pickle files."

.PHONY: json
json:
	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
	@echo
	@echo "Build finished; now you can process the JSON files."

.PHONY: htmlhelp
htmlhelp:
	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
	@echo
	@echo "Build finished; now you can run HTML Help Workshop with the" \
	      ".hhp project file in $(BUILDDIR)/htmlhelp."

.PHONY: qthelp
qthelp:
	$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
	@echo
	@echo "Build finished; now you can run "qcollectiongenerator" with the" \
	      ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
	@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/WriteYouaForth.qhcp"
	@echo "To view the help file:"
	@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/WriteYouaForth.qhc"

.PHONY: applehelp
applehelp:
	$(SPHINXBUILD) -b applehelp $(ALLSPHINXOPTS) $(BUILDDIR)/applehelp
	@echo
	@echo "Build finished. The help book is in $(BUILDDIR)/applehelp."
	@echo "N.B. You won't be able to view it unless you put it in" \
	      "~/Library/Documentation/Help or install it in your application" \
	      "bundle."

.PHONY: devhelp
devhelp:
	$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
	@echo
	@echo "Build finished."
	@echo "To view the help file:"
	@echo "# mkdir -p $$HOME/.local/share/devhelp/WriteYouaForth"
	@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/WriteYouaForth"
	@echo "# devhelp"

.PHONY: epub
epub:
	$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
	@echo
	@echo "Build finished. The epub file is in $(BUILDDIR)/epub."

.PHONY: latex
latex:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo
	@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
	@echo "Run \`make' in that directory to run these through (pdf)latex" \
	      "(use \`make latexpdf' here to do that automatically)."

.PHONY: latexpdf
latexpdf:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo "Running LaTeX files through pdflatex..."
	$(MAKE) -C $(BUILDDIR)/latex all-pdf
	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."

.PHONY: latexpdfja
latexpdfja:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo "Running LaTeX files through platex and dvipdfmx..."
	$(MAKE) -C $(BUILDDIR)/latex all-pdf-ja
	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."

.PHONY: text
text:
	$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
	@echo
	@echo "Build finished. The text files are in $(BUILDDIR)/text."

.PHONY: man
man:
	$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
	@echo
	@echo "Build finished. The manual pages are in $(BUILDDIR)/man."

.PHONY: texinfo
texinfo:
	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
	@echo
	@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
	@echo "Run \`make' in that directory to run these through makeinfo" \
	      "(use \`make info' here to do that automatically)."

.PHONY: info
info:
	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
	@echo "Running Texinfo files through makeinfo..."
	make -C $(BUILDDIR)/texinfo info
	@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."

.PHONY: gettext
gettext:
	$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
	@echo
	@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."

.PHONY: changes
changes:
	$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
	@echo
	@echo "The overview file is in $(BUILDDIR)/changes."

.PHONY: linkcheck
linkcheck:
	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
	@echo
	@echo "Link check complete; look for any errors in the above output " \
	      "or in $(BUILDDIR)/linkcheck/output.txt."
.PHONY: doctest
doctest:
$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
@echo "Testing of doctests in the sources finished, look at the " \
"results in $(BUILDDIR)/doctest/output.txt."
.PHONY: coverage
coverage:
$(SPHINXBUILD) -b coverage $(ALLSPHINXOPTS) $(BUILDDIR)/coverage
@echo "Testing of coverage in the sources finished, look at the " \
"results in $(BUILDDIR)/coverage/python.txt."
.PHONY: xml
xml:
$(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml
@echo
@echo "Build finished. The XML files are in $(BUILDDIR)/xml."
.PHONY: pseudoxml
pseudoxml:
$(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml
@echo
@echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."

misc/kforth/doc/conf.py

@@ -0,0 +1,283 @@
# -*- coding: utf-8 -*-
#
# Write You a Forth documentation build configuration file, created by
# sphinx-quickstart on Thu Feb 22 08:15:32 2018.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.
import sys
import os
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#sys.path.insert(0, os.path.abspath('.'))
# -- General configuration ------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
#needs_sphinx = '1.0'
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = []
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix(es) of source filenames.
# You can specify multiple suffixes as a list of strings:
# source_suffix = ['.rst', '.md']
source_suffix = '.rst'
# The encoding of source files.
#source_encoding = 'utf-8-sig'
# The master toctree document.
master_doc = 'index'
# General information about the project.
project = u'Write You a Forth'
copyright = u'2018, K. Isom'
author = u'K. Isom'
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = u'0.0.1'
# The full version, including alpha/beta/rc tags.
release = u'0.0.1'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None
# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
#today = ''
# Else, today_fmt is used as the format for a strftime call.
#today_fmt = '%B %d, %Y'
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = ['_build']
# The reST default role (used for this markup: `text`) to use for all
# documents.
#default_role = None
# If true, '()' will be appended to :func: etc. cross-reference text.
#add_function_parentheses = True
# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
#add_module_names = True
# If true, sectionauthor and moduleauthor directives will be shown in the
# output. They are ignored by default.
#show_authors = False
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'
# A list of ignored prefixes for module index sorting.
#modindex_common_prefix = []
# If true, keep warnings as "system message" paragraphs in the built documents.
#keep_warnings = False
# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = False
# -- Options for HTML output ----------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = 'alabaster'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#html_theme_options = {}
# Add any paths that contain custom themes here, relative to this directory.
#html_theme_path = []
# The name for this set of Sphinx documents. If None, it defaults to
# "<project> v<release> documentation".
#html_title = None
# A shorter title for the navigation bar. Default is the same as html_title.
#html_short_title = None
# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
#html_logo = None
# The name of an image file (relative to this directory) to use as a favicon of
# the docs. This file should be a Windows icon file (.ico) that is 16x16 or
# 32x32 pixels in size.
#html_favicon = None
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
# Add any extra paths that contain custom files (such as robots.txt or
# .htaccess) here, relative to this directory. These files are copied
# directly to the root of the documentation.
#html_extra_path = []
# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
# using the given strftime format.
#html_last_updated_fmt = '%b %d, %Y'
# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
#html_use_smartypants = True
# Custom sidebar templates, maps document names to template names.
#html_sidebars = {}
# Additional templates that should be rendered to pages, maps page names to
# template names.
#html_additional_pages = {}
# If false, no module index is generated.
#html_domain_indices = True
# If false, no index is generated.
#html_use_index = True
# If true, the index is split into individual pages for each letter.
#html_split_index = False
# If true, links to the reST sources are added to the pages.
#html_show_sourcelink = True
# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
#html_show_sphinx = True
# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
#html_show_copyright = True
# If true, an OpenSearch description file will be output, and all pages will
# contain a <link> tag referring to it. The value of this option must be the
# base URL from which the finished HTML is served.
#html_use_opensearch = ''
# This is the file name suffix for HTML files (e.g. ".xhtml").
#html_file_suffix = None
# Language to be used for generating the HTML full-text search index.
# Sphinx supports the following languages:
# 'da', 'de', 'en', 'es', 'fi', 'fr', 'hu', 'it', 'ja',
# 'nl', 'no', 'pt', 'ro', 'ru', 'sv', 'tr'
#html_search_language = 'en'
# A dictionary with options for the search language support, empty by default.
# Currently only 'ja' uses this config value.
#html_search_options = {'type': 'default'}
# The name of a javascript file (relative to the configuration directory) that
# implements a search results scorer. If empty, the default will be used.
#html_search_scorer = 'scorer.js'
# Output file base name for HTML help builder.
htmlhelp_basename = 'WriteYouaForthdoc'
# -- Options for LaTeX output ---------------------------------------------
latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
#'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
#'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
#'preamble': '',
# Latex figure (float) alignment
#'figure_align': 'htbp',
}
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, 'WriteYouaForth.tex', u'Write You a Forth Documentation',
u'K. Isom', 'manual'),
]
# The name of an image file (relative to this directory) to place at the top of
# the title page.
#latex_logo = None
# For "manual" documents, if this is true, then toplevel headings are parts,
# not chapters.
#latex_use_parts = False
# If true, show page references after internal links.
#latex_show_pagerefs = False
# If true, show URL addresses after external links.
#latex_show_urls = False
# Documents to append as an appendix to all manuals.
#latex_appendices = []
# If false, no module index is generated.
#latex_domain_indices = True
# -- Options for manual page output ---------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
(master_doc, 'writeyouaforth', u'Write You a Forth Documentation',
[author], 1)
]
# If true, show URL addresses after external links.
#man_show_urls = False
# -- Options for Texinfo output -------------------------------------------
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(master_doc, 'WriteYouaForth', u'Write You a Forth Documentation',
author, 'WriteYouaForth', 'One line description of project.',
'Miscellaneous'),
]
# Documents to append as an appendix to all manuals.
#texinfo_appendices = []
# If false, no module index is generated.
#texinfo_domain_indices = True
# How to display URL addresses: 'footnote', 'no', or 'inline'.
#texinfo_show_urls = 'footnote'
# If true, do not generate a @detailmenu in the "Top" node's menu.
#texinfo_no_detailmenu = False

File diff suppressed because it is too large.

@@ -0,0 +1,265 @@
FORTH-83 STANDARD
A PUBLICATION OF THE FORTH STANDARDS TEAM
AUGUST 1983
COPYRIGHT (c) 1983 FORTH STANDARDS TEAM
Permission is hereby granted to reproduce this document in whole
or in part provided that such reproductions refer to the fact
that the copied material is subject to copyright by the FORTH
Standards Team. No changes or modifications may be made to the
copied material unless it is clearly indicated that such changes
were not incorporated in the original copyrighted work.
The existence of a FORTH Standard does not in any respect
preclude anyone, whether the individual has approved this
Standard or not, from implementing, marketing, purchasing or
using products, processes, or procedures not conforming to the
Standard. FORTH Standards are subject to periodic review and
users are cautioned to obtain the latest editions.
ISBN 0-914699-03-2
FORTH STANDARDS TEAM
P.O. BOX 4545
MOUNTAIN VIEW, CA 94040
USA
TABLE OF CONTENTS
1. FOREWORD ............................................... 1
2. PURPOSE ................................................ 2
3. SCOPE .................................................. 2
4. TRADEOFFS .............................................. 3
5. DEFINITIONS OF TERMS ................................... 4
6. REFERENCES ............................................. 12
7. REQUIREMENTS ........................................... 13
8. COMPLIANCE AND LABELING ................................ 15
9. USAGE .................................................. 17
10. ERROR CONDITIONS ....................................... 20
11. GLOSSARY NOTATION ...................................... 22
12. REQUIRED WORD SET ...................................... 25
13. DOUBLE NUMBER EXTENSION WORD SET ....................... 41
14. ASSEMBLER EXTENSION WORD SET ........................... 44
15. SYSTEM EXTENSION WORD SET .............................. 46
16. CONTROLLED REFERENCE WORDS ............................. 48
APPENDICES
A. FORTH STANDARDS TEAM MEMBERSHIP ................... 51
B. UNCONTROLLED REFERENCE WORDS ...................... 54
C. EXPERIMENTAL PROPOSALS ............................ 60
C.1 SEARCH ORDER SPECIFICATION AND CONTROL ....... 61
C.2 DEFINITION FIELD ADDRESS CONVERSION OPERATORS . 66
D. STANDARDS TEAM CHARTER ............................ 69
E. PROPOSAL/COMMENT FORM AND INSTRUCTIONS ............ 78

@@ -0,0 +1,93 @@
1. FOREWORD
FORTH is an integrated programming approach and computer
language. FORTH was invented by Mr. Charles Moore specifically
to increase programmer productivity in the development of
computer related applications without sacrificing machine
efficiency. FORTH is a layered environment containing the
elements of a computer language as well as those of an operating
system and a machine monitor. This extensible, layered
environment provides for highly interactive program development
and testing.
In the interests of transportability of application software
written in FORTH, standardization efforts began in the mid-1970s
by the European FORTH User's Group (EFUG). This effort resulted
in the FORTH-77 Standard. As the language continued to evolve,
an interim FORTH-78 Standard was published by the FORTH Standards
Team. Following FORTH Standards Team meetings in 1979, the
FORTH-79 Standard was published in 1980.
The FORTH Standards Team is comprised of individuals who have a
great variety of experience and technical expertise with FORTH.
The FORTH Standards Team consists of both users and implementers.
Comments, proposals, and correspondence should be mailed to:
FORTH Standards Team, P.O. Box 4545, Mountain View, CA 94040 USA.
FORTH's extensibility allows the language to be expanded and
adapted to special needs and different hardware systems. A
programmer or vendor may choose to strictly adhere to the
standard, but the choice to deviate is acknowledged as beneficial
and sometimes necessary. If the standard does not explicitly
specify a requirement or restriction, a system or application may
utilize any choice without sacrificing compliance to the standard
provided that the system or application remains transportable and
obeys the other requirements of the standard.

@@ -0,0 +1,132 @@
10. ERROR CONDITIONS
10.1 Possible Actions on an Error
When an error condition occurs, a Standard System may take one or
more of the following actions:
1. ignore and continue;
2. display a message;
3. execute a particular word;
4. set interpret state and interpret a block;
5. set interpret state and begin interpretation;
6. other system dependent actions.
See: "7.1 Documentation Requirements"
10.2 General Error Conditions
The following error conditions apply in many situations. These
error conditions are listed below, but may occur at various times
and with various words.
1. input stream exhausted before encountering a required <name>
or delimiting character;
2. insufficient stack space or insufficient number of stack
entries during text interpretation or compilation;
3. a word not found and not a valid number, during text
interpretation or compilation;
4. compilation of incorrectly nested control structures;
5. execution of words restricted to compilation only, when not
in the compile state and while not compiling a colon
definition;
6. FORGETting within the system to a point that removes a word
required for correct execution;
7. insufficient space remaining in the dictionary;
8. a stack parameter out of range, e.g., a negative number when
a +n was specified in the glossary;
9. correct mass storage read or write was not possible.

@@ -0,0 +1,264 @@
11. GLOSSARY NOTATION
11.1 Order
The glossary definitions are listed in ASCII alphabetical order.
11.2 Capitalization
Word names are capitalized throughout this Standard.
11.3 Stack Notation
The stack parameters input to and output from a definition are
described using the notation:
before -- after
before stack parameters before execution
after stack parameters after execution
In this notation, the top of the stack is to the right. Words
may also be shown in context when appropriate.
Unless otherwise noted, all stack notation describes execution
time. If it applies at compile time, the line is followed by:
(compiling) .
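A Forth implementer often wants the glossary's stack notation in machine-readable form, for example to check stack depth in tests. The following is a minimal Python sketch, assuming parameters are separated by whitespace and that a compile-time line ends with the literal text "(compiling)". The names `StackEffect` and `parse_stack_effect` are hypothetical helpers, not part of the Standard.

```python
from typing import List, NamedTuple


class StackEffect(NamedTuple):
    before: List[str]  # parameters before execution; top of stack is last
    after: List[str]   # parameters after execution; top of stack is last
    compiling: bool    # True when the notation describes compile time


def parse_stack_effect(text: str) -> StackEffect:
    """Split 'before -- after' notation into two parameter lists."""
    text = text.strip()
    compiling = text.endswith("(compiling)")
    if compiling:
        text = text[: -len("(compiling)")].strip()
    before, sep, after = text.partition("--")
    if not sep:
        raise ValueError("stack notation must contain '--'")
    return StackEffect(before.split(), after.split(), compiling)


print(parse_stack_effect("32b1 32b2 -- 32b2 32b1"))
print(parse_stack_effect("sys1 -- sys2 (compiling)"))
```

Keeping the top of the stack as the last list element matches the "top of the stack is to the right" rule, so the lists can be compared directly against a Python list used as a stack.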
11.4 Attributes
Capitalized symbols indicate attributes of the defined words:
C The word may only be used during compilation of a colon
definition.
I Indicates that the word is IMMEDIATE and will execute during
compilation, unless special action is taken.
M This word has a potential multiprogramming impact.
See: "9.7 Multiprogramming Impact"
U A user variable.
11.5 Serial Numbers
When a substantive alteration to a word's definition is made or
when a new word is added, the serial number will be the last two
digits of the year of the Standard in which such change was made
(i.e., "83"). When such change is made within a Working Draft,
the number will be suffixed with the character identifying the
draft (i.e., "83A").
11.6 Pronunciation
The natural language pronunciation of word names is given in
double quotes (") where it differs from English pronunciation.
11.7 Stack Parameters
Unless otherwise stated, all references to numbers apply to 16-
bit signed integers. The implied range of values is shown as
{from..to}. The contents of an address is shown by double
braces, particularly for the contents of variables, i.e., BASE
{{2..72}}.
The following are the stack parameter abbreviations and types of
numbers used throughout the glossary. These abbreviations may be
suffixed with a digit to differentiate multiple parameters of the
same type.
Stack    Number                    Range in              Minimum
Abbrv.   Type                      Decimal               Field
flag     boolean                   0=false, else=true    16
true     boolean                   -1 (as a result)      16
false    boolean                   0                     0
b        bit                       {0..1}                1
char     character                 {0..127}              7
8b       8 arbitrary bits (byte)   not applicable        8
16b      16 arbitrary bits         not applicable        16
n        number (weighted bits)    {-32,768..32,767}     16
+n       positive number           {0..32,767}           16
u        unsigned number           {0..65,535}           16
w        unspecified weighted      {-32,768..65,535}     16
         number (n or u)
addr     address (same as u)       {0..65,535}           16
32b      32 arbitrary bits         not applicable        32
d        double number             {-2,147,483,648..     32
                                    2,147,483,647}
+d       positive double number    {0..2,147,483,647}    32
ud       unsigned double number    {0..4,294,967,295}    32
wd       unspecified weighted      {-2,147,483,648..     32
         double number (d or ud)    4,294,967,295}
sys      0, 1, or more system      not applicable        na
         dependent stack entries
Any other symbol refers to an arbitrary signed 16-bit integer in
the range {-32,768..32,767}, unless otherwise noted.
Because of the use of two's complement arithmetic, the signed 16-
bit number (n) -1 has the same bit representation as the unsigned
number (u) 65,535. Both of these numbers are within the set of
unspecified weighted numbers (w). See: "arithmetic, two's
complement" "number" "number types" "stack, data"
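To make the overlap between n and u concrete, here is a minimal Python sketch that models a 16-bit cell as an integer masked to 16 bits and reads the same bit pattern either way. The helper names `to_cell`, `as_signed`, and `as_unsigned` are hypothetical and are not FORTH words.

```python
CELL_BITS = 16
CELL_MASK = (1 << CELL_BITS) - 1  # 0xFFFF, all 16 bits set
SIGN_BIT = 1 << (CELL_BITS - 1)   # 0x8000


def to_cell(value: int) -> int:
    """Reduce any Python integer to the 16-bit pattern held in a cell."""
    return value & CELL_MASK


def as_signed(cell: int) -> int:
    """Read a cell as n, a signed number in {-32,768..32,767}."""
    cell &= CELL_MASK
    return cell - (1 << CELL_BITS) if cell & SIGN_BIT else cell


def as_unsigned(cell: int) -> int:
    """Read a cell as u, an unsigned number in {0..65,535}."""
    return cell & CELL_MASK


cell = to_cell(-1)
print(hex(cell))          # 0xffff
print(as_signed(cell))    # -1
print(as_unsigned(cell))  # 65535
```

Only the interpretation differs between n and u; the stored bits are identical, which is why both fall within the w ("unspecified weighted number") range.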
11.8 Input Text
<name>
An arbitrary FORTH word accepted from the input stream.
This notation refers to text from the input stream, not to
values on the data stack. See: "10.2 General Error
Conditions"
ccc
A sequence of arbitrary characters accepted from the input
stream until the first occurrence of the specified
delimiting character. The delimiter is accepted from the
input stream, but is not one of the characters ccc and is
therefore not otherwise processed. This notation refers to
text from the input stream, not to values on the data stack.
Unless noted otherwise, the number of characters accepted
may be from 0 to 255. See: "10.2 General Error Conditions"
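The ccc rule can be sketched in Python as follows, assuming the input stream is a string and the >IN offset is a plain integer index into it. The function `parse_ccc` is a hypothetical helper: it returns the accepted text and the new offset, consumes the delimiter without including it in ccc, and raises on the error conditions named above.

```python
from typing import Tuple

MAX_CCC = 255  # default upper bound on characters accepted


def parse_ccc(buffer: str, to_in: int, delimiter: str) -> Tuple[str, int]:
    """Accept characters from buffer[to_in:] up to the first delimiter.

    Returns (ccc, new_to_in). new_to_in points just past the delimiter,
    so the delimiter is consumed but is not one of the characters ccc.
    """
    end = buffer.find(delimiter, to_in)
    if end == -1:
        # Input stream exhausted before the delimiting character.
        raise ValueError("input stream exhausted before delimiter")
    ccc = buffer[to_in:end]
    if len(ccc) > MAX_CCC:
        raise ValueError("more than 255 characters accepted")
    return ccc, end + 1


# Parsing the text after ." in:  ." Hello, world" 1 2 +
line = '." Hello, world" 1 2 +'
text, offset = parse_ccc(line, 3, '"')
print(repr(text), repr(line[offset:]))  # 'Hello, world' ' 1 2 +'
```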
11.9 References to other words and definitions
Glossary definitions may refer to other glossary definitions or
to definitions of terms. Such references are made using the
expression "See:". These references provide additional
information which applies as if the information is a portion of the
glossary entry using "See:".

File diff suppressed because it is too large.

@@ -0,0 +1,198 @@
13. DOUBLE NUMBER EXTENSION WORD SET
13.1 The Double Number Extension Word Set Layers
Nucleus layer
2! 2@ 2DROP 2DUP 2OVER 2ROT 2SWAP D+ D- D0= D2/
D< D= DABS DMAX DMIN DNEGATE DU<
Device layer
none
Interpreter layer
D. D.R
Compiler layer
2CONSTANT 2VARIABLE
13.2 The Double Number Extension Word Set Glossary
2! 32b addr -- 79 "two-store"
32b is stored at addr. See: "number"
2@ addr -- 32b 79 "two-fetch"
32b is the value at addr. See: "number"
2CONSTANT 32b -- M,83 "two-constant"
A defining word executed in the form:
32b 2CONSTANT <name>
Creates a dictionary entry for <name> so that when <name> is
later executed, 32b will be left on the stack.
2DROP 32b -- 79 "two-drop"
32b is removed from the stack.
2DUP 32b -- 32b 32b 79 "two-dupe"
Duplicate 32b.
2OVER 32b1 32b2 -- 32b1 32b2 32b3 79 "two-over"
32b3 is a copy of 32b1.
2ROT 32b1 32b2 32b3 -- 32b2 32b3 32b1 79 "two-rote"
The top three double numbers on the stack are rotated,
bringing the third double number to the top of the
stack.
2SWAP 32b1 32b2 -- 32b2 32b1 79 "two-swap"
The top two double numbers are exchanged.
2VARIABLE -- M,79 "two-variable"
A defining word executed in the form:
2VARIABLE <name>
A dictionary entry for <name> is created and four bytes are
ALLOTted in its parameter field. This parameter field is to
be used for contents of the variable. The application is
responsible for initializing the contents of the variable
which it creates. When <name> is later executed, the
address of its parameter field is placed on the stack. See:
VARIABLE
D+ wd1 wd2 -- wd3 79
See the complete definition in the Required Word Set.
D- wd1 wd2 -- wd3 79 "d-minus"
wd3 is the result of subtracting wd2 from wd1.
D. d -- M,79 "d-dot"
The absolute value of d is displayed in a free field format.
A leading negative sign is displayed if d is negative.
D.R d +n -- M,83 "d-dot-r"
d is converted using the value of BASE and then displayed
right aligned in a field +n characters wide. A leading
minus sign is displayed if d is negative. If the number of
characters required to display d is greater than +n, an
error condition exists. See: "number conversion"
D0= wd -- flag 83 "d-zero-equals"
flag is true if wd is zero.
D2/ d1 -- d2 83 "d-two-divide"
d2 is the result of d1 arithmetically shifted right one bit.
The sign is included in the shift and remains unchanged.
D< d1 d2 -- flag 83
See the complete definition in the Required Word Set.
D= wd1 wd2 -- flag 83 "d-equal"
flag is true if wd1 equals wd2.
DABS d -- ud 79 "d-absolute"
ud is the absolute value of d. If d is -2,147,483,648 then
ud is the same value. See: "arithmetic, two's complement"
DMAX d1 d2 -- d3 79 "d-max"
d3 is the greater of d1 and d2.
DMIN d1 d2 -- d3 79 "d-min"
d3 is the lesser of d1 and d2.
DNEGATE d1 -- d2 79
See the complete definition in the Required Word Set.
DU< ud1 ud2 -- flag 83 "d-u-less"
flag is true if ud1 is less than ud2. Both numbers are
unsigned.
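Several of these words are easier to follow with a concrete stack model. The following Python sketch assumes a double number occupies two 16-bit cells with the high-order cell nearer the top of the stack, the usual Forth layout. The helper names (`push_d`, `pop_d`, `two_swap`, and so on) are hypothetical stand-ins for the words, and the stack is a Python list whose last element is the top.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def push_d(stack, value):
    """Push a 32-bit value as two cells: low cell first, high cell on top."""
    value &= MASK32
    stack.append(value & MASK16)          # low-order cell
    stack.append((value >> 16) & MASK16)  # high-order cell on top


def pop_d(stack):
    """Pop a double number and return it as a signed Python int (d)."""
    high = stack.pop()
    low = stack.pop()
    value = (high << 16) | low
    return value - (1 << 32) if value & 0x80000000 else value


def two_swap(stack):  # 2SWAP  32b1 32b2 -- 32b2 32b1
    stack[-4:] = stack[-2:] + stack[-4:-2]


def two_over(stack):  # 2OVER  32b1 32b2 -- 32b1 32b2 32b3
    stack.extend(stack[-4:-2])


def two_rot(stack):   # 2ROT  32b1 32b2 32b3 -- 32b2 32b3 32b1
    stack[-6:] = stack[-4:] + stack[-6:-4]


def d_two_div(stack):  # D2/  arithmetic shift right; the sign is kept
    push_d(stack, pop_d(stack) >> 1)  # Python's >> on ints is arithmetic


def d_abs(stack):      # DABS  d -- ud
    # abs(-2,147,483,648) does not fit in a signed 32-bit value; masking it
    # back to 32 bits leaves the same bit pattern, as the glossary states.
    push_d(stack, abs(pop_d(stack)))


s = []
for v in (1, 2, 3):
    push_d(s, v)
two_rot(s)
print([pop_d(s) for _ in range(3)])  # [1, 3, 2]: 1 was rotated to the top
```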

@@ -0,0 +1,139 @@
14. ASSEMBLER EXTENSION WORD SET
14.1 The Assembler Extension Word Set Layers
Nucleus layer
none
Device layer
none
Interpreter layer
ASSEMBLER
Compiler layer
;CODE CODE END-CODE
14.2 Assembler Extension Word Set Usage
Because of the system dependent nature of machine language
programming, a Standard Program cannot use CODE or ;CODE .
14.3 The Assembler Extension Word Set Glossary
;CODE -- C,I,79 "semi-colon-code"
sys1 -- sys2 (compiling)
Used in the form:
: <namex> ... <create> ... ;CODE ... END-CODE
Stops compilation, terminates the defining word <namex> and
executes ASSEMBLER. When <namex> is executed in the form:
<namex> <name>
to define the new <name>, the execution address of <name>
will contain the address of the code sequence following the
;CODE in <namex>. Execution of any <name> will cause this
machine code sequence to be executed. sys1 is balanced with
its corresponding : . sys2 is balanced with its
corresponding END-CODE . See: CODE DOES>
ASSEMBLER -- 83
Execution replaces the first vocabulary in the search order
with the ASSEMBLER vocabulary. See: VOCABULARY
CODE -- sys M,83
A defining word executed in the form:
CODE <name> ... END-CODE
Creates a dictionary entry for <name> to be defined by a
following sequence of assembly language words. Words thus
defined are called code definitions. This newly created
word definition for <name> cannot be found in the dictionary
until the corresponding END-CODE is successfully processed
(see: END-CODE ). Executes ASSEMBLER . sys is balanced
with its corresponding END-CODE .
END-CODE sys -- 79 "end-code"
Terminates a code definition and allows the <name> of the
corresponding code definition to be found in the dictionary.
sys is balanced with its corresponding CODE or ;CODE . See:
CODE

@@ -0,0 +1,132 @@
15. THE SYSTEM EXTENSION WORD SET
15.1 The System Extension Word Set Layers
Nucleus layer
BRANCH ?BRANCH
Device layer
none
Interpreter layer
CONTEXT CURRENT
Compiler layer
<MARK <RESOLVE >MARK >RESOLVE
15.2 System Extension Word Set Usage
After BRANCH or ?BRANCH is compiled, >MARK or <RESOLVE is
executed. The addr left by >MARK is passed to >RESOLVE . The
addr left by <MARK is passed to <RESOLVE . For example:
: IF COMPILE ?BRANCH >MARK ; IMMEDIATE
: THEN >RESOLVE ; IMMEDIATE
15.3 The System Extension Word Set Glossary
<MARK -- addr C,83 "backward-mark"
Used at the destination of a backward branch. addr is
typically only used by <RESOLVE to compile a branch address.
<RESOLVE addr -- C,83 "backward-resolve"
Used at the source of a backward branch after either BRANCH
or ?BRANCH . Compiles a branch address using addr as the
destination address.
>MARK -- addr C,83 "forward-mark"
Used at the source of a forward branch. Typically used
after either BRANCH or ?BRANCH . Compiles space in the
dictionary for a branch address which will later be resolved
by >RESOLVE .
>RESOLVE addr -- C,83 "forward-resolve"
Used at the destination of a forward branch. Calculates the
branch address (to the current location in the dictionary)
using addr and places this branch address into the space
left by >MARK .
?BRANCH flag -- C,83 "question-branch"
When used in the form: COMPILE ?BRANCH a conditional
branch operation is compiled. See BRANCH for further
details. When executed, if flag is false the branch is
performed as with BRANCH . When flag is true execution
continues at the compilation address immediately following
the branch address.
BRANCH -- C,83
When used in the form: COMPILE BRANCH an unconditional
branch operation is compiled. A branch address must be
compiled immediately following this compilation address.
The branch address is typically generated by following
BRANCH with <RESOLVE or >MARK .
CONTEXT -- addr U,79
The address of a variable which determines the dictionary
search order.
CURRENT -- addr U,79
The address of a variable specifying the vocabulary in which
new word definitions are appended.
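The backward-branch pair works the other way round: <MARK records a destination first, and <RESOLVE later compiles that address directly after ?BRANCH or BRANCH . Below is a minimal Python sketch of a BEGIN ... UNTIL loop built this way. It uses the same kind of toy model as an illustration (list as dictionary, index as address), and defining BEGIN as <MARK and UNTIL as COMPILE ?BRANCH <RESOLVE is an assumption about a typical system, not text from this section. All names are hypothetical.

```python
from typing import Any, List

QBRANCH = "?BRANCH"  # conditional branch, taken when the flag is false (0)


def run(code: List[Any], stack: List[int]) -> List[int]:
    """Toy inner interpreter supporting ?BRANCH, callables, and literals."""
    ip = 0
    while ip < len(code):
        op = code[ip]
        ip += 1
        if op == QBRANCH:
            flag = stack.pop()
            ip = code[ip] if flag == 0 else ip + 1
        elif callable(op):
            op(stack)
        else:
            stack.append(op)
    return stack


class Compiler:
    def __init__(self) -> None:
        self.code: List[Any] = []

    def here(self) -> int:
        return len(self.code)

    def compile(self, item: Any) -> None:
        self.code.append(item)

    def backward_mark(self) -> int:
        """<MARK: leave HERE as the destination of a later backward branch."""
        return self.here()

    def backward_resolve(self, addr: int) -> None:
        """<RESOLVE: compile addr as the branch address."""
        self.compile(addr)

    def begin(self) -> int:  # BEGIN  <MARK
        return self.backward_mark()

    def until(self, addr: int) -> None:  # UNTIL  COMPILE ?BRANCH <RESOLVE
        self.compile(QBRANCH)
        self.backward_resolve(addr)


def one_minus(stack): stack.append(stack.pop() - 1)                # 1-
def dup(stack): stack.append(stack[-1])                            # DUP
def zero_equals(stack): stack.append(-1 if stack.pop() == 0 else 0)  # 0=


# Compile the equivalent of:  BEGIN 1- DUP 0= UNTIL
c = Compiler()
dest = c.begin()
for word in (one_minus, dup, zero_equals):
    c.compile(word)
c.until(dest)

print(run(c.code, [3]))  # [0]: counts down from 3 and stops at zero
```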

@@ -0,0 +1,198 @@
16. CONTROLLED REFERENCE WORDS
The Controlled Reference Words are word definitions which,
although not required, cannot be present with a non-standard
definition in the vocabulary FORTH of a Standard System. These
words have present usage and/or are candidates for future
standardization.
--> -- I,M,79 "next-block"
-- (compilation)
Continue interpretation on the next sequential block. May
be used within a colon definition that crosses a block
boundary.
.R n +n -- M,83 "dot-r"
n is converted using BASE and then displayed right aligned
in a field +n characters wide. A leading minus sign is
displayed if n is negative. If the number of characters
required to display n is greater than +n, an error condition
exists. See: "number conversion"
2* w1 -- w2 83 "two-times"
w2 is the result of shifting w1 left one bit. A zero is
shifted into the vacated bit position.
BL -- 32 79 "b-l"
Leave the ASCII character value for space (decimal 32).
BLANK addr u -- 83
u bytes of memory beginning at addr are set to the ASCII
character value for space. No action is taken if u is zero.
C, 16b -- 83 "c-comma"
ALLOT one byte then store the least-significant 8 bits of
16b at HERE 1- .
DUMP addr u -- M,79
List the contents of u addresses starting at addr. Each
line of values may be preceded by the address of the first
value.
EDITOR -- 83
Execution replaces the first vocabulary in the search order
with the EDITOR vocabulary. See: VOCABULARY
EMPTY-BUFFERS -- M,79 "empty-buffers"
Unassign all block buffers. UPDATEed blocks are not written
to mass storage. See: BLOCK
END flag -- C,I,79
sys -- (compiling)
A synonym for UNTIL .
ERASE addr u -- 79
u bytes of memory beginning at addr are set to zero. No
action is taken if u is zero.
HEX -- 79
Set the numeric input-output conversion base to sixteen.
INTERPRET -- M,83
Begin text interpretation at the character indexed by the
contents of >IN relative to the block number contained in
BLK , continuing until the input stream is exhausted. If
BLK contains zero, interpret characters from the text input
buffer. See: "input stream"
K -- w C,83
w is a copy of the index of the second outer loop. May only
be used within a nested DO-LOOP or DO-+LOOP in the form, for
example:
DO ... DO ... DO ... K ... LOOP ... +LOOP ... LOOP
LIST u -- M,79
The contents of screen u are displayed. SCR is set to u.
See: BLOCK
OCTAL -- 83
Set the numeric input-output conversion base to eight.
OFFSET -- addr U,83
The address of a variable that contains the offset added to
the block number on the stack by BLOCK or BUFFER to
determine the actual physical block number.
QUERY -- M,83
Characters are received and transferred into the memory area
addressed by TIB . The transfer terminates when either a
"return" is received or the number of characters transferred
reaches the size of the area addressed by TIB . The values
of >IN and BLK are set to zero and the value of #TIB is set
to the value of SPAN . WORD may be used to accept text from
this buffer. See: EXPECT "input stream"
RECURSE -- C,I,83
-- (compiling)
Compile the compilation address of the definition being
compiled to cause the definition to later be executed
recursively.
SCR -- addr U,79 "s-c-r"
The address of a variable containing the number of the
screen most recently LISTed.
SP@ -- addr 79 "s-p-fetch"
addr is the address of the top of the stack just before SP@
was executed.
THRU u1 u2 -- M,83
Load consecutively the blocks from u1 through u2.
U.R u +n -- M,83 "u-dot-r"
u is converted using the value of BASE and then displayed as
an unsigned number right aligned in a field +n characters
wide. If the number of characters required to display u is
greater than +n, an error condition exists. See: "number
conversion"
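The U.R behaviour can be modelled in a few lines. This is a minimal Python sketch, not part of the Standard: `u_dot_r` is a hypothetical name, the cell is assumed to be 16 bits, digits follow the "number conversion" rule, and the error condition is modelled as a raised ValueError.

```python
def u_dot_r(u: int, width: int, base: int = 10) -> str:
    """Right-align the unsigned 16-bit value u in a field `width` characters wide."""
    u &= 0xFFFF                        # reinterpret the cell as unsigned
    text = ""
    while True:                        # produce digits least-significant first
        u, digit = divmod(u, base)
        # "0".."9" for values 0-9, then "A" onward for values of 10 and up
        text = chr(48 + digit if digit < 10 else 55 + digit) + text
        if u == 0:
            break
    if len(text) > width:              # the glossary treats this as an error condition
        raise ValueError("number does not fit in the field")
    return text.rjust(width)
```

For example, u_dot_r(42, 5) gives "   42", and a cell holding -1 displays as 65535 because it is treated as unsigned.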
2. PURPOSE
The purpose of this standard is to allow transportability of
FORTH-83 Standard Programs in source form among FORTH-83 Standard
Systems. A standard program shall execute equivalently on all
standard systems.
3. SCOPE
This standard shall apply to any FORTH-83 Standard Program
executing on any FORTH-83 Standard System, provided sufficient
computer resources (memory, mass storage) are available.
4. TRADEOFFS
When conflicting choices are made, the following order guides the
Standards Team:
1) Functional correctness - known bounds, non-ambiguous;
2) Portability - repeatable results when programs are
transported among Standard Systems;
3) Simplicity;
4) Naming clarity - uniformity of expression using descriptive
rather than procedure names, e.g., [COMPILE] rather than 'C,
and ALLOT rather than DP+! ;
5) Generality;
6) Execution speed;
7) Memory compactness;
8) Compilation speed;
9) Historical continuity;
10) Pronounceability;
11) Teachability.
5. DEFINITIONS OF TERMS
These are the definitions of the terms used within this Standard.
address, byte
An unsigned 16-bit number that locates an 8-bit byte in a
standard FORTH address space over the range {0..65,535}. It
may be a native machine address or a representation on a
virtual machine, locating the addr-th byte within the
virtual byte address space. Addresses are treated as
unsigned numbers. See: "arithmetic, two's complement"
address, compilation
The numerical value compiled for a FORTH word definition
which identifies that definition. The address interpreter
uses this value to locate the machine code corresponding to
each definition.
address, native machine
The natural address representation of the host computer.
address, parameter field
The address of the first byte of memory associated with a
word definition for the storage of compilation addresses (in
a colon definition), numeric data, text characters, etc.
arithmetic, two's complement
Arithmetic is performed using two's complement integers
within a field of either 16 or 32 bits as indicated by the
operation. Addition and subtraction of two's complement
integers ignore any overflow condition. This allows numbers
treated as unsigned to produce the same results as if the
numbers had been treated as signed.
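A short Python sketch of why ignoring overflow makes signed and unsigned results agree. It assumes a 16-bit cell modelled by masking with 0xFFFF; `add16`, `sub16` and `as_signed` are hypothetical helper names, not Standard words.

```python
MASK16 = 0xFFFF  # a 16-bit cell

def add16(a: int, b: int) -> int:
    """Add two cells; any carry out of bit 15 is discarded (overflow is ignored)."""
    return (a + b) & MASK16

def sub16(a: int, b: int) -> int:
    """Subtract two cells, again ignoring overflow."""
    return (a - b) & MASK16

def as_signed(w: int) -> int:
    """Read a 16-bit pattern as a two's complement signed number."""
    return w - 0x10000 if w & 0x8000 else w
```

Adding 2 to the pattern 0xFFFF yields 1 whether that pattern is read as unsigned 65535 or as signed -1, so one addition routine serves both interpretations.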
block
The 1024 bytes of data from mass storage which are
referenced by block numbers in the range {0..the number of
blocks available -1}. The actual amount of data transferred
and the translation from block number to device and physical
record is a function of the implementation. See: "block
buffer" "mass storage"
block buffer
A 1024-byte memory area where a block is made temporarily
available for use. Block buffers are uniquely assigned to
blocks. See: "9.7 Multiprogramming Impact"
byte
An assembly of 8 bits. In reference to memory, it is the
storage capacity for 8 bits.
character
A 7-bit number the significance of which is given by the
ASCII standard. When contained in a larger field, the
higher order bits are zero. See: "6. REFERENCES"
compilation
The action of converting text words from the input stream
into an internal form suitable for later execution. When in
the compile state, the compilation addresses of FORTH words
are compiled into the dictionary for later execution by the
address interpreter. Numbers are compiled to be placed on
the data stack when later executed. Numbers are accepted
from the input stream unsigned or negatively signed and
converted using the value of BASE . See: "number" "number
conversion" "interpreter, text"
defining word
A word that, when executed, creates a new dictionary entry
in the compilation vocabulary. The new word name is taken
from the input stream. If the input stream is exhausted
before the new name is available, an error condition exists.
Examples of defining words are: : CONSTANT CREATE
definition
See: "word definition"
dictionary
A structure of word definitions in computer memory which is
extensible and grows toward higher memory addresses.
Entries are organized in vocabularies to aid location by
name. See: "search order"
display
The process of sending one or more characters to the current
output device. These characters are typically displayed or
printed on a terminal. The selection of the current output
device is system dependent.
division, floored
Integer division in which the remainder carries the sign of
the divisor or is zero, and the quotient is rounded to its
arithmetic floor. Note that, except for error conditions,
n1 n2 SWAP OVER /MOD ROT * + is identical to n1. See:
"floor, arithmetic"
Examples:
   dividend   divisor   remainder   quotient
      10         7          3           1
     -10         7          4          -2
      10        -7         -4          -2
     -10        -7         -3           1
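The floored rule can be derived from the truncating (round toward zero) division that many processors provide. This Python sketch assumes that starting point; `floored_divmod` is a hypothetical name, and it returns (remainder, quotient) in the order /MOD leaves them on the stack.

```python
def floored_divmod(dividend: int, divisor: int) -> tuple[int, int]:
    """Return (remainder, quotient) with the quotient rounded toward its floor."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero is an error condition")
    # Truncating division: magnitude first, then apply the sign.
    q = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        q = -q
    r = dividend - q * divisor
    # A non-zero remainder must carry the sign of the divisor; if it does not,
    # step the quotient down to its floor and adjust the remainder to match.
    if r != 0 and (r < 0) != (divisor < 0):
        q -= 1
        r += divisor
    return r, q
```

Every row of the table satisfies remainder + quotient * divisor = dividend, which is the identity expressed by n1 n2 SWAP OVER /MOD ROT * + . Python's built-in divmod already floors, so it returns the same pair in (quotient, remainder) order.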
equivalent execution
A standard program will produce the same results, exclusive
of timing dependencies, when given the same inputs on any
Standard System which has sufficient resources to execute
the program. Only standard source programs are
transportable.
error condition
An exceptional condition which requires action by the system
which may be other than the expected function. Refer to the
section "10. Error Conditions".
false
A zero number represents the false state of a flag.
flag
A number that may have one of two logical states, false or
true. See: "false" "true"
floor, arithmetic
If z is any real number, then the floor of z is the greatest
integer less than or equal to z.
The floor of +.6 is 0
The floor of -.4 is -1
free field format
Numbers are converted using the value of BASE and then
displayed with no leading zeros. A trailing space is
displayed. The number of characters displayed is the
minimum number of characters, at least one, to uniquely
represent the number. See: "number conversion"
glossary
A set of explanations in natural language to describe the
corresponding computer execution of word definitions.
immediate word
A word which executes when encountered during compilation or
interpretation. Immediate words handle special cases during
compilation. See, for example, IF LITERAL ." etc.
input stream
A sequence of characters available to the system, for
processing by the text interpreter. The input stream
conventionally may be taken from the current input device
(via the text input buffer) and mass storage (via a block
buffer). BLK , >IN , TIB and #TIB specify the input stream.
Words using or altering BLK , >IN , TIB and #TIB are
responsible for maintaining and restoring control of the
input stream.
The input stream extends from the offset value of >IN to the
size of the input stream. If BLK is zero the input stream
is contained within the area addressed by TIB and is #TIB
bytes long. If BLK is non-zero the input stream is
contained within the block buffer specified by BLK and is
1024 bytes long. See: "11.8 Input Text"
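The selection rule above can be sketched in Python. The sketch assumes that `tib` stands in for the memory addressed by TIB and that `blocks` maps block numbers to 1024-byte buffers; both, and the function name `remaining_input`, are hypothetical stand-ins rather than Standard definitions.

```python
BLOCK_BYTES = 1024

def remaining_input(blk: int, to_in: int, tib: bytes, num_tib: int,
                    blocks: dict) -> bytes:
    """Return the part of the input stream not yet consumed, per BLK and >IN."""
    if blk == 0:
        source = tib[:num_tib]          # text input buffer, #TIB bytes long
    else:
        source = blocks[blk]            # buffer holding block number BLK
        if len(source) != BLOCK_BYTES:
            raise ValueError("block buffers are 1024 bytes")
    return source[to_in:]               # stream extends from >IN to its end
```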
interpreter, address
The machine code instructions, routine or other facilities
that execute compiled word definitions containing
compilation addresses.
interpreter, text
The word definition(s) that repeatedly accept a word name
from the input stream, locates the corresponding compilation
address and starts the address interpreter to execute it.
Text from the input stream interpreted as a number leaves
the corresponding value on the data stack. Numbers are
accepted from the input stream unsigned or negatively signed
and converted using the value of BASE . See: "number"
"number conversion"
layers
The grouping of word names of each Standard word set to show
like characteristics. No implementation requirements are
implied by this grouping.
layer, compiler
Word definitions which add new procedures to the dictionary
or which aid compilation by adding compilation addresses or
data structures to the dictionary.
layer, devices
Word definitions which allow access to mass storage and
computer peripheral devices.
layer, interpreter
Word definitions which support vocabularies, terminal
output, and the interpretation of text from the text input
buffer or a mass storage device by executing the
corresponding word definitions.
layer, nucleus
Word definitions generally defined in machine code that
control the execution of the fundamental operations of a
virtual FORTH machine. This includes the address
interpreter.
load
Redirection of the text interpreter's input stream to be
from mass storage. This is the general method for
compilation of new definitions into the dictionary.
mass storage
Storage which might reside outside FORTH's address space.
Mass storage data is made available in the form of 1024-byte
blocks. A block is accessible within the FORTH address
space in a block buffer. When a block has been indicated as
UPDATEed (modified) the block will ultimately be transferred
to mass storage.
number
When values exist within a larger field, the most-
significant bits are zero. 16-bit numbers are represented
in memory by addressing the first of two bytes at
consecutive addresses. The byte order is unspecified by
this Standard. Double numbers are represented on the stack
with the most-significant 16 bits (with sign) most
accessible. Double numbers are represented in memory by two
consecutive 16-bit numbers. The address of the least
significant 16 bits is two greater than the address of the
most significant 16 bits. The byte order within each 16-bit
field is unspecified. See: "arithmetic, two's complement"
"number types" "9.8 Numbers" "11.7 Stack Parameters"
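A Python sketch of the double-number layout described above, using a bytearray as memory. The Standard leaves byte order within a cell unspecified, so the little-endian "<H" format used here is an arbitrary assumption, and `store_double`, `fetch_double` and `double_on_stack` are hypothetical names.

```python
import struct

def store_double(mem: bytearray, addr: int, d: int) -> None:
    """Store a 32-bit double: high 16 bits at addr, low 16 bits at addr+2."""
    d &= 0xFFFFFFFF                       # two's complement 32-bit pattern
    high, low = d >> 16, d & 0xFFFF
    # Byte order inside each cell is unspecified; little-endian is a sketch choice.
    struct.pack_into("<H", mem, addr, high)
    struct.pack_into("<H", mem, addr + 2, low)

def fetch_double(mem: bytearray, addr: int) -> int:
    """Read back a signed double number stored by store_double."""
    (high,) = struct.unpack_from("<H", mem, addr)
    (low,) = struct.unpack_from("<H", mem, addr + 2)
    d = (high << 16) | low
    return d - (1 << 32) if d & 0x80000000 else d

def double_on_stack(d: int) -> list:
    """Stack picture, top last: low cell below, high cell (with sign) on top."""
    d &= 0xFFFFFFFF
    return [d & 0xFFFF, d >> 16]
```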
number conversion
Numbers are maintained internally in binary and represented
externally by using graphic characters within the ASCII
character set. Conversion between the internal and external
forms is performed using the current value of BASE to
determine the digits of a number. A digit has a value
ranging from zero to the value of BASE-1. The digit with
the value zero is represented by the ASCII character "0"
(position 3/0 with the decimal equivalent of 48). This
representation of digits proceeds through the ASCII
character set to the character "9" corresponding to the
decimal value 9. For digits with a value exceeding 9, the
ASCII graphic characters beginning with the character "A"
(position 4/1 with the decimal equivalent 65) corresponding
to the decimal value 10 are used. This sequence then
continues up to and including the digit with the decimal
value 71 which is represented by the ASCII character "~"
(position 7/14 with a decimal equivalent 126). A negative
number may be represented by preceding the digits with a
single leading minus sign, the character "-".
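The digit mapping above, "0" through "9", then "A" (value 10) up to "~" (value 71), can be written directly in Python. This is a sketch: `digit_char` and `to_text` are hypothetical names, and restricting BASE to 2..72 is an assumption that follows from there being 72 digit characters.

```python
def digit_char(value: int) -> str:
    """Map a digit value 0..71 to its ASCII graphic character."""
    if not 0 <= value <= 71:
        raise ValueError("digit value out of range")
    if value <= 9:
        return chr(ord("0") + value)       # "0".."9" are ASCII 48..57
    return chr(ord("A") + value - 10)      # "A" (65) is 10 ... "~" (126) is 71

def to_text(n: int, base: int) -> str:
    """Convert n to external form in the given BASE, with a leading "-" if negative."""
    if not 2 <= base <= 72:
        raise ValueError("BASE out of range for this sketch")
    sign, n = ("-", -n) if n < 0 else ("", n)
    digits = []
    while True:
        n, d = divmod(n, base)
        digits.append(digit_char(d))
        if n == 0:
            break
    return sign + "".join(reversed(digits))
```

The ASCII characters between "9" and "A" (":" through "@") are skipped, which is why the second branch starts again at 65.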
number types
All number types consist of some number of bits. These bits
are either arbitrary or are weighted.
Signed and unsigned numbers use weighted bits. Weighted
bits within a number have a value of a power of two
beginning with the rightmost (least-significant) bit having
the value of two to the zero power. This weighting
continues to the leftmost bit increasing the power by one
for each bit. For an unsigned number this weighting pattern
includes the leftmost bit; thus, for an unsigned 16-bit
number the weight of the leftmost bit is 32,768. For a
signed number this weighting pattern includes the leftmost
bit but the weight of the leftmost bit is negated; thus, for
a signed 16-bit number the weight of the leftmost bit is
-32,768. This weighting pattern for signed numbers is
called two's complement notation.
Unspecified weighted numbers are either unsigned numbers or
signed numbers; program context determines whether the
number is signed or unsigned. See: "11.7 Stack Parameters"
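The weighting rule can be checked by summing bit weights explicitly. In this Python sketch, `weighted_value` is a hypothetical name; the only difference between the two interpretations is that the leftmost weight is negated for signed numbers.

```python
def weighted_value(bits: int, width: int = 16, signed: bool = False) -> int:
    """Sum the bit weights of a `width`-bit pattern; signed negates the top weight."""
    total = 0
    for i in range(width):
        if (bits >> i) & 1:
            weight = 2 ** i              # rightmost bit has weight 2**0
            if signed and i == width - 1:
                weight = -weight         # -32,768 for a 16-bit signed number
            total += weight
    return total
```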
pictured numeric output
The use of numeric output definitions which convert
numerical values into text strings. These definitions are
used in a sequence which resembles a symbolic 'picture' of
the desired text format. Conversion proceeds from least-
significant digit to most-significant digit, and converted
characters are stored from higher memory addresses to lower.
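A Python sketch of the conversion order described above: digits come out least-significant first and are stored downward from the high end of a buffer, so the finished text reads correctly from the final pointer upward. The comments loosely relate the steps to the Standard words <# # #S #>, but this is not an implementation of them; the function name, the 34-byte buffer size and the overflow check are assumptions.

```python
def picture(n: int, base: int = 10, size: int = 34) -> str:
    """Convert an unsigned number into a buffer, filling from the high end downward."""
    buf = bytearray(b" " * size)       # hypothetical conversion area
    ptr = size                         # like <# : start just past the highest address
    while True:
        if ptr == 0:
            raise OverflowError("conversion area too small")
        n, d = divmod(n, base)         # like # : take the least-significant digit
        ptr -= 1
        buf[ptr] = ord("0") + d if d < 10 else ord("A") + d - 10
        if n == 0:                     # like #S : stop once the number is exhausted
            break
    return buf[ptr:].decode("ascii")   # like #> : text runs from ptr to the end
```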
program
A complete specification of execution to achieve a specific
function (application task) expressed in FORTH source code
form.
receive
The process of obtaining one character from the current
input device. The selection of the current input device is
system dependent.
recursion
The process of self-reference, either directly or
indirectly.
return
The means of indicating the end of text by striking a key on
an input device. The key used is system dependent. This
key is typically called "RETURN", "CARRIAGE RETURN", or
"ENTER".
screen
Textual data arranged for editing. By convention, a screen
consists of 16 lines (numbered 0 through 15) of 64
characters each. Screens usually contain program source
text, but may be used to view mass storage data. The first
byte of a screen occupies the first byte of a mass storage
block, which is the beginning point for text interpretation
during a load.
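Because 16 lines of 64 characters total exactly 1024 bytes, a screen maps onto one block. A minimal Python sketch of splitting a block into screen lines, assuming the block holds ASCII text; `screen_lines` is a hypothetical name.

```python
BLOCK_SIZE = 1024
LINES, WIDTH = 16, 64          # 16 * 64 == 1024

def screen_lines(block: bytes) -> list:
    """Split one 1024-byte block into the 16 lines of 64 characters of a screen."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("a screen occupies exactly one 1024-byte block")
    return [block[i * WIDTH:(i + 1) * WIDTH].decode("ascii") for i in range(LINES)]
```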
search order
A specification of the order in which selected vocabularies
in the dictionary are searched. Execution of a vocabulary
makes it the first vocabulary in the search order. The
dictionary is searched whenever a word is to be located by
its name. This order applies to all dictionary searches
unless otherwise noted. The search order begins with the
last vocabulary executed and ends with FORTH , unless
altered in a system dependent manner.
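A Python sketch of a dictionary search that follows a search order. It assumes each vocabulary is a list of (name, definition) pairs with the oldest first, which is only a stand-in for real linked dictionary headers. Searching each list newest-first also shows why the most recent redefinition is the one found (see "vocabulary"). `find` is a hypothetical name.

```python
def find(name: str, search_order: list):
    """Return the first matching definition, searching vocabularies in order."""
    for vocab in search_order:                    # first vocabulary is searched first
        for word, definition in reversed(vocab):  # newest redefinition wins
            if word == name:
                return definition
    return None                                   # not found in any vocabulary
```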
source definition
Text consisting of word names suitable for compilation or
execution by the text interpreter. Such text is usually
arranged in screens and maintained on a mass storage device.
stack, data
A last in, first out list consisting of 16-bit binary
values. This stack is primarily used to hold intermediate
values during execution of word definitions. Stack values
may represent numbers, characters, addresses, boolean
values, etc.
When the name 'stack' is used alone, it implies the data
stack.
stack, return
A last in, first out list which contains the addresses of
word definitions whose execution has not been completed by
the address interpreter. As a word definition passes
control to another definition, the return point is placed on
the return stack.
The return stack may cautiously be used for other values.
string, counted
A sequence of consecutive 8-bit bytes located in memory by
their low memory address. The byte at this address contains
a count {0..255} of the number of bytes following which are
part of the string. The count does not include the count
byte itself. Counted strings usually contain ASCII
characters.
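A small Python sketch of the counted-string layout: one count byte (0..255) followed by that many bytes, with the count excluding itself. `to_counted` and `from_counted` are hypothetical names.

```python
def to_counted(text: bytes) -> bytes:
    """Prefix the text with a one-byte count; the count excludes itself."""
    if len(text) > 255:
        raise ValueError("a counted string holds at most 255 bytes")
    return bytes([len(text)]) + text

def from_counted(mem: bytes, addr: int = 0) -> bytes:
    """Read a counted string located by the address of its count byte."""
    count = mem[addr]
    return mem[addr + 1:addr + 1 + count]
```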
string, text
A sequence of consecutive 8-bit bytes located in memory by
their low memory address and length in bytes. Strings
usually, but not exclusively, contain ASCII characters.
When the term 'string' is used alone or in conjunction with
other words it refers to text strings.
structure, control
A group of FORTH words which when executed alter the
execution sequence. The group starts and terminates with
compiler words. Examples of control structures:
   DO ... LOOP
   DO ... +LOOP
   BEGIN ... WHILE ... REPEAT
   BEGIN ... UNTIL
   IF ... THEN
   IF ... ELSE ... THEN
See: "9.9 Control Structures"
transportability
This term indicates that equivalent execution results when a
program is executed on other than the system on which it was
created. See: "equivalent execution"
true
A non-zero value represents the true state of a flag. Any
non-zero value will be accepted by a standard word as
'true'; all standard words return a 16-bit value with all
bits set to one when returning a 'true' flag.
user area
An area in memory which contains the storage for user
variables.
variable, user
A variable whose data storage area is usually located in the
user area. Some system variables are maintained in the user
area so that the words may be re-entrant to different users.
vocabulary
An ordered list of word definitions. Vocabularies make it
possible to separate different word definitions that have
the same name. More than one definition with the same
name can exist in one vocabulary. The latter is called a
redefinition. The most recently created redefinition will
be found when the vocabulary is searched.
vocabulary, compilation
The vocabulary into which new word definitions are appended.
word
A sequence of characters terminated by one blank or the end
of the input stream. Leading blanks are ignored. Words are
usually obtained via the input stream.
word definition
A named FORTH execution procedure compiled into the
dictionary. Its execution may be defined in terms of
machine code, as a sequence of compilation addresses, or other
compiled words.
word name
The name of a word definition. Word names are limited to 31
characters and may not contain an ASCII space. If two
definitions have different word names in the same vocabulary
they must be uniquely findable when this vocabulary is
searched. See: "vocabulary" "9.5.2 EXPECT"
word set
A named group of FORTH word definitions in the Standard.
word set, assembler extension
Additional words which facilitate programming in the native
machine language of the computer which are by nature system
dependent.
word set, double number extension
Additional words which facilitate manipulation of 32-bit
numbers.
word set, required
The minimum words needed to compile and execute Standard
Programs.
word set, system extension
Additional words which facilitate the access to internal
system characteristics.
word, standard
A named FORTH procedure definition, in the Required word set
or any extension word sets, formally reviewed and accepted
by the Standards Team.
6. REFERENCES
The following document is considered to be a portion of this
Standard:
`American National Standard Code for Information Interchange`,
X3.4-1977 (ASCII), American National Standards Institute,
1430 Broadway, New York, NY 10018, USA.
The following documents are noted as pertinent to the FORTH-83
Standard, but are not part of this Standard.
FORTH-77, FORTH Users Group, FST-780314
FORTH-78, FORTH International Standards Team
FORTH-79, FORTH Standards Team
FORTH-83 STANDARD, Appendices, FORTH Standards Team
`Webster's Collegiate Dictionary` shall be used to resolve
conflicts in spelling and English word usage.
7. REQUIREMENTS
3. Return stack of 48 bytes;
4. Mass storage capacity of 32 blocks, numbered 0 through 31;
5. One ASCII input/output device acting as an operator's
terminal.
8. COMPLIANCE AND LABELING
The FORTH Standards Team hereby specifies the requirements for
labeling of systems and applications so that the conditions for
program portability may be established.
A Standard System may use the specified labeling if it complies
with the terms of this Standard and meets the particular Word Set
definitions.
A Standard Program (application) may use the specified labeling
if it utilizes the specified Standard System according to this
Standard and executes equivalently on any such system.
In a system or application, a standard word may not be redefined
to perform a different function within the vocabulary FORTH.
FORTH Standard
A system may be labeled:
FORTH-83 Standard
if it includes all of the Required Word Set in either source or
object form and complies with the text of this Standard. After
executing "FORTH-83" the dictionary must contain all of the
Required Word Set in the vocabulary FORTH, as specified in this
Standard.
Standard Sub-set
A system may be labeled:
FORTH-83 Standard Sub-set
if it includes a portion of the Required Word Set and complies
with the remaining text of this Standard. However, no Required
Word may be present with a non-standard definition.
Standard with Extensions
A system may be labeled:
FORTH-83 Standard with <name> Standard Extension(s)
if it comprises a FORTH-83 Standard System and one or more
Standard Extension Word Set(s). For example, a designation would
be in the form:
FORTH-83 Standard with Double-Number Standard Extension
Standard Program
A FORTH source program which executes equivalently on any
Standard System may be labeled:
FORTH-83 Standard Program
See: "equivalent execution" "7. REQUIREMENTS"
Standard Program with Environmental Dependencies
A program which is standard in all ways except for specific
environmentally dependent words may be labeled:
FORTH-83 Standard Program with Environmental Dependencies
if the following additional requirements are met:
1) Environmental dependencies (including hardware
dependencies) shall be factored into an isolated set of
application word definitions.
2) Each environmentally dependent word definition must be
fully documented, including all dependencies in a manner at
least as detailed as the standard words.
9. USAGE
9.1 Word Names and Word Definitions
A Standard Program may reference only the definitions of the
Required Word Set and Standard Extensions and definitions which
are subsequently defined in terms of these words. Furthermore, a
Standard Program must use the standard words as required by any
conventions of this Standard. Equivalent execution must result
from Standard Programs.
The implementation of a Standard System may use words and
techniques outside the scope of the Standard, provided that no
program running on that system is required to use words outside
the Standard for normal operation.
If a Standard System or Standard Program redefines Standard
definitions within the FORTH vocabulary, these definitions must
comply with the Standard.
9.2 Addressable Memory
The FORTH system may share the dictionary space with the user's
application. The native addressing protocol of the host computer
is beyond the scope of this Standard.
Therefore, in a Standard Program, the user may only operate on
data which was stored by the application. No exceptions!
A Standard Program may address:
1. parameter fields of words created with CREATE , VARIABLE ,
and user defined words which execute CREATE ;
2. dictionary space ALLOTted;
3. data in a valid mass storage block buffer.
See: "9.7 Multiprogramming Impact";
4. data area of user variables;
5. text input buffer and PAD up to the amount specified as the
minimum for each area.
A Standard Program may NOT address:
1. directly into the data or return stacks;
2. into a definition's name field, link field, or code field;
3. into a definition's parameter field if not stored by the
application.
9.3 Return Stack
A Standard Program may cautiously use the return stack with the
following restrictions:
The return stack may not be accessed inside a do-loop for values
placed on the return stack before the loop was entered. Further,
neither I nor J may be used to obtain the index of a loop if
values are placed and remain on the return stack within the loop.
When the do-loop is executed all values placed on the return
stack within that loop must be removed before LOOP , +LOOP , or
LEAVE is executed. Similarly, all values placed on the return
stack within a colon definition must be removed before the colon
definition is terminated at ; or before EXIT is executed.
9.4 Compilation
The system uses the return stack and the dictionary in a system
dependent manner during the compilation of colon definitions.
Some words use the data stack in a system dependent manner during
compilation. See: "sys (11.7)"
9.5 Terminal Input and Output
9.5.1 KEY
A Standard System must receive all valid ASCII characters. Each
KEY receives one ASCII character, with more-significant bits
environmentally dependent and might be zero. KEY must receive as
many bits as are obtainable. A Standard Program without
environmental dependencies may only use the least significant 7-
bit ASCII character received by KEY . For example: KEY 127 AND
9.5.2 EXPECT
Control characters may be processed to allow system dependent
editing of the characters prior to receipt. Therefore, a
Standard Program may not anticipate that control characters can
be received.
9.5.3 EMIT
Because of the potential non-transportable action by terminal
devices of control characters, the use of ASCII control
characters is an environmental dependency. Each EMIT deals with
only one ASCII character. The ASCII character occupies the
least-significant 7 bits; the more-significant bits may be
environmentally dependent. Using the more-significant bits when
other than zero is an environmentally dependent usage. EMIT must
display as many bits as can be sent.
9.5.4 TYPE
Because of the potential non-transportable action by terminal
devices of control characters, the use of ASCII control
characters is an environmental dependency.
9.6 Transporting Programs Between Standard Systems
Further usage requirements are expected to be added for
transporting programs between Standard Systems.
9.7 Multiprogramming Impact
In a multiprogrammed system, Device Layer words and those words
which implicitly reference the Device Layer words may relinquish
control of the processor to other tasks. Although there is
insufficient experience to specify a standard for
multiprogramming, historical usage dictates that a programmer be
aware of the potential impact with regard to resources shared
between tasks. The only shared resources specified within the
Standard are block buffers. Therefore the address of a block
buffer returned by BLOCK or BUFFER becomes invalid during and
after the execution of any word marked by the attribute M in the
glossary or any words executing them. A block buffer is valid
only if its address is valid. See: "11.4 Attributes"
9.8 Numbers
Interpreted or compiled numbers are in the range
{-32,768..65,535}. See: "number conversion"
9.9 Control Structures
Control structures are compiled inside colon definitions.
Control structures can be nested but cannot overlap. For
additional limitations see DO .
A. STANDARDS TEAM MEMBERSHIP
APPENDIX A. STANDARDS TEAM MEMBERSHIP
A.1 Standards Team Membership: Members
The following is a list in alphabetical order of the people who
are FORTH Standards Team Members. These names are provided to
indicate the texture and make-up of the team itself. Where
appropriate, the official capacity of individuals is also
indicated.
Paul Bartholdi, Sauverny, Switzerland
Robert Berkey, Palo Alto, California USA Treasurer
David Boulton, Redwood City, California USA
John Bumgarner, Morgan Hill, California USA
Don Colburn, Rockville, Maryland USA
James T. Currie, Jr., Blacksburg, Virginia USA
Thomas B. Dowling, Lowell, Massachusetts USA
William S. Emery, Malibu, California USA
Lawrence P. Forsley, Rochester, New York USA
Kim R. Harris, Palo Alto, California USA Referee
John S. James, Los Gatos, California USA
Guy M. Kelly, La Jolla, California USA Chair
Thea Martin, Rochester, New York USA
Michael McNeil, Scotts Valley, California USA
Robert E. Patten, Modesto, California USA
Michael Perry, Berkeley, California USA
David C. Petty, Cambridge, Massachusetts USA
William F. Ragsdale, Hayward, California USA
Elizabeth D. Rather, Hermosa Beach, California USA
Dean Sanderson, Hermosa Beach, California USA Referee
Klaus Schleisiek, Hamburg, W-Germany
George W. Shaw II, Hayward, California USA Referee
Robert L. Smith, Palo Alto, California USA Secretary
Michael K. Starling, Elkview, West Virginia USA
John K. Stevenson, Portland, Oregon USA
Glenn S. Tenney, San Mateo, California USA Referee
A.2 FORTH Standards Team Sponsors
The following is a list in alphabetical order of individuals and
organizations who have contributed funds and other assistance to
aid the work of the FST and deserve recognition for their
involvement. FST sponsors have no duties or responsibilities in
the FST, but they receive copies of proposals and comments
considered at a formal meeting, and drafts and adopted standards
prepared as a result of that meeting.
Creative Solutions Inc., 4801 Randolph Rd., Rockville, MD 20852
USA
Fantasia Systems Inc., 1059 Alameda de las Pulgas, Belmont, CA
94002 USA
FORTH, Inc., 2309 Pacific Coast Highway, Hermosa Beach, CA 90254
USA
FORTH Interest Group Inc., P.O. Box 1105, San Carlos, CA 94070
USA
Forthright Enterprises, P.O. Box 50911, Palo Alto, CA 94020 USA
Glen Haydon Enterprises, Box 439 Rt. 2, La Honda, CA 94020 USA
John K. Gotwals, W. Lafayette, IN USA
John D. Hall, Oakland, CA USA
Hartronix, Inc., 1201 N. Stadem, Tempe, AZ 85281 USA
Hewlett-Packard Corvallis Div., 1000 NE Circle Blvd., Corvallis,
OR 97330 USA
Information Unlimited Software, Inc., 2401 Marinship, Sausalito,
CA 94965 USA
Henry H. Laxen, 1259 Cornell Avenue, Berkeley, CA 94705 USA
Laxen & Harris, Inc.
George B. Lyons, 280 Henderson Street, Jersey City, NJ 07302 USA
C. Kevin McCabe, Chicago, IL USA
MicroMotion, 12077 Wilshire Blvd #506, Los Angeles, CA 90025 USA
Bruce R. Montague, Monterey, CA USA
Mountain View Press, P.O. Box 4659, Mountain View, CA 94040 USA
Michael A. Perry, Berkeley, CA USA
Robert Berkey Services, 2334 Dumbarton Ave., Palo Alto, CA 94303
USA
Royal Greenwich Observatory, Herstmonceux Castle, Eastbourne,
England
Shaw Laboratories, Ltd., 24301 Southland Drive #216, Hayward, CA
94545 USA
Sygnetron Protection Systems, Inc., 2103 Greenspring, Timonium,
MD 21093 USA
Telelogic Inc., 196 Broadway, Cambridge, MA 02139 USA
UNISOFT, P.O. Box 2644, New Carrollton, MD 20784 USA
C. EXPERIMENTAL PROPOSALS
APPENDIX C. EXPERIMENTAL PROPOSALS
Since FORTH is an extensible language and subject to evolution,
the Standard contains a section describing experimental
proposals. FORTH users are encouraged to study, implement, and
try these proposals to aid in the analysis of and the decision
for or against future adoption into the Standard. Readers are
cautioned that these proposals contain opinions and conclusions
of the authors of the proposals and that these proposals may
contain non-standard source code.
SEARCH ORDER SPECIFICATION AND CONTROL
WILLIAM F. RAGSDALE
1 INTRODUCTION
The method of selecting the order in which the dictionary is
searched has grown from unchained vocabularies to the present use
of chained vocabularies. Many techniques are in use for
specification of the sequence in which multiple vocabularies may
be searched. This proposal is offered to provide generality
while keeping the specification precise.
2 DESCRIPTION
The following functions are required:
1. Two search orders exist. CONTEXT is the group of
vocabularies searched during interpretation of text from the
input stream. CURRENT is the single vocabulary into which
new definitions are compiled, and from which FORGET
operates.
2. Reduce CONTEXT to a minimum set of system words: just
the words needed to specify the search order further.
3. Add individual vocabularies into CONTEXT. The most recently
added is searched first.
4. Specify which single vocabulary will become CURRENT.
The following optional functions aid the user:
1. Display the word names of the first vocabulary in the
CONTEXT search order.
2. Display the vocabulary names comprising CURRENT and CONTEXT
search orders.
3 ADVANTAGES
Use over the past year has demonstrated that the proposed
methods may emulate the vocabulary selection of all other
systems. The order is explicit by execution, may be interpreted
and compiled, and is obvious from the declaration. The search
order is specified at run time rather than at the time a new
vocabulary is created.
4 DISADVANTAGES
By migrating to a common structure, vendors give up one
point at which they may claim their product is better than
others. Another drawback is that the number of CONTEXT
vocabularies is fixed; older methods had an indefinite 'tree'
structure. In practice, the branching of such a structure was
very rarely greater than four.
Forth words operate in a context-sensitive environment, as
word names may be redefined and have different definitions in
different vocabularies. This proposal compounds the problem. By
displaying the search order names, the user at least can readily
verify the search order.
5 IMPACT
The text of the Forth 83 Standard has been carefully chosen
for consistency and generality. However, no specification on how
the search order is developed by the user is given. This
omission is unavoidable, due to the diversity of contemporary
practice. This proposal is intended to complete the Forth 83
requirements in a fashion that exceeds all other methods.
Previously standardized words continue in their use:
VOCABULARY, FORTH, DEFINITIONS, and FORGET. However, this
proposal assumes that vocabulary names are not IMMEDIATE .
6 DEFINITIONS
Search order:
The sequence in which vocabularies are selected when
locating a word by name in the dictionary. Consists of one
transient and up to three resident vocabularies.
Transient order:
Execution of any vocabulary makes it the first vocabulary
searched, replacing the previously selected transient
vocabulary.
Resident order:
After searching the transient order, up to three additional
vocabularies may be searched. The application program
controls this selection.
7 GLOSSARY
ONLY -- ONLY
Select just the ONLY vocabulary as both the transient
vocabulary and resident vocabulary in the search order.
FORTH -- ONLY
The name of the primary vocabulary. Execution makes FORTH
the transient vocabulary, the first in the search order, and
thus replaces the previous transient vocabulary.
ALSO -- ONLY
The transient vocabulary becomes the first vocabulary in the
resident portion of the search order. Up to two of the most
recently added resident vocabularies are also preserved, in
order, forming the resident search order.
ORDER -- ONLY
Display the vocabulary names forming the search order in
their present search order sequence. Then show the
vocabulary into which new definitions will be placed.
WORDS -- ONLY
Display the word names in the transient vocabulary, starting
with the most recent definition.
FORGET -- ONLY
Used in the form:
FORGET <name>
Delete from the dictionary <name> and all words added to the
dictionary after <name> regardless of the vocabulary.
Failure to find <name> is an error condition. An error
condition also exists upon implicitly forgetting a
vocabulary (due to its definition after <name>).
DEFINITIONS -- ONLY
Select the transient vocabulary as the current vocabulary
into which subsequent definitions will be added.
SEAL -- ONLY
Delete all occurrences of ONLY from the search order. The
effect is that only specified application vocabularies will
be searched.
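The transient/resident scheme in the glossary can be modeled in a few lines of standard Forth. This is a minimal sketch under stated assumptions, not the implementation in section 8: vocabularies are represented by the hypothetical constants V-ONLY, V-FORTH and V-EDITOR, and the hypothetical M- words only update a four-cell array SLOTS (slot 0 transient, slots 1 to 3 resident) and a variable CUR standing in for CURRENT.

```forth
\ Hypothetical stand-ins for vocabularies (not real word lists).
1 CONSTANT V-ONLY
2 CONSTANT V-FORTH
3 CONSTANT V-EDITOR

CREATE SLOTS 4 CELLS ALLOT   \ slot 0 = transient, slots 1-3 = resident
VARIABLE CUR                 \ models CURRENT

: SLOT ( n -- addr ) CELLS SLOTS + ;

\ ONLY: ONLY becomes both the transient and the only resident vocabulary.
: M-ONLY ( -- ) SLOTS 4 CELLS 0 FILL  V-ONLY 0 SLOT !  V-ONLY 1 SLOT ! ;

\ Executing a vocabulary replaces the transient vocabulary.
: M-VOCAB ( v -- ) 0 SLOT ! ;

\ ALSO: shift slots 0-2 into slots 1-3; the oldest resident (slot 3) is lost.
\ MOVE copies overlapping regions correctly.
: M-ALSO ( -- ) 0 SLOT 1 SLOT 3 CELLS MOVE ;

\ DEFINITIONS: the transient vocabulary becomes CURRENT.
: M-DEFINITIONS ( -- ) 0 SLOT @ CUR ! ;

\ SEAL: clear every slot holding ONLY (empty slots hold 0 and are skipped).
: M-SEAL ( -- )
  4 0 DO  I SLOT @ V-ONLY = IF 0 I SLOT ! THEN  LOOP ;

\ ORDER: print the non-empty slots in search order, then CURRENT.
: M-ORDER ( -- )
  4 0 DO  I SLOT @ ?DUP IF . THEN  LOOP
  ."  current: " CUR @ . ;
```

The sequence M-ONLY V-FORTH M-VOCAB M-ALSO V-EDITOR M-VOCAB M-DEFINITIONS mirrors ONLY FORTH ALSO EDITOR DEFINITIONS from section 9: it leaves EDITOR, FORTH and ONLY in slots 0 to 2 and EDITOR in CUR.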
8 TYPICAL SOURCE CODE
( ALSO ONLY 82jun12 WFR )
( note the system's -FIND searches 1 to 5 vocabs in CONTEXT )
VOCABULARY ONLY ONLY DEFINITIONS
: ALSO ( slide transient into resident )
  CONTEXT DUP 2+ 6 CMOVE> ;

HERE 2+ ] ( alter run time from usual vocabulary )
  DOES> CONTEXT 8 ERASE DUP CONTEXT ! CONTEXT 8 + !
  ALSO EXIT [
' ONLY CFA ! ( Patch into ONLY; make NULL word )
CREATE X ' EXIT >BODY X ! 41088 ' X NFA ! IMMEDIATE
: FORTH FORTH ;
: DEFINITIONS DEFINITIONS ; : FORGET FORGET ;
: VOCABULARY VOCABULARY ; : ONLY ONLY ;
: WORDS WORDS ;

( ORDER 82jun12 WFR )
: ORDER ( show the search order )
  10 SPACES CONTEXT 10 OVER + SWAP
  DO I @ ?DUP 0= ?LEAVE ID. 2 +LOOP
  10 SPACES CURRENT @ ID. ;

ONLY FORTH ALSO DEFINITIONS
9 EXAMPLES OF USE
ONLY reduce search order to minimum
FORTH search FORTH then ONLY
ALSO EDITOR search EDITOR, FORTH then ONLY
DEFINITIONS new definitions will be added into the EDITOR
The same sequence would be compiled:
: SETUP ONLY FORTH ALSO EDITOR DEFINITIONS ;
10 REFERENCES
W. F. Ragsdale, The 'ONLY' Concept for Vocabularies, Proceedings
of the 1982 FORML Conference, pub. Forth Interest Group.
W. F. Ragsdale, fig-FORTH Installation Manual, Forth Interest
Group.
DEFINITION FIELD ADDRESS CONVERSION OPERATORS
by
Kim R. Harris
A. INTRODUCTION
The standard provides a transportable way to obtain the
compilation address of a definition in the dictionary of a FORTH
system (cf., FIND and ' ). It also provides an operator to
convert a compilation address to its corresponding parameter
field address. However, the standard does not provide a
transportable way to convert either of these addresses to the
other fields of a definition. Since various FORTH
implementations have different dictionary structures, a standard
set of conversion operators would increase transportability and
readability.
A set of words is proposed which allows the conversion of any
definition's field address to any other.
B. GLOSSARY
In the following words, the compilation address is either the
source or the destination, so it is not indicated in the names.
>BODY addr1 -- addr2 "to-body"
addr2 is the parameter field address corresponding to the
compilation address addr1.
>NAME addr1 -- addr2 "to-name"
addr2 is the name field address corresponding to the
compilation address addr1.
>LINK addr1 -- addr2 "to-link"
addr2 is the link field address corresponding to the
compilation address addr1.
BODY> addr1 -- addr2 "from-body"
addr2 is the compilation address corresponding to the
parameter field address addr1.
NAME> addr1 -- addr2 "from-name"
addr2 is the compilation address corresponding to the name
field address addr1.
LINK> addr1 -- addr2 "from-link"
addr2 is the compilation address corresponding to the link
field address addr1.
The previous set of words is complete, but may be inefficient for
going between two fields when one is not the compilation address.
For greater efficiency, additional operators may be defined which
name both the source and destination fields.
N>LINK addr1 -- addr2 "name-to-link"
addr2 is the link field address corresponding to the name
field address addr1.
L>NAME addr1 -- addr2 "link-to-name"
addr2 is the name field address corresponding to the link
field address addr1.
C. DISCUSSION
The previous words provide a complete, consistent, and efficient
set of definition field address conversion operations. They can
be implemented in a FORTH system which uses any combination of
the following options for its dictionary structure:
Link fields first or second.
Fixed or variable length name fields.
Additional fields in the definitions structure.
Heads contiguous or separated from bodies.
Indirect, direct, subroutine, or token threaded code.
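Because each operator only has to know where one field lies relative to another, supporting a different structure from the list above mostly means changing offsets. A minimal sketch, assuming fixed-length fields and a hypothetical layout with the link field first (link 2 bytes, name 4, code 2, then the parameter field); the offset constants and the Y-prefixed names are illustrative, chosen so that the host system's own words are not shadowed:

```forth
\ Hypothetical layout, byte offsets from the start of a definition's head:
\ link field first (2 bytes), then name (4), code (2), parameter field.
0 CONSTANT LINK-OFF
2 CONSTANT NAME-OFF
6 CONSTANT CODE-OFF
8 CONSTANT BODY-OFF

\ Each conversion adds (destination offset - source offset).
: Y>BODY  ( acf -- apf ) BODY-OFF CODE-OFF - + ;
: YBODY>  ( apf -- acf ) CODE-OFF BODY-OFF - + ;
: Y>LINK  ( acf -- alf ) LINK-OFF CODE-OFF - + ;
: YLINK>  ( alf -- acf ) CODE-OFF LINK-OFF - + ;
: Y>NAME  ( acf -- anf ) NAME-OFF CODE-OFF - + ;
: YNAME>  ( anf -- acf ) CODE-OFF NAME-OFF - + ;
: YN>LINK ( anf -- alf ) LINK-OFF NAME-OFF - + ;
: YL>NAME ( alf -- anf ) NAME-OFF LINK-OFF - + ;
```

Switching to the layout of section D would only change the four constants (name 0, link 4, code 6, body 8). Variable-length name fields, heads separated from bodies, or token threading need more than constant offsets, for example reading a length byte or a pointer stored in the head; the sketch does not cover those cases.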
The words are compatible with this standard; their inclusion
would not require other changes to be made to the standard.
Disadvantages to including them in the standard include:
They add 6 to 8 more words to the standard.
A standard program may not use all of them since it is not
allowed to access the name or link fields. However, this
does not disqualify them from being in the standard.
If a definition's head is not in the dictionary, an error
condition would exist. In this case, what action should the
words take in an implemented system?
The author of this experimental proposal recommends that FORTH
system implementors try them and that they be included in the
System Word Set of the next FORTH standard.
D. SOURCE CODE EXAMPLE
High-level source code is shown below for a very simple
dictionary structure. This code assumes a FORTH system which
uses indirect threaded code, heads contiguous to bodies, and a
definition structure of the following format:
Name field, 4 bytes long, fixed length.
Link field, 2 bytes long.
Code field, 2 bytes long.
Parameter field, variable length.
( Offsets from the name field: anf 0, alf 4, acf 6, apf 8 )
: >BODY ( acf -- apf ) 2+ ;
: BODY> ( apf -- acf ) 2- ;
: >LINK ( acf -- alf ) 2- ;
: LINK> ( alf -- acf ) 2+ ;
: >NAME ( acf -- anf ) 6 - ;
: NAME> ( anf -- acf ) 6 + ;
: N>LINK ( anf -- alf ) 4 + ;
: L>NAME ( alf -- anf ) 4 - ;
E. EXAMPLES OF USE
No examples are given because their use should be obvious.
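Even so, the offset arithmetic is easy to get wrong, and a short self-check catches errors such as a LINK> that subtracts where it should add. The sketch below is illustrative only: it writes the conversions for the layout stated in section D (name 4 bytes, link 2, code 2) under hypothetical X-prefixed names so that the host system's own >BODY is not shadowed, uses 2 + in place of the Forth-83 word 2+, and treats addresses as plain numbers, so no memory is read. SAME and CHECK-FIELDS are likewise hypothetical helpers.

```forth
\ Layout assumed in section D: name field (4 bytes), link (2), code (2),
\ parameter field. Offsets: anf = 0, alf = 4, acf = 6, apf = 8.
: X>BODY  ( acf -- apf ) 2 + ;
: XBODY>  ( apf -- acf ) 2 - ;
: X>LINK  ( acf -- alf ) 2 - ;
: XLINK>  ( alf -- acf ) 2 + ;
: X>NAME  ( acf -- anf ) 6 - ;
: XNAME>  ( anf -- acf ) 6 + ;
: XN>LINK ( anf -- alf ) 4 + ;
: XL>NAME ( alf -- anf ) 4 - ;

\ Abort if two computed addresses differ.
: SAME ( n1 n2 -- ) <> ABORT" conversion mismatch" ;

\ Round trips must return the starting acf, and each two-field
\ shortcut must agree with going through the compilation address.
: CHECK-FIELDS ( acf -- )
  DUP X>BODY XBODY>   OVER SAME
  DUP X>LINK XLINK>   OVER SAME
  DUP X>NAME XNAME>   OVER SAME
  DUP X>NAME XN>LINK  OVER X>LINK SAME
  DUP X>LINK XL>NAME  OVER X>NAME SAME
  DROP ;
```

Executing 1006 CHECK-FIELDS returns silently; changing XLINK> to subtract 2 makes the second check abort.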

D. CHARTER
APPENDIX D.
CHARTER
of the
FORTH STANDARDS TEAM
1. Purpose and Goals
1.1 Purpose
1.1.1 This Charter establishes and guides a voluntary
membership professional organization, the FORTH Standards
Team (hereafter referred to as the "FST") and provides a
method for its operation.
1.2 Goals
1.2.1 The goal of the FST is the creation, maintenance, and
proliferation of a standard (hereafter referred to as the
"Standard") for the FORTH computer programming system and
for application programs executed by a Standard system. The
Standard shall specify requirements and constraints which
such computer software must satisfy.
1.2.2 The team shall also develop a method of
identification and labeling of FORTH implementations and
programs which conform to the Standard.
1.3 Organization
1.3.1 The FST is a voluntary membership organization with
no formal status as a legal entity. It operates by
consensus of the professional and commercial FORTH community
and conducts business by the professional discourse and
agreement of its members. It is intended that this Charter
be a guide to the operation of the FST subject to reasonable
minor digression, rather than being a rigid document under
which vested rights are granted.
2. METHODS
2.1 Formal Meetings
2.1.1 The FST shall hold periodic formal meetings for
discussion and decisions concerning a current or future
Standard.
2.1.2 There is no specified frequency for formal meetings.
Each meeting shall be at such time and place as was decided
at the prior meeting. If a meeting cannot be held as
decided, the Chairperson may designate another time and
place.
2.1.3 The Chairperson shall send a written notice at least
sixty (60) days in advance of each formal meeting to each
voting member. A longer notification period is recommended.
It is anticipated that the continuing close coordination of
the participants, the decision at the prior formal meeting,
and publication of a meeting notice in FORTH Dimensions and
other trade journals will provide sufficient notice to the
FORTH community.
2.1.4 At a formal FST meeting, there shall be general
sessions consisting of all attendees. General sessions are
for matters that are ready for discussion and decision. All
votes concerning the Standard, Charter, or FST procedures
must take place during a general session.
2.1.5 Also at formal meetings, subteams will be established
to examine groups of proposals and to prepare
recommendations for a general session. All meeting
attendees may participate in the work and voting of a
subteam. Each subteam should elect from its members a
coordinator to conduct its meetings and a reporter to record
and report its recommendations.
2.1.6 The Chairperson may publish and distribute an agenda
at or in advance of a formal meeting. As a guideline, each
day of a formal meeting begins with a general session,
followed by concurrent subteam meetings followed by another
general session.
2.1.7 In view of the voluntary nature of the FST, at least
one third of the membership is required to hold a formal
meeting. Two thirds of the number of voting members present
at the start of each day's first general session shall set
the quorum for the remainder of that day.
2.1.8 Between formal meetings, the Chairperson may appoint
such informal working groups as are appropriate. Each group
may be given a goal and scope to direct its activities. Its
conclusions or recommendations must be given to the
Chairperson in written form.
2.2 Proposals and Comments
2.2.1 Prior to each formal meeting, the Chairperson may
solicit submission of comments and proposals for changes,
additions, or deletions to the then-current Standard, the
draft Standard or this Charter. A cutoff date may be
specified for the submission of such proposals.
2.2.2 A considerable amount of information must accompany
each proposal to help FST members analyze the proposal.
Therefore, submission of proposals and comments shall be
according to the format and instructions shown in the
"Proposal/Comment Form" included as an Appendix to this
Standard. Any proposal not in the appropriate form or
received after the cutoff date may not be considered unless
the Chairperson deems it to be of sufficient significance.
2.2.3 Unsolicited proposals and comments by volunteers are
acknowledged as valuable. Any individual or group may
submit proposals and/or comments concerning the Standard or
this Charter. These should be sent to the official address
of the FST. Properly formatted proposals and comments are
preferred. The author or a representative should plan to
attend the next formal meeting to emphasize, support, and
possibly modify the proposals.
2.2.4 Since the quantity of proposals and comments may
exceed the number for which there is time to be voted upon,
submission of a proposal does not automatically mean that it
will be voted upon at the next formal FST meeting. The
Chairperson or some members appointed by the Chairperson or
elected by the voting members may screen and organize the
received proposals and comments for voting upon at the next
formal meeting.
2.2.5 To allow reflection and examination, proposals and
comments shall be distributed to FST voting members and
sponsors in advance of a formal meeting. Proposals and
comments not distributed in advance, including proposals
made during a formal meeting, may be considered at the
discretion of the Chairperson.
2.3 Draft Standard
After a formal meeting, the referees and officers of the FST
shall prepare a draft Standard for review by the then-
current FST voting members. The referees and officers shall
consolidate proposals accepted by vote during the meeting,
resolve any ambiguities or problems, and incorporate these
changes with the text of the previous Standard or draft
Standard.
2.4 Standard
2.4.1 The referees and officers may, by near unanimous
decision (not more than one no vote), declare the draft
Standard, as mentioned in the previous paragraph, as being
the proposed Standard.
2.4.2 A proposed Standard shall be distributed to all FST
voting members for a mail ballot. This ballot shall be
based solely on the text of the proposed Standard as
distributed.
2.4.3 Each ballot returned shall be signed by the voting
member submitting it. An affirmative vote of at least two
thirds of the voting members shall adopt the document. Such
adoption makes the draft Standard the current, official FST
Standard which supersedes all prior Standards.
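The "at least two thirds" test can be applied without fractions by cross-multiplying: YES votes reach two thirds of TOTAL exactly when 3 times YES is at least 2 times TOTAL. A minimal sketch; the word name TWO-THIRDS? is hypothetical:

```forth
\ True if YES is at least two thirds of TOTAL, using integers only:
\ YES >= 2/3 * TOTAL  is equivalent to  3 * YES >= 2 * TOTAL.
: TWO-THIRDS? ( yes total -- flag )
  2 *            \ yes 2*total
  SWAP 3 *       \ 2*total 3*yes
  SWAP < 0= ;    \ flag = NOT (3*yes < 2*total)
```

For example, with the thirty-member limit of 3.2.3 reached, adoption under 2.4.3 needs twenty affirmative ballots (3 * 20 = 60 is not less than 2 * 30 = 60); nineteen give only 57.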
2.5 Charter
2.5.1 At a formal FST meeting, the charter may be amended
by a simple majority of voting members present provided that
at least one third of all voting members are present; such
amendments become effective at the end of the current formal
meeting.
2.5.2 At other than a formal FST meeting, the charter may
be amended by a simple majority of all voting members, such
vote to be taken by signed mail ballots.
3. MEMBERSHIP
3.1 General
Membership in the FST is a privilege, not a right. An
invitation for voting membership may be extended to those
who the FST feels can contribute to the goals of the
Standard and the FST. There are several classes of
participation in the efforts of the FST. Membership in each
class has no specified term but continues from the time when
membership is initiated to the conclusion of the next formal
meeting.
3.2 Voting Members
3.2.1 Voting members are individuals who are elected into
such membership at the concluding session of a formal FST
meeting. Any voting member who resigns between formal
meetings shall not be replaced until the membership
elections at the conclusion of the next formal meeting. A
newly elected voting member gains voting rights only after
all voting members have been elected. A significant
professional FORTH background is required of voting members.
3.2.2 Each voting member present at a formal meeting shall
indicate in writing his or her desire to continue as a
voting member. Only these voting members can vote in a
general session of a formal meeting on any matters affecting
the Standard or the Charter and on the election of all
voting members.
3.2.3 Voting members are elected by a simple majority of
those voting members present. The number of voting members
shall be limited to thirty (30). Individuals eligible to be
elected are selected from each of the following ordered
categories in order, until the number of voting members
reaches the limit.
3.2.3.1 Category 1: current voting members who have
actively participated in at least two days of a formal
meeting. Voting members are expected to actively
participate in subteam meetings and all general
sessions.
3.2.3.2 Category 2: current voting members who are
not eligible under Category 1, but who have each requested in
writing that his or her voting membership be
maintained.
3.2.3.3 Category 3: eligible candidates. Eligible
candidates will be presented to the voting members and then
elected as follows:
3.2.3.3.1 If the number of eligible candidates
does not exceed the number of openings for voting
membership, each candidate is voted upon and
accepted by a simple majority.
3.2.3.3.2 If the number of eligible candidates
does exceed the number of openings for voting
membership, candidates will be voted upon by
ballot whereby each voting member may vote for up
to the number of openings remaining. Those
candidates receiving the most votes will be
elected until there are no more openings for
voting membership.
3.3 Candidates
3.3.1 Candidates are individuals who desire to actively
participate in and support the FST by becoming voting
members.
3.3.2 To be eligible, each Candidate must: declare in
writing to the secretary at the first general session of a
formal FST meeting that he or she is a Candidate, actively
participate in subteam meetings and all general sessions at
a formal FST meeting, and have a significant professional
background in FORTH. The Chairperson may request
information or ask questions of any candidate to determine
his or her technical knowledge and experience. Candidates
are expected to submit proposals, participate in the
discussions of the formal meeting, and contribute to the
work and voting of subteams.
3.4 Observers
3.4.1 Observers are individuals who attend a formal meeting
but are neither voting members nor candidates. At the
discretion of the Chairperson, they may contribute to the
discussion at general sessions and to the work of subteams.
The number of observers allowed at a formal meeting may be
limited by the Chairperson.
3.5 FST Sponsors
3.5.1 FST sponsors are individuals or organizations who
contribute funds and other assistance to aid the work of the
FST. FST sponsors have no duties or responsibilities in the
FST, but they will receive copies of proposals and comments
considered at a formal meeting, and drafts and adopted
standards prepared as a result of that meeting.
3.5.2 FST sponsorship exists from the end of one formal
meeting to the end of the next formal meeting.
3.5.3 Qualification of FST sponsors may be determined by a
simple majority vote at a formal FST meeting. If no such
qualifications exist, the Chairperson may specify
qualifications, including the amount of financial
contributions, which will remain in effect until the next
formal FST meeting.
4. OFFICERS
4.1 General
There shall be four types of elected officers of the FST:
the Chairperson, the Secretary, the Treasurer, and one or
more Referees. Each officer shall be elected at a formal
meeting of the FST and serve until the next formal meeting.
4.2 Vacancies
If any office other than the Chairperson becomes vacant
between formal meetings, the Chairperson may appoint a
replacement. If the office of the Chairperson becomes
vacant between formal meetings, a new Chairperson shall be
elected by an informal majority vote of the remaining
officers. At any formal meeting, any officer, including the
Chairperson, may be replaced by a simple majority vote of
the voting members present at that meeting.
4.3 Chairperson
4.3.1 The Chairperson is responsible for governing the
general business of the FST. He or she is responsible for
implementing the FST's Charter and any other requirements
specified by the Standard.
4.3.2 The Chairperson's term of office shall be from the
conclusion of the formal meeting at which he or she is
elected to the conclusion of the next formal meeting. The
election of a Chairperson is held at the concluding general
session of a formal meeting after the election of voting
members; hence, newly elected voting members may vote for
the Chairperson. Only voting members are eligible to be
elected Chairperson.
4.3.3 The Chairperson shall conduct each formal meeting.
In general, the meetings will follow the current Robert's
Rules of Order; however, the Chairperson may determine the
specific rules for a formal meeting.
4.3.4 Any matter needing a decision between formal meetings
not specified by this Charter shall be decided by the
Chairperson.
4.3.5 The Chairperson has duties and responsibilities
specified elsewhere in this Charter.
4.4 Secretary
4.4.1 The Secretary is responsible for recording the
activities and results of the FST.
4.4.2 The Secretary is elected at the first general session
of a formal meeting and serves until a Secretary is elected
at the beginning of the next formal meeting.
4.4.3 The Secretary has the following responsibilities.
4.4.3.1 The Secretary is responsible for collecting,
maintaining, and archiving the official copies of the
Standard, the Charter, all other FST documents,
correspondence, and lists of the FST members of each class.
4.4.3.2 During a formal meeting, the Secretary is
responsible for:
(a) Keeping the minutes of the general sessions,
including all votes taken. For votes affecting the
Standard or Charter, he or she shall: record the
number of voting members present, determine if a quorum
is present, determine the number of affirmative votes
required for the vote to pass, and record the number of
voting members voting in the affirmative and in the
negative, and the result of the vote.
(b) Recording and verifying the attendance and
membership class of each attendee.
(c) Recording the recommendations of subteams.
4.4.3.3 The Secretary is also responsible for collecting,
archiving, and distributing proposals before a formal
meeting. He or she is also responsible for incorporating
proposals accepted during a formal meeting into the Standard
or Charter. Other officers aid the Secretary in these
duties.
4.5 Treasurer
4.5.1 The Treasurer is responsible for managing the
financial business of the FST. He or she is responsible for
maintaining accurate and current financial records and for
accepting and disbursing funds for official FST activities.
4.5.2 The Treasurer's term of office shall be from the
conclusion of the formal meeting at which he or she is
elected to the conclusion of the next formal meeting. The
election of a Treasurer is held just after the election of
the Chairperson. Only voting members are eligible to be
elected Treasurer.
4.6 Referees
4.6.1 At the conclusion of a formal meeting there may be
additional technical work required to prepare a draft
Standard or Charter. This work shall be performed by the
officers of the FST, including a group of Referees. They
should be individuals who have superior knowledge and
experience in the implementation and use of FORTH.
4.6.2 At least three and no more than five Referees shall
be elected by a majority of the voting members present at
the concluding general session of a formal meeting. This
takes place after the election of voting members. A
Referee's term is from election at the end of one formal
meeting until the end of the next formal meeting. Only
voting members are eligible to be elected as Referees.
4.6.3 The Referees shall adopt methods and rules as they
deem appropriate to complete their work; they may be
informal. However, any matter committed to the Referees for
resolution must achieve near unanimous agreement (not more
than one no vote). Lacking that, the matter shall be
omitted from further action pending further consideration at
the next formal meeting.
5. EXPERIMENTAL PROPOSALS
5.1 General
5.1.1 Since FORTH is an extensible language and subject to
evolution, the Standard may contain a section describing
experimental proposals to aid in the analysis of and the
decision for or against future adoption into the Standard.
After the results of experimentation are known, each
proposal will be considered, at a future formal meeting, for
inclusion into the Standard.
5.1.2 An experimental proposal may be individual FORTH
words, sets of related words, or specifications for part of
the Standard. Experimental proposals may be derived from
ordinary proposals or other contributions.
5.2 Required Information
Each experimental proposal must contain the following
minimum information:
5.2.1 A description of the proposal including an overview
of its functions and its interactions with existing FORTH
words.
5.2.2 A glossary entry of each word in the form and
notation of the Standard.
5.2.3 A statement by the author(s) indicating why the
proposal merits inclusion in the Standard. Both advantages
and disadvantages should be discussed.
5.3 Suggested Information
It is suggested that each experimental proposal also
include:
5.3.1 A source definition for each word in the proposal.
High level definitions using Standard words are preferred,
but new primitive words may be defined in an assembly
language of one commonly-known processor. Sufficient
documentation should be provided so that implementation on
other processors is straightforward.
5.3.2 An example showing usage of the new words.
6. VOTING
6.1 General
Only voting members have the right to vote on proposals
affecting the Standard, a draft Standard, or this Charter.
6.2 Advisory Votes
At the discretion of the Chairperson, advisory votes may be
requested at a formal meeting. At the discretion of the
Chairperson, all attendees may participate in an advisory
vote.
6.3 Method
Any vote at a formal meeting may be by show of hands or, at
the discretion of the Chairperson, by an informal secret
paper ballot or a roll call.
6.4 Number
A vote to adopt a proposal into the draft Standard or to
change the Standard, except for the Experimental Proposals
section of the Standard, requires a two-thirds affirmative
vote of the voting members present at a general session of a
formal meeting, provided that the number of votes cast is
at least two thirds of that morning's quorum count. To
adopt an experimental proposal into the Experimental
Proposals section of the draft Standard or to change this
Charter, an affirmative vote of a simple majority is
required. Accepting any other procedural matter at a formal
meeting requires only a simple majority affirmative vote.
6.5 Proxies
All votes must be cast by the particular voting member
eligible to vote. No proxy voting is allowed.

E. PROPOSAL/COMMENT FORM
APPENDIX E. PROPOSAL/COMMENT FORM
The following pages are the proposal and/or comment submittal
form. The form includes instructions that should be
self-explanatory. Copies of submitted proposals and comments will be
made available to FORTH Standards Team members and to team
sponsors.
FST Proposal and Comment Submittal Form
-----------------------------------------------------------------
FST USE Title: Proposal Number:
ONLY --> Related Proposals: Disposition:
=================================================================
Keyword(s): Category:
( ) Proposal or ( ) Comment
FORTH Word(s): Section #(s):
-----------------------------------------------------------------
Abstract:
-----------------------------------------------------------------
Proposal and Discussion:
----------------------------------------------------------------
Submitted by: Date:
Page of
=================================================================
FORTH Standards Team; PO Box 4545; Mountain View, CA 94040 820801
Proposal and Comment Submittal Form Instructions
Please use the supplied forms for your entire proposal. The
continuation form is only to be used if absolutely necessary; try
to get your proposal to fit on the first sheet. If it helps, use
a reducing copy machine to get more material onto the first
sheet. If you must use multiple sheets, put the main idea onto
the first sheet and less important material onto continuation
sheets. Remember that material on continuation sheets may be
overlooked.
The proposal forms have been produced on a computer system so
that you may produce your proposals using your own computer
system. If you print your proposal and form on your computer
system, all of the information shown on the form(s) MUST be
printed and in the same location.
The following are the instructions for each of the areas of the
form:
1. Please think of the most appropriate keyword or keywords
describing your proposal.
2. Select the best of the following categories of proposals:
0 Nucleus Layer other than #1 (e.g., + AND )
1 Memory Operations (e.g., @ CMOVE )
2 Dictionary (e.g., ' FORGET )
3 String Operations (e.g., WORD COUNT )
4 Interpreter Layer other than #2 or #3 (e.g., ABORT . )
5 Compiler Layer (e.g., : DO )
6 Device Layer (e.g., BLOCK TYPE )
7 Experimental (e.g., 32-bit stack entries)
8 Other Technical (e.g., mono-addressing)
9 Charter
3. Mark whether this is a PROPOSAL or a COMMENT.
4. Indicate which FORTH word or words are relevant.
5. Indicate which section or sections of the Standard are
relevant.
6. The abstract must be kept short. The title, keywords,
category, and abstract may be used in a database for
organization and display on a terminal during a Standards
Team meeting.
7. Detail your proposal and provide supporting discussion.
8. Indicate the name of the submitter or the names of the
submitters.
9. Finally, date the submittal and number each page.
FST Proposal and Comment Submittal Continuation Form
-----------------------------------------------------------------
FST USE ONLY --> Proposal Number:
=================================================================
-----------------------------------------------------------------
Submitted by: Date:
Page of
=================================================================
FORTH Standards Team; PO Box 4545; Mountain View, CA 94040 820801

( -*- forth -*- )
checking ================= REQUIRED WORD SET ====================
checking Nucleus layer
checks: ! *
checks: ! * */ */MOD + +! - / /MOD 0< 0= 0> 1+ 1- 2+
checks: 2- 2/ < = > >R ?DUP @ ABS AND C! C@ CMOVE
checks: CMOVE> COUNT D+ D< DEPTH DNEGATE DROP DUP EXECUTE
checks: EXIT FILL I J MAX MIN MOD NEGATE NOT OR OVER PICK
checks: R> R@ ROLL ROT SWAP U< UM* UM/MOD XOR
checking Device layer
checks: BLOCK BUFFER CR EMIT EXPECT FLUSH KEY SAVE-BUFFERS
checks: SPACE SPACES TYPE UPDATE
checking Interpreter layer
checks: # #> #S #TIB ' ( -TRAILING . .( <# >BODY >IN
checks: ABORT BASE BLK CONVERT DECIMAL DEFINITIONS FIND
checks: FORGET FORTH FORTH-83 HERE HOLD LOAD PAD QUIT SIGN
checks: SPAN TIB U. WORD
checking Compiler layer
checks: +LOOP , ." : ; ABORT" ALLOT BEGIN COMPILE CONSTANT
checks: CREATE DO DOES> ELSE IF IMMEDIATE LEAVE LITERAL LOOP
checks: REPEAT STATE THEN UNTIL VARIABLE VOCABULARY WHILE [
checks: ['] [COMPILE] ]

@@ -0,0 +1,17 @@
checking ============== The Double Number Extension Word Set Layers
checking Nucleus layer
checks: 2! 2@ 2DROP 2DUP 2OVER 2ROT 2SWAP D+ D- D0= D2/
checks: D< D= DABS DMAX DMIN DNEGATE DU<
checking Interpreter layer
checks: D. D.R
checking Compiler layer
checks: 2CONSTANT 2VARIABLE


@@ -0,0 +1,13 @@
checking =============== The Assembler Extension Word Set Layers
checking Interpreter layer
checks: ASSEMBLER
checking Compiler layer
checks: ;CODE CODE END-CODE


@@ -0,0 +1,20 @@
checking ================== The System Extension Word Set Layers
checking Nucleus layer
checks: BRANCH ?BRANCH
checking Interpreter layer
checks: CONTEXT CURRENT
checking Compiler layer
checks: <MARK <RESOLVE >MARK >RESOLVE


@@ -0,0 +1,10 @@
checking ================= CONTROLLED REFERENCE WORDS
checks: --> .R 2* BL BLANK C, DUMP EDITOR EMPTY-BUFFERS
checks: END ERASE HEX INTERPRET K LIST OCTAL OFFSET QUERY
checks: RECURSE SCR SP@ THRU U.R


@@ -0,0 +1,19 @@
checking UNCONTROLLED REFERENCE WORDS
( No recommendation is made that these words be included in a system. )
( No restrictions are placed on the definition or usage of )
( uncontrolled words. However, use of these names for procedures )
( differing from the given definitions is discouraged. )
checks: !BITS ** +BLOCK -' -MATCH -TEXT /LOOP 1+! 1-! ;: ;S
checks: <> <BUILDS <CMOVE >< >MOVE< @BITS AGAIN ASCII ASHIFT B/BUF
checks: BELL CHAIN CONTINUED CUR DBLOCK DPL FLD H. I'
checks: IFEND IFTRUE INDEX LAST LINE LINELOAD LOADS MAP0
checks: MASK MOVE MS NAND NOR NUMBER O. OTHERWISE PAGE READ-MAP
checks: REMEMBER REWIND ROTATE S0 SET SHIFT TEXT USER WORDS
checks: \LOOP


@@ -0,0 +1,11 @@
checking ===================== EXPERIMENTAL PROPOSALS
checking SEARCH ORDER SPECIFICATION AND CONTROL
checks: ONLY FORTH ALSO ORDER WORDS FORGET DEFINITIONS SEAL
checking DEFINITION FIELD ADDRESS CONVERSION OPERATORS
checks: >BODY >NAME >LINK BODY> NAME> LINK> N>LINK L>NAME


@@ -0,0 +1,33 @@
( -*- forth -*- )
: checking ( [text<:>] -- )
[char] : parse ." /////// " type cr
;
: checks: ( [text< >].. -- )
8 spaces
begin
bl word
dup dup if c@ then
while
dup count type space
dup find if 2drop
else cr ." missing " count type drop cr 8 spaces
then
repeat
cr
drop
;
include fst83_12.fs
include fst83_13.fs
include fst83_14.fs
include fst83_15.fs
include fst83_16.fs
include fst83_b.fs
include fst83_c.fs

misc/kforth/doc/fst83/s.txt

@@ -0,0 +1,462 @@
B. UNCONTROLLED REFERENCE WORDS
APPENDIX B. UNCONTROLLED REFERENCE WORDS
The Uncontrolled Reference Word Set contains glossary definitions
which are included for public reference of words that have past
or present usage and/or are candidates for future
standardization. No recommendation is made that these words be
included in a system.
No restrictions are placed on the definition or usage of
uncontrolled words. However, use of these names for procedures
differing from the given definitions is discouraged.
!BITS 16b1 addr 16b2 -- "store-bits"
Store the value of 16b1 masked by 16b2 into the equivalent
masked part of the contents of addr, without affecting bits
outside the mask.
** n1 n2 -- n3 "power"
n3 is the value of n1 to the power n2.
+BLOCK w -- u "plus-block"
u is the sum of w plus the number of the block being
interpreted.
-' -- addr false "dash-tick"
-- true
Used in the form:
-' <name>
Leave the parameter field of <name> beneath zero (false) if
<name> can be found in the search order; leave only true if
not found.
-MATCH addr1 +n1 addr2 +n2 -- addr3 flag "dash-match"
Attempt to find the +n2-length text string beginning at
addr2 somewhere in the +n1-length text string beginning at
addr1. Return the last+1 address addr3 of the match point
and a flag which is zero if a match exists.
-TEXT addr1 +n1 addr2 -- n2 "dash-text"
Compare two strings over the length +n1 beginning at addr1
and addr2. Return zero if the strings are equal. If
unequal, return n2, the difference between the last
characters compared: addr1(i) - addr2(i).
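As a reading aid, -TEXT and -MATCH can be sketched in C++ over byte strings. The names dash_text and dash_match are hypothetical. The sketch reads "the last characters compared" in -TEXT as the first mismatching pair, and since the glossary does not say what -MATCH returns when there is no match, it assumes addr1 + n1 and a flag of 1.

```cpp
#include <cstddef>
#include <cstring>

// -TEXT ( addr1 +n1 addr2 -- n2 ): 0 if the first n1 bytes are equal,
// otherwise the difference addr1[i] - addr2[i] at the first mismatch.
int dash_text(const char *addr1, size_t n1, const char *addr2)
{
    for (size_t i = 0; i < n1; i++) {
        unsigned char a = static_cast<unsigned char>(addr1[i]);
        unsigned char b = static_cast<unsigned char>(addr2[i]);
        if (a != b) {
            return static_cast<int>(a) - static_cast<int>(b);
        }
    }
    return 0;
}

// -MATCH ( addr1 +n1 addr2 +n2 -- addr3 flag ): look for the n2-byte
// pattern at addr2 inside the n1-byte text at addr1.
struct MatchResult {
    const char *addr;   // one past the end of the match point
    int flag;           // 0 if a match exists
};

MatchResult dash_match(const char *addr1, size_t n1, const char *addr2, size_t n2)
{
    for (size_t i = 0; n2 <= n1 && i <= n1 - n2; i++) {
        if (std::memcmp(addr1 + i, addr2, n2) == 0) {
            return MatchResult{addr1 + i + n2, 0};
        }
    }
    // Assumption: on failure, point past the text and return a nonzero flag.
    return MatchResult{addr1 + n1, 1};
}
```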
/LOOP +n -- C,I "up-loop"
sys -- (compiling)
A do-loop terminating word. The loop index is incremented
by the positive value +n. If the unsigned magnitude of the
resultant index is greater than the limit, then the loop is
terminated, otherwise execution returns to the corresponding
DO . The comparison is unsigned magnitude. sys is balanced
with its corresponding DO . See: DO
1+! addr -- "one-plus-store"
Add one to the 16-bit contents at addr.
1-! addr -- "one-minus-store"
Subtract one from the 16-bit contents at addr.
;: -- addr C,I "semi-colon-colon"
Used to specify a new defining word:
: <namex> ... ;: ... ;
<namex> <name>
When <namex> is executed, it creates an entry for the new
word <name>. Later execution of <name> will execute the
sequence of words between ;: and ; , with the address of the
first (if any) parameters associated with <name> on the
stack.
;S -- Interpret only "semi-s"
Stop interpretation of a block.
<> w1 w2 -- flag "not-equal"
flag is true if w1 is not equal to w2.
<BUILDS -- "builds"
Used in conjunction with DOES> in defining words, in the
form:
: <namex> ... <BUILDS ... DOES> ... ;
and then:
<namex> <name>
When <namex> executes, <BUILDS creates a dictionary entry
for the new <name>. The sequence of words between <BUILDS
and DOES> establishes a parameter field for <name>. When
<name> is later executed, the sequence of words following
DOES> will be executed, with the parameter field address of
<name> on the data stack.
<CMOVE addr1 addr2 u -- "reverse-c-move"
A synonym for CMOVE> .
>< 16b1 -- 16b2 "byte-swap"
Swap the high and low bytes within 16b1.
>MOVE< addr1 addr2 u -- "byte-swap-move"
Move u bytes beginning at addr1 to the memory beginning at
addr2. During this move, the order of each byte pair is
reversed.
@BITS addr 16b1 -- 16b2 "fetch-bits"
Return the 16-bits at addr masked by 16b1.
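The masked store and fetch in !BITS and @BITS come down to two bitwise expressions. This is a C++ sketch assuming a 16-bit cell; store_bits and fetch_bits are hypothetical names, and their argument order follows the stack diagrams.

```cpp
#include <cstdint>

// !BITS ( 16b1 addr 16b2 -- ): copy the bits of value selected by mask
// into *addr, leaving every bit outside the mask unchanged.
void store_bits(uint16_t value, uint16_t *addr, uint16_t mask)
{
    *addr = static_cast<uint16_t>((*addr & ~mask) | (value & mask));
}

// @BITS ( addr 16b1 -- 16b2 ): the cell at addr with only the masked bits kept.
uint16_t fetch_bits(const uint16_t *addr, uint16_t mask)
{
    return static_cast<uint16_t>(*addr & mask);
}
```

For example, storing 0xFFFF into a cell holding 0x1234 under mask 0x00F0 gives 0x12F4.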
AGAIN -- C,I
sys -- (compiling)
Effect an unconditional jump back to the start of a BEGIN-
AGAIN loop. sys is balanced with its corresponding BEGIN .
See: BEGIN
ASCII -- char I,M "as-key"
-- (compiling)
Used in the form:
ASCII ccc
where the delimiter of ccc is a space. char is the ASCII
character value of the first character in ccc. If
interpreting, char is left on the stack. If compiling,
compile char as a literal so that when the colon definition
is later executed, char is left on the stack.
ASHIFT 16b1 n -- 16b2 "a-shift"
Shift the value 16b1 arithmetically n bits left if n is
positive, shifting zeros into the least significant bit
positions. If n is negative, 16b1 is shifted right; the
sign is included in the shift and remains unchanged.
B/BUF -- 1024 "bytes-per-buffer"
A constant leaving 1024, the number of bytes per block
buffer.
BELL --
Activate a terminal bell or noise-maker as appropriate to
the device in use.
CHAIN -- M
Used in the form:
CHAIN <name>
Connect the CURRENT vocabulary to all definitions that might
be entered into the vocabulary <name> in the future. The
CURRENT vocabulary may not be FORTH or ASSEMBLER . Any
given vocabulary may only be chained once, but may be the
object of any number of chainings. For example, every user-
defined vocabulary may include the sequence:
CHAIN FORTH
CONTINUED u -- M
Continue interpretation at block u.
CUR -- addr
A variable pointing to the physical record number before
which the tape is currently positioned. REWIND sets CUR=1.
DBLOCK ud -- addr M "d-block"
Identical to BLOCK but with a 32-bit unsigned block number.
DPL -- addr U "d-p-l"
A variable containing the number of places after the
fractional point for input conversion.
FLD -- addr U "f-l-d"
A variable pointing to the field length reserved for a
number during output conversion.
H. u -- M "h-dot"
Output u as a hexadecimal integer with one trailing blank.
The current base is unchanged.
I' -- w C "i-prime"
Used within a colon definition executed only from within a
do-loop to return the corresponding loop index.
IFEND Interpret only "if-end"
Terminate a conditional interpretation sequence begun by
IFTRUE .
IFTRUE flag -- Interpret only "if-true"
Begin an:
IFTRUE ... OTHERWISE ... IFEND
conditional sequence. These conditional words operate
like:
IF ... ELSE ... THEN
except that they cannot be nested, and are to be used only
during interpretation. In conjunction with the words [ and ]
they may be used within a colon
definition to control compilation, although they are not to
be compiled.
INDEX u1 u2 -- M
Print the first line of each screen over the range {u1..u2}.
This displays the first line of each screen of source text,
which conventionally contains a title.
LAST -- addr U
A variable containing the address of the beginning of the
last dictionary entry made, which may not yet be a complete
or valid entry.
LINE +n -- addr M
addr is the address of the beginning of line +n for the
screen whose number is contained in SCR . The range of +n
is {0..15}.
LINELOAD +n u -- "line-load"
Begin interpretation at line +n of screen u.
LOADS u -- M
A defining word executed in the form:
u LOADS <name>
When <name> is subsequently executed, block u will be
loaded.
MAP0 -- addr "map-zero"
A variable pointing to the first location in the tape map.
MASK n -- 16b
16b is a mask of n most-significant bits if n is positive,
or n least-significant bits if n is negative.
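A C++ sketch of MASK on a 16-bit cell. forth_mask is a hypothetical name, and clamping counts beyond 16 is an assumption, since the glossary does not cover them.

```cpp
#include <cstdint>

// MASK ( n -- 16b ): n most-significant bits set if n is positive,
// -n least-significant bits set if n is negative.
// Assumption: |n| > 16 is clamped to 16.
uint16_t forth_mask(int n)
{
    if (n > 16) n = 16;
    if (n < -16) n = -16;
    if (n >= 0) {
        // Shift an unsigned all-ones value left so that only the top n of
        // the low 16 bits remain set; n == 0 shifts by 16 and yields 0.
        return static_cast<uint16_t>(0xFFFFu << (16 - n));
    }
    return static_cast<uint16_t>((1u << -n) - 1u);
}
```

So 4 MASK gives 0xF000 and -4 MASK gives 0x000F.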
MOVE addr1 addr2 u --
The u bytes at address addr1 are moved to address addr2.
The data are moved such that the u bytes remaining at
address addr2 are the same data as was originally at address
addr1. If u is zero nothing is moved.
MS +n -- M "m-s"
Delay for approximately +n milliseconds.
NAND 16b1 16b2 -- 16b3
16b3 is the one's complement of the logical AND of 16b1 with
16b2.
NOR 16b1 16b2 -- 16b3
16b3 is the one's complement of the logical OR of 16b1 with
16b2.
NUMBER addr -- d
Convert the count and character string at addr to a signed
32-bit integer, using the value of BASE . If numeric
conversion is not possible, an error condition exists. The
string may contain a preceding minus sign.
O. u -- M "o-dot"
Print u in octal format with one trailing blank. The value
in BASE is unaffected.
OTHERWISE -- Interpret only
An interpreter-level conditional word. See: IFTRUE
PAGE -- M
Clear the terminal screen or perform a form-feed action
suitable to the output device currently active.
READ-MAP -- M "read-map"
Read to the next file mark on tape constructing a
correspondence table in memory (the map) relating physical
block position to logical block number. The tape should
normally be rewound to its load point before executing READ-
MAP .
REMEMBER -- M
A defining word executed in the form:
REMEMBER <name>
Defines a word which, when executed, will cause <name> and
all subsequently defined words to be deleted from the
dictionary. <name> may be compiled into and executed from a
colon definition. The sequence
DISCARD REMEMBER DISCARD
provides a standardized preface to any group of transient
word definitions.
REWIND -- M
Rewind the tape to its load point, setting CUR equal to one.
ROTATE 16b1 n -- 16b2
Rotate 16b1 left n bits if n is positive, right n bits if n
is negative. Bits shifted out of one end of the cell are
shifted back in at the opposite end.
S0 -- addr U "s-zero"
A variable containing the address of the bottom of the
stack.
SET 16b addr -- M
A defining word executed in the form:
16b addr SET <name>
Defines a word <name> which, when executed, will cause the
value 16b to be stored at addr.
SHIFT 16b1 n -- 16b2
Logical shift 16b1 left n bits if n is positive, right n
bits if n is negative. Zeros are shifted into vacated bit
positions.
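ASHIFT, ROTATE, and SHIFT differ only in what fills the vacated bit positions. The C++ sketch below assumes a 16-bit cell and uses hypothetical function names. Its handling of counts of 16 or more is an assumption: SHIFT and a left ASHIFT clear the cell, and a right ASHIFT fills it with the sign bit.

```cpp
#include <cstdint>

// Positive n shifts left, negative n shifts right.

// SHIFT ( 16b1 n -- 16b2 ): logical shift, zeros fill vacated bits.
uint16_t forth_shift(uint16_t x, int n)
{
    if (n >= 16 || n <= -16) return 0;
    if (n >= 0) return static_cast<uint16_t>(x << n);
    return static_cast<uint16_t>(x >> -n);
}

// ASHIFT ( 16b1 n -- 16b2 ): left shifts fill with zeros; right shifts
// copy the sign bit into the vacated high bits so the sign is unchanged.
uint16_t forth_ashift(uint16_t x, int n)
{
    const bool negative = (x & 0x8000u) != 0;
    if (n >= 0) return n >= 16 ? 0 : static_cast<uint16_t>(x << n);
    if (n <= -16) return negative ? 0xFFFF : 0;
    const int m = -n;
    uint16_t r = static_cast<uint16_t>(x >> m);
    if (negative) {
        r = static_cast<uint16_t>(r | (0xFFFFu << (16 - m)));   // refill top m bits
    }
    return r;
}

// ROTATE ( 16b1 n -- 16b2 ): bits leaving one end re-enter at the other.
uint16_t forth_rotate(uint16_t x, int n)
{
    const int k = ((n % 16) + 16) % 16;   // equivalent left rotation, 0..15
    if (k == 0) return x;
    return static_cast<uint16_t>((x << k) | (x >> (16 - k)));
}
```

For 0x8001, a one-bit SHIFT left drops the high bit (0x0002), while a one-bit ROTATE left carries it around (0x0003).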
TEXT char -- M
Accept characters from the input stream, as for WORD , into
PAD , blank-filling the remainder of PAD to 84 characters.
USER +n -- M
A defining word executed in the form:
+n USER <name>
which creates a user variable <name>. +n is the offset
within the user area where the value for <name> is stored.
Execution of <name> leaves its absolute user area storage
address.
WORDS -- M
List the word names in the first vocabulary of the currently
active search order.
\LOOP +n -- C,I "down-loop"
sys -- (compiling)
A do-loop terminating word. The loop index is decremented
by the positive value +n. If the unsigned magnitude of the
resultant index is less than or equal to the limit, then the
loop is terminated, otherwise execution returns to the
corresponding DO . The comparison is unsigned. sys is
balanced with its corresponding DO . See: DO

misc/kforth/doc/index.rst

@@ -0,0 +1,25 @@
Write You a Forth
=================
Contents:
.. toctree::
   :maxdepth: 2

   part-0x01
   part-0x02
   part-0x03
   part-0x04
   part-0x05
   part-0x06
   part-0x07
   part-0x08
   part-0x09
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`


@@ -0,0 +1,116 @@
Write You a Forth, 0x01
-----------------------
:date: 2018-02-21 23:17
:tags: wyaf, forth
Following on from the `last post`_, I've decided to frame this as a Write You an
X-type series where I'll write up my thinking and planning as I go.
.. _last post: https://dl.kyleisom.net/posts/2018/02/21/2018-02-21-revisiting-forth/
I've always wanted to write a Forth_; I've made a few attempts_ at it in the
past. This time, I'm actually going to do it.
.. _Forth: https://en.wikipedia.org/wiki/Forth_(programming_language)
.. _attempts: https://github.com/isrlabs/avr-forth
The basics
^^^^^^^^^^
Let's start with the basics: what are the characteristics of a Forth? First,
it's a stack-based language, so it'll need a stack. Actually, it'll need at
least two stacks --- the data stack and the return stack (where return addresses
are normally stored). Many modern Forths also have a separate floating-point stack.
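To make the data stack concrete, here is a toy C++ illustration. It is not kforth code and is far simpler than a real Forth: it has no return stack and no dictionary. It evaluates whitespace-separated integers and the words ``+``, ``-``, and ``*`` left to right. Numbers are pushed, and each word pops its two operands and pushes the result.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Toy data-stack evaluator: numbers are pushed, and each of + - * pops
// two operands and pushes the result.
long rpn_eval(const std::string &src)
{
    std::vector<long> stack;
    std::istringstream in(src);
    std::string word;

    while (in >> word) {
        if (word == "+" || word == "-" || word == "*") {
            if (stack.size() < 2) {
                throw std::runtime_error("stack underflow");
            }
            long b = stack.back();   // top of stack: second operand
            stack.pop_back();
            long a = stack.back();
            stack.pop_back();
            if (word == "+") {
                stack.push_back(a + b);
            } else if (word == "-") {
                stack.push_back(a - b);
            } else {
                stack.push_back(a * b);
            }
        } else {
            stack.push_back(std::stol(word));   // throws on a non-number
        }
    }
    if (stack.empty()) {
        throw std::runtime_error("empty stack");
    }
    return stack.back();
}
```

For example, ``rpn_eval("2 3 4 + *")`` adds 3 and 4 first, then multiplies by 2, giving 14.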
Forth calls functions *words*, and the FORTH-83 standard defines a set of
required words for an implementation. Note that there is an ANS Forth, but I'll
target FORTH-83 first for simplicity. The `required words`_ are:
.. _required words: http://forth.sourceforge.net/standard/fst83/fst83-12.htm
**Nucleus layer**::
! * */ */MOD + +! - / /MOD 0< 0= 0> 1+ 1- 2+
2- 2/ < = > >R ?DUP @ ABS AND C! C@ CMOVE
CMOVE> COUNT D+ D< DEPTH DNEGATE DROP DUP EXECUTE
EXIT FILL I J MAX MIN MOD NEGATE NOT OR OVER PICK
R> R@ ROLL ROT SWAP U< UM* UM/MOD XOR
**Device layer**::
BLOCK BUFFER CR EMIT EXPECT FLUSH KEY SAVE-BUFFERS
SPACE SPACES TYPE UPDATE
**Interpreter layer**::
# #> #S #TIB ' ( -TRAILING . .( <# >BODY >IN
ABORT BASE BLK CONVERT DECIMAL DEFINITIONS FIND
FORGET FORTH FORTH-83 HERE HOLD LOAD PAD QUIT SIGN
SPAN TIB U. WORD
**Compiler layer**::
+LOOP , ." : ; ABORT" ALLOT BEGIN COMPILE CONSTANT
CREATE DO DOES> ELSE IF IMMEDIATE LEAVE LITERAL LOOP
REPEAT STATE THEN UNTIL VARIABLE VOCABULARY WHILE [
['] [COMPILE] ]
In a lot of cases, Forth is also the operating system for the device. This
won't be a target at first, but something to keep in mind as I progress.
Eventually, I'd like to build a zero-allocation Forth that can run on an
STM32 or an MSP430, but the first goal is to get a minimal Forth
working. I'll define the stages tentatively as follows:
Stage 1
~~~~~~~
1. Runs on Linux (that's what my Pixelbook runs, more or less).
2. Implements the nucleus layer.
3. Has a REPL that works in a terminal.
4. Explicit non-goal: performance. I'll build a working minimal Forth to get a
baseline experience.
Stage 2
~~~~~~~
1. Implement the compiler and interpreter layers.
Stage 3
~~~~~~~~
1. Define a block layer interface.
2. Implement a Linux block layer interface.
Stage 4
~~~~~~~~
1. Build a memory management system.
2. Replace all managed memory with the homebrew memory management system.
3. Switch to a JPL rule #3 (no heap allocation) implementation.
Next steps
^^^^^^^^^^
I've decided to use C++ for two reasons: it's supported by all the targets I
want (amd64, arm/arm64, msp430, avr), and I know it well enough (and
importantly, I know the tooling) to get by. Typically, the TI compilers lag
behind the others in supporting newer C++ standards, so those will be the
limiting factor. Fortunately, just a few days before I started this, the TI
wiki was updated_ to note that the latest compilers now support C++11 and
C++14, so I'll target C++14.
As a reminder to myself: this is not going to be the prettiest or best or most
secure or production ready code. The goal is to have fun writing some software
again and to rekindle some of the joy of computing that I had before. Once I
have something working, I can go back and make an exercise of cleaning it up
and refactoring it. The prose in this series is also not going to be my finest
writing ever --- again, it suffices just to do it. The goal is to have
something to show, not to achieve perfection; it'll mostly be hacked on
while I'm on the bus or when I have a bit of downtime here and there.
.. _updated: http://processors.wiki.ti.com/index.php/C%2B%2B_Support_in_TI_Compilers#Status_as_of_February_2018
I don't really know what I'm doing, so in the next section, I'll build out the
basic framework and set up the build.


@@ -0,0 +1,280 @@
Write You a Forth, 0x02
-----------------------
:date: 2018-02-22 10:48
:tags: wyaf, forth
The basic framework will consist of two main parts:
1. A modular I/O subsystem: on Linux, it makes sense to use the operating
system's terminal I/O features. On the MSP430, there won't be the luxury
of any operating system and I'll have to build out the I/O facilities. The
I/O interface will be defined in ``io.h``; the build system will eventually
have to decide which interface implementation to bring in.
2. A toplevel function (the C++ ``main`` function, for example) that will
handle starting up the Forth system and bring us into an interpreter. We'll
put this in ``kforth.cc``.
The project will also need a build system. For simplicity, I'll at least start
with a basic Makefile::
# Makefile
CXXSTD := c++14
CXXFLAGS := -std=$(CXXSTD) -Werror -Wall -g -O0
OBJS := linux/io.o \
kforth.o
TARGET := kforth
all: $(TARGET)
$(TARGET): $(OBJS)
$(CXX) $(CFLAGS) -o $@ $(OBJS)
clean:
rm -f $(OBJS) $(TARGET)
A simple frontend
^^^^^^^^^^^^^^^^^
Starting with the most basic frontend, we'll first want to include our I/O
interface::
#include "io.h"
If kforth is running on Linux (and it will be for the first stage), the
frontend should pull in the Linux-specific pieces, which are set up in
``linux.h``::
#ifdef __linux__
#include "linux.h"
#endif // __linux__
The interpreter function takes an I/O interface instance, and reads lines in
an infinite loop, printing "ok" after each line is read. I'll go over the
methods called on the ``interface`` instance when I get to the I/O subsystem.
Printing the line buffer right now helps to verify that the I/O subsystem is
working correctly::
static char ok[] = "ok.\n";
static void
interpreter(IO &interface)
{
static size_t buflen = 0;
static char linebuf[81];
while (true) {
buflen = interface.rdbuf(linebuf, 80, true, '\n');
interface.wrln(linebuf, buflen);
interface.wrbuf(ok, 4);
}
}
The main function, for right now, can just instantiate a new I/O interface and
then call the interpreter::
static char banner[] = "kforth interpreter\n";
const size_t bannerlen = 19;
int
main(void)
{
#ifdef __linux__
Console interface;
#endif
interface.wrbuf(banner, bannerlen);
interpreter(interface);
return 0;
}
That gives a good interactive test framework that I can use to start playing
with the system. I'm trying to avoid bringing in ``iostream`` directly in order
to force myself to write useful tooling around the I/O interface.
This is, after all, the Forth ideal: start with a core system, then build your
world on top of that.
The I/O interface
^^^^^^^^^^^^^^^^^
In the truest of C++ fashions, the I/O interface is defined with the ``IO``
abstract base class::
#ifndef __KF_IO_H__
#define __KF_IO_H__
#include "defs.h"
class IO {
public:
// Virtual destructor is required in all ABCs.
virtual ~IO() {};
The two building-block methods are the lowest level. My original plan was to
include only these in the interface, but there's one snag with that: line endings.
But we'll get to that.
::
// Building block methods.
virtual char rdch(void) = 0;
virtual void wrch(char c) = 0;
I could have just made the buffer I/O methods functions inside the ``io.h``
header, but it seems easy to just include them here. I may move them outside
the class later, though.
::
// Buffer I/O.
virtual size_t rdbuf(char *buf, size_t len, bool stopat, char stopch) = 0;
virtual void wrbuf(char *buf, size_t len) = 0;
Line I/O presents some challenges because line endings differ: on a serial
console, a line ends with the sequence 0x0d 0x0a (CR LF); on the Linux terminal,
it ends with just 0x0a (LF). Therefore, reading a line is
platform-dependent, and I can't just make this a generic function unless I want
to handle all the cases. And, *surprise surprise*, right now I don't.
::
// Line I/O
virtual bool rdln(char *buf, size_t len, size_t *readlen) = 0;
virtual void wrln(char *buf, size_t len) = 0;
};
#endif // __KF_IO_H__
The Linux implementation is the ``Console`` (as seen in ``main``). The header
file isn't interesting; it's basically a copy of ``io.h`` in ``linux/io.h``.
::
#include <cstdio>
#include <iostream>
#include "../io.h"
#include "io.h"
The building blocks flush I/O. ``getchar`` is used instead of ``std::cin >> ch``
because formatted extraction skips whitespace. Later, flushing may be removed,
but it's not a performance concern yet.
::
char
Console::rdch()
{
std::cout.flush();
return getchar();
}
void
Console::wrch(char c)
{
std::cout.flush();
std::cout << c;
}
The buffer read and write functions are straightforward, and are just built on
top of the character read and write methods.
::
size_t
Console::rdbuf(char *buf, size_t len, bool stopat, char stopch)
{
size_t n = 0;
char ch;
while (n < len) {
ch = this->rdch();
if (stopat && stopch == ch) {
break;
}
buf[n++] = ch;
}
return n;
}
void
Console::wrbuf(char *buf, size_t len)
{
for (size_t n = 0; n < len; n++) {
this->wrch(buf[n]);
}
}
Line reading doesn't reuse the buffer I/O functions, because ``rdbuf``
doesn't indicate whether the buffer ran out or the line has ended. I could add
length checks and whatnot, but this is straightforward and gives me something
to work with now. Again, the mantra is dumb and works rather than clever. For
now.
::
bool
Console::rdln(char *buf, size_t len, size_t *readlen) {
size_t n = 0;
char ch;
bool line = false;
while (n < len) {
ch = this->rdch();
if (ch == '\n') {
line = true;
break;
}
buf[n++] = ch;
}
if (nullptr != readlen) {
*readlen = n;
}
return line;
}
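``Console::rdln`` only recognizes 0x0a as a line ending. One way to also cover a serial console is to drop carriage returns and end the line on 0x0a, which accepts both CR LF and LF. The following is a sketch of that idea and is not part of kforth. ``read_line`` is a hypothetical free function that takes any callable returning the next character, such as a lambda wrapping ``IO::rdch``. Like the original, it does not handle end-of-file, and it also discards any stray CR in the middle of a line.

```cpp
#include <cstddef>

// Read one line, accepting LF or CR LF endings.
// `next` is any callable that returns the next input character.
// Returns true if a line ending was seen, false if the buffer filled first.
template <typename NextChar>
bool
read_line(NextChar next, char *buf, size_t len, size_t *readlen)
{
    size_t n = 0;
    bool line = false;

    while (n < len) {
        char ch = next();
        if (ch == '\r') {
            continue;       // drop the CR of a CR LF pair
        }
        if (ch == '\n') {
            line = true;
            break;
        }
        buf[n++] = ch;
    }

    if (readlen != nullptr) {
        *readlen = n;
    }
    return line;
}
```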
Line writing, however, can absolutely reuse the buffer and character I/O
methods.
::
void
Console::wrln(char *buf, size_t len)
{
this->wrbuf(buf, len);
this->wrch(0x0a);
}
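Because ``IO`` is an abstract base class, ``Console`` is only one possible backend. A memory-backed implementation is handy for exercising the interpreter without a terminal. ``BufferIO`` below is a hypothetical sketch. It repeats the ``IO`` declaration so it compiles on its own, reads from a ``std::string``, and collects output in another string. Returning 0x0a once the input runs out is an assumption made so that every read terminates.

```cpp
#include <cstddef>
#include <string>

// Copy of the IO interface from io.h, so this sketch compiles on its own.
class IO {
public:
    virtual ~IO() {}
    virtual char rdch(void) = 0;
    virtual void wrch(char c) = 0;
    virtual size_t rdbuf(char *buf, size_t len, bool stopat, char stopch) = 0;
    virtual void wrbuf(char *buf, size_t len) = 0;
    virtual bool rdln(char *buf, size_t len, size_t *readlen) = 0;
    virtual void wrln(char *buf, size_t len) = 0;
};

// Memory-backed IO for tests: input comes from a string, and everything
// written is appended to `output`.
class BufferIO : public IO {
public:
    explicit BufferIO(const std::string &in) : input(in), pos(0) {}

    // Past the end of the input, return '\n' so every read terminates.
    char rdch(void) override { return pos < input.size() ? input[pos++] : '\n'; }
    void wrch(char c) override { output.push_back(c); }

    size_t rdbuf(char *buf, size_t len, bool stopat, char stopch) override
    {
        size_t n = 0;
        while (n < len) {
            char ch = this->rdch();
            if (stopat && ch == stopch) {
                break;
            }
            buf[n++] = ch;
        }
        return n;
    }

    void wrbuf(char *buf, size_t len) override { output.append(buf, len); }

    bool rdln(char *buf, size_t len, size_t *readlen) override
    {
        size_t n = this->rdbuf(buf, len, true, '\n');
        if (readlen != nullptr) {
            *readlen = n;
        }
        return n < len;     // fewer than len chars means rdbuf stopped at '\n'
    }

    void wrln(char *buf, size_t len) override
    {
        this->wrbuf(buf, len);
        this->wrch('\n');
    }

    std::string output;

private:
    std::string input;
    size_t pos;
};
```

An ``interpreter(IO &)`` like the one above could then be handed a ``BufferIO`` instead of a ``Console``.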
``defs.h``
^^^^^^^^^^
The common definition file ``defs.h`` is just a front for the actual platform
definitions::
#ifndef __KF_DEFS_H__
#define __KF_DEFS_H__
#ifdef __linux__
#include "linux/defs.h"
#endif
#endif // __KF_DEFS_H__
The Linux definitions in ``linux/defs.h`` just bring in the standard
definitions from the standard library::
#ifndef __KF_LINUX_DEFS_H__
#define __KF_LINUX_DEFS_H__
#include <stddef.h>
#endif
Next steps
^^^^^^^^^^
I guess the next thing to do will be to start parsing.
Some housekeeping: I'll keep the state of the code at each part in
the tag ``part-$PART``; this part, for example, is in the tag
`part-0x02`_.
.. _part-0x02: https://github.com/kisom/kforth/tree/part-0x02


@@ -0,0 +1,293 @@
Write You a Forth, 0x03
-----------------------
:date: 2018-02-23 09:36
:tags: wyaf, forth
Today, I'm working on parsing. I was talking to `steveo
<https://github.com/steveo>`_ yesterday, and he mentioned string interning, and
it sounded like a fun thing to do (and then I started thinking about ropes and
so on).
However, I'm not going to intern strings --- at least, not yet. I'm going to do
something way more primitive::
bool match_token(const char *a, const size_t alen,
const char *b, const size_t blen)
{
if (alen != blen) {
return false;
}
return memcmp(a, b, alen) == 0;
}
I'd also like to operate on a buffer without having to store a bunch of copies
of strings. Performance may not be the number one concern here, but I think
it'll be more fun to implement, and it will be a little easier. The parser
should return the next token that we can push off to the rest of the process.
It seems like we'll want a structure for that.
``parser.h``
^^^^^^^^^^^^
The parser seems like it really only needs a few things, so time to take a stab at
``parser.h``::
#ifndef __KF_PARSER_H__
#define __KF_PARSER_H__
#include "defs.h"
A ``Token`` can be defined as just the pointer to the start of the token and
its length. There's a limit to the maximum size of the buffer, and it'll be
important to check the length of the token. For simplicity, I'm going to define
the maximum length of a token as 16, and I'll put this as a ``constexpr`` in the
``defs.h`` file.
::
struct Token {
char *token;
uint8_t length;
};
Next up is to define the function from before for matching tokens.
::
bool match_token(const char *, const size_t, const char *, const size_t);
The meat of the parser is ``parse_next``, for which we'll also need some return codes.
::
typedef enum _PARSE_RESULT_ : uint8_t {
PARSE_OK = 0, // token now has a valid token.
PARSE_EOB = 1, // end of buffer, parsing a line should stop.
PARSE_LEN = 2, // token is too long
PARSE_FAIL = 3 // catch-all error
} PARSE_RESULT;
PARSE_RESULT parse_next(const char *, const size_t, size_t *, struct Token *);
#endif // __KF_PARSER_H__
``parser.cc``
^^^^^^^^^^^^^^
``parser.cc`` will open with a helper to reset tokens and the same
matching code I mentioned before::
#include "defs.h"
#include "parser.h"
#include <string.h>
static void
reset(struct Token *t)
{
t->token = nullptr;
t->length = 0;
}
bool
match_token(const char *a, const size_t alen,
const char *b, const size_t blen)
{
if (alen != blen) {
return false;
}
return memcmp(a, b, alen) == 0;
}
At the start of the parser, I'm going to reset the token; if there's a failure,
there shouldn't be a valid token anyhow.
::
PARSE_RESULT
parse_next(const char *buf, const size_t length, size_t *offset,
struct Token *token)
{
size_t cursor = *offset;
// Clear the token.
reset(token);
If the offset is already at the end of the buffer, there's no more work to do
on this buffer, so I'll return ``PARSE_EOB`` early. If I were doing a more
careful job of programming this, I'd *generally* try to avoid multiple returns,
but in this case, having working code is more important than awesome code.
::
if (cursor == length) {
return PARSE_EOB;
}
I'm going to assume that tokens are separated by spaces or tabs. I wasn't going
to support tabs at first, but it's easy enough to do that I just included it.
::
while (cursor < length) {
if (buf[cursor] != ' ') {
if (buf[cursor] != '\t') {
break;
}
}
cursor++;
}
This part might seem superfluous, but it's important in case there's trailing
whitespace in the buffer. I haven't touched the token yet, so no need to reset
it.
::
if (cursor == length) {
return PARSE_EOB;
}
Now I can point the token to the buffer at the start of the next token and walk
through the buffer until the end of the buffer or the first whitespace
character::
token->token = (char *)buf + cursor;
while ((token->length <= MAX_TOKEN_LENGTH) && (cursor < length)) {
if (buf[cursor] != ' ') {
if (buf[cursor] != '\t') {
cursor++;
token->length++;
continue;
}
}
This got me at first and took me a few minutes to figure out. If the cursor
isn't updated at the end, the next run of the parser is going to be stuck on
this word as the cursor doesn't point to whitespace anymore.
::
cursor++;
break;
}
Finally, if the token length hasn't been exceeded, the offset can be updated
and the token returned::
if (token->length > MAX_TOKEN_LENGTH) {
reset(token);
return PARSE_LEN;
}
*offset = cursor;
return PARSE_OK;
}
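For experimenting outside the rest of kforth, the fragments above can be gathered into one compilable unit. This is a condensed sketch rather than the code in the tag. It inlines ``MAX_TOKEN_LENGTH`` as 16 (the value chosen for ``defs.h``) and uses a hypothetical ``is_blank`` helper. It also bounds the whitespace scan with ``cursor < length`` so that ``buf[length]`` is never read.

```cpp
#include <cstddef>
#include <cstdint>

constexpr size_t MAX_TOKEN_LENGTH = 16;   // value chosen for defs.h

struct Token {
    char *token;
    uint8_t length;
};

enum PARSE_RESULT : uint8_t {
    PARSE_OK = 0,    // token now has a valid token.
    PARSE_EOB = 1,   // end of buffer, parsing a line should stop.
    PARSE_LEN = 2,   // token is too long
    PARSE_FAIL = 3   // catch-all error
};

// Tokens are separated by spaces or tabs.
static bool is_blank(char c) { return c == ' ' || c == '\t'; }

PARSE_RESULT
parse_next(const char *buf, const size_t length, size_t *offset,
           struct Token *token)
{
    size_t cursor = *offset;
    token->token = nullptr;
    token->length = 0;

    // Skip leading blanks; cursor < length keeps buf[length] unread.
    while (cursor < length && is_blank(buf[cursor])) {
        cursor++;
    }
    if (cursor >= length) {
        return PARSE_EOB;
    }

    token->token = const_cast<char *>(buf) + cursor;
    while (cursor < length && !is_blank(buf[cursor])) {
        if (token->length == MAX_TOKEN_LENGTH) {
            // One more character would exceed the limit.
            token->token = nullptr;
            token->length = 0;
            return PARSE_LEN;
        }
        token->length++;
        cursor++;
    }

    // Step past the delimiter so the next call starts on fresh input.
    if (cursor < length) {
        cursor++;
    }
    *offset = cursor;
    return PARSE_OK;
}
```

Called in a loop, as the frontend does, this returns one token per call until it reports ``PARSE_EOB``.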
``kforth.cc``
^^^^^^^^^^^^^
That's all of ``parser.cc`` (at least for now), but this needs to be integrated
into the frontend. ``kforth.cc`` now starts off with::
#include "io.h"
#include "parser.h"
#include <stdlib.h>
#ifdef __linux__
#include "linux.h"
#endif // __linux__
static char ok[] = "ok.\n";
static char bye[] = "bye";
static bool
parser(IO &interface, const char *buf, const size_t buflen)
{
static size_t offset = 0;
static struct Token token;
static PARSE_RESULT result = PARSE_FAIL;
offset = 0;
// reset token
token.token = nullptr;
token.length = 0;
while ((result = parse_next(buf, buflen, &offset, &token)) == PARSE_OK) {
interface.wrbuf((char *)"token: ", 7);
interface.wrbuf(token.token, token.length);
interface.wrln((char *)".", 1);
There's no command parser right now, so I've added in this hack so it starts to
feel a little like a Forth.
::
if (match_token(token.token, token.length, bye, 3)) {
interface.wrln((char *)"Goodbye!", 8);
exit(0);
}
}
switch (result) {
case PARSE_EOB:
interface.wrbuf(ok, 4);
return true;
case PARSE_LEN:
interface.wrln((char *)"parse error: token too long", 27);
return false;
case PARSE_FAIL:
interface.wrln((char *)"parser failure", 14);
return false;
default:
interface.wrln((char *)"*** the world is broken ***", 27);
exit(1);
}
}
static void
interpreter(IO &interface)
{
static size_t buflen = 0;
static char linebuf[81];
while (true) {
interface.wrch('?');
interface.wrch(' ');
buflen = interface.rdbuf(linebuf, 80, true, '\n');
The return value is being ignored right now, but later on it might be useful.
::
parser(interface, linebuf, buflen);
}
}
But does it work?
::
~/code/kforth (0) $ make
g++ -std=c++14 -Wall -Werror -g -O0 -c -o linux/io.o linux/io.cc
g++ -std=c++14 -Wall -Werror -g -O0 -c -o parser.o parser.cc
g++ -std=c++14 -Wall -Werror -g -O0 -c -o kforth.o kforth.cc
g++ -o kforth linux/io.o parser.o kforth.o
~/code/kforth (0) $ ./kforth
kforth interpreter
? 2 3 4 + * 1 SWAP
token: 2.
token: 3.
token: 4.
token: +.
token: *.
token: 1.
token: SWAP.
ok.
? thistokenistoolong!
parse error: token too long
bye
token: bye.
Goodbye!
~/code/kforth (0) $
Heyo! Now I'm getting somewhere. The next logical step (to me) is to add in a
command parser and a standard vocabulary.
The snapshot of the code from here is in the tag part-0x03_.
.. _part-0x03: https://github.com/kisom/kforth/tree/part-0x03


@@ -0,0 +1,395 @@
Write You a Forth, 0x04
-----------------------
:date: 2018-02-23 19:20
:tags: wyaf, forth
So, I lied about words being next. When I thought about it some more, what I
really need to do first is add the stack and support for parsing numbers.
I'll start with the stack, because it's pretty straightforward.
I've added a new definition: ``constexpr uint8_t STACK_SIZE = 128``. This goes
in the ``linux/defs.h``, and the ``#else`` in the top ``defs.h`` will set a
smaller stack size for other targets. I've also defined a type called ``KF_INT``
that, on Linux, is an ``int32_t``::
index 4dcc540..e070d27 100644
--- a/defs.h
+++ b/defs.h
@@ -3,6 +3,9 @@
#ifdef __linux__
#include "linux/defs.h"
+#else
+typedef int KF_INT;
+constexpr uint8_t STACK_SIZE = 16;
#endif
constexpr size_t MAX_TOKEN_LENGTH = 16;
diff --git a/linux/defs.h b/linux/defs.h
index 57cdaeb..3740f5a 100644
--- a/linux/defs.h
+++ b/linux/defs.h
@@ -4,4 +4,7 @@
#include <stddef.h>
#include <stdint.h>
+typedef int32_t KF_INT;
+constexpr uint8_t STACK_SIZE = 128;
+
#endif
\ No newline at end of file
It seems useful to be able to adapt the kind of numbers supported; an AVR might do
better with 16-bit integers, for example.
``stack.h``
^^^^^^^^^^^
The stack is going to be templated, because later on we'll need a ``double``
stack for floating point and a return address stack. Since it's a template,
everything goes in ``stack.h``. This is a pretty simple implementation that's
CS 101 material; I've opted to have the interface return ``bool``\ s for
everything to indicate stack overflow, underflow, and out-of-bounds access::
#ifndef __KF_STACK_H__
#define __KF_STACK_H__
#include "defs.h"
template <typename T>
class Stack {
public:
bool push(T val);
bool pop(T &val);
bool get(size_t, T &);
size_t size(void) { return this->arrlen; };
private:
T arr[STACK_SIZE];
size_t arrlen = 0; // the stack starts out empty
};
// push returns false if there was a stack overflow.
template <typename T>
bool
Stack<T>::push(T val)
{
if ((this->arrlen + 1) > STACK_SIZE) {
return false;
}
this->arr[this->arrlen++] = val;
return true;
}
// pop returns false if there was a stack underflow.
template <typename T>
bool
Stack<T>::pop(T &val)
{
if (this->arrlen == 0) {
return false;
}
val = this->arr[this->arrlen - 1];
this->arrlen--;
return true;
}
// get returns false on invalid bounds.
template <typename T>
bool
Stack<T>::get(size_t i, T &val)
{
if (i >= this->arrlen) {
return false;
}
val = this->arr[i];
return true;
}
#endif // __KF_STACK_H__
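For poking at the stack outside of kforth, here's a self-contained sketch of the same class, written so that ``arrlen`` starts at zero, ``pop`` returns ``true`` on success, and ``get`` only accepts indices below ``size()``. The capacity of 4 is an arbitrary assumption to make overflow easy to hit; kforth takes ``STACK_SIZE`` from ``defs.h``.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical small capacity so overflow is easy to exercise.
constexpr std::size_t STACK_SIZE = 4;

template <typename T>
class Stack {
public:
    bool push(T val);
    bool pop(T &val);
    bool get(std::size_t i, T &val);
    std::size_t size() const { return this->arrlen; }
private:
    T arr[STACK_SIZE];
    std::size_t arrlen = 0;   // number of items currently on the stack
};

// push returns false on overflow.
template <typename T>
bool
Stack<T>::push(T val)
{
    if (this->arrlen >= STACK_SIZE) {
        return false;
    }
    this->arr[this->arrlen++] = val;
    return true;
}

// pop returns false on underflow; on success val holds the old top.
template <typename T>
bool
Stack<T>::pop(T &val)
{
    if (this->arrlen == 0) {
        return false;
    }
    val = this->arr[--this->arrlen];
    return true;
}

// get returns false unless i < size(); index 0 is the bottom of the stack.
template <typename T>
bool
Stack<T>::get(std::size_t i, T &val)
{
    if (i >= this->arrlen) {
        return false;
    }
    val = this->arr[i];
    return true;
}
```

Indexing from the bottom is what lets a stack printer walk from ``0`` up to ``size() - 1`` and show the oldest item first.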
I'll put a ``Stack<KF_INT>`` in ``kforth.cc`` later on. For now, this gives me
an interface for the numeric parser to push a number onto the stack.
``parse_num``
^^^^^^^^^^^^^
It seems like the best place for this is in ``parser.cc``, though I might
move it into a token processor later. The definition for this goes in ``parser.h``,
and the body is in ``parser.cc``::
// parse_num tries to parse the token as a signed base 10 number,
// pushing it onto the stack if needed.
bool
parse_num(struct Token *token, Stack<KF_INT> &s)
{
KF_INT n = 0;
uint8_t i = 0;
bool sign = false;
It turns out you can't parse a zero-length token as a number...
::
if (token->length == 0) {
return false;
}
If the number is negative, I'll need to negate it at the end, so the first step
is checking whether the leading character is a minus sign.
::
if (token->token[i] == '-') {
// A lone "-" is the subtraction word, not a number.
if (token->length == 1) {
return false;
}
i++;
sign = true;
}
Parsing is done by checking whether each character is within the range of the ASCII
numeral values. Later on, I might add in separate functions for processing base 10
and base 16 numbers, and decide which to use based on a prefix (like ``0x``). If the
character is between those values, then the working number is multiplied by 10 and
the digit added.
::
while (i < token->length) {
if (token->token[i] < '0') {
return false;
}
if (token->token[i] > '9') {
return false;
}
n *= 10;
n += (uint8_t)(token->token[i] - '0');
i++;
}
If it was a negative number, then the working number has to be negated::
if (sign) {
n *= -1;
}
Finally, return the result of pushing the number on the stack. One thing that
might come back to get me later is that this makes it impossible to tell if a
failure to parse the number is due to an invalid number or due to a stack
overflow. This will be a good candidate for revisiting later.
::
return s.push(n);
}
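If the parsing and the push are ever split apart, a standalone version might look like the sketch below. This ``parse_num`` is a hypothetical variant, not the one in the repository: it takes a raw buffer and length, returns the value through a pointer so a bad number can't be confused with a full stack, rejects a lone ``-`` (which will eventually need to be the subtraction word), and refuses anything outside the ``int32_t`` range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical variant of parse_num: parse a signed base-10 number from a
// raw (buf, len) pair and return it through *out instead of pushing it.
// Returns false for an empty token, a lone "-", a non-digit, or a value
// that doesn't fit in an int32_t.
bool
parse_num(const char *buf, std::size_t len, int32_t *out)
{
    std::size_t i = 0;
    bool negative = false;
    int64_t n = 0;   // wider than the result, so the range check can't overflow

    if (len == 0) {
        return false;
    }
    if (buf[0] == '-') {
        if (len == 1) {
            return false;   // "-" on its own is a word, not a number
        }
        negative = true;
        i++;
    }
    for (; i < len; i++) {
        if (buf[i] < '0' || buf[i] > '9') {
            return false;
        }
        n = n * 10 + (buf[i] - '0');
        // 2147483648 is only allowed as the magnitude of INT32_MIN.
        if (n > static_cast<int64_t>(INT32_MAX) + 1) {
            return false;
        }
    }
    if (negative) {
        n = -n;
    }
    if (n > INT32_MAX) {
        return false;
    }
    *out = static_cast<int32_t>(n);
    return true;
}
```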
``io.cc``
^^^^^^^^^^
Conversely, it'll be useful to write a number to an ``IO`` interface. It
*seems* more useful right now to just provide a number → I/O function, but
that'll be easily adapted to a number → buffer function later. This will add
a real function to ``io.h``, which will require a corresponding ``io.cc``
(which also needs to be added to the ``Makefile``)::
#include "defs.h"
#include "io.h"
#include <string.h>
void
write_num(IO &interface, KF_INT n)
{
Through careful scientific study, I have determined that the most digits a
32-bit integer needs is 10 (sans the sign!). This will absolutely
need to be changed if ``KF_INT`` is ever moved to 64-bit (or larger!) numbers.
There's a TODO in the actual source code that notes this. ::
char buf[10];
uint8_t i = 10;
memset(buf, 0, 10);
Because this is going out to an I/O interface, I don't need to store the sign
in the buffer itself and can just print it and invert the number. Inverting is
important; I ran into a bug earlier where I didn't invert it and my subtractions
below were correspondingly off.
::
if (n < 0) {
interface.wrch('-');
n *= -1;
}
The buffer has to be filled from the end to the beginning to do the inverse of
the parsing method::
while (n != 0) {
char ch = (n % 10) + '0';
buf[i--] = ch;
n /= 10;
}
Then it can just be dumped to the interface::
interface.wrbuf(buf+i, 11-i);
}
``kforth.cc``
^^^^^^^^^^^^^^
And now I come to the fun part: adding the stack in. After including ``stack.h``,
I've declared the data stack at the top of the file::
// dstack is the data stack.
static Stack<KF_INT> dstack;
It's kind of useful to be able to print the stack::
static void
write_dstack(IO &interface)
{
KF_INT tmp;
interface.wrch('<');
for (size_t i = 0; i < dstack.size(); i++) {
if (i > 0) {
interface.wrch(' ');
}
dstack.get(i, tmp);
write_num(interface, tmp);
}
interface.wrch('>');
}
Surrounding the stack in angle brackets is a cool stylish sort of thing, I
guess. All this is no good if the interpreter isn't actually hooked up to the
number parser::
// The new while loop in the parser function in kforth.cc:
while ((result = parse_next(buf, buflen, &offset, &token)) == PARSE_OK) {
interface.wrbuf((char *)"token: ", 7);
interface.wrbuf(token.token, token.length);
interface.wrln((char *)".", 1);
if (!parse_num(&token, dstack)) {
interface.wrln((char *)"failed to parse numeric", 23);
}
// Temporary hack until the interpreter is working further.
if (match_token(token.token, token.length, bye, 3)) {
interface.wrln((char *)"Goodbye!", 8);
exit(0);
}
}
But does it blend?
^^^^^^^^^^^^^^^^^^
Hopefully this works::
~/code/kforth (0) $ make
g++ -std=c++14 -Wall -Werror -g -O0 -c -o linux/io.o linux/io.cc
g++ -std=c++14 -Wall -Werror -g -O0 -c -o io.o io.cc
g++ -std=c++14 -Wall -Werror -g -O0 -c -o parser.o parser.cc
g++ -std=c++14 -Wall -Werror -g -O0 -c -o kforth.o kforth.cc
g++ -o kforth linux/io.o io.o parser.o kforth.o
~/code/kforth (0) $ ./kforth
kforth interpreter
<>
? 2 -2 30 1000 -1010
token: 2.
token: -2.
token: 30.
token: 1000.
token: -1010.
ok.
<2 -2 30 1000 -1010>
? bye
token: bye.
failed to parse numeric
Goodbye!
~/code/kforth (0) $
So there's that. Okay, next time *for real* I'll do a vocabulary thing.
As before, see the tag `part-0x04 <https://github.com/kisom/kforth/tree/part-0x04>`_.
Part B
^^^^^^^
So I was feeling good about the work above until I tried to run this on my
Pixelbook::
$ ./kforth
kforth interpreter
<>
? 2
token: 2.
ok.
<5>
WTF‽ I spent an hour debugging this to realise it was a bounds overflow in
``write_num``. This led me to check the behaviour of the maximum and
minimum values of ``KF_INT``, which in turn led me to revise ``io.cc``::
#include "defs.h"
#include "io.h"
#include <string.h>
static constexpr size_t nbuflen = 11;
void
write_num(IO &interface, KF_INT n)
{
// TODO(kyle): make the size of the buffer depend on the size of
// KF_INT.
char buf[nbuflen];
uint8_t i = nbuflen;
memset(buf, 0, i);
bool neg = n < 0;
if (neg) {
interface.wrch('-');
n = ~n;
}
while (n != 0) {
char ch = (n % 10) + '0';
if (neg && (i == nbuflen)) ch++;
This was the source of the actual bug: ``buf[i]`` where ``i`` == ``nbuflen``
was stomping over the value of ``n``, which is stored on the stack, too.
::
buf[i-1] = ch;
i--;
n /= 10;
}
uint8_t buflen = nbuflen - i % nbuflen;
interface.wrbuf(buf+i, buflen);
}
A couple of things here: first, the magic numbers were driving me crazy.
Replacing them didn't fix the problem, but at one point I'd changed all but one
of them and forgotten the last. So, now I'm doing the right thing (or the more
right thing) and using a ``constexpr``. The other change is going from
``n *= -1`` to ``n = ~n``. Because *~n* is one less than *-n*, this requires the
check for ``neg && (i == nbuflen)`` to add one back to the lowest digit (which
still goes wrong when that digit comes out as 9, as it does for -10), but it
handles the case where *n* = -2147483648::
(gdb) p -2147483648 * -1
$1 = 2147483648
(gdb) p ~(-2147483648)
$2 = 2147483647
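Notice that *$1* will overflow an ``int32_t`` (which is what ``KF_INT`` is on
Linux), which means that in practice it wraps back around to -2147483648
(strictly speaking, signed overflow is undefined behaviour in C++), so negating
it this way has no effect. *~n + 1* is the two's complement negation of *n*.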
Finally, I made sure to wrap the buffer length so that we never try to write a
longer buffer than the one we have.
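The identity behind the ``~n`` trick is easy to check at compile time. A sketch, assuming a two's-complement target (which C++20 finally mandates); ``negate_wide`` is a hypothetical helper showing the other way out, which is to widen before negating:

```cpp
#include <cassert>
#include <cstdint>

// In two's complement, ~n == -n - 1, so ~n + 1 == -n.
static_assert(~static_cast<int32_t>(5) == -6, "~n == -n - 1");

// -INT32_MIN doesn't fit in an int32_t, but ~INT32_MIN does: it's INT32_MAX.
static_assert(~INT32_MIN == INT32_MAX, "~INT32_MIN == INT32_MAX");

// ~n is one less than -n, so its lowest digit needs one added back.
// For -10 that digit is 9, and 9 + 1 is no longer a single digit.
static_assert(~static_cast<int32_t>(-10) == 9, "~(-10) == 9");

// Widening to 64 bits before negating sidesteps the problem entirely.
constexpr int64_t
negate_wide(int32_t n)
{
    return -static_cast<int64_t>(n);
}

static_assert(negate_wide(INT32_MIN) == 2147483648LL, "fits in 64 bits");
```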
I feel dumb for making such a rookie mistake, but I suppose that's what
happens when you stop programming for a living. The updated code is under the
tag `part-0x04-update <https://github.com/kisom/kforth/tree/part-0x04-update>`_.

Write You a Forth, 0x05
-----------------------
:date: 2018-02-27 08:06
:tags: wyaf, forth
NB: Today's update was pretty large, so I don't show all of the code; this is
what ``git`` is for.
Today I need to start actually doing things with tokens. This requires two
things:
1. Some idea of what a word is, and
2. A dictionary of words
I started taking some notes on this previously, and I think there are a few
kinds of words that are possible:
1. Numbers (e.g. defining a variable)
2. Built-in functions
3. Lambda functions (that is, user-defined functions).
Stage 1 really only needs to incorporate #2, so that's what I'll focus on for
now. However, to prepare for the future, I'm going to define a ``Word`` base
class and inherit from there. This interface is going to need to be
stack-aware, so what I've done is define a ``System`` struct in ``system.h``::
#ifndef __KF_CORE_H__
#define __KF_CORE_H__
#include "defs.h"
#include "io.h"
#include "stack.h"
typedef struct _System {
Stack<KF_INT> dstack;
IO *interface;
} System;
#endif // __KF_CORE_H__
This will let me later add in support for the return stack and other things
that might be useful. Other ideas: adding something like an ``errno``
equivalent to indicate the last error, and storing a dictionary of words. This
will need some restructuring, though. I've already moved the I/O into the
system as well. This took some finagling in ``kforth.cc``; I'm eliding the
diff here because it's so long, but it's basically a ``sed -i -e
's/interface./sys->interface.``.
The Word interface
^^^^^^^^^^^^^^^^^^
Now I can start defining ``word.h``. Maybe this is a case of "when all you have
is an object-oriented language, everything looks like a class," but I decided to
design an abstract class for a Word and implement the first concrete class,
**Builtin**. What I came up with was::
class Word {
public:
virtual ~Word() {};
The *eval* method takes a ``System`` structure and executes some function.
::
virtual bool eval(System *) = 0;
The dictionary is a linked list, so next is used to traverse the list.
::
virtual Word *next(void) = 0;
The ``match`` method is used to determine whether this is the word being
referred to.
::
virtual bool match(struct Token *) = 0;
Finally, ``getname`` will fill in a ``char[MAX_TOKEN_LENGTH]`` buffer with the
word's name.
::
virtual void getname(char *, size_t *) = 0;
};
With the interface defined, I can implement ``Builtin`` (I've elided the
common interface from the listing below to focus on the implementation)::
class Builtin : public Word {
public:
~Builtin() {};
Builtin(const char *name, size_t namelen, Word *head, bool (*fun)(System *));
private:
char name[MAX_TOKEN_LENGTH];
size_t namelen;
Word *prev;
bool (*fun)(System *);
};
The ``bool`` works as a first pass, but I think I'm going to have to add some
notion of system conditions later on to denote why execution failed. One thing
that both ``pforth`` and ``gforth`` do that I don't yet do is clear the
stack when there's an execution failure; at least, they clear the stack on an
unrecognised word. The implementation is pretty trivial::
#include "defs.h"
#include "parser.h"
#include "system.h"
#include "word.h"
#include <string.h>
Builtin::Builtin(const char *name, size_t namelen, Word *head, bool (*target)(System *))
: prev(head), fun(target)
{
memcpy(this->name, name, namelen);
this->namelen = namelen;
}
bool
Builtin::eval(System *sys)
{
return this->fun(sys);
}
Word *
Builtin::next()
{
return this->prev;
}
bool
Builtin::match(struct Token *token)
{
return match_token(this->name, this->namelen, token->token, token->length);
}
void
Builtin::getname(char *buf, size_t *buflen)
{
memcpy(buf, this->name, this->namelen);
*buflen = namelen;
}
Right. Now to do something with this.
The system dictionary
^^^^^^^^^^^^^^^^^^^^^
The dictionary's interface is minimal::
// dict.h
#ifndef __KF_DICT_H__
#define __KF_DICT_H__
#include "defs.h"
#include "parser.h"
#include "system.h"
#include "word.h"
typedef enum _LOOKUP_ : uint8_t {
LOOKUP_OK = 0, // Lookup executed properly.
LOOKUP_NOTFOUND = 1, // The token isn't in the dictionary.
LOOKUP_FAILED = 2 // The word failed to execute.
} LOOKUP;
void init_dict(System *);
LOOKUP lookup(struct Token *, System *);
#endif // __KF_DICT_H__
There's a modicum of differentiation between "everything worked" and "no it
didn't," and by that I mean the lookup can tell you if the word wasn't found
or if there was a problem executing it.
I added a ``struct Word *dict`` field to the ``System`` struct, since we're
operating on these anyways. I guess it's best to start with the lookup function
so that when I start adding builtins later, it'll be easy to just recompile and
use them.
::
LOOKUP
lookup(struct Token *token, System *sys)
{
Word *cursor = sys->dict;
KF_INT n;
I seem to recall from *Programming a Problem-Oriented Language* that Chuck
Moore advocated checking whether a token was a number before looking it up
in the dictionary, so that's the approach I'll take::
if (parse_num(token, &n)) {
if (sys->dstack.push(n)) {
return LOOKUP_OK;
}
return LOOKUP_FAILED;
}
The remainder is pretty much bog-standard linked list traversal::
while (cursor != nullptr) {
if (cursor->match(token)) {
if (cursor->eval(sys)) {
return LOOKUP_OK;
}
return LOOKUP_FAILED;
}
cursor = cursor->next();
}
return LOOKUP_NOTFOUND;
}
This needs to get hooked up into the interpreter now; this is going to require
some reworking of the ``parser`` function::
static bool
parser(const char *buf, const size_t buflen)
{
static size_t offset = 0;
static struct Token token;
static PARSE_RESULT result = PARSE_FAIL;
static LOOKUP lresult = LOOKUP_FAILED;
static bool stop = false;
offset = 0;
// reset token
token.token = nullptr;
token.length = 0;
while ((result = parse_next(buf, buflen, &offset, &token)) == PARSE_OK) {
lresult = lookup(&token, &sys);
switch (lresult) {
case LOOKUP_OK:
continue;
case LOOKUP_NOTFOUND:
sys.interface->wrln((char *)"word not found", 14);
stop = true;
break;
case LOOKUP_FAILED:
sys.interface->wrln((char *)"execution failed", 16);
stop = true;
break;
default:
sys.interface->wrln((char *)"*** the world is broken ***", 27);
exit(1);
}
if (stop) {
stop = false;
break;
}
}
switch (result) {
case PARSE_OK:
return false;
case PARSE_EOB:
sys.interface->wrbuf(ok, 4);
return true;
case PARSE_LEN:
sys.interface->wrln((char *)"parse error: token too long", 27);
return false;
case PARSE_FAIL:
sys.interface->wrln((char *)"parser failure", 14);
return false;
default:
sys.interface->wrln((char *)"*** the world is broken ***", 27);
exit(1);
}
}
Now I feel like I'm at the part where I can start adding in functionality. The
easiest first builtin: addition. Almost impossible to screw this up, right?
::
static bool
add(System *sys)
{
KF_INT a = 0;
KF_INT b = 0;
if (!sys->dstack.pop(&a)) {
return false;
}
if (!sys->dstack.pop(&b)) {
return false;
}
b += a;
return sys->dstack.push(b);
}
Now this needs to go into the ``init_dict`` function::
void
init_dict(System *sys)
{
sys->dict = nullptr;
sys->dict = new Builtin((const char *)"+", 1, sys->dict, add);
}
And this needs to get added into the ``main`` function::
int
main(void)
{
init_dict(&sys);
#ifdef __linux__
Console interface;
sys.interface = &interface;
#endif
sys.interface->wrbuf(banner, bannerlen);
interpreter();
return 0;
}
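Stripped of the I/O, the pieces so far boil down to a singly linked list searched newest-first, plus builtins that pop the top of the stack first. Here's a self-contained sketch of that shape. ``MiniSystem``, ``Entry``, ``pop2`` and the plain-C-string names are hypothetical stand-ins for ``System``, ``Word``/``Builtin`` and the ``dstack.pop`` calls, number parsing is left out, and a ``-`` builtin is included because operand order starts to matter there.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical stand-in for System: only a data stack.
struct MiniSystem {
    std::vector<int> dstack;   // back() is the top of the stack
};

// One dictionary entry; prev points at the entry defined before it.
struct Entry {
    const char *name;
    bool (*fun)(MiniSystem *);
    const Entry *prev;
};

// Pop the top of the stack into *a and the item below it into *b.
bool
pop2(MiniSystem *sys, int *a, int *b)
{
    if (sys->dstack.size() < 2) {
        return false;          // underflow: leave the stack untouched
    }
    *a = sys->dstack.back();
    sys->dstack.pop_back();
    *b = sys->dstack.back();
    sys->dstack.pop_back();
    return true;
}

bool
add(MiniSystem *sys)
{
    int a = 0, b = 0;
    if (!pop2(sys, &a, &b)) {
        return false;
    }
    sys->dstack.push_back(b + a);
    return true;
}

// a came off the top, so "5 3 -" has to compute b - a.
bool
sub(MiniSystem *sys)
{
    int a = 0, b = 0;
    if (!pop2(sys, &a, &b)) {
        return false;
    }
    sys->dstack.push_back(b - a);
    return true;
}

enum Lookup { LOOKUP_OK, LOOKUP_NOTFOUND, LOOKUP_FAILED };

// Walk from the newest entry and run the first one whose name matches.
Lookup
lookup(const Entry *dict, const char *word, MiniSystem *sys)
{
    for (const Entry *cursor = dict; cursor != nullptr; cursor = cursor->prev) {
        if (std::strcmp(cursor->name, word) == 0) {
            return cursor->fun(sys) ? LOOKUP_OK : LOOKUP_FAILED;
        }
    }
    return LOOKUP_NOTFOUND;
}
```

Defining ``-`` after ``+`` (``Entry minus{"-", sub, &plus}``) makes ``minus`` the head of the dictionary, the same ordering ``init_dict`` gets by passing ``sys->dict`` as the new word's ``head``.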
The moment of truth
^^^^^^^^^^^^^^^^^^^
Hold on to your pants, let's see what breaks::
$ ./kforth
kforth interpreter
<>
? 2 3 +
ok.
<5>
Oh hey, look, it actually works. Time to add a few more definitions for good
measure:
+ the basic arithmetic operators ``-``, ``*``, ``/``
+ the classic ``SWAP`` and ``ROT`` words
+ ``DEFINITIONS`` to look at all the definitions in the language
These are all pretty simple, fortunately. A few things that tripped me up,
though:
+ The *a* and *b* names kind of threw me off because I fill *a* first. This
means it's the last number pushed (the top of the stack); this didn't matter for addition,
but in subtraction, it means I had to be careful to do ``b -= a`` rather
than the other way around.
+ pforth and gforth both support case insensitive matching, so I had to
modify the token matcher::
bool
match_token(const char *a, const size_t alen,
const char *b, const size_t blen)
{
if (alen != blen) {
return false;
}
for (size_t i = 0; i < alen; i++) {
if (a[i] == b[i]) {
continue;
}
if (!isalpha((unsigned char)a[i]) || !isalpha((unsigned char)b[i])) {
return false;
}
The XOR by 0x20 is just a neat trick for inverting the case of a letter: in
ASCII, upper- and lowercase letters differ only in that bit.
::
if ((a[i] ^ 0x20) == b[i]) {
continue;
}
if (a[i] == (b[i] ^ 0x20)) {
continue;
}
return false;
}
return true;
}
+ I forgot to include the case for ``PARSE_OK`` in the result switch statement
in the ``parser`` function, so I could get a line of code evaluated but then
it'd die with "the world is broken."
+ When I tried doing some division, I ran into some weird issues::
$ ./kforth
kforth interpreter
<>
? 2 5040 /
ok.
<<3C><>>
It turns out that in ``write_num``, the case where *n = 0* skips the digit
loop entirely, so no digits ever make it into the buffer and whatever bytes
happen to be there get written out instead. This is a dirty thing, but I edge
cased this::
$ git diff io.cc
diff --git a/io.cc b/io.cc
index 77e0e2a..a86156b 100644
--- a/io.cc
+++ b/io.cc
@@ -24,6 +24,10 @@ write_num(IO *interface, KF_INT n)
n++;
}
}
+ else if (n == 0) {
+ interface->wrch('0');
+ return;
+ }
while (n != 0) {
char ch = (n % 10) + '0';
May the compiler have mercy on my soul and whatnot.
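Here's the case-insensitive matcher as a standalone sketch. It casts each character to ``unsigned char`` before calling ``isalpha``, since passing a negative ``char`` to it is undefined behaviour, and it folds the two XOR checks into one, since ``(a ^ 0x20) == b`` and ``a == (b ^ 0x20)`` are the same test.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>

// Case-insensitive token comparison in the style of match_token: in ASCII,
// 'A'..'Z' and 'a'..'z' differ only in bit 5 (0x20), so flipping that bit
// swaps a letter's case.  Non-letters must match exactly.
bool
match_token(const char *a, std::size_t alen, const char *b, std::size_t blen)
{
    if (alen != blen) {
        return false;
    }
    for (std::size_t i = 0; i < alen; i++) {
        unsigned char ca = static_cast<unsigned char>(a[i]);
        unsigned char cb = static_cast<unsigned char>(b[i]);
        if (ca == cb) {
            continue;
        }
        // Only letters get the case-folding treatment.
        if (!std::isalpha(ca) || !std::isalpha(cb)) {
            return false;
        }
        if ((ca ^ 0x20) != cb) {
            return false;
        }
    }
    return true;
}
```

The ``isalpha`` guard matters: ``@`` and a backtick also differ only in bit 5, and without the guard they would compare as equal.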
For you sports fans keeping track at home, here are the classic bugs I've
introduced so far:
1. bounds overrun
2. missing case in a switch statement
But now here I am with the interpreter in good shape. Now I can start
implementing the builtins in earnest!
As before, see the tag `part-0x05 <https://github.com/kisom/kforth/tree/part-0x05>`_.

Write You a Forth, 0x06
-----------------------
:date: 2018-02-28 22:55
:tags: wyaf, forth
Lots of updates last night; SLOC-wise, I added a bunch of new definitions:
+ ``DEPTH``, ``.`` and ``.S`` to inspect the stack
+ ``/MOD``, ``*/``, and ``*/MOD``, which required adding some idea of a long
type
+ ``0<``, ``0=``, ``0>``, ``<``, ``=``, and ``>`` for conditionals
+ ``DUP`` and ``?DUP``
+ the logical operators ``AND`` and ``OR``, plus ``NEGATE``
+ ``ABS``
+ ``BYE`` moved from an interpreter hack to a defined word
+ ``D+`` and ``D-`` started me off on the concept of double numbers
+ ``DROP``, ``OVER``, and ``ROLL`` are more stack manipulation functions
It's starting to feel a lot like a Forth...
Speaking of SLOC, for shits and grins I decided to see how the code base has
grown:
+-----------+---------------+--------+----------------------+---------------+
| revision | lines of code | growth | focus | exec size (b) |
+-----------+---------------+--------+----------------------+---------------+
| 0x02 | 133 | n/a | starting point | 38368 |
+-----------+---------------+--------+----------------------+---------------+
| 0x03 | 245 | 1.8x | parsing | 40920 |
+-----------+---------------+--------+----------------------+---------------+
| 0x04 | 369 | 1.5x | stack / numerics | 48736 |
+-----------+---------------+--------+----------------------+---------------+
| 0x05 | 677 | 1.8x | initial dictionary | 62896 |
+-----------+---------------+--------+----------------------+---------------+
| 0x06 | 1436 | 2.1x | expanding vocabulary | 85256 |
+-----------+---------------+--------+----------------------+---------------+
Note that the executable is compiled with ``-O0 -g`` on the
``x86_64-linux-gnu`` target.
It makes sense that expanding the vocabulary would mean a big jump in code
size. I did more than that, though, and I'm not going to show most of the work
for the new words (a lot of it is repetitive and mechanical).
System updates
^^^^^^^^^^^^^^
Before I started expanding the dictionary, though, I made some changes to
the ``System``::
$ git diff HEAD^ system.h
diff --git a/system.h b/system.h
index 00f4a34..91aa1fa 100644
--- a/system.h
+++ b/system.h
@@ -5,11 +5,24 @@
#include "io.h"
#include "stack.h"
+typedef enum _SYS_STATUS : uint8_t {
+ STATUS_OK = 0,
+ STATUS_STACK_OVERFLOW = 1,
+ STATUS_STACK_UNDERFLOW = 2,
+ STATUS_EXECUTION_FAILURE = 3,
+ STATUS_UNKNOWN_WORD = 4
+} SYS_STATUS;
+
+class Word;
+
typedef struct _System {
Stack<KF_INT> dstack;
IO *interface;
- struct Word *dict;
+ Word *dict;
+ SYS_STATUS status;
} System;
+void system_clear_error(System *sys);
+void system_write_status(System *sys);
#endif // __KF_CORE_H__
\ No newline at end of file
I've started adding a notion of system state, which I've deliberately kept
separate from the parser state. The new functions aren't particularly
interesting; they just write a string to the ``interface`` field so you
get things like::
$ ./kforth
kforth interpreter
? swap
stack underflow (error code 2).
? what-word?
unknown word (error code 4).
? 2
ok.
Note that this is separate from the parser errors::
$ ./kforth
kforth interpreter
? AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
ok.
parse error: token too long
? unknown word (error code 4).
?
Though this test does show that the interpreter could be made more robust.
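To illustrate the mapping, here's a hypothetical ``status_message`` helper that turns the new ``SYS_STATUS`` codes into the messages shown above. Only the underflow and unknown-word strings come from the sessions above; the overflow, execution-failure and ``ok.`` wordings are guesses, and the real ``system_write_status`` writes to the ``IO`` interface instead of building a ``std::string``.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

enum SYS_STATUS : uint8_t {
    STATUS_OK = 0,
    STATUS_STACK_OVERFLOW = 1,
    STATUS_STACK_UNDERFLOW = 2,
    STATUS_EXECUTION_FAILURE = 3,
    STATUS_UNKNOWN_WORD = 4
};

// Hypothetical helper: build the text system_write_status would print,
// e.g. "stack underflow (error code 2)."
std::string
status_message(SYS_STATUS status)
{
    const char *what = "unknown status";   // fallback for out-of-range values
    switch (status) {
    case STATUS_OK:
        return "ok.";
    case STATUS_STACK_OVERFLOW:
        what = "stack overflow";
        break;
    case STATUS_STACK_UNDERFLOW:
        what = "stack underflow";
        break;
    case STATUS_EXECUTION_FAILURE:
        what = "execution failure";
        break;
    case STATUS_UNKNOWN_WORD:
        what = "unknown word";
        break;
    }
    return std::string(what) + " (error code " +
           std::to_string(static_cast<int>(status)) + ").";
}
```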
I/O updates
^^^^^^^^^^^
The next thing I did was move the ``write_dstack`` function into ``io.cc``;
this is needed to implement ``.S``. While I was at it, I decided to make
``write_num`` finally work well and correctly, and I think I've got the final
version done::
void
write_num(IO *interface, KF_INT n)
{
char buf[nbuflen];
uint8_t i = nbuflen - 1;
memset(buf, 0, nbuflen);
if (n < 0) {
interface->wrch('-');
}
I'm still not proud of this hack, but it seems to be the best way to deal with
this right now::
else if (n == 0) {
interface->wrch('0');
return;
}
while (n != 0) {
char x = n % 10;
This was the magic that finally got it right: negating the digits as they're
going into the buffer. No more trying to invert the whole number, just each
digit::
x = x < 0 ? -x : x;
x += '0';
buf[i--] = x;
n /= 10;
}
// i now sits one slot before the first digit, so skip past it.
interface->wrbuf(buf + i + 1, nbuflen - i - 1);
}
My first pass at this wrote the string forwards, then reversed it. I didn't
like that; while performance isn't my first concern, it just seemed like a
fun challenge to get the reversed buffer written correctly.
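The per-digit negation also makes the number → buffer variant mentioned back in part 0x04 straightforward. A minimal sketch, assuming a 32-bit ``KF_INT``; ``format_num`` is a hypothetical name, not something in the repository:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical number -> buffer counterpart to write_num.  Digits come out
// right to left from n % 10, and each digit is made positive on its own, so
// n itself is never negated.  out needs room for 12 chars: sign, 10 digits,
// and a terminating NUL.  Returns the length written, excluding the NUL.
std::size_t
format_num(int32_t n, char *out)
{
    char digits[10];                 // an int32_t has at most 10 digits
    std::size_t i = sizeof(digits);  // fill from the end backwards
    std::size_t len = 0;

    if (n < 0) {
        out[len++] = '-';
    } else if (n == 0) {
        digits[--i] = '0';
    }
    while (n != 0) {
        int digit = n % 10;          // in [-9, 0] when n is negative
        if (digit < 0) {
            digit = -digit;
        }
        digits[--i] = static_cast<char>('0' + digit);
        n /= 10;                     // truncates toward zero
    }
    std::memcpy(out + len, digits + i, sizeof(digits) - i);
    len += sizeof(digits) - i;
    out[len] = '\0';
    return len;
}
```

Because nothing ever negates the whole number, ``INT32_MIN`` comes out as ``-2147483648`` with no special casing.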
While I was in the I/O subsystem, I also decided to update the ``IO`` base
class to include a ``newline`` method; I had a few instances of
``interface->wrch('\n')`` sprinkled throughout, but that won't necessarily be
correct elsewhere.
Miscellaneous updates
^^^^^^^^^^^^^^^^^^^^^^
I added a new definition to the ``defs.h`` files: a ``KF_LONG`` type to prepare
for the double numbers mentioned in the next section, and switched to static
compilation.
New words!
^^^^^^^^^^
Finally, I started adding the new words in. I'm still trying to figure out a
good way to handle the address types (I think I'll just introduce a ``KF_ADDR``
type) so I've punted on those for now.
.. _pforth: http://www.softsynth.com/pforth/
.. _gforth: https://www.gnu.org/software/gforth/
One of the interesting challenges is dealing with the double numbers. These are
on the stack as a pair of smaller numbers, e.g. if the double number type is 64
bits and the standard number type is 32 bits, then you might see something like
this (via pforth_)::
0 1 0 1 D+
ok
Stack<10> 0 2
So, how to deal with this? There's a ``D.`` word, which I don't have
implemented yet, that will let me see what pforth_ and gforth_ do::
$ pforth -q
Begin AUTO.INIT ------
0 1 D. 1 0 D.
4294967296 1
^C
$ gforth
Gforth 0.7.2, Copyright (C) 1995-2008 Free Software Foundation, Inc.
Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `bye' to exit
0 1 D. 1 0 D. 18446744073709551616 1 ok
So, it looks like the number pushed first (deeper in the stack) is the low part,
and the one on top is the high part. This is, once again, pretty straightforward:
I'll need to pop the high part, shift it by the appropriate number of bits, and
then add the low part to it.
::
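// dshift is the width of a KF_INT in bits; a double number is split into
// a high cell and a low cell of this size.
constexpr size_t dshift = sizeof(KF_INT) * 8;
static bool
pop_long(System *sys, KF_LONG *d)
{
KF_INT a = 0;
KF_INT b = 0;
if (!sys->dstack.pop(&a)) {
sys->status = STATUS_STACK_UNDERFLOW;
return false;
}
if (!sys->dstack.pop(&b)) {
sys->status = STATUS_STACK_UNDERFLOW;
return false;
}
// a (the top cell) is the high part and b is the low part. The low part
// goes through uint32_t (this assumes a 32-bit KF_INT, as on Linux) so
// that a set top bit isn't sign-extended into the high part.
*d = static_cast<KF_LONG>(a) << dshift;
*d += static_cast<KF_LONG>(static_cast<uint32_t>(b));
sys->status = STATUS_OK;
return true;
}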
This function also shows off the new status work and how that turns out. I've
kept the exec interface as a boolean to indicate success or failure.
To push the results back onto the stack, I needed to first write a masking
function to make sure to clear out any lingering bits::
static inline KF_INT
mask(size_t bits)
{
KF_INT m = 0;
for (size_t i = 0; i < bits; i++) {
m += 1 << i;
}
return m;
}
I should probably check `Hacker's Delight <http://hackersdelight.org/>`_ to see
if there are any tricks for this.
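One standard trick, assuming the mask never needs more than 64 bits, is to compute ``(1 << bits) - 1`` in an unsigned 64-bit type, special-casing ``bits == 64`` because shifting a value by its full width is undefined. ``mask64`` is a hypothetical name, and unlike ``mask`` it returns ``uint64_t`` rather than ``KF_INT``:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Closed-form version of mask(): (1 << bits) - 1 has exactly the low
// `bits` bits set.  Working in uint64_t avoids signed overflow, and
// bits >= 64 is handled separately because a 64-bit shift by 64 is undefined.
constexpr uint64_t
mask64(std::size_t bits)
{
    return bits >= 64 ? ~static_cast<uint64_t>(0)
                      : (static_cast<uint64_t>(1) << bits) - 1;
}

static_assert(mask64(0) == 0, "no bits set");
static_assert(mask64(31) == 0x7fffffffu, "low 31 bits");
static_assert(mask64(32) == 0xffffffffu, "low 32 bits");
```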
With the mask available, getting a long into a pair of ints requires shifting
and clearing for the high part and clearing for the low part::
static bool
push_long(System *sys, KF_LONG d)
{
KF_INT a = static_cast<KF_INT>((d >> dshift) & mask(dshift));
KF_INT b = static_cast<KF_INT>(d & mask(dshift));
if (!sys->dstack.push(b)) {
sys->status = STATUS_STACK_OVERFLOW;
return false;
}
if (!sys->dstack.push(a)) {
sys->status = STATUS_STACK_OVERFLOW;
return false;
}
sys->status = STATUS_OK;
return true;
}
One of the words that interacts with doubles is ``D+``::
static bool
dplus(System *sys)
{
KF_LONG da, db;
if (!pop_long(sys, &da)) {
// Status is already set.
return false;
}
if (!pop_long(sys, &db)) {
// Status is already set.
return false;
}
da += db;
if (!push_long(sys, da)) {
// Status is already set.
return false;
}
// Status is already set.
return true;
}
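For comparison, here's a sketch of the cell-pair conversion that reproduces the pforth results above, assuming 32-bit cells and 64-bit doubles; ``join_cells`` and ``split_cells`` are hypothetical names. Shifting by the full cell width, 32 bits, is what makes ``0 1`` come out as 4294967296; a shift of 31 would give 2147483648.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: the deeper cell is the low half and the top cell is
// the high half.  The bit fiddling is done on unsigned types so nothing gets
// sign-extended; converting back to signed types assumes two's complement
// (guaranteed as of C++20).
int64_t
join_cells(int32_t low, int32_t high)
{
    uint64_t bits = (static_cast<uint64_t>(static_cast<uint32_t>(high)) << 32) |
                    static_cast<uint64_t>(static_cast<uint32_t>(low));
    return static_cast<int64_t>(bits);
}

void
split_cells(int64_t d, int32_t *low, int32_t *high)
{
    uint64_t bits = static_cast<uint64_t>(d);
    *low = static_cast<int32_t>(static_cast<uint32_t>(bits));        // bottom 32 bits
    *high = static_cast<int32_t>(static_cast<uint32_t>(bits >> 32)); // top 32 bits
}
```

With these, ``D+`` is two ``join_cells`` calls, an ordinary 64-bit addition, and a ``split_cells``; ``0 1 0 1 D+`` leaves ``0 2``, matching pforth's ``Stack<10> 0 2``.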
The only other thing I really did was to add a ``remove`` method to the Stack
class to support ``ROLL``.
Huge diff, but not as much to say about it --- next up, I think I'm going to
introduce the ``KF_ADDR`` type and start working on some of the address
interaction stuff. I'll also add more of the double number words, too. The
words I still have to implement from the `FORTH-83 standard`_ nucleus layer
are:
+ ``!``, ``+!``, ``@``, ``C!``, ``C@``, ``CMOVE``, ``CMOVE>``, ``COUNT``,
``FILL``: memory manipulation words
+ ``DNEGATE``, ``MAX``, ``MIN``, ``MOD``, ``XOR``: more arithmetic words
+ ``EXECUTE``, ``EXIT``, ``I``, ``J``, ``PICK``: various words
+ ``>R``, ``R>``, ``R@``: return stack words
+ ``U<``, ``UM*``, ``UM/MOD``: unsigned math words
.. _FORTH-83 standard: http://forth.sourceforge.net/standard/fst83/fst83-12.htm
As before, the snapshot for this update is tagged `part-0x06
<https://github.com/kisom/kforth/tree/part-0x06>`_.

Write You a Forth, 0x07
-----------------------
:date: 2018-03-01 19:31
:tags: wyaf, forth
At this point, I've finished most of the nucleus layer. All that's left to
implement are ``EXIT``, ``I``, and ``J`` --- the first requires better
execution support, which I'll talk about at the end. The other two, I'm not so
sure about yet.
However, I made some large changes, so let's dive in. Here's the new Linux
definitions file::
#ifndef __KF_LINUX_DEFS_H__
#define __KF_LINUX_DEFS_H__
#include <stddef.h>
#include <stdint.h>
typedef int32_t KF_INT;
typedef uint32_t KF_UINT;
typedef int64_t KF_LONG;
typedef uintptr_t KF_ADDR;
constexpr uint8_t STACK_SIZE = 128;
constexpr size_t ARENA_SIZE = 65535;
#endif
I've also updated the main ``defs.h`` file to move some constants there::
#ifndef __KF_DEFS_H__
#define __KF_DEFS_H__
#ifdef __linux__
#include "linux/defs.h"
#else
typedef int KF_INT;
typedef long KF_LONG;
constexpr uint8_t STACK_SIZE = 16;
#endif
constexpr size_t MAX_TOKEN_LENGTH = 16;
constexpr size_t dshift = sizeof(KF_INT) * 8; // width of a KF_INT in bits
static inline KF_INT
mask(size_t bits)
{
KF_INT m = 0;
for (size_t i = 0; i < bits; i++) {
m += 1 << i;
}
return m;
}
#endif // __KF_DEFS_H__
Addresses
^^^^^^^^^
The first major change is the addition of the ``KF_ADDR`` type. This is needed
to implement the memory manipulation words. I've added some additional utility
functions for pushing and popping addresses from the data stack; they're stored
as double numbers::
static bool
pop_addr(System *sys, KF_ADDR *a)
{
KF_LONG b;
if (!pop_long(sys, &b)) {
// Status is already set.
return false;
}
*a = static_cast<KF_ADDR>(b);
sys->status = STATUS_OK;
return true;
}
static bool
push_addr(System *sys, KF_ADDR a)
{
KF_LONG b = static_cast<KF_LONG>(a);
if (!push_long(sys, b)) {
// Status is already set.
return false;
}
sys->status = STATUS_OK;
return true;
}
Now I can actually implement ``!`` and so forth::
static bool
store(System *sys)
{
KF_ADDR a = 0; // address
KF_INT b = 0; // value
KF_LONG c = 0; // temporary
if (!pop_long(sys, &c)) {
sys->status = STATUS_STACK_UNDERFLOW;
return false;
}
a = static_cast<KF_ADDR>(c);
if (!sys->dstack.pop(&b)) {
sys->status = STATUS_STACK_UNDERFLOW;
return false;
}
*((KF_INT *)a) = b;
sys->status = STATUS_OK;
return true;
}
There's definitely a sense of finagling here.
The return stack
^^^^^^^^^^^^^^^^
The ``>R`` series of words requires a return stack, so I've added a
``Stack<KF_ADDR>`` field to the ``System`` structure. The address stack
manipulation functions I introduced earlier only operate on the data stack, so
these require some extra verbosity; for example::
static bool
to_r(System *sys)
{
KF_INT a;
if (!sys->dstack.pop(&a)) {
sys->status = STATUS_STACK_UNDERFLOW;
return false;
}
if (!sys->rstack.push(static_cast<KF_ADDR>(a))) {
sys->status = STATUS_RSTACK_OVERFLOW;
return false;
}
sys->status = STATUS_OK;
return true;
}
Adding the ``rstack`` field also required adding return stack over- and
underflow status codes.
The arena
^^^^^^^^^
As I was reading through the words left to implement, I found I'd have to
implement ``COUNT``. This provides some support for counted strings, which
are implemented as a byte array where the first byte is the length of the
string. In my mind, this has two implications:
1. There needs to be some area of user memory that's available for storing
strings and the like. I've termed this the arena, and it's a field in the
``System`` structure now.
2. There needs to be a Word type for addresses.
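``COUNT`` itself is small. A sketch against a plain byte array, with ``forth_count`` and ``Counted`` as hypothetical names; in kforth the word would push the address and length onto the data stack rather than return a struct:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// A counted string: byte 0 holds the length and the characters follow,
// so a counted string can hold at most 255 characters.
struct Counted {
    const uint8_t *chars;   // address of the first character (addr + 1)
    std::size_t length;     // the value stored in the length byte
};

// FORTH-83 COUNT ( addr1 -- addr2 +n ): addr2 is addr1 + 1 and +n is the
// length byte stored at addr1.
Counted
forth_count(const uint8_t *addr)
{
    return Counted{addr + 1, static_cast<std::size_t>(addr[0])};
}
```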
So now I have this definition for the ``System`` structure::
typedef struct _System {
Stack<KF_INT> dstack;
Stack<KF_ADDR> rstack;
IO *interface;
Word *dict;
SYS_STATUS status;
uint8_t arena[ARENA_SIZE];
} System;
The ``Address`` type seems like it's easy enough to implement::
class Address : public Word {
public:
~Address() {};
Address(const char *name, size_t namelen, Word *head, KF_ADDR addr);
bool eval(System *);
Word *next(void);
bool match(struct Token *);
void getname(char *, size_t *);
private:
char name[MAX_TOKEN_LENGTH];
size_t namelen;
Word *prev;
KF_ADDR addr;
};
And the implementation::
Address::Address(const char *name, size_t namelen, Word *head, KF_ADDR addr)
: prev(head), addr(addr)
{
memcpy(this->name, name, namelen);
this->namelen = namelen;
}
bool
Address::eval(System *sys)
{
KF_INT a;
a = static_cast<KF_INT>(this->addr & mask(dshift));
if (!sys->dstack.push(a)) {
return false;
}
a = static_cast<KF_INT>((this->addr >> dshift) & mask(dshift));
if (!sys->dstack.push(a)) {
return false;
}
return true;
}
Word *
Address::next(void)
{
return this->prev;
}
bool
Address::match(struct Token *token)
{
return match_token(this->name, this->namelen, token->token, token->length);
}
void
Address::getname(char *buf, size_t *buflen)
{
memcpy(buf, this->name, this->namelen);
*buflen = namelen;
}
It's kind of cool to see this at work::
$ ./kforth
kforth interpreter
? arena drop 2+ 0 @ .
0
ok.
? arena drop 2+ 0 4 rot rot ! .
stack underflow (error code 2).
? arena drop 2+ 0 @ .
4
ok.
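The two cells that ``Address::eval`` pushes can be modeled in a few lines of C. This is a hedged sketch, not the kforth source: ``split_addr``, ``join_addr`` and ``low_mask`` are hypothetical names, and it assumes a 32-bit ``KF_INT`` so that ``dshift`` is 31. Each cell carries 31 bits, so the pair covers 62 bits of address, which should be enough for typical user-space pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t KF_INT;     /* as in linux/defs.h */
typedef uintptr_t KF_ADDR;

/* Same value as dshift in defs.h: 31 for a 32-bit KF_INT. */
static const size_t dshift = (sizeof(KF_INT) * 8) - 1;

/* Equivalent to mask(dshift): the low 31 bits set. */
static KF_ADDR
low_mask(void)
{
	return ((KF_ADDR)1 << dshift) - 1;
}

/* split_addr produces the two cells Address::eval pushes: low bits first, then high. */
static void
split_addr(KF_ADDR addr, KF_INT *lo, KF_INT *hi)
{
	assert(lo != NULL && hi != NULL);
	*lo = (KF_INT)(addr & low_mask());
	*hi = (KF_INT)((addr >> dshift) & low_mask());
}

/* join_addr is the inverse: rebuild the address from the two cells. */
static KF_ADDR
join_addr(KF_INT lo, KF_INT hi)
{
	return ((KF_ADDR)(uint32_t)hi << dshift) | (KF_ADDR)(uint32_t)lo;
}
```

For the ``+`` entry in the ``gdb`` session below, ``split_addr`` gives exactly the ``8275376 0`` pair; ``join_addr`` is what ``@``, ``!`` or ``execute`` would need to turn the cells back into an address.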
Unsigned numbers
^^^^^^^^^^^^^^^^
This is really just a bunch of casting::
static bool
u_dot(System *sys)
{
KF_INT a;
KF_UINT b;
if (!sys->dstack.pop(&a)) {
sys->status = STATUS_STACK_UNDERFLOW;
return false;
}
b = static_cast<KF_UINT>(a);
write_unum(sys->interface, b);
sys->interface->newline();
sys->status = STATUS_OK;
return true;
}
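The cast really is all ``U.`` needs: converting a negative signed value to an unsigned type is well defined in C and wraps modulo 2^32 for a 32-bit ``KF_UINT``. A small C sketch, where ``as_unsigned`` and ``format_unum`` are hypothetical helpers and ``snprintf`` stands in for ``write_unum``:

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef int32_t KF_INT;    /* as in linux/defs.h */
typedef uint32_t KF_UINT;

/*
 * Signed-to-unsigned conversion is defined as wrapping modulo 2^32 here,
 * so -1 becomes 4294967295.
 */
static KF_UINT
as_unsigned(KF_INT a)
{
	return (KF_UINT)a;
}

/* format_unum renders what U. would print for a; returns the length like snprintf. */
static int
format_unum(char *buf, size_t buflen, KF_INT a)
{
	assert(buf != NULL);
	return snprintf(buf, buflen, "%" PRIu32, as_unsigned(a));
}
```

With these type sizes, ``-2 U.`` should print 4294967294.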
Execute
^^^^^^^
Implementing ``execute`` was fun, but it begins to highlight the limits of my
approach so far.
EXECUTE addr -- 79
The word definition indicated by addr is executed. An error
condition exists if addr is not a compilation address.
For example::
(gdb) break 83
Breakpoint 1 at 0x4077cf: file kforth.cc, line 83.
(gdb) run
Starting program: /home/kyle/code/kforth/kforth
Breakpoint 1, main () at kforth.cc:83
83 Console interface;
(gdb) p sys.dict->next()->next()->next()->next()
$1 = (Word *) 0x7e45b0
(gdb) p (Builtin) *sys.dict->next()->next()->next()->next()
$2 = {<Word> = {_vptr$Word = 0x55f220 <vtable for Builtin+16>}, name = "+", '\000' <repeats 14 times>, namelen = 1, prev = 0x7e4570,
fun = 0x406eb0 <add(_System*)>}
(gdb) p/u 0x7e45b0
$3 = 8275376
(gdb) c
Continuing.
kforth interpreter
? 2 3 8275376 0 execute .
executing word: +
5
ok.
In case the ``gdb`` example wasn't clear: I printed the address of the dictionary
entry four ``next()`` hops from the head, which happens to be ``+``. I push the
numbers 2 and 3 onto the stack, then push the address of ``+`` as two cells (the
low half, 8275376, then the high half, 0), then call ``execute``. As ``.`` shows,
it executes correctly and leaves the result 5 on the stack. That leads me to the
next section, where I need to rethink the execution model.
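Here's a rough C sketch of what ``EXECUTE`` does in that session, under my own assumptions: instead of a raw ``Word *``, a word's address is an index into a hypothetical array dictionary (``dict``), which also makes the standard's "not a compilation address" check cheap. The address comes off the stack as two cells with the high half on top, matching the order ``Address::eval`` pushes them. ``push``, ``pop``, ``add`` and ``execute`` here are illustrative, not kforth's functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t KF_INT;

#define STACK_CAP 16

static KF_INT stack[STACK_CAP];
static size_t depth = 0;

static bool
push(KF_INT v)
{
	if (depth == STACK_CAP) {
		return false;
	}
	stack[depth++] = v;
	return true;
}

static bool
pop(KF_INT *v)
{
	assert(v != NULL);
	if (depth == 0) {
		return false;
	}
	*v = stack[--depth];
	return true;
}

/* + ( a b -- a+b ) */
static bool
add(void)
{
	KF_INT a, b;

	if (!pop(&b) || !pop(&a)) {
		return false;
	}
	return push(a + b);
}

/* Hypothetical array dictionary: a word's "address" is its index. */
static bool (*const dict[])(void) = { add };
static const size_t dict_len = sizeof dict / sizeof dict[0];

/* execute mirrors EXECUTE ( addr -- ), with addr split into low and high cells. */
static bool
execute(void)
{
	KF_INT lo, hi;
	uint64_t addr;

	if (!pop(&hi) || !pop(&lo)) {
		return false;
	}
	addr = ((uint64_t)(uint32_t)hi << 31) | (uint32_t)lo;
	if (addr >= dict_len) {
		return false;   /* not a compilation address */
	}
	return dict[addr]();
}
```

In this model ``2 3 0 0 execute .`` would print 5, like the session above, while an index past the end of ``dict`` fails instead of jumping somewhere random.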
The execution model
^^^^^^^^^^^^^^^^^^^
In most of the Forth implementations I've seen, the dictionary is an array of
pointers to words, stored contiguously. That is, something like::
Word *dict[ARRAY_SIZE] = { 0 };
dict[0] = new Builtin((const char *)"+", 1, add);
dict[1] = new Builtin((const char *)"-", 1, sub);
And so forth. Or, maybe,
::
Word dict[ARRAY_SIZE] = {
Builtin((const char *)"+", 1, add),
Builtin((const char *)"-", 1, sub)
};
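A compilable version of the array idea, sketched in C under some assumptions of my own (``DICT_CAP``, ``NAME_MAX_LEN``, ``define`` and ``find`` are hypothetical names): each slot holds a name and a function pointer, the capacity is a compile-time constant, and lookup scans newest-first so a redefinition shadows the older word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DICT_CAP 64        /* hypothetical fixed capacity */
#define NAME_MAX_LEN 16    /* mirrors MAX_TOKEN_LENGTH */

typedef bool (*word_fn)(void);

struct entry {
	char name[NAME_MAX_LEN];
	size_t namelen;
	word_fn fn;
};

static struct entry dict[DICT_CAP];
static size_t dict_len = 0;

/* define appends a word; it fails when the array is full or the name is too long. */
static bool
define(const char *name, size_t namelen, word_fn fn)
{
	assert(name != NULL);
	if (dict_len == DICT_CAP || namelen > NAME_MAX_LEN) {
		return false;
	}
	memcpy(dict[dict_len].name, name, namelen);
	dict[dict_len].namelen = namelen;
	dict[dict_len].fn = fn;
	dict_len++;
	return true;
}

/* find scans newest-first and returns the slot index, or -1 if not found. */
static long
find(const char *name, size_t namelen)
{
	for (size_t i = dict_len; i > 0; i--) {
		const struct entry *e = &dict[i - 1];

		if (e->namelen == namelen && memcmp(e->name, name, namelen) == 0) {
			return (long)(i - 1);
		}
	}
	return -1;
}
```

Because every slot has the same type, the "different word types" question turns into what goes in ``fn`` (or an added tag field), rather than into a class hierarchy.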
So some questions:
+ How big should this array be?
+ How do I handle different word types?
+ How do I transfer execution to functions?
I'm thinking something like:
+ the parser looks up a word, and pushes the parser function's address onto the
return stack.
+ the parser jumps to the word's function pointer and executes it.
+ the function pointer jumps back to the last address on the return stack.
The second step could involve chaining multiple functions. I don't yet know
how to transfer execution to an arbitrary address in memory (maybe ``setjmp``
and ``longjmp``), or how I'm going to push the current word's address onto the
stack. I guess I'll add some extra fields to the system type.
This starts to move into the realm of an operating system or virtual machine;
the OS approach makes more sense for an embedded system.
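One way to sidestep ``setjmp``/``longjmp`` entirely is to never jump at all: keep an instruction pointer in a loop, push the caller's position on the return stack when entering a nested definition, and pop it when that definition ends. Below is a minimal token-threaded sketch in C of that idea; ``run``, ``OP_EXIT``, and the ``DOUBLE``/``QUAD`` definitions are hypothetical, and words are referred to by their index in a table rather than by address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t KF_INT;

#define STACK_CAP 32
#define OP_EXIT (-1)   /* hypothetical marker that ends a definition */

static KF_INT dstack[STACK_CAP];
static size_t dlen = 0;

/* A return stack frame records where to resume: which definition, and where in it. */
struct frame {
	const int *def;
	size_t ip;
};
static struct frame rstack[STACK_CAP];
static size_t rlen = 0;

/* DUP ( a -- a a ) */
static bool
dup_(void)
{
	if (dlen == 0 || dlen == STACK_CAP) {
		return false;
	}
	dstack[dlen] = dstack[dlen - 1];
	dlen++;
	return true;
}

/* + ( a b -- a+b ) */
static bool
add_(void)
{
	if (dlen < 2) {
		return false;
	}
	dlen--;
	dstack[dlen - 1] += dstack[dlen];
	return true;
}

/* A word is either native (fn set) or a list of word indices (body set). */
struct word {
	bool (*fn)(void);
	const int *body;
};

static const int double_body[] = { 0, 1, OP_EXIT };   /* : DOUBLE DUP + ; */
static const int quad_body[] = { 2, 2, OP_EXIT };     /* : QUAD DOUBLE DOUBLE ; */

static const struct word words[] = {
	{ dup_, NULL }, { add_, NULL }, { NULL, double_body }, { NULL, quad_body },
};

static bool
run(int w)
{
	const int *def;
	size_t ip = 0;

	assert(w >= 0 && (size_t)w < sizeof words / sizeof words[0]);
	if (words[w].fn != NULL) {
		return words[w].fn();
	}
	def = words[w].body;
	for (;;) {
		int next = def[ip++];

		if (next == OP_EXIT) {
			if (rlen == 0) {
				return true;            /* finished the outermost word */
			}
			rlen--;                     /* resume the caller */
			def = rstack[rlen].def;
			ip = rstack[rlen].ip;
		} else if (words[next].fn != NULL) {
			if (!words[next].fn()) {
				rlen = 0;
				return false;
			}
		} else {
			if (rlen == STACK_CAP) {
				rlen = 0;
				return false;           /* return stack overflow */
			}
			rstack[rlen].def = def;     /* remember where to come back to */
			rstack[rlen].ip = ip;
			rlen++;
			def = words[next].body;
			ip = 0;
		}
	}
}
```

Running ``QUAD`` on 3 nests ``DOUBLE`` twice and leaves 12, and the return stack is empty again afterwards. The "jump back" is just restoring ``def`` and ``ip`` from the return stack.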
The parser is also going to need some updating to handle strings.
As before, the code for this update is tagged in `part-0x07 <https://github.com/kisom/kforth/tree/part-0x07>`_.

@@ -0,0 +1,441 @@
Write You a Forth, 0x08
-----------------------
:date: 2018-03-05 21:42
:tags: wyaf, forth
After reading some more in Threaded Interpretive Languages (TIL_ from now on),
I've decided to start over.
.. _TIL: http://wiki.c2.com/?ThreadedInterpretiveLanguage
Some design choices that didn't really work out:
+ the system structure
+ not making it easier to test building for different platforms
+ my linked list approach to the dictionary
+ my class-based approach to words
I get the distinct feeling that I could (maybe should) be doing this in C99, so
I think I'll switch to that.
The new design
^^^^^^^^^^^^^^
I'll need to provide a few initial pieces:
1. eval.c
2. stack.c
3. the platform parts
I'll skip the parser at first and hand hack some things, then try to
port over my I/O layer from before.
Also, talking to Steve got me thinking about doing this in C99, because
a lot of the fun I've had with computers in the past involved hacking
on C projects. So, C99 it is.
Platforms
^^^^^^^^^
I've added a new kind of define, ``PLATFORM_$PLATFORM``. The Makefile
sets it, so it's now easier to test building for different platforms.
Here are the current top-level definitions::
#ifndef __KF_DEFS_H__
#define __KF_DEFS_H__
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#ifdef PLATFORM_pc
#include "pc/defs.h"
#else
#include "default/defs.h"
#endif
#endif /* __KF_DEFS_H__ */
The ``pc/defs.h`` header::
#ifndef __KF_PC_DEFS_H__
#define __KF_PC_DEFS_H__
typedef int32_t KF_INT;
typedef uintptr_t KF_ADDR;
static const size_t DSTACK_SIZE = 65535;
static const size_t RSTACK_SIZE = 65535;
static const size_t DICT_SIZE = 65535;
#endif /* __KF_PC_DEFS_H__ */
The new stack
^^^^^^^^^^^^^
I'll start with a much simplified stack interface::
#ifndef __KF_STACK_H__
#define __KF_STACK_H__
/* data stack interaction */
bool dstack_pop(KF_INT *);
bool dstack_push(KF_INT);
bool dstack_get(size_t, KF_INT *);
size_t dstack_size(void);
void dstack_clear(void);
/* return stack interaction */
bool rstack_pop(KF_ADDR *);
bool rstack_push(KF_ADDR);
bool rstack_get(size_t, KF_ADDR *);
size_t rstack_size(void);
void rstack_clear(void);
#endif /* __KF_STACK_H__ */
The implementation is straightforward, and the ``rstack`` functions mirror the
``dstack`` ones closely enough that I'll only show the ``dstack`` half::
#include "defs.h"
#include "stack.h"
static KF_INT dstack[DSTACK_SIZE] = {0};
static size_t dstack_len = 0;
bool
dstack_pop(KF_INT *a)
{
if (dstack_len == 0) {
return false;
}
*a = dstack[--dstack_len];
return true;
}
bool
dstack_push(KF_INT a)
{
if (dstack_len == DSTACK_SIZE) {
return false;
}
dstack[dstack_len++] = a;
return true;
}
bool
dstack_get(size_t i, KF_INT *a)
{
if (i >= dstack_len) {
return false;
}
*a = dstack[dstack_len - i - 1];
return true;
}
size_t
dstack_size()
{
return dstack_len;
}
void
dstack_clear()
{
dstack_len = 0;
}
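Builtins can then be written purely against this interface. Here's a hedged sketch of ``SWAP`` and ``OVER`` in C; to keep the block self-contained it carries a trimmed copy of the ``dstack`` functions above with a hypothetical 16-entry ``DSTACK_SIZE``. Note that ``dstack_get(0, ...)`` is the top of the stack, so ``OVER`` reads index 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t KF_INT;
#define DSTACK_SIZE 16   /* much smaller than pc/defs.h, just for the sketch */

static KF_INT dstack[DSTACK_SIZE];
static size_t dstack_len = 0;

static bool
dstack_pop(KF_INT *a)
{
	assert(a != NULL);
	if (dstack_len == 0) {
		return false;
	}
	*a = dstack[--dstack_len];
	return true;
}

static bool
dstack_push(KF_INT a)
{
	if (dstack_len == DSTACK_SIZE) {
		return false;
	}
	dstack[dstack_len++] = a;
	return true;
}

static bool
dstack_get(size_t i, KF_INT *a)
{
	assert(a != NULL);
	if (i >= dstack_len) {
		return false;
	}
	*a = dstack[dstack_len - i - 1];
	return true;
}

/* SWAP ( a b -- b a ). Checking the depth first means a failed SWAP leaves the stack alone. */
static bool
swap(void)
{
	KF_INT a, b;

	if (dstack_len < 2) {
		return false;
	}
	dstack_pop(&b);
	dstack_pop(&a);
	dstack_push(b);
	dstack_push(a);
	return true;
}

/* OVER ( a b -- a b a ): index 1 is the item just below the top. */
static bool
over(void)
{
	KF_INT a;

	if (!dstack_get(1, &a)) {
		return false;
	}
	return dstack_push(a);
}
```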
Words
^^^^^
Reading TIL has given me some new ideas on how to implement words::
#ifndef __KF_WORD_H__
#define __KF_WORD_H__
/*
* Every word in the dictionary starts with a header:
* uint8_t length;
* uint8_t flags;
* char *name;
* uintptr_t next;
*
* The body looks like the following:
* uintptr_t codeword;
* uintptr_t body[];
*
* The codeword is the interpreter for the body. This is defined in
* eval.c. Note that a native (or builtin function) has only a single
* body element.
*
* The body of a native word points to a function that's compiled in already.
*/
/*
* store_native writes a new dictionary entry for a native-compiled
* function.
*/
void store_native(uint8_t *, const char *, const uint8_t, void(*)(void));
/*
* match_word returns true if the current dictionary entry matches the
* token being searched for.
*/
bool match_word(uint8_t *, const char *, const uint8_t);
/*
* word_link returns the offset to the next word.
*/
size_t word_link(uint8_t *);
size_t word_body(uint8_t *);
#endif /* __KF_WORD_H__ */
The codeword is the big change here. I've put a native executor and
a codeword executor in the ``eval`` files::
#ifndef __KF_EVAL_H__
#define __KF_EVAL_H__
#include "defs.h"
/*
* cwexec is the codeword executor. It assumes that the uintptr_t
* passed into it points to the correct executor (e.g. nexec),
* which is called with the next address.
*/
void cwexec(uintptr_t);
/*
* nexec is the native executor.
*
* It should take a uintptr_t containing the address of a code block
* and will execute the function starting there. The function should
* have the signature void(*target)(void) - a function returning
* nothing and taking no arguments.
*/
void nexec(uintptr_t);
static const uintptr_t nexec_p = (uintptr_t)&nexec;
#endif /* __KF_EVAL_H__ */
The implementations of these are short::
#include "defs.h"
#include "eval.h"
#include <string.h>
``nexec`` just casts its target to a void function and calls it.
::
void
nexec(uintptr_t target)
{
((void(*)(void))target)();
}
``cwexec`` is the magic part: it reads a pair of addresses; the first
is the executor, and the next is the start of the code body. In the
case of native execution, this is a pointer to a function.
::
void
cwexec(uintptr_t entry)
{
uintptr_t target = 0;
uintptr_t codeword = 0;
memcpy(&codeword, (void *)entry, sizeof(uintptr_t));
memcpy(&target, (void *)(entry + sizeof(uintptr_t)), sizeof(uintptr_t));
((void(*)(uintptr_t))codeword)(target);
}
So I wrote a quick test program to check these out::
#include "defs.h"
#include "eval.h"
#include <stdio.h>
#include <string.h>
static void
hello(void)
{
printf("hello, world\n");
}
int
main(void)
{
uintptr_t target = (uintptr_t)hello;
nexec(target); /* nexec takes the function's address as a uintptr_t */
uint8_t arena[32] = { 0 };
uintptr_t arena_p = (uintptr_t)arena;
memcpy(arena, (void *)&nexec_p, sizeof(nexec_p));
memcpy(arena + sizeof(nexec_p), (void *)&target, sizeof(target));
cwexec(arena_p);
}
But does it work?
::
$ gcc -o eval_test eval_test.c eval.o
$ ./eval_test
hello, world
hello, world
What magic is this?
Now I need to write a couple of functions to make this easier::
#include "defs.h"
#include "eval.h"
#include "word.h"
#include <string.h>
static uint8_t dict[DICT_SIZE] = {0};
static size_t last = 0;
The first two functions operate on the internal dictionary and are meant
for maintaining it. The first adds a new word to the dictionary, and the
second attempts to look up a word by name and execute it::
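void
append_native_word(const char *name, const uint8_t len, void(*target)(void))
{
store_native(dict+last, name, len, target);
/* advance past the new entry so the next word doesn't overwrite it */
last += word_link(dict+last);
}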
bool
execute(const char *name, const uint8_t len)
{
size_t offset = 0;
size_t link = 0;
size_t body = 0;
while (true) {
if (!match_word(dict+offset, name, len)) {
/* word_link gives the size of this entry, so step past it */
if ((link = word_link(dict+offset)) == 0) {
return false;
}
offset += link;
continue;
}
body = word_body(dict+offset);
cwexec((uintptr_t)(dict + offset + body));
return true;
}
}
Actually, now that I think about it, maybe I should also add a function
that returns a uintptr_t to the word. Should this point to the header or
to the body? My first instinct is to point to the header and have the caller
(me) use ``word_body`` to get the actual body. That said, the caller
already has the useful information from the header (namely, the name and
length it searched with); the link is only useful for the search phase.
Following this logic, ``lookup`` will return a pointer to the body. So say we all::
bool
lookup(const char *name, const uint8_t len, uintptr_t *ptr)
{
size_t offset = 0;
size_t link = 0;
size_t body = 0;
while (true) {
if (!match_word(dict+offset, name, len)) {
/* word_link gives the size of this entry, so step past it */
if ((link = word_link(dict+offset)) == 0) {
return false;
}
offset += link;
continue;
}
body = word_body(dict+offset);
*ptr = (uintptr_t)(dict + offset + body);
return true;
}
}
The rest of the functions in the header (all of which are publicly
visible) are made available for use later. Maybe (but let's be honest,
probably not) I'll go back later and make these functions private.
The first such function stores a native (built-in) word. This is what
``append_native_word`` is built around::
void
store_native(uint8_t *entry, const char *name, const uint8_t len, void(*target)(void))
{
uintptr_t target_p = (uintptr_t)target;
size_t link = 2 + len + (2 * sizeof(uintptr_t));
/* write the header */
entry[0] = len;
entry[1] = 0; // flags aren't used yet
memcpy(entry+2, name, len);
memcpy(entry+2+len, &link, sizeof(link));
/* write the native executor codeword and the function pointer */
memcpy(entry, (uint8_t *)(&nexec_p), sizeof(uintptr_t));
memcpy(entry + sizeof(uintptr_t), (uint8_t *)(&target_p), sizeof(uintptr_t));
}
The rest of the functions are utility functions. ``match_word`` is used
to... match words::
bool
match_word(uint8_t *entry, const char *name, const uint8_t len)
{
if (entry[0] != len) {
return false;
}
if (memcmp(entry+2, name, len) != 0) {
return false;
}
return true;
}
Finally, ``word_link`` returns the offset from this entry to the next word
(so that ``entry+offset`` is the next entry) and ``word_body`` returns the
offset to the body of the word::
size_t
word_link(uint8_t *entry)
{
size_t link;
if (entry[0] == 0) {
return 0;
}
memcpy(&link, entry+2+entry[0], sizeof(link));
return link;
}
size_t
word_body(uint8_t *entry)
{
return 2 + entry[0] + sizeof(size_t);
}
That about wraps up this chunk of work. Next up, maybe porting the builtins? I
also need to rewrite the parser and I/O layer.

@@ -0,0 +1,80 @@
Write You a Forth, 0x09
-----------------------
:date: 2018-03-09 13:10
:tags: wyaf, forth
So the last post's code had some issues, and I hadn't updated the front end to
use the new tooling. I removed the arena and switched to the internal dictionary::
#include "defs.h"
#include "eval.h"
#include "stack.h"
#include "word.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void
hello(void)
{
printf("hello, world\n");
}
int
main(void)
{
dstack_push(2);
dstack_push(3);
append_native_word("hello", 5, hello);
uintptr_t hwb = 0;
if (!lookup("hello", 5, &hwb)) {
fprintf(stderr, "failed to lookup 'hello'\n");
exit(1);
}
printf("hello: 0x%lx\n", (unsigned long)hwb);
if (!execute("hello", 5)) {
fprintf(stderr, "failed to execute 'hello'\n");
exit(1);
}
printf("finished\n");
}
Also, there's a (not-so) subtle bug in ``word.c``: the function body overwrites
the header, which is a fast path to a segfault, and the link didn't count the link
field itself. I've also added an offset variable to make tracking the offset easier::
void
store_native(uint8_t *entry, const char *name, const uint8_t len, void(*target)(void))
{
uintptr_t target_p = (uintptr_t)target;
- size_t link = 2 + len + (2 * sizeof(uintptr_t));
+ size_t offset = 2 + len + sizeof(size_t);
+ size_t link = offset + (2 * sizeof(uintptr_t));
/* write the header */
entry[0] = len;
@@ -45,8 +66,9 @@ store_native(uint8_t *entry, const char *name, const uint8_t len, void(*target)(
memcpy(entry+2+len, &link, sizeof(link));
/* write the native executor codeword and the function pointer */
- memcpy(entry, (uint8_t *)(&nexec_p), sizeof(uintptr_t));
- memcpy(entry + sizeof(uintptr_t), (uint8_t *)(&target_p), sizeof(uintptr_t));
+ memcpy(entry+offset, (uint8_t *)(&nexec_p), sizeof(uintptr_t));
+ offset += sizeof(uintptr_t);
+ memcpy(entry+offset, (uint8_t *)(&target_p), sizeof(uintptr_t));
}
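To check that the fixed layout adds up, here is a standalone C sketch that writes entries the same way the patched ``store_native`` does and walks them. It is not the ``word.c`` source: ``append``, ``find_body``, ``entry_size`` and ``SKETCH_DICT_SIZE`` are hypothetical, plain numbers stand in for ``nexec_p`` and the function pointer, and it treats each link as the size of the current entry (so the walk does ``offset += link``).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_DICT_SIZE 512   /* small stand-in for DICT_SIZE */

static uint8_t dict[SKETCH_DICT_SIZE];
static size_t last = 0;

/* [len:1][flags:1][name:len][link:size_t][codeword:uintptr_t][target:uintptr_t] */
static size_t
entry_size(uint8_t len)
{
	return 2 + len + sizeof(size_t) + 2 * sizeof(uintptr_t);
}

static bool
append(const char *name, uint8_t len, uintptr_t codeword, uintptr_t target)
{
	size_t link = entry_size(len);
	size_t offset = 2 + len + sizeof(size_t);
	uint8_t *entry = dict + last;

	assert(name != NULL);
	/* Leave at least one zero byte after the entry so the walk can stop. */
	if (len == 0 || last + link >= SKETCH_DICT_SIZE) {
		return false;
	}
	entry[0] = len;
	entry[1] = 0;   /* flags unused */
	memcpy(entry + 2, name, len);
	memcpy(entry + 2 + len, &link, sizeof(link));
	memcpy(entry + offset, &codeword, sizeof(codeword));
	memcpy(entry + offset + sizeof(uintptr_t), &target, sizeof(target));
	last += link;
	return true;
}

/*
 * find_body returns the offset of the matching word's body (its codeword),
 * or 0 if there is no match; a body can never start at offset 0.
 */
static size_t
find_body(const char *name, uint8_t len)
{
	size_t offset = 0;
	size_t link;

	while (offset < SKETCH_DICT_SIZE && dict[offset] != 0) {
		if (dict[offset] == len && memcmp(dict + offset + 2, name, len) == 0) {
			return offset + 2 + len + sizeof(size_t);
		}
		memcpy(&link, dict + offset + 2 + dict[offset], sizeof(link));
		offset += link;   /* link is the size of this entry */
	}
	return 0;
}
```

The body offset computed in ``find_body`` is the same ``2 + len + sizeof(size_t)`` that ``word_body`` returns, which is why the codeword has to be written at ``entry+offset`` rather than at ``entry``.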
The header file ``word.h`` didn't contain ``append_native_word``, ``lookup``,
or ``execute``, so that gets updated too. The end result is::
$ ./kf-default
hello: 0x6cbc6f
hello, world
finished
As usual, the code is tagged with `part-0x09 <https://github.com/kisom/kforth/tree/part-0x09>`_.

misc/kforth/eval.c

@@ -0,0 +1,21 @@
#include "defs.h"
#include "eval.h"
#include <string.h>
void
cwexec(uintptr_t entry)
{
uintptr_t target = 0;
uintptr_t codeword = 0;
memcpy(&codeword, (void *)entry, sizeof(uintptr_t));
memcpy(&target, (void *)(entry + sizeof(uintptr_t)), sizeof(uintptr_t));
((void(*)(uintptr_t))codeword)(target);
}
void
nexec(uintptr_t target)
{
((void(*)(void))target)();
}

misc/kforth/eval.h

@@ -0,0 +1,28 @@
#ifndef __KF_EVAL_H__
#define __KF_EVAL_H__
#include "defs.h"
/*
* cwexec is the codeword executor. It assumes that the uintptr_t
* passed into it points to the correct executor (e.g. nexec),
* which is called with the next address.
*/
void cwexec(uintptr_t);
/*
* nexec is the native executor.
*
* It should take a uintptr_t containing the address of a code block
* and will execute the function starting there. The function should
* have the signature void(*target)(void) - a function returning
* nothing and taking no arguments.
*/
void nexec(uintptr_t);
static const uintptr_t nexec_p = (uintptr_t)&nexec;
#endif /* __KF_EVAL_H__ */

misc/kforth/kf.c

@@ -0,0 +1,36 @@
#include "defs.h"
#include "eval.h"
#include "stack.h"
#include "word.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void
hello(void)
{
printf("hello, world\n");
}
int
main(void)
{
dstack_push(2);
dstack_push(3);
append_native_word("hello", 5, hello);
uintptr_t hwb = 0;
if (!lookup("hello", 5, &hwb)) {
fprintf(stderr, "failed to lookup 'hello'\n");
exit(1);
}
printf("hello: 0x%lx\n", (unsigned long)hwb);
if (!execute("hello", 5)) {
fprintf(stderr, "failed to execute 'hello'\n");
exit(1);
}
printf("finished\n");
}

misc/kforth/pc/defs.h

@@ -0,0 +1,11 @@
#ifndef __KF_PC_DEFS_H__
#define __KF_PC_DEFS_H__
typedef int32_t KF_INT;
typedef uintptr_t KF_ADDR;
static const size_t DSTACK_SIZE = 65535;
static const size_t RSTACK_SIZE = 65535;
static const size_t DICT_SIZE = 65535;
#endif /* __KF_PC_DEFS_H__ */

misc/kforth/stack.c

@@ -0,0 +1,98 @@
#include "defs.h"
#include "stack.h"
static KF_INT dstack[DSTACK_SIZE] = {0};
static size_t dstack_len = 0;
bool
dstack_pop(KF_INT *a)
{
if (dstack_len == 0) {
return false;
}
*a = dstack[--dstack_len];
return true;
}
bool
dstack_push(KF_INT a)
{
if (dstack_len == DSTACK_SIZE) {
return false;
}
dstack[dstack_len++] = a;
return true;
}
bool
dstack_get(size_t i, KF_INT *a)
{
if (i >= dstack_len) {
return false;
}
*a = dstack[dstack_len - i - 1];
return true;
}
size_t
dstack_size()
{
return dstack_len;
}
void
dstack_clear()
{
dstack_len = 0;
}
static KF_ADDR rstack[RSTACK_SIZE] = {0};
static size_t rstack_len = 0;
bool
rstack_pop(KF_ADDR *a)
{
if (rstack_len == 0) {
return false;
}
*a = rstack[--rstack_len];
return true;
}
bool
rstack_push(KF_ADDR a)
{
if (rstack_len == RSTACK_SIZE) {
return false;
}
rstack[rstack_len++] = a;
return true;
}
bool
rstack_get(size_t i, KF_ADDR *a)
{
if (i >= rstack_len) {
return false;
}
*a = rstack[rstack_len - i - 1];
return true;
}
size_t
rstack_size()
{
return rstack_len;
}
void
rstack_clear()
{
rstack_len = 0;
}

misc/kforth/stack.h

@@ -0,0 +1,18 @@
#ifndef __KF_STACK_H__
#define __KF_STACK_H__
/* data stack interaction */
bool dstack_pop(KF_INT *);
bool dstack_push(KF_INT);
bool dstack_get(size_t, KF_INT *);
size_t dstack_size(void);
void dstack_clear(void);
/* return stack interaction */
bool rstack_pop(KF_ADDR *);
bool rstack_push(KF_ADDR);
bool rstack_get(size_t, KF_ADDR *);
size_t rstack_size(void);
void rstack_clear(void);
#endif /* __KF_STACK_H__ */

misc/kforth/stats.txt

@@ -0,0 +1,9 @@
REVISION  SLOC
0x02       133
0x03       245
0x04       369
0x05       677
0x06      1436

In part 0x06, I thought it would be interesting to track how the codebase has
grown over time.

misc/kforth/v1/Makefile

@@ -0,0 +1,23 @@
CXXSTD := c++14
CXXFLAGS := -std=$(CXXSTD) -Wall -Werror -O0 -g
LDFLAGS := -static
OBJS := linux/io.o \
	io.o \
	system.o \
	parser.o \
	word.o \
	dict.o \
	kforth.o
TARGET := kforth
all: $(TARGET)
$(TARGET): $(OBJS)
	$(CXX) $(LDFLAGS) -o $@ $(OBJS)
clean:
	rm -f $(OBJS) $(TARGET)
install: $(TARGET)
	cp $(TARGET) ~/bin
	chmod 0755 ~/bin/$(TARGET)

misc/kforth/v1/TODO.txt

@@ -0,0 +1,8 @@
nucleus layer:
+ EXIT: requires better execution control
+ I: requires support for loop index
+ J: requires support for loop index
return addressing / rstack
dictionary -> fixed size stack / array

misc/kforth/v1/defs.h

@@ -0,0 +1,27 @@
#ifndef __KF_DEFS_H__
#define __KF_DEFS_H__
#ifdef __linux__
#include "linux/defs.h"
#else
typedef int KF_INT;
typedef long KF_LONG;
constexpr uint8_t STACK_SIZE = 16;
#endif
constexpr size_t MAX_TOKEN_LENGTH = 16;
constexpr size_t dshift = (sizeof(KF_INT) * 8) - 1;
static inline KF_INT
mask(size_t bits)
{
KF_INT m = 0;
for (size_t i = 0; i < bits; i++) {
m += 1 << i;
}
return m;
}
#endif // __KF_DEFS_H__

misc/kforth/v1/dict.cc

File diff suppressed because it is too large

misc/kforth/v1/dict.h

@@ -0,0 +1,15 @@
#ifndef __KF_DICT_H__
#define __KF_DICT_H__
#include "defs.h"
#include "parser.h"
#include "system.h"
#include "word.h"
void init_dict(System *);
void reset_system(System *);
bool lookup(struct Token *, System *);
#endif // __KF_DICT_H__

misc/kforth/v1/io.cc

@@ -0,0 +1,98 @@
#include "defs.h"
#include "io.h"
#include <string.h>
void
write_num(IO *interface, KF_INT n)
{
static constexpr size_t nbuflen = 11;
char buf[nbuflen];
uint8_t i = nbuflen - 1;
memset(buf, 0, nbuflen);
if (n < 0) {
interface->wrch('-');
}
else if (n == 0) {
interface->wrch('0');
return;
}
while (n != 0) {
char x = n % 10;
x = x < 0 ? -x : x;
x += '0';
buf[i--] = x;
n /= 10;
}
interface->wrbuf(buf+i+1, nbuflen - i - 1); // i sits one before the first digit
}
void
write_unum(IO *interface, KF_UINT n)
{
static constexpr size_t nbuflen = 11;
char buf[nbuflen];
uint8_t i = nbuflen - 1;
memset(buf, 0, nbuflen);
if (n == 0) {
interface->wrch('0');
return;
}
while (n != 0) {
char x = n % 10;
x += '0';
buf[i--] = x;
n /= 10;
}
interface->wrbuf(buf+i+1, nbuflen - i - 1); // i sits one before the first digit
}
void
write_dnum(IO *interface, KF_LONG n)
{
static constexpr size_t dnbuflen = 21;
char buf[dnbuflen];
uint8_t i = dnbuflen - 1;
memset(buf, 0, dnbuflen);
if (n < 0) {
interface->wrch('-');
}
else if (n == 0) {
interface->wrch('0');
return;
}
while (n != 0) {
char x = n % 10;
x = x < 0 ? -x : x;
x += '0';
buf[i--] = x;
n /= 10;
}
interface->wrbuf(buf+i+1, dnbuflen - i - 1); // i sits one before the first digit
}
void
write_dstack(IO *interface, Stack<KF_INT> dstack)
{
KF_INT tmp;
interface->wrch('<');
for (size_t i = 0; i < dstack.size(); i++) {
if (i > 0) {
interface->wrch(' ');
}
dstack.get(i, tmp);
write_num(interface, tmp);
}
interface->wrch('>');
}

misc/kforth/v1/io.h

@@ -0,0 +1,33 @@
#ifndef __KF_IO_H__
#define __KF_IO_H__
#include "defs.h"
#include "stack.h"
class IO {
public:
// Virtual destructor is required in all ABCs.
virtual ~IO() {};
// Building block methods.
virtual char rdch(void) = 0;
virtual void wrch(char c) = 0;
// Buffer I/O.
virtual size_t rdbuf(char *buf, size_t len, bool stopat, char stopch) = 0;
virtual void wrbuf(char *buf, size_t len) = 0;
// Line I/O
virtual bool rdln(char *buf, size_t len, size_t *readlen) = 0;
virtual void wrln(char *buf, size_t len) = 0;
virtual void newline(void) = 0;
};
void write_num(IO *, KF_INT);
void write_unum(IO *, KF_UINT);
void write_dnum(IO *, KF_LONG);
void write_dstack(IO *, Stack<KF_INT>);
#endif // __KF_IO_H__

misc/kforth/v1/kforth.cc

@@ -0,0 +1,89 @@
#include "dict.h"
#include "io.h"
#include "parser.h"
#include "system.h"
#include <stdlib.h>
#include <string.h>
#ifdef __linux__
#include "linux.h"
#endif // __linux__
static System sys;
#ifndef TRACE_STACK
#define TRACE_STACK false
#endif
static bool
parser(const char *buf, const size_t buflen)
{
static size_t offset = 0;
static struct Token token;
static PARSE_RESULT result = PARSE_FAIL;
offset = 0;
// reset token
token.token = nullptr;
token.length = 0;
while ((result = parse_next(buf, buflen, &offset, &token)) == PARSE_OK) {
if (!lookup(&token, &sys)) {
break;
}
}
system_write_status(&sys);
sys.interface->newline();
switch (result) {
case PARSE_OK:
return false;
case PARSE_EOB:
return true;
case PARSE_LEN:
sys.interface->wrln((char *)"parse error: token too long", 27);
return false;
case PARSE_FAIL:
sys.interface->wrln((char *)"parser failure", 14);
return false;
default:
sys.interface->wrln((char *)"*** the world is broken ***", 27);
exit(1);
}
}
static void
interpreter()
{
static size_t buflen = 0;
static char linebuf[81];
while (true) {
if (TRACE_STACK) {
write_dstack(sys.interface, sys.dstack);
sys.interface->newline();
}
sys.interface->wrch('?');
sys.interface->wrch(' ');
buflen = sys.interface->rdbuf(linebuf, 80, true, '\n');
parser(linebuf, buflen);
}
}
static char banner[] = "kforth interpreter";
const size_t bannerlen = 18;
int
main(void)
{
reset_system(&sys);
init_dict(&sys);
#ifdef __linux__
Console interface;
sys.interface = &interface;
#endif
sys.interface->wrln(banner, bannerlen);
interpreter();
return 0;
}

misc/kforth/v1/linux.h

@@ -0,0 +1,10 @@
#ifndef __KF_LINUX_H__
#define __KF_LINUX_H__
#include <stdint.h>
// build support for linux
#include "linux/io.h"
#endif // __KF_LINUX_H__

@@ -0,0 +1,15 @@
#ifndef __KF_LINUX_DEFS_H__
#define __KF_LINUX_DEFS_H__
#include <stddef.h>
#include <stdint.h>
typedef int32_t KF_INT;
typedef uint32_t KF_UINT;
typedef int64_t KF_LONG;
typedef uintptr_t KF_ADDR;
constexpr uint8_t STACK_SIZE = 128;
constexpr size_t ARENA_SIZE = 65535;
#endif

@@ -0,0 +1,82 @@
#include <cstdio> // getchar
#include <iostream>
#include "../io.h"
#include "io.h"
char
Console::rdch()
{
std::cout.flush();
return getchar();
}
void
Console::wrch(char c)
{
std::cout << c;
}
size_t
Console::rdbuf(char *buf, size_t len, bool stopat, char stopch)
{
size_t n = 0;
char ch;
while (n < len) {
ch = this->rdch();
if (ch == 0x04) {
break;
}
if (stopat && stopch == ch) {
break;
}
buf[n++] = ch;
}
return n;
}
void
Console::wrbuf(char *buf, size_t len)
{
for (size_t n = 0; n < len; n++) {
this->wrch(buf[n]);
}
}
// Line I/O
bool
Console::rdln(char *buf, size_t len, size_t *readlen) {
size_t n = 0;
char ch;
bool line = false;
while (n < len) {
ch = this->rdch();
if (ch == '\n') {
line = true;
break;
}
buf[n++] = ch;
}
if (nullptr != readlen) {
*readlen = n;
}
return line;
}
void
Console::wrln(char *buf, size_t len)
{
this->wrbuf(buf, len);
this->wrch(0x0a);
}

misc/kforth/v1/linux/io.h

@@ -0,0 +1,25 @@
#ifndef __KF_IO_LINUX_H__
#define __KF_IO_LINUX_H__
#include "io.h"
#include "defs.h"
class Console : public IO {
public:
~Console() {};
char rdch(void);
void wrch(char c);
// Buffer I/O.
size_t rdbuf(char *buf, size_t len, bool stopat, char stopch);
void wrbuf(char *buf, size_t len);
// Line I/O
bool rdln(char *buf, size_t len, size_t *readlen);
void wrln(char *buf, size_t len);
void newline(void) { this->wrch('\n'); };
private:
};
#endif // __KF_IO_LINUX_H__

misc/kforth/v1/parser.cc

@@ -0,0 +1,131 @@
#include "defs.h"
#include "parser.h"
#include "stack.h"
#include <ctype.h>
#include <string.h>
static inline void
reset(struct Token *t)
{
t->token = nullptr;
t->length = 0;
}
bool
match_token(const char *a, const size_t alen,
const char *b, const size_t blen)
{
if (alen != blen) {
return false;
}
for (size_t i = 0; i < alen; i++) {
if (a[i] == b[i]) {
continue;
}
if (!isalpha(a[i]) || !isalpha(b[i])) {
return false;
}
if ((a[i] ^ 0x20) == b[i]) {
continue;
}
if (a[i] == (b[i] ^ 0x20)) {
continue;
}
return false;
}
return true;
}
PARSE_RESULT
parse_next(const char *buf, const size_t length, size_t *offset,
struct Token *token)
{
size_t cursor = *offset;
reset(token);
if (cursor == length) {
return PARSE_EOB;
}
while (cursor < length) {
if (buf[cursor] != ' ') {
if (buf[cursor] != '\t') {
break;
}
}
cursor++;
}
if (cursor == length) {
return PARSE_EOB;
}
token->token = (char *)buf + cursor;
while ((token->length <= MAX_TOKEN_LENGTH) && (cursor < length)) {
if (buf[cursor] != ' ') {
if (buf[cursor] != '\t') {
cursor++;
token->length++;
continue;
}
}
cursor++;
break;
}
if (token->length > MAX_TOKEN_LENGTH) {
reset(token);
return PARSE_LEN;
}
*offset = cursor;
return PARSE_OK;
}
bool
parse_num(struct Token *token, KF_INT *n)
{
KF_INT tmp = 0;
uint8_t i = 0;
bool sign = false;
if (token->length == 0) {
return false;
}
if (token->token[i] == '-') {
if (token->length == 1) {
return false;
}
i++;
sign = true;
}
while (i < token->length) {
if (token->token[i] < '0') {
return false;
}
if (token->token[i] > '9') {
return false;
}
tmp *= 10;
tmp += (uint8_t)(token->token[i] - '0');
i++;
}
if (sign) {
tmp *= -1;
}
*n = tmp;
return true;
}

misc/kforth/v1/parser.h

@@ -0,0 +1,27 @@
#ifndef __KF_PARSER_H__
#define __KF_PARSER_H__
#include "defs.h"
#include "stack.h"
struct Token {
char *token;
uint8_t length;
};
typedef enum _PARSE_RESULT_ : uint8_t {
PARSE_OK = 0,
PARSE_EOB = 1, // end of buffer
PARSE_LEN = 2, // token is too long
PARSE_FAIL = 3 // catch-all error
} PARSE_RESULT;
bool match_token(const char *, const size_t, const char *, const size_t);
PARSE_RESULT parse_next(const char *, const size_t, size_t *, struct Token *);
// TODO(kyle): investigate a better return value, e.g. to differentiate between
// stack failures and parse failures.
bool parse_num(struct Token *, KF_INT *);
#endif // __KF_PARSER_H__

misc/kforth/v1/stack.h

@@ -0,0 +1,92 @@
#ifndef __KF_STACK_H__
#define __KF_STACK_H__
#include "defs.h"
template <typename T>
class Stack {
public:
bool push(T val);
bool pop(T *val);
bool peek(T *val);
bool get(size_t, T &);
bool remove(size_t, T *);
size_t size(void) { return this->arrlen; }
void clear(void) { this->arrlen = 0; }
private:
T arr[STACK_SIZE];
size_t arrlen;
};
// push returns false if there was a stack overflow.
template <typename T>
bool
Stack<T>::push(T val)
{
if ((this->arrlen + 1) > STACK_SIZE) {
return false;
}
this->arr[this->arrlen++] = val;
return true;
}
// pop returns false if there was a stack underflow.
template <typename T>
bool
Stack<T>::pop(T *val)
{
if (this->arrlen == 0) {
return false;
}
*val = this->arr[this->arrlen - 1];
this->arrlen--;
return true;
}
// peek returns false if there was a stack underflow.
template <typename T>
bool
Stack<T>::peek(T *val)
{
if (this->arrlen == 0) {
return false;
}
*val = this->arr[this->arrlen - 1];
return true;
}
// get returns false on invalid bounds.
template <typename T>
bool
Stack<T>::get(size_t i, T &val)
{
if (i >= this->arrlen) {
return false;
}
val = this->arr[i];
return true;
}
// remove returns false on invalid bounds
template <typename T>
bool
Stack<T>::remove(size_t i, T *val)
{
if (i >= this->arrlen) {
return false;
}
*val = this->arr[i];
for (; i < (arrlen - 1); i++) {
this->arr[i] = this->arr[i+1];
}
arrlen--;
return true;
}
#endif // __KF_STACK_H__

misc/kforth/v1/system.cc

@@ -0,0 +1,74 @@
#include "defs.h"
#include "system.h"
#include <string.h>
constexpr static char STATE_STR_OK[] = "ok";
constexpr static char STATE_STR_STACK_OVERFLOW[] = "stack overflow";
constexpr static char STATE_STR_STACK_UNDERFLOW[] = "stack underflow";
constexpr static char STATE_STR_EXECUTION_FAILURE[] = "execution failure";
constexpr static char STATE_STR_UNKNOWN_WORD[] = "unknown word";
constexpr static char STATE_STR_RSTACK_OVERFLOW[] = "return stack overflow";
constexpr static char STATE_STR_RSTACK_UNDERFLOW[] = "return stack underflow";
constexpr static char STATE_STR_UNKNOWN_STATE[] = "undefined state";
constexpr static char STATE_STR_ERROR_CODE[] = " (error code ";
void
system_clear_error(System *sys)
{
sys->status = STATUS_OK;
}
void
system_write_status(System *sys)
{
char *buf = nullptr;
size_t len = 0;
if (sys->interface == nullptr) {
return;
}
switch (sys->status) {
case STATUS_OK:
buf = (char *)(STATE_STR_OK);
len = sizeof STATE_STR_OK;
break;
case STATUS_STACK_OVERFLOW:
buf = (char *)(STATE_STR_STACK_OVERFLOW);
len = sizeof STATE_STR_STACK_OVERFLOW;
break;
case STATUS_STACK_UNDERFLOW:
buf = (char *)(STATE_STR_STACK_UNDERFLOW);
len = sizeof STATE_STR_STACK_UNDERFLOW;
break;
case STATUS_EXECUTION_FAILURE:
buf = (char *)(STATE_STR_EXECUTION_FAILURE);
len = sizeof STATE_STR_EXECUTION_FAILURE;
break;
case STATUS_UNKNOWN_WORD:
buf = (char *)(STATE_STR_UNKNOWN_WORD);
len = sizeof STATE_STR_UNKNOWN_WORD;
break;
case STATUS_RSTACK_OVERFLOW:
buf = (char *)(STATE_STR_RSTACK_OVERFLOW);
len = sizeof STATE_STR_RSTACK_OVERFLOW;
break;
case STATUS_RSTACK_UNDERFLOW:
buf = (char *)(STATE_STR_RSTACK_UNDERFLOW);
len = sizeof STATE_STR_RSTACK_UNDERFLOW;
break;
default:
buf = (char *)(STATE_STR_UNKNOWN_STATE);
len = sizeof STATE_STR_UNKNOWN_STATE;
break;
}
// sizeof counts the terminating NUL; don't write it to the terminal.
sys->interface->wrbuf(buf, len - 1);
if (sys->status != STATUS_OK) {
sys->interface->wrbuf((char *)STATE_STR_ERROR_CODE, sizeof STATE_STR_ERROR_CODE - 1);
write_num(sys->interface, (KF_INT)sys->status);
sys->interface->wrch(')');
}
sys->interface->wrch('.');
}

misc/kforth/v1/system.h

@@ -0,0 +1,32 @@
#ifndef __KF_CORE_H__
#define __KF_CORE_H__

#include "defs.h"
#include "io.h"
#include "stack.h"

typedef enum _SYS_STATUS : uint8_t {
	STATUS_OK = 0,
	STATUS_STACK_OVERFLOW = 1,
	STATUS_STACK_UNDERFLOW = 2,
	STATUS_EXECUTION_FAILURE = 3,
	STATUS_UNKNOWN_WORD = 4,
	STATUS_RSTACK_OVERFLOW = 5,
	STATUS_RSTACK_UNDERFLOW = 6
} SYS_STATUS;

class Word;

typedef struct _System {
	Stack<KF_INT> dstack;
	Stack<KF_ADDR> rstack;
	IO *interface;
	Word *dict;
	SYS_STATUS status;
	uint8_t arena[ARENA_SIZE];
} System;

void system_clear_error(System *sys);
void system_write_status(System *sys);

#endif // __KF_CORE_H__
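
The header fixes the shape of `System`, but how builtins use it is not shown in this section. Assuming a builtin signals failure by returning `false` and leaving a code in `status` (which `system_write_status` then reports), a hypothetical `+` builtin might look like the sketch below. `Stack` here is a simplified stand-in for the one in stack.h, `KF_INT` is assumed to be a 32-bit signed integer, and `builtin_add` is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef std::int32_t KF_INT;  // assumption: the real definition lives in defs.h

enum SYS_STATUS : std::uint8_t {
	STATUS_OK = 0,
	STATUS_STACK_UNDERFLOW = 2,
};

// Hypothetical fixed-depth stack; stack.h is not shown, so this only
// mirrors the bool-returning push() that Address::eval relies on.
template <typename T, std::size_t N = 16>
class Stack {
public:
	bool push(T v)
	{
		if (top == N) {
			return false;
		}
		data[top++] = v;
		return true;
	}
	bool pop(T *v)
	{
		if (top == 0) {
			return false;
		}
		*v = data[--top];
		return true;
	}
	std::size_t depth() const { return top; }

private:
	T data[N] = {};
	std::size_t top = 0;
};

// Only the System fields this sketch needs.
struct System {
	Stack<KF_INT> dstack;
	SYS_STATUS status = STATUS_OK;
};

// A hypothetical "+" builtin: ( a b -- a+b ).
bool builtin_add(System *sys)
{
	KF_INT a = 0, b = 0;

	// Check depth first so a failing call leaves the stack untouched.
	if (sys->dstack.depth() < 2) {
		sys->status = STATUS_STACK_UNDERFLOW;
		return false;
	}
	sys->dstack.pop(&b);
	sys->dstack.pop(&a);
	return sys->dstack.push(a + b);  // cannot fail: two cells were just freed
}
```

Checking the depth before popping is a design choice: it keeps a failed call from silently consuming its operands.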

misc/kforth/v1/word.cc

@@ -0,0 +1,84 @@
#include "defs.h"
#include "parser.h"
#include "system.h"
#include "word.h"
#include <string.h>

Builtin::Builtin(const char *name, size_t namelen, Word *head, bool (*target)(System *))
	: prev(head), fun(target)
{
	memcpy(this->name, name, namelen);
	this->namelen = namelen;
}

bool
Builtin::eval(System *sys)
{
	return this->fun(sys);
}

Word *
Builtin::next()
{
	return this->prev;
}

bool
Builtin::match(struct Token *token)
{
	return match_token(this->name, this->namelen, token->token, token->length);
}

void
Builtin::getname(char *buf, size_t *buflen)
{
	memcpy(buf, this->name, this->namelen);
	*buflen = namelen;
}

Address::Address(const char *name, size_t namelen, Word *head, KF_ADDR addr)
	: prev(head), addr(addr)
{
	memcpy(this->name, name, namelen);
	this->namelen = namelen;
}

/*
 * The address is split into two dshift-bit halves and pushed as two
 * cells: the low half first, then the high half, which ends up on top
 * of the data stack.
 */
bool
Address::eval(System *sys)
{
	KF_INT a;

	a = static_cast<KF_INT>(this->addr & mask(dshift));
	if (!sys->dstack.push(a)) {
		return false;
	}

	a = static_cast<KF_INT>((this->addr >> dshift) & mask(dshift));
	if (!sys->dstack.push(a)) {
		return false;
	}

	return true;
}

Word *
Address::next(void)
{
	return this->prev;
}

bool
Address::match(struct Token *token)
{
	return match_token(this->name, this->namelen, token->token, token->length);
}

void
Address::getname(char *buf, size_t *buflen)
{
	memcpy(buf, this->name, this->namelen);
	*buflen = namelen;
}
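
`Address::eval` splits a `KF_ADDR` into two `dshift`-bit halves and pushes the low half first, so the high half ends up on top of the data stack. The sketch below shows that split and a matching reassembly. It assumes a 32-bit `KF_INT`, a 64-bit `KF_ADDR`, a `dshift` of 32, and a `mask(n)` that sets the low `n` bits; the real definitions live in defs.h, which is not shown, and `split_addr` and `join_addr` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

typedef std::int32_t KF_INT;    // assumption: one data-stack cell
typedef std::uint64_t KF_ADDR;  // assumption: two cells wide
static constexpr unsigned dshift = 32;  // assumption: bits per cell

// Hypothetical mask helper: the low n bits set (valid for n < 64).
static constexpr KF_ADDR mask(unsigned n) { return (KF_ADDR(1) << n) - 1; }

// Split as Address::eval does: low half, then high half.
void split_addr(KF_ADDR addr, KF_INT *lo, KF_INT *hi)
{
	// Values above INT32_MAX wrap modulo 2^32 (guaranteed from C++20,
	// two's complement in practice before that).
	*lo = static_cast<KF_INT>(addr & mask(dshift));
	*hi = static_cast<KF_INT>((addr >> dshift) & mask(dshift));
}

// Reassemble from the two cells (the high one would be popped first).
KF_ADDR join_addr(KF_INT lo, KF_INT hi)
{
	// Reinterpret each cell as unsigned so sign bits do not leak into the other half.
	KF_ADDR l = static_cast<std::uint32_t>(lo);
	KF_ADDR h = static_cast<std::uint32_t>(hi);
	return (h << dshift) | l;
}
```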

misc/kforth/v1/word.h

@@ -0,0 +1,54 @@
#ifndef __KF_WORD_H__
#define __KF_WORD_H__
#include "defs.h"
#include "parser.h"
#include "stack.h"
#include "system.h"

class Word {
public:
	virtual ~Word() {}
	virtual bool eval(System *) = 0;
	virtual Word *next(void) = 0;
	virtual bool match(struct Token *) = 0;
	virtual void getname(char *, size_t *) = 0;
	virtual uintptr_t address(void) = 0;
};

class Builtin : public Word {
public:
	~Builtin() {}
	Builtin(const char *name, size_t namelen, Word *head, bool (*fun)(System *));
	bool eval(System *);
	Word *next(void);
	bool match(struct Token *);
	void getname(char *, size_t *);
	uintptr_t address(void) { return reinterpret_cast<uintptr_t>(this); }

private:
	char name[MAX_TOKEN_LENGTH];
	size_t namelen;
	Word *prev;
	bool (*fun)(System *);
};

class Address : public Word {
public:
	~Address() {}
	Address(const char *name, size_t namelen, Word *head, KF_ADDR addr);
	bool eval(System *);
	Word *next(void);
	bool match(struct Token *);
	void getname(char *, size_t *);
	uintptr_t address(void) { return reinterpret_cast<uintptr_t>(this); }

private:
	char name[MAX_TOKEN_LENGTH];
	size_t namelen;
	Word *prev;
	KF_ADDR addr;
};

#endif // __KF_WORD_H__
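
Each `Word` stores a `prev` pointer to the dictionary head at the time it was created, so the dictionary is a singly linked list walked with `next()`. Assuming `sys->dict` always points at the most recently added word, a lookup can start there and stop at the first `match()`, which lets a newer definition shadow an older one with the same name. The sketch below reduces `Word` to `next()` and a string-based `match()`; `Named` and `dict_find` are hypothetical, and the real `Token` and `match_token` are not used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Simplified stand-in for the Word interface: only next() and match().
class Word {
public:
	virtual ~Word() {}
	virtual Word *next() = 0;
	virtual bool match(const char *tok, std::size_t len) = 0;
};

// Hypothetical word that only carries a name and a link to the previous head.
class Named : public Word {
public:
	Named(const char *name, Word *head) : name(name), prev(head) {}
	Word *next() override { return prev; }
	bool match(const char *tok, std::size_t len) override
	{
		return std::strlen(name) == len && std::memcmp(name, tok, len) == 0;
	}

private:
	const char *name;
	Word *prev;
};

// Walk from the newest entry toward the oldest; the first match wins.
Word *dict_find(Word *head, const char *tok, std::size_t len)
{
	for (Word *w = head; w != nullptr; w = w->next()) {
		if (w->match(tok, len)) {
			return w;
		}
	}
	return nullptr;
}
```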

misc/kforth/word.c

@@ -0,0 +1,104 @@
#include "defs.h"
#include "eval.h"
#include "word.h"
#include <string.h>
static uint8_t dict[DICT_SIZE] = {0};
static size_t last = 0;
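
void
append_native_word(const char *name, const uint8_t len, void(*target)(void))
{
	store_native(dict+last, name, len, target);

	/*
	 * Advance past the new entry (header, codeword, and function
	 * pointer) so the next append does not overwrite it.
	 */
	last += word_body(dict+last) + (2 * sizeof(uintptr_t));
}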

bool
execute(const char *name, const uint8_t len)
{
	size_t offset = 0;
	size_t body = 0;
	size_t link = 0;

	while (true) {
		if (!match_word(dict+offset, name, len)) {
			/* links are relative to the start of the current entry */
			if ((link = word_link(dict+offset)) == 0) {
				return false;
			}
			offset += link;
			continue;
		}
		body = word_body(dict+offset);
		cwexec((uintptr_t)(dict + offset + body));
		return true;
	}
}

bool
lookup(const char *name, const uint8_t len, uintptr_t *ptr)
{
	size_t offset = 0;
	size_t body = 0;
	size_t link = 0;

	while (true) {
		if (!match_word(dict+offset, name, len)) {
			/* links are relative to the start of the current entry */
			if ((link = word_link(dict+offset)) == 0) {
				return false;
			}
			offset += link;
			continue;
		}
		body = word_body(dict+offset);
		*ptr = (uintptr_t)(dict + offset + body);
		return true;
	}
}

void
store_native(uint8_t *entry, const char *name, const uint8_t len, void(*target)(void))
{
	uintptr_t target_p = (uintptr_t)target;
	size_t offset = 2 + len + sizeof(size_t);
	size_t link = offset + (2 * sizeof(uintptr_t));

	/* write the header */
	entry[0] = len;
	entry[1] = 0; // flags aren't used yet
	memcpy(entry+2, name, len);
	memcpy(entry+2+len, &link, sizeof(link));

	/* write the native executor codeword and the function pointer */
	memcpy(entry+offset, (uint8_t *)(&nexec_p), sizeof(uintptr_t));
	offset += sizeof(uintptr_t);
	memcpy(entry+offset, (uint8_t *)(&target_p), sizeof(uintptr_t));
}

bool
match_word(uint8_t *entry, const char *name, const uint8_t len)
{
	if (entry[0] != len) {
		return false;
	}
	if (memcmp(entry+2, name, len) != 0) {
		return false;
	}
	return true;
}

size_t
word_link(uint8_t *entry)
{
	size_t link;

	if (entry[0] == 0) {
		return 0;
	}
	memcpy(&link, entry+2+entry[0], sizeof(link));
	return link;
}

size_t
word_body(uint8_t *entry)
{
	return 2 + entry[0] + sizeof(size_t);
}
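
In this older byte-array dictionary, `store_native` writes a header of length, flags, the name bytes, and a `size_t` link equal to the size of the whole entry, followed by two `uintptr_t` cells. Because the link is relative to the start of its own entry, a walk has to add it to the current offset. The self-contained sketch below models that layout; `dict_append`, `dict_lookup`, and the 256-byte array are hypothetical stand-ins for `append_native_word`, `lookup`, and `DICT_SIZE`, and it adds a capacity check the original does not have.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

static std::uint8_t dict[256];  // zero-initialized: a 0 length byte ends the walk
static std::size_t last = 0;

// Offset from the start of an entry to its codeword (length, flags, name, link).
static std::size_t body_offset(std::uint8_t len)
{
	return 2 + len + sizeof(std::size_t);
}

// Appends an entry whose link is the entry's own size.
bool dict_append(const char *name, std::uint8_t len, std::uintptr_t codeword, std::uintptr_t target)
{
	std::size_t body = body_offset(len);
	std::size_t link = body + 2 * sizeof(std::uintptr_t);
	if (len == 0 || last + link >= sizeof(dict)) {
		return false;  // keep at least one zero byte as the terminator
	}
	std::uint8_t *entry = dict + last;
	entry[0] = len;
	entry[1] = 0;  // flags unused
	std::memcpy(entry + 2, name, len);
	std::memcpy(entry + 2 + len, &link, sizeof(link));
	std::memcpy(entry + body, &codeword, sizeof(codeword));
	std::memcpy(entry + body + sizeof(codeword), &target, sizeof(target));
	last += link;
	return true;
}

// Walks entries oldest-first, adding each relative link to the offset.
bool dict_lookup(const char *name, std::uint8_t len, std::size_t *body)
{
	std::size_t offset = 0;
	while (offset < sizeof(dict) && dict[offset] != 0) {
		const std::uint8_t *entry = dict + offset;
		if (entry[0] == len && std::memcmp(entry + 2, name, len) == 0) {
			*body = offset + body_offset(len);
			return true;
		}
		std::size_t link;
		std::memcpy(&link, entry + 2 + entry[0], sizeof(link));
		offset += link;
	}
	return false;
}
```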

misc/kforth/word.h

@@ -0,0 +1,45 @@
#ifndef __KF_WORD_H__
#define __KF_WORD_H__
/*
 * Every word in the dictionary starts with a header:
 *     uint8_t length;
 *     uint8_t flags;
 *     char    name[length];  (stored inline, not NUL-terminated)
 *     size_t  link;          (offset from the start of this entry to the next)
 *
 * The body looks like the following:
 *     uintptr_t codeword;
 *     uintptr_t body[];
 *
 * The codeword is the interpreter for the body; it is defined in
 * eval.c. A native (builtin) word has only a single body element,
 * which points to a function that is already compiled in.
 */
void append_native_word(const char *, const uint8_t, void(*)(void));
bool execute(const char *, const uint8_t);
bool lookup(const char *, const uint8_t, uintptr_t *);

/*
 * store_native writes a new dictionary entry for a natively compiled
 * function.
 */
void store_native(uint8_t *, const char *, const uint8_t, void(*)(void));

/*
 * match_word returns true if the current dictionary entry matches the
 * token being searched for.
 */
bool match_word(uint8_t *, const char *, const uint8_t);

/*
 * word_link returns the offset from the start of the current entry to
 * the next one, or 0 if the entry is empty.
 */
size_t word_link(uint8_t *);

/*
 * word_body returns the offset from the start of an entry to its
 * codeword.
 */
size_t word_body(uint8_t *);

#endif /* __KF_WORD_H__ */
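
The header comment describes a native word's body as a codeword followed by a single cell holding the compiled function, and `execute` passes the address of that codeword to `cwexec`. eval.c is not part of this section, so the sketch below is only an assumed model of that dispatch: the `nexec` and `cwexec` here are hypothetical re-implementations that read each cell with `memcpy` and call through it, and `hello` is an illustrative target.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

typedef void (*codeword_fn)(std::uintptr_t);
typedef void (*native_fn)(void);

// Hypothetical native executor: the cell after the codeword holds the
// address of a compiled-in function, which is called directly.
static void nexec(std::uintptr_t body)
{
	std::uintptr_t target;
	std::memcpy(&target, reinterpret_cast<const void *>(body + sizeof(std::uintptr_t)), sizeof(target));
	reinterpret_cast<native_fn>(target)();
}

// Hypothetical cwexec: load the codeword at the start of the body and
// call it with the body's address so it can find its own cells.
static void cwexec(std::uintptr_t body)
{
	std::uintptr_t cw;
	std::memcpy(&cw, reinterpret_cast<const void *>(body), sizeof(cw));
	reinterpret_cast<codeword_fn>(cw)(body);
}

static int hello_calls = 0;
static void hello(void) { ++hello_calls; }
```

Converting function pointers to `uintptr_t` and back works on mainstream compilers, but the mapping is implementation-defined in C++, so treat this as illustrative.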