From 8d6b87ba73a6bb54a156afc84d292fa1879a5e70 Mon Sep 17 00:00:00 2001
From: Kyle Isom <kyle@imap.cc>
Date: Wed, 23 Feb 2022 22:57:05 -0800
Subject: [PATCH] add spec to additional location.

---
 content/pages/spec.html | 238 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 238 insertions(+)
 create mode 100644 content/pages/spec.html

diff --git a/content/pages/spec.html b/content/pages/spec.html
new file mode 100644
index 0000000..29e5c2d
--- /dev/null
+++ b/content/pages/spec.html
@@ -0,0 +1,238 @@
+Title: Functional Spec for the Exocortex
+Tags: specs
+
+kExocortex is a tool for capturing and retaining knowledge, making it
+searchable.
+
+This is the initial top-level draft to sort out the high-level vision.
+
+## Summary
+
+The more you learn, the harder it is to recall specific things. Fortunately,
+computers are generally pretty good at remembering things. kExocortex is
+my attempt at building a knowledge graph for long-term memory.
+
+In addition to offering the functionality of notetaking systems like
+[Dendron](https://dendron.so), I'd like to keep track of what I call artifacts.
+An artifact is a source of some knowledge; it might be a PDF copy of a book, an
+image, or a URL.
+
+In a perfect world, I would have a local copy of everything with a remote backup.
+The remote backup lets me restore the exocortex in the event of data loss.
+
+## Usage sketches
+
+### Research mode
+
+If I am researching a topic, I have a top-level node that contains all the
+research I'm working on. I can link artifacts to a note, including URLs. One of
+the reasons it makes sense to attach a URL to a document is that I can reuse
+it, as well as go back and search URLs based on tags or categories. It would
+make sense to tag any artifacts with relevant tags from the note once it is saved.
+
+For example, let's say that I am researching graph databases. In Dendron, this
+note lives under `comp.database.graph`. I might find this O'Reilly book on
+[Neo4J](https://go.neo4j.com/rs/710-RRC-335/images/Neo4j_Graph_Algorithms.pdf)
+that discusses graph algorithms. I might link it here, and I might link it
+under a Neo4J-specific node. I would store the PDF in an artifact repository,
+adding relevant tags (such as "graph-database", "neo4j", "oreilly") and
+categorizing it under books, PDFs, and comp/database/graph/neo4j.
+
+Going forward, if I want to revisit the book, I don't have to find it online
+again. It's easily accessible from the artifact repository.
+
+The user interface for the knowledge graph should show a list of associated
+artifacts.
+
+Nodes are also timestamped; I am leaning towards keeping track of every time a
+page was edited (but probably not the edits). If I know I was researching
+graph databases last week, and I log the URLs I was reading as artifacts,
+I have a better history of what I was reading.
+
+### Reading from a mobile device
+
+Sometimes I'm on my iPad or phone, and I want to save the link I'm reading. I
+should be able to stash documents, URLs, etc., in the artifact repository. This
+implies a remote endpoint where I can enter a URL and a tag, and have them
+entered into the artifact repository later.
+
+### Cataloging artifacts
+
+If I've entered a bunch of artifacts, I should be able to see a list of ones
+that need categorizing or that aren't attached to a node.
+
+### Autotagging
+
+The interface should search the text of a note to identify any tags. This
+brings up an important feature: notes consist of cells, and each cell has a
+type. The primary use case is to support markdown formatting and code blocks,
+while not touching the code blocks during autotagging. For example,
+
+```
+---
+node: today.2022.02.21
+---
+
+I figured out how to get Cayley running in production.
+
+\```
+cayleyd --some flag --other flag
+\```
+```
+
+The exocortex would see Cayley, identify that as a node, and add the tags for
+that node to this one. It might see production and add that as a tag, e.g. for
+ops-related stuff.
+
+### Fast capture
+
+I should be able to enter a quick note, which would go under a daily node tree.
+Something like `quick.2022-02-27.1534`.
+
+This would get autotagged. Quick notes might also get a metadata tag indicating
+whether I went back and integrated them into the rest of the knowledge graph.
+
+One way I could use this might be to text or email a note, or to have a quick
+capture program on my computer.
+
+
+
+## Requirements & Assumptions
+
+What should it do? What assumptions are being made about it? What's
+considered "in scope" and what won't the project try to do?
+
+Does it need to be compatible with any existing solutions or systems?
+
+If it's a daemon, how are you going to manage it?
+
+What are the dependencies that are assumed to be available?
+
+## System Design
+
+### Major components
+
+The system has two logical components: the artifact repository and the
+knowledge graph.
+
+#### Artifact repository
+
+There should be, at a minimum, a local artifact repository. It will have its
+own tooling and UI for interaction, as well as being linked to the knowledge
+graph.
+
+Previous prototypes stored artifact metadata in SQLite, and the contents of the
+artifacts in a blob store. The blob store is a content-addressable system for
+retrieving arbitrary data. A remote option might use an S3-equivalent like
+Minio.
+
+#### Knowledge graph
+
+The knowledge graph stores nodes. The current model stores the graph in SQLite,
+using an external file sync tool (e.g. Syncthing) to sync the databases across
+machines.
+
+### Data model
+
+Previous prototypes used separate SQLite databases for the artifact repository
+and the knowledge graph.
+
+#### Single SQLite database
+
+The concern with a single SQLite database is that it would be accessed by two
+different systems, causing potential locking issues.
+
+This could be solved by a single unified backend server; this is the preferred
+approach.
+
+#### Split SQLite databases
+
+The original prototype split the databases for performance reasons. However,
+this was not based on any empirical evidence.
+
+The major downside to this is that tags and categories are not shared between
+the artifact repository and the knowledge graph. Categories might make sense
+for splitting; e.g. an artifact category might be 'PDF' while a node might have
+the category 'Research'. However, tags should be shared between both systems.
+
+#### PostgreSQL database
+
+Another option is to use PostgreSQL. This brings a heavy ops cost, while
+enabling a variety of replication and backup strategies.
+
+### Architectural overview
+
+[![The exocortex architecture](/files/i/t/exo-arch.jpg)](/files/i/exo-arch.jpg)
+
+There is a backend server, `exod`, that will have a gRPC endpoint for
+communicating with frontends. The approach allows for a reverse-proxy front end
+on a public server over Tailscale for remote devices. It also maintains a local
+blob store, the database, and a connection to a remote Minio server for backing
+up blobs and retrieving missing blobs.
+
+If a standard HTTP API is needed, it can be added in later. One potential use
+for this is retrieving blobs (e.g. `GET /artifacts/blob/id/...`).
+
+## Supportability
+
+### Failure scenarios
+
+#### Data corruption
+
+If the data is corrupted locally, a local import from the remote end would
+restore it. Alternatively, it may be restored from local backups.
+
+If the data is corrupted remotely, a local export to the remote end would
+restore it.
+
+### Platform support
+
+The main program should run primarily on Linux, but I'd like to be able
+to use it on my Windows desktop too.
+
+### Packaging and deployment
+
+## Security
+
+The gRPC endpoint should be authenticated. The system is intended to operate
+over localhost or a local network, so the use of TLS is probably untenable.
+[minica](https://github.com/jsha/minica) is an option, but then key rotation
+needs to be built in.
+
+A possible workaround is to enable authentication (HTTP basic auth will
+suffice) only on the reverse proxy, which will also have TLS.
+
+## Project Dependencies
+
+The software should rely on no external sources, except for the software
+packages that it uses. The risk from those packages can be mitigated with vendoring.
+
+## Open Issues
+
+* If I track each time a page was edited, does it make sense to roll this up?
+  e.g. I don't track edits to the second, but maybe to the hour or day.
+
+
+## Milestones
+
+1. Specifications
+   a. Write up a spec for the artifact repository data structures.
+   b. Write up a spec for the knowledge graph data structures.
+2. Core systems
+   a. Build the artifact repository server.
+   b. Build the backend for the knowledge graph.
+   c. Build rough CLI interfaces to both.
+3. Build the user interfaces.
+   a. Simple note taking.
+   b. Artifact upload and searching by tag, content type, title.
+
+## Review History
+
+This may not be applicable, but it's usually nice to have someone else
+sanity check this.
+
+Keep a table of who reviewed the doc and when, for in-person reviews. Consider
+having at least 1 in-person review.
+
+
+