Title: Functional Spec for the Exocortex
Tags: specs

kExocortex is a tool for capturing and retaining knowledge, making it
searchable.

This is the initial top-level draft to sort out the high-level vision.

## Summary

The more you learn, the harder it is to recall specific things. Fortunately,
computers are generally pretty good at remembering things. kExocortex is
my attempt at building a knowledge graph for long-term memory.

In addition to offering the functionality of notetaking systems such as
[Dendron](https://dendron.so), I'd like to keep track of what I call artifacts.
An artifact is a source of some knowledge; it might be a PDF copy of a book, an
image, or a URL.

In a perfect world, I would have a local copy of everything with a remote backup.
The remote backup lets me restore the exocortex in the event of data loss.

## Usage sketches

### Research mode

If I am researching a topic, I have a top-level node that contains all the
research I'm working on. I can link artifacts to a note, including URLs. One of
the reasons it makes sense to attach URLs to a document is that I can reuse
them, as well as go back and search URLs based on tags or categories. It would
make sense to tag any artifacts with relevant tags from the note once it is saved.

For example, let's say that I am researching graph databases. In Dendron, this
note lives under `comp.database.graph`. I might find this O'Reilly book on
[Neo4J](https://go.neo4j.com/rs/710-RRC-335/images/Neo4j_Graph_Algorithms.pdf)
that discusses graph algorithms. I might link it here, and I might link it
under a Neo4J-specific node. I would store the PDF in an artifact repository,
adding relevant tags (such as "graph-database", "neo4j", "oreilly") and
categorizing it under books, PDFs, comp/database/graph/neo4j.

Going forward, if I want to revisit the book, I don't have to find it online
again. It's easily accessible from the artifact repository.

The user interface for the knowledge graph should show a list of associated
artifacts.

Nodes are also timestamped; I am leaning towards keeping track of every time a
page was edited (but probably not the edits themselves). If I know I was
researching graph databases last week, and I log the URLs I was reading as
artifacts, I have a better history of what I was reading.

### Reading from a mobile device

Sometimes I'm on my iPad or phone, and I want to save the link I'm reading. I
should be able to stash documents, URLs, etc., in the artifact repository. This
implies a remote endpoint where I can enter a URL and a tag, and have them
entered into the artifact repository later.

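As a sketch of what that endpoint might look like: a minimal HTTP handler that
accepts a URL and a tag and hands them off for later ingestion. The route, the
field names, and the queue are all assumptions for illustration, not part of
any existing implementation.

```
// Hypothetical capture endpoint: accepts a URL and a tag over HTTP and
// queues them for later ingestion into the artifact repository.
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// capture is the payload a mobile device would POST.
type capture struct {
	URL string `json:"url"`
	Tag string `json:"tag"`
}

func main() {
	queue := make(chan capture, 64) // buffered hand-off to the ingester

	http.HandleFunc("/capture", func(w http.ResponseWriter, r *http.Request) {
		var c capture
		if err := json.NewDecoder(r.Body).Decode(&c); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		queue <- c
		w.WriteHeader(http.StatusAccepted)
	})

	// Stand-in for the real ingester, which would write to the repository.
	go func() {
		for c := range queue {
			log.Printf("queued %s [%s]", c.URL, c.Tag)
		}
	}()
	log.Fatal(http.ListenAndServe("localhost:8080", nil))
}
```
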
### Cataloging artifacts

If I've entered a bunch of artifacts, I should be able to see a list of ones
that need categorizing or that aren't attached to a node.

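Under a SQLite-backed prototype, this view is a single query. The table and
column names below are assumptions about the schema, sketched here only to
show the shape of the query.

```
// needsAttention lists artifacts with no category assigned or no linked
// node. All table and column names here are assumed for illustration.
const needsAttention = `
SELECT a.id, a.title
  FROM artifacts a
 WHERE NOT EXISTS (SELECT 1 FROM artifact_categories c
                    WHERE c.artifact_id = a.id)
    OR NOT EXISTS (SELECT 1 FROM node_artifacts n
                    WHERE n.artifact_id = a.id);`
```
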
### Autotagging

The interface should search the text of a note to identify any tags. This
brings up an important feature: notes consist of cells, and each cell has a
type. The primary use case is to support markdown formatting and code blocks,
while not touching the code blocks during autotagging. For example,

```
---
node: today.2022.02.21
---

I figured out how to get Cayley running in production.

\```
cayleyd --some flag --other flag
\```
```

The exocortex would see "Cayley", identify that as a node, and add the tags for
that node to this one. It might see "production" and add that as a tag, e.g. for
ops-related stuff.

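A minimal sketch of that scan, assuming a `Cell` type with a type field and a
precomputed set of known tags (both assumptions on my part); the one firm
requirement it encodes is that code cells are never scanned.

```
// Sketch of autotagging over cells. The Cell shape and the known-tags
// lookup are assumptions; the point is that code cells are skipped.
package autotag

import "strings"

type Cell struct {
	Type string // "markdown" or "code"
	Text string
}

// Suggest returns known tags found in the markdown cells of a note,
// e.g. node names like "cayley" or terms like "production".
func Suggest(cells []Cell, known map[string]bool) []string {
	seen := map[string]bool{}
	var tags []string
	for _, c := range cells {
		if c.Type == "code" {
			continue // never autotag inside code blocks
		}
		for _, w := range strings.Fields(strings.ToLower(c.Text)) {
			w = strings.Trim(w, ".,!?:;\"'()")
			if known[w] && !seen[w] {
				seen[w] = true
				tags = append(tags, w)
			}
		}
	}
	return tags
}
```
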
### Fast capture

I should be able to enter a quick note, which would go under a daily node tree.
Something like `quick.2022-02-27.1534`.

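Generating that node name is just a formatting of the current time; a minimal
sketch (the function name is mine):

```
package quick

import "time"

// NodeName builds a daily-tree node name like "quick.2022-02-27.1534".
func NodeName(t time.Time) string {
	return "quick." + t.Format("2006-01-02.1504")
}
```
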
This would get autotagged. Quick notes might also get a metadata tag indicating
whether I went back and integrated them into the rest of the knowledge graph.

One way I could use this might be to text or email a note, or to have a quick
capture program on my computer.

## Requirements & Assumptions

What should it do? What assumptions are being made about it? What's
considered "in scope" and what won't the project try to do?

Does it need to be compatible with any existing solutions or systems?

If it's a daemon, how are you going to manage it?

What are the dependencies that are assumed to be available?

## System Design

### Major components

The system has two logical components: the artifact repository and the
knowledge graph.

#### Artifact repository

There should be, at a minimum, a local artifact repository. It will have its
own tooling and UI for interaction, as well as being linked to the knowledge
graph.

Previous prototypes stored artifact metadata in SQLite, and the contents of the
artifacts in a blob store. The blob store is a content-addressable system for
retrieving arbitrary data. A remote option might use an S3-equivalent like
Minio.

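The core of a content-addressable store is small: the blob's ID is a hash of
its contents, so identical artifacts deduplicate for free. A minimal sketch,
assuming SHA-256 as the hash and a prefix fan-out layout (both my assumptions,
not something the prototypes are known to use):

```
// Minimal content-addressable blob store: the blob's ID is the SHA-256
// of its contents, stored under a two-character prefix directory.
package blob

import (
	"crypto/sha256"
	"encoding/hex"
	"os"
	"path/filepath"
)

// Put writes data under root and returns its content address.
func Put(root string, data []byte) (string, error) {
	sum := sha256.Sum256(data)
	id := hex.EncodeToString(sum[:])
	dir := filepath.Join(root, id[:2]) // fan out by hash prefix
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return "", err
	}
	return id, os.WriteFile(filepath.Join(dir, id), data, 0o644)
}
```
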
#### Knowledge graph

The knowledge graph stores nodes. The current model stores the graph in SQLite,
using an external file sync (e.g. syncthing) to sync the databases across
machines.

### Data model

Previous prototypes used separate SQLite databases for the artifact repository
and the knowledge graph.

#### Single SQLite database

The concern with a single SQLite database is that it would be accessed by two
different systems, causing potential locking issues.

This could be solved by a single unified backend server; this is the preferred
approach.

#### Split SQLite databases

The original prototype split the databases for performance reasons. However,
this was not based on any empirical evidence.

The major downside to this is that tags and categories are not shared between
the artifact repository and the knowledge graph. Categories might make sense
for splitting; e.g. an artifact category might be 'PDF' while a node might have
the category 'Research'. However, tags should be shared between both systems.

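In the single-database design, sharing tags falls out of the schema: one tags
table, with separate link tables per system. The names below are assumptions
sketched for illustration, not a committed schema.

```
// One way a single database could share tags between both systems,
// while keeping node and artifact associations separate.
const sharedSchema = `
CREATE TABLE IF NOT EXISTS tags (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS node_tags (
    node_id INTEGER NOT NULL,
    tag_id  INTEGER NOT NULL REFERENCES tags(id)
);
CREATE TABLE IF NOT EXISTS artifact_tags (
    artifact_id INTEGER NOT NULL,
    tag_id      INTEGER NOT NULL REFERENCES tags(id)
);`
```
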
#### PostgreSQL database

Another option is to use Postgres. This brings a heavy ops cost, while
enabling a variety of replication and backup strategies.

### Architectural overview

[![The exocortex architecture](/files/i/t/exo-arch.jpg)](/files/i/exo-arch.jpg)

There is a backend server, `exod`, that will have a gRPC endpoint for
communicating with frontends. This approach allows for a reverse-proxy front end
on a public server over Tailscale for remote devices. It also maintains a local
blob store, the database, and a connection to a remote Minio server for backing
up blobs and retrieving missing blobs.

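The gRPC surface isn't specified yet; as a rough sketch of what `exod` might
expose, here it is as a Go interface rather than a finished proto. Every
method and type name below is an assumption.

```
// Rough sketch of the surface exod might expose over gRPC; every name
// here is an assumption, not a finished proto definition.
package exod

import "context"

type Exocortex interface {
	// Knowledge graph
	PutNode(ctx context.Context, node []byte) (id string, err error)
	GetNode(ctx context.Context, id string) ([]byte, error)

	// Artifact repository
	PutArtifact(ctx context.Context, blob []byte, tags []string) (id string, err error)
	GetArtifact(ctx context.Context, id string) ([]byte, error)
	SearchArtifacts(ctx context.Context, tag string) ([]string, error)
}
```
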
If a standard HTTP API is needed, it can be added later. One potential use
for this is retrieving blobs (e.g. `GET /artifacts/blob/id/...`).

## Supportability

### Failure scenarios

#### Data corruption

If the data is corrupted locally, a local import from the remote end would
restore it. Alternatively, it may be restored from local backups.

If the data is corrupted remotely, a local export to the remote end would
restore it.

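Because blobs are content-addressed, import and export both reduce to a set
difference over blob IDs. A sketch under that assumption, with `Store` as a
stand-in interface that both sides would implement:

```
// Restore as set difference: copy every blob ID the source has that
// the destination is missing. Store is a hypothetical stand-in.
package restore

type Store interface {
	List() ([]string, error) // all blob IDs
	Get(id string) ([]byte, error)
	Put(id string, data []byte) error
}

// Import copies blobs present in src but missing from dst. Export is
// the same call with the arguments swapped.
func Import(dst, src Store) error {
	have := map[string]bool{}
	ids, err := dst.List()
	if err != nil {
		return err
	}
	for _, id := range ids {
		have[id] = true
	}
	srcIDs, err := src.List()
	if err != nil {
		return err
	}
	for _, id := range srcIDs {
		if have[id] {
			continue
		}
		data, err := src.Get(id)
		if err != nil {
			return err
		}
		if err := dst.Put(id, data); err != nil {
			return err
		}
	}
	return nil
}
```
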
### Platform support

The main program would primarily run on Linux, but I'd like to be able
to use it on my Windows desktop too.

### Packaging and deployment

## Security

The gRPC endpoint should be authenticated. The system is intended to operate
over localhost or a local network, so the use of TLS is probably untenable.
[minica](https://github.com/jsha/minica) is an option, but then key rotation
needs to be built in.

A possible workaround is to only enable authentication (HTTP basic auth will
suffice) on the reverse proxy, which will also have TLS.

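That workaround fits in the standard library; a minimal sketch, where the
upstream address, credentials, and certificate paths are all placeholders:

```
// Sketch of the reverse-proxy workaround: basic auth in front of the
// backend, with TLS terminated here. All addresses, credentials, and
// cert paths below are placeholders.
package main

import (
	"crypto/subtle"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	upstream, _ := url.Parse("http://127.0.0.1:4000") // assumed exod address
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	auth := func(w http.ResponseWriter, r *http.Request) {
		user, pass, ok := r.BasicAuth()
		if !ok ||
			subtle.ConstantTimeCompare([]byte(user), []byte("user")) != 1 ||
			subtle.ConstantTimeCompare([]byte(pass), []byte("change-me")) != 1 {
			w.Header().Set("WWW-Authenticate", `Basic realm="exocortex"`)
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		proxy.ServeHTTP(w, r)
	}
	log.Fatal(http.ListenAndServeTLS(":443", "cert.pem", "key.pem",
		http.HandlerFunc(auth)))
}
```
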
## Project Dependencies

The software should rely on no external sources, except for the software
packages that it uses. That dependency can be mitigated with vendoring.

## Open Issues

* If I track each time a page was edited, does it make sense to roll this up?
  e.g. I don't track edits to the second, but maybe to the hour or day.

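If rollup is the answer, it's a one-line truncation of the edit timestamp; a
sketch of the hourly case:

```
package edits

import "time"

// rollUp truncates an edit timestamp to the hour, so repeated edits
// within the same hour collapse to one recorded entry.
func rollUp(t time.Time) time.Time {
	return t.Truncate(time.Hour)
}
```
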
## Milestones

1. Specifications
   a. Write up a spec for the artifact repository data structures.
   b. Write up a spec for the knowledge graph data structures.
2. Core systems
   a. Build the artifact repository server.
   b. Build the backend for the knowledge graph.
   c. Build rough CLI interfaces to both.
3. Build the user interfaces.
   a. Simple note taking.
   b. Artifact upload and searching by tag, content type, title.

## Review History

This may not be applicable, but it's usually nice to have someone else
sanity check this.

Keep a table of who reviewed the doc and when, for in-person reviews. Consider
having at least one in-person review.