Title: Functional Spec for the Exocortex
Tags: specs

kExocortex is a tool for capturing and retaining knowledge, making it
searchable.

This is the initial top-level draft to sort out the high-level vision.

## Summary

The more you learn, the harder it is to recall specific things. Fortunately,
computers are generally pretty good at remembering things. kExocortex is
my attempt at building a knowledge graph for long-term memory.

In addition to the functionality of notetaking systems like
[Dendron](https://dendron.so), I'd like to keep track of what I call artifacts.
An artifact is a source of some knowledge; it might be a PDF copy of a book, an
image, or a URL.

In a perfect world, I would have a local copy of everything with a remote backup.
The remote backup lets me restore the exocortex in the event of data loss.

## Usage sketches

### Research mode

If I am researching a topic, I have a top-level node that contains all the
research I'm working on. I can link artifacts to a note, including URLs. One of
the reasons it makes sense to attach URLs to a document is that I can reuse
them, as well as go back and search them by tag or category. It would make
sense to tag any artifacts with the relevant tags from the note once the note
is saved.

For example, let's say that I am researching graph databases. In Dendron, this
note lives under `comp.database.graph`. I might find this O'Reilly book on
[Neo4J](https://go.neo4j.com/rs/710-RRC-335/images/Neo4j_Graph_Algorithms.pdf)
that discusses graph algorithms. I might link it here, and I might link it
under a Neo4J-specific node. I would store the PDF in an artifact repository,
adding relevant tags (such as "graph-database", "neo4j", "oreilly") and
categorizing it under books, PDFs, and comp/database/graph/neo4j.

Going forward, if I want to revisit the book, I don't have to find it online
again. It's easily accessible from the artifact repository.

The user interface for the knowledge graph should show a list of associated
artifacts.

Nodes are also timestamped; I am leaning towards keeping track of every time a
page was edited (but probably not the edits themselves). If I know I was
researching graph databases last week, and I log the URLs I was reading as
artifacts, I have a better history of what I was reading.

### Reading from a mobile device

Sometimes I'm on my iPad or phone, and I want to save the link I'm reading. I
should be able to stash documents, URLs, etc., in the artifact repository. This
implies a remote endpoint to which I can submit a URL and a tag, and have that
entered into the artifact repository later.
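
A minimal sketch of what that endpoint might look like, as a plain HTTP
handler. The `/capture` route, the form fields, and the pending-log format are
placeholder assumptions, not a committed API:

```
package main

import (
	"fmt"
	"log"
	"net/http"
	"os"
)

// captureHandler accepts a URL and a tag, and appends them to a pending
// log. A later cataloging pass would move entries into the artifact
// repository proper. (Hypothetical sketch; nothing here is final.)
func captureHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "POST only", http.StatusMethodNotAllowed)
		return
	}
	url := r.FormValue("url")
	tag := r.FormValue("tag")
	if url == "" {
		http.Error(w, "missing url", http.StatusBadRequest)
		return
	}
	f, err := os.OpenFile("pending.log", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0600)
	if err != nil {
		http.Error(w, "storage error", http.StatusInternalServerError)
		return
	}
	defer f.Close()
	fmt.Fprintf(f, "%s\t%s\n", url, tag)
	w.WriteHeader(http.StatusNoContent)
}

func main() {
	http.HandleFunc("/capture", captureHandler)
	log.Fatal(http.ListenAndServe("localhost:8080", nil))
}
```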

### Cataloging artifacts

If I've entered a bunch of artifacts, I should be able to see a list of ones
that need categorizing or that aren't attached to a node.
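
As a sketch, with a hypothetical schema (an `artifacts` table with a nullable
`category` column, and an `artifact_nodes` link table; see the data model
discussion below), the uncataloged list is a single query:

```
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/mattn/go-sqlite3" // assumed SQLite driver
)

func main() {
	db, err := sql.Open("sqlite3", "exocortex.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Artifacts with no category, or with no link to any node.
	rows, err := db.Query(`
		SELECT a.id, a.title FROM artifacts a
		WHERE a.category IS NULL
		   OR NOT EXISTS (SELECT 1 FROM artifact_nodes n
		                  WHERE n.artifact_id = a.id)`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()
	for rows.Next() {
		var id int64
		var title string
		if err := rows.Scan(&id, &title); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%d\t%s\n", id, title)
	}
}
```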

### Autotagging

The interface should search the text of a note to identify any tags. This
brings up an important feature: notes consist of cells, and each cell has a
type. The primary use case is to support markdown formatting and code blocks,
while not touching the code blocks during autotagging. For example,

```
---
node: today.2022.02.21
---

I figured out how to get Cayley running in production.

\```
cayleyd --some flag --other flag
\```
```

The exocortex would see Cayley, identify that as a node, and add the tags for
that node to this one. It might see "production" and add that as a tag, e.g.
for ops-related stuff.
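
A sketch of cell-aware autotagging, assuming notes are already parsed into
typed cells; the cell types and the known-tag list here are placeholders:

```
package main

import (
	"fmt"
	"strings"
)

type CellType int

const (
	Markdown CellType = iota
	Code
)

type Cell struct {
	Type CellType
	Text string
}

// autotag scans only markdown cells for known tags; code cells are
// left untouched, per the spec.
func autotag(cells []Cell, known []string) []string {
	var tags []string
	seen := make(map[string]bool)
	for _, c := range cells {
		if c.Type != Markdown {
			continue
		}
		lower := strings.ToLower(c.Text)
		for _, t := range known {
			if !seen[t] && strings.Contains(lower, strings.ToLower(t)) {
				seen[t] = true
				tags = append(tags, t)
			}
		}
	}
	return tags
}

func main() {
	cells := []Cell{
		{Markdown, "I figured out how to get Cayley running in production."},
		{Code, "cayleyd --some flag --other flag"},
	}
	fmt.Println(autotag(cells, []string{"cayley", "production"}))
	// Output: [cayley production]
}
```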

### Fast capture

I should be able to enter a quick note, which would go under a daily node tree.
Something like `quick.2022-02-27.1534`.
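
Generating that node name is straightforward; a sketch, assuming the
`quick.<date>.<hhmm>` layout from the example above and ignoring collisions
within the same minute:

```
package main

import (
	"fmt"
	"time"
)

// quickNodeName builds a fast-capture node name like quick.2022-02-27.1534.
func quickNodeName(t time.Time) string {
	return fmt.Sprintf("quick.%s.%s", t.Format("2006-01-02"), t.Format("1504"))
}

func main() {
	fmt.Println(quickNodeName(time.Now()))
}
```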

This would get autotagged. Quick notes might also get a metadata tag indicating
whether I went back and integrated them into the rest of the knowledge graph.

One way I could use this might be to text or email a note, or to have a quick
capture program on my computer.

## Requirements & Assumptions

What should it do? What assumptions are being made about it? What's
considered "in scope" and what won't the project try to do?

Does it need to be compatible with any existing solutions or systems?

If it's a daemon, how are you going to manage it?

What are the dependencies that are assumed to be available?

## System Design

### Major components

The system has two logical components: the artifact repository and the
knowledge graph.

#### Artifact repository

There should be, at a minimum, a local artifact repository. It will have its
own tooling and UI for interaction, as well as being linked to the knowledge
graph.

Previous prototypes stored artifact metadata in SQLite, and the contents of the
artifacts in a blob store. The blob store is a content-addressable system for
retrieving arbitrary data. A remote option might use an S3-equivalent like
Minio.
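
As a sketch of the content-addressable idea: a blob's ID is derived from its
contents, so storing the same data twice yields the same ID. The two-character
fan-out layout below is an assumption, not a decided on-disk format:

```
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"log"
	"os"
	"path/filepath"
)

// Put stores data in the blob store rooted at root and returns its ID,
// the hex-encoded SHA-256 digest of the contents.
func Put(root string, data []byte) (string, error) {
	sum := sha256.Sum256(data)
	id := hex.EncodeToString(sum[:])
	dir := filepath.Join(root, id[:2]) // fan-out directory, e.g. "ab/"
	if err := os.MkdirAll(dir, 0700); err != nil {
		return "", err
	}
	return id, os.WriteFile(filepath.Join(dir, id), data, 0600)
}

// Get retrieves a blob by ID.
func Get(root, id string) ([]byte, error) {
	return os.ReadFile(filepath.Join(root, id[:2], id))
}

func main() {
	id, err := Put("/tmp/blobs", []byte("hello, exocortex"))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(id)
}
```

A nice side effect of content addressing is deduplication: uploading the same
PDF from two notes stores it once.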

#### Knowledge graph

The knowledge graph stores nodes. The current model stores the graph in SQLite,
using an external file sync (e.g. syncthing) to sync the databases across
machines.

### Data model

Previous prototypes used separate SQLite databases for the artifact repository
and the knowledge graph.

#### Single SQLite database

The concern with a single SQLite database is that it would be accessed by two
different systems, causing potential locking issues.

This could be solved by a single unified backend server; this is the preferred
approach.
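
A sketch of what a unified schema might look like, with a single shared `tags`
table referenced by both nodes and artifacts; all table and column names here
are illustrative, not a committed design:

```
package main

import (
	"database/sql"
	"log"

	_ "github.com/mattn/go-sqlite3" // assumed SQLite driver
)

const schema = `
CREATE TABLE IF NOT EXISTS tags (
	id   INTEGER PRIMARY KEY,
	name TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS nodes (
	id      INTEGER PRIMARY KEY,
	path    TEXT UNIQUE NOT NULL, -- e.g. comp.database.graph
	updated INTEGER NOT NULL      -- unix timestamp
);
CREATE TABLE IF NOT EXISTS artifacts (
	id       INTEGER PRIMARY KEY,
	blob_id  TEXT NOT NULL,       -- content address in the blob store
	title    TEXT,
	category TEXT                 -- e.g. 'PDF'; categories stay per-system
);
-- Shared tagging: both nodes and artifacts reference the same tags table.
CREATE TABLE IF NOT EXISTS node_tags (
	node_id INTEGER REFERENCES nodes(id),
	tag_id  INTEGER REFERENCES tags(id)
);
CREATE TABLE IF NOT EXISTS artifact_tags (
	artifact_id INTEGER REFERENCES artifacts(id),
	tag_id      INTEGER REFERENCES tags(id)
);`

func main() {
	db, err := sql.Open("sqlite3", "exocortex.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	if _, err := db.Exec(schema); err != nil {
		log.Fatal(err)
	}
}
```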

#### Split SQLite databases

The original prototype split the databases for performance reasons. However,
this was not based on any empirical evidence.

The major downside to this is that tags and categories are not shared between
the artifact repository and the knowledge graph. Categories might make sense
for splitting; e.g. an artifact category might be 'PDF' while a node might have
the category 'Research'. However, tags should be shared between both systems.

#### PostgreSQL database

Another option is to use postgres. This brings a heavy ops cost, while
enabling a variety of replication and backup strategies.

### Architectural overview

[![The exocortex architecture](/files/i/t/exo-arch.jpg)](/files/i/exo-arch.jpg)

There is a backend server, `exod`, that will have a gRPC endpoint for
communicating with frontends. This approach allows for a reverse-proxy front end
on a public server over Tailscale for remote devices. It also maintains a local
blob store, the database, and a connection to a remote minio server for backing
up blobs and retrieving missing blobs.
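
As an illustration of scope rather than a wire format, the gRPC surface might
cover operations like these. All names are placeholders; the actual service
would be defined in a `.proto` file:

```
package exo

import "context"

type Node struct {
	Path  string // e.g. "comp.database.graph"
	Cells []Cell
	Tags  []string
}

type Cell struct {
	Type string // "markdown" or "code"
	Text string
}

type Artifact struct {
	BlobID string // content address of the stored blob
	Title  string
	Tags   []string
}

// Exod is the backend surface used by all frontends: the CLI, the desktop
// UI, and the reverse-proxied remote capture endpoint.
type Exod interface {
	PutNode(ctx context.Context, n Node) error
	GetNode(ctx context.Context, path string) (Node, error)
	PutArtifact(ctx context.Context, a Artifact, blob []byte) (string, error)
	GetBlob(ctx context.Context, blobID string) ([]byte, error)
	Search(ctx context.Context, tags []string) ([]Node, error)
}
```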

If a standard HTTP API is needed, it can be added later. One potential use
for this is retrieving blobs (e.g. GET /artifacts/blob/id/...).
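
A sketch of that blob endpoint, assuming the fan-out blob layout sketched
earlier; the route and the store path are hypothetical:

```
package main

import (
	"log"
	"net/http"
	"os"
	"path/filepath"
	"strings"
)

// blobHandler serves GET /artifacts/blob/id/<id> straight from the
// local blob store.
func blobHandler(w http.ResponseWriter, r *http.Request) {
	id := strings.TrimPrefix(r.URL.Path, "/artifacts/blob/id/")
	if len(id) < 2 || strings.Contains(id, "/") {
		http.Error(w, "bad blob id", http.StatusBadRequest)
		return
	}
	data, err := os.ReadFile(filepath.Join("/var/lib/exod/blobs", id[:2], id))
	if err != nil {
		http.NotFound(w, r)
		return
	}
	w.Write(data)
}

func main() {
	http.HandleFunc("/artifacts/blob/id/", blobHandler)
	log.Fatal(http.ListenAndServe("localhost:8080", nil))
}
```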

## Supportability

### Failure scenarios

#### Data corruption

If the data is corrupted locally, an import from the remote end would
restore it. Alternatively, it may be restored from local backups.

If the data is corrupted remotely, an export from the local copy to the remote
end would restore it.

### Platform support

The main program would ideally run primarily on Linux, but I'd like to be able
to use it on my Windows desktop too.

### Packaging and deployment
## Security
|
||||
|
||||
The gRPC endpoint should be authenticated. The system is intended to operate
|
||||
over localhost or a local network, so the use of TLS is probably untenable.
|
||||
[minica](https://github.com/jsha/minica) is an option, but then key rotation
|
||||
needs to be built in.
|
||||
|
||||
A possible workaround is to only enable authentication (HTTP basic auth will
|
||||
suffice) on the reverse proxy, which will also have TLS.
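
In practice the basic auth would live in whatever reverse proxy gets deployed;
as an illustration, the equivalent middleware in Go is small. The credentials
here are placeholders:

```
package main

import (
	"crypto/subtle"
	"log"
	"net/http"
)

// basicAuth wraps a handler with HTTP basic auth, using constant-time
// comparison to avoid leaking credential contents via timing.
func basicAuth(user, pass string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		u, p, ok := r.BasicAuth()
		if !ok ||
			subtle.ConstantTimeCompare([]byte(u), []byte(user)) != 1 ||
			subtle.ConstantTimeCompare([]byte(p), []byte(pass)) != 1 {
			w.Header().Set("WWW-Authenticate", `Basic realm="exocortex"`)
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok\n"))
	})
	log.Fatal(http.ListenAndServe("localhost:8081",
		basicAuth("kyle", "example-password", mux)))
}
```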

## Project Dependencies

The software should rely on no external sources, except for the software
packages that it uses. Even that dependency can be mitigated with vendoring.

## Open Issues

* If I track each time a page was edited, does it make sense to roll this up?
  e.g. I don't track edits to the second, but maybe to the hour or day
  (sketched below).
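
A sketch of the hour-granularity roll-up using `time.Truncate`; both edits
below collapse into a single retained timestamp:

```
package main

import (
	"fmt"
	"time"
)

func main() {
	edits := []time.Time{
		time.Date(2022, 2, 27, 15, 34, 12, 0, time.UTC),
		time.Date(2022, 2, 27, 15, 51, 3, 0, time.UTC),
	}
	seen := make(map[time.Time]bool)
	for _, e := range edits {
		h := e.Truncate(time.Hour) // roll up: both edits collapse to 15:00
		if !seen[h] {
			seen[h] = true
			fmt.Println(h)
		}
	}
}
```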

## Milestones

1. Specifications
   a. Write up a spec for the artifact repository data structures.
   b. Write up a spec for the knowledge graph data structures.
2. Core systems
   a. Build the artifact repository server.
   b. Build the backend for the knowledge graph.
   c. Build rough CLI interfaces to both.
3. User interfaces
   a. Simple note taking.
   b. Artifact upload and searching by tag, content type, and title.

## Review History

This may not be applicable, but it's usually nice to have someone else
sanity-check this.

Keep a table of who reviewed the doc and when, for in-person reviews. Consider
having at least one in-person review.