Initial import.

2022-02-23 22:55:41 -08:00
commit 41a42c7a59
48 changed files with 1876 additions and 0 deletions

BIN
content/files/exo-arch.jpg Normal file (108 KiB)

Several other binary image files (35-326 KiB) were added; binary content not shown.

4
content/pages/about.md Normal file
View File

@@ -0,0 +1,4 @@
Title: About
Slug: about
This notebook covers a project to build my own knowledge management system.

View File

@@ -0,0 +1,75 @@
Title: Bullet Journal Notes
[![](/files/i/t/bullet_scan.jpg)](/files/i/bullet_scan.jpg)
The original notes from my bullet journal that I took on my exocortex
are found on pages 109-110 of that journal; page 108 contains journal
entries for 2021-02-08 and 2021-02-16, and page 111 starts with
2021-02-17. The first entry in the Mercurial logs for the first pass I
started last year was 2021-02-14.
* Goal
- Collect artifacts + notes → current knowledge
- Daily writeups
* Ex.
- Gemlog
- notes.org
- books/*.org
- PDF docs
* SCM is a red herring
* Evernote
- notes/folders
- clipper
- tags
- everything searchable
- synced (may be red herring)
* Quiver
- cell types
* Jupyter notebooks
- code
- markdown
* swolfram
- [archive](https://writings.stephenwolfram.com/2019/02/seeking-the-productive-life-some-details-of-my-personal-infrastructure/)
* Solution
- Minimal viable metadata
- Data
- Tags
- Artifact link
- Central artifact repo
- Hash/link
- Node type (article, notes, ...)
- Type: Node = Folder|Page|Artifact
- How to create a searchable index?
- Retain old copies
- Index header
- Date retrieved
- Doc ID
- Source
* Artifact header
- Needs to support history
- Doc ID
- Date retrieved / stored
- Artifact date
- Source
- Artifact type
- Tags
- Category
- Blobs
- Format
- Blob ID
* Central artifact repository
- Metadata index
- Blob store
- Upload interface
* Elements of an exocortex
- Artifacts
- Artifact repository
- Notes
- Structure
- UI
- Query
- Exploratory
- Presentation
- Update
- Locality
- Totality

View File

@@ -0,0 +1,7 @@
Title: Historical Notes
Slug: index
These pages track the origins of the exocortex.
* [Original bullet journal notes](/historical/bullet_notes)
* [On exocortices](/historical/on_exocortices)

View File

@@ -0,0 +1,299 @@
Title: On Exocortices
*This is from a gemlog post written on 2021-02-10.*
This is a rough draft on some thoughts about exocortices that has been
simmering in the back of my mind lately. The catalyst for writing it
was reading Stephen Wolfram's (with all caveats that come with reading
his posts) entry "Seeking the Productive Life: Some Details of My
Personal Infrastructure".
## Background
An exocortex is "a hypothetical artificial information-processing
system that would augment a brain's biological cognitive processes." I
have made many attempts at building my own, including
* A web-based wiki (including my own custom solution, gitit,
MediaWiki, and others).
* Org-mode based notes, including my current notes/notes.org system
(with subdirectories for other things such as book notes)
* Evernote / Notion
* The Quiver MacOS app
* Experimenting in building custom exocortex software (e.g. kortex)
* A daily weblog (e.g. the old ai6ua.net site) and gemlog to summarize
important knowledge gained that day.
Each of these has its own shortcomings and doesn't quite match up
with my expectations or desires. An exocortex must be a personalized
system adapted to its user to maximise knowledge capture.
Succinctly put, the goal of an
[exocortex](https://en.wiktionary.org/wiki/exocortex) is to collect
artifacts and notes (including daily notes), organize them, and allow
for written summaries of current snapshots of my knowledge. Put
another way, "artifacts + notes + graph structure = exocortex". Note
that a folder hierarchy is a tree, which is a form of directed
graph. Symlinks inside a folder act as edges to notes outside of that
folder, refining the graph structure.
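As a minimal illustration of that idea (the paths and filenames here
are hypothetical), a symlink adds a cross-folder edge on top of the
tree that the folder hierarchy already provides:

```
// Hypothetical illustration: a folder tree gives the hierarchy, and a
// symlink adds an extra edge from one branch to a note in another.
package main

import (
	"log"
	"os"
)

func main() {
	// notes/math/ and notes/physics/ are branches of the tree.
	if err := os.MkdirAll("notes/math", 0o755); err != nil {
		log.Fatal(err)
	}
	if err := os.MkdirAll("notes/physics", 0o755); err != nil {
		log.Fatal(err)
	}
	if err := os.WriteFile("notes/physics/quantum.org", []byte("* Quantum notes\n"), 0o644); err != nil {
		log.Fatal(err)
	}
	// The symlink is an edge from math/ to a note that lives under physics/.
	if err := os.Symlink("../physics/quantum.org", "notes/math/quantum.org"); err != nil {
		log.Fatal(err)
	}
}
```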
This writeup is an attempt at characterising and exploring the
exocortex problem space to capture my goals, serve as a foundation for
the construction of such a system, and, through discussion of the
problem space, tease out the structure of the problem to discover a
closer approximation to the idealized reality of an exocortex system.
## The elements of exocortices
The elements of an exocortex, briefly touched on above and expanded
below, include
* artifacts,
* the artifact repository,
* notes,
* structure,
* a query interface,
* an exploratory interface,
* a presentation interface,
* an update interface,
* locality, and
* totality.
### Artifacts
An artifact is any object that is not a textual writeup by me that
should be referenceable as part of the exocortex. A copy of a paper
from ArXiV might serve as an artifact. Importantly, artifacts must be
locally-available. They serve as a snapshot of some source of
knowledge, and should not be subject to link decay, future pay-walling
(or loss of access to a pay-walled system), or loss of
connectivity. An artifact should be timestamped: when was it captured?
When was the artifact created upstream? An artifact must also have
some associated upstream information --- how did it come to be in the
repository?
### The artifact repository
An artifact may be relevant to more than one field of interest;
accordingly, all artifacts should exist in a central repository. This
repository should support artifact histories (e.g. collecting updates
to artifacts, where the history is important in capturing a historical
view of knowledge), multiple formats (a book may exist in PDF, EPUB,
or other formats), and a mechanism for exploring, finding, and
updating docs. The repository must capture relevant metadata about
each artifact.
### Notes
A note is a written summary of a certain field. It should be in some
rich-text format that supports linking as well as basic
formatting. The ideal text format appears to be the org-mode format
given its rich formatting and ability to transition fluidly between
outline and full document; however, this may not be the final, most
effective format. A note is the distillation of artifacts into an
understandable form, providing avenues to discover specifics that may
need to be held in working memory only briefly.
### Structure
A structured format allows for fast and efficient knowledge
lookups. It grants the researcher a starting place with a set of rules
governing where and how things may be found. It imposes order over
chaos such that relevant kernels of knowledge may be retrieved and
examined in an expedient manner. The metaphor that humans seem to
adapt to the most readily is a graph structure, particularly those
that are generally hierarchical in nature.
### A query interface
The exocortex and the artifact repository both require a query
interface; they may be part of the same UI. A query UI allows a
researcher to pose questions of the exocortex, directly looking for
specific knowledge.
The four interfaces (query, exploration, presentation, and update) may
all be facets of the same interface, and they may benefit from a
cohesive and unified interface; however, it is important that all of
these use cases are considered and supported.
### An exploratory interface
The exploratory interface allows a researcher to meander through the
knowledge store, exploring topics and potentially identifying new
areas to push the knowledge sphere out further.
### A presentation interface
The presentation interface allows a set of notes to be shared with
others; it should be possible to include some or all artifacts
associated with these notes. For example, it may not be appropriate to
share a copy of a book with the presentation, but it may be
appropriate to share a copy of some of the supporting papers.
### An update interface
The update interface is where knowledge is added to the exocortex,
whether through capturing an artifact or writing notes.
### Locality
An exocortex must be localized to the user, with the full repository
available offline. Quick-input or scratch-pad notes might be available
remotely, but realistically, the cost of cloud storage and the
transfer sizes mean that having the full exocortex available remotely
is unlikely. Instead, a hybrid model that allows quick captures of
knowledge remotely, combined with a full exocortex on a local system,
is probably the best solution.
### Totality
An exocortex represents the sum of the user's knowledge. There aren't
separate exocortices for different areas. Everything I know should go
into my exocortex.
## Exploring the problem space
In order to map out the structure of an exocortex, it's useful to
review what has worked and what hasn't. Each alternative presented
will consider what worked and what didn't to clarify what an effective
exocortex looks like.
### Git-backed wikis and plaintext folders
At a high-level, wikis like Gitit and folders of plain-text (including
org-mode) data are roughly equivalent; the differences lie primarily
in how they are presented. Neither approach works well for indexing
or organizing artifacts, although some workarounds exist, such as a
scanner that adds notes to a SQLite database for improved search
performance.
Using a folder of org-mode notes is probably one of the better
note-taking interfaces that I have found; however, there is no notion
of an artifact repository without considerable manual work.
The main downsides to this approach are the lack of good query and
exploration UIs, along with the lack of a useful artifact
repository. The upsides are good updates and presentation interfaces.
### Evernote and Notion
Evernote (and also Notion) provides a unified, searchable interface
across multiple machines. Evernote in particular has a usable artifact
repository, although information about upstream sources isn't
available, nor is there metadata about the object or any support for
multiple formats and history.
Evernote is a paid service, and neither is particularly extensible to
a user's needs. Exploring the exocortex is difficult, as there's no
notion of an entry point. Presenting nodes is met with some success,
albeit limited.
### Quiver
Quiver is an excellent note-taking application; however, it is
MacOS-only. It does have some ability to import web pages, but in
general it lacks any idea of an artifact repository. The ability to
intersperse different cell types is good.
### Jupyter notebooks
Jupyter notebooks provide an excellent interface for interspersing
computational ideas with prose; there is no notion of an artifact
repository, however. Linking notebooks isn't supported, and there is
no overall structure besides manual hyperlinking and a directory
structure.
## The artifact repository
The artifact repository is one of the two pillars of the exocortex; it
stores the "first hand" sources of knowledge.
### The central index
The first part of an artifact repository is a central index that
provides
* references and linking to artifacts,
* a "blob" store that contains the artifacts, and
* some management interface that allows adding and editing metadata as
well as adding artifacts.
An artifact entry in the index contains, at a minimum,
* An artifact identifier
* Authorship information
The artifact identifier is used to associate all related artifacts
(e.g. previous revisions, different formats, etc.).
### Artifacts
An artifact consists of multiple components:
* A primary metadata entry that organizes artifacts
* Pointers to artifact "blobs"
* A historical record of changed blobs
The metadata header for an artifact should contain, at a minimum,
fields for
* Artifact identifier
* A list of revisions
Each artifact can have zero or more blobs associated. For example, a
physical book reference might not have a blob associated; an ebook
might have multiple blobs corresponding to different formats; and a
webpage snapshot may have multiple blobs representing revisions to the
page.
A blob header stores
* The artifact identifier
* The date retrieved or stored
* The date of the artifact itself
* The source
* Blob type information (e.g. a MIME type)
* A list of categories
* A list of tags
The headers should probably be stored in a database of some kind;
SQLite is a good example for the first iteration. Blobs themselves
will need to be stored on disk, probably in a format related to a hash
of the blob contents, such as in a content-addressable store (CAS).
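To make this concrete, here is a minimal Go sketch of the index
entry, the blob header, and a content-addressed blob path; the field
and function names are assumptions for illustration, not a settled
schema:

```
// Illustrative sketch only: names are assumptions based on the
// metadata listed above, not a finalized schema.
package artifact

import (
	"crypto/sha256"
	"encoding/hex"
	"path/filepath"
	"time"
)

// Artifact is the primary metadata entry that groups related blobs
// (revisions, alternate formats) under one identifier.
type Artifact struct {
	ID        string   // artifact identifier
	Authors   []string // authorship information
	Revisions []string // blob IDs, oldest to newest
}

// BlobHeader mirrors the per-blob metadata fields listed above.
type BlobHeader struct {
	ArtifactID   string
	Retrieved    time.Time // date retrieved or stored
	ArtifactDate time.Time // date of the artifact itself
	Source       string    // upstream source, e.g. a URL
	ContentType  string    // blob type information, e.g. a MIME type
	Categories   []string
	Tags         []string
}

// BlobPath derives a content-addressed location for a blob from the
// SHA-256 hash of its contents, e.g. blobs/ab/abcdef...
func BlobPath(root string, contents []byte) string {
	sum := sha256.Sum256(contents)
	digest := hex.EncodeToString(sum[:])
	return filepath.Join(root, "blobs", digest[:2], digest)
}
```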
## The exocortex
The exocortex consists of a graph database that links notes. At a
broad level, it should probably start with a root node that points to
broad fields. The update interface should allow manipulation of nodes
as graph nodes in addition to allowing for adding and editing notes. A
node might be thought of as "type node = Note | ArtifactLink". That
is, a note can link to other notes or to artifacts. A node's full
title is built from the path of nodes leading to it. For example,
consider the structure shown below:
[![](/files/i/t/on_exocortices_graph.jpg)](/files/i/on_exocortices_graph.jpg)
Different possibilities for naming note3 include:
* root->note2->note3
* root=>note2=>note3
* root/note2/note3
Personally, I prefer the arrow notation with the equals sign. Each note can
be shortened to a partial path; e.g. "note2=>note3". The title for
each note can be stored in a metadata entry.
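A small, hypothetical Go sketch of this naming scheme, using the
equals-arrow separator and partial-path shortening:

```
// Hypothetical helpers for deriving node titles from graph paths; the
// "=>" separator follows the naming convention discussed above.
package main

import (
	"fmt"
	"strings"
)

// title joins a path of node names into a full title,
// e.g. root=>note2=>note3.
func title(path []string) string {
	return strings.Join(path, "=>")
}

// shortTitle keeps only the last n segments, so a deep node can be
// shown as a partial path such as note2=>note3.
func shortTitle(path []string, n int) string {
	if n < len(path) {
		path = path[len(path)-n:]
	}
	return strings.Join(path, "=>")
}

func main() {
	path := []string{"root", "note2", "note3"}
	fmt.Println(title(path))         // root=>note2=>note3
	fmt.Println(shortTitle(path, 2)) // note2=>note3
}
```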
## Next steps
A first step is to start constructing an artifact repository. Once
this is in place, a suitable graph database (for example,
[cayley](https://github.com/cayleygraph/cayley)) should be identified,
and an exocortex core developed. User interfaces will necessarily be
developed alongside these systems.

4
content/pages/ls.md Normal file
View File

@@ -0,0 +1,4 @@
Title: Pages
* [Historical context](/historical/)
* [Design docs](/specs/)

View File

@@ -0,0 +1,238 @@
Title: Functional Spec for the Exocortex
Tags: specs
kExocortex is a tool for capturing and retaining knowledge, making it
searchable.
This is the initial top-level draft to sort out the high-level vision.
## Summary
The more you learn, the harder it is to recall specific things. Fortunately,
computers are generally pretty good at remembering things. kExocortex is
my attempt at building a knowledge graph for long-term memory.
In addition to having functionality like that of note-taking systems such as
[Dendron](https://dendron.so), I'd like to keep track of what I call artifacts.
An artifact is a source of some knowledge; it might be a PDF copy of a book, an
image, or a URL.
In a perfect world, I would have a local copy of everything with a remote backup.
The remote backup lets me restore the exocortex in the event of data loss.
## Usage sketches
### Research mode
If I am researching a topic, I have a top-level node that contains all the
research I'm working on. I can link artifacts to a note, including URLs. One of
the reasons it makes sense to attach a URL to a document is that I can reuse
them, as well as go back and search URLs based on tags or categories. It would
make sense to tag any artifacts with relevant tags from the note once it is saved.
For example, let's say that I am researching graph databases. In Dendron, this
note lives under `comp.database.graph`. I might find this O'Reilly book on
[Neo4J](https://go.neo4j.com/rs/710-RRC-335/images/Neo4j_Graph_Algorithms.pdf)
that discusses graph algorithms. I might link it here, and I might link it
under a Neo4J-specific node. I would store the PDF in an artifact repository,
adding relevant tags (such as "graph-database", "neo4j", "oreilly") and
categorize it under books, PDFs, comp/database/graph/neo4j.
Going forward, if I want to revisit the book, I don't have to find it online
again. It's easily accessible from the artifact repository.
The user interface for the knowledge graph should show a list of associated
artifacts.
Nodes are also timestamped; I am leaning towards keeping track of every time a
page was edited (but probably not the edits). If I know I was researching
graph databases last week, and I log the URLs I was reading as artifacts,
I have a better history of what I was reading.
### Reading from a mobile device
Sometimes I'm on my iPad or phone, and I want to save the link I'm reading. I
should be able to stash documents, URLs, etc., in the artifact repository. This
implies a remote endpoint where I can enter a URL and a tag and have them added
to the artifact repository later.
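A rough sketch of what that remote endpoint might look like, assuming a small Go
HTTP service; the `/capture` path and the `CaptureRequest` type are illustrative,
not part of any existing API:

```
// Hedged sketch: accept a URL and a tag, queue them for later
// ingestion into the artifact repository.
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

type CaptureRequest struct {
	URL string `json:"url"`
	Tag string `json:"tag"`
}

func main() {
	queue := make(chan CaptureRequest, 64) // pending captures

	http.HandleFunc("/capture", func(w http.ResponseWriter, r *http.Request) {
		var req CaptureRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		queue <- req // a real implementation would persist this
		w.WriteHeader(http.StatusAccepted)
	})

	// Drain the queue; entries would be written to the artifact
	// repository by a later ingestion pass.
	go func() {
		for req := range queue {
			log.Printf("queued capture: %s [%s]", req.URL, req.Tag)
		}
	}()

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```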
### Cataloging artifacts
If I've entered a bunch of artifacts, I should be able to see a list of ones
that need categorizing or that aren't attached to a node.
### Autotagging
The interface should search the text of a note to identify any tags. This
brings up an important feature: notes consist of cells, and each cell has a
type. The primary use case is to support markdown formatting and code blocks,
while not touching the code blocks during autotagging. For example,
```
---
node: today.2022.02.21
---
I figured out how to get Cayley running in production.
\```
cayleyd --some flag --other flag
\```
```
The exocortex would see Cayley, identify that as a node, and add the tags for
that node to this one. It might see production and add that as a tag, e.g. for
ops-related stuff.
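A hedged sketch of how the autotagging pass might work, assuming a plain Go
scan over the note's text: fenced code blocks are skipped, and any line that
mentions a known node contributes that node's tags. The `knownNodes` map stands
in for a lookup against the real knowledge graph:

```
package main

import (
	"fmt"
	"strings"
)

// autotag scans the note text line by line, toggling code-block state
// on fences so code is never tagged, and collects tags from any known
// node mentioned in the prose.
func autotag(note string, knownNodes map[string][]string) []string {
	var tags []string
	seen := map[string]bool{}
	inCode := false

	for _, line := range strings.Split(note, "\n") {
		if strings.HasPrefix(strings.TrimSpace(line), "```") {
			inCode = !inCode // entering or leaving a code block
			continue
		}
		if inCode {
			continue
		}
		for name, nodeTags := range knownNodes {
			if strings.Contains(strings.ToLower(line), strings.ToLower(name)) {
				for _, t := range nodeTags {
					if !seen[t] {
						seen[t] = true
						tags = append(tags, t)
					}
				}
			}
		}
	}
	return tags
}

func main() {
	note := "I figured out how to get Cayley running in production.\n" +
		"```\ncayleyd --some flag --other flag\n```\n"
	nodes := map[string][]string{"cayley": {"graph-database", "cayley"}}
	fmt.Println(autotag(note, nodes)) // [graph-database cayley]
}
```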
### Fast capture
I should be able to enter a quick note, which would go under a daily node tree.
Something like `quick.2022-02-27.1534`.
This would get autotagged. Quick notes might also get a metadata tag indicating
whether I went back and integrated them into the rest of the knowledge graph.
One way I could use this might be to text or email a note, or to have a quick
capture program on my computer.
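As a small sketch, the quick-note name could be derived directly from the
current time; the format string here is an assumption based on the
`quick.2022-02-27.1534` example above:

```
package main

import (
	"fmt"
	"time"
)

// quickNodeName builds a daily quick-capture node name from a timestamp.
func quickNodeName(t time.Time) string {
	return t.Format("quick.2006-01-02.1504")
}

func main() {
	fmt.Println(quickNodeName(time.Now())) // e.g. quick.2022-02-27.1534
}
```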
## Requirements & Assumptions
What should it do? What assumptions are being made about it? What's
considered "in scope" and what won't the project try to do?
Does it need to be compatible with any existing solutions or systems?
If it's a daemon, how are you going to manage it?
What are the dependencies that are assumed to be available?
## System Design
### Major components
The system has two logical components: the artifact repository and the
knowledge graph.
#### Artifact repository
There should be, at a minimum, a local artifact repository. It will have its
own tooling and UI for interaction, as well as being linked to the knowledge
graph.
Previous prototypes stored artifact metadata in SQLite, and the contents of the
artifacts in a blob store. The blob store is a content-addressable system for
retrieving arbitrary data. A remote option might use an S3-equivalent like
Minio.
#### Knowledge graph
The knowledge graph stores nodes. The current model stores the graph in SQLite,
using an external file sync (e.g. syncthing) to sync the databases across
machines.
### Data model
Previous prototypes used separate SQLite databases for the artifact repository
and the knowledge graph.
#### Single SQLite database
The concern with a single SQLite database is that it would be accessed by two
different systems, causing potential locking issues.
This could be solved by a single unified backend server; this is the preferred
approach.
#### Split SQLite databases
The original prototype split the databases for performance reasons. However,
this was not based on any empirical evidence.
The major downside to this is that tags and categories are not shared between
the artifact repository and the knowledge graph. Categories might make sense
for splitting; e.g. an artifact category might be 'PDF' while a node might have
the category 'Research'. However, tags should be shared between both systems.
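For illustration, a shared tagging schema in a single unified database might
look like the following; the table and column names are assumptions, not a
settled design:

```
// Sketch of a shared tagging schema for the unified-database option.
package schema

const sharedTags = `
CREATE TABLE IF NOT EXISTS tags (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);

-- Both artifacts and knowledge-graph nodes reference the same tags
-- table, so a tag applied in one system is visible in the other.
CREATE TABLE IF NOT EXISTS artifact_tags (
    artifact_id TEXT NOT NULL,
    tag_id      INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (artifact_id, tag_id)
);

CREATE TABLE IF NOT EXISTS node_tags (
    node_id TEXT NOT NULL,
    tag_id  INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (node_id, tag_id)
);
`
```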
#### PostgreSQL database
Another option is to use PostgreSQL. This brings a heavy ops cost, while
enabling a variety of replication and backup strategies.
### Architectural overview
[![The exocortex architecture](/files/i/t/exo-arch.jpg)](/files/i/exo-arch.jpg)
There is a backend server, `exod`, that will have a gRPC endpoint for
communicating with frontends. This approach allows for a reverse-proxy front end
on a public server over Tailscale for remote devices. It also maintains a local
blob store, the database, and a connection to a remote minio server for backing
up blobs and retrieving missing blobs.
If a standard HTTP API is needed, it can be added in later. One potential use
for this is for retrieving blobs (e.g. GET /artifacts/blob/id/...).
## Supportability
### Failure scenarios
#### Data corruption
If the data is corrupted locally, a local import from the remote end would
restore it. Alternatively, it may be restored from local backups.
If the data is corrupted remotely, a local export to the remote end would
restore it.
### Platform support
The main program would ideally run on Linux primarily, but I'd like to be able
to use it on my Windows desktop too.
### Packaging and deployment
## Security
The gRPC endpoint should be authenticated. The system is intended to operate
over localhost or a local network, so the use of TLS is probably untenable.
[minica](https://github.com/jsha/minica) is an option, but then key rotation
needs to be built in.
A possible workaround is to only enable authentication (HTTP basic auth will
suffice) on the reverse proxy, which will also have TLS.
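A minimal sketch of that workaround, assuming a Go reverse proxy that
terminates TLS and checks HTTP basic auth before forwarding to a local HTTP
backend; the addresses, credentials, and certificate paths are placeholders,
and proxying gRPC specifically would additionally need an HTTP/2-capable setup:

```
package main

import (
	"crypto/subtle"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	backend, err := url.Parse("http://127.0.0.1:8080")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(backend)

	// Enforce basic auth before anything reaches the backend.
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		user, pass, ok := r.BasicAuth()
		if !ok ||
			subtle.ConstantTimeCompare([]byte(user), []byte("kyle")) != 1 ||
			subtle.ConstantTimeCompare([]byte(pass), []byte("change-me")) != 1 {
			w.Header().Set("WWW-Authenticate", `Basic realm="exocortex"`)
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		proxy.ServeHTTP(w, r)
	})

	// The TLS keypair paths are placeholders for whatever the proxy host uses.
	log.Fatal(http.ListenAndServeTLS(":443", "cert.pem", "key.pem", handler))
}
```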
## Project Dependencies
The software should rely on no external sources, except for the software
packages that it uses. This can be mitigated with vendoring.
## Open Issues
* If I track each time a page was edited, does it make sense to roll this up?
e.g. I don't track edits to the second, but maybe to the hour or day.
## Milestones
1. Specifications
a. Write up spec for the artifact repository data structures.
b. Write up a spec for the knowledge graph data structures.
2. Core systems
a. Build the artifact repository server.
b. Build the backend for the knowledge graph.
c. Build rough CLI interfaces to both.
3. Build the user interfaces.
a. Simple note taking.
b. Artifact upload and searching by tag, content type, title.
## Review History
This may not be applicable, but it's usually nice to have someone else
sanity check this.
Keep a table of who reviewed the doc and when, for in-person reviews. Consider
having at least 1 in-person review.

View File

@@ -0,0 +1,4 @@
Title: Design docs
Tags: specs
* [Top-level functional spec](/specs/functional.html)

32
content/posts/20220223.md Normal file
View File

@@ -0,0 +1,32 @@
Title: 20220223
Slug: 20220223
Date: 2022-02-23 22:22 PST
Modified: 2022-02-23 22:25 PST
Category:
Tags: journal
Authors: kyle
Summary: Design work on blobs.
I finished writing out the basic blob database and structure types. The
blob was a structure like
```
type Blob struct {
ID string
Header *Header
ContentType string
Contents []byte
}
```
I decided to make the contents an `io.Reader` to better handle large files;
rather than loading them entirely into memory, we can stream them straight
through a buffer.
```
type Blob struct {
ID string
Header *Header
ContentType string
r io.Reader
}
```
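As a hedged sketch of why the `io.Reader` form helps (the `WriteBlob` helper
and on-disk layout are illustrative, and the `Header` field is omitted for
brevity), the contents can be streamed to disk with `io.Copy` instead of being
held in memory:

```
package blob

import (
	"io"
	"os"
	"path/filepath"
)

type Blob struct {
	ID          string
	ContentType string
	r           io.Reader
}

// WriteBlob streams the blob's contents into dir/ID without buffering
// the whole payload in memory.
func WriteBlob(dir string, b *Blob) error {
	f, err := os.Create(filepath.Join(dir, b.ID))
	if err != nil {
		return err
	}
	defer f.Close()

	_, err = io.Copy(f, b.r)
	return err
}
```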