Initial import.

2022-02-23 22:55:41 -08:00
commit 41a42c7a59
48 changed files with 1876 additions and 0 deletions

BIN
content/files/exo-arch.jpg Normal file (108 KiB)

Several other binary image files (35-326 KiB) were added; binary content not shown.

4
content/pages/about.md Normal file
View File

@@ -0,0 +1,4 @@
Title: About
Slug: about
This notebook covers a project to build my own knowledge management system.

View File

@@ -0,0 +1,75 @@
Title: Bullet Journal Notes
[![](/files/i/t/bullet_scan.jpg)](/files/i/bullet_scan.jpg)
The original notes from my bullet journal that I took on my exocortex
are found on pages 109-110 of that journal; page 108 contains journal
entries for 2021-02-08 and 2021-02-16, and page 111 starts with
2021-02-17. The first entry in the Mercurial logs for the first pass I
started last year was 2021-02-14.
* Goal
- Collect artifacts + notes → current knowledge
- Daily writeups
* Ex.
- Gemlog
- notes.org
- books/*.org
- PDF docs
* SCM is a red herring
* Evernote
- notes/folders
- clipper
- tags
- everything searchable
- synced (may be red herring)
* Quiver
- cell types
* Jupyter notebooks
- code
- markdown
* swolfram
- [archive](https://writings.stephenwolfram.com/2019/02/seeking-the-productive-life-some-details-of-my-personal-infrastructure/)
* Solution
- Minimal viable metadata
- Data
- Tags
- Artifact link
- Central artifact repo
- Hash/link
- Node type (article, notes, ...)
- Type: Node = Folder|Page|Artifact
- How to create a searchable index?
- Retain old copies
- Index header
- Date retrieved
- Doc ID
- Source
* Artifact header
- Needs to support history
- Doc ID
- Date retrieved / stored
- Artifact date
- Source
- Artifact type
- Tags
- Category
- Blobs
- Format
- Blob ID
* Central artifact repository
- Metadata index
- Blob store
- Upload interface
* Elements of an exocortex
- Artifacts
- Artifact repository
- Notes
- Structure
- UI
- Query
- Exploratory
- Presentation
- Update
- Locality
- Totality

View File

@@ -0,0 +1,7 @@
Title: Historical Notes
Slug: index
These pages track the origins of the exocortex.
* [Original bullet journal notes](/historical/bullet_notes)
* [On exocortices](/historical/on_exocortices)

View File

@@ -0,0 +1,299 @@
Title: On Exocortices
*This is from a gemlog post written on 2021-02-10.*
This is a rough draft on some thoughts about exocortices that has been
simmering in the back of my mind lately. The catalyst for writing it
was reading Stephen Wolfram's (with all caveats that come with reading
his posts) entry "Seeking the Productive Life: Some Details of My
Personal Infrastructure".
## Background
An exocortex is "a hypothetical artificial information-processing
system that would augment a brain's biological cognitive processes." I
have made many attempts at building my own, including
* A web-based wiki (including my own custom solution, gitit,
MediaWiki, and others).
* Org-mode based notes, including my current notes/notes.org system
(with subdirectories for other things such as book notes)
* Evernote / Notion
* The Quiver MacOS app
* Experimenting in building custom exocortex software (e.g. kortex)
* A daily weblog (e.g. the old ai6ua.net site) and gemlog to summarize
important knowledge gained that day.
Each of these has its own shortcomings and doesn't quite match up
with my expectations or desires. An exocortex must be a personalized
system adapted to its user to maximise knowledge capture.
Succinctly put, the goal of an
[exocortex](https://en.wiktionary.org/wiki/exocortex) is to collect
artifacts and notes (including daily notes), organize them, and allow
for written summaries of current snapshots of my knowledge. Put
another way, "artifacts + notes + graph structure = exocortex". Note
that a folder hierarchy is a tree, which is a form of directed
graph. Symlinks inside a folder act as edges to notes outside of that
folder, refining the graph structure.
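As a minimal illustration of that idea (the paths and filenames here
are hypothetical), a symlink adds a cross-folder edge on top of the
tree that the folder hierarchy already provides:

```
// Hypothetical illustration: a folder tree gives the hierarchy, and a
// symlink adds an extra edge from one branch to a note in another.
package main

import (
	"log"
	"os"
)

func main() {
	// notes/math/ and notes/physics/ are branches of the tree.
	if err := os.MkdirAll("notes/math", 0o755); err != nil {
		log.Fatal(err)
	}
	if err := os.MkdirAll("notes/physics", 0o755); err != nil {
		log.Fatal(err)
	}
	if err := os.WriteFile("notes/physics/quantum.org", []byte("* Quantum notes\n"), 0o644); err != nil {
		log.Fatal(err)
	}
	// The symlink is an edge from math/ to a note that lives under physics/.
	if err := os.Symlink("../physics/quantum.org", "notes/math/quantum.org"); err != nil {
		log.Fatal(err)
	}
}
```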
This writeup is an attempt at characterising and exploring the
exocortex problem space to capture my goals, serve as a foundation for
the construction of such a system, and, through discussion of the
problem space, tease out the structure of the problem to discover a
closer approximation to the idealized reality of an exocortex system.
## The elements of exocortices
The elements of an exocortex, briefly touched on above and expanded
below, include
* artifacts,
* the artifact repository,
* notes,
* structure,
* a query interface,
* an exploratory interface,
* a presentation interface,
* an update interface,
* locality, and
* totality.
### Artifacts
An artifact is any object that is not a textual writeup by me that
should be referenceable as part of the exocortex. A copy of a paper
from ArXiV might serve as an artifact. Importantly, artifacts must be
locally-available. They serve as a snapshot of some source of
knowledge, and should not be subject to link decay, future pay-walling
(or loss of access to a pay-walled system), or loss of
connectivity. An artifact should be timestamped: when was it captured?
When was the artifact created upstream? An artifact must also have
some associated upstream information --- how did it come to be in the
repository?
### The artifact repository
An artifact may be relevant to more than one field of interest;
accordingly, all artifacts should exist in a central repository. This
repository should support artifact histories (e.g. collecting updates
to artifacts, where the history is important in capturing a historical
view of knowledge), multiple formats (a book may exist in PDF, EPUB,
or other formats), and a mechanism for exploring, finding, and
updating docs. The repository must capture relevant metadata about
each artifact.
### Notes
A note is a written summary of a certain field. It should be in some
rich-text format that supports linking as well as basic
formatting. The ideal text format appears to be the org-mode format
given its rich formatting and ability to transition fluidly between
outline and full document; however, this may not be the final, most
effective format. A note is the distillation of artifacts into an
understandable form, providing avenues to discover specifics that may
need to be held in working memory only briefly.
### Structure
A structured format allows for fast and efficient knowledge
lookups. It grants the researcher a starting place with a set of rules
governing where and how things may be found. It imposes order over
chaos such that relevant kernels of knowledge may be retrieved and
examined in an expedient manner. The metaphor that humans seem to
adapt to the most readily is a graph structure, particularly those
that are generally hierarchical in nature.
### A query interface
The exocortex and the artifact repository both require a query
interface; they may be part of the same UI. A query UI allows a
researcher to pose questions of the exocortex, directly looking for
specific knowledge.
The four interfaces (query, exploration, presentation, and update) may
all be facets of the same interface, and they may benefit from a
cohesive and unified interface; however, it is important that all of
these use cases are considered and supported.
### An exploratory interface
The exploratory interface allows a researcher to meander through the
knowledge store, exploring topics and potentially identifying new
areas to push the knowledge sphere out further.
### A presentation interface
The presentation interface allows a set of notes to be shared with
others; it should be possible to include some or all artifacts
associated with these notes. For example, it may not be appropriate to
share a copy of a book with the presentation, but it may be
appropriate to share a copy of some of the supporting papers.
### An update interface
The update interface is where knowledge is added to the exocortex,
whether through capturing an artifact or writing notes.
### Locality
An exocortex must be localized to the user, with the full repository
available offline. Quick-input or scratch-pad notes might be available
remotely, but realistically, the cost of cloud storage and the
transfer sizes mean that having the full exocortex available remotely
is unlikely. Instead, a hybrid model that allows quick captures of
knowledge remotely, combined with a full exocortex on a local system,
is probably the best solution.
### Totality
An exocortex represents the sum of the user's knowledge. There aren't
separate exocortices for different areas. Everything I know should go
into my exocortex.
## Exploring the problem space
In order to map out the structure of an exocortex, it's useful to
review what has worked and what hasn't. Each alternative presented
will consider what worked and what didn't to clarify what an effective
exocortex looks like.
### Git-backed wikis and plaintext folders
At a high-level, wikis like Gitit and folders of plain-text (including
org-mode) data are roughly equivalent; the differences lie primarily
in how they are presented. Neither approach works well for indexing
or organizing artifacts, although some workarounds exist, such as a
scanner that adds notes to a SQLite database for improved search
performance.
Using a folder of org-mode notes is probably one of the better
note-taking interfaces that I have found; however, there is no notion
of an artifact repository without considerable manual work.
The main downsides to this approach are the lack of good query and
exploration UIs, along with the lack of a useful artifact
repository. The upsides are good updates and presentation interfaces.
### Evernote and Notion
Evernote (and also Notion) provides a unified, searchable interface
across multiple machines. Evernote in particular has a usable artifact
repository, although information about upstream sources isn't
available, nor is there metadata about the object or any support for
multiple formats and history.
Evernote is a paid service, and neither is particularly extensible to
a user's needs. Exploring the exocortex is difficult, as there's no
notion of an entry point. Presenting nodes is met with some success,
albeit limited.
### Quiver
Quiver is an excellent note-taking application; however, it is
MacOS-only. It does have some ability to import web pages, but in
general it lacks any idea of an artifact repository. The ability to
intersperse different cell types is good.
### Jupyter notebooks
Jupyter notebooks provide an excellent interface for interspersing
computational ideas with prose; there is no notion of an artifact
repository, however. Linking notebooks isn't supported, and there is
no overall structure besides manual hyperlinking and a directory
structure.
## The artifact repository
The artifact repository is one of the two pillars of the exocortex; it
stores the "first hand" sources of knowledge.
### The central index
The first part of an artifact repository is a central index that
provides
* references and linking to artifacts,
* a "blob" store that contains the artifacts, and
* some management interface that allows adding and editing metadata as
well as adding artifacts.
An artifact entry in the index contains, at a minimum,
* An artifact identifier
* Authorship information
The artifact identifier is used to associate all related artifacts
(e.g. previous revisions, different formats, etc.).
### Artifacts
An artifact consists of multiple components:
* A primary metadata entry that organizes artifacts
* Pointers to artifact "blobs"
* A historical record of changed blobs
The metadata header for an artifact should contain, at a minimum,
fields for
* Artifact identifier
* A list of revisions
Each artifact can have zero or more blobs associated. For example, a
physical book reference might not have a blob associated; an ebook
might have multiple blobs corresponding to different formats; and a
webpage snapshot may have multiple blobs representing revisions to the
page.
A blob header stores
* The artifact identifier
* The date retrieved or stored
* The date of the artifact itself
* The source
* Blob type information (e.g. a MIME type)
* A list of categories
* A list of tags
The headers should probably be stored in a database of some kind;
SQLite is a good example for the first iteration. Blobs themselves
will need to be stored on disk, probably in a format related to a hash
of the blob contents, such as in a content-addressable store (CAS).
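To make this concrete, here is a minimal Go sketch of the index
entry, the blob header, and a content-addressed blob path; the field
and function names are assumptions for illustration, not a settled
schema:

```
// Illustrative sketch only: names are assumptions based on the
// metadata listed above, not a finalized schema.
package artifact

import (
	"crypto/sha256"
	"encoding/hex"
	"path/filepath"
	"time"
)

// Artifact is the primary metadata entry that groups related blobs
// (revisions, alternate formats) under one identifier.
type Artifact struct {
	ID        string   // artifact identifier
	Authors   []string // authorship information
	Revisions []string // blob IDs, oldest to newest
}

// BlobHeader mirrors the per-blob metadata fields listed above.
type BlobHeader struct {
	ArtifactID   string
	Retrieved    time.Time // date retrieved or stored
	ArtifactDate time.Time // date of the artifact itself
	Source       string    // upstream source, e.g. a URL
	ContentType  string    // blob type information, e.g. a MIME type
	Categories   []string
	Tags         []string
}

// BlobPath derives a content-addressed location for a blob from the
// SHA-256 hash of its contents, e.g. blobs/ab/abcdef...
func BlobPath(root string, contents []byte) string {
	sum := sha256.Sum256(contents)
	digest := hex.EncodeToString(sum[:])
	return filepath.Join(root, "blobs", digest[:2], digest)
}
```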
## The exocortex
The exocortex consists of a graph database that links notes. At a
broad level, it should probably start with a root node that points to
broad fields. The update interface should allow manipulation of nodes
as graph nodes in addition to allowing for adding and editing notes. A
node might be thought of as "type node = Note | ArtifactLink". That
is, a note can link to other notes or to artifacts. A node's full
title is built from the path of nodes leading to it. For example,
consider the structure shown below:
[![](/files/i/t/on_exocortices_graph.jpg)](/files/i/on_exocortices_graph.jpg)
Different possibilities for naming note3 include:
* root->note2->note3
* root=>note2=>note3
* root/note2/note3
Personally, I prefer the arrow notation with the equals sign. Each note can
be shortened to a partial path; e.g. "note2=>note3". The title for
each note can be stored in a metadata entry.
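A small, hypothetical Go sketch of this naming scheme, using the
equals-arrow separator and partial-path shortening:

```
// Hypothetical helpers for deriving node titles from graph paths; the
// "=>" separator follows the naming convention discussed above.
package main

import (
	"fmt"
	"strings"
)

// title joins a path of node names into a full title,
// e.g. root=>note2=>note3.
func title(path []string) string {
	return strings.Join(path, "=>")
}

// shortTitle keeps only the last n segments, so a deep node can be
// shown as a partial path such as note2=>note3.
func shortTitle(path []string, n int) string {
	if n < len(path) {
		path = path[len(path)-n:]
	}
	return strings.Join(path, "=>")
}

func main() {
	path := []string{"root", "note2", "note3"}
	fmt.Println(title(path))         // root=>note2=>note3
	fmt.Println(shortTitle(path, 2)) // note2=>note3
}
```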
## Next steps
A first step is to start constructing an artifact repository. Once
this is in place, a suitable graph database (for example,
[cayley](https://github.com/cayleygraph/cayley)) should be identified,
and an exocortex core developed. User interfaces will necessarily be
developed alongside these systems.

4
content/pages/ls.md Normal file
View File

@@ -0,0 +1,4 @@
Title: Pages
* [Historical context](/historical/)
* [Design docs](/specs/)

View File

@@ -0,0 +1,238 @@
Title: Functional Spec for the Exocortex
Tags: specs
kExocortex is a tool for capturing and retaining knowledge, making it
searchable.
This is the initial top-level draft to sort out the high-level vision.
## Summary
The more you learn, the harder it is to recall specific things. Fortunately,
computers are generally pretty good at remembering things. kExocortex is
my attempt at building a knowledge graph for long-term memory.
In addition to having functionality like that of note-taking systems such as
[Dendron](https://dendron.so), I'd like to keep track of what I call artifacts.
An artifact is a source of some knowledge; it might be a PDF copy of a book, an
image, or a URL.
In a perfect world, I would have a local copy of everything with a remote backup.
The remote backup lets me restore the exocortex in the event of data loss.
## Usage sketches
### Research mode
If I am researching a topic, I have a top-level node that contains all the
research I'm working on. I can link artifacts to a note, including URLs. One of
the reasons it makes sense to attach a URL to a document is that I can reuse
them, as well as go back and search URLs based on tags or categories. It would
make sense to tag any artifacts with relevant tags from the note once it is saved.
For example, let's say that I am researching graph databases. In Dendron, this
note lives under `comp.database.graph`. I might find this O'Reilly book on
[Neo4J](https://go.neo4j.com/rs/710-RRC-335/images/Neo4j_Graph_Algorithms.pdf)
that discusses graph algorithms. I might link it here, and I might link it
under a Neo4J-specific node. I would store the PDF in an artifact repository,
adding relevant tags (such as "graph-database", "neo4j", "oreilly") and
categorize it under books, PDFs, comp/database/graph/neo4j.
Going forward, if I want to revisit the book, I don't have to find it online
again. It's easily accessible from the artifact repository.
The user interface for the knowledge graph should show a list of associated
artifacts.
Nodes are also timestamped; I am leaning towards keeping track of every time a
page was edited (but probably not the edits). If I know I was researching
graph databases last week, and I log the URLs I was reading as artifacts,
I have a better history of what I was reading.
### Reading from a mobile device
Sometimes I'm on my iPad or phone, and I want to save the link I'm reading. I
should be able to stash documents, URLs, etc., in the artifact repository. This
implies a remote endpoint where I can enter a URL and a tag and have them added
to the artifact repository later.
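A rough sketch of what that remote endpoint might look like, assuming a small Go
HTTP service; the `/capture` path and the `CaptureRequest` type are illustrative,
not part of any existing API:

```
// Hedged sketch: accept a URL and a tag, queue them for later
// ingestion into the artifact repository.
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

type CaptureRequest struct {
	URL string `json:"url"`
	Tag string `json:"tag"`
}

func main() {
	queue := make(chan CaptureRequest, 64) // pending captures

	http.HandleFunc("/capture", func(w http.ResponseWriter, r *http.Request) {
		var req CaptureRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		queue <- req // a real implementation would persist this
		w.WriteHeader(http.StatusAccepted)
	})

	// Drain the queue; entries would be written to the artifact
	// repository by a later ingestion pass.
	go func() {
		for req := range queue {
			log.Printf("queued capture: %s [%s]", req.URL, req.Tag)
		}
	}()

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```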
### Cataloging artifacts
If I've entered a bunch of artifacts, I should be able to see a list of ones
that need categorizing or that aren't attached to a node.
### Autotagging
The interface should search the text of a note to identify any tags. This
brings up an important feature: notes consist of cells, and each cell has a
type. The primary use case is to support markdown formatting and code blocks,
while not touching the code blocks during autotagging. For example,
```
---
node: today.2022.02.21
---
I figured out how to get Cayley running in production.
\```
cayleyd --some flag --other flag
\```
```
The exocortex would see Cayley, identify that as a node, and add the tags for
that node to this one. It might see production and add that as a tag, e.g. for
ops-related stuff.
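A hedged sketch of how the autotagging pass might work, assuming a plain Go
scan over the note's text: fenced code blocks are skipped, and any line that
mentions a known node contributes that node's tags. The `knownNodes` map stands
in for a lookup against the real knowledge graph:

```
package main

import (
	"fmt"
	"strings"
)

// autotag scans the note text line by line, toggling code-block state
// on fences so code is never tagged, and collects tags from any known
// node mentioned in the prose.
func autotag(note string, knownNodes map[string][]string) []string {
	var tags []string
	seen := map[string]bool{}
	inCode := false

	for _, line := range strings.Split(note, "\n") {
		if strings.HasPrefix(strings.TrimSpace(line), "```") {
			inCode = !inCode // entering or leaving a code block
			continue
		}
		if inCode {
			continue
		}
		for name, nodeTags := range knownNodes {
			if strings.Contains(strings.ToLower(line), strings.ToLower(name)) {
				for _, t := range nodeTags {
					if !seen[t] {
						seen[t] = true
						tags = append(tags, t)
					}
				}
			}
		}
	}
	return tags
}

func main() {
	note := "I figured out how to get Cayley running in production.\n" +
		"```\ncayleyd --some flag --other flag\n```\n"
	nodes := map[string][]string{"cayley": {"graph-database", "cayley"}}
	fmt.Println(autotag(note, nodes)) // [graph-database cayley]
}
```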
### Fast capture
I should be able to enter a quick note, which would go under a daily node tree.
Something like `quick.2022-02-27.1534`.
This would get autotagged. Quick notes might also get a metadata tag indicating
whether I went back and integrated them into the rest of the knowledge graph.
One way I could use this might be to text or email a note, or to have a quick
capture program on my computer.
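As a small sketch, the quick-note name could be derived directly from the
current time; the format string here is an assumption based on the
`quick.2022-02-27.1534` example above:

```
package main

import (
	"fmt"
	"time"
)

// quickNodeName builds a daily quick-capture node name from a timestamp.
func quickNodeName(t time.Time) string {
	return t.Format("quick.2006-01-02.1504")
}

func main() {
	fmt.Println(quickNodeName(time.Now())) // e.g. quick.2022-02-27.1534
}
```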
## Requirements & Assumptions
What should it do? What assumptions are being made about it? What's
considered "in scope" and what won't the project try to do?
Does it need to be compatible with any existing solutions or systems?
If it's a daemon, how are you going to manage it?
What are the dependencies that are assumed to be available?
## System Design
### Major components
The system has two logical components: the artifact repository and the
knowledge graph.
#### Artifact repository
There should be, at a minimum, a local artifact repository. It will have its
own tooling and UI for interaction, as well as being linked to the knowledge
graph.
Previous prototypes stored artifact metadata in SQLite, and the contents of the
artifacts in a blob store. The blob store is a content-addressable system for
retrieving arbitrary data. A remote option might use an S3-equivalent like
Minio.
#### Knowledge graph
The knowledge graph stores nodes. The current model stores the graph in SQLite,
using an external file sync (e.g. syncthing) to sync the databases across
machines.
### Data model
Previous prototypes used separate SQLite databases for the artifact repository
and the knowledge graph.
#### Single SQLite database
The concern with a single SQLite database is that it would be accessed by two
different systems, causing potential locking issues.
This could be solved by a single unified backend server; this is the preferred
approach.
#### Split SQLite databases
The original prototype split the databases for performance reasons. However,
this was not based on any empirical evidence.
The major downside to this is that tags and categories are not shared between
the artifact repository and the knowledge graph. Categories might make sense
for splitting; e.g. an artifact category might be 'PDF' while a node might have
the category 'Research'. However, tags should be shared between both systems.
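For illustration, a shared tagging schema in a single unified database might
look like the following; the table and column names are assumptions, not a
settled design:

```
// Sketch of a shared tagging schema for the unified-database option.
package schema

const sharedTags = `
CREATE TABLE IF NOT EXISTS tags (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);

-- Both artifacts and knowledge-graph nodes reference the same tags
-- table, so a tag applied in one system is visible in the other.
CREATE TABLE IF NOT EXISTS artifact_tags (
    artifact_id TEXT NOT NULL,
    tag_id      INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (artifact_id, tag_id)
);

CREATE TABLE IF NOT EXISTS node_tags (
    node_id TEXT NOT NULL,
    tag_id  INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (node_id, tag_id)
);
`
```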
#### PostgreSQL database
Another option is to use PostgreSQL. This brings a heavy ops cost, while
enabling a variety of replication and backup strategies.
### Architectural overview
[![The exocortex architecture](/files/i/t/exo-arch.jpg)](/files/i/exo-arch.jpg)
There is a backend server, `exod`, that will have a gRPC endpoint for
communicating with frontends. This approach allows for a reverse-proxy front end
on a public server over Tailscale for remote devices. It also maintains a local
blob store, the database, and a connection to a remote minio server for backing
up blobs and retrieving missing blobs.
If a standard HTTP API is needed, it can be added in later. One potential use
for this is for retrieving blobs (e.g. GET /artifacts/blob/id/...).
## Supportability
### Failure scenarios
#### Data corruption
If the data is corrupted locally, a local import from the remote end would
restore it. Alternatively, it may be restored from local backups.
If the data is corrupted remotely, a local export to the remote end would
restore it.
### Platform support
The main program would ideally run on Linux primarily, but I'd like to be able
to use it on my Windows desktop too.
### Packaging and deployment
## Security
The gRPC endpoint should be authenticated. The system is intended to operate
over localhost or a local network, so the use of TLS is probably untenable.
[minica](https://github.com/jsha/minica) is an option, but then key rotation
needs to be built in.
A possible workaround is to only enable authentication (HTTP basic auth will
suffice) on the reverse proxy, which will also have TLS.
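A minimal sketch of that workaround, assuming a Go reverse proxy that
terminates TLS and checks HTTP basic auth before forwarding to a local HTTP
backend; the addresses, credentials, and certificate paths are placeholders,
and proxying gRPC specifically would additionally need an HTTP/2-capable setup:

```
package main

import (
	"crypto/subtle"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	backend, err := url.Parse("http://127.0.0.1:8080")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(backend)

	// Enforce basic auth before anything reaches the backend.
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		user, pass, ok := r.BasicAuth()
		if !ok ||
			subtle.ConstantTimeCompare([]byte(user), []byte("kyle")) != 1 ||
			subtle.ConstantTimeCompare([]byte(pass), []byte("change-me")) != 1 {
			w.Header().Set("WWW-Authenticate", `Basic realm="exocortex"`)
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		proxy.ServeHTTP(w, r)
	})

	// The TLS keypair paths are placeholders for whatever the proxy host uses.
	log.Fatal(http.ListenAndServeTLS(":443", "cert.pem", "key.pem", handler))
}
```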
## Project Dependencies
The software should rely on no external sources, except for the software
packages that it uses. This can be mitigated with vendoring.
## Open Issues
* If I track each time a page was edited, does it make sense to roll this up?
e.g. I don't track edits to the second, but maybe to the hour or day.
## Milestones
1. Specifications
a. Write up spec for the artifact repository data structures.
b. Write up a spec for the knowledge graph data structures.
2. Core systems
a. Build the artifact repository server.
b. Build the backend for the knowledge graph.
c. Build rough CLI interfaces to both.
3. Build the user interfaces.
a. Simple note taking.
b. Artifact upload and searching by tag, content type, title.
## Review History
This may not be applicable, but it's usually nice to have someone else
sanity check this.
Keep a table of who reviewed the doc and when, for in-person reviews. Consider
having at least 1 in-person review.

View File

@@ -0,0 +1,4 @@
Title: Design docs
Tags: specs
* [Top-level functional spec](/specs/functional.html)

32
content/posts/20220223.md Normal file
View File

@@ -0,0 +1,32 @@
Title: 20220223
Slug: 20220223
Date: 2022-02-23 22:22 PST
Modified: 2022-02-23 22:25 PST
Category:
Tags: journal
Authors: kyle
Summary: Design work on blobs.
I finished writing out the basic blob database and structure types. The
blob was a structure like
```
type Blob struct {
ID string
Header *Header
ContentType string
Contents []byte
}
```
I decided to make the contents an `io.Reader` to better handle large files;
rather than loading them entirely into memory, we can stream them straight
through a buffer.
```
type Blob struct {
ID string
Header *Header
ContentType string
r io.Reader
}
```
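As a hedged sketch of why the `io.Reader` form helps (the `WriteBlob` helper
and on-disk layout are illustrative, and the `Header` field is omitted for
brevity), the contents can be streamed to disk with `io.Copy` instead of being
held in memory:

```
package blob

import (
	"io"
	"os"
	"path/filepath"
)

type Blob struct {
	ID          string
	ContentType string
	r           io.Reader
}

// WriteBlob streams the blob's contents into dir/ID without buffering
// the whole payload in memory.
func WriteBlob(dir string, b *Blob) error {
	f, err := os.Create(filepath.Join(dir, b.ID))
	if err != nil {
		return err
	}
	defer f.Close()

	_, err = io.Copy(f, b.r)
	return err
}
```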