#+TITLE: On Exocortices
#+AUTHOR: Kyle Isom

* Document history
  + [2021-02-10 Wed] first draft

* Background

  An exocortex is [[https://en.wiktionary.org/wiki/exocortex]["a
  hypothetical artificial information-processing system that would
  augment a brain's biological cognitive processes."]] I have made
  many attempts at building my own, including

  + A web-based wiki (including my own custom solution, [[https://github.com/jgm/gitit][gitit]],
    [[https://www.mediawiki.org/wiki/MediaWiki][MediaWiki]], and others)
  + Org-mode based notes, including my current =notes/notes.org= system
    (with subdirectories for other things such as book notes)
  + [[https://evernote.com/][Evernote]] / [[https://www.notion.so/][Notion.so]]
  + The [[https://happenapps.com/][Quiver macOS app]]
  + Experimenting in building custom exocortex software (e.g. kortex)
  + A daily weblog (e.g. the old ai6ua.net site) and gemlog to
    summarize important knowledge gained that day.

  Each of these has its own shortcomings, and none quite matches my
  expectations or desires. An exocortex must be a personalized
  system adapted to its user to maximise knowledge capture.

  Succinctly put, the goal of an exocortex is to collect artifacts and
  notes (including daily notes), organize them, and allow for written
  summaries of current snapshots of my knowledge. Put another way,
  /artifacts + notes + graph structure = exocortex/. Note that a folder
  hierarchy is a tree, which is a form of directed graph. Symlinks
  inside a folder act as edges to notes outside of that folder,
  refining the graph structure.
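
  As a rough illustration of the folder-as-graph idea, the sketch
  below walks a directory and emits one edge per entry, with symlinks
  contributing edges to wherever they point. The function name
  =folder_graph= and the folder and file names in =demo= are
  hypothetical, chosen only for this example.

#+begin_src python
import os
import tempfile


def folder_graph(root):
    """Return a set of (parent, child) edges for everything under root.

    Ordinary directory entries form the tree; a symlink adds an edge from
    its containing folder to the link's resolved target, which may live
    outside that folder.
    """
    root = os.path.realpath(root)
    edges = set()
    # os.walk does not descend into symlinked directories by default,
    # so the traversal terminates even if links form a cycle.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                edges.add((dirpath, os.path.realpath(path)))
            else:
                edges.add((dirpath, path))
    return edges


def demo():
    """Build a tiny hypothetical notes tree and return its edges."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = os.path.realpath(tmp)
        crypto = os.path.join(tmp, "crypto")
        math = os.path.join(tmp, "math")
        os.makedirs(crypto)
        os.makedirs(math)
        note = os.path.join(math, "groups.org")
        open(note, "w").close()
        # The symlink makes the math note reachable from crypto as well.
        os.symlink(note, os.path.join(crypto, "groups.org"))
        return {
            (os.path.relpath(a, tmp), os.path.relpath(b, tmp))
            for a, b in folder_graph(tmp)
        }
#+end_src

  In =demo=, the note under =math= ends up with two incoming edges,
  one from its own folder and one from =crypto= via the symlink, so
  the structure is no longer a pure tree.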

  This writeup is an attempt at characterising and exploring the
  exocortex problem space to capture my goals, serve as a foundation
  for the construction of such a system, and, through discussion of
  the problem space, tease out the structure of the problem to
  discover a closer approximation to the idealized reality of an
  exocortex system.

* The elements of exocortices

  The elements of an exocortex, briefly touched on above and expanded
  below, include

  + artifacts,
  + the artifact repository,
  + notes,
  + structure,
  + a query interface,
  + an exploratory interface,
  + a presentation interface,
  + an update interface, and
  + locality.

** Artifacts

   An artifact is any object, other than a textual writeup by me, that
   should be referenceable as part of the exocortex. A copy of a paper
   from arXiv might serve as an artifact. Importantly, artifacts must
   be locally available. They serve as a snapshot of some source of
   knowledge, and should not be subject to link decay, future
   pay-walling (or loss of access to a pay-walled system), or loss of
   connectivity. An artifact should be timestamped: when was it
   captured? When was the artifact created upstream? An artifact must
   also have some associated upstream information --- how did it come
   to be in the repository?

** The artifact repository

   An artifact may be relevant to more than one field of interest;
   accordingly, all artifacts should exist in a central
   repository. This repository should support artifact histories
   (e.g. collecting updates to artifacts, where the history is
   important in capturing a historical view of knowledge), multiple
   formats (a book may exist in PDF, EPUB, or other formats), and a
   mechanism for exploring, finding, and updating artifacts. The
   repository must capture relevant metadata about each artifact.

** Notes

   A note is a written summary of a certain field. It should be in
   some rich-text format that supports linking as well as basic
   formatting. The ideal text format appears to be the org-mode format
   given its rich formatting and ability to transition fluidly between
   outline and full document; however, this may not be the final, most
   effective format. A note is the distillation of artifacts into an
   understandable form, providing avenues to discover specifics that
   may need to be held in working memory only briefly.

** Structure

   A structured format allows for fast and efficient knowledge
   lookups. It grants the researcher a starting place with a set of
   rules governing where and how things may be found. It imposes order
   over chaos such that relevant kernels of knowledge may be retrieved
   and examined in an expedient manner. The metaphor that humans seem
   to adapt to most readily is a graph structure, particularly one
   that is generally hierarchical in nature.

** A query interface

   The exocortex and the artifact repository both require a query
   interface; they may be part of the same UI. A query UI allows a
   researcher to pose questions of the exocortex, directly looking for
   specific knowledge.

   The four interfaces (query, exploration, presentation, and update)
   may all be facets of the same interface, and they may benefit from
   a cohesive and unified interface; however, it is important that all
   of these use cases are considered and supported.

** An exploratory interface

   The exploratory interface allows a researcher to meander through
   the knowledge store, exploring topics and potentially identifying
   new areas to push the knowledge sphere out further.

** A presentation interface

   The presentation interface allows a set of notes to be shared with
   others; it should be possible to include some or all artifacts
   associated with these notes. For example, it may not be appropriate
   to share a copy of a book with the presentation, but it may be
   appropriate to share a copy of some of the supporting papers.

** An update interface

   The update interface is where knowledge is added to the exocortex,
   whether through capturing an artifact or writing notes.

** Locality

   An exocortex must be localized to the user, with the full
   repository available offline. Quick input or scratch pad notes
   might be available remotely, but realistically, the cost of cloud
   storage and the transfer sizes mean that having the full exocortex
   available remotely is unlikely. Instead, a hybrid model that
   combines quick remote capture of knowledge with a full exocortex
   on a local system is probably the best solution.

* Exploring the problem space

  In order to map out the structure of an exocortex, it's useful to
  review the alternatives I have tried. For each alternative
  presented, I consider what worked and what didn't to clarify what
  an effective exocortex looks like.

** Git-backed wikis and plaintext folders

   At a high level, wikis like Gitit and folders of plain-text
   (including org-mode) data are roughly equivalent; the differences
   lie primarily in how they are presented. Neither approach works
   well for indexing or organizing artifacts. Search can be improved
   with add-ons such as a scanner that adds notes to a SQLite
   database (for improved search performance).
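
   A scanner of that kind might look like the following minimal
   sketch. The table layout and the function names =index_notes= and
   =search_notes= are assumptions made for illustration, not a
   description of any existing tool.

#+begin_src python
import sqlite3
from pathlib import Path


def index_notes(db, notes_dir):
    """Scan notes_dir for .org files and (re)load them into a notes table."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS notes (path TEXT PRIMARY KEY, body TEXT)"
    )
    for path in sorted(Path(notes_dir).rglob("*.org")):
        body = path.read_text(encoding="utf-8")
        # INSERT OR REPLACE keeps one row per path, so rescans are idempotent.
        db.execute(
            "INSERT OR REPLACE INTO notes (path, body) VALUES (?, ?)",
            (str(path), body),
        )
    db.commit()


def search_notes(db, term):
    """Return paths of notes whose body contains term.

    This is a plain, case-sensitive substring match; it does no ranking.
    """
    rows = db.execute(
        "SELECT path FROM notes WHERE instr(body, ?) > 0 ORDER BY path",
        (term,),
    )
    return [row[0] for row in rows]
#+end_src

   Calling =index_notes(sqlite3.connect("notes.db"), "notes")= and
   then =search_notes= with a term would return the matching note
   paths. Note that this only indexes notes; it does nothing for
   artifacts.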

   Using a folder of org-mode notes is probably one of the better
   note-taking interfaces that I have found; however, there is no
   notion of an artifact repository without considerable manual work.

   The main downsides to this approach are the lack of good query and
   exploration UIs, along with the lack of a useful artifact
   repository. The upsides are good update and presentation
   interfaces.

** Evernote and Notion

   Evernote and Notion both provide a unified, searchable interface
   across multiple machines. Evernote in particular has a usable
   artifact repository, although it records no information about
   upstream sources or metadata about the object, and has no notion
   of multiple formats or history.

   Evernote is a paid service, and neither is particularly extensible
   to a user's needs. Exploring the exocortex is difficult, as there's
   no notion of an entry point. Presenting notes works, but only with
   limited success.

** Quiver

   Quiver is an excellent note-taking application; however, it is
   macOS-only. It does have some ability to import web pages, but in
   general it lacks any idea of an artifact repository. The ability to
   intersperse different cell types is good.

** Jupyter notebooks

   Jupyter notebooks provide an excellent interface for interspersing
   computational ideas with prose; there is no notion of an artifact
   repository, however. Linking notebooks isn't supported, and there
   is no overall structure besides manual hyperlinking and a directory
   structure.

* The artifact repository

  The artifact repository is one of the two pillars of the exocortex;
  it stores the "first-hand" sources of knowledge.

** The central index

  The first part of an artifact repository is a central index that
  provides

  + references and linking to artifacts,
  + a "blob" store that contains the artifacts, and
  + some management interface that allows adding and editing metadata
    as well as adding artifacts.

  An artifact entry in the index contains, at a minimum,

  + An artifact identifier
  + Authorship information

  The artifact identifier is used to associate all related artifacts
  (e.g. previous revisions and different formats).

** Artifacts

   An artifact consists of multiple components:

   + A primary metadata entry that organizes artifacts
   + Pointers to artifact "blobs"
   + A historical record of changed blobs

   The metadata header for an artifact should contain, at a minimum,
   fields for

   + Artifact identifier
   + A list of revisions

   Each artifact can have zero or more blobs associated. For example,
   a physical book reference might not have a blob associated; an
   ebook might have multiple blobs corresponding to different formats;
   and a webpage snapshot may have multiple blobs representing
   revisions to the page.

   A blob header stores

   + The artifact identifier
   + The date retrieved or stored
   + The date of the artifact itself
   + The source
   + Blob type information (e.g. a MIME type)
   + A list of categories
   + A list of tags
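
   The two headers above could be modelled roughly as follows. This
   is a sketch only: the field names are my own shorthand for the
   bullet points, and treating every blob (whether a new format or a
   new revision) as one entry in the revision list is an assumption.

#+begin_src python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class BlobHeader:
    artifact_id: str                     # the artifact this blob belongs to
    stored: datetime                     # date retrieved or stored
    source: str                          # where the blob came from, e.g. a URL
    mime_type: str                       # blob type information
    created: Optional[datetime] = None   # date of the artifact itself, if known
    categories: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)


@dataclass
class ArtifactHeader:
    artifact_id: str
    # An empty list models an artifact with no blobs, such as a
    # reference to a physical book.
    revisions: List[BlobHeader] = field(default_factory=list)

    def add_revision(self, blob):
        """Attach a blob header, refusing blobs from another artifact."""
        if blob.artifact_id != self.artifact_id:
            raise ValueError("blob belongs to a different artifact")
        self.revisions.append(blob)

    def latest(self):
        """Return the most recently stored blob header, or None."""
        if not self.revisions:
            return None
        return max(self.revisions, key=lambda blob: blob.stored)
#+end_src

   A physical book would be an =ArtifactHeader= with no revisions,
   while a web page snapshot would gain one =BlobHeader= per capture.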

  The headers should probably be stored in a database of some kind;
  SQLite is a good example for the first iteration. Blobs themselves
  will need to be stored on disk, probably in a format related to a
  hash of the blob contents, such as in a [[https://en.wikipedia.org/wiki/Content-addressable_storage][content-addressable store]]
  (CAS).
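
  A first iteration of that split (headers in SQLite, blobs on disk
  under a content hash) might be sketched as below. The choice of
  SHA-256, the two-character directory sharding, the =blobs= table
  layout, and the function names are all assumptions made for this
  example.

#+begin_src python
import hashlib
import sqlite3
from pathlib import Path


def blob_path(root, digest):
    """Map a hex digest to its location in the content-addressed tree."""
    # Shard by the first two hex digits so no single directory grows huge.
    return Path(root) / digest[:2] / digest[2:]


def store_blob(root, db, artifact_id, data, mime_type, source):
    """Write data under its SHA-256 digest and record a header row."""
    digest = hashlib.sha256(data).hexdigest()
    path = blob_path(root, digest)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
    db.execute(
        "CREATE TABLE IF NOT EXISTS blobs ("
        " digest TEXT, artifact_id TEXT, mime_type TEXT, source TEXT,"
        " stored TEXT DEFAULT CURRENT_TIMESTAMP,"
        " PRIMARY KEY (digest, artifact_id))"
    )
    # Storing identical contents twice for one artifact is a no-op.
    db.execute(
        "INSERT OR IGNORE INTO blobs (digest, artifact_id, mime_type, source)"
        " VALUES (?, ?, ?, ?)",
        (digest, artifact_id, mime_type, source),
    )
    db.commit()
    return digest


def load_blob(root, digest):
    """Read a blob back and verify that it still matches its address."""
    data = blob_path(root, digest).read_bytes()
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError("blob contents do not match digest")
    return data
#+end_src

  Because the address is derived from the contents, identical blobs
  are stored once, and =load_blob= can detect on-disk corruption by
  rehashing.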

* The exocortex

  The exocortex consists of a graph database that links notes. At a
  broad level, it should probably start with a root node that points
  to broad fields. The update interface should allow manipulation of
  nodes as graph nodes in addition to allowing for adding and editing
  notes. A node might be thought of as =type node = Note |
  ArtifactLink=. That is, a note can link to other notes or to
  artifacts. A proper node title is the sum of the paths. For example,
  consider the following structure:
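
  One hypothetical structure, sketched in Python, is shown below. The
  node names are invented purely for illustration, and reading "the
  sum of the paths" as the titles along each route from the root,
  joined with slashes, is an assumption rather than a settled design.

#+begin_src python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class ArtifactLink:
    title: str
    artifact_id: str  # identifier in the artifact repository


@dataclass
class Note:
    title: str
    links: List["Node"] = field(default_factory=list)


# The Python analogue of `type node = Note | ArtifactLink`.
Node = Union[Note, ArtifactLink]


def titles(root, target, prefix=()):
    """Return every root-to-target path, each joined with '/'.

    The same node can be reached along several paths, so it can have
    several full titles. Assumes the links contain no cycles.
    """
    path = prefix + (root.title,)
    found = ["/".join(path)] if root is target else []
    if isinstance(root, Note):
        for child in root.links:
            found.extend(titles(child, target, path))
    return found


def example():
    """Build a small hypothetical graph with one shared note."""
    paper = ArtifactLink("curve25519-paper", artifact_id="artifact-0001")
    ecc = Note("elliptic-curves", links=[paper])
    crypto = Note("cryptography", links=[ecc])
    maths = Note("mathematics", links=[ecc])
    root = Note("root", links=[crypto, maths])
    return root, ecc, paper
#+end_src

  Here the =elliptic-curves= note is linked from both =cryptography=
  and =mathematics=, so =titles= returns two full titles for it, one
  per path from the root.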


* Next steps

  A first step is to start constructing an artifact repository. Once
  this is in place, a suitable graph database (for example, [[https://github.com/cayleygraph/cayley][cayley]])
  should be identified, and an exocortex core developed. User
  interfaces will necessarily be developed alongside these systems.