Inline all data type definitions (Go structs, protobuf messages), the full SQLite schema (11 tables), CAS directory layout, and the dbObject interface directly into ARCHITECTURE.md so it is self-contained and does not depend on cross-references to docs/. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Architecture

Technical reference for kExocortex, a personal knowledge management system that combines an **artifact repository** (source documents) with a **knowledge graph** (notes and ideas) into a unified, searchable exocortex.

Core formula: **artifacts + notes + graph structure = exocortex**

## Tech Stack

| Role | Technology |
|------|-----------|
| Backend server, CLI tools | Go |
| Desktop application | Kotlin |
| Metadata storage | SQLite |
| Blob storage | Content-addressable store (SHA256) |
| Client-server communication | gRPC / Protobuf |
| Remote blob backup | Minio |
| Secure remote access | Tailscale |

## System Components

```
┌──────────────────┐       gRPC       ┌──────────────────────────┐
│  Kotlin Desktop  │◄════════════════►│                          │
│ (all UI facets)  │                  │                          │
└──────────────────┘                  │           exod           │
                                      │       (Go daemon)        │
┌──────────────────┐       gRPC       │                          │
│    CLI tools     │◄════════════════►│  sole owner of all data  │
│  (Go binaries)   │                  │                          │
└──────────────────┘                  └─────┬──────┬──────┬──────┘
                                            │      │      │
                              ┌─────────────┘      │      └─────────────┐
                              │                    │                    │
                        ┌─────▼──────┐      ┌──────▼───────┐     ┌──────▼──────┐
                        │   SQLite   │      │  Local Blob  │     │    Minio    │
                        │  Database  │      │ Store (CAS)  │     │  (remote)   │
                        └────────────┘      └──────────────┘     └─────────────┘

Remote access:
┌────────┐   HTTPS   ┌─────────────────────┐  Tailscale  ┌──────┐
│ Mobile │──────────►│    Reverse Proxy    │────────────►│ exod │
│ Device │           │ (TLS + basic auth)  │             │      │
└────────┘           └─────────────────────┘             └──────┘
```

Three runtime components exist:

- **exod**: The Go backend daemon. Sole owner of the SQLite database and blob store. All reads and writes go through exod. No client accesses storage directly.
- **Kotlin desktop app**: A single application for both artifact management and knowledge graph interaction. Obsidian-style layout: tree/outline sidebar for navigation, contextual main panel, graph visualization, and unified search with selector prefixes. Connects to exod via gRPC.
- **CLI tools**: Go binaries for scripting, bulk operations, and administrative tasks. Also connect via gRPC.

## Data Model

### Shared Types

Common to all objects across both pillars (the artifact repository and the knowledge graph).

```go
// Header is attached to every persistent object.
type Header struct {
	ID         string // UUID
	Type       ObjectType
	Created    int64
	Modified   int64
	Categories []string
	Tags       []string
	Meta       Metadata
}

// Value is a single metadata entry: its contents plus a type hint.
type Value struct {
	Contents string
	Type     string // e.g. "string", "int", "unspecified"
}

// Metadata is a flexible key-value store for arbitrary attributes.
type Metadata map[string]Value
```

All timestamps are stored in UTC and must support dates before the Unix epoch (for example, the publication date of a historical text). Clients convert local time to UTC before sending it to the server.
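
To make the pre-epoch requirement concrete, here is a minimal sketch. It assumes timestamps travel as signed Unix seconds in an `int64`; the unit is an assumption, since the document only fixes the field type. The helper names `toUnixUTC`, `fromUnixUTC`, and `historicalExample` are hypothetical and not part of the codebase.

```go
package main

import (
	"fmt"
	"time"
)

// toUnixUTC converts a time in any location to signed Unix seconds.
// Unix seconds are location-independent, so a local time and its UTC
// equivalent yield the same number. Dates before 1970 come out negative.
func toUnixUTC(t time.Time) int64 {
	return t.UTC().Unix()
}

// fromUnixUTC rebuilds a UTC time.Time from signed Unix seconds.
func fromUnixUTC(sec int64) time.Time {
	return time.Unix(sec, 0).UTC()
}

// historicalExample round-trips a 17th-century publication date.
func historicalExample() (sec int64, formatted string, sameInstant bool) {
	published := time.Date(1687, time.July, 5, 0, 0, 0, 0, time.UTC)
	// The same instant as seen by a client two hours east of UTC.
	local := published.In(time.FixedZone("UTC+2", 2*60*60))

	sec = toUnixUTC(local)
	formatted = fromUnixUTC(sec).Format(time.RFC3339)
	sameInstant = toUnixUTC(local) == toUnixUTC(published)
	return sec, formatted, sameInstant
}

func main() {
	sec, formatted, same := historicalExample()
	fmt.Println(sec < 0, formatted, same) // true 1687-07-05T00:00:00Z true
}
```

A signed 64-bit count of seconds covers historical dates comfortably. An unsigned or 32-bit representation would not, which is why the requirement is worth stating.
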

### Artifact Types

An artifact is a source of knowledge: a PDF, a book, a webpage, a paper. Artifacts have versioned snapshots, each containing one or more blobs in different formats.

```
Artifact ──► Snapshot(s) ──► Blob(s)
   │              │
   ▼              ▼
Citation      Citation (can override parent)
```

```go
// Artifact is the top-level container for a knowledge source.
type Artifact struct {
	ID      string
	Type    ArtifactType // see enumeration below
	Latest  time.Time    // most recent Snapshot.DateTime
	History map[time.Time]*Snapshot
}

// ArtifactType enumeration:
//   Unknown, Custom, Article, Book, URL, Paper, Video, Image
// If Type is "Custom", Header.Meta must contain an "ArtifactType" entry.

// Snapshot represents content at a specific point in time or in a specific format.
// A website might have snapshots for different scrape dates; a book might have
// snapshots for different editions or formats (PDF and EPUB).
type Snapshot struct {
	Header     Header
	ArtifactID string
	Stored     time.Time      // when this snapshot was stored
	DateTime   time.Time      // the time this snapshot represents
	Citation   *Citation      // can override the artifact-level citation
	Blobs      map[MIME]*Blob // content keyed by MIME type
}
// MIME parameters can distinguish variants: "application/pdf; format=screen"

// Blob is a piece of content in the content-addressable store.
type Blob struct {
	ID     string // SHA256 hash of contents
	Format string // MIME type
	Body   io.ReadCloser
}

// Citation holds bibliographic information. Nothing is strictly required.
// A citation occurs at the artifact level, but snapshots can override specific
// fields (e.g., a different edition's ISBN).
type Citation struct {
	Header    Header
	DOI       string
	Title     string
	Year      int
	Published time.Time
	Authors   []string
	Publisher *Publisher
	Source    string // URL or origin
	Abstract  string
}

type Publisher struct {
	Header  Header
	Name    string
	Address string
}
```
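
The override relationship between snapshot-level and artifact-level citations can be sketched as a field-by-field merge. This is an illustration, not the project's resolution logic: the trimmed `Citation` type and the `effectiveCitation` helper are hypothetical, and the rule that a zero value means "not overridden" is an assumption.

```go
package main

import "fmt"

// Citation is a trimmed, hypothetical stand-in for the full Citation type.
type Citation struct {
	DOI     string
	Title   string
	Year    int
	Authors []string
}

// effectiveCitation layers a snapshot-level citation over the artifact-level
// one. Assumption: a zero value ("" / 0 / empty slice) means "not overridden".
func effectiveCitation(artifact Citation, snapshot *Citation) Citation {
	out := artifact
	if snapshot == nil {
		return out // nothing to override; inherit everything
	}
	if snapshot.DOI != "" {
		out.DOI = snapshot.DOI
	}
	if snapshot.Title != "" {
		out.Title = snapshot.Title
	}
	if snapshot.Year != 0 {
		out.Year = snapshot.Year
	}
	if len(snapshot.Authors) > 0 {
		out.Authors = snapshot.Authors
	}
	return out
}

func main() {
	base := Citation{Title: "An Example Book", Year: 1999, Authors: []string{"A. Author"}}
	secondEdition := &Citation{Year: 2005}

	c := effectiveCitation(base, secondEdition)
	fmt.Println(c.Title, c.Year, c.Authors) // An Example Book 2005 [A. Author]
}
```

One consequence of the zero-value rule is that a snapshot cannot clear a field its artifact sets. A design that needs this would have to mark overridden fields explicitly.
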

### Knowledge Graph Types

The knowledge graph stores notes as nodes in a directed graph. Each node contains cells (content blocks) and is connected to other nodes and artifacts via facts.

```go
// Node is an entity in the knowledge graph.
// Conceptually: Node = Note | ArtifactLink
type Node struct {
	Header   Header
	Parent   string   // parent node ID (C2 wiki style hierarchy)
	Children []string // child node IDs
}

// Cell is a content unit within a note. Inspired by Quiver's cell-based
// structure: a note is composed of multiple cells of different types.
type Cell struct {
	Header   Header
	NodeID   string
	Contents []byte
	Type     string // "markdown", "code", etc.
}
```

Facts record relationships using an entity-attribute-value model with transactional history:

```protobuf
message Name {
  string id = 1;     // UUID
  string common = 2; // human-readable name
}

message Attribute {
  Name name = 1;
  Value value = 2;
}

message Transaction {
  int64 timestamp = 1;
}

message Fact {
  Name entity = 1;
  Attribute attribute = 2;
  Value value = 3;
  Transaction transaction = 4;
  bool retraction = 5; // true = this fact is being retracted
}
```

A Fact with `retraction = true` marks a previous fact as no longer valid without deleting history. The transaction timestamp records when the fact was asserted or retracted.
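
As an illustration of how retractions could be interpreted, the sketch below replays an append-only fact log to find what is currently asserted. The flattened Go `Fact` struct, the `factKey` type, and the `currentFacts` function are hypothetical simplifications of the protobuf messages. It also assumes a retraction matches an earlier fact by its exact entity, attribute, and value.

```go
package main

import (
	"fmt"
	"sort"
)

// Fact is a flattened, hypothetical Go mirror of the protobuf Fact message.
type Fact struct {
	Entity     string // entity UUID
	Attribute  string // attribute name
	Value      string
	Timestamp  int64 // transaction timestamp
	Retraction bool
}

// factKey identifies what a fact says, independent of when it was said.
type factKey struct{ entity, attribute, value string }

// currentFacts replays the log in transaction order and returns the facts
// that are still asserted. The log itself is never modified.
func currentFacts(log []Fact) []Fact {
	ordered := append([]Fact(nil), log...)
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].Timestamp < ordered[j].Timestamp
	})

	live := make(map[factKey]int) // triple -> index of its latest assertion
	for i, f := range ordered {
		k := factKey{f.Entity, f.Attribute, f.Value}
		if f.Retraction {
			delete(live, k)
		} else {
			live[k] = i
		}
	}

	idx := make([]int, 0, len(live))
	for _, i := range live {
		idx = append(idx, i)
	}
	sort.Ints(idx) // deterministic output in transaction order

	out := make([]Fact, 0, len(idx))
	for _, i := range idx {
		out = append(out, ordered[i])
	}
	return out
}

func main() {
	log := []Fact{
		{Entity: "n1", Attribute: "title", Value: "Draft notes", Timestamp: 1},
		{Entity: "n1", Attribute: "status", Value: "draft", Timestamp: 1},
		{Entity: "n1", Attribute: "status", Value: "draft", Timestamp: 2, Retraction: true},
		{Entity: "n1", Attribute: "status", Value: "final", Timestamp: 3},
	}
	for _, f := range currentFacts(log) {
		fmt.Println(f.Attribute, "=", f.Value)
	}
	// Prints:
	// title = Draft notes
	// status = final
}
```

Because the log is only appended to, running the same replay over a prefix of the log (everything up to a given timestamp) would reconstruct the state as of that time.
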

## Database Schema

Single unified SQLite database. Tags and categories are shared across both pillars; this is the primary reason for a unified database rather than one per pillar.

### Shared Infrastructure

```sql
-- Polymorphic key-value metadata. The id column references any object's UUID.
CREATE TABLE metadata
(
    id       TEXT NOT NULL, -- owner UUID
    mkey     TEXT NOT NULL,
    contents TEXT NOT NULL,
    type     TEXT NOT NULL,
    PRIMARY KEY (mkey, contents, type),
    UNIQUE (id, mkey)
);
CREATE INDEX idx_metadata_id ON metadata (id);

-- Shared tag pool (used by both artifacts and knowledge graph nodes).
CREATE TABLE tags
(
    id  TEXT NOT NULL PRIMARY KEY, -- UUID
    tag TEXT NOT NULL UNIQUE
);

-- Shared category pool.
CREATE TABLE categories
(
    id       TEXT NOT NULL PRIMARY KEY, -- UUID
    category TEXT NOT NULL UNIQUE
);
```

### Bibliographic

```sql
CREATE TABLE publishers
(
    id      TEXT UNIQUE NOT NULL PRIMARY KEY,
    name    TEXT NOT NULL,
    address TEXT,
    UNIQUE (name, address)
);

CREATE TABLE citations
(
    id        TEXT PRIMARY KEY,
    doi       TEXT,
    title     TEXT NOT NULL,
    year      INTEGER NOT NULL,
    published TEXT NOT NULL, -- ISO 8601 UTC
    publisher TEXT NOT NULL,
    source    TEXT NOT NULL,
    abstract  TEXT,
    FOREIGN KEY (publisher) REFERENCES publishers (id)
);
CREATE INDEX idx_citations_doi ON citations (id, doi);

-- Many-to-one: multiple authors per citation.
CREATE TABLE authors
(
    citation_id TEXT NOT NULL,
    author_name TEXT NOT NULL,
    FOREIGN KEY (citation_id) REFERENCES citations (id)
);
```

### Artifact Repository

```sql
CREATE TABLE artifacts
(
    id          TEXT PRIMARY KEY,
    type        TEXT NOT NULL, -- ArtifactType enumeration
    citation_id TEXT NOT NULL,
    latest      TEXT NOT NULL, -- ISO 8601 UTC (most recent snapshot)
    FOREIGN KEY (citation_id) REFERENCES citations (id)
);

-- Many-to-many junction tables for classification.
CREATE TABLE artifact_tags
(
    artifact_id TEXT NOT NULL,
    tag_id      TEXT NOT NULL,
    FOREIGN KEY (artifact_id) REFERENCES artifacts (id),
    FOREIGN KEY (tag_id) REFERENCES tags (id)
);

CREATE TABLE artifact_categories
(
    artifact_id TEXT NOT NULL,
    category_id TEXT NOT NULL,
    FOREIGN KEY (artifact_id) REFERENCES artifacts (id),
    FOREIGN KEY (category_id) REFERENCES categories (id)
);

-- Temporal index linking artifacts to snapshots by datetime.
CREATE TABLE artifacts_history
(
    artifact_id TEXT NOT NULL,
    snapshot_id TEXT NOT NULL UNIQUE,
    datetime    TEXT NOT NULL,
    PRIMARY KEY (artifact_id, datetime),
    FOREIGN KEY (artifact_id) REFERENCES artifacts (id)
);

-- Snapshot records with storage and content timestamps.
CREATE TABLE artifact_snapshots
(
    artifact_id TEXT NOT NULL,
    id          TEXT UNIQUE PRIMARY KEY,
    stored_at   INTEGER NOT NULL, -- Unix epoch (when stored)
    datetime    TEXT NOT NULL,    -- ISO 8601 UTC (what time this represents)
    citation_id TEXT NOT NULL,
    source      TEXT NOT NULL,
    FOREIGN KEY (artifact_id) REFERENCES artifacts (id),
    FOREIGN KEY (id) REFERENCES artifacts_history (snapshot_id)
);

-- Blob registry. Actual content lives in the CAS on disk.
CREATE TABLE blobs
(
    snapshot_id TEXT NOT NULL,
    id          TEXT NOT NULL UNIQUE PRIMARY KEY, -- SHA256 hash
    format      TEXT NOT NULL,                    -- MIME type
    FOREIGN KEY (snapshot_id) REFERENCES artifact_snapshots (id)
);
```

### Knowledge Graph (to be implemented)

Tables for nodes, cells, facts, and graph edges will be added to the same database. They will reuse the `tags`, `categories`, and `metadata` tables via the shared UUID-based foreign key pattern.

## Content-Addressable Store

Blob content is stored on the local filesystem, addressed by SHA256 hash.

- **Base path**: `$HOME/exo/blobs/`
- **Directory layout**: The hex hash is split into 4-character segments as nested directories. For example, hash `a1b2c3d4e5f67890...` is stored at `a1b2/c3d4/e5f6/7890/.../a1b2c3d4e5f67890...`
- **Deduplication**: Identical content from different snapshots shares the same file (same hash = same path)
- **Registration**: The `blobs` table in SQLite stores `(snapshot_id, id, format)`, where `id` is the SHA256 hash. The hash serves as both the blob's database ID and its filesystem path key.
- **Backup**: A sync queue in exod replicates blobs to a remote Minio (S3-compatible) server asynchronously
- **Retrieval**: An optional HTTP endpoint (`GET /artifacts/blob/{id}`) may be added for direct blob access
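
The directory layout above can be sketched in a few lines of Go. This is one reading of the description, not the actual exod code: it assumes every 4-character segment of the 64-character hex digest becomes a directory level and that the file itself is named by the full hash. The names `blobID`, `blobPath`, and `segmentLen` are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// segmentLen is the number of hex characters per directory level.
const segmentLen = 4

// blobID returns the lowercase hex SHA256 digest of the content.
func blobID(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// blobPath maps a hex blob ID to its location under base: one nested
// directory per 4-character segment, then a file named by the full hash.
func blobPath(base, id string) string {
	parts := []string{base}
	for i := 0; i+segmentLen <= len(id); i += segmentLen {
		parts = append(parts, id[i:i+segmentLen])
	}
	parts = append(parts, id)
	return filepath.Join(parts...)
}

func main() {
	id := blobID([]byte("hello"))
	fmt.Println(blobPath("blobs", id))
}
```

Because the path is a pure function of the content, two snapshots that store the same bytes resolve to the same file. The deduplication property follows from that.
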

## Layered Architecture

### Layer 1: Storage

Two storage mechanisms, separated by purpose:

**SQLite database** stores all metadata: everything that needs to be queried, filtered, or joined (see schema above). A single unified database is used so that tags and categories are shared across both pillars.

**Content-addressable blob store** stores actual artifact content on the local filesystem (see CAS section above). This separation exists because blobs are large, opaque, and benefit from deduplication, while SQLite is not suited for large binary storage.

Together, the database and blob store form a single logical unit that must stay consistent.

### Layer 2: Domain Model

Three Go packages implement the data model:

- **`core`**: Shared types: `Header`, `Metadata`, `Value`, UUID generation
- **`artifacts`**: Artifact repository: `Artifact`, `Snapshot`, `Blob`, `Citation`, `Publisher`, tag/category management
- **`kg`**: Knowledge graph: `Node`, `Cell`, `Fact`

All persistent types implement the `dbObject` interface:

```go
type dbObject interface {
	Get(ctx context.Context, tx *sql.Tx) error
	Store(ctx context.Context, tx *sql.Tx) error
}
```
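
Since every method takes the transaction as a parameter, a caller can persist several objects inside one caller-owned transaction. The sketch below illustrates that idea only. `storeAll`, `recorder`, and `demoStore` are hypothetical, and `recorder` is an in-memory stand-in that ignores the transaction so the example runs without a database driver.

```go
package main

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
)

type dbObject interface {
	Get(ctx context.Context, tx *sql.Tx) error
	Store(ctx context.Context, tx *sql.Tx) error
}

// storeAll stores objects in order within a transaction owned by the caller.
// It stops at the first failure so the caller can roll back.
func storeAll(ctx context.Context, tx *sql.Tx, objs ...dbObject) error {
	for i, obj := range objs {
		if err := obj.Store(ctx, tx); err != nil {
			return fmt.Errorf("storing object %d: %w", i, err)
		}
	}
	return nil
}

// recorder is a hypothetical in-memory dbObject. It ignores tx and only
// records that Store was called.
type recorder struct {
	name string
	fail bool
	log  *[]string
}

func (r recorder) Get(ctx context.Context, tx *sql.Tx) error { return nil }

func (r recorder) Store(ctx context.Context, tx *sql.Tx) error {
	if r.fail {
		return errors.New(r.name + ": store failed")
	}
	*r.log = append(*r.log, r.name)
	return nil
}

// demoStore stores three objects, the second of which fails.
func demoStore() ([]string, error) {
	var stored []string
	err := storeAll(context.Background(), nil, // nil tx: recorder never touches it
		recorder{name: "publisher", log: &stored},
		recorder{name: "citation", fail: true, log: &stored},
		recorder{name: "artifact", log: &stored},
	)
	return stored, err
}

func main() {
	stored, err := demoStore()
	fmt.Println(stored, err) // [publisher] storing object 1: citation: store failed
}
```

In a real service the caller would respond to the returned error by calling `tx.Rollback()`, so the partially stored publisher would not be committed.
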

### Layer 3: Service

The `exod` gRPC server is the exclusive gateway to all data:

- Manages transaction boundaries (begin, commit/rollback)
- Handles blob lifecycle (hash content, write to CAS, register in SQLite, queue for Minio sync)
- Runs the Minio sync queue for asynchronous backup replication
- Exposes gRPC endpoints defined in `.proto` files for all CRUD operations on both pillars
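
One possible shape for an asynchronous sync queue is a buffered channel drained by a single worker goroutine. The sketch below is illustrative only. `syncQueue`, `newSyncQueue`, and `demoSync` are hypothetical names, and the `upload` callback stands in for whatever Minio client call exod actually uses. Retries and persistence across restarts are deliberately left out.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// syncQueue is a hypothetical in-process queue of blob IDs awaiting upload.
type syncQueue struct {
	ids    chan string
	wg     sync.WaitGroup
	upload func(blobID string) error
	failed []string // written only by the worker, read after it exits
}

func newSyncQueue(size int, upload func(string) error) *syncQueue {
	q := &syncQueue{ids: make(chan string, size), upload: upload}
	q.wg.Add(1)
	go q.run()
	return q
}

// run drains the channel until it is closed, remembering failed uploads.
func (q *syncQueue) run() {
	defer q.wg.Done()
	for id := range q.ids {
		if err := q.upload(id); err != nil {
			q.failed = append(q.failed, id)
		}
	}
}

// Enqueue schedules a blob for upload. It blocks only if the buffer is full.
func (q *syncQueue) Enqueue(id string) { q.ids <- id }

// Close stops accepting work, waits for the worker to drain the queue, and
// returns the IDs whose upload failed.
func (q *syncQueue) Close() []string {
	close(q.ids)
	q.wg.Wait()
	return q.failed
}

// demoSync enqueues three blobs, one of which fails to upload.
func demoSync() (uploaded, failed []string) {
	q := newSyncQueue(8, func(id string) error {
		if id == "bad" {
			return errors.New("upload failed")
		}
		uploaded = append(uploaded, id)
		return nil
	})
	for _, id := range []string{"a", "bad", "c"} {
		q.Enqueue(id)
	}
	failed = q.Close()
	return uploaded, failed
}

func main() {
	uploaded, failed := demoSync()
	fmt.Println(uploaded, failed) // [a c] [bad]
}
```

Decoupling upload from import this way lets an import commit locally even when the remote store is slow or unreachable. That fits the backup-only role Minio plays here.
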

### Layer 4: Presentation

A single Kotlin desktop application handles both artifact management and knowledge graph interaction, following the Obsidian model. CLI tools provide a scriptable alternative.

#### Desktop Application Layout

```
┌─────────────────────────────────────────────────────────────┐
│  [Command Palette: Ctrl+Shift+A]          [Search: Ctrl+F]  │
├──────────────┬──────────────────────────────────────────────┤
│              │                                              │
│   Sidebar    │   Main Panel                                 │
│              │                                              │
│   Tree/      │   Contextual view based on selection:        │
│   Outline    │   • Note editor (cell-based)                 │
│   View       │   • Artifact detail (citation, snapshots)    │
│              │   • Search results                           │
│              │   • Catalog (items needing attention)        │
│              │                                              │
├──────────────┴──────────────────────────────────────────────┤
│  [Graph View toggle]                                        │
└─────────────────────────────────────────────────────────────┘
```

**Sidebar**: Tree/outline view of the knowledge graph hierarchy as primary navigation. Artifacts appear under their linked nodes; unlinked artifacts appear in a dedicated section. Collapsible, like Obsidian's file explorer.

**Main panel**: Changes with context.
- **Note view**: Cell-based editor (markdown, code blocks). Associated artifacts listed inline. Dendron-style Ctrl+L for note creation.
- **Artifact view**: Citation details, snapshot history, blob preview (PDF, HTML). Tag/category editing. Link to nodes.
- **Search view**: Unified results from both pillars. Selector prefixes for precision: `artifact:`, `note:`, `cell:`, `tag:`, `author:`, `doi:`.
- **Catalog view**: Surfaces untagged, uncategorized, or unlinked artifacts needing attention.

**Graph view**: Secondary visualization available as a toggle or separate pane, showing nodes and their connections (like Obsidian's graph view). Useful for exploration and discovering clusters.

**Command palette**: Ctrl+Shift+A (IntelliJ-style) for quick actions: create note, import artifact, search, switch views, manage tags.

#### CLI Tools

Go binaries connecting to exod via gRPC for automation, bulk operations, and scripting. Commands: `import`, `tag`, `cat`, `search`.

## Data Flow

### Importing an Artifact

1. Client sends artifact metadata (citation, tags, categories) and blob data to exod via gRPC
2. exod begins a database transaction
3. Tags and categories are created if they don't exist (idempotent upsert)
4. Publisher is resolved (lookup by name+address, create if missing)
5. Citation is stored with publisher FK and author records
6. Artifact header is stored with citation FK
7. For each snapshot: store snapshot record, then for each blob: compute SHA256, write file to CAS directory, insert blob record
8. History entries are recorded linking artifact to snapshots by datetime
9. Transaction commits
10. Blobs are queued for Minio sync

### Querying by Tag

1. Client sends a tag string to exod
2. Tag name is resolved to its UUID via the `tags` table
3. The `artifact_tags` junction table is queried for matching artifact IDs
4. Full artifact headers are hydrated (citation, publisher, tags, categories, metadata)
5. Results are returned; blob data is not fetched until explicitly requested

### Creating a Knowledge Graph Note

1. Client sends node metadata and cell contents
2. exod creates a Node with a UUID
3. Cells are stored with their content type (markdown, code, etc.)
4. Facts are recorded as EAV tuples linking the node to attributes, other nodes, and artifacts
5. Tags from the note content are cross-referenced with the shared tag pool

## Cross-Pillar Integration

The architectural core that makes kExocortex more than the sum of its parts:

- **Shared taxonomy**: Tags and categories exist in a single pool used by both artifacts and knowledge graph nodes. This enables cross-pillar queries: "show me everything tagged X."
- **Node-to-artifact links**: Knowledge graph nodes can reference artifacts by ID, so the graph contains both original analysis and source material references.
- **Shared metadata**: The polymorphic `metadata` table uses the owner's UUID as a foreign key, attaching key-value metadata to any object in either pillar.
- **Cell-artifact bridging**: A Cell within a note can embed references to artifacts, linking prose analysis directly to source material.

## Network & Access

- **Local-first**: exod, the database, and the blob store all live on the local filesystem. Full functionality requires no network.
- **Tailscale reverse proxy**: For remote/mobile access. TLS and HTTP basic auth terminate at the proxy, not at exod.
- **Minio backup**: Blob replication to remote S3-compatible storage, managed by an async sync queue in exod. This is a backup/restore mechanism, not a primary access path.

## Key Design Decisions

| Decision | Alternative | Rationale |
|----------|-------------|-----------|
| Single unified SQLite database | Split databases per pillar | Shared tag/category pool, single transaction scope, simpler backup. Because exod is the only process that opens the database, clients never contend for SQLite locks. |
| Content-addressable blob store | Store blobs in SQLite | Blobs can be arbitrarily large (PDFs, videos). CAS provides deduplication. SQLite isn't designed for large binary storage. |
| gRPC / Protobuf | REST / JSON | Typed contracts, efficient binary serialization, bidirectional streaming for future use (e.g., upload progress). |
| Kotlin desktop app | Web frontend | Desktop-native performance for large document collections. Offline-capable. No browser dependency. |
| SQLite | PostgreSQL | Zero ops cost, single-file backup, embedded in server process. Single-user system doesn't need concurrent write scaling. |