exo/ARCHITECTURE.md

# Architecture

Technical reference for kExocortex — a personal knowledge management system that combines an **artifact repository** (source documents) with a **knowledge graph** (notes and ideas) into a unified, searchable exocortex.

Core formula: **artifacts + notes + graph structure = exocortex**

## Tech Stack

| Role | Technology |
|------|-----------|
| Backend server, CLI tools | Go |
| Desktop application | Kotlin |
| Metadata storage | SQLite |
| Blob storage | Content-addressable store (SHA256) |
| Client-server communication | gRPC / Protobuf |
| Remote blob backup | Minio |
| Secure remote access | Tailscale |

## System Components

```
┌──────────────────┐       gRPC       ┌──────────────────────────┐
│  Kotlin Desktop  │◄════════════════►│                          │
│  (all UI facets) │                  │                          │
└──────────────────┘                  │         exod             │
                                      │      (Go daemon)         │
┌──────────────────┐       gRPC       │                          │
│    CLI tools     │◄════════════════►│  sole owner of all data  │
│   (Go binaries)  │                  │                          │
└──────────────────┘                  └─────┬──────┬──────┬──────┘
                                            │      │      │
                              ┌─────────────┘      │      └─────────────┐
                              │                    │                    │
                        ┌─────▼──────┐     ┌──────▼───────┐    ┌──────▼──────┐
                        │   SQLite   │     │  Local Blob  │    │    Minio    │
                        │  Database  │     │ Store (CAS)  │    │  (remote)   │
                        └────────────┘     └──────────────┘    └─────────────┘

Remote access:
┌────────┐   HTTPS   ┌──────────────────────┐  Tailscale  ┌──────┐
│ Mobile │──────────►│  Reverse Proxy       │────────────►│ exod │
│ Device │           │  (TLS + basic auth)  │             │      │
└────────┘           └──────────────────────┘             └──────┘
```

Three runtime components exist:

- **exod** — The Go backend daemon. Sole owner of the SQLite database and blob store. All reads and writes go through exod. No client accesses storage directly.
- **Kotlin desktop app** — A single application for both artifact management and knowledge graph interaction. Obsidian-style layout: tree/outline sidebar for navigation, contextual main panel, graph visualization, and unified search with selector prefixes. Connects to exod via gRPC.
- **CLI tools** — Go binaries for scripting, bulk operations, and administrative tasks. Also connect via gRPC.

## Data Model

### Shared Types

Common to all objects across both pillars.

```go
// Header is attached to every persistent object.
type Header struct {
    ID         string     // UUID
    Type       ObjectType
    Created    int64      // UTC timestamp; signed, so pre-epoch dates fit
    Modified   int64      // UTC timestamp of the last change
    Categories []string
    Tags       []string
    Meta       Metadata
}

// Value is a single metadata entry: the raw contents plus a type hint.
type Value struct {
    Contents string
    Type     string // e.g. "string", "int", "unspecified"
}

// Metadata is a flexible key-value store for arbitrary attributes.
type Metadata map[string]Value
```

All timestamps are stored in UTC and must support dates before the Unix epoch (e.g., the publication date of a historical text). Clients convert local time to UTC before sending it to the server.
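
The pre-epoch requirement can be illustrated with a minimal sketch. It assumes the `int64` fields hold Unix seconds (the document does not state the unit); `parseStamp` and `formatStamp` are hypothetical helper names, not part of the codebase.

```go
package main

import "time"

// parseStamp converts an RFC 3339 string into signed Unix seconds.
// Unix time is zone-independent, so any UTC offset in the input is
// normalized away. Instants before 1970-01-01T00:00:00Z are negative.
// (Hypothetical helper; assumes timestamps are stored as Unix seconds.)
func parseStamp(v string) (int64, error) {
    t, err := time.Parse(time.RFC3339, v)
    if err != nil {
        return 0, err
    }
    return t.Unix(), nil
}

// formatStamp renders signed Unix seconds as an RFC 3339 string in UTC.
func formatStamp(s int64) string {
    return time.Unix(s, 0).UTC().Format(time.RFC3339)
}
```

With these helpers, a 1687 publication date parses to a negative value and formats back to the same string, and a local time with a `+01:00` offset lands on the same stamp as its UTC equivalent.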

### Artifact Types

An artifact is a source of knowledge — a PDF, a book, a webpage, a paper. Artifacts have versioned snapshots, each containing one or more blobs in different formats.

```
Artifact ──► Snapshot(s) ──► Blob(s)
    │              │
    ▼              ▼
 Citation      Citation (can override parent)
```

```go
// Artifact is the top-level container for a knowledge source.
type Artifact struct {
    ID      string
    Type    ArtifactType  // see enumeration below
    Latest  time.Time     // most recent Snapshot.DateTime
    History map[time.Time]*Snapshot
}

// ArtifactType enumeration:
//   Unknown, Custom, Article, Book, URL, Paper, Video, Image
// If Type is "Custom", Header.Meta must contain an "ArtifactType" entry.

// Snapshot represents content at a specific point in time or in a specific format.
// A website might have snapshots for different scrape dates; a book might have
// snapshots for different editions or formats (PDF and EPUB).
type Snapshot struct {
    Header     Header
    ArtifactID string
    Stored     time.Time         // when this snapshot was stored
    DateTime   time.Time         // the time this snapshot represents
    Citation   *Citation         // can override the artifact-level citation
    Blobs      map[MIME]*Blob    // content keyed by MIME type
}
// MIME parameters can distinguish variants: "application/pdf; format=screen"

// Blob is a piece of content in the content-addressable store.
type Blob struct {
    ID     string         // SHA256 hash of contents
    Format string         // MIME type
    Body   io.ReadCloser
}

// Citation holds bibliographic information. Nothing is strictly required.
// A citation occurs at the artifact level, but snapshots can override specific
// fields (e.g., a different edition's ISBN).
type Citation struct {
    Header    Header
    DOI       string
    Title     string
    Year      int
    Published time.Time
    Authors   []string
    Publisher *Publisher
    Source    string      // URL or origin
    Abstract  string
}

type Publisher struct {
    Header  Header
    Name    string
    Address string
}
```
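
The document does not spell out how a snapshot-level citation overrides the artifact-level one. A minimal sketch, assuming that any non-zero snapshot field wins and everything else is inherited, might look like this. `Citation` is trimmed to a few fields for brevity, and `effectiveCitation` is a hypothetical name.

```go
package main

// Citation is a trimmed copy of the type above, for illustration only.
type Citation struct {
    DOI     string
    Title   string
    Year    int
    Authors []string
    Source  string
}

// effectiveCitation returns the artifact-level citation with every non-zero
// field of the snapshot-level citation laid over it. Neither input is
// modified. (Assumed semantics: a zero value means "not overridden".)
func effectiveCitation(artifact Citation, snapshot *Citation) Citation {
    out := artifact
    // Copy the slice so the result does not alias the artifact's authors.
    out.Authors = append([]string(nil), artifact.Authors...)
    if snapshot == nil {
        return out
    }
    if snapshot.DOI != "" {
        out.DOI = snapshot.DOI
    }
    if snapshot.Title != "" {
        out.Title = snapshot.Title
    }
    if snapshot.Year != 0 {
        out.Year = snapshot.Year
    }
    if len(snapshot.Authors) > 0 {
        out.Authors = append([]string(nil), snapshot.Authors...)
    }
    if snapshot.Source != "" {
        out.Source = snapshot.Source
    }
    return out
}
```

Under this assumption a second-edition snapshot only needs to carry the fields that differ (say, `Year`), and a snapshot with a `nil` citation simply inherits the artifact's.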

### Knowledge Graph Types

The knowledge graph stores notes as nodes in a directed graph. Each node contains cells (content blocks) and is connected to other nodes and artifacts via facts.

```go
// Node is an entity in the knowledge graph.
// Conceptually: Node = Note | ArtifactLink
type Node struct {
    Header   Header
    Parent   string   // parent node ID (C2 wiki style hierarchy)
    Children []string // child node IDs
}

// Cell is a content unit within a note. Inspired by Quiver's cell-based
// structure — a note is composed of multiple cells of different types.
type Cell struct {
    Header   Header
    NodeID   string
    Contents []byte
    Type     string // "markdown", "code", etc.
}
```
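
As a rough illustration of how the `Parent`/`Children` fields support a tree view, the sketch below walks the hierarchy depth-first. `Node` is reduced to its hierarchy fields, and `outline` is a hypothetical helper; skipping unknown IDs and already-visited nodes is an assumption made here to keep the walk safe on inconsistent data.

```go
package main

import "strings"

// Node is reduced to the hierarchy fields of the type above.
type Node struct {
    ID       string
    Parent   string
    Children []string
}

// outline returns one line per node reachable from rootID, in depth-first
// order, indented two spaces per level. Child IDs missing from the map and
// nodes that were already visited (cycles) are skipped.
func outline(nodes map[string]Node, rootID string) []string {
    var lines []string
    seen := make(map[string]bool)
    var walk func(id string, depth int)
    walk = func(id string, depth int) {
        n, ok := nodes[id]
        if !ok || seen[id] {
            return
        }
        seen[id] = true
        lines = append(lines, strings.Repeat("  ", depth)+n.ID)
        for _, child := range n.Children {
            walk(child, depth+1)
        }
    }
    walk(rootID, 0)
    return lines
}
```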

Facts record relationships using an entity-attribute-value model with transactional history:

```protobuf
message Name {
    string id = 1;      // UUID
    string common = 2;  // human-readable name
}

message Attribute {
    Name name = 1;
    Value value = 2;
}

message Transaction {
    int64 timestamp = 1;
}

message Fact {
    Name entity = 1;
    Attribute attribute = 2;
    Value value = 3;
    Transaction transaction = 4;
    bool retraction = 5;  // true = this fact is being retracted
}
```

A Fact with `retraction = true` marks a previous fact as no longer valid without deleting history. The transaction timestamp records when the fact was asserted or retracted.
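
A minimal sketch of how the current set of facts could be derived from such an append-only log follows. The document does not say how a retraction identifies the fact it cancels; this sketch assumes a match on entity, attribute, and value. `Fact` is a flattened stand-in for the protobuf message, and `currentFacts` is a hypothetical name.

```go
package main

import "sort"

// Fact is a flattened, illustrative form of the protobuf message above.
type Fact struct {
    Entity     string
    Attribute  string
    Value      string
    Timestamp  int64 // transaction timestamp
    Retraction bool
}

// factKey identifies what a fact asserts, independent of when.
type factKey struct{ entity, attribute, value string }

// currentFacts replays the log in timestamp order and returns the facts that
// were asserted and not retracted afterwards. The log itself is not modified,
// so the full history stays available.
func currentFacts(log []Fact) []Fact {
    ordered := append([]Fact(nil), log...)
    sort.SliceStable(ordered, func(i, j int) bool {
        return ordered[i].Timestamp < ordered[j].Timestamp
    })
    live := make(map[factKey]Fact)
    for _, f := range ordered {
        k := factKey{f.Entity, f.Attribute, f.Value}
        if f.Retraction {
            delete(live, k)
        } else {
            live[k] = f
        }
    }
    out := make([]Fact, 0, len(live))
    for _, f := range live {
        out = append(out, f)
    }
    // Sort for a deterministic result; map iteration order is random.
    sort.Slice(out, func(i, j int) bool {
        a, b := out[i], out[j]
        if a.Timestamp != b.Timestamp {
            return a.Timestamp < b.Timestamp
        }
        if a.Entity != b.Entity {
            return a.Entity < b.Entity
        }
        if a.Attribute != b.Attribute {
            return a.Attribute < b.Attribute
        }
        return a.Value < b.Value
    })
    return out
}
```

Replaying only the entries up to a chosen timestamp would give the view as of that moment, which is the main benefit of retracting rather than deleting.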

## Database Schema

Single unified SQLite database. Tags and categories are shared across both pillars — this is the primary reason for a unified database rather than one per pillar.

### Shared Infrastructure

```sql
-- Polymorphic key-value metadata. The id column references any object's UUID.
CREATE TABLE metadata
(
    id       TEXT NOT NULL,    -- owner UUID
    mkey     TEXT NOT NULL,
    contents TEXT NOT NULL,
    type     TEXT NOT NULL,
    PRIMARY KEY (id, mkey)     -- one value per key per owner
);
CREATE INDEX idx_metadata_id ON metadata (id);

-- Shared tag pool (used by both artifacts and knowledge graph nodes).
CREATE TABLE tags
(
    id  TEXT NOT NULL PRIMARY KEY,  -- UUID
    tag TEXT NOT NULL UNIQUE
);

-- Shared category pool.
CREATE TABLE categories
(
    id       TEXT NOT NULL PRIMARY KEY,  -- UUID
    category TEXT NOT NULL UNIQUE
);
```

### Bibliographic

```sql
CREATE TABLE publishers
(
    id      TEXT UNIQUE NOT NULL PRIMARY KEY,
    name    TEXT        NOT NULL,
    address TEXT,
    UNIQUE (name, address)
);

CREATE TABLE citations
(
    id        TEXT PRIMARY KEY,
    doi       TEXT,
    title     TEXT    NOT NULL,
    year      INTEGER NOT NULL,
    published TEXT    NOT NULL,  -- ISO 8601 UTC
    publisher TEXT    NOT NULL,
    source    TEXT    NOT NULL,
    abstract  TEXT,
    FOREIGN KEY (publisher) REFERENCES publishers (id)
);
CREATE INDEX idx_citations_doi ON citations (doi);

-- Many-to-one: multiple authors per citation.
CREATE TABLE authors
(
    citation_id TEXT NOT NULL,
    author_name TEXT NOT NULL,
    FOREIGN KEY (citation_id) REFERENCES citations (id)
);
```

### Artifact Repository

```sql
CREATE TABLE artifacts
(
    id          TEXT PRIMARY KEY,
    type        TEXT NOT NULL,     -- ArtifactType enumeration
    citation_id TEXT NOT NULL,
    latest      TEXT NOT NULL,     -- ISO 8601 UTC (most recent snapshot)
    FOREIGN KEY (citation_id) REFERENCES citations (id)
);

-- Many-to-many junction tables for classification.
CREATE TABLE artifact_tags
(
    artifact_id TEXT NOT NULL,
    tag_id      TEXT NOT NULL,
    FOREIGN KEY (artifact_id) REFERENCES artifacts (id),
    FOREIGN KEY (tag_id) REFERENCES tags (id)
);

CREATE TABLE artifact_categories
(
    artifact_id TEXT NOT NULL,
    category_id TEXT NOT NULL,
    FOREIGN KEY (artifact_id) REFERENCES artifacts (id),
    FOREIGN KEY (category_id) REFERENCES categories (id)
);

-- Temporal index linking artifacts to snapshots by datetime.
CREATE TABLE artifacts_history
(
    artifact_id TEXT NOT NULL,
    snapshot_id TEXT NOT NULL UNIQUE,
    datetime    TEXT NOT NULL,
    PRIMARY KEY (artifact_id, datetime),
    FOREIGN KEY (artifact_id) REFERENCES artifacts (id)
);

-- Snapshot records with storage and content timestamps.
CREATE TABLE artifact_snapshots
(
    artifact_id TEXT    NOT NULL,
    id          TEXT UNIQUE PRIMARY KEY,
    stored_at   INTEGER NOT NULL,     -- Unix epoch (when stored)
    datetime    TEXT    NOT NULL,      -- ISO 8601 UTC (what time this represents)
    citation_id TEXT    NOT NULL,
    source      TEXT    NOT NULL,
    FOREIGN KEY (artifact_id) REFERENCES artifacts (id),
    FOREIGN KEY (id) REFERENCES artifacts_history (snapshot_id)
);

-- Blob registry. Actual content lives in the CAS on disk.
CREATE TABLE blobs
(
    snapshot_id TEXT NOT NULL,
    id          TEXT NOT NULL UNIQUE PRIMARY KEY,  -- SHA256 hash
    format      TEXT NOT NULL,                     -- MIME type
    FOREIGN KEY (snapshot_id) REFERENCES artifact_snapshots (id)
);
```

### Knowledge Graph (to be implemented)

Tables for nodes, cells, facts, and graph edges will be added to the same database. They will reuse the `tags`, `categories`, and `metadata` tables via the shared UUID-based foreign key pattern.

## Content-Addressable Store

Blob content is stored on the local filesystem, addressed by SHA256 hash.

- **Base path**: `$HOME/exo/blobs/`
- **Directory layout**: The hex hash is split into 4-character segments as nested directories. For example, hash `a1b2c3d4e5f67890...` is stored at `a1b2/c3d4/e5f6/7890/.../a1b2c3d4e5f67890...`
- **Deduplication**: Identical content from different snapshots shares the same file (same hash = same path)
- **Registration**: The `blobs` table in SQLite stores `(snapshot_id, id, format)`, where `id` is the SHA256 hash. The hash serves as both the blob's database ID and its filesystem path key.
- **Backup**: A sync queue in exod replicates blobs to a remote Minio (S3-compatible) server asynchronously
- **Retrieval**: An optional HTTP endpoint (`GET /artifacts/blob/{id}`) may be added for direct blob access
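
The layout and deduplication rules above can be sketched as follows. This is a minimal illustration rather than exod's actual code: `blobID`, `blobPath`, and `storeBlob` are hypothetical names, and the write-to-temp-then-rename step is an added assumption to avoid partially written blobs.

```go
package main

import (
    "crypto/sha256"
    "encoding/hex"
    "errors"
    "io/fs"
    "os"
    "path/filepath"
)

// blobID returns the lowercase hex SHA256 digest of content.
func blobID(content []byte) string {
    sum := sha256.Sum256(content)
    return hex.EncodeToString(sum[:])
}

// blobPath maps a hex digest to its file path under root: the digest is cut
// into 4-character segments that become nested directories, and the full
// digest is the file name.
func blobPath(root, id string) string {
    parts := []string{root}
    for i := 0; i < len(id); i += 4 {
        end := i + 4
        if end > len(id) {
            end = len(id)
        }
        parts = append(parts, id[i:end])
    }
    parts = append(parts, id)
    return filepath.Join(parts...)
}

// storeBlob writes content into the store and returns its ID. created is
// false when identical content was already present (same hash, same path).
func storeBlob(root string, content []byte) (id string, created bool, err error) {
    id = blobID(content)
    path := blobPath(root, id)
    if _, err = os.Stat(path); err == nil {
        return id, false, nil // deduplicated: nothing to write
    } else if !errors.Is(err, fs.ErrNotExist) {
        return "", false, err
    }
    dir := filepath.Dir(path)
    if err = os.MkdirAll(dir, 0o755); err != nil {
        return "", false, err
    }
    // Write to a temporary file, then rename, so a crash cannot leave a
    // partially written blob at the final path.
    tmp, err := os.CreateTemp(dir, "incoming-*")
    if err != nil {
        return "", false, err
    }
    if _, err = tmp.Write(content); err != nil {
        tmp.Close()
        os.Remove(tmp.Name())
        return "", false, err
    }
    if err = tmp.Close(); err != nil {
        os.Remove(tmp.Name())
        return "", false, err
    }
    if err = os.Rename(tmp.Name(), path); err != nil {
        os.Remove(tmp.Name())
        return "", false, err
    }
    return id, true, nil
}
```

Storing the same bytes twice returns the same ID both times, with `created` set to `false` on the second call. A SHA256 digest is 64 hex characters, so each blob sits 16 directory levels below the base path.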

## Layered Architecture

### Layer 1: Storage

Two storage mechanisms, separated by purpose:

**SQLite database** stores all metadata — everything that needs to be queried, filtered, or joined (see schema above). A single unified database is used so that tags and categories are shared across both pillars.

**Content-addressable blob store** stores actual artifact content on the local filesystem (see CAS section above). This separation exists because blobs are large, opaque, and benefit from deduplication, while SQLite is not suited for large binary storage.

Together, the database and blob store form a single logical unit that must stay consistent.

### Layer 2: Domain Model

Three Go packages implement the data model:

- **`core`** — Shared types: `Header`, `Metadata`, `Value`, UUID generation
- **`artifacts`** — Artifact repository: `Artifact`, `Snapshot`, `Blob`, `Citation`, `Publisher`, tag/category management
- **`kg`** — Knowledge graph: `Node`, `Cell`, `Fact`

All persistent types implement the `dbObject` interface:

```go
type dbObject interface {
    Get(ctx context.Context, tx *sql.Tx) error
    Store(ctx context.Context, tx *sql.Tx) error
}
```

### Layer 3: Service

The `exod` gRPC server is the exclusive gateway to all data:

- Manages transaction boundaries (begin, commit/rollback)
- Handles blob lifecycle (hash content, write to CAS, register in SQLite, queue for Minio sync)
- Runs the Minio sync queue for asynchronous backup replication
- Exposes gRPC endpoints defined in `.proto` files for all CRUD operations on both pillars
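
One way the asynchronous sync queue could be structured is a buffered channel drained by a single background goroutine. The sketch below is an assumption about shape, not exod's implementation: `Uploader` stands in for whatever Minio client is used, and `MemUploader` is an in-memory fake for local experiments.

```go
package main

import (
    "fmt"
    "sync"
)

// Uploader is a hypothetical stand-in for the remote blob client.
type Uploader interface {
    Upload(blobID string) error
}

// MemUploader is an in-memory Uploader for tests. IDs listed in Fail are
// rejected; all others are appended to Stored. Only the queue's worker
// goroutine touches it until Close returns.
type MemUploader struct {
    Fail   map[string]bool
    Stored []string
}

func (m *MemUploader) Upload(blobID string) error {
    if m.Fail[blobID] {
        return fmt.Errorf("upload %s: simulated failure", blobID)
    }
    m.Stored = append(m.Stored, blobID)
    return nil
}

// SyncQueue replicates blob IDs in the background, in FIFO order.
type SyncQueue struct {
    ids    chan string
    wg     sync.WaitGroup
    failed []string // written by the worker, read after wg.Wait
}

// NewSyncQueue starts one worker that drains the queue. depth is the number
// of IDs that can be pending before Enqueue blocks.
func NewSyncQueue(up Uploader, depth int) *SyncQueue {
    q := &SyncQueue{ids: make(chan string, depth)}
    q.wg.Add(1)
    go func() {
        defer q.wg.Done()
        for id := range q.ids {
            if err := up.Upload(id); err != nil {
                q.failed = append(q.failed, id)
            }
        }
    }()
    return q
}

// Enqueue schedules a blob for replication. It must not be called after Close.
func (q *SyncQueue) Enqueue(blobID string) { q.ids <- blobID }

// Close stops accepting work, waits for pending uploads to finish, and
// returns the IDs whose upload failed so the caller can retry them.
func (q *SyncQueue) Close() []string {
    close(q.ids)
    q.wg.Wait()
    return append([]string(nil), q.failed...)
}
```

Because replication happens after the database transaction commits, a failed upload only delays the backup; the local store remains the source of truth.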

### Layer 4: Presentation

A single Kotlin desktop application handles both artifact management and knowledge graph interaction, following the Obsidian model. CLI tools provide a scriptable alternative.

#### Desktop Application Layout

```
┌─────────────────────────────────────────────────────────────┐
│  [Command Palette: Ctrl+Shift+A]          [Search: Ctrl+F] │
├──────────────┬──────────────────────────────────────────────┤
│              │                                              │
│  Sidebar     │  Main Panel                                  │
│              │                                              │
│  Tree/       │  Contextual view based on selection:         │
│  Outline     │  • Note editor (cell-based)                  │
│  View        │  • Artifact detail (citation, snapshots)     │
│              │  • Search results                            │
│              │  • Catalog (items needing attention)          │
│              │                                              │
├──────────────┴──────────────────────────────────────────────┤
│  [Graph View toggle]                                        │
└─────────────────────────────────────────────────────────────┘
```

**Sidebar** — Tree/outline view of the knowledge graph hierarchy as primary navigation. Artifacts appear under their linked nodes; unlinked artifacts appear in a dedicated section. Collapsible, like Obsidian's file explorer.

**Main panel** — Changes contextually:
- **Note view**: Cell-based editor (markdown, code blocks). Associated artifacts listed inline. Dendron-style Ctrl+L for note creation.
- **Artifact view**: Citation details, snapshot history, blob preview (PDF, HTML). Tag/category editing. Link to nodes.
- **Search view**: Unified results from both pillars. Selector prefixes for precision: `artifact:`, `note:`, `cell:`, `tag:`, `author:`, `doi:`.
- **Catalog view**: Surfaces untagged, uncategorized, or unlinked artifacts needing attention.
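
The selector prefixes in the search view suggest a small query parser. The sketch below is one possible reading, with assumed rules: tokens are split on whitespace, a token counts as a filter only if its prefix is one of the listed selectors and has a non-empty value, and quoted phrases are not handled. `Query` and `parseQuery` are hypothetical names.

```go
package main

import "strings"

// selectors lists the prefixes named in the search view description.
var selectors = map[string]bool{
    "artifact": true, "note": true, "cell": true,
    "tag": true, "author": true, "doi": true,
}

// Query is a parsed search string.
type Query struct {
    Filters map[string][]string // selector -> values, in input order
    Terms   []string            // unprefixed free-text terms
}

// parseQuery splits input on whitespace. A token of the form "selector:value"
// with a known selector becomes a filter; anything else (including tokens
// with unknown prefixes, such as URLs) is kept as a free-text term.
func parseQuery(input string) Query {
    q := Query{Filters: make(map[string][]string)}
    for _, tok := range strings.Fields(input) {
        sel, val, found := strings.Cut(tok, ":")
        key := strings.ToLower(sel)
        if found && val != "" && selectors[key] {
            q.Filters[key] = append(q.Filters[key], val)
            continue
        }
        q.Terms = append(q.Terms, tok)
    }
    return q
}
```

For example, `tag:go doi:10.1000/182 sorting` yields two filters and one free-text term. Only the first colon separates selector from value, so values that contain colons survive intact.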

**Graph view** — Secondary visualization available as a toggle or separate pane, showing nodes and their connections (like Obsidian's graph view). Useful for exploration and discovering clusters.

**Command palette** — Ctrl+Shift+A (IntelliJ-style) for quick actions: create note, import artifact, search, switch views, manage tags.

#### CLI Tools

Go binaries connecting to exod via gRPC for automation, bulk operations, and scripting. Commands: `import`, `tag`, `cat`, `search`.

## Data Flow

### Importing an Artifact

1. Client sends artifact metadata (citation, tags, categories) and blob data to exod via gRPC
2. exod begins a database transaction
3. Tags and categories are created if they don't exist (idempotent upsert)
4. Publisher is resolved (lookup by name+address, create if missing)
5. Citation is stored with publisher FK and author records
6. Artifact header is stored with citation FK
7. For each snapshot: store snapshot record, then for each blob: compute SHA256, write file to CAS directory, insert blob record
8. History entries are recorded linking artifact to snapshots by datetime
9. Transaction commits
10. Blobs are queued for Minio sync

### Querying by Tag

1. Client sends a tag string to exod
2. Tag name is resolved to its UUID via the `tags` table
3. The `artifact_tags` junction table is queried for matching artifact IDs
4. Full artifact headers are hydrated (citation, publisher, tags, categories, metadata)
5. Results are returned; blob data is not fetched until explicitly requested

### Creating a Knowledge Graph Note

1. Client sends node metadata and cell contents
2. exod creates a Node with a UUID
3. Cells are stored with their content type (markdown, code, etc.)
4. Facts are recorded as EAV tuples linking the node to attributes, other nodes, and artifacts
5. Tags from the note content are cross-referenced with the shared tag pool

## Cross-Pillar Integration

Cross-pillar integration is what makes kExocortex more than the sum of its parts:

- **Shared taxonomy**: Tags and categories exist in a single pool used by both artifacts and knowledge graph nodes. This enables cross-pillar queries: "show me everything tagged X."
- **Node-to-artifact links**: Knowledge graph nodes can reference artifacts by ID, so the graph contains both original analysis and source material references.
- **Shared metadata**: Each row in the polymorphic `metadata` table carries its owner's UUID (a logical reference, since no foreign key constraint is declared), so key-value metadata can attach to any object in either pillar.
- **Cell-artifact bridging**: A Cell within a note can embed references to artifacts, linking prose analysis directly to source material.

## Network & Access

- **Local-first**: exod, the database, and the blob store all live on the local filesystem. Full functionality requires no network.
- **Tailscale reverse proxy**: For remote/mobile access. TLS and HTTP basic auth terminate at the proxy, not at exod.
- **Minio backup**: Blob replication to remote S3-compatible storage, managed by an async sync queue in exod. This is a backup/restore mechanism, not a primary access path.

## Key Design Decisions

| Decision | Alternative | Rationale |
|----------|-------------|-----------|
| Single unified SQLite database | Split databases per pillar | Shared tag/category pool, single transaction scope, simpler backup. Because exod is the only process that opens the database, clients never contend for SQLite locks. |
| Content-addressable blob store | Store blobs in SQLite | Blobs can be arbitrarily large (PDFs, videos). CAS provides deduplication. SQLite isn't designed for large binary storage. |
| gRPC / Protobuf | REST / JSON | Typed contracts, efficient binary serialization, bidirectional streaming for future use (e.g., upload progress). |
| Kotlin desktop app | Web frontend | Desktop-native performance for large document collections. Offline-capable. No browser dependency. |
| SQLite | PostgreSQL | Zero ops cost, single-file backup, embedded in server process. Single-user system doesn't need concurrent write scaling. |