# Architecture

Technical reference for kExocortex — a personal knowledge management system that combines an **artifact repository** (source documents) with a **knowledge graph** (notes and ideas) into a unified, searchable exocortex.

Core formula: **artifacts + notes + graph structure = exocortex**

## Tech Stack

| Role | Technology |
|------|------------|
| Backend server, CLI tools | Go |
| Desktop application | Kotlin |
| Metadata storage | SQLite |
| Blob storage | Content-addressable store (SHA256) |
| Client-server communication | gRPC / Protobuf |
| Remote blob backup | Minio |
| Secure remote access | Tailscale |

## System Components

```
┌──────────────────┐       gRPC       ┌──────────────────────────┐
│  Kotlin Desktop  │◄════════════════►│                          │
│  (all UI facets) │                  │                          │
└──────────────────┘                  │           exod           │
                                      │       (Go daemon)        │
┌──────────────────┐       gRPC       │                          │
│    CLI tools     │◄════════════════►│  sole owner of all data  │
│  (Go binaries)   │                  │                          │
└──────────────────┘                  └─────┬──────┬──────┬──────┘
                                            │      │      │
                              ┌─────────────┘      │      └─────────────┐
                              │                    │                    │
                        ┌─────▼──────┐      ┌──────▼───────┐     ┌──────▼──────┐
                        │   SQLite   │      │  Local Blob  │     │    Minio    │
                        │  Database  │      │ Store (CAS)  │     │  (remote)   │
                        └────────────┘      └──────────────┘     └─────────────┘

Remote access:

┌────────┐   HTTPS   ┌─────────────────────┐  Tailscale  ┌──────┐
│ Mobile │──────────►│    Reverse Proxy    │────────────►│ exod │
│ Device │           │  (TLS + basic auth) │             │      │
└────────┘           └─────────────────────┘             └──────┘
```

Three runtime components exist:

- **exod** — The Go backend daemon. Sole owner of the SQLite database and blob store. All reads and writes go through exod; no client accesses storage directly.
- **Kotlin desktop app** — A single application for both artifact management and knowledge graph interaction. Obsidian-style layout: tree/outline sidebar for navigation, contextual main panel, graph visualization, and unified search with selector prefixes. Connects to exod via gRPC.
- **CLI tools** — Go binaries for scripting, bulk operations, and administrative tasks.
  They also connect via gRPC.

## Layered Architecture

### Layer 1: Storage

Two storage mechanisms, separated by purpose:

**SQLite database** stores all metadata — everything that needs to be queried, filtered, or joined. This includes artifact headers, citations, tags, categories, publisher info, snapshot records, blob registry entries, and knowledge graph facts. A single unified database is used (rather than one database per pillar) so that tags and categories are shared across both pillars.

**Content-addressable blob store** holds the actual artifact content (PDFs, images, web snapshots, etc.) on the local filesystem. Files are addressed by the SHA256 hash of their contents and stored in a hierarchical directory layout.

This separation exists because blobs are large, opaque, and benefit from deduplication, while SQLite is not suited to large binary storage. Together, the database and blob store form a single logical unit that must stay consistent.

### Layer 2: Domain Model

Three Go packages implement the data model:

**`core`** — Shared types used by both pillars:

- `Header` (ID, Type, Created, Modified, Categories, Tags, Meta)
- `Metadata` (map of string keys to typed `Value` structs)
- UUID generation

**`artifacts`** — The artifact repository pillar. Key relationship chain:

```
Artifact ──► Snapshot(s) ──► Blob(s)
   │              │
   ▼              ▼
Citation      Citation (can override parent)
```

An Artifact has a type (Article, Book, URL, Paper, Video, Image, etc.), a history of Snapshots keyed by datetime, and a top-level Citation. Each Snapshot can carry its own Citation that overrides or extends the artifact-level one (e.g., a specific edition of a book). Each Snapshot contains Blobs keyed by MIME type.

See `docs/KExocortex/Spec/Artifacts.md` for the canonical type definitions.
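The snapshot-level citation override can be sketched in Go. The type and field names below are illustrative assumptions, not the canonical definitions from `docs/KExocortex/Spec/Artifacts.md`:

```go
package main

import "fmt"

// Illustrative sketch of the Artifact -> Snapshot -> Citation chain.
// Type and field names are assumptions for this example only.

type Citation struct {
	Title   string
	Edition string
}

type Blob struct {
	SHA256 string // content address in the CAS
}

type Snapshot struct {
	Citation *Citation       // optional override of the artifact-level citation
	Blobs    map[string]Blob // keyed by MIME type
}

type Artifact struct {
	Type      string
	Citation  Citation
	Snapshots map[string]Snapshot // keyed by datetime
}

// effectiveCitation resolves the citation for one snapshot: the
// snapshot's own citation wins; otherwise the artifact-level one applies.
func effectiveCitation(a Artifact, at string) Citation {
	if s, ok := a.Snapshots[at]; ok && s.Citation != nil {
		return *s.Citation
	}
	return a.Citation
}

func main() {
	book := Artifact{
		Type:     "Book",
		Citation: Citation{Title: "SICP"},
		Snapshots: map[string]Snapshot{
			"1996-07-25T00:00:00Z": {Citation: &Citation{Title: "SICP", Edition: "2nd"}},
		},
	}
	fmt.Println(effectiveCitation(book, "1996-07-25T00:00:00Z").Edition) // prints "2nd"
	fmt.Println(effectiveCitation(book, "other").Title)                  // prints "SICP"
}
```

The pointer type on `Snapshot.Citation` makes the override optional: a nil citation means "inherit from the artifact", which matches the "overrides or extends" semantics described above.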
**`kg`** — The knowledge graph pillar:

- **Node** — An entity in the graph, containing Cells
- **Cell** — A content unit within a note (markdown, code, etc.), inspired by Quiver's cell-based structure
- **Fact** — An entity-attribute-value tuple with a transaction timestamp and a retraction flag, based on the protobuf model in `docs/KExocortex/KnowledgeGraph/Tuple.md`

Nodes are conceptually `Node = Note | ArtifactLink` — they can be original analysis or references to artifacts.

### Layer 3: Service

The `exod` gRPC server is the exclusive gateway to all data. It:

- Manages transaction boundaries (begin, commit/rollback)
- Handles the blob lifecycle (hash content, write to CAS, register in SQLite, queue for Minio sync)
- Runs the Minio sync queue for asynchronous backup replication
- Exposes gRPC endpoints, defined in `.proto` files, for all CRUD operations on both pillars

### Layer 4: Presentation

A single Kotlin desktop application handles both artifact management and knowledge graph interaction, following the Obsidian model. CLI tools provide a scriptable alternative.

#### Desktop Application Layout

```
┌─────────────────────────────────────────────────────────────┐
│  [Command Palette: Ctrl+Shift+A]          [Search: Ctrl+F]  │
├──────────────┬──────────────────────────────────────────────┤
│              │                                              │
│   Sidebar    │  Main Panel                                  │
│              │                                              │
│   Tree/      │  Contextual view based on selection:         │
│   Outline    │  • Note editor (cell-based)                  │
│   View       │  • Artifact detail (citation, snapshots)     │
│              │  • Search results                            │
│              │  • Catalog (items needing attention)         │
│              │                                              │
├──────────────┴──────────────────────────────────────────────┤
│  [Graph View toggle]                                        │
└─────────────────────────────────────────────────────────────┘
```

**Sidebar** — Tree/outline view of the knowledge graph hierarchy as primary navigation. Artifacts appear under their linked nodes; unlinked artifacts appear in a dedicated section. Collapsible, like Obsidian's file explorer.
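The `kg` Fact model described earlier can be made concrete with a minimal Go sketch of an append-only EAV log with retraction. Field names are assumptions based on the prose; the canonical model is the protobuf in `docs/KExocortex/KnowledgeGraph/Tuple.md`:

```go
package main

import "fmt"

// Sketch of the kg Fact tuple as an append-only EAV log.
// Field names are assumptions for this example only.
type Fact struct {
	Entity    string // node UUID
	Attribute string
	Value     string
	Tx        int64 // transaction timestamp
	Retracted bool  // true if this fact retracts the attribute
}

// currentValue replays a fact log (assumed ordered by Tx) and returns
// the latest surviving value of an attribute, if any.
func currentValue(facts []Fact, entity, attr string) (string, bool) {
	var val string
	var found bool
	for _, f := range facts {
		if f.Entity != entity || f.Attribute != attr {
			continue
		}
		if f.Retracted {
			val, found = "", false
			continue
		}
		val, found = f.Value, true
	}
	return val, found
}

func main() {
	facts := []Fact{
		{Entity: "n1", Attribute: "title", Value: "Draft", Tx: 1},
		{Entity: "n1", Attribute: "title", Value: "Final", Tx: 2},
		{Entity: "n1", Attribute: "status", Value: "open", Tx: 3},
		{Entity: "n1", Attribute: "status", Tx: 4, Retracted: true},
	}
	v, ok := currentValue(facts, "n1", "title")
	fmt.Println(v, ok) // prints "Final true"
	_, ok = currentValue(facts, "n1", "status")
	fmt.Println(ok) // prints "false"
}
```

Because facts are never overwritten, only superseded or retracted, the full history of every attribute remains queryable — the same property that makes the transaction timestamp useful.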
**Main panel** — Changes contextually:

- **Note view**: Cell-based editor (markdown, code blocks). Associated artifacts listed inline. Dendron-style Ctrl+L for note creation.
- **Artifact view**: Citation details, snapshot history, blob preview (PDF, HTML). Tag/category editing. Links to nodes.
- **Search view**: Unified results from both pillars, with selector prefixes for precision: `artifact:`, `note:`, `cell:`, `tag:`, `author:`, `doi:`.
- **Catalog view**: Surfaces untagged, uncategorized, or unlinked artifacts that need attention.

**Graph view** — Secondary visualization available as a toggle or separate pane, showing nodes and their connections (like Obsidian's graph view). Useful for exploration and for discovering clusters.

**Command palette** — Ctrl+Shift+A (IntelliJ-style) for quick actions: create note, import artifact, search, switch views, manage tags.

#### CLI Tools

Go binaries that connect to exod via gRPC for automation, bulk operations, and scripting. Commands: `import`, `tag`, `cat`, `search`.

## Data Flow

### Importing an Artifact

1. Client sends artifact metadata (citation, tags, categories) and blob data to exod via gRPC
2. exod begins a database transaction
3. Tags and categories are created if they don't exist (idempotent upsert)
4. Publisher is resolved (lookup by name+address, created if missing)
5. Citation is stored with publisher FK and author records
6. Artifact header is stored with citation FK
7. For each snapshot: store the snapshot record, then for each blob: compute SHA256, write the file to the CAS directory, insert the blob record
8. History entries are recorded linking the artifact to its snapshots by datetime
9. Transaction commits
10. Blobs are queued for Minio sync

### Querying by Tag

1. Client sends a tag string to exod
2. Tag name is resolved to its UUID via the `tags` table
3. The `artifact_tags` junction table is queried for matching artifact IDs
4. Full artifact headers are hydrated (citation, publisher, tags, categories, metadata)
5. Results are returned; blob data is not fetched until explicitly requested

### Creating a Knowledge Graph Note

1. Client sends node metadata and cell contents
2. exod creates a Node with a UUID
3. Cells are stored with their content type (markdown, code, etc.)
4. Facts are recorded as EAV tuples linking the node to attributes, other nodes, and artifacts
5. Tags from the note content are cross-referenced with the shared tag pool

## Content-Addressable Store

- **Addressing**: SHA256 hash of blob contents, rendered as a hex string
- **Directory layout**: Hash split into 4-character segments as nested directories (e.g., `a1b2c3d4...` → `a1b2/c3d4/.../a1b2c3d4...`)
- **Deduplication**: Identical content from different snapshots shares the same blob — same hash, same file
- **Registry**: The `blobs` table in SQLite stores `(snapshot_id, blob_id, format)`, where `blob_id` is the SHA256 hash
- **Backup**: The Minio sync queue replicates blobs to remote S3-compatible storage asynchronously
- **Retrieval**: An optional HTTP endpoint (`GET /artifacts/blob/{id}`) may be added for direct blob access

## Cross-Pillar Integration

The architectural core that makes kExocortex more than the sum of its parts:

- **Shared taxonomy**: Tags and categories exist in a single pool used by both artifacts and knowledge graph nodes. This enables cross-pillar queries: "show me everything tagged X."
- **Node-to-artifact links**: Knowledge graph nodes can reference artifacts by ID, so the graph contains both original analysis and source material references.
- **Shared metadata**: The polymorphic `metadata` table uses the owner's UUID as a foreign key, attaching key-value metadata to any object in either pillar.
- **Cell-artifact bridging**: A Cell within a note can embed references to artifacts, linking prose analysis directly to source material.

## Network & Access

- **Local-first**: exod, the database, and the blob store all live on the local filesystem. Full functionality requires no network.
- **Tailscale reverse proxy**: For remote/mobile access. TLS and HTTP basic auth terminate at the proxy, not at exod.
- **Minio backup**: Blob replication to remote S3-compatible storage, managed by an async sync queue in exod. This is a backup/restore mechanism, not a primary access path.

## Key Design Decisions

| Decision | Alternative | Rationale |
|----------|-------------|-----------|
| Single unified SQLite database | Split databases per pillar | Shared tag/category pool, single transaction scope, simpler backup. With exod as the sole writer, SQLite locking is not a concern. |
| Content-addressable blob store | Store blobs in SQLite | Blobs can be arbitrarily large (PDFs, videos). CAS provides deduplication. SQLite isn't designed for large binary storage. |
| gRPC / Protobuf | REST / JSON | Typed contracts, efficient binary serialization, bidirectional streaming for future use (e.g., upload progress). |
| Kotlin desktop app | Web frontend | Desktop-native performance for large document collections. Offline-capable. No browser dependency. |
| SQLite | PostgreSQL | Zero ops cost, single-file backup, embedded in the server process. A single-user system doesn't need concurrent write scaling. |
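The content-addressable addressing scheme described in the Content-Addressable Store section can be sketched in a few lines of Go. The nesting depth is an assumption here (two 4-character prefix directories); the document only fixes the segment width, not the depth:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// blobID returns the content address of a blob: the SHA256 hash of
// its bytes, rendered as a lowercase hex string.
func blobID(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// blobPath maps a blob ID to its location in the CAS directory tree.
// Assumption: two 4-character prefix directories, then the full hash
// as the filename.
func blobPath(id string) string {
	return fmt.Sprintf("%s/%s/%s", id[:4], id[4:8], id)
}

func main() {
	// Identical content always yields the same path: deduplication for free.
	fmt.Println(blobPath(blobID([]byte("hello"))))
	// prints "2cf2/4dba/2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
}
```

Because the path is a pure function of the content, writing the same blob from two different snapshots is naturally idempotent, which is what the dedup and registry bullets above rely on.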