From 83f4b327f3890409cca8d284f93df4ad0c41d15e Mon Sep 17 00:00:00 2001
From: Kyle Isom
Date: Fri, 20 Mar 2026 12:18:54 -0700
Subject: [PATCH] Add architecture and project plan documentation

ARCHITECTURE.md covers the system design: exod backend, single Kotlin
desktop app (Obsidian-style), layered architecture, data flow, CAS blob
store, cross-pillar integration, and key design decisions.

PROJECT_PLAN.md defines six implementation phases from foundation
through remote access, with concrete deliverables per phase.

CLAUDE.md updated to reference both documents and reflect the
single-app UI decision with unified search.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 ARCHITECTURE.md | 205 ++++++++++++++++++++++++++++++++++++++++++++++++
 CLAUDE.md       |   4 +-
 PROJECT_PLAN.md | 140 +++++++++++++++++++++++++++++++++
 3 files changed, 347 insertions(+), 2 deletions(-)
 create mode 100644 ARCHITECTURE.md
 create mode 100644 PROJECT_PLAN.md

diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
new file mode 100644
index 0000000..d675e06
--- /dev/null
+++ b/ARCHITECTURE.md
@@ -0,0 +1,205 @@
+# Architecture
+
+Technical reference for kExocortex — a personal knowledge management system that combines an **artifact repository** (source documents) with a **knowledge graph** (notes and ideas) into a unified, searchable exocortex.
+
+Core formula: **artifacts + notes + graph structure = exocortex**
+
+## Tech Stack
+
+| Role | Technology |
+|------|-----------|
+| Backend server, CLI tools | Go |
+| Desktop application | Kotlin |
+| Metadata storage | SQLite |
+| Blob storage | Content-addressable store (SHA256) |
+| Client-server communication | gRPC / Protobuf |
+| Remote blob backup | Minio |
+| Secure remote access | Tailscale |
+
+## System Components
+
+```
+┌──────────────────┐       gRPC       ┌──────────────────────────┐
+│ Kotlin Desktop   │◄════════════════►│                          │
+│ (all UI facets)  │                  │                          │
+└──────────────────┘                  │           exod           │
+                                      │        (Go daemon)       │
+┌──────────────────┐       gRPC       │                          │
+│ CLI tools        │◄════════════════►│  sole owner of all data  │
+│ (Go binaries)    │                  │                          │
+└──────────────────┘                  └─────┬──────┬──────┬──────┘
+                                            │      │      │
+                              ┌─────────────┘      │      └─────────────┐
+                              │                    │                    │
+                        ┌─────▼──────┐      ┌──────▼───────┐     ┌──────▼──────┐
+                        │   SQLite   │      │  Local Blob  │     │    Minio    │
+                        │  Database  │      │ Store (CAS)  │     │  (remote)   │
+                        └────────────┘      └──────────────┘     └─────────────┘
+
+Remote access:
+┌────────┐   HTTPS   ┌─────────────────────┐  Tailscale  ┌──────┐
+│ Mobile │──────────►│    Reverse Proxy    │────────────►│ exod │
+│ Device │           │  (TLS + basic auth) │             │      │
+└────────┘           └─────────────────────┘             └──────┘
+```
+
+Three runtime components exist:
+
+- **exod** — The Go backend daemon. Sole owner of the SQLite database and blob store. All reads and writes go through exod. No client accesses storage directly.
+- **Kotlin desktop app** — A single application for both artifact management and knowledge graph interaction. Obsidian-style layout: tree/outline sidebar for navigation, contextual main panel, graph visualization, and unified search with selector prefixes. Connects to exod via gRPC.
+- **CLI tools** — Go binaries for scripting, bulk operations, and administrative tasks. Also connect via gRPC.
+
+## Layered Architecture
+
+### Layer 1: Storage
+
+Two storage mechanisms, separated by purpose:
+
+**SQLite database** stores all metadata — everything that needs to be queried, filtered, or joined. This includes artifact headers, citations, tags, categories, publisher info, snapshot records, blob registry entries, and knowledge graph facts. A single unified database is used (rather than split databases) so that tags and categories are shared across both pillars.
+
+**Content-addressable blob store** stores the actual artifact content (PDFs, images, web snapshots, etc.) on the local filesystem. Files are addressed by the SHA256 hash of their contents, stored in a hierarchical directory layout. This separation exists because blobs are large, opaque, and benefit from deduplication, while SQLite is not suited for large binary storage.
+
+Together, the database and blob store form a single logical unit that must stay consistent.
+
+### Layer 2: Domain Model
+
+Three Go packages implement the data model:
+
+**`core`** — Shared types used by both pillars:
+- `Header` (ID, Type, Created, Modified, Categories, Tags, Meta)
+- `Metadata` (map of string keys to typed `Value` structs)
+- UUID generation
+
+**`artifacts`** — The artifact repository pillar. Key relationship chain:
+
+```
+Artifact ──► Snapshot(s) ──► Blob(s)
+    │             │
+    ▼             ▼
+Citation      Citation (can override parent)
+```
+
+An Artifact has a type (Article, Book, URL, Paper, Video, Image, etc.), a history of Snapshots keyed by datetime, and a top-level Citation. Each Snapshot can have its own Citation that overrides or extends the artifact-level one (e.g., a specific edition of a book). Each Snapshot contains Blobs keyed by MIME type.
+
+See `docs/KExocortex/Spec/Artifacts.md` for canonical type definitions.
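+
+To illustrate the override rule, here is a minimal Go sketch. The `Citation` fields and the `Merge` method are hypothetical simplifications, not the canonical types from `docs/KExocortex/Spec/Artifacts.md`; the sketch assumes that a non-empty snapshot field replaces the artifact-level value and an empty one inherits it.
+
+```go
+package main
+
+import "fmt"
+
+// Citation is a hypothetical, simplified citation record.
+type Citation struct {
+	Title     string
+	Authors   []string
+	Year      int
+	Publisher string
+}
+
+// Merge returns the effective citation for a snapshot: a copy of the
+// artifact-level citation with every non-empty field of override applied.
+// A nil override (the snapshot has no citation) inherits the parent unchanged.
+// The copy is shallow: Authors may share its backing array with an input.
+func (parent Citation) Merge(override *Citation) Citation {
+	out := parent
+	if override == nil {
+		return out
+	}
+	if override.Title != "" {
+		out.Title = override.Title
+	}
+	if len(override.Authors) > 0 {
+		out.Authors = override.Authors
+	}
+	if override.Year != 0 {
+		out.Year = override.Year
+	}
+	if override.Publisher != "" {
+		out.Publisher = override.Publisher
+	}
+	return out
+}
+
+func main() {
+	book := Citation{Title: "Example Book", Authors: []string{"A. Author"}, Year: 2001, Publisher: "First Press"}
+	secondEdition := &Citation{Year: 2010, Publisher: "Second Press"}
+	fmt.Printf("%+v\n", book.Merge(secondEdition))
+}
+```
+
+Merging field by field keeps the artifact-level citation as the default, so a snapshot only needs to record what differs, such as the year and publisher of a later edition.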
+
+**`kg`** — The knowledge graph pillar:
+- **Node** — An entity in the graph, containing Cells
+- **Cell** — A content unit within a note (markdown, code, etc.), inspired by Quiver's cell-based structure
+- **Fact** — An entity-attribute-value tuple with a transaction timestamp and retraction flag, based on the protobuf model in `docs/KExocortex/KnowledgeGraph/Tuple.md`
+
+Nodes are conceptually `Node = Note | ArtifactLink` — they can be original analysis or references to artifacts.
+
+### Layer 3: Service
+
+The `exod` gRPC server is the exclusive gateway to all data:
+
+- Manages transaction boundaries (begin, commit/rollback)
+- Handles blob lifecycle (hash content, write to CAS, register in SQLite, queue for Minio sync)
+- Runs the Minio sync queue for asynchronous backup replication
+- Exposes gRPC endpoints defined in `.proto` files for all CRUD operations on both pillars
+
+### Layer 4: Presentation
+
+A single Kotlin desktop application handles both artifact management and knowledge graph interaction, following the Obsidian model. CLI tools provide a scriptable alternative.
+
+#### Desktop Application Layout
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│  [Command Palette: Ctrl+Shift+A]     [Search: Ctrl+F]       │
+├──────────────┬──────────────────────────────────────────────┤
+│              │                                              │
+│   Sidebar    │   Main Panel                                 │
+│              │                                              │
+│   Tree/      │   Contextual view based on selection:        │
+│   Outline    │   • Note editor (cell-based)                 │
+│   View       │   • Artifact detail (citation, snapshots)    │
+│              │   • Search results                           │
+│              │   • Catalog (items needing attention)        │
+│              │                                              │
+├──────────────┴──────────────────────────────────────────────┤
+│  [Graph View toggle]                                        │
+└─────────────────────────────────────────────────────────────┘
+```
+
+**Sidebar** — Tree/outline view of the knowledge graph hierarchy as primary navigation. Artifacts appear under their linked nodes; unlinked artifacts appear in a dedicated section. Collapsible, like Obsidian's file explorer.
+
+**Main panel** — Changes contextually:
+- **Note view**: Cell-based editor (markdown, code blocks). Associated artifacts listed inline. Dendron-style Ctrl+L for note creation.
+- **Artifact view**: Citation details, snapshot history, blob preview (PDF, HTML). Tag/category editing. Link to nodes.
+- **Search view**: Unified results from both pillars. Selector prefixes for precision: `artifact:`, `note:`, `cell:`, `tag:`, `author:`, `doi:`.
+- **Catalog view**: Surfaces untagged, uncategorized, or unlinked artifacts needing attention.
+
+**Graph view** — Secondary visualization available as a toggle or separate pane, showing nodes and their connections (like Obsidian's graph view). Useful for exploration and discovering clusters.
+
+**Command palette** — Ctrl+Shift+A (IntelliJ-style) for quick actions: create note, import artifact, search, switch views, manage tags.
+
+#### CLI Tools
+
+Go binaries connecting to exod via gRPC for automation, bulk operations, and scripting. Commands: `import`, `tag`, `cat`, `search`.
+
+## Data Flow
+
+### Importing an Artifact
+
+1. Client sends artifact metadata (citation, tags, categories) and blob data to exod via gRPC
+2. exod begins a database transaction
+3. Tags and categories are created if they don't exist (idempotent upsert)
+4. Publisher is resolved (lookup by name+address, create if missing)
+5. Citation is stored with publisher FK and author records
+6. Artifact header is stored with citation FK
+7. For each snapshot: store snapshot record, then for each blob: compute SHA256, write file to CAS directory, insert blob record
+8. History entries are recorded linking artifact to snapshots by datetime
+9. Transaction commits
+10. Blobs are queued for Minio sync
+
+### Querying by Tag
+
+1. Client sends a tag string to exod
+2. Tag name is resolved to its UUID via the `tags` table
+3. The `artifact_tags` junction table is queried for matching artifact IDs
+4. Full artifact headers are hydrated (citation, publisher, tags, categories, metadata)
+5. Results are returned; blob data is not fetched until explicitly requested
+
+### Creating a Knowledge Graph Note
+
+1. Client sends node metadata and cell contents
+2. exod creates a Node with a UUID
+3. Cells are stored with their content type (markdown, code, etc.)
+4. Facts are recorded as EAV tuples linking the node to attributes, other nodes, and artifacts
+5. Tags from the note content are cross-referenced with the shared tag pool
+
+## Content-Addressable Store
+
+- **Addressing**: SHA256 hash of blob contents, rendered as a hex string
+- **Directory layout**: Hash split into 4-character segments as nested directories (e.g., `a1b2c3d4...` → `a1b2/c3d4/.../a1b2c3d4...`)
+- **Deduplication**: Identical content from different snapshots shares the same blob — same hash, same file
+- **Registry**: The `blobs` table in SQLite stores `(snapshot_id, blob_id, format)` where `blob_id` is the SHA256 hash
+- **Backup**: Minio sync queue replicates blobs to remote S3-compatible storage asynchronously
+- **Retrieval**: An optional HTTP endpoint (`GET /artifacts/blob/{id}`) may be added for direct blob access
+
+## Cross-Pillar Integration
+
+The architectural core that makes kExocortex more than the sum of its parts:
+
+- **Shared taxonomy**: Tags and categories exist in a single pool used by both artifacts and knowledge graph nodes. This enables cross-pillar queries: "show me everything tagged X."
+- **Node-to-artifact links**: Knowledge graph nodes can reference artifacts by ID, so the graph contains both original analysis and source material references.
+- **Shared metadata**: The polymorphic `metadata` table uses the owner's UUID as a foreign key, attaching key-value metadata to any object in either pillar.
+- **Cell-artifact bridging**: A Cell within a note can embed references to artifacts, linking prose analysis directly to source material.
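+
+As a rough illustration of the shared taxonomy, the Go sketch below models a single tag pool serving both pillars. `TagPool` and `Ref` are hypothetical in-memory stand-ins for the `tags` table and its junction tables (such as `artifact_tags`); in exod the same lookup would presumably run as a query against SQLite.
+
+```go
+package main
+
+import (
+	"fmt"
+	"sort"
+)
+
+// Ref identifies a tagged object in either pillar. Kind is "artifact" or
+// "node"; ID stands in for the object's UUID. Hypothetical shape.
+type Ref struct {
+	Kind string
+	ID   string
+}
+
+// TagPool maps one tag name to owners from both pillars, mirroring the
+// single shared pool rather than one tag namespace per pillar.
+type TagPool map[string][]Ref
+
+// Tag attaches a tag to an artifact or a node.
+func (p TagPool) Tag(tag string, r Ref) {
+	p[tag] = append(p[tag], r)
+}
+
+// Everything answers "show me everything tagged X" across both pillars,
+// sorted by kind and then ID so the result order is deterministic.
+func (p TagPool) Everything(tag string) []Ref {
+	out := append([]Ref(nil), p[tag]...)
+	sort.Slice(out, func(i, j int) bool {
+		if out[i].Kind != out[j].Kind {
+			return out[i].Kind < out[j].Kind
+		}
+		return out[i].ID < out[j].ID
+	})
+	return out
+}
+
+func main() {
+	pool := TagPool{}
+	pool.Tag("music-history", Ref{Kind: "node", ID: "n-7"})
+	pool.Tag("music-history", Ref{Kind: "artifact", ID: "a-1"})
+	pool.Tag("cryptography", Ref{Kind: "artifact", ID: "a-2"})
+	fmt.Println(pool.Everything("music-history"))
+}
+```
+
+Because both pillars draw on the same pool, one lookup returns source material and notes together; with per-pillar tag tables, the same question would need two queries and a merge.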
+
+## Network & Access
+
+- **Local-first**: exod, the database, and the blob store all live on the local filesystem. Full functionality requires no network.
+- **Tailscale reverse proxy**: For remote/mobile access. TLS and HTTP basic auth terminate at the proxy, not at exod.
+- **Minio backup**: Blob replication to remote S3-compatible storage, managed by an async sync queue in exod. This is a backup/restore mechanism, not a primary access path.
+
+## Key Design Decisions
+
+| Decision | Alternative | Rationale |
+|----------|-------------|-----------|
+| Single unified SQLite database | Split databases per pillar | Shared tag/category pool, single transaction scope, simpler backup. As the only process that opens the database, exod sidesteps multi-process SQLite locking concerns. |
+| Content-addressable blob store | Store blobs in SQLite | Blobs can be arbitrarily large (PDFs, videos). CAS provides deduplication. SQLite isn't designed for large binary storage. |
+| gRPC / Protobuf | REST / JSON | Typed contracts, efficient binary serialization, bidirectional streaming for future use (e.g., upload progress). |
+| Kotlin desktop app | Web frontend | Desktop-native performance for large document collections. Offline-capable. No browser dependency. |
+| SQLite | PostgreSQL | Zero ops cost, single-file backup, embedded in server process. Single-user system doesn't need concurrent write scaling. |
diff --git a/CLAUDE.md b/CLAUDE.md
index 06321a0..91052c0 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 **kExocortex** is a personal knowledge management system — an "exocortex" for capturing, organizing, and retrieving knowledge. It combines two pillars: an **artifact repository** (for storing source documents like PDFs, papers, webpages) and a **knowledge graph** (for linking notes and ideas).
 
-The project is in active design and early implementation. The design docs in `docs/` are the primary working material.
+The project is in active design and early implementation. See `ARCHITECTURE.md` for the technical system design and `PROJECT_PLAN.md` for the phased implementation plan.
 
 ## Repository Structure
 
@@ -38,7 +38,7 @@ The system design calls for:
 3. Local blob store (content-addressable)
 4. Remote Minio backup for blobs
 5. Reverse-proxy frontend over Tailscale for remote/mobile access
-6. Kotlin desktop apps covering four UI facets: query, exploration, presentation, and update
+6. Single Kotlin desktop app (Obsidian-style layout) with tree sidebar, contextual main panel, graph view, and unified search with selector prefixes
 
 ## Git Remote
 
diff --git a/PROJECT_PLAN.md b/PROJECT_PLAN.md
new file mode 100644
index 0000000..0940f0e
--- /dev/null
+++ b/PROJECT_PLAN.md
@@ -0,0 +1,140 @@
+# Project Plan
+
+Implementation plan for kExocortex, organized into phases with concrete deliverables.
+
+## Current State
+
+**What exists:**
+- Comprehensive design documentation in `docs/KExocortex/` covering the artifact data model, knowledge graph design, system architecture, UI considerations, and taxonomy
+- Three archived implementations in `ark/` (Go v1, Go v2, Java) that validated the artifact repository data model, SQLite schema, and content-addressable blob store
+- A proven database schema (11 tables) and Go domain types for the artifact pillar
+- Protobuf sketches for the knowledge graph EAV tuple model
+
+**What doesn't exist yet:**
+- Active codebase (all code is archived)
+- The `exod` gRPC server
+- Knowledge graph implementation beyond stubs
+- Kotlin desktop application
+- Minio sync queue
+- Protobuf/gRPC service definitions
+
+## Phase 1: Foundation
+
+Establish the Go project structure, shared types, and database infrastructure.
+
+**Deliverables:**
+- Go module (`go.mod`) with project structure
+- `core` package: `Header`, `Metadata`, `Value` types, UUID generation
+- SQLite migration framework and initial schema (ported from `ark/go-v2/schema/artifacts.sql`)
+- Database access layer: connection management, transaction helpers (`StartTX`/`EndTX` pattern)
+- Configuration: paths for database, blob store, Minio endpoint
+
+**Key references:**
+- `docs/KExocortex/Spec/Artifacts.md` — Header and Metadata type definitions
+- `ark/go-v2/types/common/common.go` — Proven shared type implementations
+- `ark/go-v2/types/artifacts/db.go` — Proven database access patterns
+- `ark/go-v2/schema/artifacts.sql` — Proven schema
+
+## Phase 2: Artifact Repository
+
+Build the artifact pillar — the most mature and validated part of the design.
+
+**Deliverables:**
+- `artifacts` package: `Artifact`, `Snapshot`, `Blob`, `Citation`, `Publisher` types with `Get`/`Store` methods
+- Tag and category management (shared pool, idempotent upserts)
+- Content-addressable blob store (SHA256 hashing, hierarchical directory layout, read/write)
+- YAML import for bootstrapping from existing artifact files
+- Protobuf message definitions for all artifact types
+- gRPC service: create/get/update/delete artifacts, store/retrieve blobs, manage tags and categories
+
+**Key references:**
+- `docs/KExocortex/Spec/Artifacts.md` — Canonical type definitions
+- `ark/go-v2/types/artifacts/*.go` — Proven implementations of all artifact types
+- `ark/go-v2/cmd/exo-repo/cmd/import.go` — Proven import flow
+
+## Phase 3: CLI Tools
+
+Build command-line tools that connect to exod via gRPC for scripting and administrative use.
+
+**Deliverables:**
+- `exo` CLI binary using Cobra (or similar)
+- Commands: `import` (YAML artifacts), `tag` (add/list/delete), `cat` (add/list/delete), `search` (by tag, category, title, DOI)
+- `exod` server binary with startup, shutdown, and configuration
+
+**Key references:**
+- `ark/go-v2/cmd/exo-repo/cmd/*.go` — Proven command structure (import, tags, cat)
+
+## Phase 4: Knowledge Graph
+
+Build the knowledge graph pillar — the less mature component requiring more design work.
+
+**Deliverables:**
+- `kg` package: `Node`, `Cell`, `Fact` types
+- Database schema additions for knowledge graph tables (nodes, cells, facts, graph edges) in the unified SQLite database
+- EAV tuple storage with transaction timestamps and retraction support
+- Node-to-artifact linking (cross-pillar references)
+- Cell content types (markdown, code, etc.)
+- gRPC service: create/get/update nodes, add cells, record facts, traverse graph
+- CLI commands for node creation and graph queries
+
+**Key references:**
+- `docs/KExocortex/KnowledgeGraph/Tuple.md` — EAV/Fact protobuf model
+- `docs/KExocortex/KnowledgeGraph.md` — Graph structure design
+- `docs/KExocortex/Taxonomy.md` — Note naming conventions (C2 wiki style)
+- `docs/KExocortex/Elements.md` — Note and structure definitions
+- `ark/go-v2/types/kg/` — Type stubs (Node, Cell)
+
+## Phase 5: Desktop Application
+
+Single Kotlin desktop app for both artifact management and knowledge graph interaction. Obsidian-style layout: tree/outline sidebar, contextual main panel, graph visualization, unified search.
+
+**Deliverables (incremental):**
+
+1. **App shell and sidebar** — gRPC client connecting to exod. Tree/outline sidebar showing knowledge graph hierarchy and an unlinked-artifacts section. Basic navigation.
+2. **Artifact views** — Artifact detail panel (citation, snapshot history, blob preview). Import flow (file or URL → citation form → tags/categories). Catalog view for untagged/unlinked artifacts needing attention.
+3. **Note editor** — Cell-based editor (markdown, code blocks). Ctrl+L note creation. Inline display of associated artifacts.
+4. **Unified search** — Single search bar across both pillars. Selector prefixes for precision (`artifact:`, `note:`, `cell:`, `tag:`, `author:`, `doi:`). Fuzzy matching for partial recall.
+5. **Graph view** — Visual node graph (toggle or separate pane, Obsidian-style). Exploration by traversing connections and discovering clusters.
+6. **Command palette** — Ctrl+Shift+A for quick actions: create note, import artifact, search, switch views, manage tags.
+7. **Presentation/export** — Export curated notes with associated artifacts to HTML or PDF.
+
+**Key references:**
+- `docs/KExocortex/UI.md` — Interaction patterns to adopt (IntelliJ action menu, Dendron Ctrl+L)
+- `docs/KExocortex/Elements.md` — Interface definitions (query, exploration, presentation, update)
+- `docs/KExocortex/About.md` — Litmus test: Camerata article retrieval with readable snapshot
+- `docs/KExocortex/Taxonomy.md` — C2 wiki style node naming for sidebar hierarchy
+
+## Phase 6: Remote Access & Backup
+
+Enable remote capture and blob backup.
+
+**Deliverables:**
+- Minio sync queue in exod: async blob replication, retry on failure, restore from remote
+- Tailscale reverse proxy configuration with TLS and HTTP basic auth
+- Quick-capture endpoint: accept URL or document from mobile, stash in artifact repository for later categorization
+- Cataloging view: list artifacts needing tags or node attachment
+
+**Key references:**
+- `docs/KExocortex/Spec.md` — Remote access architecture, mobile reading use case
+- `docs/KExocortex/RDD/2022/02/23.md` — Original web server goal for URL/PDF stashing
+- `docs/KExocortex/Agents.md` — Future agent integration via Tailscale
+
+## Phase Dependencies
+
+```
+Phase 1: Foundation
+    │
+    ▼
+Phase 2: Artifact Repository ──► Phase 3: CLI Tools
+    │
+    ▼
+Phase 4: Knowledge Graph
+    │
+    ▼
+Phase 5: Desktop Application
+    │
+    ▼
+Phase 6: Remote Access & Backup
+```
+
+Phases 2 and 3 can overlap — CLI commands can be built as gRPC endpoints come online. Phase 5 can begin its Update facet once Phase 2 is complete, with remaining facets built as Phase 4 delivers.