Add architecture and project plan documentation
ARCHITECTURE.md covers the system design: exod backend, single Kotlin desktop app (Obsidian-style), layered architecture, data flow, CAS blob store, cross-pillar integration, and key design decisions. PROJECT_PLAN.md defines six implementation phases from foundation through remote access, with concrete deliverables per phase. CLAUDE.md updated to reference both documents and reflect the single-app UI decision with unified search.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
205
ARCHITECTURE.md
Normal file
@@ -0,0 +1,205 @@
# Architecture

Technical reference for kExocortex, a personal knowledge management system that combines an **artifact repository** (source documents) with a **knowledge graph** (notes and ideas) into a unified, searchable exocortex.

Core formula: **artifacts + notes + graph structure = exocortex**

## Tech Stack

| Role | Technology |
|------|-----------|
| Backend server, CLI tools | Go |
| Desktop application | Kotlin |
| Metadata storage | SQLite |
| Blob storage | Content-addressable store (SHA256) |
| Client-server communication | gRPC / Protobuf |
| Remote blob backup | Minio |
| Secure remote access | Tailscale |

## System Components

```
┌──────────────────┐       gRPC       ┌──────────────────────────┐
│  Kotlin Desktop  │◄════════════════►│                          │
│  (all UI facets) │                  │                          │
└──────────────────┘                  │           exod           │
                                      │       (Go daemon)        │
┌──────────────────┐       gRPC       │                          │
│    CLI tools     │◄════════════════►│  sole owner of all data  │
│  (Go binaries)   │                  │                          │
└──────────────────┘                  └─────┬──────┬──────┬──────┘
                                            │      │      │
                              ┌─────────────┘      │      └─────────────┐
                              │                    │                    │
                        ┌─────▼──────┐      ┌──────▼───────┐     ┌──────▼──────┐
                        │   SQLite   │      │  Local Blob  │     │    Minio    │
                        │  Database  │      │ Store (CAS)  │     │  (remote)   │
                        └────────────┘      └──────────────┘     └─────────────┘

Remote access:
┌────────┐   HTTPS   ┌─────────────────────┐  Tailscale  ┌──────┐
│ Mobile │──────────►│    Reverse Proxy    │────────────►│ exod │
│ Device │           │ (TLS + basic auth)  │             │      │
└────────┘           └─────────────────────┘             └──────┘
```

Three runtime components exist:

- **exod**: The Go backend daemon. Sole owner of the SQLite database and blob store. All reads and writes go through exod. No client accesses storage directly.
- **Kotlin desktop app**: A single application for both artifact management and knowledge graph interaction. Obsidian-style layout: tree/outline sidebar for navigation, contextual main panel, graph visualization, and unified search with selector prefixes. Connects to exod via gRPC.
- **CLI tools**: Go binaries for scripting, bulk operations, and administrative tasks. Also connect via gRPC.

## Layered Architecture

### Layer 1: Storage

Two storage mechanisms, separated by purpose:

**SQLite database** stores all metadata: everything that needs to be queried, filtered, or joined. This includes artifact headers, citations, tags, categories, publisher info, snapshot records, blob registry entries, and knowledge graph facts. A single unified database is used (rather than split databases) so that tags and categories are shared across both pillars.

**Content-addressable blob store** stores the actual artifact content (PDFs, images, web snapshots, etc.) on the local filesystem. Files are addressed by the SHA256 hash of their contents, stored in a hierarchical directory layout. This separation exists because blobs are large, opaque, and benefit from deduplication, while SQLite is not suited for large binary storage.

Together, the database and blob store form a single logical unit that must stay consistent.

### Layer 2: Domain Model

Three Go packages implement the data model:

**`core`** (shared types used by both pillars):
- `Header` (ID, Type, Created, Modified, Categories, Tags, Meta)
- `Metadata` (map of string keys to typed `Value` structs)
- UUID generation

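As a rough illustration of the `core` types listed above, the following Go sketch shows one way they might be declared. The field types, the `ValueType` constants, and the `NewHeader` helper are assumptions made for this example. The real package generates UUIDs, whereas the sketch accepts the ID as a plain string.

```go
package main

import (
	"fmt"
	"time"
)

// ValueType tags the kind of data a metadata Value holds.
// This particular set of kinds is an assumption for illustration.
type ValueType int

const (
	StringValue ValueType = iota
	IntValue
	BoolValue
)

// Value is a typed metadata value; Contents keeps the raw text form.
type Value struct {
	Type     ValueType
	Contents string
}

// Metadata maps string keys to typed values.
type Metadata map[string]Value

// Header mirrors the fields named in the list above.
type Header struct {
	ID         string // a UUID in the real system; any unique string here
	Type       string
	Created    time.Time
	Modified   time.Time
	Categories []string
	Tags       []string
	Meta       Metadata
}

// NewHeader (hypothetical helper) stamps Created and Modified with the
// same instant and initializes an empty metadata map.
func NewHeader(id, objType string) Header {
	now := time.Now().UTC()
	return Header{ID: id, Type: objType, Created: now, Modified: now, Meta: Metadata{}}
}

func main() {
	h := NewHeader("example-id", "artifact")
	h.Tags = append(h.Tags, "graph-theory")
	h.Meta["pages"] = Value{Type: IntValue, Contents: "42"}
	fmt.Println(h.ID, h.Type, h.Tags, h.Meta["pages"].Contents)
}
```
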
**`artifacts`**: The artifact repository pillar. Key relationship chain:

```
Artifact ──► Snapshot(s) ──► Blob(s)
   │              │
   ▼              ▼
Citation     Citation (can override parent)
```

An Artifact has a type (Article, Book, URL, Paper, Video, Image, etc.), a history of Snapshots keyed by datetime, and a top-level Citation. Each Snapshot can have its own Citation that overrides or extends the artifact-level one (e.g., a specific edition of a book). Each Snapshot contains Blobs keyed by MIME type.

See `docs/KExocortex/Spec/Artifacts.md` for canonical type definitions.

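The override relationship between artifact-level and snapshot-level citations can be sketched as follows. This is only an illustration: the `Citation` fields, the string-keyed history, and the field-by-field merge rule in `EffectiveCitation` are assumptions, and the canonical definitions live in the spec referenced above.

```go
package main

import "fmt"

// Citation holds bibliographic fields; this field set is illustrative only.
type Citation struct {
	Title   string
	Edition string
	Year    int
}

// Blob points at content in the blob store.
type Blob struct {
	ID     string // SHA256 hex digest of the content
	Format string // MIME type
}

// Snapshot is one captured version of an artifact.
type Snapshot struct {
	Citation *Citation       // nil means "inherit the artifact's citation"
	Blobs    map[string]Blob // keyed by MIME type
}

// Artifact ties a top-level citation to its snapshot history.
type Artifact struct {
	Type     string
	Citation Citation
	History  map[string]*Snapshot // keyed by datetime (RFC 3339 text in this sketch)
}

// EffectiveCitation starts from the artifact citation and lets any
// non-empty snapshot field replace the inherited one (assumed merge rule).
func EffectiveCitation(a Artifact, s *Snapshot) Citation {
	c := a.Citation
	if s == nil || s.Citation == nil {
		return c
	}
	if s.Citation.Title != "" {
		c.Title = s.Citation.Title
	}
	if s.Citation.Edition != "" {
		c.Edition = s.Citation.Edition
	}
	if s.Citation.Year != 0 {
		c.Year = s.Citation.Year
	}
	return c
}

func main() {
	const when = "2024-01-15T10:00:00Z" // placeholder datetime key
	book := Artifact{
		Type:     "Book",
		Citation: Citation{Title: "Example Book", Year: 1990},
		History: map[string]*Snapshot{
			when: {
				Citation: &Citation{Edition: "2nd", Year: 1998},
				// "abc123" is a placeholder, not a real digest.
				Blobs: map[string]Blob{"application/pdf": {ID: "abc123", Format: "application/pdf"}},
			},
		},
	}
	fmt.Printf("%+v\n", EffectiveCitation(book, book.History[when]))
}
```

Keeping the snapshot citation as a pointer lets a snapshot say "no override" without duplicating the parent's fields.
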
**`kg`** (the knowledge graph pillar):
- **Node**: An entity in the graph, containing Cells
- **Cell**: A content unit within a note (markdown, code, etc.), inspired by Quiver's cell-based structure
- **Fact**: An entity-attribute-value tuple with a transaction timestamp and retraction flag, based on the protobuf model in `docs/KExocortex/KnowledgeGraph/Tuple.md`

Nodes are conceptually `Node = Note | ArtifactLink`: they can be original analysis or references to artifacts.

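To show how a retraction flag and transaction timestamp might interact, here is a small Go sketch that derives the current set of facts from an append-only log. It assumes that a retraction cancels an earlier assertion of the same entity-attribute-value triple; the `Fact` field names and the `CurrentFacts` function are hypothetical and are not taken from the protobuf model.

```go
package main

import (
	"fmt"
	"sort"
)

// Fact is an entity-attribute-value tuple with a transaction timestamp
// and a retraction flag. Field names are assumptions for this sketch.
type Fact struct {
	Entity    string
	Attribute string
	Value     string
	Tx        int64
	Retracted bool
}

type triple struct{ e, a, v string }

// CurrentFacts replays the log in transaction order and returns the
// assertions that have not been retracted by a later entry.
func CurrentFacts(log []Fact) []Fact {
	ordered := append([]Fact(nil), log...) // copy so the caller's slice is untouched
	sort.SliceStable(ordered, func(i, j int) bool { return ordered[i].Tx < ordered[j].Tx })

	// First pass: track the latest live assertion per triple.
	live := map[triple]Fact{}
	for _, f := range ordered {
		k := triple{f.Entity, f.Attribute, f.Value}
		if f.Retracted {
			delete(live, k)
		} else {
			live[k] = f
		}
	}

	// Second pass: emit survivors in transaction order.
	var out []Fact
	for _, f := range ordered {
		k := triple{f.Entity, f.Attribute, f.Value}
		if cur, ok := live[k]; ok && !f.Retracted && cur.Tx == f.Tx {
			out = append(out, f)
			delete(live, k)
		}
	}
	return out
}

func main() {
	log := []Fact{
		{Entity: "node-1", Attribute: "title", Value: "Draft", Tx: 1},
		{Entity: "node-1", Attribute: "title", Value: "Draft", Tx: 2, Retracted: true},
		{Entity: "node-1", Attribute: "title", Value: "Final", Tx: 3},
		{Entity: "node-1", Attribute: "cites", Value: "artifact-1", Tx: 4},
	}
	for _, f := range CurrentFacts(log) {
		fmt.Println(f.Entity, f.Attribute, f.Value)
	}
}
```
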
### Layer 3: Service

The `exod` gRPC server is the exclusive gateway to all data:

- Manages transaction boundaries (begin, commit/rollback)
- Handles blob lifecycle (hash content, write to CAS, register in SQLite, queue for Minio sync)
- Runs the Minio sync queue for asynchronous backup replication
- Exposes gRPC endpoints defined in `.proto` files for all CRUD operations on both pillars

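The asynchronous backup queue mentioned in the list above might look roughly like this. The sketch assumes a single worker goroutine and an in-memory queue; `Uploader`, `SyncQueue`, and `recorder` are hypothetical names. A real uploader would wrap an S3-compatible client, and failed uploads would need retry handling that is omitted here.

```go
package main

import (
	"fmt"
	"log"
)

// Uploader abstracts the remote store so no particular client API is assumed.
type Uploader interface {
	Upload(blobID string) error
}

// SyncQueue hands blob IDs to a background worker.
type SyncQueue struct {
	jobs chan string
	done chan struct{}
}

// NewSyncQueue starts one worker that drains the queue in FIFO order.
func NewSyncQueue(u Uploader, capacity int) *SyncQueue {
	q := &SyncQueue{jobs: make(chan string, capacity), done: make(chan struct{})}
	go func() {
		defer close(q.done)
		for id := range q.jobs {
			if err := u.Upload(id); err != nil {
				// A production queue would retry or persist the failure.
				log.Printf("sync of %s failed: %v", id, err)
			}
		}
	}()
	return q
}

// Enqueue schedules a blob for upload. It blocks only when the buffer is full.
func (q *SyncQueue) Enqueue(blobID string) { q.jobs <- blobID }

// Close stops accepting work and waits for pending uploads to finish.
func (q *SyncQueue) Close() {
	close(q.jobs)
	<-q.done
}

// recorder is a fake Uploader that remembers what it was asked to upload.
type recorder struct{ uploaded []string }

func (r *recorder) Upload(blobID string) error {
	r.uploaded = append(r.uploaded, blobID)
	return nil
}

func main() {
	r := &recorder{}
	q := NewSyncQueue(r, 8)
	q.Enqueue("blob-a")
	q.Enqueue("blob-b")
	q.Close()
	fmt.Println(r.uploaded)
}
```

Because the caller only enqueues an ID, the database transaction can commit without waiting on the network.
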
### Layer 4: Presentation

A single Kotlin desktop application handles both artifact management and knowledge graph interaction, following the Obsidian model. CLI tools provide a scriptable alternative.

#### Desktop Application Layout

```
┌─────────────────────────────────────────────────────────────┐
│  [Command Palette: Ctrl+Shift+A]          [Search: Ctrl+F]  │
├──────────────┬──────────────────────────────────────────────┤
│              │                                              │
│  Sidebar     │  Main Panel                                  │
│              │                                              │
│  Tree/       │  Contextual view based on selection:         │
│  Outline     │  • Note editor (cell-based)                  │
│  View        │  • Artifact detail (citation, snapshots)     │
│              │  • Search results                            │
│              │  • Catalog (items needing attention)         │
│              │                                              │
├──────────────┴──────────────────────────────────────────────┤
│  [Graph View toggle]                                        │
└─────────────────────────────────────────────────────────────┘
```

**Sidebar**: Tree/outline view of the knowledge graph hierarchy as primary navigation. Artifacts appear under their linked nodes; unlinked artifacts appear in a dedicated section. Collapsible, like Obsidian's file explorer.

**Main panel** (changes contextually):
- **Note view**: Cell-based editor (markdown, code blocks). Associated artifacts listed inline. Dendron-style Ctrl+L for note creation.
- **Artifact view**: Citation details, snapshot history, blob preview (PDF, HTML). Tag/category editing. Link to nodes.
- **Search view**: Unified results from both pillars. Selector prefixes for precision: `artifact:`, `note:`, `cell:`, `tag:`, `author:`, `doi:`.
- **Catalog view**: Surfaces untagged, uncategorized, or unlinked artifacts needing attention.

**Graph view**: Secondary visualization available as a toggle or separate pane, showing nodes and their connections (like Obsidian's graph view). Useful for exploration and discovering clusters.

**Command palette**: Ctrl+Shift+A (IntelliJ-style) for quick actions: create note, import artifact, search, switch views, manage tags.

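The selector prefixes listed for the search view could be parsed along these lines. This is a minimal Go sketch that assumes whitespace-separated tokens with no quoting. The `Query` type and `ParseQuery` function are hypothetical names, and the search would more likely be parsed in exod or in the Kotlin client than in a standalone program.

```go
package main

import (
	"fmt"
	"strings"
)

// selectors lists the prefixes named in the search view description.
var selectors = map[string]bool{
	"artifact": true, "note": true, "cell": true,
	"tag": true, "author": true, "doi": true,
}

// Query separates selector filters from free-text terms.
type Query struct {
	Filters map[string][]string // selector name -> values
	Terms   []string            // everything without a known selector
}

// ParseQuery splits input on whitespace and routes "name:value" tokens
// with a known selector name into Filters; all other tokens become Terms.
func ParseQuery(input string) Query {
	q := Query{Filters: map[string][]string{}}
	for _, tok := range strings.Fields(input) {
		name, value, found := strings.Cut(tok, ":")
		if found && selectors[name] && value != "" {
			q.Filters[name] = append(q.Filters[name], value)
			continue
		}
		q.Terms = append(q.Terms, tok)
	}
	return q
}

func main() {
	q := ParseQuery("tag:databases author:knuth sorting networks")
	fmt.Println(q.Filters["tag"], q.Filters["author"], q.Terms)
}
```

Only the first colon splits a token, so a value that itself contains colons or slashes (as DOIs often do) is kept intact.
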
#### CLI Tools

Go binaries connecting to exod via gRPC for automation, bulk operations, and scripting. Commands: `import`, `tag`, `cat`, `search`.

## Data Flow

### Importing an Artifact

1. Client sends artifact metadata (citation, tags, categories) and blob data to exod via gRPC
2. exod begins a database transaction
3. Tags and categories are created if they don't exist (idempotent upsert)
4. Publisher is resolved (lookup by name+address, create if missing)
5. Citation is stored with publisher FK and author records
6. Artifact header is stored with citation FK
7. For each snapshot: store snapshot record, then for each blob: compute SHA256, write file to CAS directory, insert blob record
8. History entries are recorded linking artifact to snapshots by datetime
9. Transaction commits
10. Blobs are queued for Minio sync

### Querying by Tag

1. Client sends a tag string to exod
2. Tag name is resolved to its UUID via the `tags` table
3. The `artifact_tags` junction table is queried for matching artifact IDs
4. Full artifact headers are hydrated (citation, publisher, tags, categories, metadata)
5. Results are returned; blob data is not fetched until explicitly requested

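The lookup steps above can be traced with in-memory stand-ins for the tables. This Go sketch is not the actual query code: the `Store` type, its maps, and the trimmed `ArtifactHeader` are assumptions, and a real implementation would issue SQL against SQLite instead.

```go
package main

import "fmt"

// ArtifactHeader is a trimmed-down stand-in for a hydrated header.
type ArtifactHeader struct {
	ID    string
	Title string
	Tags  []string
}

// Store holds in-memory stand-ins for the tables named above.
type Store struct {
	tags         map[string]string         // `tags`: tag name -> tag UUID
	artifactTags map[string][]string       // `artifact_tags`: tag UUID -> artifact IDs
	headers      map[string]ArtifactHeader // artifact ID -> header data
}

// ArtifactsByTag resolves the tag name, follows the junction table, and
// hydrates headers. It never reads blob content.
func (s *Store) ArtifactsByTag(name string) []ArtifactHeader {
	tagID, ok := s.tags[name]
	if !ok {
		return nil // unknown tag: no results
	}
	var out []ArtifactHeader
	for _, id := range s.artifactTags[tagID] {
		if h, ok := s.headers[id]; ok {
			out = append(out, h)
		}
	}
	return out
}

// newDemoStore builds a tiny data set with placeholder IDs.
func newDemoStore() *Store {
	return &Store{
		tags:         map[string]string{"databases": "tag-1"},
		artifactTags: map[string][]string{"tag-1": {"art-1", "art-2"}},
		headers: map[string]ArtifactHeader{
			"art-1": {ID: "art-1", Title: "First paper", Tags: []string{"databases"}},
			"art-2": {ID: "art-2", Title: "Second paper", Tags: []string{"databases"}},
		},
	}
}

func main() {
	for _, h := range newDemoStore().ArtifactsByTag("databases") {
		fmt.Println(h.ID, h.Title)
	}
}
```
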
### Creating a Knowledge Graph Note

1. Client sends node metadata and cell contents
2. exod creates a Node with a UUID
3. Cells are stored with their content type (markdown, code, etc.)
4. Facts are recorded as EAV tuples linking the node to attributes, other nodes, and artifacts
5. Tags from the note content are cross-referenced with the shared tag pool

## Content-Addressable Store

- **Addressing**: SHA256 hash of blob contents, rendered as a hex string
- **Directory layout**: Hash split into 4-character segments as nested directories (e.g., `a1b2c3d4...` → `a1b2/c3d4/.../a1b2c3d4...`)
- **Deduplication**: Identical content from different snapshots shares the same blob: same hash, same file
- **Registry**: The `blobs` table in SQLite stores `(snapshot_id, blob_id, format)` where `blob_id` is the SHA256 hash
- **Backup**: Minio sync queue replicates blobs to remote S3-compatible storage asynchronously
- **Retrieval**: An optional HTTP endpoint (`GET /artifacts/blob/{id}`) may be added for direct blob access

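The addressing, layout, and deduplication rules above can be sketched in Go as follows. The function names `BlobID`, `BlobPath`, and `Store` are hypothetical, and the sketch assumes that every 4-character segment of the hash becomes a directory level, with the full hash as the file name. The elided middle of the example path leaves the exact depth open.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// BlobID returns the hex-encoded SHA256 digest of content.
func BlobID(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// BlobPath nests one directory per 4-character segment of id under root
// and uses the full id as the file name.
func BlobPath(root, id string) string {
	parts := []string{root}
	for i := 0; i+4 <= len(id); i += 4 {
		parts = append(parts, id[i:i+4])
	}
	parts = append(parts, id)
	return filepath.Join(parts...)
}

// Store writes content into the CAS unless a blob with the same hash is
// already present. It reports whether a new file was created.
func Store(root string, content []byte) (id string, created bool, err error) {
	id = BlobID(content)
	p := BlobPath(root, id)
	if _, statErr := os.Stat(p); statErr == nil {
		return id, false, nil // identical content already stored
	}
	if err := os.MkdirAll(filepath.Dir(p), 0o755); err != nil {
		return "", false, err
	}
	if err := os.WriteFile(p, content, 0o644); err != nil {
		return "", false, err
	}
	return id, true, nil
}

func main() {
	root, err := os.MkdirTemp("", "cas-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	id, first, _ := Store(root, []byte("hello"))
	_, second, _ := Store(root, []byte("hello"))
	fmt.Println(id, first, second) // the second call finds the existing file
}
```

A production store would likely write to a temporary file and rename it into place, so that a crash cannot leave a partial blob under a valid hash.
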
## Cross-Pillar Integration

Cross-pillar integration is the architectural core that makes kExocortex more than the sum of its parts:

- **Shared taxonomy**: Tags and categories exist in a single pool used by both artifacts and knowledge graph nodes. This enables cross-pillar queries: "show me everything tagged X."
- **Node-to-artifact links**: Knowledge graph nodes can reference artifacts by ID, so the graph contains both original analysis and source material references.
- **Shared metadata**: The polymorphic `metadata` table uses the owner's UUID as a foreign key, attaching key-value metadata to any object in either pillar.
- **Cell-artifact bridging**: A Cell within a note can embed references to artifacts, linking prose analysis directly to source material.

## Network & Access

- **Local-first**: exod, the database, and the blob store all live on the local filesystem. Full functionality requires no network.
- **Tailscale reverse proxy**: For remote/mobile access. TLS and HTTP basic auth terminate at the proxy, not at exod.
- **Minio backup**: Blob replication to remote S3-compatible storage, managed by an async sync queue in exod. This is a backup/restore mechanism, not a primary access path.

## Key Design Decisions

| Decision | Alternative | Rationale |
|----------|-------------|-----------|
| Single unified SQLite database | Split databases per pillar | Shared tag/category pool, single transaction scope, simpler backup. Routing all access through exod avoids SQLite locking contention between clients. |
| Content-addressable blob store | Store blobs in SQLite | Blobs can be arbitrarily large (PDFs, videos). CAS provides deduplication. SQLite isn't designed for large binary storage. |
| gRPC / Protobuf | REST / JSON | Typed contracts, efficient binary serialization, bidirectional streaming for future use (e.g., upload progress). |
| Kotlin desktop app | Web frontend | Desktop-native performance for large document collections. Offline-capable. No browser dependency. |
| SQLite | PostgreSQL | Zero ops cost, single-file backup, embedded in server process. Single-user system doesn't need concurrent write scaling. |