From 98990c6d7649ffc5eb5cc2f948bbca21028b915c Mon Sep 17 00:00:00 2001
From: Kyle Isom
Date: Sat, 21 Mar 2026 09:31:27 -0700
Subject: [PATCH] Make ARCHITECTURE.md standalone with inlined schemas and types

Inline all data type definitions (Go structs, protobuf messages), the
full SQLite schema (11 tables), CAS directory layout, and the dbObject
interface directly into ARCHITECTURE.md so it is self-contained and
does not depend on cross-references to docs/.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 ARCHITECTURE.md | 330 +++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 300 insertions(+), 30 deletions(-)

diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
index d675e06..57e4a75 100644
--- a/ARCHITECTURE.md
+++ b/ARCHITECTURE.md
@@ -49,28 +49,38 @@ Three runtime components exist:
 - **Kotlin desktop app** — A single application for both artifact management and knowledge graph interaction. Obsidian-style layout: tree/outline sidebar for navigation, contextual main panel, graph visualization, and unified search with selector prefixes. Connects to exod via gRPC.
 - **CLI tools** — Go binaries for scripting, bulk operations, and administrative tasks. Also connect via gRPC.
 
-## Layered Architecture
+## Data Model
 
-### Layer 1: Storage
+### Shared Types
 
-Two storage mechanisms, separated by purpose:
+Common to all objects across both pillars.
 
-**SQLite database** stores all metadata — everything that needs to be queried, filtered, or joined. This includes artifact headers, citations, tags, categories, publisher info, snapshot records, blob registry entries, and knowledge graph facts. A single unified database is used (rather than split databases) so that tags and categories are shared across both pillars.
+```go
+// Header is attached to every persistent object.
+type Header struct {
+    ID string // UUID
+    Type ObjectType
+    Created int64
+    Modified int64
+    Categories []string
+    Tags []string
+    Meta Metadata
+}
 
-**Content-addressable blob store** stores the actual artifact content (PDFs, images, web snapshots, etc.) on the local filesystem. Files are addressed by the SHA256 hash of their contents, stored in a hierarchical directory layout. This separation exists because blobs are large, opaque, and benefit from deduplication, while SQLite is not suited for large binary storage.
+// Metadata is a flexible key-value store for arbitrary attributes.
+type Value struct {
+    Contents string
+    Type string // e.g. "string", "int", "unspecified"
+}
 
-Together, the database and blob store form a single logical unit that must stay consistent.
+type Metadata map[string]Value
+```
 
-### Layer 2: Domain Model
+All timestamps are UTC-encoded and must support dates prior to Unix epoch 0 (e.g., publication date of a historical text). Clients convert local time to UTC before sending to the server.
 
-Three Go packages implement the data model:
+### Artifact Types
 
-**`core`** — Shared types used by both pillars:
-- `Header` (ID, Type, Created, Modified, Categories, Tags, Meta)
-- `Metadata` (map of string keys to typed `Value` structs)
-- UUID generation
-
-**`artifacts`** — The artifact repository pillar. Key relationship chain:
+An artifact is a source of knowledge — a PDF, a book, a webpage, a paper. Artifacts have versioned snapshots, each containing one or more blobs in different formats.
 
 ```
 Artifact ──► Snapshot(s) ──► Blob(s)
@@ -79,16 +89,285 @@ Artifact ──► Snapshot(s) ──► Blob(s)
 Citation Citation (can override parent)
 ```
 
-An Artifact has a type (Article, Book, URL, Paper, Video, Image, etc.), a history of Snapshots keyed by datetime, and a top-level Citation. Each Snapshot can have its own Citation that overrides or extends the artifact-level one (e.g., a specific edition of a book). Each Snapshot contains Blobs keyed by MIME type.
+```go
+// Artifact is the top-level container for a knowledge source.
+type Artifact struct {
+    ID string
+    Type ArtifactType // see enumeration below
+    Latest time.Time // most recent Snapshot.DateTime
+    History map[time.Time]*Snapshot
+}
 
-See `docs/KExocortex/Spec/Artifacts.md` for canonical type definitions.
+// ArtifactType enumeration:
+// Unknown, Custom, Article, Book, URL, Paper, Video, Image
+// If Type is "Custom", Header.Meta must contain an "ArtifactType" entry.
 
-**`kg`** — The knowledge graph pillar:
-- **Node** — An entity in the graph, containing Cells
-- **Cell** — A content unit within a note (markdown, code, etc.), inspired by Quiver's cell-based structure
-- **Fact** — An entity-attribute-value tuple with a transaction timestamp and retraction flag, based on the protobuf model in `docs/KExocortex/KnowledgeGraph/Tuple.md`
+// Snapshot represents content at a specific point in time or in a specific format.
+// A website might have snapshots for different scrape dates; a book might have
+// snapshots for different editions or formats (PDF and EPUB).
+type Snapshot struct {
+    Header Header
+    ArtifactID string
+    Stored time.Time // when this snapshot was stored
+    DateTime time.Time // the time this snapshot represents
+    Citation *Citation // can override the artifact-level citation
+    Blobs map[MIME]*Blob // content keyed by MIME type
+}
+// MIME parameters can distinguish variants: "application/pdf; format=screen"
 
-Nodes are conceptually `Node = Note | ArtifactLink` — they can be original analysis or references to artifacts.
+// Blob is a piece of content in the content-addressable store.
+type Blob struct {
+    ID string // SHA256 hash of contents
+    Format string // MIME type
+    Body io.ReadCloser
+}
+
+// Citation holds bibliographic information. Nothing is strictly required.
+// A citation occurs at the artifact level, but snapshots can override specific
+// fields (e.g., a different edition's ISBN).
+type Citation struct {
+    Header Header
+    DOI string
+    Title string
+    Year int
+    Published time.Time
+    Authors []string
+    Publisher *Publisher
+    Source string // URL or origin
+    Abstract string
+}
+
+type Publisher struct {
+    Header Header
+    Name string
+    Address string
+}
+```
+
+### Knowledge Graph Types
+
+The knowledge graph stores notes as nodes in a directed graph. Each node contains cells (content blocks) and is connected to other nodes and artifacts via facts.
+
+```go
+// Node is an entity in the knowledge graph.
+// Conceptually: Node = Note | ArtifactLink
+type Node struct {
+    Header Header
+    Parent string // parent node ID (C2 wiki style hierarchy)
+    Children []string // child node IDs
+}
+
+// Cell is a content unit within a note. Inspired by Quiver's cell-based
+// structure — a note is composed of multiple cells of different types.
+type Cell struct {
+    Header Header
+    NodeID string
+    Contents []byte
+    Type string // "markdown", "code", etc.
+}
+```
+
+Facts record relationships using an entity-attribute-value model with transactional history:
+
+```protobuf
+message Name {
+    string id = 1; // UUID
+    string common = 2; // human-readable name
+}
+
+message Attribute {
+    Name name = 1;
+    Value value = 2;
+}
+
+message Transaction {
+    int64 timestamp = 1;
+}
+
+message Fact {
+    Name entity = 1;
+    Attribute attribute = 2;
+    Value value = 3;
+    Transaction transaction = 4;
+    bool retraction = 5; // true = this fact is being retracted
+}
+```
+
+A Fact with `retraction = true` marks a previous fact as no longer valid without deleting history. The transaction timestamp records when the fact was asserted or retracted.
+
+## Database Schema
+
+Single unified SQLite database. Tags and categories are shared across both pillars — this is the primary reason for a unified database rather than one per pillar.
+
+### Shared Infrastructure
+
+```sql
+-- Polymorphic key-value metadata. The id column references any object's UUID.
+CREATE TABLE metadata
+(
+    id TEXT NOT NULL, -- owner UUID
+    mkey TEXT NOT NULL,
+    contents TEXT NOT NULL,
+    type TEXT NOT NULL,
+    PRIMARY KEY (mkey, contents, type),
+    UNIQUE (id, mkey)
+);
+CREATE INDEX idx_metadata_id ON metadata (id);
+
+-- Shared tag pool (used by both artifacts and knowledge graph nodes).
+CREATE TABLE tags
+(
+    id TEXT NOT NULL PRIMARY KEY, -- UUID
+    tag TEXT NOT NULL UNIQUE
+);
+
+-- Shared category pool.
+CREATE TABLE categories
+(
+    id TEXT NOT NULL PRIMARY KEY, -- UUID
+    category TEXT NOT NULL UNIQUE
+);
+```
+
+### Bibliographic
+
+```sql
+CREATE TABLE publishers
+(
+    id TEXT UNIQUE NOT NULL PRIMARY KEY,
+    name TEXT NOT NULL,
+    address TEXT,
+    UNIQUE (name, address)
+);
+
+CREATE TABLE citations
+(
+    id TEXT PRIMARY KEY,
+    doi TEXT,
+    title TEXT NOT NULL,
+    year INTEGER NOT NULL,
+    published TEXT NOT NULL, -- ISO 8601 UTC
+    publisher TEXT NOT NULL,
+    source TEXT NOT NULL,
+    abstract TEXT,
+    FOREIGN KEY (publisher) REFERENCES publishers (id)
+);
+CREATE INDEX idx_citations_doi ON citations (id, doi);
+
+-- Many-to-one: multiple authors per citation.
+CREATE TABLE authors
+(
+    citation_id TEXT NOT NULL,
+    author_name TEXT NOT NULL,
+    FOREIGN KEY (citation_id) REFERENCES citations (id)
+);
+```
+
+### Artifact Repository
+
+```sql
+CREATE TABLE artifacts
+(
+    id TEXT PRIMARY KEY,
+    type TEXT NOT NULL, -- ArtifactType enumeration
+    citation_id TEXT NOT NULL,
+    latest TEXT NOT NULL, -- ISO 8601 UTC (most recent snapshot)
+    FOREIGN KEY (citation_id) REFERENCES citations (id)
+);
+
+-- Many-to-many junction tables for classification.
+CREATE TABLE artifact_tags
+(
+    artifact_id TEXT NOT NULL,
+    tag_id TEXT NOT NULL,
+    FOREIGN KEY (artifact_id) REFERENCES artifacts (id),
+    FOREIGN KEY (tag_id) REFERENCES tags (id)
+);
+
+CREATE TABLE artifact_categories
+(
+    artifact_id TEXT NOT NULL,
+    category_id TEXT NOT NULL,
+    FOREIGN KEY (artifact_id) REFERENCES artifacts (id),
+    FOREIGN KEY (category_id) REFERENCES categories (id)
+);
+
+-- Temporal index linking artifacts to snapshots by datetime.
+CREATE TABLE artifacts_history
+(
+    artifact_id TEXT NOT NULL,
+    snapshot_id TEXT NOT NULL UNIQUE,
+    datetime TEXT NOT NULL,
+    PRIMARY KEY (artifact_id, datetime),
+    FOREIGN KEY (artifact_id) REFERENCES artifacts (id)
+);
+
+-- Snapshot records with storage and content timestamps.
+CREATE TABLE artifact_snapshots
+(
+    artifact_id TEXT NOT NULL,
+    id TEXT UNIQUE PRIMARY KEY,
+    stored_at INTEGER NOT NULL, -- Unix epoch (when stored)
+    datetime TEXT NOT NULL, -- ISO 8601 UTC (what time this represents)
+    citation_id TEXT NOT NULL,
+    source TEXT NOT NULL,
+    FOREIGN KEY (artifact_id) REFERENCES artifacts (id),
+    FOREIGN KEY (id) REFERENCES artifacts_history (snapshot_id)
+);
+
+-- Blob registry. Actual content lives in the CAS on disk.
+CREATE TABLE blobs
+(
+    snapshot_id TEXT NOT NULL,
+    id TEXT NOT NULL UNIQUE PRIMARY KEY, -- SHA256 hash
+    format TEXT NOT NULL, -- MIME type
+    FOREIGN KEY (snapshot_id) REFERENCES artifact_snapshots (id)
+);
+```
+
+### Knowledge Graph (to be implemented)
+
+Tables for nodes, cells, facts, and graph edges will be added to the same database. They will reuse the `tags`, `categories`, and `metadata` tables via the shared UUID-based foreign key pattern.
+
+## Content-Addressable Store
+
+Blob content is stored on the local filesystem, addressed by SHA256 hash.
+
+- **Base path**: `$HOME/exo/blobs/`
+- **Directory layout**: The hex hash is split into 4-character segments as nested directories. For example, hash `a1b2c3d4e5f67890...` is stored at `a1b2/c3d4/e5f6/7890/.../a1b2c3d4e5f67890...`
+- **Deduplication**: Identical content from different snapshots shares the same file (same hash = same path)
+- **Registration**: The `blobs` table in SQLite stores `(snapshot_id, blob_id, format)` where `blob_id` is the SHA256 hash. The hash doubles as both the blob's database ID and its filesystem path key.
+- **Backup**: A sync queue in exod replicates blobs to a remote Minio (S3-compatible) server asynchronously
+- **Retrieval**: An optional HTTP endpoint (`GET /artifacts/blob/{id}`) may be added for direct blob access
+
+## Layered Architecture
+
+### Layer 1: Storage
+
+Two storage mechanisms, separated by purpose:
+
+**SQLite database** stores all metadata — everything that needs to be queried, filtered, or joined (see schema above). A single unified database is used so that tags and categories are shared across both pillars.
+
+**Content-addressable blob store** stores actual artifact content on the local filesystem (see CAS section above). This separation exists because blobs are large, opaque, and benefit from deduplication, while SQLite is not suited for large binary storage.
+
+Together, the database and blob store form a single logical unit that must stay consistent.
+
+### Layer 2: Domain Model
+
+Three Go packages implement the data model:
+
+- **`core`** — Shared types: `Header`, `Metadata`, `Value`, UUID generation
+- **`artifacts`** — Artifact repository: `Artifact`, `Snapshot`, `Blob`, `Citation`, `Publisher`, tag/category management
+- **`kg`** — Knowledge graph: `Node`, `Cell`, `Fact`
+
+All persistent types implement the `dbObject` interface:
+
+```go
+type dbObject interface {
+    Get(ctx context.Context, tx *sql.Tx) error
+    Store(ctx context.Context, tx *sql.Tx) error
+}
+```
 
 ### Layer 3: Service
 
@@ -170,15 +449,6 @@ Go binaries connecting to exod via gRPC for automation, bulk operations, and scr
 4. Facts are recorded as EAV tuples linking the node to attributes, other nodes, and artifacts
 5. Tags from the note content are cross-referenced with the shared tag pool
 
-## Content-Addressable Store
-
-- **Addressing**: SHA256 hash of blob contents, rendered as a hex string
-- **Directory layout**: Hash split into 4-character segments as nested directories (e.g., `a1b2c3d4...` → `a1b2/c3d4/.../a1b2c3d4...`)
-- **Deduplication**: Identical content from different snapshots shares the same blob — same hash, same file
-- **Registry**: The `blobs` table in SQLite stores `(snapshot_id, blob_id, format)` where `blob_id` is the SHA256 hash
-- **Backup**: Minio sync queue replicates blobs to remote S3-compatible storage asynchronously
-- **Retrieval**: An optional HTTP endpoint (`GET /artifacts/blob/{id}`) may be added for direct blob access
-
 ## Cross-Pillar Integration
 
 The architectural core that makes kExocortex more than the sum of its parts: