# Metacircular Infrastructure

## Background

Metacircular Dynamics is a personal infrastructure platform. The name comes
from the tradition of metacircular evaluators in Lisp — a system defined in
terms of itself — by way of SICP and Common Lisp projects that preceded this
work. The infrastructure is metacircular in the same sense: the platform
manages, secures, and hosts its own services.

The goal is sovereign infrastructure. Every component is self-hosted, every
dependency is controlled, and the entire stack is operable by one person. There
are no cloud provider dependencies, no third-party auth providers, no external
databases. When a Metacircular node boots, it connects to Metacircular services
for identity, certificates, container images, and workload scheduling.

All services are written in Go and follow a shared set of engineering standards
(see `engineering-standards.md`). The platform is designed for a small number of
machines — a personal homelab or a handful of VPSes — not for hyperscale.

## Philosophy

**Sovereignty.** You own the whole stack. Identity, certificates, secrets,
container images, DNS, networking — all self-hosted. No SaaS dependency means
no vendor lock-in, no surprise deprecations, and no trust delegation to third
parties.

**Simplicity over sophistication.** SQLite over Postgres. Stdlib `testing` over
test frameworks. Pure Go over CGo. htmx over React. Single-binary deployments
over microservice orchestrators. The right tool is the simplest one that solves
the problem without creating a new one.

**Consistency as leverage.** Every service follows identical patterns: the same
directory layout, the same Makefile targets, the same config format, the same
auth integration, the same deployment model. Knowledge of one service transfers
instantly to all others. A new service can be stood up by copying the skeleton.

**Security as structure.** Security is not a feature bolted on after the fact.
Default deny is the starting posture. TLS 1.3 is the minimum, not a goal.
Interceptor maps make "forgot to add auth" a visible, reviewable omission
rather than a silent runtime failure. Secrets are encrypted at rest behind a
seal/unseal barrier. Every service delegates identity to a single root of
trust.

**Design before code.** The architecture document is written before
implementation begins. It is the spec, not the afterthought. When the code and
the spec disagree, one of them has a bug.

## High-Level Overview

Metacircular infrastructure is built from six core components, plus a shared
standard library (**MCDSL**) that provides the common patterns all services
depend on (auth integration, database setup, config loading, TLS server
bootstrapping, CSRF, snapshots):

- **MCIAS** — Identity and access. The root of trust for all other services.
  Handles authentication, token issuance, role management, and login policy
  enforcement. Every other component delegates auth here.

- **Metacrypt** — Cryptographic services. PKI/CA, SSH CA, transit encryption,
  and encrypted secret storage behind a Vault-inspired seal/unseal barrier.
  Issues the TLS certificates that every other service depends on.

- **MCR** — Container registry. OCI-compliant image storage. MCP directs nodes
  to pull images from MCR. Policy-controlled push/pull integrated with MCIAS.

- **MCNS** — Networking. DNS and address management for the platform.

- **MCP** — Control plane. The orchestrator. A master/agent architecture that
  manages workload scheduling, container lifecycle, service registry, data
  transfer, and node state across the platform.

- **MC-Proxy** — Node ingress. A TLS proxy and router that sits on every node,
  accepts outside connections, and routes them to the correct service — either
  as raw TCP passthrough or via TLS-terminating HTTP/2 reverse proxy.

These components form a dependency graph rooted at MCIAS:

```
MCIAS (standalone — the root of trust)
├── Metacrypt (uses MCIAS for auth; provides certs to all services)
├── MCR       (uses MCIAS for auth; stores images pulled by MCP)
├── MCNS      (uses MCIAS for auth; provides DNS for the platform)
├── MCP       (uses MCIAS for auth; orchestrates everything; owns service registry)
└── MC-Proxy  (pre-auth; routes traffic to services behind it)
```

### The Node Model

The unit of deployment is the **MC Node** — a machine (physical or virtual)
that participates in the Metacircular platform.

```
             ┌──────────────────┐            ┌──────────────┐
             │  System / Core   │            │     MCP      │
             │  Infrastructure  │            │    Master    │
             │   (e.g. MCIAS)   │            │              │
             └────────┬─────────┘            └──────┬───────┘
                      │                             │ C2
                      │                             │
Outside   ┌───────────▼─────────────────────────────▼────┐
Client ──▶│                   MC Node                    │
          │                                              │
          │  ┌───────────┐                               │
          │  │ MC-Proxy  │──┬───────┬───────┐            │
          │  └───────────┘  │       │       │            │
          │              ┌──▼──┐ ┌──▼──┐ ┌──▼──┐ ┌─────┐ │
          │              │  α  │ │  β  │ │  γ  │ │ MCP │ │
          │              └─────┘ └─────┘ └─────┘ │Slave│ │
          │                                      └──┬──┘ │
          │                                   ┌────▼───┐ │
          │                                   │Docker/ │ │
          │                                   │etc.    │ │
          │                                   └────────┘ │
          └──────────────────────────────────────────────┘
```

Outside clients connect to **MC-Proxy**, which inspects the TLS SNI hostname
and routes to the correct service (α, β, γ) — either as a raw TCP relay or
via TLS-terminating HTTP/2 reverse proxy, per-route. The **MCP Agent** on each
node receives C2 commands from the **MCP Master** (running on the operator's
workstation) and manages local container lifecycle via the container runtime.
Core infrastructure services (MCIAS, Metacrypt, MCR) run on nodes like any
other workload.

### The Network Model

Metacircular nodes are connected via an **encrypted overlay network** — a
self-managed WireGuard mesh, Tailscale, or similar. No component has a hard
dependency on a specific overlay implementation; the platform requires only
that nodes can reach each other over encrypted links.

```
                Public Internet
                       │
             ┌─────────▼──────────┐
             │   Edge MC-Proxy    │  VPS (public IP)
             │        :443        │
             └─────────┬──────────┘
                       │ PROXY protocol v2
          ┌────────────▼───────────────────────────┐
          │  Encrypted Overlay (e.g. WireGuard)    │
          └─┬────────────┬───────────┬───────────┬─┘
            │            │           │           │
     ┌──────▼──────┐ ┌───▼──────┐ ┌──▼───────┐ ┌─▼──────────┐
     │   Origin    │ │  Node B  │ │  Node C  │ │  Operator  │
     │  MC-Proxy   │ │  (MCP    │ │  (MCP    │ │ Workstation│
     │ + services  │ │  agent)  │ │  agent)  │ │  (MCP      │
     │ (MCP agent) │ │          │ │          │ │  Master)   │
     └─────────────┘ └──────────┘ └──────────┘ └────────────┘
```

**External traffic** flows from the internet through an edge MC-Proxy (on a
public VPS), which forwards via PROXY protocol over the overlay to an origin
MC-Proxy on the private network. The overlay preserves the real client IP
across the hop.
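
The hop-to-hop forwarding above relies on the PROXY protocol carrying the
original client address. As a minimal illustration, here is a parser for the
human-readable v1 header (v2, which the diagram shows, is a binary encoding of
the same fields). This is a sketch under stated assumptions, not MC-Proxy's
actual implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// proxyHeader holds the fields of a PROXY protocol v1 line. Illustrative
// only; the real MC-Proxy types and names are not shown in this document.
type proxyHeader struct {
	proto            string // "TCP4" or "TCP6"
	srcIP, dstIP     string
	srcPort, dstPort string
}

// parseProxyV1 parses a line like
// "PROXY TCP4 192.0.2.1 198.51.100.1 56324 443".
func parseProxyV1(line string) (*proxyHeader, error) {
	fields := strings.Fields(strings.TrimSpace(line))
	if len(fields) != 6 || fields[0] != "PROXY" {
		return nil, fmt.Errorf("not a PROXY v1 header: %q", line)
	}
	return &proxyHeader{
		proto: fields[1], srcIP: fields[2], dstIP: fields[3],
		srcPort: fields[4], dstPort: fields[5],
	}, nil
}

func main() {
	h, err := parseProxyV1("PROXY TCP4 192.0.2.1 198.51.100.1 56324 443\r\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(h.srcIP) // the real client IP, preserved across the hop
}
```

The receiving proxy consumes this header before any TLS bytes and feeds the
source address into firewall evaluation and logging.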

**Internal traffic** (MCP C2, inter-service communication, MCNS DNS) flows
directly over the overlay. MCP's C2 channel is gRPC over whatever link exists
between master and agent — the overlay provides the transport.

The overlay network itself is a candidate for future Metacircular management
(a self-hosted WireGuard mesh manager), consistent with the sovereignty
principle of minimizing third-party dependencies.

---

## System Catalog

### MCIAS — Metacircular Identity and Access Service

MCIAS is the root of trust for the entire platform. Every other service
delegates authentication to it; no service maintains its own user database.

**What it provides:**

- **Authentication.** Username/password with optional TOTP and FIDO2/WebAuthn.
  Credentials are verified by MCIAS and a signed JWT bearer token is returned.
  Services validate tokens by calling back to MCIAS (results cached for 30
  seconds, keyed by the SHA-256 of the token).

- **Role-based access.** Three roles — `admin` (full access, policy bypass),
  `user` (policy-governed), `guest` (service-dependent restrictions). Admin
  detection comes solely from the MCIAS `admin` role; services never promote
  users locally.

- **Account types.** Human accounts (interactive users) and system accounts
  (service-to-service). Both authenticate the same way; system accounts enable
  automated workflows.

- **Login policy.** Priority-based ACL rules that control who can log into
  which services. Rules can target roles, account types, service names, and
  tags. This allows operators to restrict access per-service (e.g., deny
  `guest` from services tagged `env:restricted`) without changing the
  services themselves.

- **Token lifecycle.** Issuance, validation, renewal, and revocation.
  Ed25519-signed JWTs. Short expiry with renewal support.

**How other services integrate:** Every service includes an `[mcias]` config
section with the MCIAS server URL, a `service_name`, and optional `tags`. At
login time, the service forwards credentials to MCIAS along with this context.
MCIAS evaluates login policy against the service context, verifies credentials,
and returns a bearer token. The MCIAS Go client library
(`git.wntrmute.dev/kyle/mcias/clients/go`) handles this flow.
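
As a concrete illustration, a service's `[mcias]` section might look like the
following. The key names are assumptions drawn from the description above (a
server URL, a `service_name`, optional `tags`); the MCIAS client documentation
defines the actual schema:

```toml
# Hypothetical [mcias] section; key names are illustrative.
[mcias]
server_url   = "https://mcias.svc.mcp.metacircular.net:8443"
service_name = "mcat"
tags         = ["env:restricted"]
```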

**Status:** Implemented. v1.0.0 complete.

---

### Metacrypt — Cryptographic Service Engine

Metacrypt provides cryptographic resources to the platform through a modular
engine architecture, backed by an encrypted storage barrier inspired by
HashiCorp Vault.

**What it provides:**

- **PKI / Certificate Authority.** X.509 certificate issuance. Root and
  intermediate CAs, certificate signing, CRL management, ACME protocol
  support. This is how every service in the platform gets its TLS
  certificates.

- **SSH CA.** (Planned.) SSH certificate signing for host and user
  certificates, replacing static SSH key management.

- **Transit encryption.** (Planned.) Encrypt and decrypt data without exposing
  keys to the caller. Envelope encryption for services that need to protect
  data at rest without managing their own key material.

- **User-to-user encryption.** (Planned.) End-to-end encryption between users,
  with key management handled by Metacrypt.

**Seal/unseal model:** Metacrypt starts sealed. An operator provides a password
which derives (via Argon2id) a key-wrapping key, which decrypts the master
encryption key (MEK), which in turn unwraps per-engine data encryption keys
(DEKs). Each engine mount gets its own DEK, limiting blast radius — compromise
of one engine's key does not expose another's data.

```
Password → Argon2id → KWK → [decrypt] → MEK → [unwrap] → per-engine DEKs
```

**Engine architecture:** Engines are pluggable providers that register with a
central registry. Each engine mount has a type, a name, its own DEK, and its
own configuration. The engine interface handles initialization, seal/unseal
lifecycle, and request routing. New engine types plug in without modifying the
core.

**Policy:** Fine-grained ACL rules control which users can perform which
operations on which engine mounts. Priority-based evaluation, default deny,
admin bypass. See Metacrypt's `POLICY.md` for the full model.
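
The evaluation order (priority-based, default deny, admin bypass) can be
sketched as below. The rule fields and matching semantics are hypothetical
simplifications; `POLICY.md` defines the real model:

```go
package main

import "fmt"

// rule is a simplified ACL entry; "*" is a wildcard in any field.
type rule struct {
	priority int
	user     string
	mount    string // engine mount name
	op       string // e.g. "sign", "read"
	allow    bool
}

// evaluate returns the verdict of the highest-priority matching rule.
// No matching rule means deny; admins bypass policy entirely.
func evaluate(rules []rule, isAdmin bool, user, mount, op string) bool {
	if isAdmin {
		return true // admin bypass
	}
	best := -1
	allow := false // default deny
	for _, r := range rules {
		if (r.user == "*" || r.user == user) &&
			(r.mount == "*" || r.mount == mount) &&
			(r.op == "*" || r.op == op) &&
			r.priority > best {
			best, allow = r.priority, r.allow
		}
	}
	return allow
}

func main() {
	rules := []rule{
		{priority: 10, user: "*", mount: "pki-infra", op: "*", allow: false},
		{priority: 20, user: "deploy-bot", mount: "pki-infra", op: "sign", allow: true},
	}
	fmt.Println(evaluate(rules, false, "deploy-bot", "pki-infra", "sign")) // true
	fmt.Println(evaluate(rules, false, "guest", "pki-infra", "sign"))      // false
	fmt.Println(evaluate(rules, false, "guest", "unlisted", "read"))       // false (default deny)
}
```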

**Status:** Implemented. CA engine complete with ACME support. SSH CA, transit,
and user-to-user engines planned.

---

### MCR — Metacircular Container Registry

MCR is an OCI Distribution Spec-compliant container registry. It stores and
serves the container images that MCP deploys across the platform.

**What it provides:**

- **OCI-compliant image storage.** Pull, push, tag, and delete container
  images. Content-addressed by SHA-256 digest. Manifests and tags in SQLite,
  blobs on the filesystem.

- **Authenticated access.** No anonymous access. MCR uses the OCI token
  authentication flow: clients hit `/v2/`, receive a 401 with a token
  endpoint, authenticate via MCIAS, and use the returned JWT for subsequent
  requests.

- **Policy-controlled push/pull.** Fine-grained ACL rules govern who can push
  to or pull from which repositories. Integrated with MCIAS roles.

- **Garbage collection.** Unreferenced blobs are cleaned up via the admin CLI
  (`mcrctl`).
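
In the token flow above, the 401 response carries a `WWW-Authenticate: Bearer`
challenge whose parameters tell the client where to obtain a token. A minimal
parser for that challenge (a sketch that assumes well-formed input with no
commas inside quoted values, not MCR's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// parseBearerChallenge extracts the key="value" parameters from a
// WWW-Authenticate header as used by the OCI token flow.
func parseBearerChallenge(header string) map[string]string {
	params := map[string]string{}
	rest, ok := strings.CutPrefix(header, "Bearer ")
	if !ok {
		return params
	}
	for _, part := range strings.Split(rest, ",") {
		k, v, found := strings.Cut(strings.TrimSpace(part), "=")
		if found {
			params[k] = strings.Trim(v, `"`)
		}
	}
	return params
}

func main() {
	h := `Bearer realm="https://mcr.example/token",service="mcr",scope="repository:app/web:pull"`
	p := parseBearerChallenge(h)
	fmt.Println(p["realm"]) // the token endpoint to authenticate against
	fmt.Println(p["scope"]) // the access being requested
}
```

The client then authenticates against the realm (backed by MCIAS) and retries
the original request with the returned JWT as a bearer token.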

**How it fits in:** MCP directs nodes to pull images from MCR. When a workload
is scheduled, MCP tells the node's agent which image to pull and where to get
it. MCR sits behind an MC-Proxy instance for TLS routing.

**Status:** Implemented. Phase 12 (web UI) complete.

---

### MC-Proxy — TLS Proxy and Router

MC-Proxy is the ingress layer for every MC Node. It accepts TLS connections,
extracts the SNI hostname, and routes to the correct backend. Each route is
independently configured as either **L4 passthrough** (raw TCP relay, no TLS
termination) or **L7 terminating** (terminates TLS, reverse proxies HTTP/2 and
HTTP/1.1 including gRPC). Both modes coexist on the same listener.

**What it provides:**

- **SNI-based routing.** A route table maps hostnames to backend addresses.
  Exact match, case-insensitive. Multiple listeners can bind different ports,
  each with its own route table, all sharing the same global firewall.

- **Dual-mode proxying.** L4 routes relay raw TCP — backends see the original
  TLS handshake, MC-Proxy adds nothing. L7 routes terminate TLS at the proxy
  and reverse proxy HTTP/2 to backends (plaintext h2c or re-encrypted TLS),
  with header injection (`X-Forwarded-For`, `X-Real-IP`), gRPC streaming
  support, and trailer forwarding.

- **Global firewall.** Every connection is evaluated before routing: per-IP
  rate limiting, IP/CIDR blocks, and GeoIP country blocks (MaxMind GeoLite2).
  Blocked connections get a TCP RST — no error messages, no TLS alerts.

- **PROXY protocol.** Listeners can accept v1/v2 headers from upstream proxies
  to learn the real client IP. Routes can send v2 headers to downstream
  backends. This enables multi-hop deployments — a public edge MC-Proxy on a
  VPS forwarding over the encrypted overlay to a private origin MC-Proxy —
  while preserving the real client IP for firewall evaluation and logging.

- **Runtime management.** Routes and firewall rules can be updated at runtime
  via a gRPC admin API on a Unix domain socket (filesystem permissions for
  access control, no network exposure). State is persisted to SQLite with
  write-through semantics.

**How it fits in:** MC-Proxy is pre-auth infrastructure. It sits in front of
everything on a node. Outside clients connect to MC-Proxy on well-known ports
(443, 8443, etc.) and MC-Proxy routes to the correct backend based on the
hostname the client is trying to reach. A typical production deployment uses
two instances — an edge proxy on a public VPS and an origin proxy on the
private network, connected over the overlay with PROXY protocol preserving
client IPs across the hop.

**Status:** Implemented.

---

### MCNS — Metacircular Networking Service

MCNS provides DNS for the platform. It manages two internal zones and serves
as the name resolution layer for the Metacircular network. Service discovery
(which services run where) is owned by MCP; MCNS translates those assignments
into DNS records.

**What it will provide:**

- **Internal DNS.** MCNS is authoritative for the internal zones of the
  Metacircular network. Three zones serve different purposes:

  | Zone | Example | Purpose |
  |------|---------|---------|
  | `*.metacircular.net` | `metacrypt.metacircular.net` | External, public-facing. Managed outside MCNS (existing DNS). Points to edge MC-Proxy. |
  | `*.mcp.metacircular.net` | `vade.mcp.metacircular.net` | Node addresses. Maps node names to their network addresses (e.g. Tailscale IPs). |
  | `*.svc.mcp.metacircular.net` | `metacrypt.svc.mcp.metacircular.net` | Internal service addresses. Maps service names to the node and port where they currently run. |

  The `*.mcp.metacircular.net` and `*.svc.mcp.metacircular.net` zones are
  managed by MCNS. The external `*.metacircular.net` zone is managed separately
  (existing DNS infrastructure) and is mostly static.

- **MCP integration.** MCP pushes DNS record updates to MCNS after deploy and
  migrate operations. When MCP starts service α on node X, it calls the MCNS
  API to set `α.svc.mcp.metacircular.net` to X's address. Services and clients
  using internal DNS names automatically resolve to the right place without
  config changes.

- **Record management API.** Authenticated via MCIAS. MCP is the primary
  consumer for dynamic updates. Operators can also manage records directly
  for static entries (node addresses, aliases).
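
The zone layout above fixes how names are built. A tiny illustrative sketch of
the name construction MCP would use when pushing an update (the helper
functions are invented; only the zone suffixes come from the table):

```go
package main

import "fmt"

// svcFQDN builds the internal service name in the service zone.
func svcFQDN(service string) string {
	return fmt.Sprintf("%s.svc.mcp.metacircular.net", service)
}

// nodeFQDN builds a node address name in the node zone.
func nodeFQDN(node string) string {
	return fmt.Sprintf("%s.mcp.metacircular.net", node)
}

func main() {
	// "When MCP starts service α on node X": the service name is pointed
	// at the node's address record.
	fmt.Printf("%s -> %s\n", svcFQDN("metacrypt"), nodeFQDN("vade"))
}
```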

**How it fits in:** MCNS answers "what is the address of X?" MCP answers "where
is service α running?" and pushes the answer to MCNS. This separation means
services can use stable DNS names in their configs (e.g.,
`mcias.svc.mcp.metacircular.net` in `[mcias] server_url`) that survive
migration without config changes.

**Status:** Not yet implemented.

---

### MCP — Metacircular Control Plane

MCP is the orchestrator. It manages what runs where across the platform. The
deployment model is operator-driven: the user says "deploy service α" and MCP
handles the rest. MCP Master runs on the operator's workstation; agents run on
each managed node.

**What it will provide:**

- **Service registry.** MCP is the source of truth for what is running where.
  It tracks every service, which node it's on, and its current state. Other
  components that need to find a service (including MC-Proxy for route table
  updates) query MCP's registry.

- **Deploy.** The operator says "deploy α". MCP checks if α is already running
  somewhere. If it is, MCP pulls the new container image on that node and
  restarts the service in place. If it isn't running, MCP selects a node
  (the operator can pin to a specific node but shouldn't have to), transfers
  the initial config, pulls the image from MCR, starts the container, and
  pushes a DNS update to MCNS (`α.svc.mcp.metacircular.net` → node address).

- **Migrate.** Move a service from one node to another. MCP snapshots the
  service's `/srv/<service>/` directory on the source node (as a tar.zst
  image), transfers it to the destination, extracts it, starts the service,
  stops it on the source, and updates MCNS so DNS points to the new location.
  The `/srv/<service>/` convention makes this uniform across all services.

- **Data transfer.** The C2 channel supports file-level operations between
  master and agents: copy or fetch individual files (push a config, pull a
  log), and transfer tar.zst archives for bulk snapshot/restore of service
  data directories. This is the foundation for both migration and backup.

- **Service snapshots.** To snapshot `/srv/<service>/`, the agent runs
  `VACUUM INTO` to create a consistent database copy, then builds a tar.zst
  that includes the full directory but **excludes** live database files
  (`*.db`, `*.db-wal`, `*.db-shm`) and the `backups/` directory. The
  temporary VACUUM INTO copy is injected into the archive as `<service>.db`.
  The result is a clean, minimal archive that extracts directly into a
  working service directory on the destination.

- **Container lifecycle.** Start, stop, restart, and update containers on
  nodes. MCP Master issues commands; agents on each node execute them against
  the local container runtime (Docker, etc.).

- **Master/agent architecture.** MCP Master runs on the operator's machine.
  Agents run on every managed node, receiving C2 (command and control) from
  Master, reporting node status, and managing local workloads. The C2 channel
  is authenticated via MCIAS. The master does not need to be always-on —
  agents keep running their workloads independently; the master is needed only
  to issue new commands.

- **Node management.** Track which nodes are in the platform, their health,
  available resources, and running workloads.

- **Scheduling.** When placing a new service, MCP selects a node based on
  available resources and any operator-specified constraints. The operator can
  override with an explicit node, but the default is MCP's choice.
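
The snapshot exclusion rule described above (live SQLite files and `backups/`
stay out of the archive; the `VACUUM INTO` copy is injected separately) can be
sketched as a path filter. The function is illustrative; only the patterns
come from the text:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// excludeFromSnapshot reports whether a path (relative to /srv/<service>/)
// should be left out of the tar.zst snapshot: live database files and the
// backups/ directory. The consistent VACUUM INTO copy replaces the live db.
func excludeFromSnapshot(relPath string) bool {
	if relPath == "backups" || strings.HasPrefix(relPath, "backups/") {
		return true
	}
	base := filepath.Base(relPath)
	for _, pat := range []string{"*.db", "*.db-wal", "*.db-shm"} {
		if ok, _ := filepath.Match(pat, base); ok {
			return true
		}
	}
	return false
}

func main() {
	for _, p := range []string{"mcias.db", "mcias.db-wal", "certs/server.pem", "backups/2024.tar.zst"} {
		fmt.Println(p, excludeFromSnapshot(p))
	}
}
```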

**How it fits in:** MCP is the piece that ties everything together. MCIAS
provides identity, Metacrypt provides certificates, MCR provides images, MCNS
provides DNS, MC-Proxy provides ingress — MCP orchestrates all of it, owns the
map of what is running where, and pushes updates to MCNS so DNS stays current.
It is the system that makes the infrastructure metacircular: the control plane
deploys and manages the very services it depends on.

**Container-first design:** All Metacircular services are built as containers
(multi-stage Docker builds, Alpine runtime, non-root) specifically so that MCP
can deploy them. The systemd unit files exist as a fallback and for bootstrap —
the long-term deployment model is MCP-managed containers.

**Status:** Not yet implemented.

---

### MCAT — MCIAS Login Policy Tester

MCAT is a lightweight diagnostic tool, not a core infrastructure component. It
presents a web login form, forwards credentials to MCIAS with a configurable
`service_name` and `tags`, and shows whether the login was accepted or denied
by policy. This lets operators verify that login policy rules behave as
expected without touching the target service.

**Status:** Implemented.

---

## Bootstrap Sequence

Bringing up a Metacircular platform from scratch requires careful ordering
because of the circular dependencies — the infrastructure manages itself, but
must exist before it can do so. The key challenge is that nearly every service
needs TLS certificates (from Metacrypt) and authentication (from MCIAS), but
those services themselves need to be running first.

During bootstrap, all services run as **systemd units** on a single bootstrap
node. MCP takes over lifecycle management as the final step.

### Prerequisites

Before any service starts, the operator needs:

- **The bootstrap node** — a machine (VPS, homelab server, etc.) with the
  overlay network configured and reachable.
- **Seed PKI** — MCIAS and Metacrypt need TLS certs to start, but Metacrypt
  isn't running yet to issue them. The root CA is generated manually using
  `github.com/kisom/cert` and stored in the `ca/` directory in the workspace.
  Initial service certificates are issued from this root. The root CA is then
  imported into Metacrypt once it's running, so Metacrypt becomes the
  authoritative CA for the platform going forward.
- **TOML config files** — each service needs its config in `/srv/<service>/`.
  During bootstrap these are written manually. Later, MCP handles config
  distribution.

### Startup Order

```
Phase 0: Seed PKI
  Operator creates or obtains initial TLS certificates for MCIAS
  and Metacrypt. Places them in /srv/mcias/certs/ and
  /srv/metacrypt/certs/.

Phase 1: Identity
  ┌──────────────────────────────────────────────────────┐
  │ MCIAS starts (systemd)                               │
  │ - No dependencies on other Metacircular services     │
  │ - Uses seed TLS certificates                         │
  │ - Operator creates initial admin account             │
  │ - Operator creates system accounts for other services│
  └──────────────────────────────────────────────────────┘

Phase 2: Cryptographic Services
  ┌──────────────────────────────────────────────────────┐
  │ Metacrypt starts (systemd)                           │
  │ - Authenticates against MCIAS                        │
  │ - Uses seed TLS certificates initially               │
  │ - Operator initializes and unseals                   │
  │ - Operator creates CA engine, imports root CA from   │
  │   ca/, creates issuers                               │
  │ - Can now issue certificates for all other services  │
  │ - Reissue MCIAS and Metacrypt certs from own CA      │
  │   (replace seed certs with Metacrypt-issued certs)   │
  └──────────────────────────────────────────────────────┘

Phase 3: Ingress
  ┌──────────────────────────────────────────────────────┐
  │ MC-Proxy starts (systemd)                            │
  │ - Static route table from TOML config                │
  │ - Routes external traffic to MCIAS, Metacrypt        │
  │ - No MCIAS auth (pre-auth infrastructure)            │
  │ - TLS certs for L7 routes from Metacrypt             │
  └──────────────────────────────────────────────────────┘

Phase 4: Container Registry
  ┌──────────────────────────────────────────────────────┐
  │ MCR starts (systemd)                                 │
  │ - Authenticates against MCIAS                        │
  │ - TLS certificates from Metacrypt                    │
  │ - Operator pushes container images for all services  │
  │   (including MCIAS, Metacrypt, MC-Proxy themselves)  │
  └──────────────────────────────────────────────────────┘

Phase 5: DNS
  ┌──────────────────────────────────────────────────────┐
  │ MCNS starts (systemd)                                │
  │ - Authenticates against MCIAS                        │
  │ - Operator configures initial DNS records            │
  │   (node addresses, service names)                    │
  └──────────────────────────────────────────────────────┘

Phase 6: Control Plane
  ┌──────────────────────────────────────────────────────┐
  │ MCP Agent starts on bootstrap node (systemd)         │
  │ MCP Master starts on operator workstation            │
  │ - Authenticates against MCIAS                        │
  │ - Master registers the bootstrap node                │
  │ - Master imports running services into its registry  │
  │ - From here, MCP owns the service map                │
  │ - Services can be redeployed as MCP-managed          │
  │   containers (replacing the systemd units)           │
  └──────────────────────────────────────────────────────┘
```


### The Seed Certificate Problem

The circular dependency between MCIAS, Metacrypt, and TLS is resolved by
bootstrapping with a **manually generated root CA**:

1. The operator generates a root CA using `github.com/kisom/cert`. This root
   and initial service certificates live in the `ca/` directory.
2. MCIAS and Metacrypt start with certificates issued from this external root.
3. Metacrypt comes up. The operator imports the root CA into Metacrypt's CA
   engine, making Metacrypt the authoritative issuer under the same root.
4. Metacrypt can now issue and renew certificates for all services. The `ca/`
   directory remains as the offline backup of the root material.

This is a one-time process. The root CA is generated once, imported once, and
from that point forward Metacrypt is the sole CA. MCP handles certificate
provisioning for all services.

### Adding a New Node

Once the platform is bootstrapped, adding a node is straightforward:

1. Provision the machine and connect it to the overlay network.
2. Install the MCP agent binary.
3. Configure the agent with the MCP Master address and MCIAS credentials
   (system account for the node).
4. Start the agent. It authenticates with MCIAS, connects to Master, and
   reports as available.
5. The operator deploys workloads to it via MCP. MCP handles image pulls,
   config transfer, certificate provisioning, and DNS updates.

### Disaster Recovery

If the bootstrap node is lost, recovery follows the same sequence as initial
bootstrap — but with data restored from backups:

1. Start MCIAS on a new node, restore its database from the most recent
   `VACUUM INTO` snapshot.
2. Start Metacrypt, restore its database. Unseal with the original password.
   The entire key hierarchy and all issued certificates are recovered.
3. Bring up the remaining services in order, restoring their databases.
4. Start MCP, which rebuilds its registry from the running services.
5. Update DNS (MCNS or external) to point to the new node.

Every service's `snapshot` CLI command and daily backup timer exist specifically
to make this recovery possible. The `/srv/<service>/` convention means each
service's entire state is a single directory to back up and restore.

---

## Certificate Lifecycle

Every service in the platform requires TLS certificates, and Metacrypt is the
CA that issues them. This section describes how certificates flow from
Metacrypt to services, how they are renewed, and how the pieces fit together.

### PKI Structure

Metacrypt implements a **two-tier PKI**:

```
Root CA (self-signed, generated at engine initialization)
├── Issuer "infra"    (intermediate CA for infrastructure services)
├── Issuer "services" (intermediate CA for application services)
└── Issuer "clients"  (intermediate CA for client certificates)
```

The root CA signs intermediate CAs ("issuers"), which in turn sign leaf
certificates. Each issuer is scoped to a purpose. The root CA certificate is
the trust anchor — services and clients need it (or the relevant issuer chain)
to verify certificates presented by other services.
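
The two-tier trust model can be demonstrated end to end with the standard
library: build a root, an intermediate issuer, and a leaf in memory, then
verify the leaf against the root via the intermediate. This is a sketch of the
chain-of-trust mechanics, not Metacrypt's issuance code; names and validity
periods are placeholders:

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

func mustKey() *ecdsa.PrivateKey {
	k, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	return k
}

// makeCert signs tmpl with parentPriv (self-signed when tmpl == parent).
func makeCert(tmpl, parent *x509.Certificate, pub, parentPriv any) *x509.Certificate {
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, pub, parentPriv)
	if err != nil {
		panic(err)
	}
	cert, err := x509.ParseCertificate(der)
	if err != nil {
		panic(err)
	}
	return cert
}

// caTemplate is a minimal CA certificate template.
func caTemplate(serial int64, cn string) *x509.Certificate {
	return &x509.Certificate{
		SerialNumber:          big.NewInt(serial),
		Subject:               pkix.Name{CommonName: cn},
		NotBefore:             time.Now(),
		NotAfter:              time.Now().Add(24 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
}

func main() {
	rootKey, infraKey, leafKey := mustKey(), mustKey(), mustKey()

	rootTmpl := caTemplate(1, "Metacircular Root CA")
	root := makeCert(rootTmpl, rootTmpl, &rootKey.PublicKey, rootKey) // self-signed

	infra := makeCert(caTemplate(2, "Issuer infra"), root, &infraKey.PublicKey, rootKey)

	leafTmpl := &x509.Certificate{
		SerialNumber: big.NewInt(3),
		Subject:      pkix.Name{CommonName: "mcias.svc.mcp.metacircular.net"},
		DNSNames:     []string{"mcias.svc.mcp.metacircular.net"},
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	leaf := makeCert(leafTmpl, infra, &leafKey.PublicKey, infraKey)

	roots, inters := x509.NewCertPool(), x509.NewCertPool()
	roots.AddCert(root)   // the trust anchor services distribute
	inters.AddCert(infra) // the issuer chain presented with the leaf

	_, err := leaf.Verify(x509.VerifyOptions{
		Roots:         roots,
		Intermediates: inters,
		DNSName:       "mcias.svc.mcp.metacircular.net",
	})
	fmt.Println("chain valid:", err == nil)
}
```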

### ACME Protocol

Metacrypt implements an **ACME server** (RFC 8555) with External Account
Binding (EAB). This is the same protocol used by Let's Encrypt, meaning any
standard ACME client can obtain certificates from Metacrypt.

The ACME flow:

1. Client authenticates with MCIAS and requests EAB credentials from Metacrypt.
2. Client registers an ACME account using the EAB credentials.
3. Client places a certificate order (one or more domain names).
4. Metacrypt creates authorization challenges (HTTP-01 and DNS-01 supported).
5. Client fulfills the challenge (places a file for HTTP-01, or a DNS TXT
   record for DNS-01).
6. Metacrypt validates the challenge and issues the certificate.
7. Client downloads the certificate chain and private key.

A **Go client library** (`metacrypt/clients/go`) wraps this entire flow:
MCIAS login, EAB fetch, account registration, challenge fulfillment, and
certificate download. Services that integrate this library can obtain and
renew certificates programmatically.

### How Services Get Certificates Today

Currently, certificates are provisioned through Metacrypt's **REST API or web
UI** and placed into each service's `/srv/<service>/certs/` directory. This is
a manual process — the operator issues a certificate, downloads it, and
deploys the files. The ACME client library exists but is not yet integrated
into any service.

### How It Will Work With MCP

MCP is the natural place to automate certificate provisioning:

- **Initial deploy.** When MCP deploys a new service, it can provision a
  certificate from Metacrypt (via the ACME client library or the REST API),
  transfer the cert and key to the node as part of the config push to
  `/srv/<service>/certs/`, and start the service with valid TLS material.

- **Renewal.** MCP knows what services are running and when their certificates
  expire. It can renew certificates before expiry by re-running the ACME flow
  (or calling Metacrypt's `renew` operation) and pushing updated files to the
  node. The service restarts with the new certificate.

- **Migration.** When MCP migrates a service, the certificate in
  `/srv/<service>/certs/` moves with the tar.zst snapshot. If the service's
  hostname changes (new node, new DNS name), MCP provisions a new certificate
  for the new name.

- **MC-Proxy L7 routes.** MC-Proxy's L7 mode requires certificate/key pairs
  for TLS termination. MCP (or the operator) can provision these from
  Metacrypt and push them to MC-Proxy's cert directory. MC-Proxy's
  architecture doc lists ACME integration and Metacrypt key storage as future
  work.

### Trust Distribution

Every service and client that validates TLS certificates needs the root CA
certificate (or the relevant issuer chain). Metacrypt serves these publicly
without authentication:

- `GET /v1/pki/{mount}/ca` — root CA certificate (PEM)
- `GET /v1/pki/{mount}/ca/chain` — full chain: issuer + root (PEM)
- `GET /v1/pki/{mount}/issuer/{name}` — specific issuer certificate (PEM)

During bootstrap, the root CA cert is distributed manually (or via the `ca/`
directory in the workspace). Once MCP is running, it can distribute the CA
cert as part of service deployment. Services reference the CA cert path in
their `[mcias]` config section (`ca_cert`) to verify connections to MCIAS and
other services.

---

## End-to-End Deploy Workflow

This traces a deployment from code change to running service, showing how every
component participates. The example deploys a new version of service α that is
already running on Node B.

### 1. Build and Push

The operator builds a new container image and pushes it to MCR:

```
Operator workstation (vade)
$ docker build -t mcr.metacircular.net/α:v1.2.0 .
$ docker push mcr.metacircular.net/α:v1.2.0
        │
        ▼
MC-Proxy (edge) ──overlay──→ MC-Proxy (origin) ──→ MCR
                                                    │
                                              Authenticates
                                               via MCIAS
                                                    │
                                              Policy check:
                                              can this user
                                               push to α?
                                                    │
                                              Image stored
                                           (blobs + manifest)
```

The `docker push` goes through MC-Proxy (SNI routing to MCR), authenticates
via the OCI token flow (which delegates to MCIAS), and is checked against
MCR's push policy. The image is stored content-addressed in MCR.

### 2. Deploy

The operator tells MCP to deploy:

```
Operator workstation (vade)
$ mcp deploy α          # or: mcp deploy α --image v1.2.0
        │
MCP Master
  │
  ├── Registry lookup: α is running on Node B
  │
  ├── C2 (gRPC over overlay) to Node B agent:
  │     "pull mcr.metacircular.net/α:v1.2.0 and restart"
  │
  ▼
MCP Agent (Node B)
  │
  ├── Pull image from MCR
  │     (authenticates via MCIAS, same OCI flow)
  │
  ├── Stop running container
  │
  ├── Start new container from updated image
  │     - Mounts /srv/α/ (config, database, certs all persist)
  │     - Service starts, authenticates to MCIAS, resumes operation
  │
  └── Report status back to Master
```

Since α is already running on Node B, this is an in-place update. The
`/srv/α/` directory is untouched — config, database, and certificates persist
across the container restart.

### 3. First-Time Deploy

If α has never been deployed, MCP does more work:

```
Operator workstation (vade)
$ mcp deploy α --config α.toml
        │
MCP Master
  │
  ├── Registry lookup: α is not running anywhere
  │
  ├── Scheduling: select Node C (best fit)
  │
  ├── Provision TLS certificate from Metacrypt
  │     (ACME flow or REST API)
  │
  ├── C2 to Node C agent:
  │     1. Create /srv/α/ directory structure
  │     2. Transfer config file (α.toml → /srv/α/α.toml)
  │     3. Transfer TLS cert+key → /srv/α/certs/
  │     4. Transfer root CA cert → /srv/α/certs/ca.pem
  │     5. Pull image from MCR
  │     6. Start container
  │
  ├── Update service registry: α → Node C
  │
  ├── Push DNS update to MCNS:
  │     α.svc.mcp.metacircular.net → Node C address
  │
  └── (Optionally) update MC-Proxy route table
        if α needs external ingress
```

### 4. Migration

Moving α from Node B to Node C:

```
Operator workstation (vade)
$ mcp migrate α --to node-c     # or let MCP choose the destination
        │
MCP Master
  │
  ├── C2 to Node B agent:
  │     1. Stop α container
  │     2. Snapshot /srv/α/ → tar.zst archive
  │     3. Transfer tar.zst to Master (or directly to Node C)
  │
  ├── C2 to Node C agent:
  │     1. Receive tar.zst archive
  │     2. Extract to /srv/α/
  │     3. Pull container image from MCR (if not cached)
  │     4. Start container
  │     5. Report status
  │
  ├── Update service registry: α → Node C
  │
  ├── Push DNS update to MCNS:
  │     α.svc.mcp.metacircular.net → Node C address
  │
  └── (If α had external ingress) update MC-Proxy route
        or rely on DNS change
```

### What Each Component Does

| Step | MCIAS | Metacrypt | MCR | MC-Proxy | MCP | MCNS |
|------|-------|-----------|-----|----------|-----|------|
| Build/push image | Authenticates push | — | Stores image, enforces push policy | Routes traffic to MCR | — | — |
| Deploy (update) | Authenticates pull, authenticates service on start | — | Serves image to agent | Routes traffic to service | Coordinates: registry lookup, C2 to agent | — |
| Deploy (new) | Authenticates pull, authenticates service on start | Issues TLS certificate | Serves image to agent | Routes traffic to service (if external) | Coordinates: scheduling, cert provisioning, config transfer, DNS update | Updates DNS records |
| Migrate | Authenticates service on new node | Issues new cert (if hostname changes) | Serves image (if not cached) | Routes traffic to new location | Coordinates: snapshot, transfer, DNS update | Updates DNS records |
| Steady state | Validates tokens for every authenticated request | Serves CA certs publicly, renews certs | Serves image pulls | Routes all external traffic | Tracks service health, holds registry | Serves DNS queries |

---

## Future Ideas

Components and capabilities that may be worth building but have no immediate
timeline. Listed here to capture the thinking; none are committed.

### Observability — Log Collection and Health Monitoring

Every service already produces structured logs (`log/slog`) and exposes health
checks (gRPC `Health.Check` or REST status endpoints). What's missing is
aggregation — today, debugging a cross-service issue means SSH'ing into each
node and reading local logs.

A collector could:

- Gather structured logs from services on each node and forward them to a
  central store.
- Periodically health-check local services and report status.
- Feed health data into MCP so it can make informed decisions (restart
  unhealthy services, avoid scheduling on degraded nodes, alert the operator).

This might be a standalone service or an MCP agent capability, depending on
weight. If it's just "tail logs and hit health endpoints," it fits in the
agent. If it grows to include indexing, querying, retention policies, and
alerting rules, it's its own service.

### Object Store

The platform has structured storage (SQLite), blob storage scoped to container
images (MCR), and encrypted key-value storage (Metacrypt's barrier). It does
not have general-purpose object/blob storage.

Potential uses:

- **Centralized backups.** Service snapshots currently live on each node in
  `/srv/<service>/backups/`. A central object store gives MCP somewhere to push
  tar.zst snapshots for offsite retention.
- **Artifact storage.** Build outputs, large files, anything that doesn't fit
  in a database row.
- **Data sharing between services.** Files that need to move between services
  outside the MCP C2 channel.

Prior art: [Nebula](https://metacircular.net/pages/nebula.html), a
content-addressable data store with capability-based security (SHA-256
addressed blobs, UUID entries for versioning, proxy references for revocable
access). Prototyped in multiple languages. The capability model is interesting
but may be more sophistication than the platform needs — a simpler
authenticated blob store with MCIAS integration might suffice.

### Overlay Network Management

The platform currently relies on an external overlay network (WireGuard,
Tailscale, or similar) for node-to-node connectivity. A self-hosted WireGuard
mesh manager would bring the overlay under Metacircular's control:

- Automate key exchange and peer configuration when MCP adds a node.
- Manage IP allocation within the mesh (potentially absorbing part of MCNS's
  scope).
- Remove the dependency on Tailscale's coordination servers.

This is a natural extension of the sovereignty principle but is low priority
while the mesh is small enough to manage by hand.

### Hypervisor / Isolation

A deeper exploration of environment isolation, message-passing between
services, and access mediation at a level below containers. Prior art:
[hypervisor concept](https://metacircular.net/pages/hypervisor.html). The
current platform achieves these goals through containers + MCIAS + policy
engines. A hypervisor layer would push isolation down to the OS level —
interesting for security but significant in scope. More relevant if the
platform ever moves beyond containers to VM-based workloads.

### Prior Art: SYSGOV

[SYSGOV](https://metacircular.net/pages/lisp-dcos.html) was an earlier
exploration of system management in Lisp, with SYSPLAN (desired state
enforcement) and SYSMON (service management). Many of its research questions —
C2 communication, service discovery, secure config distribution, failure
handling — are directly addressed by MCP's design. MCP is the spiritual
successor, reimplemented in Go with the benefit of the Metacircular platform
underneath it.