Initial import.

commit 168ceb2c07
2026-03-25 22:25:44 -07:00
6 changed files with 57369 additions and 0 deletions

.gitignore

@@ -0,0 +1,12 @@
# infrastructure / secrets
/ca
# project directories: these are separate git repos
/mcat
/mcias
/mc-proxy
/mcr
/metacrypt
/mcdsl
/mcns

CLAUDE.md

@@ -0,0 +1,76 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Overview
Metacircular is a multi-service personal infrastructure platform. This root repository is a workspace container — each subdirectory is a separate Git repo (gitignored here). The authoritative platform-wide standards live in `engineering-standards.md`.
## Project Map
| Directory | Purpose | Language |
|-----------|---------|----------|
| `mcias/` | Identity and Access Service — central SSO/IAM, all other services delegate auth here | Go |
| `metacrypt/` | Cryptographic service engine — encrypted secrets, PKI/CA, SSH CA, transit encryption | Go |
| `mc-proxy/` | TLS proxy and router — L4 passthrough or L7 terminating, PROXY protocol, firewall | Go |
| `mcr/` | OCI container registry — integrated with MCIAS for auth and policy-based push/pull | Go |
| `mcat/` | MCIAS login policy tester — lightweight web app to test and audit login policies | Go |
| `mcdsl/` | Standard library — shared packages for auth, db, config, TLS servers, CSRF, snapshots | Go |
| `ca/` | PKI infrastructure and secrets for dev/test (not source code, gitignored) | — |
Each subproject has its own `CLAUDE.md`, `ARCHITECTURE.md`, `Makefile`, and `go.mod`. When working in a subproject, read its own CLAUDE.md first.
## Service Dependencies
MCIAS is the root dependency — every other service authenticates through it. No service maintains its own user database. The dependency graph:
```
mcias (standalone — no MCIAS dependency)
├── metacrypt (uses MCIAS for auth)
├── mc-proxy (uses MCIAS for admin auth)
├── mcr (uses MCIAS for auth + policy)
└── mcat (tests MCIAS login policies)
```
## Standard Build Commands (all subprojects)
```bash
make all # vet → lint → test → build (the CI pipeline)
make build # go build ./...
make test # go test ./...
make vet # go vet ./...
make lint # golangci-lint run ./...
make proto # regenerate gRPC code from .proto files
make proto-lint # buf lint + buf breaking
make devserver # build and run locally against srv/ config
make docker # build container image
make clean # remove binaries
```
Run a single test: `go test ./internal/auth/ -run TestTokenValidation`
## Critical Rules
1. **REST/gRPC sync**: Every REST endpoint must have a corresponding gRPC RPC, updated in the same change.
2. **gRPC interceptor maps**: New RPCs must be added to `authRequiredMethods`, `adminRequiredMethods`, and/or `sealRequiredMethods`. Forgetting this is a security defect.
3. **No CGo in production**: All builds use `CGO_ENABLED=0`. Use `modernc.org/sqlite`, not `mattn/go-sqlite3`.
4. **No test frameworks**: Use stdlib `testing` only. Real SQLite in `t.TempDir()`, no mocks for databases.
5. **Default deny**: Unauthenticated and unauthorized requests are always rejected. Admin detection comes solely from the MCIAS `admin` role.
6. **Proto versioning**: Start at v1. Only create v2 for breaking changes. Non-breaking additions go in-place.
## Architecture Patterns
- **Seal/Unseal**: Metacrypt starts sealed and requires a password to unlock (Vault-like pattern). Key hierarchy: Password → Argon2id → KWK → MEK → per-engine DEKs.
- **Web UI separation**: Web UIs run as separate binaries communicating with the API server via gRPC. No direct DB access from the web tier.
- **Config**: TOML with env var overrides (`SERVICENAME_*`). All runtime data in `/srv/<service>/`.
- **Policy engines**: Priority-based ACL rules, default deny, admin bypass. See metacrypt's implementation as reference.
- **Auth flow**: Client → service `/v1/auth/login` → MCIAS client library → MCIAS validates → bearer token returned. Token validation cached 30s keyed by SHA-256 of token.
## Tech Stack
- Go 1.25+, chi router, cobra CLI, go-toml/v2
- SQLite via modernc.org/sqlite (pure Go), WAL mode, foreign keys on
- gRPC + protobuf, buf for linting
- htmx + Go html/template for web UIs
- golangci-lint v2 with errcheck, gosec, staticcheck, revive
- TLS 1.3 minimum, AES-256-GCM, Argon2id, Ed25519

README.md

@@ -0,0 +1,104 @@
# Metacircular Dynamics
Metacircular Dynamics is a self-hosted personal infrastructure platform. The
name comes from the tradition of metacircular evaluators in Lisp — a system
defined in terms of itself — by way of SICP and Common Lisp projects that
preceded this work. The infrastructure is metacircular in the same sense: the
platform manages, secures, and hosts its own services.
Every component is self-hosted, every dependency is controlled, and the entire
stack is operable by one person. No cloud providers, no third-party auth, no
external databases. The platform is designed for a small number of machines — a
personal homelab or a handful of VPSes — not for hyperscale.
All services are written in Go and follow shared
[engineering standards](engineering-standards.md). Full platform documentation
lives in [docs/metacircular.md](docs/metacircular.md).
## Components
| Component | Purpose | Status |
|-----------|---------|--------|
| **MCIAS** | Identity and access — the root of trust. SSO, token issuance, role management, login policy. Every other service delegates auth here. | Implemented |
| **Metacrypt** | Cryptographic services — PKI/CA, transit encryption, encrypted secret storage behind a seal/unseal barrier. Issues TLS certificates for the platform. | Implemented |
| **MCR** | Container registry — OCI-compliant image storage with MCIAS auth and policy-controlled push/pull. | Implemented |
| **MC-Proxy** | Node ingress — TLS proxy and router. L4 passthrough or L7 terminating (per-route), PROXY protocol, firewall with rate limiting and GeoIP. | Implemented |
| **MCNS** | Networking — DNS and address management for the platform. | Planned |
| **MCP** | Control plane — operator-driven deployment, service registry, data transfer, master/agent container lifecycle. | Planned |
Shared library: **MCDSL** — standard library for all services (auth, db,
config, TLS server, CSRF, snapshots).
Supporting tool: **MCAT** — lightweight web app for testing MCIAS login
policies.
## Architecture
```
MCIAS (standalone — the root of trust)
├── Metacrypt (auth via MCIAS; provides certs to all services)
├── MCR (auth via MCIAS; stores images pulled by MCP)
├── MCNS (auth via MCIAS; provides DNS for the platform)
├── MCP (auth via MCIAS; orchestrates everything; owns service registry)
└── MC-Proxy (pre-auth; routes traffic to services behind it)
```
Each machine is an **MC Node**. On every node, **MC-Proxy** accepts outside
connections and routes by TLS SNI — either relaying raw TCP (L4) or
terminating TLS and reverse proxying HTTP/2 (L7), per-route. **MCP Agent** on
each node receives commands from **MCP Master** (which runs on the operator's
workstation) and manages containers via the local runtime. Core infrastructure
(MCIAS, Metacrypt, MCR) runs on nodes like any other workload.
```
┌──────────────────┐ ┌──────────────┐
│ Core Infra │ │ MCP Master │
│ (e.g. MCIAS) │ │ │
└────────┬─────────┘ └──────┬───────┘
│ │ C2
Outside ┌─────────────▼─────────────────────▼──────────┐
Client ────▶│ MC Node │
│ ┌───────────┐ │
│ │ MC-Proxy │──┬──────┬──────┐ │
│ └───────────┘ │ │ │ │
│ ┌───▼┐ ┌──▼─┐ ┌─▼──┐ ┌─────┐ │
│ │ α │ │ β │ │ γ │ │ MCP │ │
│          └────┘ └────┘ └────┘  │Agent│      │
│ └──┬──┘ │
│                              ┌────▼───┐      │
│                              │Runtime │      │
│                              │(Docker)│      │
│                              └────────┘      │
└──────────────────────────────────────────────┘
```
## Design Principles
- **Sovereignty** — self-hosted end to end; no SaaS dependencies
- **Simplicity** — SQLite over Postgres, stdlib testing, pure Go, htmx, single binaries
- **Consistency** — every service follows identical patterns (layout, config, auth, deployment)
- **Security as structure** — default deny, TLS 1.3 minimum, interceptor-map auth, encrypted-at-rest secrets
- **Design before code** — ARCHITECTURE.md is the spec, written before implementation
## Tech Stack
Go 1.25+, SQLite (modernc.org/sqlite), chi router, gRPC + protobuf, htmx +
Go html/template, golangci-lint v2, Ed25519/Argon2id/AES-256-GCM, TLS 1.3,
container-first deployment (Docker + systemd).
## Repository Structure
This root repository is a workspace container. Each subdirectory is a separate
Git repo with its own `CLAUDE.md`, `ARCHITECTURE.md`, `Makefile`, and `go.mod`:
```
metacircular/
├── mcias/ Identity and Access Service
├── metacrypt/ Cryptographic service engine
├── mcr/ Container registry
├── mc-proxy/ TLS proxy and router
├── mcat/ Login policy tester
├── mcdsl/ Standard library (shared packages)
├── ca/ PKI infrastructure (dev/test, not source code)
└── docs/ Platform-wide documentation
```

docs/metacircular.md

@@ -0,0 +1,927 @@
# Metacircular Infrastructure
## Background
Metacircular Dynamics is a personal infrastructure platform. The name comes
from the tradition of metacircular evaluators in Lisp — a system defined in
terms of itself — by way of SICP and Common Lisp projects that preceded this
work. The infrastructure is metacircular in the same sense: the platform
manages, secures, and hosts its own services.
The goal is sovereign infrastructure. Every component is self-hosted, every
dependency is controlled, and the entire stack is operable by one person. There
are no cloud provider dependencies, no third-party auth providers, no external
databases. When a Metacircular node boots, it connects to Metacircular services
for identity, certificates, container images, and workload scheduling.
All services are written in Go and follow a shared set of engineering standards
(see `engineering-standards.md`). The platform is designed for a small number of
machines — a personal homelab or a handful of VPSes — not for hyperscale.
## Philosophy
**Sovereignty.** You own the whole stack. Identity, certificates, secrets,
container images, DNS, networking — all self-hosted. No SaaS dependency means
no vendor lock-in, no surprise deprecations, and no trust delegation to third
parties.
**Simplicity over sophistication.** SQLite over Postgres. Stdlib `testing` over
test frameworks. Pure Go over CGo. htmx over React. Single-binary deployments
over microservice orchestrators. The right tool is the simplest one that solves
the problem without creating a new one.
**Consistency as leverage.** Every service follows identical patterns: the same
directory layout, the same Makefile targets, the same config format, the same
auth integration, the same deployment model. Knowledge of one service transfers
instantly to all others. A new service can be stood up by copying the skeleton.
**Security as structure.** Security is not a feature bolted on after the fact.
Default deny is the starting posture. TLS 1.3 is the minimum, not a goal.
Interceptor maps make "forgot to add auth" a visible, reviewable omission
rather than a silent runtime failure. Secrets are encrypted at rest behind a
seal/unseal barrier. Every service delegates identity to a single root of
trust.
**Design before code.** The architecture document is written before
implementation begins. It is the spec, not the afterthought. When the code and
the spec disagree, one of them has a bug.
## High-Level Overview
Metacircular infrastructure is built from six core components, plus a shared
standard library (**MCDSL**) that provides the common patterns all services
depend on (auth integration, database setup, config loading, TLS server
bootstrapping, CSRF, snapshots):
- **MCIAS** — Identity and access. The root of trust for all other services.
Handles authentication, token issuance, role management, and login policy
enforcement. Every other component delegates auth here.
- **Metacrypt** — Cryptographic services. PKI/CA, SSH CA, transit encryption,
and encrypted secret storage behind a Vault-inspired seal/unseal barrier.
Issues the TLS certificates that every other service depends on.
- **MCR** — Container registry. OCI-compliant image storage. MCP directs nodes
to pull images from MCR. Policy-controlled push/pull integrated with MCIAS.
- **MCNS** — Networking. DNS and address management for the platform.
- **MCP** — Control plane. The orchestrator. A master/agent architecture that
manages workload scheduling, container lifecycle, service registry, data
transfer, and node state across the platform.
- **MC-Proxy** — Node ingress. A TLS proxy and router that sits on every node,
accepts outside connections, and routes them to the correct service — either
as raw TCP passthrough or via TLS-terminating HTTP/2 reverse proxy.
These components form a dependency graph rooted at MCIAS:
```
MCIAS (standalone — the root of trust)
├── Metacrypt (uses MCIAS for auth; provides certs to all services)
├── MCR (uses MCIAS for auth; stores images pulled by MCP)
├── MCNS (uses MCIAS for auth; provides DNS for the platform)
├── MCP (uses MCIAS for auth; orchestrates everything; owns service registry)
└── MC-Proxy (pre-auth; routes traffic to services behind it)
```
### The Node Model
The unit of deployment is the **MC Node** — a machine (physical or virtual)
that participates in the Metacircular platform.
```
┌──────────────────┐ ┌──────────────┐
│ System / Core │ │ MCP │
│ Infrastructure │ │ Master │
│ (e.g. MCIAS) │ │ │
└────────┬─────────┘ └──────┬───────┘
│ │ C2
│ │
Outside ┌─────────────▼─────────────────────▼──────────┐
Client ────▶│ MC Node │
│ │
│ ┌───────────┐ │
│ │ MC-Proxy │──┬──────┬──────┐ │
│ └───────────┘ │ │ │ │
│ ┌───▼┐ ┌──▼─┐ ┌─▼──┐ ┌─────┐ │
│ │ α │ │ β │ │ γ │ │ MCP │ │
│          └────┘ └────┘ └────┘  │Agent│      │
│ └──┬──┘ │
│ ┌────▼───┐│
│ │Docker/ ││
│ │etc. ││
│ └────────┘│
└──────────────────────────────────────────────┘
```
Outside clients connect to **MC-Proxy**, which inspects the TLS SNI hostname
and routes to the correct service (α, β, γ) — either as a raw TCP relay or
via TLS-terminating HTTP/2 reverse proxy, per-route. The **MCP Agent** on each
node receives C2 commands from the **MCP Master** (running on the operator's
workstation) and manages local container lifecycle via the container runtime.
Core infrastructure services (MCIAS, Metacrypt, MCR) run on nodes like any
other workload.
### The Network Model
Metacircular nodes are connected via an **encrypted overlay network** — a
self-managed WireGuard mesh, Tailscale, or similar. No component has a hard
dependency on a specific overlay implementation; the platform requires only
that nodes can reach each other over encrypted links.
```
Public Internet
┌─────────▼──────────┐
│ Edge MC-Proxy │ VPS (public IP)
│ :443 │
└─────────┬──────────┘
│ PROXY protocol v2
┌─────────▼──────────────────────────────────┐
│ Encrypted Overlay (e.g. WireGuard) │
│ │
┌───────────┴──┐ ┌──────────┐ ┌──────────┐ ┌──────┴─────┐
│ Origin │ │ Node B │ │ Node C │ │ Operator │
│ MC-Proxy │ │ (MCP │ │ │ │ Workstation│
│ + services │ │ agent) │ │ (MCP │ │ (MCP │
│ (MCP agent) │ │ │ │ agent) │ │ Master) │
└──────────────┘ └──────────┘ └──────────┘ └────────────┘
```
**External traffic** flows from the internet through an edge MC-Proxy (on a
public VPS), which forwards via PROXY protocol over the overlay to an origin
MC-Proxy on the private network. The overlay preserves the real client IP
across the hop.
**Internal traffic** (MCP C2, inter-service communication, MCNS DNS) flows
directly over the overlay. MCP's C2 channel is gRPC over whatever link exists
between master and agent — the overlay provides the transport.
The overlay network itself is a candidate for future Metacircular management
(a self-hosted WireGuard mesh manager), consistent with the sovereignty
principle of minimizing third-party dependencies.
---
## System Catalog
### MCIAS — Metacircular Identity and Access Service
MCIAS is the root of trust for the entire platform. Every other service
delegates authentication to it; no service maintains its own user database.
**What it provides:**
- **Authentication.** Username/password with optional TOTP and FIDO2/WebAuthn.
Credentials are verified by MCIAS and a signed JWT bearer token is returned.
Services validate tokens by calling back to MCIAS (cached 30s by SHA-256 of
the token).
- **Role-based access.** Three roles — `admin` (full access, policy bypass),
`user` (policy-governed), `guest` (service-dependent restrictions). Admin
detection comes solely from the MCIAS `admin` role; services never promote
users locally.
- **Account types.** Human accounts (interactive users) and system accounts
(service-to-service). Both authenticate the same way; system accounts enable
automated workflows.
- **Login policy.** Priority-based ACL rules that control who can log into
which services. Rules can target roles, account types, service names, and
tags. This allows operators to restrict access per-service (e.g., deny
`guest` from services tagged `env:restricted`) without changing the
services themselves.
- **Token lifecycle.** Issuance, validation, renewal, and revocation.
Ed25519-signed JWTs. Short expiry with renewal support.
**How other services integrate:** Every service includes an `[mcias]` config
section with the MCIAS server URL, a `service_name`, and optional `tags`. At
login time, the service forwards credentials to MCIAS along with this context.
MCIAS evaluates login policy against the service context, verifies credentials,
and returns a bearer token. The MCIAS Go client library
(`git.wntrmute.dev/kyle/mcias/clients/go`) handles this flow.
**Status:** Implemented. v1.0.0 complete.
---
### Metacrypt — Cryptographic Service Engine
Metacrypt provides cryptographic resources to the platform through a modular
engine architecture, backed by an encrypted storage barrier inspired by
HashiCorp Vault.
**What it provides:**
- **PKI / Certificate Authority.** X.509 certificate issuance. Root and
intermediate CAs, certificate signing, CRL management, ACME protocol
support. This is how every service in the platform gets its TLS
certificates.
- **SSH CA.** (Planned.) SSH certificate signing for host and user
certificates, replacing static SSH key management.
- **Transit encryption.** (Planned.) Encrypt and decrypt data without exposing
keys to the caller. Envelope encryption for services that need to protect
data at rest without managing their own key material.
- **User-to-user encryption.** (Planned.) End-to-end encryption between users,
with key management handled by Metacrypt.
**Seal/unseal model:** Metacrypt starts sealed. An operator provides a password
which derives (via Argon2id) a key-wrapping key, which decrypts the master
encryption key (MEK), which in turn unwraps per-engine data encryption keys
(DEKs). Each engine mount gets its own DEK, limiting blast radius — compromise
of one engine's key does not expose another's data.
```
Password → Argon2id → KWK → [decrypt] → MEK → [unwrap] → per-engine DEKs
```
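The unwrap chain can be sketched with stdlib AES-256-GCM. One deliberate substitution for the sketch: the real KWK derivation is Argon2id (`golang.org/x/crypto/argon2`); a plain SHA-256 stand-in is used here so the example stays dependency-free. Do not use a bare hash as a password KDF in real code.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// deriveKWK is a stand-in for argon2.IDKey — illustration only.
func deriveKWK(password, salt []byte) []byte {
	h := sha256.Sum256(append(salt, password...))
	return h[:]
}

// seal wraps plaintext under key with AES-256-GCM, nonce prepended.
func seal(key, plaintext []byte) []byte {
	block, _ := aes.NewCipher(key)
	gcm, _ := cipher.NewGCM(block)
	nonce := make([]byte, gcm.NonceSize())
	rand.Read(nonce)
	return gcm.Seal(nonce, nonce, plaintext, nil)
}

// unseal reverses seal; a wrong key fails GCM authentication.
func unseal(key, sealed []byte) ([]byte, error) {
	block, _ := aes.NewCipher(key)
	gcm, _ := cipher.NewGCM(block)
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	salt := []byte("per-install-salt")
	mek := make([]byte, 32) // master encryption key
	dek := make([]byte, 32) // one engine's data encryption key
	rand.Read(mek)
	rand.Read(dek)

	// At rest: MEK wrapped by the KWK, each DEK wrapped by the MEK.
	kwk := deriveKWK([]byte("operator password"), salt)
	wrappedMEK := seal(kwk, mek)
	wrappedDEK := seal(mek, dek)

	// Unseal: password → KWK → MEK → per-engine DEK.
	mek2, _ := unseal(deriveKWK([]byte("operator password"), salt), wrappedMEK)
	dek2, _ := unseal(mek2, wrappedDEK)
	fmt.Println("unsealed DEK matches:", string(dek2) == string(dek))
}
```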
**Engine architecture:** Engines are pluggable providers that register with a
central registry. Each engine mount has a type, a name, its own DEK, and its
own configuration. The engine interface handles initialization, seal/unseal
lifecycle, and request routing. New engine types plug in without modifying the
core.
**Policy:** Fine-grained ACL rules control which users can perform which
operations on which engine mounts. Priority-based evaluation, default deny,
admin bypass. See Metacrypt's `POLICY.md` for the full model.
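The evaluation model — priority ordering, default deny, admin bypass — can be sketched as below. The rule fields are assumptions for illustration, not Metacrypt's actual schema (see its `POLICY.md` for that).

```go
package main

import "fmt"

// Rule is a hypothetical ACL rule: lower Priority wins among matches.
type Rule struct {
	Priority int
	User     string // "*" matches any user
	Mount    string
	Op       string
	Allow    bool
}

type Request struct {
	User  string
	Admin bool
	Mount string
	Op    string
}

// Evaluate applies admin bypass, then the highest-priority (lowest number)
// matching rule, and denies by default when nothing matches.
func Evaluate(rules []Rule, req Request) bool {
	if req.Admin {
		return true // admin bypass
	}
	var best *Rule
	for i := range rules {
		r := &rules[i]
		if (r.User == "*" || r.User == req.User) && r.Mount == req.Mount && r.Op == req.Op {
			if best == nil || r.Priority < best.Priority {
				best = r
			}
		}
	}
	if best == nil {
		return false // default deny
	}
	return best.Allow
}

func main() {
	rules := []Rule{
		{Priority: 10, User: "*", Mount: "pki", Op: "sign", Allow: false},
		{Priority: 5, User: "kyle", Mount: "pki", Op: "sign", Allow: true},
	}
	fmt.Println(Evaluate(rules, Request{User: "kyle", Mount: "pki", Op: "sign"}))  // true
	fmt.Println(Evaluate(rules, Request{User: "guest", Mount: "pki", Op: "sign"})) // false
}
```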
**Status:** Implemented. CA engine complete with ACME support. SSH CA, transit,
and user-to-user engines planned.
---
### MCR — Metacircular Container Registry
MCR is an OCI Distribution Spec-compliant container registry. It stores and
serves the container images that MCP deploys across the platform.
**What it provides:**
- **OCI-compliant image storage.** Pull, push, tag, and delete container
images. Content-addressed by SHA-256 digest. Manifests and tags in SQLite,
blobs on the filesystem.
- **Authenticated access.** No anonymous access. MCR uses the OCI token
authentication flow: clients hit `/v2/`, receive a 401 with a token
endpoint, authenticate via MCIAS, and use the returned JWT for subsequent
requests.
- **Policy-controlled push/pull.** Fine-grained ACL rules govern who can push
to or pull from which repositories. Integrated with MCIAS roles.
- **Garbage collection.** Unreferenced blobs are cleaned up via the admin CLI
(`mcrctl`).
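The first step of that token flow — parsing the `WWW-Authenticate: Bearer` challenge returned by `/v2/` — can be sketched as follows. The parameter names (`realm`, `service`, `scope`) come from the OCI/Docker token auth convention; the URL in the example is a hypothetical deployment, and this naive parser assumes no commas inside quoted values.

```go
package main

import (
	"fmt"
	"strings"
)

// parseBearerChallenge extracts key="value" parameters from a Bearer
// challenge header. Returns an empty map for non-Bearer schemes.
func parseBearerChallenge(header string) map[string]string {
	params := map[string]string{}
	rest, ok := strings.CutPrefix(header, "Bearer ")
	if !ok {
		return params
	}
	for _, kv := range strings.Split(rest, ",") {
		k, v, found := strings.Cut(strings.TrimSpace(kv), "=")
		if found {
			params[k] = strings.Trim(v, `"`)
		}
	}
	return params
}

func main() {
	h := `Bearer realm="https://mcr.example.net/token",service="mcr",scope="repository:tools/app:pull"`
	p := parseBearerChallenge(h)
	fmt.Println(p["realm"], p["service"])
	// Next steps in the flow: GET realm?service=...&scope=... with
	// credentials (validated via MCIAS), then retry the original request
	// with "Authorization: Bearer <jwt>".
}
```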
**How it fits in:** MCP directs nodes to pull images from MCR. When a workload
is scheduled, MCP tells the node's agent which image to pull and where to get
it. MCR sits behind an MC-Proxy instance for TLS routing.
**Status:** Implemented. Phase 12 (web UI) complete.
---
### MC-Proxy — TLS Proxy and Router
MC-Proxy is the ingress layer for every MC Node. It accepts TLS connections,
extracts the SNI hostname, and routes to the correct backend. Each route is
independently configured as either **L4 passthrough** (raw TCP relay, no TLS
termination) or **L7 terminating** (terminates TLS, reverse proxies HTTP/2 and
HTTP/1.1 including gRPC). Both modes coexist on the same listener.
**What it provides:**
- **SNI-based routing.** A route table maps hostnames to backend addresses.
Exact match, case-insensitive. Multiple listeners can bind different ports,
each with its own route table, all sharing the same global firewall.
- **Dual-mode proxying.** L4 routes relay raw TCP — backends see the original
TLS handshake, MC-Proxy adds nothing. L7 routes terminate TLS at the proxy
and reverse proxy HTTP/2 to backends (plaintext h2c or re-encrypted TLS),
with header injection (`X-Forwarded-For`, `X-Real-IP`), gRPC streaming
support, and trailer forwarding.
- **Global firewall.** Every connection is evaluated before routing: per-IP
rate limiting, IP/CIDR blocks, and GeoIP country blocks (MaxMind GeoLite2).
Blocked connections get a TCP RST — no error messages, no TLS alerts.
- **PROXY protocol.** Listeners can accept v1/v2 headers from upstream proxies
to learn the real client IP. Routes can send v2 headers to downstream
backends. This enables multi-hop deployments — a public edge MC-Proxy on a
VPS forwarding over the encrypted overlay to a private origin MC-Proxy —
while preserving the real client IP for firewall evaluation and logging.
- **Runtime management.** Routes and firewall rules can be updated at runtime
via a gRPC admin API on a Unix domain socket (filesystem permissions for
access control, no network exposure). State is persisted to SQLite with
write-through semantics.
**How it fits in:** MC-Proxy is pre-auth infrastructure. It sits in front of
everything on a node. Outside clients connect to MC-Proxy on well-known ports
(443, 8443, etc.) and MC-Proxy routes to the correct backend based on the
hostname the client is trying to reach. A typical production deployment uses
two instances — an edge proxy on a public VPS and an origin proxy on the
private network, connected over the overlay with PROXY protocol preserving
client IPs across the hop.
**Status:** Implemented.
---
### MCNS — Metacircular Networking Service
MCNS provides DNS for the platform. It manages the two internal zones and
serves as the name resolution layer for the Metacircular network; the
external zone is managed outside MCNS. Service discovery (which services run
where) is owned by MCP; MCNS translates those assignments into DNS records.
**What it will provide:**
- **Internal DNS.** MCNS is authoritative for the internal zones of the
Metacircular network. Three zones serve different purposes:
| Zone | Example | Purpose |
|------|---------|---------|
| `*.metacircular.net` | `metacrypt.metacircular.net` | External, public-facing. Managed outside MCNS (existing DNS). Points to edge MC-Proxy. |
| `*.mcp.metacircular.net` | `vade.mcp.metacircular.net` | Node addresses. Maps node names to their network addresses (e.g. Tailscale IPs). |
| `*.svc.mcp.metacircular.net` | `metacrypt.svc.mcp.metacircular.net` | Internal service addresses. Maps service names to the node and port where they currently run. |
The `*.mcp.metacircular.net` and `*.svc.mcp.metacircular.net` zones are
managed by MCNS. The external `*.metacircular.net` zone is managed separately
(existing DNS infrastructure) and is mostly static.
- **MCP integration.** MCP pushes DNS record updates to MCNS after deploy and
migrate operations. When MCP starts service α on node X, it calls the MCNS
API to set `α.svc.mcp.metacircular.net` to X's address. Services and clients
using internal DNS names automatically resolve to the right place without
config changes.
- **Record management API.** Authenticated via MCIAS. MCP is the primary
consumer for dynamic updates. Operators can also manage records directly
for static entries (node addresses, aliases).
**How it fits in:** MCNS answers "what is the address of X?" MCP answers "where
is service α running?" and pushes the answer to MCNS. This separation means
services can use stable DNS names in their configs (e.g.,
`mcias.svc.mcp.metacircular.net` in `[mcias] server_url`) that survive
migration without config changes.
**Status:** Not yet implemented.
---
### MCP — Metacircular Control Plane
MCP is the orchestrator. It manages what runs where across the platform. The
deployment model is operator-driven: the user says "deploy service α" and MCP
handles the rest. MCP Master runs on the operator's workstation; agents run on
each managed node.
**What it will provide:**
- **Service registry.** MCP is the source of truth for what is running where.
It tracks every service, which node it's on, and its current state. Other
components that need to find a service (including MC-Proxy for route table
updates) query MCP's registry.
- **Deploy.** The operator says "deploy α". MCP checks if α is already running
somewhere. If it is, MCP pulls the new container image on that node and
restarts the service in place. If it isn't running, MCP selects a node
(the operator can pin to a specific node but shouldn't have to), transfers
the initial config, pulls the image from MCR, starts the container, and
pushes a DNS update to MCNS (`α.svc.mcp.metacircular.net` → node address).
- **Migrate.** Move a service from one node to another. MCP snapshots the
service's `/srv/<service>/` directory on the source node (as a tar.zst
image), transfers it to the destination, extracts it, starts the service,
stops it on the source, and updates MCNS so DNS points to the new location.
The `/srv/<service>/` convention makes this uniform across all services.
- **Data transfer.** The C2 channel supports file-level operations between
master and agents: copy or fetch individual files (push a config, pull a
log), and transfer tar.zst archives for bulk snapshot/restore of service
data directories. This is the foundation for both migration and backup.
- **Service snapshots.** To snapshot `/srv/<service>/`, the agent runs
`VACUUM INTO` to create a consistent database copy, then builds a tar.zst
that includes the full directory but **excludes** live database files
(`*.db`, `*.db-wal`, `*.db-shm`) and the `backups/` directory. The
temporary VACUUM INTO copy is injected into the archive as `<service>.db`.
The result is a clean, minimal archive that extracts directly into a
working service directory on the destination.
- **Container lifecycle.** Start, stop, restart, and update containers on
nodes. MCP Master issues commands; agents on each node execute them against
the local container runtime (Docker, etc.).
- **Master/agent architecture.** MCP Master runs on the operator's machine.
Agents run on every managed node, receiving C2 (command and control) from
Master, reporting node status, and managing local workloads. The C2 channel
is authenticated via MCIAS. The master does not need to be always-on —
agents keep running their workloads independently; the master is needed only
to issue new commands.
- **Node management.** Track which nodes are in the platform, their health,
available resources, and running workloads.
- **Scheduling.** When placing a new service, MCP selects a node based on
available resources and any operator-specified constraints. The operator can
override with an explicit node, but the default is MCP's choice.
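The snapshot exclusion rule described above can be sketched as a filter over paths relative to `/srv/<service>/`. Only the filter logic is shown; the real agent applies it while building the tar.zst, and the `VACUUM INTO` call in the comment assumes the modernc.org/sqlite driver.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// excludeFromSnapshot reports whether a path (relative to /srv/<service>/)
// is omitted from the archive: live SQLite files and the backups/ tree.
// The VACUUM INTO copy is added back separately as <service>.db.
func excludeFromSnapshot(rel string) bool {
	if rel == "backups" || strings.HasPrefix(rel, "backups"+string(filepath.Separator)) {
		return true
	}
	switch filepath.Ext(rel) {
	case ".db", ".db-wal", ".db-shm":
		return true
	}
	return false
}

func main() {
	// Consistent copy first (driver-side, e.g. modernc.org/sqlite):
	//   db.Exec(`VACUUM INTO '/tmp/snap/mcx.db'`)
	// Then walk the service directory, skipping excluded entries.
	for _, p := range []string{
		"config.toml", "mcx.db", "mcx.db-wal", "backups/old.tar.zst", "certs/tls.pem",
	} {
		fmt.Printf("%-20s excluded=%v\n", p, excludeFromSnapshot(p))
	}
}
```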
**How it fits in:** MCP is the piece that ties everything together. MCIAS
provides identity, Metacrypt provides certificates, MCR provides images, MCNS
provides DNS, MC-Proxy provides ingress — MCP orchestrates all of it, owns the
map of what is running where, and pushes updates to MCNS so DNS stays current. It is the system that makes the
infrastructure metacircular: the control plane deploys and manages the very
services it depends on.
**Container-first design:** All Metacircular services are built as containers
(multi-stage Docker builds, Alpine runtime, non-root) specifically so that MCP
can deploy them. The systemd unit files exist as a fallback and for bootstrap —
the long-term deployment model is MCP-managed containers.
**Status:** Not yet implemented.
---
### MCAT — MCIAS Login Policy Tester
MCAT is a lightweight diagnostic tool, not a core infrastructure component. It
presents a web login form, forwards credentials to MCIAS with a configurable
`service_name` and `tags`, and shows whether the login was accepted or denied
by policy. This lets operators verify that login policy rules behave as
expected without touching the target service.
**Status:** Implemented.
---
## Bootstrap Sequence
Bringing up a Metacircular platform from scratch requires careful ordering
because of the circular dependencies — the infrastructure manages itself, but
must exist before it can do so. The key challenge is that nearly every service
needs TLS certificates (from Metacrypt) and authentication (from MCIAS), but
those services themselves need to be running first.
During bootstrap, all services run as **systemd units** on a single bootstrap
node. MCP takes over lifecycle management as the final step.
### Prerequisites
Before any service starts, the operator needs:
- **The bootstrap node** — a machine (VPS, homelab server, etc.) with the
overlay network configured and reachable.
- **Seed PKI** — MCIAS and Metacrypt need TLS certs to start, but Metacrypt
isn't running yet to issue them. The root CA is generated manually using
`github.com/kisom/cert` and stored in the `ca/` directory in the workspace.
Initial service certificates are issued from this root. The root CA is then
imported into Metacrypt once it's running, so Metacrypt becomes the
authoritative CA for the platform going forward.
- **TOML config files** — each service needs its config in `/srv/<service>/`.
During bootstrap these are written manually. Later, MCP handles config
distribution.
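The seed-PKI step can be illustrated with the stdlib. The workspace actually uses the `github.com/kisom/cert` tool; this sketch shows equivalent `crypto/x509` calls for generating a self-signed root like the one kept in `ca/` (subject name and lifetime are placeholders).

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newRootCA generates a self-signed Ed25519 root CA certificate.
func newRootCA() (*x509.Certificate, ed25519.PrivateKey, error) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "Metacircular Seed Root CA"},
		NotBefore:             time.Now(),
		NotAfter:              time.Now().AddDate(10, 0, 0),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageCRLSign,
	}
	// Self-signed: template and parent are the same certificate.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, pub, priv)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, priv, err
}

func main() {
	ca, _, err := newRootCA()
	if err != nil {
		panic(err)
	}
	fmt.Println("root CA:", ca.Subject.CommonName, "IsCA:", ca.IsCA)
}
```

Initial service certificates are then issued from this root, and the root is later imported into Metacrypt's CA engine as described in the next section.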
### Startup Order
```
Phase 0: Seed PKI
Operator creates or obtains initial TLS certificates for MCIAS
and Metacrypt. Places them in /srv/mcias/certs/ and
/srv/metacrypt/certs/.
Phase 1: Identity
┌──────────────────────────────────────────────────────┐
│ MCIAS starts (systemd) │
│ - No dependencies on other Metacircular services │
│ - Uses seed TLS certificates │
│ - Operator creates initial admin account │
│ - Operator creates system accounts for other services│
└──────────────────────────────────────────────────────┘
Phase 2: Cryptographic Services
┌──────────────────────────────────────────────────────┐
│ Metacrypt starts (systemd) │
│ - Authenticates against MCIAS │
│ - Uses seed TLS certificates initially │
│ - Operator initializes and unseals │
│ - Operator creates CA engine, imports root CA from │
│ ca/, creates issuers │
│ - Can now issue certificates for all other services │
│ - Reissue MCIAS and Metacrypt certs from own CA │
│ (replace seed certs with Metacrypt-issued certs) │
└──────────────────────────────────────────────────────┘
Phase 3: Ingress
┌──────────────────────────────────────────────────────┐
│ MC-Proxy starts (systemd) │
│ - Static route table from TOML config │
│ - Routes external traffic to MCIAS, Metacrypt │
│ - No MCIAS auth (pre-auth infrastructure) │
│ - TLS certs for L7 routes from Metacrypt │
└──────────────────────────────────────────────────────┘
Phase 4: Container Registry
┌──────────────────────────────────────────────────────┐
│ MCR starts (systemd) │
│ - Authenticates against MCIAS │
│ - TLS certificates from Metacrypt │
│ - Operator pushes container images for all services │
│ (including MCIAS, Metacrypt, MC-Proxy themselves) │
└──────────────────────────────────────────────────────┘
Phase 5: DNS
┌──────────────────────────────────────────────────────┐
│ MCNS starts (systemd) │
│ - Authenticates against MCIAS │
│ - Operator configures initial DNS records │
│ (node addresses, service names) │
└──────────────────────────────────────────────────────┘
Phase 6: Control Plane
┌──────────────────────────────────────────────────────┐
│ MCP Agent starts on bootstrap node (systemd) │
│ MCP Master starts on operator workstation │
│ - Authenticates against MCIAS │
│ - Master registers the bootstrap node │
│ - Master imports running services into its registry │
│ - From here, MCP owns the service map │
│ - Services can be redeployed as MCP-managed │
│ containers (replacing the systemd units) │
└──────────────────────────────────────────────────────┘
```
### The Seed Certificate Problem
The circular dependency between MCIAS, Metacrypt, and TLS is resolved by
bootstrapping with a **manually generated root CA**:
1. The operator generates a root CA using `github.com/kisom/cert`. This root
and initial service certificates live in the `ca/` directory.
2. MCIAS and Metacrypt start with certificates issued from this external root.
3. Metacrypt comes up. The operator imports the root CA into Metacrypt's CA
engine, making Metacrypt the authoritative issuer under the same root.
4. Metacrypt can now issue and renew certificates for all services. The `ca/`
directory remains as the offline backup of the root material.
This is a one-time process. The root CA is generated once, imported once, and
from that point forward Metacrypt is the sole CA. MCP handles certificate
provisioning for all services.
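The seed step can be sketched with plain `openssl` (the platform actually uses the `github.com/kisom/cert` tool, and the real material lives in the workspace `ca/` directory — paths below are illustrative):

```shell
# Illustrative recreation of the seed PKI with openssl.
WORK=/tmp/seed-pki
mkdir -p "$WORK/ca" && cd "$WORK"

# 1. Root CA: self-signed P-256 certificate, long-lived, kept offline.
openssl req -x509 -newkey ec -pkeyopt ec_paramgen_curve:prime256v1 -nodes \
  -keyout ca/root.key -out ca/root.pem -days 3650 \
  -subj "/CN=Metacircular Root CA"

# 2. Seed leaf certificate for MCIAS, signed directly by the root.
openssl req -newkey ec -pkeyopt ec_paramgen_curve:prime256v1 -nodes \
  -keyout mcias.key -out mcias.csr -subj "/CN=mcias.metacircular.net"
openssl x509 -req -in mcias.csr -CA ca/root.pem -CAkey ca/root.key \
  -CAcreateserial -days 90 -out mcias.pem

# 3. Sanity-check the chain before deploying to /srv/mcias/certs/.
openssl verify -CAfile ca/root.pem mcias.pem
```

Once Metacrypt has imported `ca/root.key` and `ca/root.pem`, these seed leaves are replaced by Metacrypt-issued certificates under the same root.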
### Adding a New Node
Once the platform is bootstrapped, adding a node is straightforward:
1. Provision the machine and connect it to the overlay network.
2. Install the MCP agent binary.
3. Configure the agent with the MCP Master address and MCIAS credentials
(system account for the node).
4. Start the agent. It authenticates with MCIAS, connects to Master, and
reports as available.
5. The operator deploys workloads to it via MCP. MCP handles image pulls,
config transfer, certificate provisioning, and DNS updates.
### Disaster Recovery
If the bootstrap node is lost, recovery follows the same sequence as initial
bootstrap — but with data restored from backups:
1. Start MCIAS on a new node, restore its database from the most recent
`VACUUM INTO` snapshot.
2. Start Metacrypt, restore its database. Unseal with the original password.
The entire key hierarchy and all issued certificates are recovered.
3. Bring up the remaining services in order, restoring their databases.
4. Start MCP, which rebuilds its registry from the running services.
5. Update DNS (MCNS or external) to point to the new node.
Every service's `snapshot` CLI command and daily backup timer exist specifically
to make this recovery possible. The `/srv/<service>/` convention means each
service's entire state is a single directory to back up and restore.
---
## Certificate Lifecycle
Every service in the platform requires TLS certificates, and Metacrypt is the
CA that issues them. This section describes how certificates flow from
Metacrypt to services, how they are renewed, and how the pieces fit together.
### PKI Structure
Metacrypt implements a **two-tier PKI**:
```
Root CA (self-signed, generated at engine initialization)
├── Issuer "infra" (intermediate CA for infrastructure services)
├── Issuer "services" (intermediate CA for application services)
└── Issuer "clients" (intermediate CA for client certificates)
```
The root CA signs intermediate CAs ("issuers"), which in turn sign leaf
certificates. Each issuer is scoped to a purpose. The root CA certificate is
the trust anchor — services and clients need it (or the relevant issuer chain)
to verify certificates presented by other services.
### ACME Protocol
Metacrypt implements an **ACME server** (RFC 8555) with External Account
Binding (EAB). This is the same protocol used by Let's Encrypt, meaning any
standard ACME client can obtain certificates from Metacrypt.
The ACME flow:
1. Client authenticates with MCIAS and requests EAB credentials from Metacrypt.
2. Client registers an ACME account using the EAB credentials.
3. Client places a certificate order (one or more domain names).
4. Metacrypt creates authorization challenges (HTTP-01 and DNS-01 supported).
5. Client fulfills the challenge (places a file for HTTP-01, or a DNS TXT
record for DNS-01).
6. Metacrypt validates the challenge and issues the certificate.
7. Client downloads the certificate chain and private key.
A **Go client library** (`metacrypt/clients/go`) wraps this entire flow:
MCIAS login, EAB fetch, account registration, challenge fulfillment, and
certificate download. Services that integrate this library can obtain and
renew certificates programmatically.
### How Services Get Certificates Today
Currently, certificates are provisioned through Metacrypt's **REST API or web
UI** and placed into each service's `/srv/<service>/certs/` directory. This is
a manual process — the operator issues a certificate, downloads it, and
deploys the files. The ACME client library exists but is not yet integrated
into any service.
### How It Will Work With MCP
MCP is the natural place to automate certificate provisioning:
- **Initial deploy.** When MCP deploys a new service, it can provision a
certificate from Metacrypt (via the ACME client library or the REST API),
transfer the cert and key to the node as part of the config push to
`/srv/<service>/certs/`, and start the service with valid TLS material.
- **Renewal.** MCP knows what services are running and when their certificates
expire. It can renew certificates before expiry by re-running the ACME flow
(or calling Metacrypt's `renew` operation) and pushing updated files to the
node. The service restarts with the new certificate.
- **Migration.** When MCP migrates a service, the certificate in
`/srv/<service>/certs/` moves with the tar.zst snapshot. If the service's
hostname changes (new node, new DNS name), MCP provisions a new certificate
for the new name.
- **MC-Proxy L7 routes.** MC-Proxy's L7 mode requires certificate/key pairs
for TLS termination. MCP (or the operator) can provision these from
Metacrypt and push them to MC-Proxy's cert directory. MC-Proxy's
architecture doc lists ACME integration and Metacrypt key storage as future
work.
### Trust Distribution
Every service and client that validates TLS certificates needs the root CA
certificate (or the relevant issuer chain). Metacrypt serves these publicly
without authentication:
- `GET /v1/pki/{mount}/ca` — root CA certificate (PEM)
- `GET /v1/pki/{mount}/ca/chain` — full chain: issuer + root (PEM)
- `GET /v1/pki/{mount}/issuer/{name}` — specific issuer certificate (PEM)
During bootstrap, the root CA cert is distributed manually (or via the `ca/`
directory in the workspace). Once MCP is running, it can distribute the CA
cert as part of service deployment. Services reference the CA cert path in
their `[mcias]` config section (`ca_cert`) to verify connections to MCIAS and
other services.
---
## End-to-End Deploy Workflow
This traces a deployment from code change to running service, showing how every
component participates. The example deploys a new version of service α that is
already running on Node B.
### 1. Build and Push
The operator builds a new container image and pushes it to MCR:
```
Operator workstation (vade)
$ docker build -t mcr.metacircular.net/α:v1.2.0 .
$ docker push mcr.metacircular.net/α:v1.2.0
MC-Proxy (edge) ──overlay──→ MC-Proxy (origin) ──→ MCR
Authenticates
via MCIAS
Policy check:
can this user
push to α?
Image stored
(blobs + manifest)
```
The `docker push` goes through MC-Proxy (SNI routing to MCR), authenticates
via the OCI token flow (which delegates to MCIAS), and is checked against
MCR's push policy. The image is stored content-addressed in MCR.
### 2. Deploy
The operator tells MCP to deploy:
```
Operator workstation (vade)
$ mcp deploy α # or: mcp deploy α --image v1.2.0
MCP Master
├── Registry lookup: α is running on Node B
├── C2 (gRPC over overlay) to Node B agent:
│ "pull mcr.metacircular.net/α:v1.2.0 and restart"
MCP Agent (Node B)
├── Pull image from MCR
│ (authenticates via MCIAS, same OCI flow)
├── Stop running container
├── Start new container from updated image
│ - Mounts /srv/α/ (config, database, certs all persist)
│ - Service starts, authenticates to MCIAS, resumes operation
└── Report status back to Master
```
Since α is already running on Node B, this is an in-place update. The
`/srv/α/` directory is untouched — config, database, and certificates persist
across the container restart.
### 3. First-Time Deploy
If α has never been deployed, MCP does more work:
```
Operator workstation (vade)
$ mcp deploy α --config α.toml
MCP Master
├── Registry lookup: α is not running anywhere
├── Scheduling: select Node C (best fit)
├── Provision TLS certificate from Metacrypt
│ (ACME flow or REST API)
├── C2 to Node C agent:
│ 1. Create /srv/α/ directory structure
│ 2. Transfer config file (α.toml → /srv/α/α.toml)
│ 3. Transfer TLS cert+key → /srv/α/certs/
│ 4. Transfer root CA cert → /srv/α/certs/ca.pem
│ 5. Pull image from MCR
│ 6. Start container
├── Update service registry: α → Node C
├── Push DNS update to MCNS:
α.svc.mcp.metacircular.net → Node C address
└── (Optionally) update MC-Proxy route table
if α needs external ingress
```
### 4. Migration
Moving α from Node B to Node C:
```
Operator workstation (vade)
$ mcp migrate α --to node-c # or let MCP choose the destination
MCP Master
├── C2 to Node B agent:
│ 1. Stop α container
│ 2. Snapshot /srv/α/ → tar.zst archive
│ 3. Transfer tar.zst to Master (or directly to Node C)
├── C2 to Node C agent:
│ 1. Receive tar.zst archive
│ 2. Extract to /srv/α/
│ 3. Pull container image from MCR (if not cached)
│ 4. Start container
│ 5. Report status
├── Update service registry: α → Node C
├── Push DNS update to MCNS:
α.svc.mcp.metacircular.net → Node C address
└── (If α had external ingress) update MC-Proxy route
or rely on DNS change
```
### What Each Component Does
| Step | MCIAS | Metacrypt | MCR | MC-Proxy | MCP | MCNS |
|------|-------|-----------|-----|----------|-----|------|
| Build/push image | Authenticates push | — | Stores image, enforces push policy | Routes traffic to MCR | — | — |
| Deploy (update) | Authenticates pull, authenticates service on start | — | Serves image to agent | Routes traffic to service | Coordinates: registry lookup, C2 to agent | — |
| Deploy (new) | Authenticates pull, authenticates service on start | Issues TLS certificate | Serves image to agent | Routes traffic to service (if external) | Coordinates: scheduling, cert provisioning, config transfer, DNS update | Updates DNS records |
| Migrate | Authenticates service on new node | Issues new cert (if hostname changes) | Serves image (if not cached) | Routes traffic to new location | Coordinates: snapshot, transfer, DNS update | Updates DNS records |
| Steady state | Validates tokens for every authenticated request | Serves CA certs publicly, renews certs | Serves image pulls | Routes all external traffic | Tracks service health, holds registry | Serves DNS queries |
---
## Future Ideas
Components and capabilities that may be worth building but have no immediate
timeline. Listed here to capture the thinking; none are committed.
### Observability — Log Collection and Health Monitoring
Every service already produces structured logs (`log/slog`) and exposes health
checks (gRPC `Health.Check` or REST status endpoints). What's missing is
aggregation — today, debugging a cross-service issue means SSH'ing into each
node and reading local logs.
A collector could:
- Gather structured logs from services on each node and forward them to a
central store.
- Periodically health-check local services and report status.
- Feed health data into MCP so it can make informed decisions (restart
unhealthy services, avoid scheduling on degraded nodes, alert the operator).
This might be a standalone service or an MCP agent capability, depending on
weight. If it's just "tail logs and hit health endpoints," it fits in the
agent. If it grows to include indexing, querying, retention policies, and
alerting rules, it's its own service.
### Object Store
The platform has structured storage (SQLite), blob storage scoped to container
images (MCR), and encrypted key-value storage (Metacrypt's barrier). It does
not have general-purpose object/blob storage.
Potential uses:
- **Centralized backups.** Service snapshots currently live on each node in
`/srv/<service>/backups/`. A central object store gives MCP somewhere to push
tar.zst snapshots for offsite retention.
- **Artifact storage.** Build outputs, large files, anything that doesn't fit
in a database row.
- **Data sharing between services.** Files that need to move between services
outside the MCP C2 channel.
Prior art: [Nebula](https://metacircular.net/pages/nebula.html), a
content-addressable data store with capability-based security (SHA-256
addressed blobs, UUID entries for versioning, proxy references for revocable
access). Prototyped in multiple languages. The capability model is interesting
but may be more sophistication than the platform needs — a simpler
authenticated blob store with MCIAS integration might suffice.
### Overlay Network Management
The platform currently relies on an external overlay network (WireGuard,
Tailscale, or similar) for node-to-node connectivity. A self-hosted WireGuard
mesh manager would bring the overlay under Metacircular's control:
- Automate key exchange and peer configuration when MCP adds a node.
- Manage IP allocation within the mesh (potentially absorbing part of MCNS's
scope).
- Remove the dependency on Tailscale's coordination servers.
This is a natural extension of the sovereignty principle but is low priority
while the mesh is small enough to manage by hand.
### Hypervisor / Isolation
A deeper exploration of environment isolation, message-passing between
services, and access mediation at a level below containers. Prior art:
[hypervisor concept](https://metacircular.net/pages/hypervisor.html). The
current platform achieves these goals through containers + MCIAS + policy
engines. A hypervisor layer would push isolation down to the OS level —
interesting for security but significant in scope. More relevant if the
platform ever moves beyond containers to VM-based workloads.
### Prior Art: SYSGOV
[SYSGOV](https://metacircular.net/pages/lisp-dcos.html) was an earlier
exploration of system management in Lisp, with SYSPLAN (desired state
enforcement) and SYSMON (service management). Many of its research questions —
C2 communication, service discovery, secure config distribution, failure
handling — are directly addressed by MCP's design. MCP is the spiritual
successor, reimplemented in Go with the benefit of the Metacircular platform
underneath it.
# Metacircular Dynamics — Engineering Standards
Source: https://metacircular.net/roam/20260314210051-metacircular_dynamics.html
This document describes the standard repository layout, tooling, and software
development lifecycle (SDLC) for services built at Metacircular Dynamics. It
incorporates the platform-wide project guidelines and codifies the conventions
established in Metacrypt as the baseline for all services.
## Platform Rules
These four rules apply to every Metacircular service:
1. **Data Storage**: All service data goes in `/srv/<service>/` to enable
straightforward migration across systems.
2. **Deployment Architecture**: Services require systemd unit files but
prioritize container-first design to support deployment via the
Metacircular Control Plane (MCP).
3. **Identity Management**: Services must integrate with MCIAS (Metacircular
Identity and Access Service) for user management and access control. Three
role levels: `admin` (full administrative access), `user` (full
non-administrative access), `guest` (service-dependent restrictions).
4. **API Design**: Services expose both gRPC and REST interfaces, kept in
sync. Web UIs are built with htmx.
## Table of Contents
0. [Platform Rules](#platform-rules)
1. [Repository Layout](#repository-layout)
2. [Language & Toolchain](#language--toolchain)
3. [Build System](#build-system)
4. [API Design](#api-design)
5. [Authentication & Authorization](#authentication--authorization)
6. [Database Conventions](#database-conventions)
7. [Configuration](#configuration)
8. [Web UI](#web-ui)
9. [Testing](#testing)
10. [Linting & Static Analysis](#linting--static-analysis)
11. [Deployment](#deployment)
12. [Documentation](#documentation)
13. [Security](#security)
14. [Development Workflow](#development-workflow)
---
## Repository Layout
Every service follows a consistent directory structure. Adjust the
service-specific directories (e.g. `engines/` in Metacrypt) as appropriate,
but the top-level skeleton is fixed.
```
.
├── cmd/
│ ├── <service>/ CLI entry point (server, subcommands)
│ └── <service>-web/ Web UI entry point (if separate binary)
├── internal/
│ ├── auth/ MCIAS integration (token validation, caching)
│ ├── config/ TOML configuration loading & validation
│ ├── db/ Database setup, schema migrations
│ ├── server/ REST API server, routes, middleware
│ ├── grpcserver/ gRPC server, interceptors, service handlers
│ ├── webserver/ Web UI server, template routes, HTMX handlers
│ └── <domain>/ Service-specific packages
├── proto/<service>/
│ └── v<N>/ Current proto definitions (start at v1;
│ increment only on breaking changes)
├── gen/<service>/
│ └── v<N>/ Generated Go gRPC/protobuf code
├── web/
│ ├── embed.go //go:embed directive for templates and static
│ ├── templates/ Go HTML templates
│ └── static/ CSS, JS (htmx)
├── deploy/
│ ├── docker/ Docker Compose configuration
│ ├── examples/ Example config files
│ ├── scripts/ Install, backup, migration scripts
│ └── systemd/ systemd unit files and timers
├── docs/ Internal engineering documentation
├── Dockerfile.api API server container (if split binary)
├── Dockerfile.web Web UI container (if split binary)
├── Makefile
├── buf.yaml Protobuf linting & breaking-change config
├── .golangci.yaml Linter configuration
├── .gitignore
├── CLAUDE.md AI-assisted development instructions
├── ARCHITECTURE.md Full system specification
└── <service>.toml.example Example configuration
```
### Key Principles
- **`cmd/`** contains only CLI wiring (cobra commands, flag parsing). No
business logic.
- **`internal/`** contains all service logic. Nothing in `internal/` is
importable by other modules — this is enforced by Go's module system.
- **`proto/`** is the source of truth for gRPC definitions. Generated code
lives in `gen/`, never edited by hand. Versions start at `v1`; a new
version directory is only created when a breaking change is required — not
as a naming convention or initial setup step.
- **`deploy/`** contains everything needed to run the service in production.
A new engineer should be able to deploy from this directory alone.
- **`web/`** is embedded into the binary via `//go:embed`. No external file
dependencies at runtime.
### What Does Not Belong in the Repository
- Runtime data (databases, certificates, logs) — these live in `/srv/<service>`
- Real configuration files with secrets — only examples are committed
- IDE configuration (`.idea/`, `.vscode/`) — per-developer, not shared
- Vendored dependencies — Go module proxy handles this
---
## Language & Toolchain
| Tool | Version | Purpose |
|------|---------|---------|
| Go | 1.25+ | Primary language |
| protoc + protoc-gen-go | Latest | Protobuf/gRPC code generation |
| buf | Latest | Proto linting and breaking-change detection |
| golangci-lint | v2 | Static analysis and linting |
| Docker | Latest | Container builds |
### Go Conventions
- **Pure-Go dependencies** where possible. Avoid CGo — it complicates
cross-compilation and container builds. Use `modernc.org/sqlite` instead
of `mattn/go-sqlite3`.
- **`CGO_ENABLED=0`** for all production builds. Statically linked binaries
deploy cleanly to Alpine containers.
- **Stripped binaries**: Build with `-trimpath -ldflags="-s -w"` to remove
debug symbols and reduce image size.
- **Version injection**: Pass `git describe --tags --always --dirty` via
`-X main.version=...` at build time. Every binary must report its version.
### Module Path
Services hosted on `git.wntrmute.dev` use:
```
git.wntrmute.dev/kyle/<service>
```
---
## Build System
Every repository has a Makefile with these standard targets:
```makefile
.PHONY: build test vet lint proto proto-lint clean docker all
LDFLAGS := -trimpath -ldflags="-s -w -X main.version=$(shell git describe --tags --always --dirty)"
<service>:
go build $(LDFLAGS) -o <service> ./cmd/<service>
build:
go build ./...
test:
go test ./...
vet:
go vet ./...
lint:
golangci-lint run ./...
proto:
protoc --go_out=. --go_opt=module=<module> \
--go-grpc_out=. --go-grpc_opt=module=<module> \
proto/<service>/v1/*.proto
proto-lint:
buf lint
buf breaking --against '.git#branch=master,subdir=proto'
clean:
rm -f <service>
docker:
docker build -t <service> -f Dockerfile.api .
all: vet lint test <service>
```
### Target Semantics
| Target | When to Run | CI Gate? |
|--------|-------------|----------|
| `vet` | Every change | Yes |
| `lint` | Every change | Yes |
| `test` | Every change | Yes |
| `proto-lint` | Any proto change | Yes |
| `proto` | After editing `.proto` files | No (manual) |
| `all` | Pre-push verification | Yes |
The `all` target is the CI pipeline: `vet → lint → test → build`. If any
step fails, the pipeline stops.
---
## API Design
Services expose two synchronized API surfaces:
### gRPC (Primary)
- Proto definitions live in `proto/<service>/v<N>/`, where N starts at 1.
- **Versioning policy**: proto packages are versioned to protect existing
clients from breaking changes. A new version directory (`v2/`, `v3/`, …)
is only introduced when a breaking change is unavoidable. Non-breaking
additions (new fields, new RPCs) are made in-place to the current version.
- Use strongly-typed, per-operation RPCs. Avoid generic "execute" patterns.
- Use `google.protobuf.Timestamp` for all time fields (not RFC 3339 strings).
- Run `buf lint` and `buf breaking` against master before merging proto
changes.
### REST (Secondary)
- JSON over HTTPS. Routes live in `internal/server/routes.go`.
- Use `chi` for routing (lightweight, stdlib-compatible).
- Standard error format: `{"error": "description"}`.
- Standard HTTP status codes: `401` (unauthenticated), `403` (unauthorized),
`412` (precondition failed), `503` (service unavailable).
### API Sync Rule
**Every REST endpoint must have a corresponding gRPC RPC, and vice versa.**
When adding, removing, or changing an endpoint in either surface, the other
must be updated in the same change. This is enforced in code review.
### gRPC Interceptors
Access control is enforced via interceptor maps, not per-handler checks:
| Map | Effect |
|-----|--------|
| `sealRequiredMethods` | Returns `UNAVAILABLE` if the service is sealed/locked |
| `authRequiredMethods` | Validates MCIAS bearer token, populates caller info |
| `adminRequiredMethods` | Requires admin role on the caller |
Adding a new RPC means adding it to the correct interceptor maps. Forgetting
this is a security defect.
---
## Authentication & Authorization
### Authentication
All services delegate authentication to **MCIAS** (Metacircular Identity and
Access Service). No service maintains its own user database.
- Client sends credentials to the service's `/v1/auth/login` endpoint.
- The service forwards them to MCIAS via the client library
(`git.wntrmute.dev/kyle/mcias/clients/go`).
- On success, MCIAS returns a bearer token. The service returns it to the
client and optionally sets it as a cookie for the web UI.
- Subsequent requests include the token via `Authorization: Bearer <token>`
header or cookie.
- Token validation calls MCIAS `ValidateToken()`. Results should be cached
(keyed by SHA-256 of the token) with a short TTL (30 seconds or less).
### Authorization
Three role levels:
| Role | Meaning |
|------|---------|
| `admin` | Full access to everything. Policy bypass. |
| `user` | Access governed by policy rules. Default deny. |
| `guest` | Service-dependent restrictions. Default deny. |
Admin detection is based solely on the MCIAS `admin` role. The service never
promotes users locally.
Services that need fine-grained access control should implement a policy
engine (priority-based ACL rules stored in encrypted storage, default deny,
admin bypass). See Metacrypt's implementation as the reference.
---
## Database Conventions
### SQLite
SQLite is the default database for Metacircular services. It is simple to
operate, requires no external processes, and backs up cleanly with
`VACUUM INTO`.
Connection settings (applied at open time):
```sql
PRAGMA journal_mode = WAL;
PRAGMA foreign_keys = ON;
PRAGMA busy_timeout = 5000;
```
File permissions: `0600`. Created by the service on first run.
### Migrations
- Migrations are Go functions registered in `internal/db/` and run
sequentially at startup.
- Each migration is idempotent — `CREATE TABLE IF NOT EXISTS`,
`ALTER TABLE ... ADD COLUMN IF NOT EXISTS`.
- Applied migrations are tracked in a `schema_migrations` table.
- Never modify a migration that has been deployed. Add a new one.
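The runner shape can be sketched against a minimal interface so no driver is needed (names are hypothetical; each service's real runner lives in `internal/db/` and reads the applied set from the `schema_migrations` table):

```go
package main

import "fmt"

// Execer is the slice of *sql.DB the runner needs.
type Execer interface {
	Exec(query string) error
}

type Migration struct {
	ID  string
	SQL string
}

// Migrate applies each migration exactly once, recording IDs in
// schema_migrations. Statements are themselves idempotent
// (CREATE TABLE IF NOT EXISTS ...) as a second line of defense.
func Migrate(db Execer, applied map[string]bool, migrations []Migration) error {
	if err := db.Exec(`CREATE TABLE IF NOT EXISTS schema_migrations (id TEXT PRIMARY KEY)`); err != nil {
		return err
	}
	for _, m := range migrations {
		if applied[m.ID] {
			continue // never re-run a deployed migration; new changes get a new one
		}
		if err := db.Exec(m.SQL); err != nil {
			return fmt.Errorf("migration %s: %w", m.ID, err)
		}
		applied[m.ID] = true
	}
	return nil
}

// fakeDB records executed statements, standing in for a real SQLite handle.
type fakeDB struct{ log []string }

func (f *fakeDB) Exec(q string) error { f.log = append(f.log, q); return nil }

func main() {
	db := &fakeDB{}
	applied := map[string]bool{}
	ms := []Migration{{ID: "0001_users", SQL: "CREATE TABLE IF NOT EXISTS users (id TEXT)"}}
	_ = Migrate(db, applied, ms)
	fmt.Println(len(db.log), applied["0001_users"])
}
```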
### Backup
Every service must provide a `snapshot` CLI command that creates a consistent
backup using `VACUUM INTO`. Automated backups run via a systemd timer
(daily, with retention pruning).
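What the `snapshot` command does under the hood, shown with the `sqlite3` CLI (paths here are illustrative; a real service snapshots `/srv/<service>/<service>.db` into `/srv/<service>/backups/`):

```shell
DB=/tmp/demo-service.db
BACKUP=/tmp/demo-backup.db
rm -f "$DB" "$BACKUP"

# Seed a database in WAL mode, as services configure it.
sqlite3 "$DB" "PRAGMA journal_mode=WAL;
CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT);
INSERT OR REPLACE INTO kv VALUES ('greeting', 'hello');"

# VACUUM INTO writes a consistent, defragmented copy without blocking
# writers and without the torn-file risk of copying the db mid-write.
sqlite3 "$DB" "VACUUM INTO '$BACKUP'"

# The snapshot is a complete, standalone database.
sqlite3 "$BACKUP" "SELECT v FROM kv WHERE k = 'greeting';"
```

Note that `VACUUM INTO` refuses to overwrite an existing file, so snapshot tooling must write to a fresh timestamped path (or remove the target first, as above).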
---
## Configuration
### Format
TOML. Parsed with `go-toml/v2`. Environment variable overrides via
`SERVICENAME_*` (e.g. `METACRYPT_SERVER_LISTEN_ADDR`).
### Standard Sections
```toml
[server]
listen_addr = ":8443" # HTTPS API
grpc_addr = ":9443" # gRPC (optional; disabled if unset)
tls_cert = "/srv/<service>/certs/cert.pem"
tls_key = "/srv/<service>/certs/key.pem"
[web]
listen_addr = "127.0.0.1:8080" # Web UI (optional; disabled if unset)
vault_grpc = "127.0.0.1:9443" # gRPC address of the API server
vault_ca_cert = "" # CA cert for verifying API server TLS
[database]
path = "/srv/<service>/<service>.db"
[mcias]
server_url = "https://mcias.metacircular.net:8443"
ca_cert = "" # Custom CA for MCIAS TLS
service_name = "<service>" # This service's identity, as registered in MCIAS
tags = [] # Tags sent with every login request (e.g. ["env:restricted"])
# MCIAS evaluates auth:login policy against these tags,
# enabling per-service login restrictions via policy rules.
[log]
level = "info" # debug, info, warn, error
```
#### Service context and login policy
`service_name` and `tags` in `[mcias]` are sent with every `POST /v1/auth/login`
request. MCIAS evaluates the `auth:login` action with the resource set to
`{service_name, tags}`. This allows operators to write deny rules that restrict
which roles or account types can log into specific services.
Example: deny `guest` and `viewer` human accounts from any service tagged
`env:restricted`:
```json
{
"effect": "deny",
"roles": ["guest", "viewer"],
"account_types": ["human"],
"actions": ["auth:login"],
"required_tags": ["env:restricted"]
}
```
A service can also be targeted by name instead of (or in addition to) tags:
```json
{
"effect": "deny",
"roles": ["guest"],
"actions": ["auth:login"],
"service_names": ["meta-money-printer"]
}
```
MCIAS enforces the policy after credentials are verified; a policy-denied
login returns HTTP 403 (not 401) so the client can distinguish a bad password
from a service access restriction.
### Validation
Required fields are validated at startup. The service refuses to start if
any are missing. Do not silently default required values.
### Data Directory
All runtime data lives in `/srv/<service>/`:
```
/srv/<service>/
├── <service>.toml Configuration
├── <service>.db SQLite database
├── certs/ TLS certificates
└── backups/ Database snapshots
```
This convention enables straightforward service migration between hosts:
copy `/srv/<service>/` and the binary.
---
## Web UI
### Technology
- **Go `html/template`** for server-side rendering. No JavaScript frameworks.
- **htmx** for dynamic interactions (form submission, partial page updates)
without full page reloads.
- Templates and static files are embedded in the binary via `//go:embed`.
### Structure
- `web/templates/layout.html` — shared HTML skeleton, navigation, CSS/JS
includes. All page templates extend this.
- Page templates: one `.html` file per page/feature.
- `web/static/` — CSS, htmx. Keep this minimal.
### Architecture
The web UI runs as a separate binary (`<service>-web`) that communicates
with the API server via its gRPC interface. This separation means:
- The web UI has no direct database access.
- The API server enforces all authorization.
- The web UI can be deployed independently or omitted entirely.
### Security
- CSRF protection via signed double-submit cookies on all mutating requests
(POST/PUT/PATCH/DELETE).
- Session cookie: `HttpOnly`, `Secure`, `SameSite=Strict`.
- All user input is escaped by `html/template` (the default).
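The signed double-submit scheme reduces to HMAC sign-and-compare — a sketch with invented helper names, not the mcdsl implementation:

```go
package main

import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// issueToken returns "value.signature". The same token goes in both the
// cookie and the hidden form field; the HMAC proves the server minted it,
// so an attacker cannot forge a matching pair.
func issueToken(key []byte) (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	val := hex.EncodeToString(buf)
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(val))
	return val + "." + hex.EncodeToString(mac.Sum(nil)), nil
}

// verifyToken checks that cookie and form values match and the signature
// verifies; it runs on every mutating request (POST/PUT/PATCH/DELETE).
func verifyToken(key []byte, cookie, form string) bool {
	if cookie == "" || !hmac.Equal([]byte(cookie), []byte(form)) {
		return false
	}
	val, sig, ok := strings.Cut(cookie, ".")
	if !ok {
		return false
	}
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(val))
	want, err := hex.DecodeString(sig)
	if err != nil {
		return false
	}
	return hmac.Equal(mac.Sum(nil), want)
}

func main() {
	key := []byte("server-side-csrf-key")
	tok, _ := issueToken(key)
	fmt.Println(verifyToken(key, tok, tok))
}
```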
---
## Testing
### Philosophy
Tests are written using the Go standard library `testing` package. No test
frameworks (testify, gomega, etc.) — the standard library is sufficient and
keeps dependencies minimal.
### Patterns
```go
func TestFeatureName(t *testing.T) {
// Setup: use t.TempDir() for isolated file system state.
dir := t.TempDir()
database, err := db.Open(filepath.Join(dir, "test.db"))
if err != nil {
t.Fatalf("open db: %v", err)
}
defer func() { _ = database.Close() }()
	if err := db.Migrate(database); err != nil {
		t.Fatalf("migrate: %v", err)
	}
// Exercise the code under test.
// ...
// Assert with t.Fatal (not t.Error) for precondition failures.
if !bytes.Equal(got, want) {
t.Fatalf("got %q, want %q", got, want)
}
}
```
### Guidelines
- **Use `t.TempDir()`** for all file-system state. Never write to fixed
paths. Cleanup is automatic.
- **Use `errors.Is`** for error assertions, not string comparison.
- **No mocks for databases.** Tests use real SQLite databases created in
temp directories. This catches migration bugs that mocks would hide.
- **Test files** live alongside the code they test: `barrier.go` and
`barrier_test.go` in the same package.
- **Test helpers** call `t.Helper()` so failures report the caller's line.
### What to Test
| Layer | Test Strategy |
|-------|---------------|
| Crypto primitives | Roundtrip encryption/decryption, wrong-key rejection, edge cases |
| Storage (barrier, DB) | CRUD operations, sealed-state rejection, concurrent access |
| API handlers | Request/response correctness, auth enforcement, error codes |
| Policy engine | Rule matching, priority ordering, default deny, admin bypass |
| CLI commands | Flag parsing, output format (lightweight) |
---
## Linting & Static Analysis
### Configuration
Every repository includes a `.golangci.yaml` with this philosophy:
**fail loudly for security and correctness; everything else is a warning.**
### Required Linters
| Linter | Category | Purpose |
|--------|----------|---------|
| `errcheck` | Correctness | Unhandled errors are silent failures |
| `govet` | Correctness | Printf mismatches, unreachable code, suspicious constructs |
| `ineffassign` | Correctness | Dead writes hide logic bugs |
| `unused` | Correctness | Unused variables and functions |
| `errorlint` | Error handling | Proper `errors.Is`/`errors.As` usage |
| `gosec` | Security | Hardcoded secrets, weak RNG, insecure crypto, SQL injection |
| `staticcheck` | Security | Deprecated APIs, mutex misuse, deep analysis |
| `revive` | Style | Go naming conventions, error return ordering |
| `gofmt` | Formatting | Standard Go formatting |
| `goimports` | Formatting | Import grouping and ordering |
### Settings
- `errcheck`: `check-type-assertions: true` (catch `x.(*T)` without ok check).
- `govet`: all analyzers enabled except `shadow` (too noisy for idiomatic Go).
- `gosec`: severity and confidence set to `medium`. Exclude `G104` (overlaps
with errcheck).
- `max-issues-per-linter: 0` — report everything. No caps.
- Test files: allow `G101` (hardcoded credentials) for test fixtures.
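The settings above sketch out to a `.golangci.yaml` like the following (linter names are real; treat the exact keys as illustrative, since golangci-lint's schema varies between versions):

```yaml
linters:
  enable:
    - errcheck
    - govet
    - ineffassign
    - unused
    - errorlint
    - gosec
    - staticcheck
    - revive
    - gofmt
    - goimports

linters-settings:
  errcheck:
    check-type-assertions: true   # catch x.(*T) without the ok check
  govet:
    enable-all: true
    disable:
      - shadow                    # too noisy for idiomatic Go
  gosec:
    severity: medium
    confidence: medium
    excludes:
      - G104                      # unhandled errors — errcheck owns this

issues:
  max-issues-per-linter: 0        # report everything, no caps
  exclude-rules:
    - path: _test\.go
      linters: [gosec]
      text: G101                  # allow hardcoded credentials in test fixtures
```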
---
## Deployment
### Container-First
Services are designed for container deployment but must also run as native
systemd services. Both paths are first-class.
### Docker
Multi-stage builds:
1. **Builder**: `golang:1.23-alpine`. Compile with `CGO_ENABLED=0`, strip
symbols.
2. **Runtime**: `alpine:3.21`. Non-root user (`<service>`), minimal attack
surface.
If the service has separate API and web binaries, use separate Dockerfiles
(`Dockerfile.api`, `Dockerfile.web`) and a `docker-compose.yml` that wires
them together with a shared data volume.
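A multi-stage Dockerfile following this pattern (binary name and paths are placeholders):

```dockerfile
# --- Builder ---
FROM golang:1.23-alpine AS builder
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# Static binary, stripped symbols.
RUN CGO_ENABLED=0 go build -trimpath -ldflags="-s -w" -o /out/service ./cmd/service

# --- Runtime ---
FROM alpine:3.21
RUN adduser -D -H -s /sbin/nologin service
COPY --from=builder /out/service /usr/local/bin/service
USER service
ENTRYPOINT ["/usr/local/bin/service", "server"]
```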
### systemd
Every service ships with:
| File | Purpose |
|------|---------|
| `<service>.service` | Main service unit (API server) |
| `<service>-web.service` | Web UI unit (if applicable) |
| `<service>-backup.service` | Oneshot backup unit |
| `<service>-backup.timer` | Daily backup timer (02:00 UTC, 5-minute jitter) |
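The backup timer encodes the 02:00 UTC schedule and five-minute jitter directly (a sketch; unit names follow the table above):

```ini
# <service>-backup.timer
[Unit]
Description=Daily <service> backup

[Timer]
OnCalendar=*-*-* 02:00:00 UTC
RandomizedDelaySec=300
Persistent=true

[Install]
WantedBy=timers.target
```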
#### Security Hardening
All service units must include these security directives:
```ini
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
PrivateDevices=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictSUIDSGID=true
RestrictNamespaces=true
LockPersonality=true
MemoryDenyWriteExecute=true
RestrictRealtime=true
ReadWritePaths=/srv/<service>
```
The web UI unit should use `ReadOnlyPaths=/srv/<service>` instead of
`ReadWritePaths` — it has no reason to write to the data directory.
### Install Script
`deploy/scripts/install.sh` handles:
1. Create system user/group (idempotent).
2. Install binary to `/usr/local/bin/`.
3. Create `/srv/<service>/` directory structure.
4. Install example config if none exists.
5. Install systemd units and reload the daemon.
### TLS
- **Minimum TLS version: 1.3.** No exceptions, no fallback cipher suites.
Go's TLS 1.3 implementation manages cipher selection automatically.
- **Timeouts**: read 30s, write 30s, idle 120s.
- Certificate and key paths are required configuration — the service refuses
to start without them.
### Graceful Shutdown
Services handle `SIGINT` and `SIGTERM`, shutting down cleanly:
1. Stop accepting new connections.
2. Drain in-flight requests (with a timeout).
3. Clean up resources (close databases, zeroize secrets if applicable).
4. Exit.
---
## Documentation
### Required Files
| File | Purpose | Audience |
|------|---------|----------|
| `README.md` | Project overview, quick-start, and contributor guide | Everyone |
| `CLAUDE.md` | AI-assisted development context | Claude Code |
| `ARCHITECTURE.md` | Full system specification | Engineers |
| `RUNBOOK.md` | Operational procedures and incident response | Operators |
| `deploy/examples/<service>.toml` | Example configuration | Operators |
### Suggested Files
These are not required for every project but should be created where applicable:
| File | When to Include | Purpose |
|------|-----------------|---------|
| `AUDIT.md` | Services handling cryptography, secrets, PII, or auth | Security audit findings with issue tracking and resolution status |
| `POLICY.md` | Services with fine-grained access control | Policy engine documentation: rule structure, evaluation algorithm, resource paths, action classification, common patterns |
### README.md
The README is the front door. A new engineer or user should be able to
understand what the service does and get it running from this file alone.
It should contain:
- Project name and one-paragraph description.
- Quick-start instructions (build, configure, run).
- Link to `ARCHITECTURE.md` for full technical details.
- Link to `RUNBOOK.md` for operational procedures.
- License and contribution notes (if applicable).
Keep it concise. The README is not the spec — that's `ARCHITECTURE.md`.
### CLAUDE.md
This file provides context for AI-assisted development. It should contain:
- Project overview (one paragraph).
- Build, test, and lint commands.
- High-level architecture summary.
- Project structure with directory descriptions.
- Ignored directories (runtime data, generated code).
- Critical rules (e.g. API sync requirements).
Keep it concise. AI tools read this on every interaction.
### ARCHITECTURE.md
This is the canonical specification for the service. It should cover:
1. System overview with a layered architecture diagram.
2. Cryptographic design (if applicable): algorithms, key hierarchy.
3. State machines and lifecycle (if applicable).
4. Storage design.
5. Authentication and authorization model.
6. API surface (REST and gRPC, with tables of every endpoint).
7. Web interface routes.
8. Database schema (every table, every column).
9. Configuration reference.
10. Deployment guide.
11. Security model: threat mitigations table and security invariants.
12. Future work.
This document is the source of truth. When the code and the spec disagree,
one of them has a bug.
### RUNBOOK.md
The runbook is written for operators, not developers. It covers what to do
when things go wrong and how to perform routine maintenance. It should
contain:
1. **Service overview** — what the service does, in one paragraph.
2. **Health checks** — how to verify the service is healthy (endpoints,
CLI commands, expected responses).
3. **Common operations** — start, stop, restart, seal/unseal, backup,
restore, log inspection.
4. **Alerting** — what alerts exist, what they mean, and how to respond.
5. **Incident procedures** — step-by-step playbooks for known failure
modes (database corruption, certificate expiry, MCIAS outage, disk
full, etc.).
6. **Escalation** — when and how to escalate beyond the runbook.
Write runbook entries as numbered steps, not prose. An operator at 3 AM
should be able to follow them without thinking.
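For example, a disk-full playbook entry might look like this (illustrative content):

```markdown
#### Disk full on /srv/<service>

1. Confirm: `df -h /srv/<service>` shows ≥ 95% used.
2. Check for old snapshots: `ls -lh /srv/<service>/snapshots/`.
3. Delete snapshots older than 14 days (they also exist off-host).
4. If still full, rotate journal logs: `journalctl --vacuum-size=200M`.
5. Verify recovery using the health checks in section 2.
6. Still failing? Escalate per section 6.
```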
### AUDIT.md (Suggested)
For services that handle cryptography, secrets, PII, or authentication,
maintain a security audit log. Each finding gets a numbered entry with:
- Description of the issue.
- Severity (critical, high, medium, low).
- Resolution status: open, resolved (with summary), or accepted (with
rationale for accepting the risk).
The priority summary table at the bottom provides a scannable overview.
Resolved and accepted items are struck through but retained for history.
See Metacrypt's `AUDIT.md` for the reference format.
### POLICY.md (Suggested)
For services with a policy engine or fine-grained access control, document
the policy model separately from the architecture spec. It should cover:
- Rule structure (fields, types, semantics).
- Evaluation algorithm (match logic, priority, default effect).
- Resource path conventions and glob patterns.
- Action classification.
- API endpoints for policy CRUD.
- Common policy patterns with examples.
- Role summary (what each MCIAS role gets by default).
This document is aimed at administrators who need to write policy rules,
not engineers who need to understand the implementation.
### Engine/Feature Design Documents
For services with a modular architecture, each module gets its own design
document (e.g. `engines/sshca.md`). These are detailed implementation plans
that include:
- Overview and core concepts.
- Data model and storage layout.
- Lifecycle (initialization, teardown).
- Operations table with auth requirements.
- API definitions (gRPC and REST).
- Implementation steps (file-by-file).
- Security considerations.
- References to existing code patterns to follow.
Write these before writing code. They are the blueprint, not the afterthought.
---
## Security
### General Principles
- **Default deny.** Unauthenticated requests are rejected. Unauthorized
requests are rejected. If in doubt, deny.
- **Fail closed.** If the service cannot verify authorization, it denies the
request. If the database is unavailable, the service is unavailable.
- **Least privilege.** Service processes run as non-root. systemd units
restrict filesystem access, syscalls, and capabilities.
- **No local user databases.** Authentication is always delegated to MCIAS.
### Cryptographic Standards
| Purpose | Algorithm | Notes |
|---------|-----------|-------|
| Symmetric encryption | AES-256-GCM | 12-byte random nonce per operation |
| Symmetric alternative | XChaCha20-Poly1305 | For contexts needing nonce misuse resistance |
| Key derivation | Argon2id | Memory-hard; tune params to hardware |
| Asymmetric signing | Ed25519, ECDSA (P-256, P-384) | Prefer Ed25519 |
| CSPRNG | `crypto/rand` | All keys, nonces, salts, tokens |
| Constant-time comparison | `crypto/subtle` | All secret comparisons |
- **Never use RSA for new designs.** Ed25519 and ECDSA are faster, produce
smaller keys, and have simpler security models.
- **Zeroize secrets** from memory when they are no longer needed. Overwrite
byte slices with zeros, nil out pointers.
- **Never log secrets.** Keys, passwords, tokens, and plaintext must never
appear in log output.
### Web Security
- CSRF tokens on all mutating requests.
- `SameSite=Strict` on all cookies.
- `html/template` for automatic escaping.
- Validate all input at system boundaries.
---
## Development Workflow
### Local Development
```bash
# Build and run both servers locally:
make devserver
# Or build everything and run the full pipeline:
make all
```
The `devserver` target builds both binaries and runs them against a local
config in `srv/`. The `srv/` directory is gitignored — it holds your local
database, certificates, and configuration.
### Pre-Push Checklist
Before pushing a branch:
```bash
make all # vet → lint → test → build
make proto-lint # if proto files changed
```
### Proto Changes
1. Edit `.proto` files in `proto/<service>/v2/`.
2. Run `make proto` to regenerate Go code.
3. Run `make proto-lint` to check for linting violations and breaking changes.
4. Update REST routes to match the new/changed RPCs.
5. Update gRPC interceptor maps for any new RPCs.
6. Update `ARCHITECTURE.md` API tables.
### Adding a New Feature
1. **Design first.** Write or update the relevant design document. For a new
engine or major subsystem, create a new doc in `docs/` or `engines/`.
2. **Implement.** Follow existing patterns — the design doc should reference
specific files and line numbers.
3. **Test.** Write tests alongside the implementation.
4. **Update docs.** Update `ARCHITECTURE.md`, `CLAUDE.md`, and route tables.
5. **Verify.** Run `make all`.
### CLI Commands
Every service uses cobra for CLI commands. Standard subcommands:
| Command | Purpose |
|---------|---------|
| `server` | Start the service |
| `init` | First-time setup (if applicable) |
| `status` | Query a running instance's health |
| `snapshot` | Create a database backup |
Add service-specific subcommands as needed (e.g. `migrate-aad`, `unseal`).
Each command lives in its own file in `cmd/<service>/`.