Files
metacircular/docs/metacircular.md
Kyle Isom 4386fb0896 Sync docs/metacircular.md versions and add undeploy capability
Update version references to match current git tags: MCIAS v1.9.0,
Metacrypt v1.3.1, MCP v0.7.6. Add Phase D (DNS registration) to MCP
status, update RPC/CLI counts, and document undeploy as a first-class
capability. Also sync STATUS.md and packaging-and-deployment.md with
the same version updates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 23:20:30 -07:00

48 KiB
Raw Permalink Blame History

Metacircular Infrastructure

Background

Metacircular Dynamics is a personal infrastructure platform. The name comes from the tradition of metacircular evaluators in Lisp — a system defined in terms of itself — by way of SICP and Common Lisp projects that preceded this work. The infrastructure is metacircular in the same sense: the platform manages, secures, and hosts its own services.

The goal is sovereign infrastructure. Every component is self-hosted, every dependency is controlled, and the entire stack is operable by one person. There are no cloud provider dependencies, no third-party auth providers, no external databases. When a Metacircular node boots, it connects to Metacircular services for identity, certificates, container images, and workload scheduling.

All services are written in Go and follow a shared set of engineering standards (see engineering-standards.md). The platform is designed for a small number of machines — a personal homelab or a handful of VPSes — not for hyperscale.

Philosophy

Sovereignty. You own the whole stack. Identity, certificates, secrets, container images, DNS, networking — all self-hosted. No SaaS dependency means no vendor lock-in, no surprise deprecations, and no trust delegation to third parties.

Simplicity over sophistication. SQLite over Postgres. Stdlib testing over test frameworks. Pure Go over CGo. htmx over React. Single-binary deployments over microservice orchestrators. The right tool is the simplest one that solves the problem without creating a new one.

Consistency as leverage. Every service follows identical patterns: the same directory layout, the same Makefile targets, the same config format, the same auth integration, the same deployment model. Knowledge of one service transfers instantly to all others. A new service can be stood up by copying the skeleton.

Security as structure. Security is not a feature bolted on after the fact. Default deny is the starting posture. TLS 1.3 is the minimum, not a goal. Interceptor maps make "forgot to add auth" a visible, reviewable omission rather than a silent runtime failure. Secrets are encrypted at rest behind a seal/unseal barrier. Every service delegates identity to a single root of trust.

Design before code. The architecture document is written before implementation begins. It is the spec, not the afterthought. When the code and the spec disagree, one of them has a bug.

High-Level Overview

Metacircular infrastructure is built from six core components and a documentation server, plus a shared standard library (MCDSL) that provides the common patterns all services depend on (auth integration, database setup, config loading, HTTP/gRPC server bootstrapping, CSRF, web session management, health checks, snapshots, and service directory archiving):

  • MCIAS — Identity and access. The root of trust for all other services. Handles authentication, token issuance, role management, and login policy enforcement. Every other component delegates auth here.

  • Metacrypt — Cryptographic services. PKI/CA, SSH CA, transit encryption, and encrypted secret storage behind a Vault-inspired seal/unseal barrier. Issues the TLS certificates that every other service depends on.

  • MCR — Container registry. OCI-compliant image storage. MCP directs nodes to pull images from MCR. Policy-controlled push/pull integrated with MCIAS.

  • MCNS — Networking. DNS and address management for the platform.

  • MCP — Control plane. The orchestrator. A master/agent architecture that manages workload scheduling, container lifecycle, service registry, data transfer, and node state across the platform.

  • MC-Proxy — Node ingress. A TLS proxy and router that sits on every node, accepts outside connections, and routes them to the correct service — either as raw TCP passthrough or via TLS-terminating HTTP/2 reverse proxy.

  • MCDoc — Documentation server. Fetches markdown from Gitea repositories, renders HTML with syntax highlighting, serves a navigable documentation site. Public-facing, no MCIAS authentication required.

These components form a dependency graph rooted at MCIAS:

MCIAS (standalone — the root of trust)
  ├── Metacrypt (uses MCIAS for auth; provides certs to all services)
  ├── MCR (uses MCIAS for auth; stores images pulled by MCP)
  ├── MCNS (uses MCIAS for auth; provides DNS for the platform)
  ├── MCP (uses MCIAS for auth; orchestrates everything; owns service registry)
  └── MC-Proxy (pre-auth; routes traffic to services behind it)

The Node Model

The unit of deployment is the MC Node — a machine (physical or virtual) that participates in the Metacircular platform.

                     ┌──────────────────┐    ┌──────────────┐
                     │  System / Core   │    │     MCP      │
                     │  Infrastructure  │    │    Master     │
                     │  (e.g. MCIAS)    │    │              │
                     └────────┬─────────┘    └──────┬───────┘
                              │                     │ C2
                              │                     │
    Outside     ┌─────────────▼─────────────────────▼──────────┐
    Client ────▶│                   MC Node                     │
                │                                               │
                │   ┌───────────┐                               │
                │   │ MC-Proxy  │──┬──────┬──────┐              │
                │   └───────────┘  │      │      │              │
                │              ┌───▼┐  ┌──▼─┐  ┌─▼──┐  ┌─────┐ │
                │              │ α  │  │ β  │  │ γ  │  │ MCP │ │
                │              └────┘  └────┘  └────┘  │Agent│ │
                │                                      └──┬──┘ │
                │                                    ┌────▼───┐│
                │                                    │Podman/ ││
                │                                    │etc.    ││
                │                                    └────────┘│
                └──────────────────────────────────────────────┘

Outside clients connect to MC-Proxy, which inspects the TLS SNI hostname and routes to the correct service (α, β, γ) — either as a raw TCP relay or via TLS-terminating HTTP/2 reverse proxy, per-route. The MCP Agent on each node receives C2 commands from the MCP Master (running on the operator's workstation) and manages local container lifecycle via the container runtime. Core infrastructure services (MCIAS, Metacrypt, MCR) run on nodes like any other workload.

The Network Model

Metacircular nodes are connected via an encrypted overlay network — a self-managed WireGuard mesh, Tailscale, or similar. No component has a hard dependency on a specific overlay implementation; the platform requires only that nodes can reach each other over encrypted links.

                  Public Internet
                        │
              ┌─────────▼──────────┐
              │  Edge MC-Proxy     │    VPS (public IP)
              │  :443              │
              └─────────┬──────────┘
                        │ PROXY protocol v2
              ┌─────────▼──────────────────────────────────┐
              │         Encrypted Overlay (e.g. WireGuard) │
              │                                            │
  ┌───────────┴──┐   ┌──────────┐   ┌──────────┐   ┌──────┴─────┐
  │ Origin       │   │  Node B  │   │  Node C  │   │  Operator  │
  │ MC-Proxy     │   │  (MCP    │   │          │   │  Workstation│
  │ + services   │   │  agent)  │   │  (MCP    │   │  (MCP      │
  │ (MCP agent)  │   │          │   │  agent)  │   │  Master)   │
  └──────────────┘   └──────────┘   └──────────┘   └────────────┘

External traffic flows from the internet through an edge MC-Proxy (on a public VPS), which forwards via PROXY protocol over the overlay to an origin MC-Proxy on the private network. The overlay preserves the real client IP across the hop.

Internal traffic (MCP C2, inter-service communication, MCNS DNS) flows directly over the overlay. MCP's C2 channel is gRPC over whatever link exists between master and agent — the overlay provides the transport.

The overlay network itself is a candidate for future Metacircular management (a self-hosted WireGuard mesh manager), consistent with the sovereignty principle of minimizing third-party dependencies.


System Catalog

MCIAS — Metacircular Identity and Access Service

MCIAS is the root of trust for the entire platform. Every other service delegates authentication to it; no service maintains its own user database.

What it provides:

  • Authentication. Username/password with optional TOTP and FIDO2/WebAuthn. Credentials are verified by MCIAS and a signed JWT bearer token is returned. Services validate tokens by calling back to MCIAS (cached 30s by SHA-256 of the token).

  • Role-based access. Three roles — admin (MCIAS account management, policy changes, zone mutations — reserved for human operators), user (policy-governed), guest (service-dependent restrictions, rejected by MCP agent). Admin detection comes solely from the MCIAS admin role; services never promote users locally. Routine operations (deploy, push, DNS updates) do not require admin.

  • Account types. Human accounts (interactive users) and system accounts (service-to-service). Both produce standard JWTs validated the same way. System accounts carry no roles — their authorization is handled by each service's policy engine (Metacrypt policies, MCNS name-scoped access, MCR default policies). System account tokens are long-lived (365-day default) and do not require passwords for issuance.

  • Login policy. Priority-based ACL rules that control who can log into which services. Rules can target roles, account types, service names, and tags. This allows operators to restrict access per-service (e.g., deny guest from services tagged env:restricted) without changing the services themselves.

  • Token lifecycle. Issuance, validation, renewal, and revocation. Ed25519-signed JWTs. Short expiry with renewal support.

How other services integrate: Every service includes an [mcias] config section with the MCIAS server URL, a service_name, and optional tags. At login time, the service forwards credentials to MCIAS along with this context. MCIAS evaluates login policy against the service context, verifies credentials, and returns a bearer token. The MCIAS Go client library (git.wntrmute.dev/mc/mcias/clients/go) handles this flow.

Status: Implemented. v1.9.0. Feature-complete with active refinement (WebAuthn/FIDO2 passkeys, TOTP 2FA, service-context login policies).


Metacrypt — Cryptographic Service Engine

Metacrypt provides cryptographic resources to the platform through a modular engine architecture, backed by an encrypted storage barrier inspired by HashiCorp Vault.

What it provides:

  • PKI / Certificate Authority. X.509 certificate issuance. Root and intermediate CAs, certificate signing, CRL management, ACME protocol support. This is how every service in the platform gets its TLS certificates.

  • SSH CA. SSH certificate signing for host and user certificates, replacing static SSH key management. Signing profiles, Key Revocation List (KRL) support, gRPC/REST APIs, and web UI.

  • Transit encryption. Encrypt and decrypt data without exposing keys to the caller. Symmetric encryption with versioned key management, signing, and HMAC operations. Envelope encryption for services that need to protect data at rest without managing their own key material.

  • User-to-user encryption. End-to-end encryption between users, with key management handled by Metacrypt. ECDH key exchange with AES-256-GCM encryption.

Seal/unseal model: Metacrypt starts sealed. An operator provides a password which derives (via Argon2id) a key-wrapping key, which decrypts the master encryption key (MEK), which in turn unwraps per-engine data encryption keys (DEKs). Each engine mount gets its own DEK, limiting blast radius — compromise of one engine's key does not expose another's data.

Password → Argon2id → KWK → [decrypt] → MEK → [unwrap] → per-engine DEKs

Engine architecture: Engines are pluggable providers that register with a central registry. Each engine mount has a type, a name, its own DEK, and its own configuration. The engine interface handles initialization, seal/unseal lifecycle, and request routing. New engine types plug in without modifying the core.

Policy: Fine-grained ACL rules control which users can perform which operations on which engine mounts. Priority-based evaluation, default deny, admin bypass. See Metacrypt's POLICY.md for the full model.

Status: Implemented. v1.3.1. All four engine types complete — CA (with ACME support), SSH CA, transit encryption, and user-to-user encryption.


MCR — Metacircular Container Registry

MCR is an OCI Distribution Spec-compliant container registry. It stores and serves the container images that MCP deploys across the platform.

What it provides:

  • OCI-compliant image storage. Pull, push, tag, and delete container images. Content-addressed by SHA-256 digest. Manifests and tags in SQLite, blobs on the filesystem.

  • Authenticated access. No anonymous access. MCR uses the OCI token authentication flow: clients hit /v2/, receive a 401 with a token endpoint, authenticate via MCIAS, and use the returned JWT for subsequent requests. The token endpoint accepts both username/password (standard login) and pre-existing MCIAS JWTs as passwords (personal-access-token pattern), enabling non-interactive push/pull for system accounts and CI.

  • Policy-controlled push/pull. Fine-grained ACL rules govern who can push to or pull from which repositories. Integrated with MCIAS roles.

  • Garbage collection. Unreferenced blobs are cleaned up via the admin CLI (mcrctl).

How it fits in: MCP directs nodes to pull images from MCR. When a workload is scheduled, MCP tells the node's agent which image to pull and where to get it. MCR sits behind an MC-Proxy instance for TLS routing.

Status: Implemented. v1.2.1. All implementation phases complete.


MC-Proxy — TLS Proxy and Router

MC-Proxy is the ingress layer for every MC Node. It accepts TLS connections, extracts the SNI hostname, and routes to the correct backend. Each route is independently configured as either L4 passthrough (raw TCP relay, no TLS termination) or L7 terminating (terminates TLS, reverse proxies HTTP/2 and HTTP/1.1 including gRPC). Both modes coexist on the same listener.

What it provides:

  • SNI-based routing. A route table maps hostnames to backend addresses. Exact match, case-insensitive. Multiple listeners can bind different ports, each with its own route table, all sharing the same global firewall.

  • Dual-mode proxying. L4 routes relay raw TCP — backends see the original TLS handshake, MC-Proxy adds nothing. L7 routes terminate TLS at the proxy and reverse proxy HTTP/2 to backends (plaintext h2c or re-encrypted TLS), with header injection (X-Forwarded-For, X-Real-IP), gRPC streaming support, and trailer forwarding.

  • Global firewall. Every connection is evaluated before routing: per-IP rate limiting, IP/CIDR blocks, and GeoIP country blocks (MaxMind GeoLite2). Blocked connections get a TCP RST — no error messages, no TLS alerts.

  • PROXY protocol. Listeners can accept v1/v2 headers from upstream proxies to learn the real client IP. Routes can send v2 headers to downstream backends. This enables multi-hop deployments — a public edge MC-Proxy on a VPS forwarding over the encrypted overlay to a private origin MC-Proxy — while preserving the real client IP for firewall evaluation and logging.

  • Runtime management. Routes and firewall rules can be updated at runtime via a gRPC admin API on a Unix domain socket (filesystem permissions for access control, no network exposure). State is persisted to SQLite with write-through semantics.

How it fits in: MC-Proxy is pre-auth infrastructure. It sits in front of everything on a node. Outside clients connect to MC-Proxy on well-known ports (443, 8443, etc.) and MC-Proxy routes to the correct backend based on the hostname the client is trying to reach. A typical production deployment uses two instances — an edge proxy on a public VPS and an origin proxy on the private network, connected over the overlay with PROXY protocol preserving client IPs across the hop.

Status: Implemented. v1.2.1. Route state persisted in SQLite with write-through semantics. gRPC admin API with idempotent AddRoute for runtime route management.


MCNS — Metacircular Networking Service

MCNS provides DNS for the platform. It manages two internal zones and serves as the name resolution layer for the Metacircular network. Service discovery (which services run where) is owned by MCP; MCNS translates those assignments into DNS records.

What it provides:

  • Internal DNS. MCNS is authoritative for the internal zones of the Metacircular network. Three zones serve different purposes:

    Zone Example Purpose
    *.metacircular.net metacrypt.metacircular.net External, public-facing. Managed outside MCNS (existing DNS). Points to edge MC-Proxy.
    *.mcp.metacircular.net vade.mcp.metacircular.net Node addresses. Maps node names to their network addresses (e.g. Tailscale IPs).
    *.svc.mcp.metacircular.net metacrypt.svc.mcp.metacircular.net Internal service addresses. Maps service names to the node and port where they currently run.

    The *.mcp.metacircular.net and *.svc.mcp.metacircular.net zones are managed by MCNS. The external *.metacircular.net zone is managed separately (existing DNS infrastructure) and is mostly static.

  • MCP integration. MCP pushes DNS record updates to MCNS after deploy and migrate operations. When MCP starts service α on node X, it calls the MCNS API to set α.svc.mcp.metacircular.net to X's address. Services and clients using internal DNS names automatically resolve to the right place without config changes.

  • Record management API. Authenticated via MCIAS with name-scoped authorization. Admin can manage all records and zones. The mcp-agent system account can create and delete any record. Other system accounts can only manage records matching their own name (e.g., system account mcq can manage mcq.svc.mcp.metacircular.net but not other records). Human users have read-only access to records. Zone mutations (create, update, delete zones) remain admin-only.

How it fits in: MCNS answers "what is the address of X?" MCP answers "where is service α running?" and pushes the answer to MCNS. This separation means services can use stable DNS names in their configs (e.g., mcias.svc.mcp.metacircular.net in [mcias] server_url) that survive migration without config changes.

Status: Implemented. v1.1.1. Custom Go DNS server deployed on rift, serving two authoritative zones (svc.mcp.metacircular.net and mcp.metacircular.net) plus upstream forwarding. REST + gRPC APIs with MCIAS auth and name-scoped system account authorization. Records stored in SQLite.


MCP — Metacircular Control Plane

MCP is the orchestrator. It manages what runs where across the platform. The deployment model is operator-driven: the user says "deploy service α" and MCP handles the rest. MCP Master runs on the operator's workstation; agents run on each managed node.

What it provides:

  • Service registry. MCP is the source of truth for what is running where. It tracks every service, which node it's on, and its current state. Other components that need to find a service (including MC-Proxy for route table updates) query MCP's registry.

  • Deploy. The operator says "deploy α". MCP checks if α is already running somewhere. If it is, MCP pulls the new container image on that node and restarts the service in place. If it isn't running, MCP selects a node (the operator can pin to a specific node but shouldn't have to), transfers the initial config, pulls the image from MCR, starts the container, and pushes a DNS update to MCNS (α.svc.mcp.metacircular.net → node address).

  • Undeploy. Full teardown of a service. Stops the container, removes MC-Proxy routes, deletes DNS records from MCNS, and cleans up the service registry entry. The inverse of deploy.

  • Migrate. Move a service from one node to another. MCP snapshots the service's /srv/<service>/ directory on the source node (as a tar.zst image), transfers it to the destination, extracts it, starts the service, stops it on the source, and updates MCNS so DNS points to the new location. The /srv/<service>/ convention makes this uniform across all services.

  • Data transfer. The C2 channel supports file-level operations between master and agents: copy or fetch individual files (push a config, pull a log), and transfer tar.zst archives for bulk snapshot/restore of service data directories. This is the foundation for both migration and backup.

  • Service snapshots. To snapshot /srv/<service>/, the agent runs VACUUM INTO to create a consistent database copy, then builds a tar.zst that includes the full directory but excludes live database files (*.db, *.db-wal, *.db-shm) and the backups/ directory. The temporary VACUUM INTO copy is injected into the archive as <service>.db. The result is a clean, minimal archive that extracts directly into a working service directory on the destination.

  • Container lifecycle. Start, stop, restart, and update containers on nodes. MCP Master issues commands; agents on each node execute them against the local container runtime (rootless Podman).

  • Master/agent architecture. MCP Master runs on the operator's machine. Agents run on every managed node, receiving C2 (command and control) from Master, reporting node status, and managing local workloads. The C2 channel is authenticated via MCIAS — any authenticated non-guest user or system account is accepted (admin role is not required for deploy operations). The master does not need to be always-on — agents keep running their workloads independently; the master is needed only to issue new commands.

  • System account automation. The agent uses an mcp-agent system account for all service-to-service communication: TLS cert provisioning (Metacrypt), DNS record management (MCNS), and container image pulls (MCR). Each service authorizes the agent through its own policy engine. Per-service system accounts (e.g., mcq) can be created for scoped self-management — a service account can only manage its own DNS records, not other services'.

  • Node management. Track which nodes are in the platform, their health, available resources, and running workloads.

  • Scheduling. When placing a new service, MCP selects a node based on available resources and any operator-specified constraints. The operator can override with an explicit node, but the default is MCP's choice.

How it fits in: MCP is the piece that ties everything together. MCIAS provides identity, Metacrypt provides certificates, MCR provides images, MCNS provides DNS, MC-Proxy provides ingress — MCP orchestrates all of it, owns the map of what is running where, and pushes updates to MCNS so DNS stays current. It is the system that makes the infrastructure metacircular: the control plane deploys and manages the very services it depends on.

Container-first design: All Metacircular services are built as containers (multi-stage Docker builds, Alpine runtime, non-root) specifically so that MCP can deploy them. The systemd unit files exist as a fallback and for bootstrap — the long-term deployment model is MCP-managed containers.

Status: Implemented. v0.7.6. Deployed on rift managing all platform containers. Route declarations with automatic port allocation ($PORT / $PORT_<NAME> env vars passed to containers). MC-Proxy route registration during deploy and stop. Automated TLS cert provisioning for L7 routes via Metacrypt CA (Phase C). Automated DNS registration in MCNS during deploy and stop (Phase D). Two components — mcp CLI (operator workstation) and mcp-agent (per-node daemon with SQLite registry, rootless Podman, monitoring with drift/flap detection). gRPC-only (no REST). 15 RPCs, 17+ CLI commands.


MCAT — MCIAS Login Policy Tester

MCAT is a lightweight diagnostic tool, not a core infrastructure component. It presents a web login form, forwards credentials to MCIAS with a configurable service_name and tags, and shows whether the login was accepted or denied by policy. This lets operators verify that login policy rules behave as expected without touching the target service.

Status: Implemented.


Bootstrap Sequence

Bringing up a Metacircular platform from scratch requires careful ordering because of the circular dependencies — the infrastructure manages itself, but must exist before it can do so. The key challenge is that nearly every service needs TLS certificates (from Metacrypt) and authentication (from MCIAS), but those services themselves need to be running first.

During bootstrap, all services run as systemd units on a single bootstrap node. MCP takes over lifecycle management as the final step.

Prerequisites

Before any service starts, the operator needs:

  • The bootstrap node — a machine (VPS, homelab server, etc.) with the overlay network configured and reachable.
  • Seed PKI — MCIAS and Metacrypt need TLS certs to start, but Metacrypt isn't running yet to issue them. The root CA is generated manually using github.com/kisom/cert and stored in the ca/ directory in the workspace. Initial service certificates are issued from this root. The root CA is then imported into Metacrypt once it's running, so Metacrypt becomes the authoritative CA for the platform going forward.
  • TOML config files — each service needs its config in /srv/<service>/. During bootstrap these are written manually. Later, MCP handles config distribution.

Startup Order

Phase 0: Seed PKI
  Operator creates or obtains initial TLS certificates for MCIAS
  and Metacrypt. Places them in /srv/mcias/certs/ and
  /srv/metacrypt/certs/.

Phase 1: Identity
  ┌──────────────────────────────────────────────────────┐
  │ MCIAS starts (systemd)                               │
  │  - No dependencies on other Metacircular services    │
  │  - Uses seed TLS certificates                        │
  │  - Operator creates initial admin account             │
  │  - Operator creates system accounts for other services│
  └──────────────────────────────────────────────────────┘

Phase 2: Cryptographic Services
  ┌──────────────────────────────────────────────────────┐
  │ Metacrypt starts (systemd)                           │
  │  - Authenticates against MCIAS                       │
  │  - Uses seed TLS certificates initially              │
  │  - Operator initializes and unseals                  │
  │  - Operator creates CA engine, imports root CA from  │
  │    ca/, creates issuers                              │
  │  - Can now issue certificates for all other services │
  │  - Reissue MCIAS and Metacrypt certs from own CA     │
  │    (replace seed certs with Metacrypt-issued certs)  │
  └──────────────────────────────────────────────────────┘

Phase 3: Ingress
  ┌──────────────────────────────────────────────────────┐
  │ MC-Proxy starts (systemd)                            │
  │  - Static route table from TOML config               │
  │  - Routes external traffic to MCIAS, Metacrypt       │
  │  - No MCIAS auth (pre-auth infrastructure)           │
  │  - TLS certs for L7 routes from Metacrypt            │
  └──────────────────────────────────────────────────────┘

Phase 4: Container Registry
  ┌──────────────────────────────────────────────────────┐
  │ MCR starts (systemd)                                 │
  │  - Authenticates against MCIAS                       │
  │  - TLS certificates from Metacrypt                   │
  │  - Operator pushes container images for all services │
  │    (including MCIAS, Metacrypt, MC-Proxy themselves) │
  └──────────────────────────────────────────────────────┘

Phase 5: DNS
  ┌──────────────────────────────────────────────────────┐
  │ MCNS starts (systemd)                                │
  │  - Authenticates against MCIAS                       │
  │  - Operator configures initial DNS records           │
  │    (node addresses, service names)                   │
  └──────────────────────────────────────────────────────┘

Phase 6: Control Plane
  ┌──────────────────────────────────────────────────────┐
  │ MCP Agent starts on bootstrap node (systemd)         │
  │ MCP Master starts on operator workstation            │
  │  - Authenticates against MCIAS                       │
  │  - Master registers the bootstrap node               │
  │  - Master imports running services into its registry │
  │  - From here, MCP owns the service map               │
  │  - Services can be redeployed as MCP-managed         │
  │    containers (replacing the systemd units)          │
  └──────────────────────────────────────────────────────┘

The Seed Certificate Problem

The circular dependency between MCIAS, Metacrypt, and TLS is resolved by bootstrapping with a manually generated root CA:

  1. The operator generates a root CA using github.com/kisom/cert. This root and initial service certificates live in the ca/ directory.
  2. MCIAS and Metacrypt start with certificates issued from this external root.
  3. Metacrypt comes up. The operator imports the root CA into Metacrypt's CA engine, making Metacrypt the authoritative issuer under the same root.
  4. Metacrypt can now issue and renew certificates for all services. The ca/ directory remains as the offline backup of the root material.

This is a one-time process. The root CA is generated once, imported once, and from that point forward Metacrypt is the sole CA. MCP handles certificate provisioning for all services.

Adding a New Node

Once the platform is bootstrapped, adding a node is straightforward:

  1. Provision the machine and connect it to the overlay network.
  2. Install the MCP agent binary.
  3. Configure the agent with the MCP Master address and MCIAS credentials (system account for the node).
  4. Start the agent. It authenticates with MCIAS, connects to Master, and reports as available.
  5. The operator deploys workloads to it via MCP. MCP handles image pulls, config transfer, certificate provisioning, and DNS updates.

Disaster Recovery

If the bootstrap node is lost, recovery follows the same sequence as initial bootstrap — but with data restored from backups:

  1. Start MCIAS on a new node, restore its database from the most recent VACUUM INTO snapshot.
  2. Start Metacrypt, restore its database. Unseal with the original password. The entire key hierarchy and all issued certificates are recovered.
  3. Bring up the remaining services in order, restoring their databases.
  4. Start MCP, which rebuilds its registry from the running services.
  5. Update DNS (MCNS or external) to point to the new node.

Every service's snapshot CLI command and daily backup timer exist specifically to make this recovery possible. The /srv/<service>/ convention means each service's entire state is a single directory to back up and restore.


Certificate Lifecycle

Every service in the platform requires TLS certificates, and Metacrypt is the CA that issues them. This section describes how certificates flow from Metacrypt to services, how they are renewed, and how the pieces fit together.

PKI Structure

Metacrypt implements a two-tier PKI:

Root CA (self-signed, generated at engine initialization)
  ├── Issuer "infra"    (intermediate CA for infrastructure services)
  ├── Issuer "services" (intermediate CA for application services)
  └── Issuer "clients"  (intermediate CA for client certificates)

The root CA signs intermediate CAs ("issuers"), which in turn sign leaf certificates. Each issuer is scoped to a purpose. The root CA certificate is the trust anchor — services and clients need it (or the relevant issuer chain) to verify certificates presented by other services.

ACME Protocol

Metacrypt implements an ACME server (RFC 8555) with External Account Binding (EAB). This is the same protocol used by Let's Encrypt, meaning any standard ACME client can obtain certificates from Metacrypt.

The ACME flow:

  1. Client authenticates with MCIAS and requests EAB credentials from Metacrypt.
  2. Client registers an ACME account using the EAB credentials.
  3. Client places a certificate order (one or more domain names).
  4. Metacrypt creates authorization challenges (HTTP-01 and DNS-01 supported).
  5. Client fulfills the challenge (places a file for HTTP-01, or a DNS TXT record for DNS-01).
  6. Metacrypt validates the challenge and issues the certificate.
  7. Client downloads the certificate chain and private key.

A Go client library (metacrypt/clients/go) wraps this entire flow: MCIAS login, EAB fetch, account registration, challenge fulfillment, and certificate download. Services that integrate this library can obtain and renew certificates programmatically.

How Services Get Certificates Today

For services deployed via MCP with L7 routes, certificates are provisioned automatically during deploy — MCP uses the Metacrypt ACME client library to obtain certs and transfers them to the node. For other services and during bootstrap, certificates are provisioned through Metacrypt's REST API or web UI and placed into each service's /srv/<service>/certs/ directory manually.

How MCP Automates Certificates

MCP automates certificate provisioning for deploy workflows, with renewal and migration automation planned:

  • Initial deploy. When MCP deploys a new service, it provisions a certificate from Metacrypt (via the ACME client library), transfers the cert and key to the node as part of the config push to /srv/<service>/certs/, and starts the service with valid TLS material. For L7 routes, MCP also provisions a TLS certificate for MC-Proxy's termination endpoint.

  • Renewal. MCP knows what services are running and when their certificates expire. It can renew certificates before expiry by re-running the ACME flow (or calling Metacrypt's renew operation) and pushing updated files to the node. The service restarts with the new certificate.

  • Migration. When MCP migrates a service, the certificate in /srv/<service>/certs/ moves with the tar.zst snapshot. If the service's hostname changes (new node, new DNS name), MCP provisions a new certificate for the new name.

  • MC-Proxy L7 routes. MC-Proxy's L7 mode requires certificate/key pairs for TLS termination. MCP provisions these from Metacrypt during deploy and pushes them to the node alongside the route registration.

Trust Distribution

Every service and client that validates TLS certificates needs the root CA certificate (or the relevant issuer chain). Metacrypt serves these publicly without authentication:

  • GET /v1/pki/{mount}/ca — root CA certificate (PEM)
  • GET /v1/pki/{mount}/ca/chain — full chain: issuer + root (PEM)
  • GET /v1/pki/{mount}/issuer/{name} — specific issuer certificate (PEM)

During bootstrap, the root CA cert is distributed manually (or via the ca/ directory in the workspace). Once MCP is running, it can distribute the CA cert as part of service deployment. Services reference the CA cert path in their [mcias] config section (ca_cert) to verify connections to MCIAS and other services.


End-to-End Deploy Workflow

This traces a deployment from code change to running service, showing how every component participates. The example deploys a new version of service α that is already running on Node B.

1. Build and Push

The operator builds a new container image and pushes it to MCR:

Operator workstation (vade)
  $ docker build -t mcr.metacircular.net/α:v1.2.0 .
  $ docker push mcr.metacircular.net/α:v1.2.0
         │
         ▼
    MC-Proxy (edge) ──overlay──→ MC-Proxy (origin) ──→ MCR
                                                        │
                                                   Authenticates
                                                   via MCIAS
                                                        │
                                                   Policy check:
                                                   can this user
                                                   push to α?
                                                        │
                                                   Image stored
                                                   (blobs + manifest)

The docker push goes through MC-Proxy (SNI routing to MCR), authenticates via the OCI token flow (which delegates to MCIAS), and is checked against MCR's push policy. The image is stored content-addressed in MCR.

2. Deploy

The operator tells MCP to deploy:

Operator workstation (vade)
  $ mcp deploy α                  # or: mcp deploy α --image v1.2.0
         │
    MCP Master
         │
         ├── Registry lookup: α is running on Node B
         │
         ├── C2 (gRPC over overlay) to Node B agent:
         │     "pull mcr.metacircular.net/α:v1.2.0 and restart"
         │
         ▼
    MCP Agent (Node B)
         │
         ├── Pull image from MCR
         │     (authenticates via MCIAS, same OCI flow)
         │
         ├── Stop running container
         │
         ├── Start new container from updated image
         │     - Mounts /srv/α/ (config, database, certs all persist)
         │     - Service starts, authenticates to MCIAS, resumes operation
         │
         └── Report status back to Master

Since α is already running on Node B, this is an in-place update. The /srv/α/ directory is untouched — config, database, and certificates persist across the container restart.

3. First-Time Deploy

If α has never been deployed, MCP does more work:

Operator workstation (vade)
  $ mcp deploy α --config α.toml
         │
    MCP Master
         │
         ├── Registry lookup: α is not running anywhere
         │
         ├── Scheduling: select Node C (best fit)
         │
         ├── Port assignment: allocate a free host port for each
         │     declared route (passed as $PORT / $PORT_<NAME> env vars)
         │
         ├── Provision TLS certificate from Metacrypt CA
         │     (ACME client library) for the service
         │     — for L7 routes, also provision a cert for MC-Proxy
         │       TLS termination
         │
         ├── C2 to Node C agent:
         │     1. Create /srv/α/ directory structure
         │     2. Transfer config file (α.toml → /srv/α/α.toml)
         │     3. Transfer TLS cert+key → /srv/α/certs/
         │     4. Transfer root CA cert → /srv/α/certs/ca.pem
         │     5. Pull image from MCR
         │     6. Start container with $PORT / $PORT_<NAME> env vars
         │
         ├── Register routes with MC-Proxy
         │     (gRPC AddRoute for each declared route)
         │
         ├── Update service registry: α → Node C
         │
         └── Push DNS update to MCNS:
               α.svc.mcp.metacircular.net → Node C address

4. Migration

Moving α from Node B to Node C:

Operator workstation (vade)
  $ mcp migrate α --to node-c     # or let MCP choose the destination
         │
    MCP Master
         │
         ├── C2 to Node B agent:
         │     1. Stop α container
         │     2. Snapshot /srv/α/ → tar.zst archive
         │     3. Transfer tar.zst to Master (or directly to Node C)
         │
         ├── C2 to Node C agent:
         │     1. Receive tar.zst archive
         │     2. Extract to /srv/α/
         │     3. Pull container image from MCR (if not cached)
         │     4. Start container
         │     5. Report status
         │
         ├── Update service registry: α → Node C
         │
         ├── Push DNS update to MCNS:
         │     α.svc.mcp.metacircular.net → Node C address
         │
         └── (If α had external ingress) update MC-Proxy route
              or rely on DNS change

What Each Component Does

Step MCIAS Metacrypt MCR MC-Proxy MCP MCNS
Build/push image Authenticates push Stores image, enforces push policy Routes traffic to MCR
Deploy (update) Authenticates pull, authenticates service on start Serves image to agent Routes traffic to service Coordinates: registry lookup, C2 to agent
Deploy (new) Authenticates pull, authenticates service on start Issues TLS certificate Serves image to agent Routes traffic to service (if external) Coordinates: scheduling, cert provisioning, config transfer, DNS update Updates DNS records
Migrate Authenticates service on new node Issues new cert (if hostname changes) Serves image (if not cached) Routes traffic to new location Coordinates: snapshot, transfer, DNS update Updates DNS records
Steady state Validates tokens for every authenticated request Serves CA certs publicly, renews certs Serves image pulls Routes all external traffic Tracks service health, holds registry Serves DNS queries

Future Ideas

Components and capabilities that may be worth building but have no immediate timeline. Listed here to capture the thinking; none are committed.

Observability — Log Collection and Health Monitoring

Every service already produces structured logs (log/slog) and exposes health checks (gRPC Health.Check or REST status endpoints). What's missing is aggregation — today, debugging a cross-service issue means SSH'ing into each node and reading local logs.

A collector could:

  • Gather structured logs from services on each node and forward them to a central store.
  • Periodically health-check local services and report status.
  • Feed health data into MCP so it can make informed decisions (restart unhealthy services, avoid scheduling on degraded nodes, alert the operator).

This might be a standalone service or an MCP agent capability, depending on weight. If it's just "tail logs and hit health endpoints," it fits in the agent. If it grows to include indexing, querying, retention policies, and alerting rules, it's its own service.

Object Store

The platform has structured storage (SQLite), blob storage scoped to container images (MCR), and encrypted key-value storage (Metacrypt's barrier). It does not have general-purpose object/blob storage.

Potential uses:

  • Centralized backups. Service snapshots currently live on each node in /srv/<service>/backups/. A central object store gives MCP somewhere to push tar.zst snapshots for offsite retention.
  • Artifact storage. Build outputs, large files, anything that doesn't fit in a database row.
  • Data sharing between services. Files that need to move between services outside the MCP C2 channel.

Prior art: Nebula, a content-addressable data store with capability-based security (SHA-256 addressed blobs, UUID entries for versioning, proxy references for revocable access). Prototyped in multiple languages. The capability model is interesting but may be more sophistication than the platform needs — a simpler authenticated blob store with MCIAS integration might suffice.

Overlay Network Management

The platform currently relies on an external overlay network (WireGuard, Tailscale, or similar) for node-to-node connectivity. A self-hosted WireGuard mesh manager would bring the overlay under Metacircular's control:

  • Automate key exchange and peer configuration when MCP adds a node.
  • Manage IP allocation within the mesh (potentially absorbing part of MCNS's scope).
  • Remove the dependency on Tailscale's coordination servers.

This is a natural extension of the sovereignty principle but is low priority while the mesh is small enough to manage by hand.

Hypervisor / Isolation

A deeper exploration of environment isolation, message-passing between services, and access mediation at a level below containers. Prior art: hypervisor concept. The current platform achieves these goals through containers + MCIAS + policy engines. A hypervisor layer would push isolation down to the OS level — interesting for security but significant in scope. More relevant if the platform ever moves beyond containers to VM-based workloads.

Prior Art: SYSGOV

SYSGOV was an earlier exploration of system management in Lisp, with SYSPLAN (desired state enforcement) and SYSMON (service management). Many of its research questions — C2 communication, service discovery, secure config distribution, failure handling — are directly addressed by MCP's design. MCP is the spiritual successor, reimplemented in Go with the benefit of the Metacircular platform underneath it.