Initial import.

2026-03-25 22:25:44 -07:00
commit 168ceb2c07
6 changed files with 57369 additions and 0 deletions

docs/metacircular.md Normal file

@@ -0,0 +1,927 @@
# Metacircular Infrastructure
## Background
Metacircular Dynamics is a personal infrastructure platform. The name comes
from the tradition of metacircular evaluators in Lisp — a system defined in
terms of itself — by way of SICP and Common Lisp projects that preceded this
work. The infrastructure is metacircular in the same sense: the platform
manages, secures, and hosts its own services.
The goal is sovereign infrastructure. Every component is self-hosted, every
dependency is controlled, and the entire stack is operable by one person. There
are no cloud provider dependencies, no third-party auth providers, no external
databases. When a Metacircular node boots, it connects to Metacircular services
for identity, certificates, container images, and workload scheduling.
All services are written in Go and follow a shared set of engineering standards
(see `engineering-standards.md`). The platform is designed for a small number of
machines — a personal homelab or a handful of VPSes — not for hyperscale.
## Philosophy
**Sovereignty.** You own the whole stack. Identity, certificates, secrets,
container images, DNS, networking — all self-hosted. No SaaS dependency means
no vendor lock-in, no surprise deprecations, and no trust delegation to third
parties.
**Simplicity over sophistication.** SQLite over Postgres. Stdlib `testing` over
test frameworks. Pure Go over CGo. htmx over React. Single-binary deployments
over microservice orchestrators. The right tool is the simplest one that solves
the problem without creating a new one.
**Consistency as leverage.** Every service follows identical patterns: the same
directory layout, the same Makefile targets, the same config format, the same
auth integration, the same deployment model. Knowledge of one service transfers
instantly to all others. A new service can be stood up by copying the skeleton.
**Security as structure.** Security is not a feature bolted on after the fact.
Default deny is the starting posture. TLS 1.3 is the minimum, not a goal.
Interceptor maps make "forgot to add auth" a visible, reviewable omission
rather than a silent runtime failure. Secrets are encrypted at rest behind a
seal/unseal barrier. Every service delegates identity to a single root of
trust.
**Design before code.** The architecture document is written before
implementation begins. It is the spec, not the afterthought. When the code and
the spec disagree, one of them has a bug.
## High-Level Overview
Metacircular infrastructure is built from six core components, plus a shared
standard library (**MCDSL**) that provides the common patterns all services
depend on (auth integration, database setup, config loading, TLS server
bootstrapping, CSRF, snapshots):
- **MCIAS** — Identity and access. The root of trust for all other services.
Handles authentication, token issuance, role management, and login policy
enforcement. Every other component delegates auth here.
- **Metacrypt** — Cryptographic services. PKI/CA, SSH CA, transit encryption,
and encrypted secret storage behind a Vault-inspired seal/unseal barrier.
Issues the TLS certificates that every other service depends on.
- **MCR** — Container registry. OCI-compliant image storage. MCP directs nodes
to pull images from MCR. Policy-controlled push/pull integrated with MCIAS.
- **MCNS** — Networking. DNS and address management for the platform.
- **MCP** — Control plane. The orchestrator. A master/agent architecture that
manages workload scheduling, container lifecycle, service registry, data
transfer, and node state across the platform.
- **MC-Proxy** — Node ingress. A TLS proxy and router that sits on every node,
accepts outside connections, and routes them to the correct service — either
as raw TCP passthrough or via TLS-terminating HTTP/2 reverse proxy.
These components form a dependency graph rooted at MCIAS:
```
MCIAS (standalone — the root of trust)
├── Metacrypt (uses MCIAS for auth; provides certs to all services)
├── MCR (uses MCIAS for auth; stores images pulled by MCP)
├── MCNS (uses MCIAS for auth; provides DNS for the platform)
├── MCP (uses MCIAS for auth; orchestrates everything; owns service registry)
└── MC-Proxy (pre-auth; routes traffic to services behind it)
```
### The Node Model
The unit of deployment is the **MC Node** — a machine (physical or virtual)
that participates in the Metacircular platform.
```
┌──────────────────┐ ┌──────────────┐
│ System / Core │ │ MCP │
│ Infrastructure │ │ Master │
│ (e.g. MCIAS) │ │ │
└────────┬─────────┘ └──────┬───────┘
│ │ C2
│ │
Outside ┌─────────────▼─────────────────────▼──────────┐
Client ────▶│ MC Node │
│ │
│ ┌───────────┐ │
│ │ MC-Proxy │──┬──────┬──────┐ │
│ └───────────┘ │ │ │ │
│ ┌───▼┐ ┌──▼─┐ ┌─▼──┐ ┌─────┐ │
│ │ α │ │ β │ │ γ │ │ MCP │ │
│ └────┘ └────┘ └────┘ │Agent│ │
│ └──┬──┘ │
│ ┌────▼───┐│
│ │Docker/ ││
│ │etc. ││
│ └────────┘│
└──────────────────────────────────────────────┘
```
Outside clients connect to **MC-Proxy**, which inspects the TLS SNI hostname
and routes to the correct service (α, β, γ) — either as a raw TCP relay or
via TLS-terminating HTTP/2 reverse proxy, per-route. The **MCP Agent** on each
node receives C2 commands from the **MCP Master** (running on the operator's
workstation) and manages local container lifecycle via the container runtime.
Core infrastructure services (MCIAS, Metacrypt, MCR) run on nodes like any
other workload.
### The Network Model
Metacircular nodes are connected via an **encrypted overlay network** — a
self-managed WireGuard mesh, Tailscale, or similar. No component has a hard
dependency on a specific overlay implementation; the platform requires only
that nodes can reach each other over encrypted links.
```
Public Internet
┌─────────▼──────────┐
│ Edge MC-Proxy │ VPS (public IP)
│ :443 │
└─────────┬──────────┘
│ PROXY protocol v2
┌─────────▼──────────────────────────────────┐
│ Encrypted Overlay (e.g. WireGuard) │
│ │
┌───────────┴──┐ ┌──────────┐ ┌──────────┐ ┌──────┴─────┐
│ Origin │ │ Node B │ │ Node C │ │ Operator │
│ MC-Proxy │ │ (MCP │ │ │ │ Workstation│
│ + services │ │ agent) │ │ (MCP │ │ (MCP │
│ (MCP agent) │ │ │ │ agent) │ │ Master) │
└──────────────┘ └──────────┘ └──────────┘ └────────────┘
```
**External traffic** flows from the internet through an edge MC-Proxy (on a
public VPS), which forwards via PROXY protocol over the overlay to an origin
MC-Proxy on the private network. The PROXY protocol header preserves the real
client IP across the hop; the overlay only provides the encrypted transport.
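As an illustration of what travels across that hop, here is a minimal sketch of a PROXY protocol v2 header for a TCP-over-IPv4 connection. The byte layout follows the public PROXY protocol specification; the helper name `buildProxyV2` and the example addresses are hypothetical and not taken from MC-Proxy's code.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"net/netip"
)

// proxyV2Signature is the fixed 12-byte prefix of every PROXY protocol v2 header.
var proxyV2Signature = []byte{0x0D, 0x0A, 0x0D, 0x0A, 0x00, 0x0D, 0x0A, 0x51, 0x55, 0x49, 0x54, 0x0A}

// buildProxyV2 is a hypothetical helper that encodes a v2 header for a
// TCP-over-IPv4 connection. src and dst are "ip:port" strings.
func buildProxyV2(src, dst string) ([]byte, error) {
	s, err := netip.ParseAddrPort(src)
	if err != nil {
		return nil, err
	}
	d, err := netip.ParseAddrPort(dst)
	if err != nil {
		return nil, err
	}
	if !s.Addr().Is4() || !d.Addr().Is4() {
		return nil, errors.New("this sketch handles IPv4 only")
	}
	hdr := append([]byte{}, proxyV2Signature...)
	hdr = append(hdr, 0x21, 0x11)                // version 2 + PROXY command; AF_INET + STREAM
	hdr = binary.BigEndian.AppendUint16(hdr, 12) // bytes of address data that follow
	sa, da := s.Addr().As4(), d.Addr().As4()
	hdr = append(hdr, sa[:]...)
	hdr = append(hdr, da[:]...)
	hdr = binary.BigEndian.AppendUint16(hdr, s.Port())
	hdr = binary.BigEndian.AppendUint16(hdr, d.Port())
	return hdr, nil
}

func main() {
	hdr, err := buildProxyV2("203.0.113.7:51234", "10.0.0.2:443")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes: % x\n", len(hdr), hdr)
}
```

The edge proxy would write these bytes before relaying the client's stream, so the origin can apply firewall rules to the original source address.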
**Internal traffic** (MCP C2, inter-service communication, MCNS DNS) flows
directly over the overlay. MCP's C2 channel is gRPC over whatever link exists
between master and agent — the overlay provides the transport.
The overlay network itself is a candidate for future Metacircular management
(a self-hosted WireGuard mesh manager), consistent with the sovereignty
principle of minimizing third-party dependencies.
---
## System Catalog
### MCIAS — Metacircular Identity and Access Service
MCIAS is the root of trust for the entire platform. Every other service
delegates authentication to it; no service maintains its own user database.
**What it provides:**
- **Authentication.** Username/password with optional TOTP and FIDO2/WebAuthn.
Credentials are verified by MCIAS and a signed JWT bearer token is returned.
Services validate tokens by calling back to MCIAS; each validation result is
cached for 30 seconds, keyed by the SHA-256 hash of the token.
- **Role-based access.** Three roles — `admin` (full access, policy bypass),
`user` (policy-governed), `guest` (service-dependent restrictions). Admin
detection comes solely from the MCIAS `admin` role; services never promote
users locally.
- **Account types.** Human accounts (interactive users) and system accounts
(service-to-service). Both authenticate the same way; system accounts enable
automated workflows.
- **Login policy.** Priority-based ACL rules that control who can log into
which services. Rules can target roles, account types, service names, and
tags. This allows operators to restrict access per-service (e.g., deny
`guest` from services tagged `env:restricted`) without changing the
services themselves.
- **Token lifecycle.** Issuance, validation, renewal, and revocation.
Ed25519-signed JWTs. Short expiry with renewal support.
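The token validation cache described under Authentication might look roughly like the sketch below. It assumes a callback standing in for the round trip to MCIAS; the type and function names (`tokenCache`, `validate`, `fakeClock`) are hypothetical and are not the MCIAS client library's API.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
	"time"
)

type cacheEntry struct {
	valid   bool
	expires time.Time
}

// tokenCache remembers validation results keyed by the SHA-256 of the token,
// so the raw token is never used as a map key.
type tokenCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	now     func() time.Time
	entries map[[sha256.Size]byte]cacheEntry
}

func newTokenCache(ttlSeconds int, now func() time.Time) *tokenCache {
	return &tokenCache{
		ttl:     time.Duration(ttlSeconds) * time.Second,
		now:     now,
		entries: make(map[[sha256.Size]byte]cacheEntry),
	}
}

// validate returns the cached result if it is still fresh; otherwise it calls
// remote (a stand-in for the MCIAS call) and caches the answer. For simplicity
// the lock is held across the remote call.
func (c *tokenCache) validate(token string, remote func(string) bool) bool {
	key := sha256.Sum256([]byte(token))
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[key]; ok && c.now().Before(e.expires) {
		return e.valid
	}
	valid := remote(token)
	c.entries[key] = cacheEntry{valid: valid, expires: c.now().Add(c.ttl)}
	return valid
}

// fakeClock is a controllable clock for exercising expiry.
type fakeClock struct{ t time.Time }

func (f *fakeClock) Now() time.Time      { return f.t }
func (f *fakeClock) Advance(seconds int) { f.t = f.t.Add(time.Duration(seconds) * time.Second) }

func main() {
	cache := newTokenCache(30, time.Now)
	remote := func(string) bool { fmt.Println("calling MCIAS"); return true }
	fmt.Println(cache.validate("example-token", remote))
	fmt.Println(cache.validate("example-token", remote)) // served from cache
}
```

One consequence of any such cache is that a revoked token can still be accepted for up to the cache lifetime, which is presumably why the window is kept short.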
**How other services integrate:** Every service includes an `[mcias]` config
section with the MCIAS server URL, a `service_name`, and optional `tags`. At
login time, the service forwards credentials to MCIAS along with this context.
MCIAS evaluates login policy against the service context, verifies credentials,
and returns a bearer token. The MCIAS Go client library
(`git.wntrmute.dev/kyle/mcias/clients/go`) handles this flow.
**Status:** Implemented. v1.0.0 complete.
---
### Metacrypt — Cryptographic Service Engine
Metacrypt provides cryptographic resources to the platform through a modular
engine architecture, backed by an encrypted storage barrier inspired by
HashiCorp Vault.
**What it provides:**
- **PKI / Certificate Authority.** X.509 certificate issuance. Root and
intermediate CAs, certificate signing, CRL management, ACME protocol
support. This is how every service in the platform gets its TLS
certificates.
- **SSH CA.** (Planned.) SSH certificate signing for host and user
certificates, replacing static SSH key management.
- **Transit encryption.** (Planned.) Encrypt and decrypt data without exposing
keys to the caller. Envelope encryption for services that need to protect
data at rest without managing their own key material.
- **User-to-user encryption.** (Planned.) End-to-end encryption between users,
with key management handled by Metacrypt.
**Seal/unseal model:** Metacrypt starts sealed. An operator provides a password
which derives (via Argon2id) a key-wrapping key, which decrypts the master
encryption key (MEK), which in turn unwraps per-engine data encryption keys
(DEKs). Each engine mount gets its own DEK, limiting blast radius — compromise
of one engine's key does not expose another's data.
```
Password → Argon2id → KWK → [decrypt] → MEK → [unwrap] → per-engine DEKs
```
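The unwrap chain can be sketched with the standard library alone. This example assumes AES-256-GCM for both wrapping layers, which the text above does not specify, and it takes the key-wrapping key as an input because Argon2id lives outside the standard library (in `golang.org/x/crypto/argon2`). The names `sealKey` and `openKey` are hypothetical.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// sealKey encrypts plaintext under key with AES-GCM and prepends the nonce.
func sealKey(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// openKey reverses sealKey; it fails if the key is wrong or the blob was altered.
func openKey(key, blob []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(blob) < gcm.NonceSize() {
		return nil, errors.New("wrapped key too short")
	}
	nonce, ct := blob[:gcm.NonceSize()], blob[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func randomKey() []byte {
	k := make([]byte, 32)
	if _, err := rand.Read(k); err != nil {
		panic(err)
	}
	return k
}

func main() {
	kwk := randomKey() // stand-in for the Argon2id output derived from the password
	mek, dek := randomKey(), randomKey()

	wrappedMEK, _ := sealKey(kwk, mek) // stored at rest
	wrappedDEK, _ := sealKey(mek, dek) // one per engine mount

	// Unseal: KWK -> MEK -> per-engine DEK.
	gotMEK, err := openKey(kwk, wrappedMEK)
	if err != nil {
		panic(err)
	}
	gotDEK, err := openKey(gotMEK, wrappedDEK)
	if err != nil {
		panic(err)
	}
	fmt.Println("DEK recovered:", string(gotDEK) == string(dek))
}
```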
**Engine architecture:** Engines are pluggable providers that register with a
central registry. Each engine mount has a type, a name, its own DEK, and its
own configuration. The engine interface handles initialization, seal/unseal
lifecycle, and request routing. New engine types plug in without modifying the
core.
**Policy:** Fine-grained ACL rules control which users can perform which
operations on which engine mounts. Priority-based evaluation, default deny,
admin bypass. See Metacrypt's `POLICY.md` for the full model.
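A minimal sketch of priority-based evaluation with default deny and admin bypass follows. The rule fields and the convention that a lower number means higher priority are assumptions made for illustration; `POLICY.md` remains the reference for the real model.

```go
package main

import (
	"fmt"
	"sort"
)

// rule is a hypothetical, simplified policy rule.
type rule struct {
	priority int  // lower value is evaluated first (assumption)
	allow    bool // effect when the rule matches
	role     string
	mount    string
	op       string
}

// evaluate returns true if the request is permitted.
func evaluate(rules []rule, roles []string, mount, op string) bool {
	has := make(map[string]bool, len(roles))
	for _, r := range roles {
		has[r] = true
	}
	if has["admin"] {
		return true // admin bypass
	}
	sorted := append([]rule(nil), rules...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].priority < sorted[j].priority })
	for _, r := range sorted {
		if has[r.role] && r.mount == mount && r.op == op {
			return r.allow // first matching rule decides
		}
	}
	return false // default deny
}

func main() {
	rules := []rule{
		{priority: 10, allow: true, role: "user", mount: "pki", op: "issue"},
		{priority: 1, allow: false, role: "guest", mount: "pki", op: "issue"},
	}
	fmt.Println(evaluate(rules, []string{"user"}, "pki", "issue"))
	fmt.Println(evaluate(rules, []string{"user"}, "pki", "revoke"))
}
```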
**Status:** Implemented. CA engine complete with ACME support. SSH CA, transit,
and user-to-user engines planned.
---
### MCR — Metacircular Container Registry
MCR is an OCI Distribution Spec-compliant container registry. It stores and
serves the container images that MCP deploys across the platform.
**What it provides:**
- **OCI-compliant image storage.** Pull, push, tag, and delete container
images. Content-addressed by SHA-256 digest. Manifests and tags in SQLite,
blobs on the filesystem.
- **Authenticated access.** No anonymous access. MCR uses the OCI token
authentication flow: clients hit `/v2/`, receive a 401 with a token
endpoint, authenticate via MCIAS, and use the returned JWT for subsequent
requests.
- **Policy-controlled push/pull.** Fine-grained ACL rules govern who can push
to or pull from which repositories. Integrated with MCIAS roles.
- **Garbage collection.** Unreferenced blobs are cleaned up via the admin CLI
(`mcrctl`).
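Content addressing by SHA-256 digest can be illustrated in a few lines. The `sha256:<hex>` digest form comes from the OCI specifications; the on-disk layout in `blobPath` is a hypothetical example, not MCR's documented layout.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path"
	"strings"
)

// blobDigest returns the OCI-style digest string for a blob's content.
func blobDigest(content []byte) string {
	sum := sha256.Sum256(content)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// blobPath maps a digest to a file path under root, fanning out on the
// first two hex characters (hypothetical layout).
func blobPath(root, digest string) string {
	hexPart := strings.TrimPrefix(digest, "sha256:")
	return path.Join(root, "sha256", hexPart[:2], hexPart)
}

func main() {
	d := blobDigest([]byte("example layer bytes"))
	fmt.Println(d)
	fmt.Println(blobPath("/srv/mcr/blobs", d))
}
```

Because the name is derived from the content, identical layers pushed under different tags are stored once, and a blob with no remaining manifest reference is what garbage collection removes.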
**How it fits in:** MCP directs nodes to pull images from MCR. When a workload
is scheduled, MCP tells the node's agent which image to pull and where to get
it. MCR sits behind an MC-Proxy instance for TLS routing.
**Status:** Implemented. Phase 12 (web UI) complete.
---
### MC-Proxy — TLS Proxy and Router
MC-Proxy is the ingress layer for every MC Node. It accepts TLS connections,
extracts the SNI hostname, and routes to the correct backend. Each route is
independently configured as either **L4 passthrough** (raw TCP relay, no TLS
termination) or **L7 terminating** (terminates TLS, reverse proxies HTTP/2 and
HTTP/1.1 including gRPC). Both modes coexist on the same listener.
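The per-route choice between the two modes can be pictured as a lookup table keyed by SNI hostname. This is a sketch only: the type names, backend addresses, and hostnames are hypothetical, and the real route table lives in MC-Proxy's own configuration and SQLite state.

```go
package main

import (
	"fmt"
	"strings"
)

type mode int

const (
	modeL4 mode = iota // raw TCP relay, TLS passes through untouched
	modeL7             // TLS terminated at the proxy, HTTP reverse proxied
)

type route struct {
	backend string
	mode    mode
}

// routeTable maps lowercased hostnames to routes.
type routeTable map[string]route

func (t routeTable) add(host, backend string, m mode) {
	t[strings.ToLower(host)] = route{backend: backend, mode: m}
}

// lookup performs an exact, case-insensitive match on the SNI hostname.
func (t routeTable) lookup(sni string) (route, bool) {
	r, ok := t[strings.ToLower(sni)]
	return r, ok
}

func main() {
	rt := routeTable{}
	rt.add("metacrypt.metacircular.net", "127.0.0.1:18443", modeL4)
	rt.add("mcr.metacircular.net", "127.0.0.1:28443", modeL7)

	if r, ok := rt.lookup("MCR.metacircular.net"); ok {
		fmt.Println(r.backend, r.mode == modeL7)
	}
	_, ok := rt.lookup("unknown.metacircular.net")
	fmt.Println(ok)
}
```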
**What it provides:**
- **SNI-based routing.** A route table maps hostnames to backend addresses.
Exact match, case-insensitive. Multiple listeners can bind different ports,
each with its own route table, all sharing the same global firewall.
- **Dual-mode proxying.** L4 routes relay raw TCP — backends see the original
TLS handshake, MC-Proxy adds nothing. L7 routes terminate TLS at the proxy
and reverse proxy HTTP/2 to backends (plaintext h2c or re-encrypted TLS),
with header injection (`X-Forwarded-For`, `X-Real-IP`), gRPC streaming
support, and trailer forwarding.
- **Global firewall.** Every connection is evaluated before routing: per-IP
rate limiting, IP/CIDR blocks, and GeoIP country blocks (MaxMind GeoLite2).
Blocked connections get a TCP RST — no error messages, no TLS alerts.
- **PROXY protocol.** Listeners can accept v1/v2 headers from upstream proxies
to learn the real client IP. Routes can send v2 headers to downstream
backends. This enables multi-hop deployments — a public edge MC-Proxy on a
VPS forwarding over the encrypted overlay to a private origin MC-Proxy —
while preserving the real client IP for firewall evaluation and logging.
- **Runtime management.** Routes and firewall rules can be updated at runtime
via a gRPC admin API on a Unix domain socket (filesystem permissions for
access control, no network exposure). State is persisted to SQLite with
write-through semantics.
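The firewall's evaluate-before-routing step could be sketched as below, covering CIDR blocks and per-IP rate limiting. GeoIP lookups are omitted, the fixed one-minute window is an assumption, and the type is not safe for concurrent use; all names are hypothetical.

```go
package main

import (
	"fmt"
	"net/netip"
)

type window struct {
	id    int64 // minute number since the Unix epoch
	count int
}

type firewall struct {
	blocked []netip.Prefix
	limit   int // connections allowed per IP per minute
	seen    map[netip.Addr]window
}

func newFirewall(blockedCIDRs []string, limitPerMinute int) (*firewall, error) {
	fw := &firewall{limit: limitPerMinute, seen: make(map[netip.Addr]window)}
	for _, c := range blockedCIDRs {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			return nil, err
		}
		fw.blocked = append(fw.blocked, p)
	}
	return fw, nil
}

// allow reports whether a connection from ip at time nowUnix may proceed.
func (fw *firewall) allow(ip string, nowUnix int64) bool {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return false // unparseable peers are dropped
	}
	for _, p := range fw.blocked {
		if p.Contains(addr) {
			return false
		}
	}
	w := fw.seen[addr]
	if id := nowUnix / 60; w.id != id {
		w = window{id: id} // a new minute resets the counter
	}
	w.count++
	fw.seen[addr] = w
	return w.count <= fw.limit
}

func main() {
	fw, err := newFirewall([]string{"198.51.100.0/24"}, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(fw.allow("198.51.100.9", 0)) // inside a blocked range
	fmt.Println(fw.allow("203.0.113.1", 0))
}
```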
**How it fits in:** MC-Proxy is pre-auth infrastructure. It sits in front of
everything on a node. Outside clients connect to MC-Proxy on well-known ports
(443, 8443, etc.) and MC-Proxy routes to the correct backend based on the
hostname the client is trying to reach. A typical production deployment uses
two instances — an edge proxy on a public VPS and an origin proxy on the
private network, connected over the overlay with PROXY protocol preserving
client IPs across the hop.
**Status:** Implemented.
---
### MCNS — Metacircular Networking Service
MCNS provides DNS for the platform. It manages two internal zones and serves
as the name resolution layer for the Metacircular network. Service discovery
(which services run where) is owned by MCP; MCNS translates those assignments
into DNS records.
**What it will provide:**
- **Internal DNS.** MCNS is authoritative for the internal zones of the
Metacircular network. The platform uses three zones, two of which are internal:
| Zone | Example | Purpose |
|------|---------|---------|
| `*.metacircular.net` | `metacrypt.metacircular.net` | External, public-facing. Managed outside MCNS (existing DNS). Points to edge MC-Proxy. |
| `*.mcp.metacircular.net` | `vade.mcp.metacircular.net` | Node addresses. Maps node names to their network addresses (e.g. Tailscale IPs). |
| `*.svc.mcp.metacircular.net` | `metacrypt.svc.mcp.metacircular.net` | Internal service addresses. Maps service names to the node and port where they currently run. |
The `*.mcp.metacircular.net` and `*.svc.mcp.metacircular.net` zones are
managed by MCNS. The external `*.metacircular.net` zone is managed separately
(existing DNS infrastructure) and is mostly static.
- **MCP integration.** MCP pushes DNS record updates to MCNS after deploy and
migrate operations. When MCP starts service α on node X, it calls the MCNS
API to set `α.svc.mcp.metacircular.net` to X's address. Services and clients
using internal DNS names automatically resolve to the right place without
config changes.
- **Record management API.** Authenticated via MCIAS. MCP is the primary
consumer for dynamic updates. Operators can also manage records directly
for static entries (node addresses, aliases).
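The naming scheme and the update MCP pushes after a deploy or migrate can be sketched with an in-memory map standing in for MCNS. The zone suffixes come from the table above; the function names, the `records` type, and the example addresses are hypothetical, since the MCNS API is not yet defined.

```go
package main

import "fmt"

const (
	nodeZone    = "mcp.metacircular.net"
	serviceZone = "svc.mcp.metacircular.net"
)

func nodeFQDN(node string) string   { return node + "." + nodeZone }
func serviceFQDN(svc string) string { return svc + "." + serviceZone }

// records is a toy name-to-address table standing in for MCNS.
type records map[string]string

// pointServiceAt sets the service name to the address of the named node,
// which is the update MCP would push after placing the service there.
func (r records) pointServiceAt(svc, node string) bool {
	addr, ok := r[nodeFQDN(node)]
	if !ok {
		return false // unknown node
	}
	r[serviceFQDN(svc)] = addr
	return true
}

func main() {
	r := records{
		nodeFQDN("vade"):   "100.64.0.1",
		nodeFQDN("node-c"): "100.64.0.3",
	}
	r.pointServiceAt("metacrypt", "vade")
	fmt.Println(serviceFQDN("metacrypt"), "->", r[serviceFQDN("metacrypt")])

	r.pointServiceAt("metacrypt", "node-c") // after a migration
	fmt.Println(serviceFQDN("metacrypt"), "->", r[serviceFQDN("metacrypt")])
}
```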
**How it fits in:** MCNS answers "what is the address of X?" MCP answers "where
is service α running?" and pushes the answer to MCNS. This separation means
services can use stable DNS names in their configs (e.g.,
`mcias.svc.mcp.metacircular.net` in `[mcias] server_url`) that survive
migration without config changes.
**Status:** Not yet implemented.
---
### MCP — Metacircular Control Plane
MCP is the orchestrator. It manages what runs where across the platform. The
deployment model is operator-driven: the user says "deploy service α" and MCP
handles the rest. MCP Master runs on the operator's workstation; agents run on
each managed node.
**What it will provide:**
- **Service registry.** MCP is the source of truth for what is running where.
It tracks every service, which node it's on, and its current state. Other
components that need to find a service (including MC-Proxy for route table
updates) query MCP's registry.
- **Deploy.** The operator says "deploy α". MCP checks if α is already running
somewhere. If it is, MCP pulls the new container image on that node and
restarts the service in place. If it isn't running, MCP selects a node
(the operator can pin to a specific node but shouldn't have to), transfers
the initial config, pulls the image from MCR, starts the container, and
pushes a DNS update to MCNS (`α.svc.mcp.metacircular.net` → node address).
- **Migrate.** Move a service from one node to another. MCP stops the service
on the source node, snapshots its `/srv/<service>/` directory (as a tar.zst
archive), transfers it to the destination, extracts it, starts the service
there, and updates MCNS so DNS points to the new location.
The `/srv/<service>/` convention makes this uniform across all services.
- **Data transfer.** The C2 channel supports file-level operations between
master and agents: copy or fetch individual files (push a config, pull a
log), and transfer tar.zst archives for bulk snapshot/restore of service
data directories. This is the foundation for both migration and backup.
- **Service snapshots.** To snapshot `/srv/<service>/`, the agent runs
`VACUUM INTO` to create a consistent database copy, then builds a tar.zst
that includes the full directory but **excludes** live database files
(`*.db`, `*.db-wal`, `*.db-shm`) and the `backups/` directory. The
temporary VACUUM INTO copy is injected into the archive as `<service>.db`.
The result is a clean, minimal archive that extracts directly into a
working service directory on the destination.
- **Container lifecycle.** Start, stop, restart, and update containers on
nodes. MCP Master issues commands; agents on each node execute them against
the local container runtime (Docker, etc.).
- **Master/agent architecture.** MCP Master runs on the operator's machine.
Agents run on every managed node, receiving C2 (command and control) from
Master, reporting node status, and managing local workloads. The C2 channel
is authenticated via MCIAS. The master does not need to be always-on —
agents keep running their workloads independently; the master is needed only
to issue new commands.
- **Node management.** Track which nodes are in the platform, their health,
available resources, and running workloads.
- **Scheduling.** When placing a new service, MCP selects a node based on
available resources and any operator-specified constraints. The operator can
override with an explicit node, but the default is MCP's choice.
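The exclusion rule from the service snapshot item above can be expressed as a small path filter. This sketch covers only the decision of which files enter the archive; producing the tar.zst itself and running `VACUUM INTO` are left out, and the function name is hypothetical. It assumes slash-separated paths relative to `/srv/<service>/` and treats only a top-level `backups/` directory as excluded.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// includeInSnapshot reports whether a file, given relative to the service
// directory, belongs in the snapshot archive.
func includeInSnapshot(rel string) bool {
	rel = path.Clean(rel)
	if rel == "backups" || strings.HasPrefix(rel, "backups/") {
		return false
	}
	base := path.Base(rel)
	for _, pat := range []string{"*.db", "*.db-wal", "*.db-shm"} {
		if ok, _ := path.Match(pat, base); ok {
			return false // live database files are replaced by the VACUUM INTO copy
		}
	}
	return true
}

func main() {
	for _, f := range []string{"mcr.toml", "mcr.db", "mcr.db-wal", "certs/tls.pem", "backups/old.tar.zst"} {
		fmt.Printf("%-22s %v\n", f, includeInSnapshot(f))
	}
}
```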
**How it fits in:** MCP is the piece that ties everything together. MCIAS
provides identity, Metacrypt provides certificates, MCR provides images, MCNS
provides DNS, and MC-Proxy provides ingress. MCP orchestrates all of it, owns
the map of what is running where, and pushes updates to MCNS so DNS stays
current. It is the system that makes the infrastructure metacircular: the
control plane deploys and manages the very services it depends on.
**Container-first design:** All Metacircular services are built as containers
(multi-stage Docker builds, Alpine runtime, non-root) specifically so that MCP
can deploy them. The systemd unit files exist as a fallback and for bootstrap —
the long-term deployment model is MCP-managed containers.
**Status:** Not yet implemented.
---
### MCAT — MCIAS Login Policy Tester
MCAT is a lightweight diagnostic tool, not a core infrastructure component. It
presents a web login form, forwards credentials to MCIAS with a configurable
`service_name` and `tags`, and shows whether the login was accepted or denied
by policy. This lets operators verify that login policy rules behave as
expected without touching the target service.
**Status:** Implemented.
---
## Bootstrap Sequence
Bringing up a Metacircular platform from scratch requires careful ordering
because of the circular dependencies — the infrastructure manages itself, but
must exist before it can do so. The key challenge is that nearly every service
needs TLS certificates (from Metacrypt) and authentication (from MCIAS), but
those services themselves need to be running first.
During bootstrap, all services run as **systemd units** on a single bootstrap
node. MCP takes over lifecycle management as the final step.
### Prerequisites
Before any service starts, the operator needs:
- **The bootstrap node** — a machine (VPS, homelab server, etc.) with the
overlay network configured and reachable.
- **Seed PKI** — MCIAS and Metacrypt need TLS certs to start, but Metacrypt
isn't running yet to issue them. The root CA is generated manually using
`github.com/kisom/cert` and stored in the `ca/` directory in the workspace.
Initial service certificates are issued from this root. The root CA is then
imported into Metacrypt once it's running, so Metacrypt becomes the
authoritative CA for the platform going forward.
- **TOML config files** — each service needs its config in `/srv/<service>/`.
During bootstrap these are written manually. Later, MCP handles config
distribution.
### Startup Order
```
Phase 0: Seed PKI
Operator creates or obtains initial TLS certificates for MCIAS
and Metacrypt. Places them in /srv/mcias/certs/ and
/srv/metacrypt/certs/.
Phase 1: Identity
┌──────────────────────────────────────────────────────┐
│ MCIAS starts (systemd) │
│ - No dependencies on other Metacircular services │
│ - Uses seed TLS certificates │
│ - Operator creates initial admin account │
│ - Operator creates system accounts for other services│
└──────────────────────────────────────────────────────┘
Phase 2: Cryptographic Services
┌──────────────────────────────────────────────────────┐
│ Metacrypt starts (systemd) │
│ - Authenticates against MCIAS │
│ - Uses seed TLS certificates initially │
│ - Operator initializes and unseals │
│ - Operator creates CA engine, imports root CA from │
│ ca/, creates issuers │
│ - Can now issue certificates for all other services │
│ - Reissue MCIAS and Metacrypt certs from own CA │
│ (replace seed certs with Metacrypt-issued certs) │
└──────────────────────────────────────────────────────┘
Phase 3: Ingress
┌──────────────────────────────────────────────────────┐
│ MC-Proxy starts (systemd) │
│ - Static route table from TOML config │
│ - Routes external traffic to MCIAS, Metacrypt │
│ - No MCIAS auth (pre-auth infrastructure) │
│ - TLS certs for L7 routes from Metacrypt │
└──────────────────────────────────────────────────────┘
Phase 4: Container Registry
┌──────────────────────────────────────────────────────┐
│ MCR starts (systemd) │
│ - Authenticates against MCIAS │
│ - TLS certificates from Metacrypt │
│ - Operator pushes container images for all services │
│ (including MCIAS, Metacrypt, MC-Proxy themselves) │
└──────────────────────────────────────────────────────┘
Phase 5: DNS
┌──────────────────────────────────────────────────────┐
│ MCNS starts (systemd) │
│ - Authenticates against MCIAS │
│ - Operator configures initial DNS records │
│ (node addresses, service names) │
└──────────────────────────────────────────────────────┘
Phase 6: Control Plane
┌──────────────────────────────────────────────────────┐
│ MCP Agent starts on bootstrap node (systemd) │
│ MCP Master starts on operator workstation │
│ - Authenticates against MCIAS │
│ - Master registers the bootstrap node │
│ - Master imports running services into its registry │
│ - From here, MCP owns the service map │
│ - Services can be redeployed as MCP-managed │
│ containers (replacing the systemd units) │
└──────────────────────────────────────────────────────┘
```
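The ordering constraint behind these phases can be checked mechanically. The dependency map below is one reading of the phase descriptions above, not a definitive list, and `validateOrder` is a hypothetical helper.

```go
package main

import "fmt"

// deps lists, for each service, what must already be running before it starts
// (an interpretation of the bootstrap phases).
var deps = map[string][]string{
	"mcias":     nil,
	"metacrypt": {"mcias"},
	"mc-proxy":  {"metacrypt"},
	"mcr":       {"mcias", "metacrypt"},
	"mcns":      {"mcias"},
	"mcp":       {"mcias", "mcr", "mcns"},
}

// validateOrder returns an error if any service appears before a dependency.
func validateOrder(order []string) error {
	started := map[string]bool{}
	for _, svc := range order {
		for _, d := range deps[svc] {
			if !started[d] {
				return fmt.Errorf("%s started before its dependency %s", svc, d)
			}
		}
		started[svc] = true
	}
	return nil
}

func main() {
	phases := []string{"mcias", "metacrypt", "mc-proxy", "mcr", "mcns", "mcp"}
	fmt.Println(validateOrder(phases))
	fmt.Println(validateOrder([]string{"metacrypt", "mcias"}))
}
```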
### The Seed Certificate Problem
The circular dependency between MCIAS, Metacrypt, and TLS is resolved by
bootstrapping with a **manually generated root CA**:
1. The operator generates a root CA using `github.com/kisom/cert`. This root
and initial service certificates live in the `ca/` directory.
2. MCIAS and Metacrypt start with certificates issued from this external root.
3. Metacrypt comes up. The operator imports the root CA into Metacrypt's CA
engine, making Metacrypt the authoritative issuer under the same root.
4. Metacrypt can now issue and renew certificates for all services. The `ca/`
directory remains as the offline backup of the root material.
This is a one-time process. The root CA is generated once, imported once, and
from that point forward Metacrypt is the sole CA. Once MCP is implemented, it
will handle certificate provisioning for all services.
### Adding a New Node
Once the platform is bootstrapped, adding a node is straightforward:
1. Provision the machine and connect it to the overlay network.
2. Install the MCP agent binary.
3. Configure the agent with the MCP Master address and MCIAS credentials
(system account for the node).
4. Start the agent. It authenticates with MCIAS, connects to Master, and
reports as available.
5. The operator deploys workloads to it via MCP. MCP handles image pulls,
config transfer, certificate provisioning, and DNS updates.
### Disaster Recovery
If the bootstrap node is lost, recovery follows the same sequence as initial
bootstrap — but with data restored from backups:
1. Start MCIAS on a new node, restore its database from the most recent
`VACUUM INTO` snapshot.
2. Start Metacrypt, restore its database. Unseal with the original password.
The entire key hierarchy and all issued certificates are recovered.
3. Bring up the remaining services in order, restoring their databases.
4. Start MCP, which rebuilds its registry from the running services.
5. Update DNS (MCNS or external) to point to the new node.
Every service's `snapshot` CLI command and daily backup timer exist specifically
to make this recovery possible. The `/srv/<service>/` convention means each
service's entire state is a single directory to back up and restore.
---
## Certificate Lifecycle
Every service in the platform requires TLS certificates, and Metacrypt is the
CA that issues them. This section describes how certificates flow from
Metacrypt to services, how they are renewed, and how the pieces fit together.
### PKI Structure
Metacrypt implements a **two-tier PKI**:
```
Root CA (self-signed, generated at engine initialization)
├── Issuer "infra" (intermediate CA for infrastructure services)
├── Issuer "services" (intermediate CA for application services)
└── Issuer "clients" (intermediate CA for client certificates)
```
The root CA signs intermediate CAs ("issuers"), which in turn sign leaf
certificates. Each issuer is scoped to a purpose. The root CA certificate is
the trust anchor — services and clients need it (or the relevant issuer chain)
to verify certificates presented by other services.
### ACME Protocol
Metacrypt implements an **ACME server** (RFC 8555) with External Account
Binding (EAB). This is the same protocol used by Let's Encrypt, meaning any
standard ACME client can obtain certificates from Metacrypt.
The ACME flow:
1. Client authenticates with MCIAS and requests EAB credentials from Metacrypt.
2. Client registers an ACME account using the EAB credentials.
3. Client places a certificate order (one or more domain names).
4. Metacrypt creates authorization challenges (HTTP-01 and DNS-01 supported).
5. Client fulfills the challenge (places a file for HTTP-01, or a DNS TXT
record for DNS-01).
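6. Metacrypt validates the challenge. The client finalizes the order with a
CSR, and Metacrypt issues the certificate.
7. Client downloads the certificate chain. The private key is generated by the
client and is never sent to Metacrypt.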
A **Go client library** (`metacrypt/clients/go`) wraps this entire flow:
MCIAS login, EAB fetch, account registration, challenge fulfillment, and
certificate download. Services that integrate this library can obtain and
renew certificates programmatically.
### How Services Get Certificates Today
Currently, certificates are provisioned through Metacrypt's **REST API or web
UI** and placed into each service's `/srv/<service>/certs/` directory. This is
a manual process — the operator issues a certificate, downloads it, and
deploys the files. The ACME client library exists but is not yet integrated
into any service.
### How It Will Work With MCP
MCP is the natural place to automate certificate provisioning:
- **Initial deploy.** When MCP deploys a new service, it can provision a
certificate from Metacrypt (via the ACME client library or the REST API),
transfer the cert and key to the node as part of the config push to
`/srv/<service>/certs/`, and start the service with valid TLS material.
- **Renewal.** MCP knows what services are running and when their certificates
expire. It can renew certificates before expiry by re-running the ACME flow
(or calling Metacrypt's `renew` operation) and pushing updated files to the
node. The service restarts with the new certificate.
- **Migration.** When MCP migrates a service, the certificate in
`/srv/<service>/certs/` moves with the tar.zst snapshot. If the service's
hostname changes (new node, new DNS name), MCP provisions a new certificate
for the new name.
- **MC-Proxy L7 routes.** MC-Proxy's L7 mode requires certificate/key pairs
for TLS termination. MCP (or the operator) can provision these from
Metacrypt and push them to MC-Proxy's cert directory. MC-Proxy's
architecture doc lists ACME integration and Metacrypt key storage as future
work.
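For the renewal case, the decision of when to renew could be as simple as the sketch below. The two-thirds-of-lifetime threshold is an assumption chosen for illustration; the document does not state a renewal policy. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// unix is a small convenience for building times from Unix seconds.
func unix(sec int64) time.Time { return time.Unix(sec, 0) }

// renewAt returns the moment at which two thirds of the certificate's
// validity period has elapsed.
func renewAt(notBefore, notAfter time.Time) time.Time {
	lifetime := notAfter.Sub(notBefore)
	return notBefore.Add(lifetime * 2 / 3)
}

// needsRenewal reports whether a certificate with the given validity window
// should be renewed at time now.
func needsRenewal(notBefore, notAfter, now time.Time) bool {
	return !now.Before(renewAt(notBefore, notAfter))
}

func main() {
	notBefore := time.Now().Add(-70 * 24 * time.Hour)
	notAfter := notBefore.Add(90 * 24 * time.Hour)
	fmt.Println("renew at:", renewAt(notBefore, notAfter).Format(time.RFC3339))
	fmt.Println("renew now:", needsRenewal(notBefore, notAfter, time.Now()))
}
```

In practice the validity window would come from the `NotBefore` and `NotAfter` fields of the parsed `x509.Certificate` in `/srv/<service>/certs/`.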
### Trust Distribution
Every service and client that validates TLS certificates needs the root CA
certificate (or the relevant issuer chain). Metacrypt serves these publicly
without authentication:
- `GET /v1/pki/{mount}/ca` — root CA certificate (PEM)
- `GET /v1/pki/{mount}/ca/chain` — full chain: issuer + root (PEM)
- `GET /v1/pki/{mount}/issuer/{name}` — specific issuer certificate (PEM)
During bootstrap, the root CA cert is distributed manually (or via the `ca/`
directory in the workspace). Once MCP is running, it can distribute the CA
cert as part of service deployment. Services reference the CA cert path in
their `[mcias]` config section (`ca_cert`) to verify connections to MCIAS and
other services.
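A client that needs the trust anchor might build the endpoint URL and load the returned PEM into a certificate pool along these lines. The endpoint path is taken from the list above; the helper names, the mount name `pki`, and the base URL are hypothetical. The network fetch itself is left out so the sketch runs offline.

```go
package main

import (
	"crypto/x509"
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// caURL builds the URL of the root CA endpoint for a PKI mount.
func caURL(base, mount string) string {
	return strings.TrimRight(base, "/") + "/v1/pki/" + url.PathEscape(mount) + "/ca"
}

// poolFromPEM turns the PEM body returned by that endpoint into a pool that
// can be assigned to tls.Config.RootCAs.
func poolFromPEM(pemBytes []byte) (*x509.CertPool, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(pemBytes) {
		return nil, errors.New("no certificates found in PEM data")
	}
	return pool, nil
}

func main() {
	fmt.Println(caURL("https://metacrypt.svc.mcp.metacircular.net/", "pki"))

	// With real PEM bytes (for example from an HTTP GET of the URL above),
	// poolFromPEM would return a usable pool; garbage input is rejected.
	_, err := poolFromPEM([]byte("not a certificate"))
	fmt.Println(err)
}
```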
---
## End-to-End Deploy Workflow
This traces a deployment from code change to running service, showing how every
component participates. The example deploys a new version of service α that is
already running on Node B.
### 1. Build and Push
The operator builds a new container image and pushes it to MCR:
```
Operator workstation (vade)
$ docker build -t mcr.metacircular.net/α:v1.2.0 .
$ docker push mcr.metacircular.net/α:v1.2.0
MC-Proxy (edge) ──overlay──→ MC-Proxy (origin) ──→ MCR
Authenticates
via MCIAS
Policy check:
can this user
push to α?
Image stored
(blobs + manifest)
```
The `docker push` goes through MC-Proxy (SNI routing to MCR), authenticates
via the OCI token flow (which delegates to MCIAS), and is checked against
MCR's push policy. The image is stored content-addressed in MCR.
### 2. Deploy
The operator tells MCP to deploy:
```
Operator workstation (vade)
$ mcp deploy α # or: mcp deploy α --image v1.2.0
MCP Master
├── Registry lookup: α is running on Node B
├── C2 (gRPC over overlay) to Node B agent:
│ "pull mcr.metacircular.net/α:v1.2.0 and restart"
MCP Agent (Node B)
├── Pull image from MCR
│ (authenticates via MCIAS, same OCI flow)
├── Stop running container
├── Start new container from updated image
│ - Mounts /srv/α/ (config, database, certs all persist)
│ - Service starts, authenticates to MCIAS, resumes operation
└── Report status back to Master
```
Since α is already running on Node B, this is an in-place update. The
`/srv/α/` directory is untouched — config, database, and certificates persist
across the container restart.
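
The agent-side sequence can be sketched as follows. The `Runtime` interface, the `update` function, and the `recorder` fake are hypothetical names made up for this example and are not MCP's real API. The sketch keeps the order from the diagram (pull, stop, start), which has the side effect that a failed pull leaves the old container running.

```go
package main

import (
	"errors"
	"fmt"
)

// Runtime is a hypothetical abstraction over the node's container runtime.
type Runtime interface {
	Pull(image string) error
	Stop(service string) error
	Start(service, image, dataDir string) error
}

// update performs an in-place update: pull first, then stop, then start
// with the service's persistent data directory mounted.
func update(rt Runtime, service, image string) error {
	if err := rt.Pull(image); err != nil {
		return fmt.Errorf("pull %s: %w", image, err)
	}
	if err := rt.Stop(service); err != nil {
		return fmt.Errorf("stop %s: %w", service, err)
	}
	dataDir := "/srv/" + service + "/" // untouched by the update
	if err := rt.Start(service, image, dataDir); err != nil {
		return fmt.Errorf("start %s: %w", service, err)
	}
	return nil
}

// recorder is a fake Runtime that records the calls it receives.
type recorder struct {
	calls    []string
	failPull bool
}

func (r *recorder) Pull(image string) error {
	r.calls = append(r.calls, "pull "+image)
	if r.failPull {
		return errors.New("registry unreachable")
	}
	return nil
}

func (r *recorder) Stop(service string) error {
	r.calls = append(r.calls, "stop "+service)
	return nil
}

func (r *recorder) Start(service, image, dataDir string) error {
	r.calls = append(r.calls, "start "+service+" with "+dataDir)
	return nil
}

func main() {
	r := &recorder{}
	if err := update(r, "α", "mcr.metacircular.net/α:v1.2.0"); err != nil {
		fmt.Println("update failed:", err)
		return
	}
	for _, c := range r.calls {
		fmt.Println(c)
	}
}
```
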
### 3. First-Time Deploy
If α has never been deployed, MCP does more work:
```
Operator workstation (vade)
$ mcp deploy α --config α.toml
MCP Master
├── Registry lookup: α is not running anywhere
├── Scheduling: select Node C (best fit)
├── Provision TLS certificate from Metacrypt
│ (ACME flow or REST API)
├── C2 to Node C agent:
│ 1. Create /srv/α/ directory structure
│ 2. Transfer config file (α.toml → /srv/α/α.toml)
│ 3. Transfer TLS cert+key → /srv/α/certs/
│ 4. Transfer root CA cert → /srv/α/certs/ca.pem
│ 5. Pull image from MCR
│ 6. Start container
├── Update service registry: α → Node C
├── Push DNS update to MCNS:
│     α.svc.mcp.metacircular.net → Node C address
└── (Optionally) update MC-Proxy route table
      if α needs external ingress
```
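
Steps 1 through 4 of the agent's work amount to laying out files under `/srv/α/`. The sketch below shows one way that could look. The `Bundle` type, the `cert.pem` and `key.pem` file names, and the permission bits are assumptions for illustration; only `α.toml` and `certs/ca.pem` come from the steps above. The demo writes into a temporary directory rather than `/srv`.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// Bundle is a hypothetical container for what the master transfers.
type Bundle struct {
	Service                   string
	Config, Cert, Key, RootCA []byte
}

// provision creates <root>/<service>/ and writes the transferred files.
// In production root would be /srv.
func provision(root string, b Bundle) error {
	dir := filepath.Join(root, b.Service)
	certs := filepath.Join(dir, "certs")
	if err := os.MkdirAll(certs, 0o750); err != nil {
		return fmt.Errorf("create %s: %w", certs, err)
	}
	files := []struct {
		path string
		data []byte
		mode fs.FileMode
	}{
		{filepath.Join(dir, b.Service+".toml"), b.Config, 0o640},
		{filepath.Join(certs, "cert.pem"), b.Cert, 0o644}, // assumed file name
		{filepath.Join(certs, "key.pem"), b.Key, 0o600},   // assumed file name
		{filepath.Join(certs, "ca.pem"), b.RootCA, 0o644},
	}
	for _, f := range files {
		if err := os.WriteFile(f.path, f.data, f.mode); err != nil {
			return fmt.Errorf("write %s: %w", f.path, err)
		}
	}
	return nil
}

// demo provisions into a temp directory and lists the files created.
func demo() ([]string, error) {
	root, err := os.MkdirTemp("", "srv")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	b := Bundle{Service: "α", Config: []byte("# config"), Cert: []byte("cert"),
		Key: []byte("key"), RootCA: []byte("ca")}
	if err := provision(root, b); err != nil {
		return nil, err
	}
	var paths []string
	err = filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() {
			rel, relErr := filepath.Rel(root, p)
			if relErr != nil {
				return relErr
			}
			paths = append(paths, filepath.ToSlash(rel))
		}
		return nil
	})
	return paths, err
}

func main() {
	paths, err := demo()
	if err != nil {
		fmt.Println("provision failed:", err)
		return
	}
	for _, p := range paths {
		fmt.Println(p)
	}
}
```
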
### 4. Migration
Moving α from Node B to Node C:
```
Operator workstation (vade)
$ mcp migrate α --to node-c # or let MCP choose the destination
MCP Master
├── C2 to Node B agent:
│ 1. Stop α container
│ 2. Snapshot /srv/α/ → tar.zst archive
│ 3. Transfer tar.zst to Master (or directly to Node C)
├── C2 to Node C agent:
│ 1. Receive tar.zst archive
│ 2. Extract to /srv/α/
│ 3. Pull container image from MCR (if not cached)
│ 4. Start container
│ 5. Report status
├── Update service registry: α → Node C
├── Push DNS update to MCNS:
│     α.svc.mcp.metacircular.net → Node C address
└── (If α had external ingress) update MC-Proxy route
      or rely on DNS change
```
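
The snapshot step can be sketched with the standard library's `archive/tar`. This is an illustration under assumptions, not MCP's implementation: the function names are invented, the file names in the demo are hypothetical, and zstd compression is left out because Go's standard library has no zstd encoder (the `io.Writer` passed to `snapshot` could be wrapped by one from a third-party package). An in-memory filesystem stands in for `/srv/α/`.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"io/fs"
	"testing/fstest"
	"time"
)

// snapshot writes every regular file under fsys into a tar stream on w.
func snapshot(fsys fs.FS, w io.Writer) error {
	tw := tar.NewWriter(w)
	err := fs.WalkDir(fsys, ".", func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.Type().IsRegular() {
			return nil // directories are implied by file paths
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		hdr, err := tar.FileInfoHeader(info, "")
		if err != nil {
			return err
		}
		hdr.Name = path // FileInfoHeader records only the base name
		if err := tw.WriteHeader(hdr); err != nil {
			return err
		}
		f, err := fsys.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()
		_, err = io.Copy(tw, f)
		return err
	})
	if err != nil {
		return err
	}
	return tw.Close()
}

// listNames reads a tar stream and returns the entry names in order.
func listNames(r io.Reader) ([]string, error) {
	var names []string
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		names = append(names, hdr.Name)
	}
}

// demo snapshots a stand-in for /srv/α/ and lists the archive contents.
func demo() ([]string, error) {
	mod := time.Unix(1700000000, 0)
	srv := fstest.MapFS{
		"α.toml":       {Data: []byte("# config"), Mode: 0o640, ModTime: mod},
		"certs/ca.pem": {Data: []byte("ca"), Mode: 0o644, ModTime: mod},
		"data/α.db":    {Data: []byte("db"), Mode: 0o640, ModTime: mod}, // hypothetical name
	}
	var buf bytes.Buffer
	if err := snapshot(srv, &buf); err != nil {
		return nil, err
	}
	return listNames(&buf)
}

func main() {
	names, err := demo()
	if err != nil {
		fmt.Println("snapshot failed:", err)
		return
	}
	fmt.Println(names)
}
```
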
### What Each Component Does
| Step | MCIAS | Metacrypt | MCR | MC-Proxy | MCP | MCNS |
|------|-------|-----------|-----|----------|-----|------|
| Build/push image | Authenticates push | — | Stores image, enforces push policy | Routes traffic to MCR | — | — |
| Deploy (update) | Authenticates pull, authenticates service on start | — | Serves image to agent | Routes traffic to service | Coordinates: registry lookup, C2 to agent | — |
| Deploy (new) | Authenticates pull, authenticates service on start | Issues TLS certificate | Serves image to agent | Routes traffic to service (if external) | Coordinates: scheduling, cert provisioning, config transfer, DNS update | Updates DNS records |
| Migrate | Authenticates pull (if image not cached), authenticates service on new node | Issues new cert (if hostname changes) | Serves image (if not cached) | Routes traffic to new location | Coordinates: snapshot, transfer, DNS update | Updates DNS records |
| Steady state | Validates tokens for every authenticated request | Serves CA certs publicly, renews certs | Serves image pulls | Routes all external traffic | Tracks service health, holds registry | Serves DNS queries |
---
## Future Ideas
Components and capabilities that may be worth building but have no immediate
timeline. Listed here to capture the thinking; none are committed.
### Observability — Log Collection and Health Monitoring
Every service already produces structured logs (`log/slog`) and exposes health
checks (gRPC `Health.Check` or REST status endpoints). What's missing is
aggregation — today, debugging a cross-service issue means SSH'ing into each
node and reading local logs.
A collector could:
- Gather structured logs from services on each node and forward them to a
central store.
- Periodically health-check local services and report status.
- Feed health data into MCP so it can make informed decisions (restart
unhealthy services, avoid scheduling on degraded nodes, alert the operator).
This might be a standalone service or an MCP agent capability, depending on
how much it has to do. If it only tails logs and hits health endpoints, it
fits in the agent. If it grows to include indexing, querying, retention
policies, and alerting rules, it warrants its own service.
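
The lightweight end of that range (hit health endpoints, report status) might look like the sketch below. Everything here is hypothetical, since no collector exists yet: the `Check` type, `pollOnce`, `httpCheck`, and the service names are invented for illustration. `httpCheck` shows how a REST status endpoint could be probed; the demo uses stub checks so it runs without a network.

```go
package main

import (
	"errors"
	"fmt"
	"log/slog"
	"net/http"
	"os"
)

// Check probes one service and returns nil when it is healthy.
type Check func() error

// errDown is a stand-in failure used by the demo.
var errDown = errors.New("health endpoint unreachable")

// httpCheck builds a Check that expects HTTP 200 from a status URL.
func httpCheck(client *http.Client, url string) Check {
	return func() error {
		resp, err := client.Get(url)
		if err != nil {
			return err
		}
		defer resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			return fmt.Errorf("status %d", resp.StatusCode)
		}
		return nil
	}
}

// pollOnce runs every check once, logs failures as structured records,
// and returns a map of service name to healthy/unhealthy.
func pollOnce(checks map[string]Check) map[string]bool {
	status := make(map[string]bool, len(checks))
	for name, check := range checks {
		err := check()
		status[name] = err == nil
		if err != nil {
			slog.Warn("service unhealthy", "service", name, "error", err)
		}
	}
	return status
}

func main() {
	slog.SetDefault(slog.New(slog.NewJSONHandler(os.Stderr, nil)))
	status := pollOnce(map[string]Check{
		"mcias": func() error { return nil },
		"mcr":   func() error { return errDown },
	})
	fmt.Println(status["mcias"], status["mcr"]) // prints "true false"
}
```

A periodic collector would call `pollOnce` on a timer and forward the result to MCP; that reporting path is left out because its shape is undecided.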
### Object Store
The platform has structured storage (SQLite), blob storage scoped to container
images (MCR), and encrypted key-value storage (Metacrypt's barrier). It does
not have general-purpose object/blob storage.
Potential uses:
- **Centralized backups.** Service snapshots currently live on each node in
`/srv/<service>/backups/`. A central object store would give MCP somewhere to
push tar.zst snapshots for offsite retention.
- **Artifact storage.** Build outputs, large files, anything that doesn't fit
in a database row.
- **Data sharing between services.** Files that need to move between services
outside the MCP C2 channel.
Prior art: [Nebula](https://metacircular.net/pages/nebula.html), a
content-addressable data store with capability-based security (SHA-256
addressed blobs, UUID entries for versioning, proxy references for revocable
access). Prototyped in multiple languages. The capability model is interesting
but may be more sophistication than the platform needs — a simpler
authenticated blob store with MCIAS integration might suffice.
### Overlay Network Management
The platform currently relies on an external overlay network (WireGuard,
Tailscale, or similar) for node-to-node connectivity. A self-hosted WireGuard
mesh manager would bring the overlay under Metacircular's control:
- Automate key exchange and peer configuration when MCP adds a node.
- Manage IP allocation within the mesh (potentially absorbing part of MCNS's
scope).
- Remove the dependency on external coordination servers such as Tailscale's.
This is a natural extension of the sovereignty principle but is low priority
while the mesh is small enough to manage by hand.
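
As a rough illustration of the IP allocation piece, the sketch below hands out the lowest free address in a mesh prefix using `net/netip`. The function name, the example prefix, and the policy (skip the network address, take the first unused one) are assumptions; a real manager would also need persistence and release handling.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// allocate returns the lowest address in prefix that is not in used,
// skipping the network address itself.
func allocate(prefix string, used []string) (string, error) {
	p, err := netip.ParsePrefix(prefix)
	if err != nil {
		return "", fmt.Errorf("parse prefix: %w", err)
	}
	p = p.Masked() // normalize to the network address

	taken := make(map[netip.Addr]bool, len(used))
	for _, u := range used {
		a, err := netip.ParseAddr(u)
		if err != nil {
			return "", fmt.Errorf("parse used address %q: %w", u, err)
		}
		taken[a] = true
	}

	for a := p.Addr().Next(); a.IsValid() && p.Contains(a); a = a.Next() {
		if !taken[a] {
			return a.String(), nil
		}
	}
	return "", errors.New("mesh prefix exhausted")
}

func main() {
	// 10.77.0.0/24 is a made-up mesh range for the example.
	addr, err := allocate("10.77.0.0/24", []string{"10.77.0.1", "10.77.0.2"})
	if err != nil {
		fmt.Println("allocation failed:", err)
		return
	}
	fmt.Println(addr) // prints "10.77.0.3"
}
```
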
### Hypervisor / Isolation
A deeper exploration of environment isolation, message-passing between
services, and access mediation at a level below containers. Prior art:
[hypervisor concept](https://metacircular.net/pages/hypervisor.html). The
current platform achieves these goals through containers + MCIAS + policy
engines. A hypervisor layer would push isolation down to the OS level —
interesting for security but significant in scope. More relevant if the
platform ever moves beyond containers to VM-based workloads.
### Prior Art: SYSGOV
[SYSGOV](https://metacircular.net/pages/lisp-dcos.html) was an earlier
exploration of system management in Lisp, with SYSPLAN (desired state
enforcement) and SYSMON (service management). Many of its research questions —
C2 communication, service discovery, secure config distribution, failure
handling — are directly addressed by MCP's design. MCP is the spiritual
successor, reimplemented in Go with the benefit of the Metacircular platform
underneath it.
