From 11466062084f6b74cf134d2c30402001aa5900d4 Mon Sep 17 00:00:00 2001 From: Kyle Isom Date: Fri, 27 Mar 2026 00:07:14 -0700 Subject: [PATCH] Add PLATFORM_EVOLUTION.md Documents the gap between the current manually-wired platform state and the target fully-declarative deployment model. Covers port assignment, mc-proxy route persistence, automated TLS provisioning via Metacrypt, DNS registration via MCNS, and the $PORT convention. Includes gap analysis and suggested phasing. Co-Authored-By: Claude Opus 4.6 (1M context) --- PLATFORM_EVOLUTION.md | 330 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 330 insertions(+) create mode 100644 PLATFORM_EVOLUTION.md diff --git a/PLATFORM_EVOLUTION.md b/PLATFORM_EVOLUTION.md new file mode 100644 index 0000000..fa2a32c --- /dev/null +++ b/PLATFORM_EVOLUTION.md @@ -0,0 +1,330 @@ +# Platform Evolution + +This document captures the planned evolution of the Metacircular platform +from its current manually-wired state to fully declarative deployment. +It is a living design document — not a spec, not a commitment, but a +record of where we are, where we want to be, and what's between. + +Last updated: 2026-03-27 + +--- + +## Current State + +The platform works. Services run on rift, fronted by mc-proxy, with +MCIAS handling auth and Metacrypt managing secrets. MCP can deploy, +stop, start, restart, and monitor containers. This is not nothing — the +core infrastructure is real and operational. + +But the wiring between services is manual: + +- **Port assignment**: operators pick host ports by hand and record them + in service definitions (`ports = ["127.0.0.1:28443:8443"]`). A mental + register of "what port is free" is required. +- **mc-proxy routing**: routes are defined in a static TOML config file. + Adding a service means editing `mc-proxy-rift.toml`, restarting + mc-proxy, and hoping you didn't typo a port number. +- **TLS certificates**: provisioned manually. Certs are generated, + placed in `/srv/mc-proxy/certs/`, and referenced by path in the + mc-proxy config. +- **DNS**: records are manually configured in MCNS zone files. +- **Container networking**: operators specify `network`, `user`, and + `restart` policy per component, even though these are almost always + the same values. + +Each new service requires touching 4-5 files across 3-4 repos. The +process works but doesn't scale and is error-prone. + +## Target State + +The operator writes a service definition that declares **what** they +want, not **how** to wire it: + +```toml +name = "metacrypt" +node = "rift" +active = true +path = "metacrypt" + +[build] +uses_mcdsl = false + +[build.images] +metacrypt = "Dockerfile.api" +metacrypt-web = "Dockerfile.web" + +[[components]] +name = "api" +image = "mcr.svc.mcp.metacircular.net:8443/metacrypt:v1.0.0" +volumes = ["/srv/metacrypt:/srv/metacrypt"] +cmd = ["server", "--config", "/srv/metacrypt/metacrypt.toml"] + +[[components.routes]] +name = "rest" +port = 8443 +mode = "l4" + +[[components.routes]] +name = "grpc" +port = 9443 +mode = "l4" + +[[components]] +name = "web" +image = "mcr.svc.mcp.metacircular.net:8443/metacrypt-web:v1.0.0" +volumes = ["/srv/metacrypt:/srv/metacrypt"] +cmd = ["server", "--config", "/srv/metacrypt/metacrypt.toml"] + +[[components.routes]] +name = "web" +port = 443 +mode = "l7" +``` + +`mcp deploy metacrypt` does the rest: + +1. Agent assigns a free host port per route (random, check availability, + retry on collision). +2. Agent requests TLS certs from Metacrypt CA for + `metacrypt.svc.mcp.metacircular.net`. +3. Agent registers routes with mc-proxy via gRPC (mc-proxy persists + them in SQLite). +4. Agent creates/updates DNS records in MCNS for + `metacrypt.svc.mcp.metacircular.net`. +5. Agent starts containers with `$PORT_REST`, `$PORT_GRPC`, `$PORT_WEB` + environment variables set to the assigned host ports. +6. Agent records the full state (port assignments, cert paths, route + IDs) in its registry. + +On teardown (`mcp stop`), the agent reverses the process: stops +containers, removes mc-proxy routes, cleans up DNS records. + +### Port Environment Variables + +Applications receive their assigned ports via environment variables: + +| Components with... | Env var | Example | +|--------------------|---------|---------| +| Single route | `$PORT` | `$PORT=8913` | +| Multiple routes | `$PORT_` | `$PORT_REST=8913`, `$PORT_GRPC=9217` | + +Route names come from the `name` field in `[[components.routes]]`. +Applications read these in their config layer alongside existing env +overrides (e.g., `$MCR_SERVER_LISTEN_ADDR`). + +### Hostname Convention + +Every service gets `.svc.mcp.metacircular.net` automatically. +Public-facing services can additionally declare external hostnames: + +```toml +[[components.routes]] +name = "web" +port = 443 +mode = "l7" +hostname = "docs.metacircular.net" # optional, public DNS +``` + +If `hostname` is omitted, the route uses the default +`.svc.mcp.metacircular.net`. + +### Fields Removed from Service Definitions + +These become agent-level defaults or are derived automatically: + +| Field | Current | Target | +|-------|---------|--------| +| `ports` | Manual port mapping | Agent-assigned via routes | +| `network` | Per-component | Agent default (`mcpnet`) | +| `user` | Per-component | Agent default (`0:0`) | +| `restart` | Per-component | Agent default (`unless-stopped`) | + +--- + +## Gap Analysis + +### What exists today and works + +| Capability | Status | +|------------|--------| +| MCP CLI + agent deploy/stop/start/restart | Working | +| MCP sync (push service definitions to agent) | Working | +| MCP status/monitoring/drift detection | Working | +| mc-proxy L4/L7 routing | Working | +| mc-proxy gRPC admin API | Working | +| MCIAS auth for all services | Working | +| Metacrypt CA (PKI engine) | Working | +| MCNS DNS serving | Working | +| MCR container registry | Working | +| Service definitions in ~/.config/mcp/services/ | Working | +| Image build pipeline (mcdeploy.toml, being folded into MCP) | Working | + +### What needs to change + +#### 1. MCP Agent: Port Assignment + +**Gap**: agent doesn't manage host ports. Service definitions specify +them manually. + +**Work**: +- Agent picks free host ports at deploy time (random + availability + check). +- Agent records assignments in its registry. +- Agent passes `$PORT` / `$PORT_` to containers. +- Agent releases ports on container stop/removal. +- New service definition format drops `ports` field, adds + `[[components.routes]]`. + +**Depends on**: nothing (can be developed standalone). + +#### 2. MCP Agent: mc-proxy Route Registration + +**Gap**: mc-proxy routes are static TOML. The gRPC admin API exists but +MCP doesn't use it. + +**Work**: +- Agent calls mc-proxy gRPC API to register/remove routes on deploy/stop. +- Route registration includes: hostname, host port (agent-assigned), + mode (l4/l7), TLS cert paths. + +**Depends on**: port assignment (#1), mc-proxy route persistence (#4). + +#### 3. MCP Agent: TLS Cert Provisioning + +**Gap**: certs are manually provisioned and placed on disk. There is no +automated issuance flow. + +**Work**: +- Agent requests certs from Metacrypt CA via its API. +- Certs are stored in a standard location + (`/srv/mc-proxy/certs/.pem`). +- Cert renewal is handled automatically before expiry. + +**Depends on**: Metacrypt cert issuance policy (#6). + +#### 4. mc-proxy: Route Persistence + +**Gap**: mc-proxy loads routes from TOML on startup. Routes added via +gRPC are lost on restart. + +**Work**: +- mc-proxy persists gRPC-managed routes in its SQLite database. +- On startup, mc-proxy loads routes from the database. +- TOML route config is deprecated (kept for bootstrapping only, e.g., + mc-proxy's own routes before MCP is fully operational). +- mcproxyctl becomes the primary route management interface. + +**Depends on**: nothing (mc-proxy already has SQLite and gRPC API). + +#### 5. MCP Agent: DNS Registration + +**Gap**: DNS records are manually configured in MCNS zone files. + +**Work**: +- Agent creates/updates A records in MCNS for + `.svc.mcp.metacircular.net`. +- Agent removes records on service teardown. + +**Depends on**: MCNS record management API (#7). + +#### 6. Metacrypt: Automated Cert Issuance Policy + +**Gap**: no policy exists for automated cert issuance. The MCP agent +doesn't have a Metacrypt identity or permissions. + +**Work**: +- MCP agent gets an MCIAS service account. +- Metacrypt policy allows this account to issue certs scoped to + `*.svc.mcp.metacircular.net` (and explicitly listed public hostnames). +- No wildcard certs — one cert per hostname per service. + +**Depends on**: MCIAS service account provisioning (exists today, just +needs the account created). + +#### 7. MCNS: Record Management API + +**Gap**: MCNS is a CoreDNS precursor serving static zone files. There +is no API for dynamic record management. + +**Work**: +- MCNS needs an API (REST + gRPC per platform convention) for + creating/updating/deleting DNS records. +- Records are stored in SQLite (replacing or supplementing zone files). +- MCIAS auth, scoped to allow MCP agent to manage + `*.svc.mcp.metacircular.net` records. + +**Depends on**: this is the largest gap. MCNS is currently a CoreDNS +wrapper, not a full service. This may be the right time to build the +real MCNS. + +#### 8. Application $PORT Convention + +**Gap**: applications read listen addresses from their config files. +They don't check `$PORT` env vars. + +**Work**: +- Each service's config layer checks `$PORT` / `$PORT_` and + overrides the configured listen address if set. +- This fits naturally into the existing `$SERVICENAME_*` env override + convention — it's just one more env var. +- Small change per service, but touches every service. + +**Depends on**: nothing (can be done incrementally). + +--- + +## Suggested Sequencing + +The dependencies form a rough order: + +``` +Phase A — Independent groundwork (parallel): + #1 MCP agent port assignment + #4 mc-proxy route persistence + #8 $PORT convention in applications + +Phase B — MCP route registration: + #2 Agent registers routes with mc-proxy + (depends on #1 + #4) + +Phase C — Automated TLS: + #6 Metacrypt cert issuance policy + #3 Agent provisions certs + (depends on #6) + +Phase D — DNS: + #7 MCNS record management API + #5 Agent registers DNS + (depends on #7) +``` + +After Phase B, the manual steps are: cert provisioning and DNS. After +Phase C, only DNS remains manual. After Phase D, `mcp deploy` is fully +declarative. + +Each phase is independently useful. Phase A + B alone eliminates the +most common source of manual wiring errors (port assignment and mc-proxy +config). + +--- + +## Open Questions + +- **Cert rotation**: when a Metacrypt-issued cert expires, does the + agent renew it automatically? What's the renewal window? Does mc-proxy + need to reload certs without restart? +- **Public hostnames**: services like mcdoc want `docs.metacircular.net` + in addition to the `.svc.mcp.metacircular.net` name. Public DNS is + managed outside MCNS (Cloudflare? registrar?). How does the agent + handle the split between internal and external DNS? +- **mc-proxy bootstrap**: mc-proxy itself needs routes to be reachable. + If routes are in SQLite, how does mc-proxy start before MCP configures + it? A small set of static bootstrap routes (or self-configuration) may + be needed. +- **Multi-node**: this design assumes single-node (rift). When a second + node is added, port assignment is still per-agent, but mc-proxy + routing, cert provisioning, and DNS need to account for multiple + backends. Not a v1 concern, but worth keeping in mind. +- **Rollback**: if cert provisioning fails mid-deploy, does the agent + roll back the port assignment and mc-proxy route? What's the failure + mode — partial deploy, full rollback, or best-effort?