From 4722fdb0da67ae83c0f8f888a0a11d7c8696a573 Mon Sep 17 00:00:00 2001
From: Kyle Isom
Date: Sat, 28 Mar 2026 23:05:37 -0700
Subject: [PATCH] Sync platform docs: Phase D complete, Phase E planned, version updates
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- PLATFORM_EVOLUTION: Mark Phase D (DNS) complete, add Phase E (multi-node
  agent management) planning with items #10-12
- PLATFORM_EVOLUTION: Fix stale mcdsl reference (v1.2.0 adds → added,
  consuming services now on v1.4.0)
- STATUS: Update all service versions to current, note Phase A-D completion
  and Phase E planning
- docs/packaging-and-deployment: Add agent management section

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 PLATFORM_EVOLUTION.md            | 127 ++++++++++++++++++++++---------
 STATUS.md                        |  82 ++++++++++----------
 docs/packaging-and-deployment.md |  68 +++++++++++++++++
 3 files changed, 200 insertions(+), 77 deletions(-)

diff --git a/PLATFORM_EVOLUTION.md b/PLATFORM_EVOLUTION.md
index a2bc928..b439412 100644
--- a/PLATFORM_EVOLUTION.md
+++ b/PLATFORM_EVOLUTION.md
@@ -5,7 +5,7 @@ from its current manually-wired state to fully declarative deployment. It is a
 living design document — not a spec, not a commitment, but a record of
 where we are, where we want to be, and what's between.
 
-Last updated: 2026-03-27 (Phases A + B + C complete)
+Last updated: 2026-03-28 (Phases A + B + C + D complete)
 
 ---
 
@@ -239,16 +239,16 @@ mc-proxy routes are fully persisted in SQLite and survive restarts:
 bootstrap before MCP is operational. The gRPC API and mcproxyctl are the
 primary route management interfaces going forward.
 
-#### 6. MCP Agent: DNS Registration
+#### 6. MCP Agent: DNS Registration — DONE
 
-**Gap**: DNS records are manually configured in MCNS zone files.
-
-**Work**:
-- Agent creates/updates A records in MCNS for
-  `.svc.mcp.metacircular.net`.
-- Agent removes records on service teardown.
-
-**Depends on**: MCNS record management API (#8).
+Agent automatically manages DNS records during deploy and stop:
+- Deploy: calls MCNS API to create/update A records for
+  `.svc.mcp.metacircular.net` pointing to the node's address.
+- Stop/undeploy: removes DNS records before stopping containers.
+- Config: `[mcns]` section in agent config with server URL, CA cert,
+  token path, zone, and node address.
+- Nil-safe: if MCNS not configured, silently skipped (backward compatible).
+- Authorization: mcp-agent system account can manage any record name.
 
 #### 7. Metacrypt: Automated Cert Issuance Policy — DONE
 
@@ -259,31 +259,29 @@ issuance:
 `*.svc.mcp.metacircular.net`
 - One cert per hostname per service — no wildcard certs
 
-#### 8. MCNS: Record Management API
+#### 8. MCNS: Record Management API — DONE
 
-**Gap**: MCNS v1.0.0 has REST + gRPC APIs and SQLite storage, but
-records are currently seeded from migrations (static). The API supports
-CRUD operations but MCP does not yet call it for dynamic registration.
-
-**Work**:
-- MCP agent calls MCNS API to create/update/delete records on
-  deploy/stop.
-- MCIAS auth scoping to allow MCP agent to manage
-  `*.svc.mcp.metacircular.net` records.
-
-**Depends on**: MCNS API exists. Remaining work is MCP integration
-and auth scoping.
+MCNS provides full CRUD for DNS records via REST and gRPC:
+- REST: POST/GET/PUT/DELETE on `/v1/zones/{zone}/records`
+- gRPC: RecordService with ListRecords, CreateRecord, GetRecord,
+  UpdateRecord, DeleteRecord RPCs
+- SQLite-backed with transactional writes, CNAME exclusivity enforcement,
+  and automatic SOA serial bumping on mutations
+- Authorization: admin can manage any record, mcp-agent system account
+  can manage any record name, other system accounts scoped to own name
+- MCP agent uses the REST API to register/deregister records on
+  deploy/stop
 
 #### 9. Application $PORT Convention — DONE
 
-mcdsl v1.2.0 adds `$PORT` and `$PORT_GRPC` env var support:
+mcdsl v1.2.0 added `$PORT` and `$PORT_GRPC` env var support:
 - `config.Load` checks `$PORT` → overrides `Server.ListenAddr`
 - `config.Load` checks `$PORT_GRPC` → overrides `Server.GRPCAddr`
 - Takes precedence over TOML and generic env overrides
   (`$MCR_SERVER_LISTEN_ADDR`) — agent-assigned ports are authoritative
 - Handles both `config.Base` embedding (MCR, MCNS, MCAT) and direct
   `ServerConfig` embedding (Metacrypt) via struct tree walking
-- All consuming services upgraded to mcdsl v1.2.0
+- All consuming services on mcdsl v1.4.0
 
 ---
 
@@ -306,26 +304,85 @@ Phase C — Automated TLS: ✓ COMPLETE
   #4 Agent provisions certs ✓ DONE
       (depends on #7)
 
-Phase D — DNS:
-  #8 MCNS record management API
-  #6 Agent registers DNS
+Phase D — DNS: ✓ COMPLETE
+  #8 MCNS record management API ✓ DONE
+  #6 Agent registers DNS ✓ DONE
       (depends on #8)
+
+Phase E — Multi-node agent management:
+  #10 Agent binary at /srv/mcp/mcp-agent on all nodes
+  #11 mcp agent upgrade (SSH-based cross-compiled push)
+  #12 Node provisioning tooling (Debian + NixOS)
+      (depends on #10)
 ```
 
-**Phases A, B, and C are complete.** Services can be deployed with
+**Phases A, B, C, and D are complete.** Services can be deployed with
 agent-assigned ports, `$PORT` env vars, automatic mc-proxy route
-registration, and automated TLS cert provisioning from Metacrypt CA.
-No more manual port picking, mcproxyctl, TOML editing, or cert generation.
-
-The only remaining manual step is DNS registration (Phase D).
+registration, automated TLS cert provisioning from Metacrypt CA, and
+automatic DNS registration in MCNS. No more manual port picking,
+mcproxyctl, TOML editing, cert generation, or DNS zone editing.
 
 ### Immediate Next Steps
 
-1. **Phase D: DNS** — MCNS record management API integration, then
-   agent registers DNS records during deploy.
+1. **Phase E: Multi-node agent management** — see below.
 2. **mcdoc implementation** — fully designed, no platform evolution
    dependency. Deployable now with the new route system.
 
+#### 10. Agent Binary Location Convention
+
+**Gap**: The agent binary is currently NixOS-managed on rift (lives in
+`/nix/store/`, systemd `ExecStart` points there). This doesn't work for
+Debian nodes and requires a full `nixos-rebuild` for every MCP release.
+
+**Work**:
+- Standardize agent binary at `/srv/mcp/mcp-agent` on all nodes.
+- NixOS config: change `ExecStart` from nix store path to
+  `/srv/mcp/mcp-agent`. NixOS still owns user, systemd unit, podman,
+  directories — just not the binary version.
+- Debian nodes: same layout, provisioned by setup script.
+
+#### 11. Agent Upgrade via SSH Push
+
+**Gap**: Updating the agent requires manual, OS-specific steps. On
+NixOS: update flake lock, commit, push, rebuild. On Debian: build, scp,
+restart. With multiple nodes and architectures (amd64 + arm64), this
+doesn't scale.
+
+**Work**:
+- `mcp agent upgrade [node]` CLI command.
+- Cross-compiles agent for each target arch (`GOARCH` from node config).
+- Uses `golang.org/x/crypto/ssh` to push the binary and restart the
+  service. No external tool dependencies.
+- Node config gains `ssh` (hostname) and `arch` (GOARCH) fields.
+- Upgrades all nodes by default to prevent version skew. New RPCs cause
+  `Unimplemented` errors if agent and CLI are out of sync.
+
+**Depends on**: #10 (binary location convention).
+
+#### 12. Node Provisioning Tooling
+
+**Gap**: Setting up a new node requires manual steps: create user,
+create directories, install podman, write config, create systemd unit.
+Different for NixOS vs Debian.
+
+**Work**:
+- Go-based provisioning tool (part of MCP CLI) or standalone script.
+- `mcp node provision ` SSHs to the node and runs setup:
+  create `mcp` user with podman access, create `/srv/mcp/`, write
+  systemd unit, install initial binary, start service.
+- For NixOS, provisioning remains in the NixOS config (declarative).
+  The provisioning tool targets Debian/generic Linux.
+
+**Depends on**: #10 (binary location convention), #11 (SSH infra).
+
+**Current fleet**:
+
+| Node | OS | Arch | Status |
+|------|----|------|--------|
+| rift | NixOS | amd64 | Operational, single MCP agent |
+| hyperborea | Debian (RPi) | arm64 | Online, needs agent provisioning |
+| svc | Debian | amd64 | Runs MCIAS, needs agent for public edge services |
+
 ---
 
 ## Open Questions
diff --git a/STATUS.md b/STATUS.md
index 2479bba..8cfe268 100644
--- a/STATUS.md
+++ b/STATUS.md
@@ -1,6 +1,6 @@
 # Metacircular Platform Status
 
-Last updated: 2026-03-27
+Last updated: 2026-03-28
 
 ## Platform Overview
 
@@ -8,28 +8,30 @@ One node operational (**rift**), running core infrastructure services as
 containers fronted by MC-Proxy. MCIAS runs separately (not on rift). Bootstrap
 phases 0–4 complete (MCIAS, Metacrypt, MC-Proxy, MCR all operational). MCP is
 deployed and managing all platform containers. MCNS is
-deployed on rift, serving authoritative DNS.
+deployed on rift, serving authoritative DNS. Platform evolution Phases A–D
+complete (automated port assignment, route registration, TLS cert
+provisioning, and DNS registration). Multi-node deployment is being planned
+(Phase E).
 
 ## Service Status
 
 | Service | Version | SDLC Phase | Deployed | Node |
 |---------|---------|------------|----------|------|
-| MCIAS | v1.8.0 | Maintenance | Yes | (separate) |
-| Metacrypt | v1.1.0 | Production | Yes | rift |
+| MCIAS | v1.9.0 | Maintenance | Yes | (separate) |
+| Metacrypt | v1.3.1 | Production | Yes | rift |
 | MC-Proxy | v1.2.1 | Maintenance | Yes | rift |
-| MCR | v1.2.0 | Production | Yes | rift |
-| MCAT | v1.1.0 | Complete | Unknown | — |
-| MCDSL | v1.2.0 | Stable | N/A (library) | — |
-| MCNS | v1.1.0 | Production | Yes | rift |
+| MCR | v1.2.1 | Production | Yes | rift |
+| MCAT | v1.1.1 | Complete | Unknown | — |
+| MCDSL | v1.4.0 | Stable | N/A (library) | — |
+| MCNS | v1.1.1 | Production | Yes | rift |
 | MCDoc | v0.1.0 | Production | Yes | rift |
-| MCP | v0.4.0 | Production | Yes | rift |
-| MCDeploy | v0.2.0 | Active dev | N/A (CLI tool) | — |
+| MCP | v0.7.6 | Production | Yes | rift |
 
 ## Service Details
 
 ### MCIAS — Identity and Access Service
 
-- **Version:** v1.8.0 (client library: clients/go/v0.2.0)
+- **Version:** v1.9.0 (client library: clients/go/v0.2.0)
 - **Phase:** Maintenance. Phases 0-14 complete. Feature-complete with active
   refinement.
 - **Deployment:** Running in production. All other services authenticate
@@ -41,7 +43,7 @@ deployed on rift, serving authoritative DNS.
 
 ### Metacrypt — Cryptographic Service Engine
 
-- **Version:** v1.1.0.
+- **Version:** v1.3.1.
 - **Phase:** Production. All four engine types implemented (CA, SSH CA,
   transit, user-to-user). Active work on integration test coverage.
 - **Deployment:** Running on rift as a container, fronted by MC-Proxy on
@@ -56,7 +58,8 @@ deployed on rift, serving authoritative DNS.
 - **Version:** v1.2.1.
 - **Phase:** Maintenance. Stable and actively routing traffic on rift.
 - **Deployment:** Running on rift. Fronts Metacrypt, MCR, and sgard on ports
-  443, 8443, and 9443. Prometheus metrics on 127.0.0.1:9091.
+  443, 8443, and 9443. Prometheus metrics on 127.0.0.1:9091. Routes persisted
+  in SQLite and managed via gRPC API.
 - **Recent work:** Route persistence (SQLite), idempotent AddRoute (upsert),
   golangci-lint v2 compliance, module path migration to mc/ org.
 - **Artifacts:** systemd units (service + backup timer), Docker Compose
@@ -64,7 +67,7 @@ deployed on rift, serving authoritative DNS.
 
 ### MCR — Container Registry
 
-- **Version:** v1.2.0. All implementation phases complete.
+- **Version:** v1.2.1. All implementation phases complete.
 - **Phase:** Production. Deployed on rift, serving container images.
 - **Deployment:** Running on rift as two containers (mcr API + mcr-web),
   fronted by MC-Proxy on ports 443 (web, L7), 8443 (API, L4), and
@@ -77,7 +80,7 @@ deployed on rift, serving authoritative DNS.
 
 ### MCAT — Login Policy Tester
 
-- **Version:** v1.1.0.
+- **Version:** v1.1.1.
 - **Phase:** Complete. Diagnostic tool, not core infrastructure.
 - **Deployment:** Available for ad-hoc use. Lightweight tool for testing
   MCIAS login policy rules.
@@ -86,20 +89,21 @@ deployed on rift, serving authoritative DNS.
 
 ### MCDSL — Standard Library
 
-- **Version:** v1.2.0.
+- **Version:** v1.4.0.
 - **Phase:** Stable. All 9 packages implemented and tested. Being adopted
   across the platform.
 - **Deployment:** N/A (Go library, imported by other services).
 - **Packages:** auth, db, config, httpserver, grpcserver, csrf, web,
   health, archive.
-- **Adoption:** All services except mcias on v1.2.0. mcias pending.
+- **Adoption:** All services except mcias on v1.4.0. mcias pending.
 
 ### MCNS — Networking Service
 
-- **Version:** v1.1.0.
+- **Version:** v1.1.1.
 - **Phase:** Production. Custom Go DNS server replacing CoreDNS precursor.
 - **Deployment:** Running on rift as a container managed by MCP. Serves two
-  authoritative zones plus upstream forwarding.
+  authoritative zones plus upstream forwarding. REST + gRPC APIs with MCIAS
+  auth and name-scoped system account authorization.
 - **Recent work:** v1.0.0 implementation (custom Go DNS server), engineering
   review, deployed to rift replacing CoreDNS.
 - **Artifacts:** Dockerfile, Docker Compose (rift), MCP service definition,
@@ -117,34 +121,24 @@ deployed on rift, serving authoritative DNS.
 
 ### MCP — Control Plane
 
-- **Version:** v0.4.0.
-- **Phase:** Production. Phases 0-4 complete. Phase C (automated TLS cert
-  provisioning) implemented. Deployed to rift, managing all platform containers.
+- **Version:** v0.7.6.
+- **Phase:** Production. Phases A–D complete. Deployed to rift, managing all
+  platform containers.
 - **Deployment:** Running on rift. Agent as systemd service under `mcp` user
-  with rootless podman. Manages metacrypt, mc-proxy, mcr, and mcns containers.
+  with rootless podman. Manages metacrypt, mc-proxy, mcr, mcns, and mcdoc
+  containers.
 - **Architecture:** Two components — `mcp` CLI (thin client on vade) and
   `mcp-agent` (per-node daemon with SQLite registry, podman management,
-  monitoring with drift/flap detection, route registration with mc-proxy during
-  deploy/stop, automated TLS cert provisioning for L7 routes via Metacrypt CA).
-  gRPC-only (no REST).
-- **Recent work:** Full v1 implementation (12 RPCs, 15 CLI commands),
-  deployment to rift, container migration from kyle→mcp user, service
-  definition authoring. Phase C automated TLS cert provisioning for L7 routes,
-  mc-proxy route registration during deploy, mc-proxy dependency updated to
-  v1.2.0, module path migration.
+  monitoring with drift/flap detection, route registration with mc-proxy,
+  automated TLS cert provisioning for L7 routes via Metacrypt CA, automated
+  DNS registration in MCNS). gRPC-only (no REST). 15 RPCs, 17+ CLI commands.
+- **Recent work:** Phase C (automated TLS cert provisioning), Phase D
+  (automated DNS registration via MCNS), undeploy command, logs command,
+  edit command, auto-login to MCR, system account auth model, module path
+  migration.
 - **Artifacts:** systemd service (NixOS), TLS cert from Metacrypt, service
   definition files, design docs.
 
-### MCDeploy — Deployment CLI
-
-- **Version:** v0.2.0.
-- **Phase:** Active development. Tactical bridge tool for deploying services
-  while MCP is being built.
-- **Deployment:** N/A (local CLI tool, not a server).
-- **Recent work:** Initial implementation, Nix flake.
-- **Description:** Single-binary CLI that shells out to podman/ssh/scp/git
-  for build, push, deploy, cert renewal, and status. TOML-configured.
-
 ## Node Inventory
 
 | Node | Address (LAN) | Address (Tailscale) | Role |
@@ -153,10 +147,14 @@ deployed on rift, serving authoritative DNS.
 
 ## Rift Port Map
 
+Note: Services deployed via MCP receive dynamically assigned host ports
+(10000–60000). The ports below are for infrastructure services with static
+assignments or well-known ports.
+
 | Port | Protocol | Services |
 |------|----------|----------|
 | 53 | DNS (LAN + Tailscale) | mcns |
-| 443 | L7 (TLS termination) | metacrypt-web, mcr-web |
+| 443 | L7 (TLS termination) | metacrypt-web, mcr-web, mcdoc |
 | 8080 | HTTP (all interfaces) | exod |
 | 8443 | L4 (SNI passthrough) | metacrypt API, mcr API |
 | 9090 | HTTP (all interfaces) | exod |
diff --git a/docs/packaging-and-deployment.md b/docs/packaging-and-deployment.md
index 9142eef..5cd2eb0 100644
--- a/docs/packaging-and-deployment.md
+++ b/docs/packaging-and-deployment.md
@@ -608,6 +608,74 @@ Services follow a standard directory structure:
 
 ---
 
+## 10. Agent Management
+
+MCP manages a fleet of nodes with heterogeneous operating systems and
+architectures. The agent binary lives at `/srv/mcp/mcp-agent` on every
+node — this is a mutable path that MCP controls, regardless of whether
+the node runs NixOS or Debian.
+
+### Node Configuration
+
+Each node in `~/.config/mcp/mcp.toml` includes SSH and architecture
+info for agent management:
+
+```toml
+[[nodes]]
+name = "rift"
+address = "100.95.252.120:9444"
+ssh = "rift"
+arch = "amd64"
+
+[[nodes]]
+name = "hyperborea"
+address = "100.x.x.x:9444"
+ssh = "hyperborea"
+arch = "arm64"
+```
+
+### Upgrading Agents
+
+After tagging a new MCP release:
+
+```bash
+# Upgrade all nodes (recommended — prevents version skew)
+mcp agent upgrade
+
+# Upgrade a single node
+mcp agent upgrade rift
+
+# Check versions across the fleet
+mcp agent status
+```
+
+`mcp agent upgrade` cross-compiles the agent binary for each target
+architecture, SSHs to each node, atomically replaces the binary, and
+restarts the systemd service. All nodes should be upgraded together
+because new CLI versions often depend on new agent RPCs.
+
+### Provisioning New Nodes
+
+One-time setup for a new Debian node:
+
+```bash
+# 1. Provision the node (creates user, dirs, systemd unit, installs binary)
+mcp node provision
+
+# 2. Register the node
+mcp node add
+
+# 3. Deploy services
+mcp deploy
+
+```
+
+For NixOS nodes, provisioning is handled by the NixOS configuration.
+The NixOS config creates the `mcp` user, systemd unit, and directories.
+The `ExecStart` path points to `/srv/mcp/mcp-agent` so that `mcp agent
+upgrade` works the same as on Debian nodes.
+
+---
+
 ## Appendix: Currently Deployed Services
 
 For reference, these services are operational on the platform: