From ff9bfc50873d81352865c4863c9ccb96445cb439 Mon Sep 17 00:00:00 2001
From: Kyle Isom
Date: Thu, 26 Mar 2026 15:27:30 -0700
Subject: [PATCH] Update PROGRESS_V1.md with deployment status and remaining work

Documents Phase 6 (deployment), bugs fixed during rollout, remaining
work organized by priority (operational, quality, design,
infrastructure), and current platform state.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 PROGRESS_V1.md | 86 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 85 insertions(+), 1 deletion(-)

diff --git a/PROGRESS_V1.md b/PROGRESS_V1.md
index 3df657f..11c2d6b 100644
--- a/PROGRESS_V1.md
+++ b/PROGRESS_V1.md
@@ -48,4 +48,88 @@
 
 - [ ] **P5.1** Integration test suite
 - [ ] **P5.2** Bootstrap procedure test
-- [ ] **P5.3** Documentation (CLAUDE.md, README.md, RUNBOOK.md)
+- [x] **P5.3** Documentation — CLAUDE.md done; README.md and RUNBOOK.md pending
+
+## Phase 6: Deployment (completed 2026-03-26)
+
+- [x] **P6.1** NixOS config for mcp user (rootless podman, subuid/subgid, systemd service)
+- [x] **P6.2** TLS cert provisioned from Metacrypt (DNS + IP SANs)
+- [x] **P6.3** MCIAS system account (mcp-agent with admin role)
+- [x] **P6.4** Container migration (metacrypt, mc-proxy, mcr, mcns → mcp user)
+- [x] **P6.5** MCP bootstrap (adopt, sync, export service definitions)
+- [x] **P6.6** Service definitions completed with full container specs
+
+## Deployment Bugs Fixed During Rollout
+
+- podman ps JSON: `Command` field is `[]string` not `string`
+- Container name handling: `splitContainerName` naive split broke `mc-proxy`
+  → extracted `ContainerNameFor`/`SplitContainerName` with registry-aware lookup
+- CLI default config path: `~/.config/mcp/mcp.toml`
+- Token file whitespace: trim newlines before sending in gRPC metadata
+- NixOS systemd sandbox: `ProtectHome` blocks `/run/user`, `ProtectSystem=strict`
+  blocks podman runtime dir → relaxed to `ProtectSystem=full`, `ProtectHome=false`
+- Agent needs `PATH`, `HOME`, `XDG_RUNTIME_DIR` in systemd environment
+
+## Remaining Work
+
+### Operational — Next Priority
+
+- [ ] **MCR auth for mcp user** — podman pull from MCR requires OCI token
+  auth. Currently using image save/load workaround. Need either: OCI token
+  flow support in the agent, or podman login with service account credentials.
+- [ ] **Vade DNS routing** — Tailscale MagicDNS intercepts `*.svc.mcp.metacircular.net`
+  queries on vade, preventing hostname-based TLS connections. CLI currently
+  uses IP address directly. Fix: Tailscale DNS configuration or split-horizon
+  setup on vade.
+- [ ] **Service export completeness** — `mcp service export` only captures
+  name + image from the registry. Should include full spec (network, ports,
+  volumes, user, restart, cmd). Requires the agent's `ListServices` response
+  to include full `ComponentSpec` data, not just `ComponentInfo`.
+
+### Quality
+
+- [ ] **P5.1** Integration test suite — end-to-end CLI → agent → podman tests
+- [ ] **P5.2** Bootstrap procedure test — documented and verified
+- [ ] **README.md** — quick-start guide
+- [ ] **RUNBOOK.md** — operational procedures (unseal metacrypt, restart
+  services, disaster recovery)
+
+### Design
+
+- [ ] **Self-management** — how MCP updates mc-proxy and its own agent without
+  circular dependency. Likely answer: NixOS manages the agent and mc-proxy
+  binaries; MCP manages their containers. Or: staged restart with health
+  checks.
+- [ ] **ARCHITECTURE.md proto naming** — update spec to match buf-lint-compliant
+  message names (StopServiceRequest vs ServiceRequest, AdoptContainers vs
+  AdoptContainer).
+- [ ] **mcdsl DefaultPath helper** — `DefaultPath(name) string` for consistent
+  config file discovery across all services. Root: /srv, /etc. User: XDG, /srv.
+- [ ] **Engineering standards update** — document REST+gRPC parity exception
+  for infrastructure services (MCP agent).
+
+### Infrastructure
+
+- [ ] **Certificate renewal** — MCP-managed cert renewal before expiry.
+  Agent cert expires 2026-06-24. Need automated renewal via Metacrypt ACME
+  or REST API.
+- [ ] **Monitor alerting** — configure alert_command on rift (ntfy, webhook,
+  or custom script) for drift/flap notifications.
+- [ ] **Backup timer** — install mcp-agent-backup timer via NixOS config.
+
+## Current State (2026-03-26)
+
+MCP is deployed and operational on rift. The agent runs as a systemd service
+under the `mcp` user with rootless podman. All platform services (metacrypt,
+mc-proxy, mcr, mcns) are managed by MCP with complete service definitions.
+
+```
+$ mcp status
+SERVICE    COMPONENT  DESIRED  OBSERVED  VERSION
+mc-proxy   mc-proxy   running  running   latest
+mcns       coredns    running  running   1.12.1
+mcr        api        running  running   latest
+mcr        web        running  running   latest
+metacrypt  api        running  running   latest
+metacrypt  web        running  running   latest
+```
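
Note on the `podman ps` JSON fix listed under "Deployment Bugs Fixed During Rollout": the sketch below shows one way the decoding could look in Go, with `Command` declared as `[]string`. It is not the project's actual code. The `psEntry` type, the `parsePodmanPS` helper, the sample JSON, and every field other than `Command` (including `Names` being an array) are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// psEntry models a small subset of one object in `podman ps --format json`.
// Only the shape of Command comes from the patch notes; the other fields
// are assumptions for this sketch.
type psEntry struct {
	Names   []string `json:"Names"`
	Image   string   `json:"Image"`
	Command []string `json:"Command"` // argv-style slice, not a single string
	State   string   `json:"State"`
}

// parsePodmanPS (hypothetical helper) decodes the JSON array of containers.
func parsePodmanPS(data []byte) ([]psEntry, error) {
	var entries []psEntry
	if err := json.Unmarshal(data, &entries); err != nil {
		return nil, fmt.Errorf("decode podman ps output: %w", err)
	}
	return entries, nil
}

func main() {
	// Hypothetical sample, not captured from a real host.
	sample := []byte(`[{"Names":["mc-proxy"],"Image":"example/mc-proxy:latest",` +
		`"Command":["mc-proxy","server"],"State":"running"}]`)

	entries, err := parsePodmanPS(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, e := range entries {
		fmt.Println(strings.Join(e.Names, ","), e.State, strings.Join(e.Command, " "))
	}
}
```

If `Command` were declared as `string`, `json.Unmarshal` would return a type error for array-valued input, which is consistent with the bug described in the patch.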