Update PROGRESS_V1.md with deployment status and remaining work
Documents Phase 6 (deployment), bugs fixed during rollout, remaining work organized by priority (operational, quality, design, infrastructure), and current platform state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -48,4 +48,88 @@
- [ ] **P5.1** Integration test suite
- [ ] **P5.2** Bootstrap procedure test
- [ ] **P5.3** Documentation (CLAUDE.md, README.md, RUNBOOK.md)
- [x] **P5.3** Documentation — CLAUDE.md done; README.md and RUNBOOK.md pending

## Phase 6: Deployment (completed 2026-03-26)

- [x] **P6.1** NixOS config for mcp user (rootless podman, subuid/subgid, systemd service)
- [x] **P6.2** TLS cert provisioned from Metacrypt (DNS + IP SANs)
- [x] **P6.3** MCIAS system account (mcp-agent with admin role)
- [x] **P6.4** Container migration (metacrypt, mc-proxy, mcr, mcns → mcp user)
- [x] **P6.5** MCP bootstrap (adopt, sync, export service definitions)
- [x] **P6.6** Service definitions completed with full container specs

## Deployment Bugs Fixed During Rollout

- podman ps JSON: `Command` field is `[]string` not `string`
- Container name handling: `splitContainerName` naive split broke `mc-proxy`
  → extracted `ContainerNameFor`/`SplitContainerName` with registry-aware lookup
- CLI default config path: `~/.config/mcp/mcp.toml`
- Token file whitespace: trim newlines before sending in gRPC metadata
- NixOS systemd sandbox: `ProtectHome` blocks `/run/user`, `ProtectSystem=strict`
  blocks podman runtime dir → relaxed to `ProtectSystem=full`, `ProtectHome=false`
- Agent needs `PATH`, `HOME`, `XDG_RUNTIME_DIR` in systemd environment

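The `Command` fix in the list above can be illustrated with a minimal Go sketch. It assumes the agent decodes `podman ps --format json` with `encoding/json`; the `psEntry` type, the `parsePS` helper, and the sample JSON are hypothetical and cover only a subset of the fields podman emits.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// psEntry models a subset of one element of `podman ps --format json`.
// Command is a slice because podman emits the container argv as a JSON array.
// (Hypothetical type for illustration, not the agent's actual struct.)
type psEntry struct {
	ID      string   `json:"Id"`
	Image   string   `json:"Image"`
	Names   []string `json:"Names"`
	Command []string `json:"Command"`
	State   string   `json:"State"`
}

// parsePS decodes the JSON array printed by `podman ps --format json`.
func parsePS(data []byte) ([]psEntry, error) {
	var entries []psEntry
	if err := json.Unmarshal(data, &entries); err != nil {
		return nil, fmt.Errorf("decode podman ps output: %w", err)
	}
	return entries, nil
}

func main() {
	// Made-up sample; real output carries many more fields.
	sample := []byte(`[{"Id":"abc123","Image":"example/mc-proxy:latest",` +
		`"Names":["mc-proxy"],"Command":["mc-proxy","server"],"State":"running"}]`)
	entries, err := parsePS(sample)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Println(e.Names, strings.Join(e.Command, " "), e.State)
	}
}
```

Declaring `Command` as `string` would make `json.Unmarshal` return a type error on this input, which is presumably how the bug surfaced.
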
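One plausible reading of the "registry-aware lookup" fix, sketched in Go. The naming convention (`<service>-<component>`, collapsing to the bare service name when both are equal) is an assumption made for illustration, and the lowercase `containerNameFor` and `splitContainerName` below are hypothetical stand-ins, not the extracted functions themselves.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// containerNameFor builds a container name under the ASSUMED convention
// "<service>-<component>", collapsing to "<service>" when both are equal.
func containerNameFor(service, component string) string {
	if service == component {
		return service
	}
	return service + "-" + component
}

// splitContainerName resolves a container name against the known service
// names instead of splitting on the first "-", so a service such as
// "mc-proxy" is not cut into "mc" and "proxy". Longer names are tried first.
func splitContainerName(name string, services []string) (service, component string, ok bool) {
	sorted := append([]string(nil), services...)
	sort.Slice(sorted, func(i, j int) bool { return len(sorted[i]) > len(sorted[j]) })
	for _, svc := range sorted {
		if name == svc {
			return svc, svc, true
		}
		if strings.HasPrefix(name, svc+"-") {
			return svc, name[len(svc)+1:], true
		}
	}
	return "", "", false
}

func main() {
	services := []string{"metacrypt", "mc-proxy", "mcr", "mcns"}
	for _, name := range []string{"mc-proxy", "mcr-api", "metacrypt-web"} {
		svc, comp, ok := splitContainerName(name, services)
		fmt.Println(name, svc, comp, ok)
	}
}
```

A container whose name matches no registered service yields `ok == false`, which lets the caller treat it as unmanaged rather than guess at a split.
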
## Remaining Work

### Operational — Next Priority

- [ ] **MCR auth for mcp user** — podman pull from MCR requires OCI token
  auth. Currently using image save/load workaround. Need either: OCI token
  flow support in the agent, or podman login with service account credentials.
- [ ] **Vade DNS routing** — Tailscale MagicDNS intercepts `*.svc.mcp.metacircular.net`
  queries on vade, preventing hostname-based TLS connections. CLI currently
  uses IP address directly. Fix: Tailscale DNS configuration or split-horizon
  setup on vade.
- [ ] **Service export completeness** — `mcp service export` only captures
  name + image from the registry. Should include full spec (network, ports,
  volumes, user, restart, cmd). Requires the agent's `ListServices` response
  to include full `ComponentSpec` data, not just `ComponentInfo`.

### Quality

- [ ] **P5.1** Integration test suite — end-to-end CLI → agent → podman tests
- [ ] **P5.2** Bootstrap procedure test — documented and verified
- [ ] **README.md** — quick-start guide
- [ ] **RUNBOOK.md** — operational procedures (unseal metacrypt, restart
  services, disaster recovery)

### Design

- [ ] **Self-management** — how MCP updates mc-proxy and its own agent without
  circular dependency. Likely answer: NixOS manages the agent and mc-proxy
  binaries; MCP manages their containers. Or: staged restart with health
  checks.
- [ ] **ARCHITECTURE.md proto naming** — update spec to match buf-lint-compliant
  message names (StopServiceRequest vs ServiceRequest, AdoptContainers vs
  AdoptContainer).
- [ ] **mcdsl DefaultPath helper** — `DefaultPath(name) string` for consistent
  config file discovery across all services. Root: /srv, /etc. User: XDG, /srv.
- [ ] **Engineering standards update** — document REST+gRPC parity exception
  for infrastructure services (MCP agent).

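A rough Go sketch of what the `DefaultPath` item above might look like. Only the search order (root: /srv then /etc; user: XDG then /srv) comes from the notes; the `<dir>/<name>/<name>.toml` layout, the `candidatePaths` helper, and the fallback to the first candidate are assumptions, chosen so that a user lookup for `mcp` yields `~/.config/mcp/mcp.toml`.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// candidatePaths returns the config search order for a service.
// The "<dir>/<name>/<name>.toml" layout is an ASSUMPTION for illustration.
func candidatePaths(name string, isRoot bool, home, xdgConfigHome string) []string {
	file := name + ".toml"
	srv := filepath.Join("/srv", name, file)
	if isRoot {
		// Root: /srv first, then /etc.
		return []string{srv, filepath.Join("/etc", name, file)}
	}
	// User: XDG config dir first ($XDG_CONFIG_HOME or ~/.config), then /srv.
	if xdgConfigHome == "" {
		xdgConfigHome = filepath.Join(home, ".config")
	}
	return []string{filepath.Join(xdgConfigHome, name, file), srv}
}

// defaultPath returns the first candidate that exists on disk, falling back
// to the first candidate so callers can report a sensible "not found" path.
func defaultPath(name string) string {
	home, _ := os.UserHomeDir() // empty on error; the path then becomes relative
	paths := candidatePaths(name, os.Geteuid() == 0, home, os.Getenv("XDG_CONFIG_HOME"))
	for _, p := range paths {
		if _, err := os.Stat(p); err == nil {
			return p
		}
	}
	return paths[0]
}

func main() {
	fmt.Println(defaultPath("mcp"))
}
```

Keeping the ordering in a pure function separate from the filesystem probe makes the search order testable without touching /srv or /etc.
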
### Infrastructure

- [ ] **Certificate renewal** — MCP-managed cert renewal before expiry.
  Agent cert expires 2026-06-24. Need automated renewal via Metacrypt ACME
  or REST API.
- [ ] **Monitor alerting** — configure alert_command on rift (ntfy, webhook,
  or custom script) for drift/flap notifications.
- [ ] **Backup timer** — install mcp-agent-backup timer via NixOS config.

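For the certificate renewal item above, the "renew before expiry" decision could be sketched in Go as follows. The 30-day window and both helper names are assumptions; a real agent would presumably read the expiry from the parsed certificate's `NotAfter` field rather than from a date string, and the call to Metacrypt is out of scope here.

```go
package main

import (
	"fmt"
	"time"
)

// renewalDue reports whether now is inside the renewal window, that is,
// at or after notAfter minus window (including after expiry).
func renewalDue(notAfter, now time.Time, window time.Duration) bool {
	return !now.Before(notAfter.Add(-window))
}

// renewalDueOn is a convenience wrapper taking YYYY-MM-DD dates (UTC).
// Hypothetical helper for illustration only.
func renewalDueOn(notAfter, today string, windowDays int) (bool, error) {
	const layout = "2006-01-02"
	exp, err := time.Parse(layout, notAfter)
	if err != nil {
		return false, fmt.Errorf("parse expiry date: %w", err)
	}
	now, err := time.Parse(layout, today)
	if err != nil {
		return false, fmt.Errorf("parse current date: %w", err)
	}
	return renewalDue(exp, now, time.Duration(windowDays)*24*time.Hour), nil
}

func main() {
	// Expiry date taken from the notes; the 30-day window is an assumption.
	for _, today := range []string{"2026-03-26", "2026-05-25"} {
		due, err := renewalDueOn("2026-06-24", today, 30)
		if err != nil {
			panic(err)
		}
		fmt.Println(today, due)
	}
}
```

With these inputs the check is false on 2026-03-26 and becomes true on 2026-05-25, thirty days before the stated expiry.
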
## Current State (2026-03-26)

MCP is deployed and operational on rift. The agent runs as a systemd service
under the `mcp` user with rootless podman. All platform services (metacrypt,
mc-proxy, mcr, mcns) are managed by MCP with complete service definitions.

```
$ mcp status
SERVICE    COMPONENT  DESIRED  OBSERVED  VERSION
mc-proxy   mc-proxy   running  running   latest
mcns       coredns    running  running   1.12.1
mcr        api        running  running   latest
mcr        web        running  running   latest
metacrypt  api        running  running   latest
metacrypt  web        running  running   latest
```