mcp/PROGRESS_V1.md
Kyle Isom ea8a42a696 P5.2 + P5.3: Bootstrap docs, README, and RUNBOOK
- docs/bootstrap.md: step-by-step bootstrap procedure with lessons
  learned from the first deployment (NixOS sandbox issues, podman
  rootless setup, container naming, MCR auth workaround)
- README.md: quick-start guide, command reference, doc links
- RUNBOOK.md: operational procedures for operators (health checks,
  common operations, unsealing metacrypt, cert renewal, incident
  response, disaster recovery, file locations)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 15:32:22 -07:00


# MCP v1 Progress
## Phase 0: Project Scaffolding
- [x] **P0.1** Repository and module setup
- [x] **P0.2** Proto definitions and code generation
## Phase 1: Core Libraries
- [x] **P1.1** Registry package (`internal/registry/`)
- [x] **P1.2** Runtime package (`internal/runtime/`)
- [x] **P1.3** Service definition package (`internal/servicedef/`)
- [x] **P1.4** Config package (`internal/config/`)
- [x] **P1.5** Auth package (`internal/auth/`)
## Phase 2: Agent
- [x] **P2.1** Agent skeleton and gRPC server
- [x] **P2.2** Deploy handler
- [x] **P2.3** Lifecycle handlers (stop, start, restart)
- [x] **P2.4** Status handlers (list, live check, get status)
- [x] **P2.5** Sync handler
- [x] **P2.6** File transfer handlers
- [x] **P2.7** Adopt handler
- [x] **P2.8** Monitor subsystem
- [x] **P2.9** Snapshot command
## Phase 3: CLI
- [x] **P3.1** CLI skeleton
- [x] **P3.2** Login command
- [x] **P3.3** Deploy command
- [x] **P3.4** Lifecycle commands (stop, start, restart)
- [x] **P3.5** Status commands (list, ps, status)
- [x] **P3.6** Sync command
- [x] **P3.7** Adopt command
- [x] **P3.8** Service commands (show, edit, export)
- [x] **P3.9** Transfer commands (push, pull)
- [x] **P3.10** Node commands
## Phase 4: Deployment Artifacts
- [x] **P4.1** Systemd units
- [x] **P4.2** Example configs
- [x] **P4.3** Install script
## Phase 5: Integration and Polish
- [ ] **P5.1** Integration test suite
- [x] **P5.2** Bootstrap procedure — documented in `docs/bootstrap.md`
- [x] **P5.3** Documentation — CLAUDE.md, README.md, RUNBOOK.md
## Phase 6: Deployment (completed 2026-03-26)
- [x] **P6.1** NixOS config for mcp user (rootless podman, subuid/subgid, systemd service)
- [x] **P6.2** TLS cert provisioned from Metacrypt (DNS + IP SANs)
- [x] **P6.3** MCIAS system account (mcp-agent with admin role)
- [x] **P6.4** Container migration (metacrypt, mc-proxy, mcr, mcns → mcp user)
- [x] **P6.5** MCP bootstrap (adopt, sync, export service definitions)
- [x] **P6.6** Service definitions completed with full container specs
## Deployment Bugs Fixed During Rollout
- podman ps JSON: `Command` field is `[]string`, not `string`
- Container name handling: the naive split in `splitContainerName` broke hyphenated
service names such as `mc-proxy`
→ extracted `ContainerNameFor`/`SplitContainerName` with registry-aware lookup
- CLI default config path: `~/.config/mcp/mcp.toml`
- Token file whitespace: trim newlines before sending in gRPC metadata
- NixOS systemd sandbox: `ProtectHome` blocks `/run/user`, `ProtectSystem=strict`
blocks podman runtime dir → relaxed to `ProtectSystem=full`, `ProtectHome=false`
- Agent needs `PATH`, `HOME`, `XDG_RUNTIME_DIR` in systemd environment
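The `Command` bug above comes down to the Go type used when decoding `podman ps --format json`. Below is a minimal sketch of a decoder that declares `Command` as a slice. The `psEntry` type and `parsePodmanPS` name are hypothetical, are not taken from `internal/runtime/`, and model only a few of podman's fields:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// psEntry is a hypothetical, trimmed-down view of one element of the JSON
// array printed by `podman ps --format json`. Fields not declared here are
// ignored by encoding/json.
type psEntry struct {
	Names   []string `json:"Names"`
	Image   string   `json:"Image"`
	Command []string `json:"Command"` // podman emits an array here, not a string
}

// parsePodmanPS decodes the raw output of `podman ps --format json`.
// Declaring Command as a string instead would make json.Unmarshal return an
// *json.UnmarshalTypeError whenever podman emits an array for it.
func parsePodmanPS(data []byte) ([]psEntry, error) {
	var entries []psEntry
	if err := json.Unmarshal(data, &entries); err != nil {
		return nil, fmt.Errorf("decode podman ps output: %w", err)
	}
	return entries, nil
}

func main() {
	// Sample input shaped like podman's output; the values are made up.
	raw := []byte(`[{"Names":["mc-proxy"],"Image":"example/mc-proxy:latest","Command":["/usr/bin/mc-proxy","serve"]}]`)
	entries, err := parsePodmanPS(raw)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Println(e.Names, e.Image, e.Command)
	}
}
```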
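A registry-aware split can be sketched by matching the longest known service name instead of splitting at the first hyphen. This sketch assumes a hypothetical `<service>-<component>` naming convention, collapsed to `<service>` when the component shares the service's name. MCP's actual convention and the real `ContainerNameFor`/`SplitContainerName` signatures may differ, so the helpers below deliberately use other names:

```go
package main

import (
	"fmt"
	"strings"
)

// containerNameFor builds a container name under a hypothetical convention:
// "<service>-<component>", or just "<service>" when the component has the
// same name as the service (as with mc-proxy/mc-proxy).
func containerNameFor(service, component string) string {
	if component == service {
		return service
	}
	return service + "-" + component
}

// splitWithRegistry reverses containerNameFor using the set of known service
// names. It picks the longest service that either equals the name or prefixes
// it followed by "-", so "mc-proxy" is not split into service "mc" and
// component "proxy" the way a naive split on the first hyphen would.
func splitWithRegistry(name string, services []string) (service, component string, ok bool) {
	best := ""
	for _, s := range services {
		if (name == s || strings.HasPrefix(name, s+"-")) && len(s) > len(best) {
			best = s
		}
	}
	switch {
	case best == "":
		return "", "", false
	case name == best:
		return best, best, true
	default:
		return best, strings.TrimPrefix(name, best+"-"), true
	}
}

func main() {
	services := []string{"metacrypt", "mc-proxy", "mcr", "mcns"}
	for _, name := range []string{"mc-proxy", "metacrypt-api", "mcr-web", "unknown"} {
		svc, comp, ok := splitWithRegistry(name, services)
		fmt.Printf("%-14s service=%q component=%q ok=%v\n", name, svc, comp, ok)
	}
}
```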
## Remaining Work
### Operational — Next Priority
- [ ] **MCR auth for mcp user** — `podman pull` from MCR requires OCI token
auth. Currently using an image save/load workaround. Needs either OCI token
flow support in the agent or `podman login` with service account credentials.
- [ ] **Vade DNS routing** — Tailscale MagicDNS intercepts `*.svc.mcp.metacircular.net`
queries on vade, preventing hostname-based TLS connections. CLI currently
uses IP address directly. Fix: Tailscale DNS configuration or split-horizon
setup on vade.
- [ ] **Service export completeness** — `mcp service export` only captures
name + image from the registry. It should include the full spec (network, ports,
volumes, user, restart, cmd), which requires the agent's `ListServices` response
to include full `ComponentSpec` data, not just `ComponentInfo`.
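For the OCI token flow option: under the Docker registry token authentication scheme, a registry answers an unauthenticated pull with `401` and a `WWW-Authenticate: Bearer realm=...,service=...,scope=...` challenge, and the client then requests a token from `realm`, passing `service` and `scope` as query parameters. Below is a minimal sketch of the challenge-parsing step, assuming MCR follows that scheme. The helper names and example hosts are hypothetical, and fetching and caching the token are left out:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseBearerChallenge extracts the parameters of a
// `WWW-Authenticate: Bearer ...` header. Quoted values may contain commas
// (e.g. a scope of "repository:name:pull,push"); backslash escapes inside
// quoted values are not handled.
func parseBearerChallenge(h string) (map[string]string, error) {
	const scheme = "bearer "
	if len(h) < len(scheme) || !strings.EqualFold(h[:len(scheme)], scheme) {
		return nil, fmt.Errorf("not a Bearer challenge: %q", h)
	}
	params := map[string]string{}
	rest := strings.TrimSpace(h[len(scheme):])
	for rest != "" {
		eq := strings.IndexByte(rest, '=')
		if eq < 0 {
			return nil, fmt.Errorf("malformed challenge parameter: %q", rest)
		}
		key := strings.ToLower(strings.TrimSpace(rest[:eq]))
		rest = strings.TrimSpace(rest[eq+1:])
		var val string
		if strings.HasPrefix(rest, `"`) {
			end := strings.IndexByte(rest[1:], '"')
			if end < 0 {
				return nil, fmt.Errorf("unterminated quoted value for %q", key)
			}
			val, rest = rest[1:1+end], rest[end+2:]
		} else {
			end := strings.IndexByte(rest, ',')
			if end < 0 {
				end = len(rest)
			}
			val, rest = strings.TrimSpace(rest[:end]), rest[end:]
		}
		params[key] = val
		// Drop the comma separating this parameter from the next one.
		rest = strings.TrimSpace(strings.TrimPrefix(strings.TrimSpace(rest), ","))
	}
	if params["realm"] == "" {
		return nil, fmt.Errorf("challenge has no realm")
	}
	return params, nil
}

// tokenURL builds the token request URL: the realm plus service and scope as
// query parameters. The caller would GET it, authenticating with the service
// account's credentials, and read the token from the JSON response.
func tokenURL(params map[string]string) (string, error) {
	u, err := url.Parse(params["realm"])
	if err != nil {
		return "", fmt.Errorf("parse realm: %w", err)
	}
	q := u.Query()
	for _, k := range []string{"service", "scope"} {
		if v, ok := params[k]; ok {
			q.Set(k, v)
		}
	}
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// Example challenge; the hosts and repository name are placeholders.
	h := `Bearer realm="https://auth.example/token",service="registry.example",scope="repository:metacrypt:pull"`
	params, err := parseBearerChallenge(h)
	if err != nil {
		panic(err)
	}
	u, err := tokenURL(params)
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
}
```

Quoted values are consumed as a unit because `scope` can contain commas, which a plain split on `,` would break apart.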
### Quality
- [ ] **P5.1** Integration test suite — end-to-end CLI → agent → podman tests
- [ ] **P5.2** Bootstrap procedure test — verify the procedure documented in `docs/bootstrap.md`
- [x] **README.md** — quick-start guide
- [x] **RUNBOOK.md** — operational procedures (unseal metacrypt, restart
services, disaster recovery)
### Design
- [ ] **Self-management** — how MCP updates mc-proxy and its own agent without
a circular dependency. Likely answer: NixOS manages the agent and mc-proxy
binaries; MCP manages their containers. Alternative: a staged restart with
health checks.
- [ ] **ARCHITECTURE.md proto naming** — update spec to match buf-lint-compliant
message names (StopServiceRequest vs ServiceRequest, AdoptContainers vs
AdoptContainer).
- [ ] **mcdsl DefaultPath helper** — `DefaultPath(name) string` for consistent
config file discovery across all services. Locations searched as root: `/srv`, `/etc`;
as a regular user: the XDG config directory, `/srv`.
- [ ] **Engineering standards update** — document REST+gRPC parity exception
for infrastructure services (MCP agent).
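Below is a minimal sketch of the proposed `DefaultPath` helper. It assumes the locations are searched in the order listed and that each service keeps its config at `<dir>/<name>/<name>.toml`. That layout is inferred from the CLI default `~/.config/mcp/mcp.toml`; the `/srv` and `/etc` layouts are assumptions. Path construction is split into a pure function so it can be tested without touching the filesystem:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// candidatePaths lists config locations in search order. As root: /srv, then
// /etc. As a regular user: the XDG config directory, then /srv. The
// "<dir>/<name>/<name>.toml" layout is an assumption.
func candidatePaths(name string, isRoot bool, xdgConfigHome, home string) []string {
	file := name + ".toml"
	if isRoot {
		return []string{
			filepath.Join("/srv", name, file),
			filepath.Join("/etc", name, file),
		}
	}
	xdg := xdgConfigHome
	if xdg == "" {
		// XDG Base Directory default when XDG_CONFIG_HOME is unset or empty.
		xdg = filepath.Join(home, ".config")
	}
	return []string{
		filepath.Join(xdg, name, file),
		filepath.Join("/srv", name, file),
	}
}

// DefaultPath returns the first candidate that exists, or the first candidate
// if none do, so callers still get a sensible path to report or create.
func DefaultPath(name string) string {
	home, _ := os.UserHomeDir()
	paths := candidatePaths(name, os.Geteuid() == 0, os.Getenv("XDG_CONFIG_HOME"), home)
	for _, p := range paths {
		if _, err := os.Stat(p); err == nil {
			return p
		}
	}
	return paths[0]
}

func main() {
	fmt.Println(DefaultPath("mcp"))
}
```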
### Infrastructure
- [ ] **Certificate renewal** — MCP-managed cert renewal before expiry.
Agent cert expires 2026-06-24. Need automated renewal via Metacrypt ACME
or REST API.
- [ ] **Monitor alerting** — configure `alert_command` on rift (ntfy, webhook,
or custom script) for drift/flap notifications.
- [ ] **Backup timer** — install mcp-agent-backup timer via NixOS config.
## Current State (2026-03-26)
MCP is deployed and operational on rift. The agent runs as a systemd service
under the `mcp` user with rootless podman. All platform services (metacrypt,
mc-proxy, mcr, mcns) are managed by MCP with complete service definitions.
```
$ mcp status
SERVICE    COMPONENT  DESIRED  OBSERVED  VERSION
mc-proxy   mc-proxy   running  running   latest
mcns       coredns    running  running   1.12.1
mcr        api        running  running   latest
mcr        web        running  running   latest
metacrypt  api        running  running   latest
metacrypt  web        running  running   latest
```
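The DESIRED and OBSERVED columns are what drift detection compares. As an operator-side sketch, for example a small check run from cron, the table can be parsed and filtered for mismatches. This assumes every column is a single whitespace-free token, as in the output above, and does not replace the agent's own monitor:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// statusRow mirrors one line of the `mcp status` table.
type statusRow struct {
	Service, Component, Desired, Observed, Version string
}

// parseStatus reads the whitespace-aligned table printed by `mcp status`,
// skipping blank lines, the header, and a leading "$ ..." prompt line.
func parseStatus(out string) ([]statusRow, error) {
	var rows []statusRow
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) == 0 || f[0] == "SERVICE" || f[0] == "$" {
			continue
		}
		if len(f) != 5 {
			return nil, fmt.Errorf("unexpected status line: %q", sc.Text())
		}
		rows = append(rows, statusRow{f[0], f[1], f[2], f[3], f[4]})
	}
	return rows, sc.Err()
}

// drifted returns the rows whose observed state differs from the desired one.
func drifted(rows []statusRow) []statusRow {
	var out []statusRow
	for _, r := range rows {
		if r.Desired != r.Observed {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	sample := `$ mcp status
SERVICE    COMPONENT  DESIRED  OBSERVED  VERSION
mc-proxy   mc-proxy   running  running   latest
mcns       coredns    running  running   1.12.1
`
	rows, err := parseStatus(sample)
	if err != nil {
		panic(err)
	}
	d := drifted(rows)
	fmt.Printf("%d components, %d drifted\n", len(rows), len(d))
	for _, r := range d {
		fmt.Printf("%s/%s: desired %s, observed %s\n", r.Service, r.Component, r.Desired, r.Observed)
	}
}
```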