# MCP v1 Project Plan

## Overview

This plan breaks MCP v1 into discrete implementation tasks organized into phases. Tasks within a phase can often be parallelized. Dependencies between tasks are noted explicitly.

The critical path is: proto → registry + runtime → agent deploy handler → integration testing. Parallelizable work (CLI commands, monitoring, file transfer) can proceed alongside the critical path once the proto and core libraries are ready.

## Notation

- **[Pn]** = Phase n
- **[Pn.m]** = Task m in phase n
- **depends: [Px.y]** = must wait for that task
- **parallel: [Px.y, Px.z]** = can run alongside these tasks
- Each task includes scope, deliverables, and test criteria

---

## Phase 0: Project Scaffolding

One engineer. Serial. Establishes the project skeleton that everything else builds on.

### P0.1: Repository and module setup

**Scope:** Initialize the Go module, create the standard directory structure, and configure tooling.

**Deliverables:**

- `go.mod` with module path `git.wntrmute.dev/kyle/mcp`
- `Makefile` with standard targets (build, test, vet, lint, proto, proto-lint, clean, all)
- `.golangci.yaml` with platform-standard linter config
- `.gitignore`
- `CLAUDE.md` (project-specific AI context)
- Empty `cmd/mcp/main.go` and `cmd/mcp-agent/main.go` (compile to verify the skeleton works)

**Test criteria:** `make build` succeeds. `make vet` and `make lint` pass on the empty project.

### P0.2: Proto definitions and code generation

**Scope:** Write the `mcp.proto` file from the ARCHITECTURE.md spec and generate Go code.

**Depends:** P0.1

**Deliverables:**

- `proto/mcp/v1/mcp.proto` — full service definition from ARCHITECTURE.md
- `buf.yaml` configuration
- `gen/mcp/v1/` — generated Go code
- `make proto` and `make proto-lint` both pass

**Test criteria:** Generated code compiles. `buf lint` passes. All message types and RPC methods from the architecture doc are present.

---

## Phase 1: Core Libraries

Four independent packages.
**All can be built in parallel** once P0.2 is complete. Each package has a well-defined interface, no dependencies on other Phase 1 packages, and is fully testable in isolation.

### P1.1: Registry package (`internal/registry/`)

**Scope:** SQLite schema, migrations, and CRUD operations for the node-local registry.

**Depends:** P0.1

**Deliverables:**

- `db.go` — open database, run migrations, close. Schema from ARCHITECTURE.md (services, components, component_ports, component_volumes, component_cmd, events tables).
- `services.go` — create, get, list, update, delete services.
- `components.go` — create, get, list (by service), update desired/observed state, update spec, delete. Support filtering by desired_state.
- `events.go` — insert event, query events by component+service+time range, count events in window (for flap detection), prune old events.

**Test criteria:** Full test coverage using `t.TempDir()` + real SQLite. Tests cover:

- Schema migration is idempotent
- Service and component CRUD
- Desired/observed state updates
- Event insertion, time-range queries, pruning
- Foreign key cascading (delete service → components deleted)
- Component composite primary key (service, name) enforced

**parallel:** P1.2, P1.3, P1.4

### P1.2: Runtime package (`internal/runtime/`)

**Scope:** Container runtime abstraction with a podman implementation.

**Depends:** P0.1

**Deliverables:**

- `runtime.go` — `Runtime` interface:

  ```go
  type Runtime interface {
      Pull(ctx context.Context, image string) error
      Run(ctx context.Context, spec ContainerSpec) error
      Stop(ctx context.Context, name string) error
      Remove(ctx context.Context, name string) error
      Inspect(ctx context.Context, name string) (ContainerInfo, error)
      List(ctx context.Context) ([]ContainerInfo, error)
  }
  ```

  Plus `ContainerSpec` and `ContainerInfo` structs.

- `podman.go` — podman implementation. Builds command-line arguments from `ContainerSpec`, execs `podman` CLI, parses `podman inspect` JSON output.
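The argument-building half of `podman.go` can be kept as a pure function so P1.2's unit tests run without podman installed. A minimal sketch follows; the `ContainerSpec` fields and flag choices here are illustrative assumptions, not the ARCHITECTURE.md spec:

```go
package main

import "fmt"

// ContainerSpec is a hypothetical, trimmed-down version of the spec struct;
// the real field set comes from ARCHITECTURE.md.
type ContainerSpec struct {
	Service   string
	Component string
	Image     string
	Ports     []string // "host:container"
	Volumes   []string // "host:container"
}

// buildRunArgs translates a ContainerSpec into podman CLI arguments.
// This is the pure function unit tests can exercise directly.
func buildRunArgs(spec ContainerSpec) []string {
	// Container name follows the service-component convention.
	name := spec.Service + "-" + spec.Component
	args := []string{"run", "--detach", "--name", name}
	for _, p := range spec.Ports {
		args = append(args, "--publish", p)
	}
	for _, v := range spec.Volumes {
		args = append(args, "--volume", v)
	}
	// The image reference is always the final positional argument.
	return append(args, spec.Image)
}

func main() {
	spec := ContainerSpec{
		Service:   "metacrypt",
		Component: "api",
		Image:     "registry.example.com/metacrypt/api:v1.2.0",
		Ports:     []string{"8080:8080"},
	}
	fmt.Println(buildRunArgs(spec))
}
```

Keeping the exec step (`exec.CommandContext("podman", args...)`) separate from argument construction is what makes the flag-mapping table directly testable.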
**Test criteria:**

- Unit tests for command-line argument building (given a `ContainerSpec`, verify the constructed podman args are correct). These don't require podman to be installed.
- `ContainerSpec` → podman flag mapping matches the table in ARCHITECTURE.md.
- Container naming follows the `<service>-<component>` convention.
- Version extraction from image tag works (e.g., `registry/img:v1.2.0` → `v1.2.0`, `registry/img:latest` → `latest`, `registry/img` → `""`).

**parallel:** P1.1, P1.3, P1.4

### P1.3: Service definition package (`internal/servicedef/`)

**Scope:** Parse, validate, and write TOML service definition files.

**Depends:** P0.1

**Deliverables:**

- `servicedef.go` — `Load(path) → ServiceDef`, `Write(path, ServiceDef)`, `LoadAll(dir) → []ServiceDef`. Validation: required fields (name, node, at least one component), component names unique within service. Converts between the TOML representation and the proto `ServiceSpec`.

**Test criteria:**

- Round-trip: write a ServiceDef, read it back, verify equality
- Validation rejects missing name, missing node, empty components, duplicate component names
- `LoadAll` loads all `.toml` files from a directory
- `active` field defaults to `true` if omitted
- Conversion to/from proto `ServiceSpec` is correct

**parallel:** P1.1, P1.2, P1.4

### P1.4: Config package (`internal/config/`)

**Scope:** Load and validate CLI and agent configuration from TOML files.

**Depends:** P0.1

**Deliverables:**

- `cli.go` — CLI config struct: services dir, MCIAS settings, auth (token path, optional username/password_file), nodes list. Load from TOML with env var overrides (`MCP_*`). Validate required fields.
- `agent.go` — Agent config struct: server (grpc_addr, tls_cert, tls_key), database path, MCIAS settings, agent (node_name, container_runtime), monitor settings, log level. Load from TOML with env var overrides (`MCP_AGENT_*`). Validate required fields.
**Test criteria:**

- Load from TOML file, verify all fields populated
- Required field validation (reject missing grpc_addr, missing tls_cert, etc.)
- Env var overrides work
- Nodes list parses correctly from `[[nodes]]`

**parallel:** P1.1, P1.2, P1.3

### P1.5: Auth package (`internal/auth/`)

**Scope:** MCIAS token validation for the agent, and token acquisition for the CLI.

**Depends:** P0.1, P0.2 (uses proto-generated types for gRPC interceptor)

**Deliverables:**

- `auth.go`:
  - `Interceptor` — gRPC unary server interceptor that extracts bearer tokens, validates against MCIAS (with 30s SHA-256-keyed cache), checks admin role, audit-logs every RPC (method, caller, timestamp). Returns UNAUTHENTICATED or PERMISSION_DENIED on failure.
  - `Login(url, username, password) → token` — authenticate to MCIAS, return bearer token.
  - `LoadToken(path) → token` — read cached token from file.
  - `SaveToken(path, token)` — write token to file with 0600 permissions.

**Test criteria:**

- Interceptor rejects missing token (UNAUTHENTICATED)
- Interceptor rejects invalid token (UNAUTHENTICATED)
- Interceptor rejects non-admin token (PERMISSION_DENIED)
- Token caching works (same token within 30s returns cached result)
- Token file read/write with correct permissions
- Audit log entry emitted on every RPC (check slog output)

**Note:** Full interceptor testing requires an MCIAS mock or test instance. Unit tests can mock the MCIAS validation call. Integration tests against a real MCIAS instance are a Phase 4 concern.

**parallel:** P1.1, P1.2, P1.3, P1.4 (partially; needs P0.2 for proto types)

---

## Phase 2: Agent

The agent is the core of MCP. Tasks in this phase build on Phase 1 libraries. Some tasks can be parallelized; dependencies are noted.

### P2.1: Agent skeleton and gRPC server

**Scope:** Wire up the agent binary: config loading, database setup, gRPC server with TLS and auth interceptor, graceful shutdown.
**Depends:** P0.2, P1.1, P1.4, P1.5

**Deliverables:**

- `cmd/mcp-agent/main.go` — cobra root command, `server` subcommand
- `internal/agent/agent.go` — Agent struct holding registry, runtime, config. Initializes database, starts gRPC server with TLS and auth interceptor, handles SIGINT/SIGTERM for graceful shutdown.
- Agent starts, listens on configured address, rejects unauthenticated RPCs, shuts down cleanly.

**Test criteria:** Agent starts with a test config, accepts TLS connections, rejects RPCs without a valid token. Graceful shutdown closes the database and stops the listener.

### P2.2: Deploy handler

**Scope:** Implement the `Deploy` RPC on the agent.

**Depends:** P2.1, P1.2

**Deliverables:**

- `internal/agent/deploy.go` — handles DeployRequest: records spec in registry, iterates components, calls runtime (pull, stop, remove, run, inspect), updates observed state and version, returns results.
- Supports single-component deploy (when `component` field is set).

**Test criteria:**

- Deploy with all components records spec in registry
- Deploy with single component only touches that component
- Failed pull returns error for that component, others continue
- Registry is updated with desired_state=running and observed_state
- Version is extracted from image tag

### P2.3: Lifecycle handlers (stop, start, restart)

**Scope:** Implement `StopService`, `StartService`, `RestartService` RPCs.

**Depends:** P2.1, P1.2

**parallel:** P2.2

**Deliverables:**

- `internal/agent/lifecycle.go`
- Stop: for each component, call runtime stop, update desired_state to `stopped`, update observed_state.
- Start: for each component, call runtime start (or run if removed), update desired_state to `running`, update observed_state.
- Restart: stop then start each component.
**Test criteria:**

- Stop sets desired_state=stopped, calls runtime stop
- Start sets desired_state=running, calls runtime start
- Restart cycles each component
- Returns per-component results

### P2.4: Status handlers (list, live check, get status)

**Scope:** Implement `ListServices`, `LiveCheck`, `GetServiceStatus` RPCs.

**Depends:** P2.1, P1.2

**parallel:** P2.2, P2.3

**Deliverables:**

- `internal/agent/status.go`
- `ListServices`: read from registry, no runtime query.
- `LiveCheck`: query runtime, reconcile registry, return updated state.
- `GetServiceStatus`: live check + drift detection + recent events.

**Test criteria:**

- ListServices returns registry contents without touching runtime
- LiveCheck updates observed_state from runtime
- GetServiceStatus includes drift info for mismatched desired/observed
- GetServiceStatus includes recent events

### P2.5: Sync handler

**Scope:** Implement `SyncDesiredState` RPC.

**Depends:** P2.1, P1.2

**parallel:** P2.2, P2.3, P2.4

**Deliverables:**

- `internal/agent/sync.go`
- Receives list of ServiceSpecs from CLI.
- For each service: create or update in registry, set desired_state based on `active` flag (running if active, stopped if not).
- Runs reconciliation (discover unmanaged containers, set to ignore).
- Returns per-service summary of what changed.

**Test criteria:**

- New services are created in registry
- Existing services have specs updated
- Active=false sets desired_state=stopped for all components
- Unmanaged containers discovered and set to ignore
- Returns accurate change summaries

### P2.6: File transfer handlers

**Scope:** Implement `PushFile` and `PullFile` RPCs.

**Depends:** P2.1

**parallel:** P2.2, P2.3, P2.4, P2.5

**Deliverables:**

- `internal/agent/files.go`
- Path validation: resolve paths under `/srv/<service>/`, reject `..` traversal, reject symlinks escaping the service directory.
- Push: atomic write (temp file + rename), create intermediate dirs.
- Pull: read file, return content and permissions.
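The path-validation step above can be sketched as a single pure function. This is a minimal illustration, assuming the `/srv/<service>/` root from the deliverables; the symlink check (e.g. via `filepath.EvalSymlinks` once the file exists) is a separate second pass and is omitted here:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolveServicePath joins the requested relative path under the service's
// directory and rejects anything that escapes it. filepath.Join cleans the
// path (collapsing ".." segments), so a traversal attempt resolves to a
// location outside the root and fails the prefix check.
func resolveServicePath(service, rel string) (string, error) {
	root := filepath.Join("/srv", service)
	full := filepath.Join(root, rel)
	if full != root && !strings.HasPrefix(full, root+string(filepath.Separator)) {
		return "", fmt.Errorf("path %q escapes service directory %q", rel, root)
	}
	return full, nil
}

func main() {
	p, err := resolveServicePath("metacrypt", "conf/app.toml")
	fmt.Println(p, err)

	_, err = resolveServicePath("metacrypt", "../etc/passwd")
	fmt.Println(err)
}
```

The prefix comparison uses `root + separator` rather than `root` alone so that a sibling directory like `/srv/metacrypt-evil` cannot slip past the check.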
**Test criteria:**

- Push creates file at correct path with correct permissions
- Push creates intermediate directories
- Push is atomic (partial write doesn't leave corrupt file)
- Pull returns file content and mode
- Path traversal rejected (`../etc/passwd`)
- Symlink escape rejected
- Service directory scoping enforced

### P2.7: Adopt handler

**Scope:** Implement `AdoptContainer` RPC.

**Depends:** P2.1, P1.2

**parallel:** P2.2, P2.3, P2.4, P2.5, P2.6

**Deliverables:**

- `internal/agent/adopt.go`
- Matches containers by `<service>-*` prefix in runtime.
- Creates service if needed.
- Strips prefix to derive component name.
- Sets desired_state based on current observed_state.
- Returns per-container results.

**Test criteria:**

- Matches containers by prefix
- Creates service when it doesn't exist
- Derives component names correctly (metacrypt-api → api, metacrypt-web → web)
- Single-component service (mc-proxy → mc-proxy) works
- Sets desired_state to running for running containers, stopped for stopped
- Returns results for each adopted container

### P2.8: Monitor subsystem

**Scope:** Implement the continuous monitoring loop and alerting.

**Depends:** P2.1, P1.1, P1.2

**parallel:** P2.2-P2.7 (can be built alongside other agent handlers)

**Deliverables:**

- `internal/monitor/monitor.go` — Monitor struct, Start/Stop methods. Runs a goroutine with a ticker at the configured interval. Each tick: queries runtime, reconciles registry, records events, evaluates alerts.
- `internal/monitor/alerting.go` — Alert evaluation: drift detection (desired != observed for managed components), flap detection (event count in window > threshold), cooldown tracking per component, alert command execution via `exec` (argv array, MCP_* env vars).
- Event pruning (delete events older than retention period).
**Test criteria:**

- Monitor detects state transitions and records events
- Drift alert fires on desired/observed mismatch
- Drift alert respects cooldown (doesn't fire again within window)
- Flap alert fires when transitions exceed threshold in window
- Alert command is exec'd with correct env vars
- Event pruning removes old events, retains recent ones
- Monitor can be stopped cleanly (goroutine exits)

### P2.9: Snapshot command

**Scope:** Implement `mcp-agent snapshot` for database backup.

**Depends:** P2.1, P1.1

**parallel:** P2.2-P2.8

**Deliverables:**

- `cmd/mcp-agent/snapshot.go` — cobra subcommand. Runs `VACUUM INTO` to create a consistent backup in `/srv/mcp/backups/`.

**Test criteria:**

- Creates a backup file with timestamp in name
- Backup is a valid SQLite database
- Original database is unchanged

---

## Phase 3: CLI

All CLI commands are thin gRPC clients. **Most can be built in parallel** once the proto (P0.2) and servicedef/config packages (P1.3, P1.4) are ready. CLI commands can be tested against a running agent (integration) or with a mock gRPC server (unit).

### P3.1: CLI skeleton

**Scope:** Wire up the CLI binary: config loading, gRPC connection setup, cobra command tree.

**Depends:** P0.2, P1.3, P1.4

**Deliverables:**

- `cmd/mcp/main.go` — cobra root command with `--config` flag. Subcommand stubs for all commands.
- gRPC dial helper: reads node address from config, establishes TLS connection with CA verification, attaches bearer token to metadata.

**Test criteria:** CLI starts, loads config, `--help` shows all subcommands.

### P3.2: Login command

**Scope:** Implement `mcp login`.

**Depends:** P3.1, P1.5

**Deliverables:**

- `cmd/mcp/login.go` — prompts for username/password (or reads from config for unattended use), calls MCIAS, saves token to configured path with 0600 permissions.

**Test criteria:** Token is saved to the correct path with correct permissions.

### P3.3: Deploy command

**Scope:** Implement `mcp deploy`.
**Depends:** P3.1, P1.3

**parallel:** P3.4, P3.5, P3.6, P3.7, P3.8, P3.9, P3.10

**Deliverables:**

- `cmd/mcp/deploy.go`
- Resolves service spec: file (from `-f` or default path) > agent registry.
- Parses `<service>/<component>` syntax for single-component deploy.
- Pushes spec to agent via Deploy RPC.
- Prints per-component results.

**Test criteria:**

- Reads service definition from file
- Falls back to agent registry when no file exists
- Fails with clear error when neither exists
- Single-component syntax works
- Prints results

### P3.4: Lifecycle commands (stop, start, restart)

**Scope:** Implement `mcp stop`, `mcp start`, `mcp restart`.

**Depends:** P3.1, P1.3

**parallel:** P3.3, P3.5, P3.6, P3.7, P3.8, P3.9, P3.10

**Deliverables:**

- `cmd/mcp/lifecycle.go`
- Stop: sets `active = false` in service definition file, calls StopService RPC.
- Start: sets `active = true` in service definition file, calls StartService RPC.
- Restart: calls RestartService RPC (does not change active flag).

**Test criteria:**

- Stop updates the service definition file
- Start updates the service definition file
- Both call the correct RPC
- Restart does not modify the file

### P3.5: Status commands (list, ps, status)

**Scope:** Implement `mcp list`, `mcp ps`, `mcp status`.

**Depends:** P3.1

**parallel:** P3.3, P3.4, P3.6, P3.7, P3.8, P3.9, P3.10

**Deliverables:**

- `cmd/mcp/status.go`
- List: calls ListServices on all nodes, formats table output.
- Ps: calls LiveCheck on all nodes, formats with uptime and version.
- Status: calls GetServiceStatus, shows drift and recent events.

**Test criteria:**

- Queries all registered nodes
- Formats output as readable tables
- Status highlights drift clearly

### P3.6: Sync command

**Scope:** Implement `mcp sync`.

**Depends:** P3.1, P1.3

**parallel:** P3.3, P3.4, P3.5, P3.7, P3.8, P3.9, P3.10

**Deliverables:**

- `cmd/mcp/sync.go`
- Loads all service definitions from the services directory.
- Groups by node.
- Calls SyncDesiredState on each agent with that node's services.
- Prints summary of changes.

**Test criteria:**

- Loads all service definitions
- Filters by node correctly
- Pushes to correct agents
- Prints change summary

### P3.7: Adopt command

**Scope:** Implement `mcp adopt`.

**Depends:** P3.1

**parallel:** P3.3, P3.4, P3.5, P3.6, P3.8, P3.9, P3.10

**Deliverables:**

- `cmd/mcp/adopt.go`
- Calls AdoptContainer RPC on the agent.
- Prints adopted containers and their derived component names.

**Test criteria:**

- Calls RPC with service name
- Prints results

### P3.8: Service commands (show, edit, export)

**Scope:** Implement `mcp service show`, `mcp service edit`, `mcp service export`.

**Depends:** P3.1, P1.3

**parallel:** P3.3, P3.4, P3.5, P3.6, P3.7, P3.9, P3.10

**Deliverables:**

- `cmd/mcp/service.go`
- Show: calls ListServices, filters to named service, prints spec.
- Edit: if file exists, open in `$EDITOR`. If not, export from agent first, then open. Save to standard path.
- Export: calls ListServices, converts to TOML, writes to file (default path or `-f`).

**Test criteria:**

- Show prints the correct spec
- Export writes a valid TOML file that can be loaded back
- Edit opens the correct file (or creates from agent spec)

### P3.9: Transfer commands (push, pull)

**Scope:** Implement `mcp push` and `mcp pull`.

**Depends:** P3.1

**parallel:** P3.3, P3.4, P3.5, P3.6, P3.7, P3.8, P3.10

**Deliverables:**

- `cmd/mcp/transfer.go`
- Push: reads local file, determines service and path, calls PushFile RPC. Default relative path = basename of local file.
- Pull: calls PullFile RPC, writes content to local file.

**Test criteria:**

- Push reads file and sends correct content
- Push derives path from basename when omitted
- Pull writes file locally with correct content

### P3.10: Node commands

**Scope:** Implement `mcp node list`, `mcp node add`, `mcp node remove`.
**Depends:** P3.1, P1.4

**parallel:** P3.3, P3.4, P3.5, P3.6, P3.7, P3.8, P3.9

**Deliverables:**

- `cmd/mcp/node.go`
- List: reads nodes from config, prints table.
- Add: appends a `[[nodes]]` entry to the config file.
- Remove: removes the named `[[nodes]]` entry from the config file.

**Test criteria:**

- List shows all configured nodes
- Add creates a new entry
- Remove deletes the named entry
- Config file remains valid TOML after add/remove

---

## Phase 4: Deployment Artifacts

Can be worked on in parallel with Phase 2 and 3.

### P4.1: Systemd units

**Scope:** Write systemd service and timer files.

**Depends:** None (these are static files)

**parallel:** All of Phase 2 and 3

**Deliverables:**

- `deploy/systemd/mcp-agent.service` — from ARCHITECTURE.md
- `deploy/systemd/mcp-agent-backup.service` — snapshot oneshot
- `deploy/systemd/mcp-agent-backup.timer` — daily 02:00 UTC, 5min jitter

**Test criteria:** Files match platform conventions (security hardening, correct paths, correct user).

### P4.2: Example configs

**Scope:** Write example configuration files.

**Depends:** None

**parallel:** All of Phase 2 and 3

**Deliverables:**

- `deploy/examples/mcp.toml` — CLI config with all fields documented
- `deploy/examples/mcp-agent.toml` — agent config with all fields documented

**Test criteria:** Examples are valid TOML, loadable by the config package.

### P4.3: Install script

**Scope:** Write the agent install script.

**Depends:** None

**parallel:** All of Phase 2 and 3

**Deliverables:**

- `deploy/scripts/install-agent.sh` — idempotent: create user/group, install binary, create `/srv/mcp/`, install example config, install systemd units, reload daemon.

**Test criteria:** Script is idempotent (running twice produces the same result).

---

## Phase 5: Integration Testing and Polish

Serial. Requires all previous phases to be complete.

### P5.1: Integration test suite

**Scope:** End-to-end tests: CLI → agent → podman → container lifecycle.
**Depends:** All of Phase 2 and 3

**Deliverables:**

- Test harness that starts an agent with a test config and temp database.
- Tests cover: deploy, stop, start, restart, sync, adopt, push/pull, list/ps/status.
- Tests verify registry state, runtime state, and CLI output.

**Test criteria:** All integration tests pass. Coverage of every CLI command and agent RPC.

### P5.2: Bootstrap procedure test

**Scope:** Test the full MCP bootstrap on a clean node with existing containers.

**Depends:** P5.1

**Deliverables:**

- Documented test procedure: start agent, sync (discover containers), adopt, export, verify service definitions match running state.
- Verify the container rename flow (bare names → `<service>-<component>`).

### P5.3: Documentation

**Scope:** Final docs pass.

**Depends:** P5.1

**Deliverables:**

- `CLAUDE.md` updated with final project structure and commands
- `README.md` with quick-start
- `RUNBOOK.md` with operational procedures
- Verify ARCHITECTURE.md matches implementation

---

## Parallelism Summary

```
Phase 0 (serial):    P0.1 → P0.2
                             │
                             ▼
Phase 1 (parallel):  ┌─── P1.1 (registry)
                     ├─── P1.2 (runtime)
                     ├─── P1.3 (servicedef)
                     ├─── P1.4 (config)
                     └─── P1.5 (auth)
                             │
                       ┌─────┴──────┐
                       ▼            ▼
Phase 2 (agent):   P2.1 ──┐    Phase 3 (CLI):  P3.1 ──┐
                          │                           │
                          ▼                           ▼
                    P2.2  P2.3     Phase 4:     P3.2  P3.3
                    P2.4  P2.5    P4.1-P4.3     P3.4  P3.5
                    P2.6  P2.7    (parallel     P3.6  P3.7
                    P2.8  P2.9    with 2&3)     P3.8  P3.9
                          │                     P3.10
                          └──────────┬──────────────┘
                                     ▼
Phase 5 (serial):  P5.1 → P5.2 → P5.3
```

Maximum parallelism: 5 engineers/agents during Phase 1, up to 8+ during Phase 2+3+4 combined.

Minimum serial path: P0.1 → P0.2 → P1.1 → P2.1 → P2.2 → P5.1 → P5.3