mcp/PROJECT_PLAN_V1.md
Kyle Isom 6a90b21a62 Add PROJECT_PLAN_V1.md and PROGRESS_V1.md
30 discrete tasks across 5 phases, with dependency graph and
parallelism analysis. Phase 1 (5 core libraries) is fully parallel.
Phases 2+3+4 (agent handlers, CLI commands, deployment artifacts)
support up to 8+ concurrent engineers/agents. Critical path is
proto → registry + runtime → agent deploy → integration tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 11:08:06 -07:00


MCP v1 Project Plan

Overview

This plan breaks MCP v1 into discrete implementation tasks organized into phases. Tasks within a phase can often be parallelized. Dependencies between tasks are noted explicitly.

The critical path is: proto → registry + runtime → agent deploy handler → integration testing. Parallelizable work (CLI commands, monitoring, file transfer) can proceed alongside the critical path once the proto and core libraries are ready.

Notation

  • [Pn] = Phase n
  • [Pn.m] = Task m in phase n
  • depends: [Px.y] = must wait for that task
  • parallel: [Px.y, Px.z] = can run alongside these tasks
  • Each task includes scope, deliverables, and test criteria

Phase 0: Project Scaffolding

One engineer. Serial. Establishes the project skeleton that everything else builds on.

P0.1: Repository and module setup

Scope: Initialize the Go module, create the standard directory structure, and configure tooling.

Deliverables:

  • go.mod with module path git.wntrmute.dev/kyle/mcp
  • Makefile with standard targets (build, test, vet, lint, proto, proto-lint, clean, all)
  • .golangci.yaml with platform-standard linter config
  • .gitignore
  • CLAUDE.md (project-specific AI context)
  • Empty cmd/mcp/main.go and cmd/mcp-agent/main.go (compile to verify the skeleton works)

Test criteria: make build succeeds. make vet and make lint pass on the empty project.

P0.2: Proto definitions and code generation

Scope: Write the mcp.proto file from the ARCHITECTURE.md spec and generate Go code.

Depends: P0.1

Deliverables:

  • proto/mcp/v1/mcp.proto — full service definition from ARCHITECTURE.md
  • buf.yaml configuration
  • gen/mcp/v1/ — generated Go code
  • make proto and make proto-lint both pass

Test criteria: Generated code compiles. buf lint passes. All message types and RPC methods from the architecture doc are present.


Phase 1: Core Libraries

Five independent packages. P1.1–P1.4 can start as soon as P0.1 is complete; P1.5 also needs the generated proto types from P0.2. Each package has a well-defined interface, no dependencies on other Phase 1 packages, and is fully testable in isolation.

P1.1: Registry package (internal/registry/)

Scope: SQLite schema, migrations, and CRUD operations for the node-local registry.

Depends: P0.1

Deliverables:

  • db.go — open database, run migrations, close. Schema from ARCHITECTURE.md (services, components, component_ports, component_volumes, component_cmd, events tables).
  • services.go — create, get, list, update, delete services.
  • components.go — create, get, list (by service), update desired/observed state, update spec, delete. Support filtering by desired_state.
  • events.go — insert event, query events by component+service+time range, count events in window (for flap detection), prune old events.

Test criteria: Full test coverage using t.TempDir() + real SQLite. Tests cover:

  • Schema migration is idempotent
  • Service and component CRUD
  • Desired/observed state updates
  • Event insertion, time-range queries, pruning
  • Foreign key cascading (delete service → components deleted)
  • Component composite primary key (service, name) enforced

parallel: P1.2, P1.3, P1.4

P1.2: Runtime package (internal/runtime/)

Scope: Container runtime abstraction with a podman implementation.

Depends: P0.1

Deliverables:

  • runtime.go — the Runtime interface:
    type Runtime interface {
        Pull(ctx context.Context, image string) error
        Run(ctx context.Context, spec ContainerSpec) error
        Start(ctx context.Context, name string) error
        Stop(ctx context.Context, name string) error
        Remove(ctx context.Context, name string) error
        Inspect(ctx context.Context, name string) (ContainerInfo, error)
        List(ctx context.Context) ([]ContainerInfo, error)
    }
    
    Plus ContainerSpec and ContainerInfo structs.
  • podman.go — podman implementation. Builds command-line arguments from ContainerSpec, execs podman CLI, parses podman inspect JSON output.

Test criteria:

  • Unit tests for command-line argument building (given a ContainerSpec, verify the constructed podman args are correct). These don't require podman to be installed.
  • ContainerSpec → podman flag mapping matches the table in ARCHITECTURE.md.
  • Container naming follows <service>-<component> convention.
  • Version extraction from image tag works (e.g., registry/img:v1.2.0 → v1.2.0, registry/img:latest → latest, registry/img → "").

parallel: P1.1, P1.3, P1.4
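The tag-extraction rule from the last test criterion can be sketched as a small pure function. This is a sketch only: versionFromImage is a hypothetical helper name, and digest references (img@sha256:…) are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// versionFromImage returns the tag portion of an image reference, or ""
// when no tag is present. The segment after the final colon is only a tag
// if it contains no slash; otherwise the colon belonged to a registry
// port (e.g. "registry:5000/img"). Digest references are not handled here.
func versionFromImage(image string) string {
	idx := strings.LastIndex(image, ":")
	if idx < 0 {
		return ""
	}
	tag := image[idx+1:]
	if strings.Contains(tag, "/") {
		return "" // colon was a registry port, not a tag separator
	}
	return tag
}

func main() {
	for _, img := range []string{"registry/img:v1.2.0", "registry/img:latest", "registry/img"} {
		fmt.Printf("%q -> %q\n", img, versionFromImage(img))
	}
}
```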

P1.3: Service definition package (internal/servicedef/)

Scope: Parse, validate, and write TOML service definition files.

Depends: P0.1

Deliverables:

  • servicedef.go — Load(path) → ServiceDef, Write(path, ServiceDef), LoadAll(dir) → []ServiceDef. Validation: required fields (name, node, at least one component), component names unique within service. Converts between TOML representation and proto ServiceSpec.

Test criteria:

  • Round-trip: write a ServiceDef, read it back, verify equality
  • Validation rejects missing name, missing node, empty components, duplicate component names
  • LoadAll loads all .toml files from a directory
  • active field defaults to true if omitted
  • Conversion to/from proto ServiceSpec is correct

parallel: P1.1, P1.2, P1.4
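The validation rules listed above reduce to a few checks. A minimal sketch, with illustrative struct shapes rather than the package's final types:

```go
package main

import (
	"errors"
	"fmt"
)

// ServiceDef and Component carry only the fields the validation rules
// touch; the real types hold the full TOML schema.
type ServiceDef struct {
	Name       string
	Node       string
	Components []Component
}

type Component struct{ Name string }

// Validate enforces: name and node required, at least one component,
// and component names unique within the service.
func (s ServiceDef) Validate() error {
	if s.Name == "" {
		return errors.New("service: name is required")
	}
	if s.Node == "" {
		return errors.New("service: node is required")
	}
	if len(s.Components) == 0 {
		return errors.New("service: at least one component is required")
	}
	seen := make(map[string]bool)
	for _, c := range s.Components {
		if seen[c.Name] {
			return fmt.Errorf("service: duplicate component %q", c.Name)
		}
		seen[c.Name] = true
	}
	return nil
}

func main() {
	def := ServiceDef{Name: "metacrypt", Node: "node-a", Components: []Component{{"api"}, {"web"}}}
	fmt.Println(def.Validate())
}
```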

P1.4: Config package (internal/config/)

Scope: Load and validate CLI and agent configuration from TOML files.

Depends: P0.1

Deliverables:

  • cli.go — CLI config struct: services dir, MCIAS settings, auth (token path, optional username/password_file), nodes list. Load from TOML with env var overrides (MCP_*). Validate required fields.
  • agent.go — Agent config struct: server (grpc_addr, tls_cert, tls_key), database path, MCIAS settings, agent (node_name, container_runtime), monitor settings, log level. Load from TOML with env var overrides (MCP_AGENT_*). Validate required fields.

Test criteria:

  • Load from TOML file, verify all fields populated
  • Required field validation (reject missing grpc_addr, missing tls_cert, etc.)
  • Env var overrides work
  • Nodes list parses correctly from [[nodes]]

parallel: P1.1, P1.2, P1.3

P1.5: Auth package (internal/auth/)

Scope: MCIAS token validation for the agent, and token acquisition for the CLI.

Depends: P0.1, P0.2 (uses proto-generated types for gRPC interceptor)

Deliverables:

  • auth.go:
    • Interceptor — gRPC unary server interceptor that extracts bearer tokens, validates against MCIAS (with 30s SHA-256-keyed cache), checks admin role, audit-logs every RPC (method, caller, timestamp). Returns UNAUTHENTICATED or PERMISSION_DENIED on failure.
    • Login(url, username, password) → token — authenticate to MCIAS, return bearer token.
    • LoadToken(path) → token — read cached token from file.
    • SaveToken(path, token) — write token to file with 0600 permissions.

Test criteria:

  • Interceptor rejects missing token (UNAUTHENTICATED)
  • Interceptor rejects invalid token (UNAUTHENTICATED)
  • Interceptor rejects non-admin token (PERMISSION_DENIED)
  • Token caching works (same token within 30s returns cached result)
  • Token file read/write with correct permissions
  • Audit log entry emitted on every RPC (check slog output)

Note: Full interceptor testing requires an MCIAS mock or test instance. Unit tests can mock the MCIAS validation call. Integration tests against a real MCIAS instance are a Phase 4 concern.

parallel: P1.1, P1.2, P1.3, P1.4 (partially; needs P0.2 for proto types)


Phase 2: Agent

The agent is the core of MCP. Tasks in this phase build on Phase 1 libraries. Some tasks can be parallelized; dependencies are noted.

P2.1: Agent skeleton and gRPC server

Scope: Wire up the agent binary: config loading, database setup, gRPC server with TLS and auth interceptor, graceful shutdown.

Depends: P0.2, P1.1, P1.4, P1.5

Deliverables:

  • cmd/mcp-agent/main.go — cobra root command, server subcommand
  • internal/agent/agent.go — Agent struct holding registry, runtime, config. Initializes database, starts gRPC server with TLS and auth interceptor, handles SIGINT/SIGTERM for graceful shutdown.
  • Agent starts, listens on configured address, rejects unauthenticated RPCs, shuts down cleanly.

Test criteria: Agent starts with a test config, accepts TLS connections, rejects RPCs without a valid token. Graceful shutdown closes the database and stops the listener.

P2.2: Deploy handler

Scope: Implement the Deploy RPC on the agent.

Depends: P2.1, P1.2

Deliverables:

  • internal/agent/deploy.go — handles DeployRequest: records spec in registry, iterates components, calls runtime (pull, stop, remove, run, inspect), updates observed state and version, returns results.
  • Supports single-component deploy (when component field is set).

Test criteria:

  • Deploy with all components records spec in registry
  • Deploy with single component only touches that component
  • Failed pull returns error for that component, others continue
  • Registry is updated with desired_state=running and observed_state
  • Version is extracted from image tag

P2.3: Lifecycle handlers (stop, start, restart)

Scope: Implement StopService, StartService, RestartService RPCs.

Depends: P2.1, P1.2

parallel: P2.2

Deliverables:

  • internal/agent/lifecycle.go
  • Stop: for each component, call runtime stop, update desired_state to stopped, update observed_state.
  • Start: for each component, call runtime start (or run if removed), update desired_state to running, update observed_state.
  • Restart: stop then start each component.

Test criteria:

  • Stop sets desired_state=stopped, calls runtime stop
  • Start sets desired_state=running, calls runtime start
  • Restart cycles each component
  • Returns per-component results

P2.4: Status handlers (list, live check, get status)

Scope: Implement ListServices, LiveCheck, GetServiceStatus RPCs.

Depends: P2.1, P1.2

parallel: P2.2, P2.3

Deliverables:

  • internal/agent/status.go
  • ListServices: read from registry, no runtime query.
  • LiveCheck: query runtime, reconcile registry, return updated state.
  • GetServiceStatus: live check + drift detection + recent events.

Test criteria:

  • ListServices returns registry contents without touching runtime
  • LiveCheck updates observed_state from runtime
  • GetServiceStatus includes drift info for mismatched desired/observed
  • GetServiceStatus includes recent events

P2.5: Sync handler

Scope: Implement SyncDesiredState RPC.

Depends: P2.1, P1.2

parallel: P2.2, P2.3, P2.4

Deliverables:

  • internal/agent/sync.go
  • Receives list of ServiceSpecs from CLI.
  • For each service: create or update in registry, set desired_state based on active flag (running if active, stopped if not).
  • Runs reconciliation (discover unmanaged containers, set to ignore).
  • Returns per-service summary of what changed.

Test criteria:

  • New services are created in registry
  • Existing services have specs updated
  • Active=false sets desired_state=stopped for all components
  • Unmanaged containers discovered and set to ignore
  • Returns accurate change summaries

P2.6: File transfer handlers

Scope: Implement PushFile and PullFile RPCs.

Depends: P2.1

parallel: P2.2, P2.3, P2.4, P2.5

Deliverables:

  • internal/agent/files.go
  • Path validation: resolve /srv/<service>/<path>, reject .. traversal, reject symlinks escaping the service directory.
  • Push: atomic write (temp file + rename), create intermediate dirs.
  • Pull: read file, return content and permissions.

Test criteria:

  • Push creates file at correct path with correct permissions
  • Push creates intermediate directories
  • Push is atomic (partial write doesn't leave corrupt file)
  • Pull returns file content and mode
  • Path traversal rejected (../etc/passwd)
  • Symlink escape rejected
  • Service directory scoping enforced

P2.7: Adopt handler

Scope: Implement AdoptContainer RPC.

Depends: P2.1, P1.2

parallel: P2.2, P2.3, P2.4, P2.5, P2.6

Deliverables:

  • internal/agent/adopt.go
  • Matches containers by <service>-* prefix in runtime.
  • Creates service if needed.
  • Strips prefix to derive component name.
  • Sets desired_state based on current observed_state.
  • Returns per-container results.

Test criteria:

  • Matches containers by prefix
  • Creates service when it doesn't exist
  • Derives component names correctly (metacrypt-api → api, metacrypt-web → web)
  • Single-component service (mc-proxy → mc-proxy) works
  • Sets desired_state to running for running containers, stopped for stopped
  • Returns results for each adopted container
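The component-name derivation in the criteria above is a prefix strip with a single-component special case. A sketch (componentName is a hypothetical helper name):

```go
package main

import (
	"fmt"
	"strings"
)

// componentName derives a component name from a container name by
// stripping the "<service>-" prefix. A container named exactly after the
// service (the single-component case, e.g. mc-proxy) keeps its full name.
// The second return value is false for containers this service does not
// manage.
func componentName(service, container string) (string, bool) {
	if container == service {
		return container, true
	}
	rest, ok := strings.CutPrefix(container, service+"-")
	if !ok {
		return "", false
	}
	return rest, true
}

func main() {
	fmt.Println(componentName("metacrypt", "metacrypt-api"))
	fmt.Println(componentName("mc-proxy", "mc-proxy"))
}
```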

P2.8: Monitor subsystem

Scope: Implement the continuous monitoring loop and alerting.

Depends: P2.1, P1.1, P1.2

parallel: P2.2-P2.7 (can be built alongside other agent handlers)

Deliverables:

  • internal/monitor/monitor.go — Monitor struct, Start/Stop methods. Runs a goroutine with a ticker at the configured interval. Each tick: queries runtime, reconciles registry, records events, evaluates alerts.
  • internal/monitor/alerting.go — Alert evaluation: drift detection (desired != observed for managed components), flap detection (event count in window > threshold), cooldown tracking per component, alert command execution via exec (argv array, MCP_* env vars).
  • Event pruning (delete events older than retention period).

Test criteria:

  • Monitor detects state transitions and records events
  • Drift alert fires on desired/observed mismatch
  • Drift alert respects cooldown (doesn't fire again within window)
  • Flap alert fires when transitions exceed threshold in window
  • Alert command is exec'd with correct env vars
  • Event pruning removes old events, retains recent ones
  • Monitor can be stopped cleanly (goroutine exits)

P2.9: Snapshot command

Scope: Implement mcp-agent snapshot for database backup.

Depends: P2.1, P1.1

parallel: P2.2-P2.8

Deliverables:

  • cmd/mcp-agent/snapshot.go — cobra subcommand. Runs VACUUM INTO to create a consistent backup in /srv/mcp/backups/.

Test criteria:

  • Creates a backup file with timestamp in name
  • Backup is a valid SQLite database
  • Original database is unchanged

Phase 3: CLI

All CLI commands are thin gRPC clients. Most can be built in parallel once the proto (P0.2) and servicedef/config packages (P1.3, P1.4) are ready. CLI commands can be tested against a running agent (integration) or with a mock gRPC server (unit).

P3.1: CLI skeleton

Scope: Wire up the CLI binary: config loading, gRPC connection setup, cobra command tree.

Depends: P0.2, P1.3, P1.4

Deliverables:

  • cmd/mcp/main.go — cobra root command with --config flag. Subcommand stubs for all commands.
  • gRPC dial helper: reads node address from config, establishes TLS connection with CA verification, attaches bearer token to metadata.

Test criteria: CLI starts, loads config, --help shows all subcommands.

P3.2: Login command

Scope: Implement mcp login.

Depends: P3.1, P1.5

Deliverables:

  • cmd/mcp/login.go — prompts for username/password (or reads from config for unattended), calls MCIAS, saves token to configured path with 0600 permissions.

Test criteria: Token is saved to the correct path with correct permissions.

P3.3: Deploy command

Scope: Implement mcp deploy.

Depends: P3.1, P1.3

parallel: P3.4, P3.5, P3.6, P3.7, P3.8, P3.9, P3.10

Deliverables:

  • cmd/mcp/deploy.go
  • Resolves the service spec: a file (from -f or the default path) takes precedence over the agent registry.
  • Parses <service>/<component> syntax for single-component deploy.
  • Pushes spec to agent via Deploy RPC.
  • Prints per-component results.

Test criteria:

  • Reads service definition from file
  • Falls back to agent registry when no file exists
  • Fails with clear error when neither exists
  • Single-component syntax works
  • Prints results

P3.4: Lifecycle commands (stop, start, restart)

Scope: Implement mcp stop, mcp start, mcp restart.

Depends: P3.1, P1.3

parallel: P3.3, P3.5, P3.6, P3.7, P3.8, P3.9, P3.10

Deliverables:

  • cmd/mcp/lifecycle.go
  • Stop: sets active = false in service definition file, calls StopService RPC.
  • Start: sets active = true in service definition file, calls StartService RPC.
  • Restart: calls RestartService RPC (does not change active flag).

Test criteria:

  • Stop updates the service definition file
  • Start updates the service definition file
  • Both call the correct RPC
  • Restart does not modify the file

P3.5: Status commands (list, ps, status)

Scope: Implement mcp list, mcp ps, mcp status.

Depends: P3.1

parallel: P3.3, P3.4, P3.6, P3.7, P3.8, P3.9, P3.10

Deliverables:

  • cmd/mcp/status.go
  • List: calls ListServices on all nodes, formats table output.
  • Ps: calls LiveCheck on all nodes, formats with uptime and version.
  • Status: calls GetServiceStatus, shows drift and recent events.

Test criteria:

  • Queries all registered nodes
  • Formats output as readable tables
  • Status highlights drift clearly

P3.6: Sync command

Scope: Implement mcp sync.

Depends: P3.1, P1.3

parallel: P3.3, P3.4, P3.5, P3.7, P3.8, P3.9, P3.10

Deliverables:

  • cmd/mcp/sync.go
  • Loads all service definitions from the services directory.
  • Groups by node.
  • Calls SyncDesiredState on each agent with that node's services.
  • Prints summary of changes.

Test criteria:

  • Loads all service definitions
  • Filters by node correctly
  • Pushes to correct agents
  • Prints change summary
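The group-by-node step above is a straightforward bucketing pass. A sketch with a pared-down ServiceDef (illustrative fields only):

```go
package main

import "fmt"

// ServiceDef carries just the fields needed for grouping.
type ServiceDef struct {
	Name string
	Node string
}

// groupByNode buckets loaded service definitions by node so
// SyncDesiredState can be called once per agent with only that node's
// services.
func groupByNode(defs []ServiceDef) map[string][]ServiceDef {
	byNode := make(map[string][]ServiceDef)
	for _, d := range defs {
		byNode[d.Node] = append(byNode[d.Node], d)
	}
	return byNode
}

func main() {
	defs := []ServiceDef{
		{"metacrypt", "node-a"},
		{"mc-proxy", "node-b"},
		{"web", "node-a"},
	}
	g := groupByNode(defs)
	fmt.Println(len(g["node-a"]), len(g["node-b"]))
}
```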

P3.7: Adopt command

Scope: Implement mcp adopt.

Depends: P3.1

parallel: P3.3, P3.4, P3.5, P3.6, P3.8, P3.9, P3.10

Deliverables:

  • cmd/mcp/adopt.go
  • Calls AdoptContainer RPC on the agent.
  • Prints adopted containers and their derived component names.

Test criteria:

  • Calls RPC with service name
  • Prints results

P3.8: Service commands (show, edit, export)

Scope: Implement mcp service show, mcp service edit, mcp service export.

Depends: P3.1, P1.3

parallel: P3.3, P3.4, P3.5, P3.6, P3.7, P3.9, P3.10

Deliverables:

  • cmd/mcp/service.go
  • Show: calls ListServices, filters to named service, prints spec.
  • Edit: if file exists, open in $EDITOR. If not, export from agent first, then open. Save to standard path.
  • Export: calls ListServices, converts to TOML, writes to file (default path or -f).

Test criteria:

  • Show prints the correct spec
  • Export writes a valid TOML file that can be loaded back
  • Edit opens the correct file (or creates from agent spec)

P3.9: Transfer commands (push, pull)

Scope: Implement mcp push and mcp pull.

Depends: P3.1

parallel: P3.3, P3.4, P3.5, P3.6, P3.7, P3.8, P3.10

Deliverables:

  • cmd/mcp/transfer.go
  • Push: reads local file, determines service and path, calls PushFile RPC. Default relative path = basename of local file.
  • Pull: calls PullFile RPC, writes content to local file.

Test criteria:

  • Push reads file and sends correct content
  • Push derives path from basename when omitted
  • Pull writes file locally with correct content

P3.10: Node commands

Scope: Implement mcp node list, mcp node add, mcp node remove.

Depends: P3.1, P1.4

parallel: P3.3, P3.4, P3.5, P3.6, P3.7, P3.8, P3.9

Deliverables:

  • cmd/mcp/node.go
  • List: reads nodes from config, prints table.
  • Add: appends a [[nodes]] entry to the config file.
  • Remove: removes the named [[nodes]] entry from the config file.

Test criteria:

  • List shows all configured nodes
  • Add creates a new entry
  • Remove deletes the named entry
  • Config file remains valid TOML after add/remove

Phase 4: Deployment Artifacts

Can be worked on in parallel with Phases 2 and 3.

P4.1: Systemd units

Scope: Write systemd service and timer files.

Depends: None (these are static files)

parallel: All of Phase 2 and 3

Deliverables:

  • deploy/systemd/mcp-agent.service — from ARCHITECTURE.md
  • deploy/systemd/mcp-agent-backup.service — snapshot oneshot
  • deploy/systemd/mcp-agent-backup.timer — daily 02:00 UTC, 5min jitter

Test criteria: Files match platform conventions (security hardening, correct paths, correct user).

P4.2: Example configs

Scope: Write example configuration files.

Depends: None

parallel: All of Phase 2 and 3

Deliverables:

  • deploy/examples/mcp.toml — CLI config with all fields documented
  • deploy/examples/mcp-agent.toml — agent config with all fields documented

Test criteria: Examples are valid TOML, loadable by the config package.

P4.3: Install script

Scope: Write the agent install script.

Depends: None

parallel: All of Phase 2 and 3

Deliverables:

  • deploy/scripts/install-agent.sh — idempotent: create user/group, install binary, create /srv/mcp/, install example config, install systemd units, reload daemon.

Test criteria: Script is idempotent (running twice produces the same result).


Phase 5: Integration Testing and Polish

Serial. Requires all previous phases to be complete.

P5.1: Integration test suite

Scope: End-to-end tests: CLI → agent → podman → container lifecycle.

Depends: All of Phase 2 and 3

Deliverables:

  • Test harness that starts an agent with a test config and temp database.
  • Tests cover: deploy, stop, start, restart, sync, adopt, push/pull, list/ps/status.
  • Tests verify registry state, runtime state, and CLI output.

Test criteria: All integration tests pass. Coverage of every CLI command and agent RPC.

P5.2: Bootstrap procedure test

Scope: Test the full MCP bootstrap on a clean node with existing containers.

Depends: P5.1

Deliverables:

  • Documented test procedure: start agent, sync (discover containers), adopt, export, verify service definitions match running state.
  • Verify the container rename flow (bare names → <service>-<component>).

P5.3: Documentation

Scope: Final docs pass.

Depends: P5.1

Deliverables:

  • CLAUDE.md updated with final project structure and commands
  • README.md with quick-start
  • RUNBOOK.md with operational procedures
  • Verify ARCHITECTURE.md matches implementation

Parallelism Summary

Phase 0 (serial):  P0.1 → P0.2
                          │
                          ▼
Phase 1 (parallel): ┌─── P1.1 (registry)
                    ├─── P1.2 (runtime)
                    ├─── P1.3 (servicedef)
                    ├─── P1.4 (config)
                    └─── P1.5 (auth)
                          │
                    ┌─────┴──────┐
                    ▼            ▼
Phase 2 (agent):  P2.1 ──┐   Phase 3 (CLI):  P3.1 ──┐
                    │     │                     │     │
                    ▼     │                     ▼     │
                  P2.2  P2.3    Phase 4:      P3.2  P3.3
                  P2.4  P2.5    P4.1-P4.3     P3.4  P3.5
                  P2.6  P2.7    (parallel     P3.6  P3.7
                  P2.8  P2.9     with 2&3)    P3.8  P3.9
                    │                           │   P3.10
                    └──────────┬────────────────┘
                               ▼
Phase 5 (serial):  P5.1 → P5.2 → P5.3

Maximum parallelism: 5 engineers/agents during Phase 1, up to 8+ during Phase 2+3+4 combined.

Minimum serial path: P0.1 → P0.2 → P1.1 → P2.1 → P2.2 → P5.1 → P5.3