MCP v1 Project Plan
Overview
This plan breaks MCP v1 into discrete implementation tasks organized into phases. Tasks within a phase can often be parallelized. Dependencies between tasks are noted explicitly.
The critical path is: proto → registry + runtime → agent deploy handler → integration testing. Parallelizable work (CLI commands, monitoring, file transfer) can proceed alongside the critical path once the proto and core libraries are ready.
Notation
- [Pn] = Phase n
- [Pn.m] = Task m in phase n
- depends: [Px.y] = must wait for that task
- parallel: [Px.y, Px.z] = can run alongside these tasks
- Each task includes scope, deliverables, and test criteria
Phase 0: Project Scaffolding
One engineer. Serial. Establishes the project skeleton that everything else builds on.
P0.1: Repository and module setup
Scope: Initialize the Go module, create the standard directory structure, and configure tooling.
Deliverables:
- `go.mod` with module path `git.wntrmute.dev/mc/mcp`
- `Makefile` with standard targets (build, test, vet, lint, proto, proto-lint, clean, all)
- `.golangci.yaml` with platform-standard linter config
- `.gitignore`
- `CLAUDE.md` (project-specific AI context)
- Empty `cmd/mcp/main.go` and `cmd/mcp-agent/main.go` (compile to verify the skeleton works)
Test criteria: `make build` succeeds. `make vet` and `make lint` pass on the empty project.
P0.2: Proto definitions and code generation
Scope: Write the mcp.proto file from the ARCHITECTURE.md spec and
generate Go code.
Depends: P0.1
Deliverables:
- `proto/mcp/v1/mcp.proto` — full service definition from ARCHITECTURE.md
- `buf.yaml` configuration
- `gen/mcp/v1/` — generated Go code
- `make proto` and `make proto-lint` both pass
Test criteria: Generated code compiles. buf lint passes. All message
types and RPC methods from the architecture doc are present.
Phase 1: Core Libraries
Five packages. P1.1-P1.4 depend only on P0.1 and can be built in parallel; P1.5 additionally needs the generated proto types from P0.2. Each package has a well-defined interface, no dependencies on other Phase 1 packages, and is fully testable in isolation.
P1.1: Registry package (internal/registry/)
Scope: SQLite schema, migrations, and CRUD operations for the node-local registry.
Depends: P0.1
Deliverables:
- `db.go` — open database, run migrations, close. Schema from ARCHITECTURE.md (services, components, component_ports, component_volumes, component_cmd, events tables).
- `services.go` — create, get, list, update, delete services.
- `components.go` — create, get, list (by service), update desired/observed state, update spec, delete. Supports filtering by desired_state.
- `events.go` — insert event, query events by component+service+time range, count events in window (for flap detection), prune old events.
Test criteria: full test coverage using `t.TempDir()` + a real SQLite database. Tests cover:
- Schema migration is idempotent
- Service and component CRUD
- Desired/observed state updates
- Event insertion, time-range queries, pruning
- Foreign key cascading (delete service → components deleted)
- Component composite primary key (service, name) enforced
parallel: P1.2, P1.3, P1.4
P1.2: Runtime package (internal/runtime/)
Scope: Container runtime abstraction with a podman implementation.
Depends: P0.1
Deliverables:
- `runtime.go` — `Runtime` interface:

  ```go
  type Runtime interface {
      Pull(ctx context.Context, image string) error
      Run(ctx context.Context, spec ContainerSpec) error
      Stop(ctx context.Context, name string) error
      Remove(ctx context.Context, name string) error
      Inspect(ctx context.Context, name string) (ContainerInfo, error)
      List(ctx context.Context) ([]ContainerInfo, error)
  }
  ```

  Plus `ContainerSpec` and `ContainerInfo` structs.
- `podman.go` — podman implementation. Builds command-line arguments from `ContainerSpec`, execs the `podman` CLI, parses `podman inspect` JSON output.
Test criteria:
- Unit tests for command-line argument building (given a ContainerSpec, verify the constructed podman args are correct). These don't require podman to be installed.
- `ContainerSpec` → podman flag mapping matches the table in ARCHITECTURE.md.
- Container naming follows the `<service>-<component>` convention.
- Version extraction from image tag works (e.g., `registry/img:v1.2.0` → `v1.2.0`, `registry/img:latest` → `latest`, `registry/img` → `""`).
parallel: P1.1, P1.3, P1.4
P1.3: Service definition package (internal/servicedef/)
Scope: Parse, validate, and write TOML service definition files.
Depends: P0.1
Deliverables:
- `servicedef.go` — `Load(path) → ServiceDef`, `Write(path, ServiceDef)`, `LoadAll(dir) → []ServiceDef`. Validation: required fields (name, node, at least one component); component names unique within service. Converts between the TOML representation and the proto `ServiceSpec`.
Test criteria:
- Round-trip: write a ServiceDef, read it back, verify equality
- Validation rejects missing name, missing node, empty components, duplicate component names
- `LoadAll` loads all `.toml` files from a directory
- `active` field defaults to `true` if omitted
- Conversion to/from the proto `ServiceSpec` is correct
parallel: P1.1, P1.2, P1.4
P1.4: Config package (internal/config/)
Scope: Load and validate CLI and agent configuration from TOML files.
Depends: P0.1
Deliverables:
- `cli.go` — CLI config struct: services dir, MCIAS settings, auth (token path, optional username/password_file), nodes list. Load from TOML with env var overrides (`MCP_*`). Validate required fields.
- `agent.go` — agent config struct: server (grpc_addr, tls_cert, tls_key), database path, MCIAS settings, agent (node_name, container_runtime), monitor settings, log level. Load from TOML with env var overrides (`MCP_AGENT_*`). Validate required fields.
Test criteria:
- Load from TOML file, verify all fields populated
- Required field validation (reject missing grpc_addr, missing tls_cert, etc.)
- Env var overrides work
- Nodes list parses correctly from `[[nodes]]`
parallel: P1.1, P1.2, P1.3
P1.5: Auth package (internal/auth/)
Scope: MCIAS token validation for the agent, and token acquisition for the CLI.
Depends: P0.1, P0.2 (uses proto-generated types for gRPC interceptor)
Deliverables:
- `auth.go`:
  - `Interceptor` — gRPC unary server interceptor that extracts bearer tokens, validates them against MCIAS (with a 30s SHA-256-keyed cache), checks the admin role, and audit-logs every RPC (method, caller, timestamp). Returns UNAUTHENTICATED or PERMISSION_DENIED on failure.
  - `Login(url, username, password) → token` — authenticate to MCIAS, return a bearer token.
  - `LoadToken(path) → token` — read a cached token from file.
  - `SaveToken(path, token)` — write a token to file with 0600 permissions.
Test criteria:
- Interceptor rejects missing token (UNAUTHENTICATED)
- Interceptor rejects invalid token (UNAUTHENTICATED)
- Interceptor rejects non-admin token (PERMISSION_DENIED)
- Token caching works (same token within 30s returns cached result)
- Token file read/write with correct permissions
- Audit log entry emitted on every RPC (check slog output)
Note: Full interceptor testing requires an MCIAS mock or test instance. Unit tests can mock the MCIAS validation call. Integration tests against a real MCIAS instance are a Phase 4 concern.
parallel: P1.1, P1.2, P1.3, P1.4 (partially; needs P0.2 for proto types)
Phase 2: Agent
The agent is the core of MCP. Tasks in this phase build on Phase 1 libraries. Some tasks can be parallelized; dependencies are noted.
P2.1: Agent skeleton and gRPC server
Scope: Wire up the agent binary: config loading, database setup, gRPC server with TLS and auth interceptor, graceful shutdown.
Depends: P0.2, P1.1, P1.4, P1.5
Deliverables:
- `cmd/mcp-agent/main.go` — cobra root command, `server` subcommand.
- `internal/agent/agent.go` — Agent struct holding registry, runtime, and config. Initializes the database, starts the gRPC server with TLS and the auth interceptor, and handles SIGINT/SIGTERM for graceful shutdown.
- Agent starts, listens on the configured address, rejects unauthenticated RPCs, and shuts down cleanly.
Test criteria: Agent starts with a test config, accepts TLS connections, rejects RPCs without a valid token. Graceful shutdown closes the database and stops the listener.
P2.2: Deploy handler
Scope: Implement the Deploy RPC on the agent.
Depends: P2.1, P1.2
Deliverables:
- `internal/agent/deploy.go` — handles DeployRequest: records the spec in the registry, iterates components, calls the runtime (pull, stop, remove, run, inspect), updates observed state and version, returns results.
- Supports single-component deploy (when the `component` field is set).
Test criteria:
- Deploy with all components records spec in registry
- Deploy with single component only touches that component
- Failed pull returns error for that component, others continue
- Registry is updated with desired_state=running and observed_state
- Version is extracted from image tag
P2.3: Lifecycle handlers (stop, start, restart)
Scope: Implement StopService, StartService, RestartService RPCs.
Depends: P2.1, P1.2
parallel: P2.2
Deliverables:
- `internal/agent/lifecycle.go`
- Stop: for each component, call runtime stop, update desired_state to `stopped`, update observed_state.
- Start: for each component, call runtime start (or run if removed), update desired_state to `running`, update observed_state.
- Restart: stop then start each component.
Test criteria:
- Stop sets desired_state=stopped, calls runtime stop
- Start sets desired_state=running, calls runtime start
- Restart cycles each component
- Returns per-component results
P2.4: Status handlers (list, live check, get status)
Scope: Implement ListServices, LiveCheck, GetServiceStatus RPCs.
Depends: P2.1, P1.2
parallel: P2.2, P2.3
Deliverables:
- `internal/agent/status.go`
- `ListServices`: read from registry, no runtime query.
- `LiveCheck`: query runtime, reconcile registry, return updated state.
- `GetServiceStatus`: live check + drift detection + recent events.
Test criteria:
- ListServices returns registry contents without touching runtime
- LiveCheck updates observed_state from runtime
- GetServiceStatus includes drift info for mismatched desired/observed
- GetServiceStatus includes recent events
P2.5: Sync handler
Scope: Implement SyncDesiredState RPC.
Depends: P2.1, P1.2
parallel: P2.2, P2.3, P2.4
Deliverables:
- `internal/agent/sync.go`
- Receives the list of ServiceSpecs from the CLI.
- For each service: create or update it in the registry, set desired_state based on the `active` flag (running if active, stopped if not).
- Runs reconciliation (discover unmanaged containers, set them to ignore).
- Returns a per-service summary of what changed.
Test criteria:
- New services are created in registry
- Existing services have specs updated
- Active=false sets desired_state=stopped for all components
- Unmanaged containers discovered and set to ignore
- Returns accurate change summaries
P2.6: File transfer handlers
Scope: Implement PushFile and PullFile RPCs.
Depends: P2.1
parallel: P2.2, P2.3, P2.4, P2.5
Deliverables:
- `internal/agent/files.go`
- Path validation: resolve `/srv/<service>/<path>`, reject `..` traversal, reject symlinks escaping the service directory.
- Push: atomic write (temp file + rename), create intermediate dirs.
- Pull: read the file, return content and permissions.
Test criteria:
- Push creates file at correct path with correct permissions
- Push creates intermediate directories
- Push is atomic (partial write doesn't leave corrupt file)
- Pull returns file content and mode
- Path traversal rejected (`../etc/passwd`)
- Symlink escape rejected
- Service directory scoping enforced
P2.7: Adopt handler
Scope: Implement AdoptContainer RPC.
Depends: P2.1, P1.2
parallel: P2.2, P2.3, P2.4, P2.5, P2.6
Deliverables:
- `internal/agent/adopt.go`
- Matches containers by the `<service>-*` prefix in the runtime.
- Creates the service if needed.
- Strips the prefix to derive the component name.
- Sets desired_state based on the current observed_state.
- Returns per-container results.
Test criteria:
- Matches containers by prefix
- Creates service when it doesn't exist
- Derives component names correctly (metacrypt-api → api, metacrypt-web → web)
- Single-component service (mc-proxy → mc-proxy) works
- Sets desired_state to running for running containers, stopped for stopped
- Returns results for each adopted container
P2.8: Monitor subsystem
Scope: Implement the continuous monitoring loop and alerting.
Depends: P2.1, P1.1, P1.2
parallel: P2.2-P2.7 (can be built alongside other agent handlers)
Deliverables:
- `internal/monitor/monitor.go` — Monitor struct with Start/Stop methods. Runs a goroutine with a ticker at the configured interval. Each tick: queries the runtime, reconciles the registry, records events, evaluates alerts.
- `internal/monitor/alerting.go` — alert evaluation: drift detection (desired != observed for managed components), flap detection (event count in window > threshold), per-component cooldown tracking, and alert command execution via `exec` (argv array, `MCP_*` env vars).
- Event pruning (delete events older than the retention period).
Test criteria:
- Monitor detects state transitions and records events
- Drift alert fires on desired/observed mismatch
- Drift alert respects cooldown (doesn't fire again within window)
- Flap alert fires when transitions exceed threshold in window
- Alert command is exec'd with correct env vars
- Event pruning removes old events, retains recent ones
- Monitor can be stopped cleanly (goroutine exits)
P2.9: Snapshot command
Scope: Implement mcp-agent snapshot for database backup.
Depends: P2.1, P1.1
parallel: P2.2-P2.8
Deliverables:
- `cmd/mcp-agent/snapshot.go` — cobra subcommand. Runs `VACUUM INTO` to create a consistent backup in `/srv/mcp/backups/`.
Test criteria:
- Creates a backup file with timestamp in name
- Backup is a valid SQLite database
- Original database is unchanged
Phase 3: CLI
All CLI commands are thin gRPC clients. Most can be built in parallel once the proto (P0.2) and servicedef/config packages (P1.3, P1.4) are ready. CLI commands can be tested against a running agent (integration) or with a mock gRPC server (unit).
P3.1: CLI skeleton
Scope: Wire up the CLI binary: config loading, gRPC connection setup, cobra command tree.
Depends: P0.2, P1.3, P1.4
Deliverables:
- `cmd/mcp/main.go` — cobra root command with a `--config` flag. Subcommand stubs for all commands.
- gRPC dial helper: reads the node address from config, establishes a TLS connection with CA verification, attaches the bearer token to metadata.
Test criteria: CLI starts, loads config, and `--help` shows all subcommands.
P3.2: Login command
Scope: Implement mcp login.
Depends: P3.1, P1.5
Deliverables:
- `cmd/mcp/login.go` — prompts for username/password (or reads them from config for unattended use), calls MCIAS, saves the token to the configured path with 0600 permissions.
Test criteria: Token is saved to the correct path with correct permissions.
P3.3: Deploy command
Scope: Implement mcp deploy.
Depends: P3.1, P1.3
parallel: P3.4, P3.5, P3.6, P3.7, P3.8, P3.9, P3.10
Deliverables:
- `cmd/mcp/deploy.go`
- Resolves the service spec: file (from `-f` or the default path) > agent registry.
- Parses the `<service>/<component>` syntax for single-component deploy.
- Pushes the spec to the agent via the Deploy RPC.
- Prints per-component results.
Test criteria:
- Reads service definition from file
- Falls back to agent registry when no file exists
- Fails with clear error when neither exists
- Single-component syntax works
- Prints results
P3.4: Lifecycle commands (stop, start, restart)
Scope: Implement mcp stop, mcp start, mcp restart.
Depends: P3.1, P1.3
parallel: P3.3, P3.5, P3.6, P3.7, P3.8, P3.9, P3.10
Deliverables:
- `cmd/mcp/lifecycle.go`
- Stop: sets `active = false` in the service definition file, calls the StopService RPC.
- Start: sets `active = true` in the service definition file, calls the StartService RPC.
- Restart: calls the RestartService RPC (does not change the active flag).
Test criteria:
- Stop updates the service definition file
- Start updates the service definition file
- Both call the correct RPC
- Restart does not modify the file
P3.5: Status commands (list, ps, status)
Scope: Implement mcp list, mcp ps, mcp status.
Depends: P3.1
parallel: P3.3, P3.4, P3.6, P3.7, P3.8, P3.9, P3.10
Deliverables:
- `cmd/mcp/status.go`
- List: calls ListServices on all nodes, formats table output.
- Ps: calls LiveCheck on all nodes, formats with uptime and version.
- Status: calls GetServiceStatus, shows drift and recent events.
Test criteria:
- Queries all registered nodes
- Formats output as readable tables
- Status highlights drift clearly
P3.6: Sync command
Scope: Implement mcp sync.
Depends: P3.1, P1.3
parallel: P3.3, P3.4, P3.5, P3.7, P3.8, P3.9, P3.10
Deliverables:
- `cmd/mcp/sync.go`
- Loads all service definitions from the services directory.
- Groups by node.
- Calls SyncDesiredState on each agent with that node's services.
- Prints summary of changes.
Test criteria:
- Loads all service definitions
- Filters by node correctly
- Pushes to correct agents
- Prints change summary
P3.7: Adopt command
Scope: Implement mcp adopt.
Depends: P3.1
parallel: P3.3, P3.4, P3.5, P3.6, P3.8, P3.9, P3.10
Deliverables:
- `cmd/mcp/adopt.go`
- Calls the AdoptContainer RPC on the agent.
- Prints adopted containers and their derived component names.
Test criteria:
- Calls RPC with service name
- Prints results
P3.8: Service commands (show, edit, export)
Scope: Implement mcp service show, mcp service edit,
mcp service export.
Depends: P3.1, P1.3
parallel: P3.3, P3.4, P3.5, P3.6, P3.7, P3.9, P3.10
Deliverables:
- `cmd/mcp/service.go`
- Show: calls ListServices, filters to the named service, prints the spec.
- Edit: if file exists, open in $EDITOR. If not, export from agent first, then open. Save to standard path.
- Export: calls ListServices, converts to TOML, writes to a file (default path or `-f`).
Test criteria:
- Show prints the correct spec
- Export writes a valid TOML file that can be loaded back
- Edit opens the correct file (or creates from agent spec)
P3.9: Transfer commands (push, pull)
Scope: Implement mcp push and mcp pull.
Depends: P3.1
parallel: P3.3, P3.4, P3.5, P3.6, P3.7, P3.8, P3.10
Deliverables:
- `cmd/mcp/transfer.go`
- Push: reads the local file, determines the service and path, calls the PushFile RPC. Default relative path = basename of the local file.
- Pull: calls PullFile RPC, writes content to local file.
Test criteria:
- Push reads file and sends correct content
- Push derives path from basename when omitted
- Pull writes file locally with correct content
P3.10: Node commands
Scope: Implement mcp node list, mcp node add, mcp node remove.
Depends: P3.1, P1.4
parallel: P3.3, P3.4, P3.5, P3.6, P3.7, P3.8, P3.9
Deliverables:
- `cmd/mcp/node.go`
- List: reads nodes from config, prints a table.
- Add: appends a `[[nodes]]` entry to the config file.
- Remove: removes the named `[[nodes]]` entry from the config file.
Test criteria:
- List shows all configured nodes
- Add creates a new entry
- Remove deletes the named entry
- Config file remains valid TOML after add/remove
Phase 4: Deployment Artifacts
Can be worked on in parallel with Phase 2 and 3.
P4.1: Systemd units
Scope: Write systemd service and timer files.
Depends: None (these are static files)
parallel: All of Phase 2 and 3
Deliverables:
- `deploy/systemd/mcp-agent.service` — from ARCHITECTURE.md
- `deploy/systemd/mcp-agent-backup.service` — snapshot oneshot
- `deploy/systemd/mcp-agent-backup.timer` — daily 02:00 UTC, 5 min jitter
Test criteria: Files match platform conventions (security hardening, correct paths, correct user).
P4.2: Example configs
Scope: Write example configuration files.
Depends: None
parallel: All of Phase 2 and 3
Deliverables:
- `deploy/examples/mcp.toml` — CLI config with all fields documented
- `deploy/examples/mcp-agent.toml` — agent config with all fields documented
Test criteria: Examples are valid TOML, loadable by the config package.
P4.3: Install script
Scope: Write the agent install script.
Depends: None
parallel: All of Phase 2 and 3
Deliverables:
- `deploy/scripts/install-agent.sh` — idempotent: create user/group, install the binary, create `/srv/mcp/`, install the example config, install systemd units, reload the daemon.
Test criteria: Script is idempotent (running twice produces the same result).
Phase 5: Integration Testing and Polish
Serial. Requires all previous phases to be complete.
P5.1: Integration test suite
Scope: End-to-end tests: CLI → agent → podman → container lifecycle.
Depends: All of Phase 2 and 3
Deliverables:
- Test harness that starts an agent with a test config and temp database.
- Tests cover: deploy, stop, start, restart, sync, adopt, push/pull, list/ps/status.
- Tests verify registry state, runtime state, and CLI output.
Test criteria: All integration tests pass. Coverage of every CLI command and agent RPC.
P5.2: Bootstrap procedure test
Scope: Test the full MCP bootstrap on a clean node with existing containers.
Depends: P5.1
Deliverables:
- Documented test procedure: start the agent, sync (discover containers), adopt, export, verify service definitions match running state.
- Verify the container rename flow (bare names → `<service>-<component>`).
P5.3: Documentation
Scope: Final docs pass.
Depends: P5.1
Deliverables:
- `CLAUDE.md` updated with the final project structure and commands
- `README.md` with a quick-start
- `RUNBOOK.md` with operational procedures
- Verify ARCHITECTURE.md matches the implementation
Parallelism Summary
Phase 0 (serial): P0.1 → P0.2
│
▼
Phase 1 (parallel): ┌─── P1.1 (registry)
├─── P1.2 (runtime)
├─── P1.3 (servicedef)
├─── P1.4 (config)
└─── P1.5 (auth)
│
┌─────┴──────┐
▼ ▼
Phase 2 (agent): P2.1 ──┐ Phase 3 (CLI): P3.1 ──┐
│ │ │ │
▼ │ ▼ │
P2.2 P2.3 Phase 4: P3.2 P3.3
P2.4 P2.5 P4.1-P4.3 P3.4 P3.5
P2.6 P2.7 (parallel P3.6 P3.7
P2.8 P2.9 with 2&3) P3.8 P3.9
│ │ P3.10
└──────────┬────────────────┘
▼
Phase 5 (serial): P5.1 → P5.2 → P5.3
Maximum parallelism: 5 engineers/agents during Phase 1, up to 8+ during Phase 2+3+4 combined.
Minimum serial path: P0.1 → P0.2 → P1.1 → P2.1 → P2.2 → P5.1 → P5.3