MCP v1 Project Plan
Overview
This plan breaks MCP v1 into discrete implementation tasks organized into phases. Tasks within a phase can often be parallelized. Dependencies between tasks are noted explicitly.
The critical path is: proto → registry + runtime → agent deploy handler → integration testing. Parallelizable work (CLI commands, monitoring, file transfer) can proceed alongside the critical path once the proto and core libraries are ready.
Notation
- [Pn] = Phase n
- [Pn.m] = Task m in phase n
- depends: [Px.y] = must wait for that task
- parallel: [Px.y, Px.z] = can run alongside these tasks
- Each task includes scope, deliverables, and test criteria
Phase 0: Project Scaffolding
One engineer. Serial. Establishes the project skeleton that everything else builds on.
P0.1: Repository and module setup
Scope: Initialize the Go module, create the standard directory structure, and configure tooling.
Deliverables:
- `go.mod` with module path `git.wntrmute.dev/mc/mcp`
- `Makefile` with standard targets (build, test, vet, lint, proto, proto-lint, clean, all)
- `.golangci.yaml` with platform-standard linter config
- `.gitignore`
- `CLAUDE.md` (project-specific AI context)
- Empty `cmd/mcp/main.go` and `cmd/mcp-agent/main.go` (compile to verify the skeleton works)
Test criteria: `make build` succeeds. `make vet` and `make lint` pass on the empty project.
P0.2: Proto definitions and code generation
Scope: Write the mcp.proto file from the ARCHITECTURE.md spec and
generate Go code.
Depends: P0.1
Deliverables:
- `proto/mcp/v1/mcp.proto` — full service definition from ARCHITECTURE.md
- `buf.yaml` configuration
- `gen/mcp/v1/` — generated Go code
- `make proto` and `make proto-lint` both pass
Test criteria: Generated code compiles. buf lint passes. All message
types and RPC methods from the architecture doc are present.
Phase 1: Core Libraries
Five packages. P1.1-P1.4 depend only on P0.1 and can be built in parallel; P1.5 additionally needs the generated proto types from P0.2. Each package has a well-defined interface, no dependencies on other Phase 1 packages, and is fully testable in isolation.
P1.1: Registry package (internal/registry/)
Scope: SQLite schema, migrations, and CRUD operations for the node-local registry.
Depends: P0.1
Deliverables:
- `db.go` — open database, run migrations, close. Schema from ARCHITECTURE.md (services, components, component_ports, component_volumes, component_cmd, events tables).
- `services.go` — create, get, list, update, delete services.
- `components.go` — create, get, list (by service), update desired/observed state, update spec, delete. Supports filtering by desired_state.
- `events.go` — insert event, query events by component+service+time range, count events in window (for flap detection), prune old events.
Test criteria: full test coverage using `t.TempDir()` + a real SQLite database. Tests cover:
- Schema migration is idempotent
- Service and component CRUD
- Desired/observed state updates
- Event insertion, time-range queries, pruning
- Foreign key cascading (delete service → components deleted)
- Component composite primary key (service, name) enforced
parallel: P1.2, P1.3, P1.4
P1.2: Runtime package (internal/runtime/)
Scope: Container runtime abstraction with a podman implementation.
Depends: P0.1
Deliverables:
- `runtime.go` — `Runtime` interface:

  ```go
  type Runtime interface {
      Pull(ctx context.Context, image string) error
      Run(ctx context.Context, spec ContainerSpec) error
      Stop(ctx context.Context, name string) error
      Remove(ctx context.Context, name string) error
      Inspect(ctx context.Context, name string) (ContainerInfo, error)
      List(ctx context.Context) ([]ContainerInfo, error)
  }
  ```

  Plus `ContainerSpec` and `ContainerInfo` structs.
- `podman.go` — podman implementation. Builds command-line arguments from `ContainerSpec`, execs the `podman` CLI, parses `podman inspect` JSON output.
Test criteria:
- Unit tests for command-line argument building (given a ContainerSpec, verify the constructed podman args are correct). These don't require podman to be installed.
- `ContainerSpec` → podman flag mapping matches the table in ARCHITECTURE.md.
- Container naming follows the `<service>-<component>` convention.
- Version extraction from image tag works (e.g., `registry/img:v1.2.0` → `v1.2.0`, `registry/img:latest` → `latest`, `registry/img` → `""`).
parallel: P1.1, P1.3, P1.4
P1.3: Service definition package (internal/servicedef/)
Scope: Parse, validate, and write TOML service definition files.
Depends: P0.1
Deliverables:
- `servicedef.go` — `Load(path) → ServiceDef`, `Write(path, ServiceDef)`, `LoadAll(dir) → []ServiceDef`. Validation: required fields (name, node, at least one component); component names unique within service. Converts between the TOML representation and the proto `ServiceSpec`.
Test criteria:
- Round-trip: write a ServiceDef, read it back, verify equality
- Validation rejects missing name, missing node, empty components, duplicate component names
- `LoadAll` loads all `.toml` files from a directory
- `active` field defaults to `true` if omitted
- Conversion to/from the proto `ServiceSpec` is correct
parallel: P1.1, P1.2, P1.4
P1.4: Config package (internal/config/)
Scope: Load and validate CLI and agent configuration from TOML files.
Depends: P0.1
Deliverables:
- `cli.go` — CLI config struct: services dir, MCIAS settings, auth (token path, optional username/password_file), nodes list. Load from TOML with env var overrides (`MCP_*`). Validate required fields.
- `agent.go` — agent config struct: server (grpc_addr, tls_cert, tls_key), database path, MCIAS settings, agent (node_name, container_runtime), monitor settings, log level. Load from TOML with env var overrides (`MCP_AGENT_*`). Validate required fields.
Test criteria:
- Load from TOML file, verify all fields populated
- Required field validation (reject missing grpc_addr, missing tls_cert, etc.)
- Env var overrides work
- Nodes list parses correctly from `[[nodes]]`
parallel: P1.1, P1.2, P1.3
P1.5: Auth package (internal/auth/)
Scope: MCIAS token validation for the agent, and token acquisition for the CLI.
Depends: P0.1, P0.2 (uses proto-generated types for gRPC interceptor)
Deliverables:
- `auth.go`:
  - `Interceptor` — gRPC unary server interceptor that extracts bearer tokens, validates them against MCIAS (with a 30s SHA-256-keyed cache), checks the admin role, and audit-logs every RPC (method, caller, timestamp). Returns UNAUTHENTICATED or PERMISSION_DENIED on failure.
  - `Login(url, username, password) → token` — authenticate to MCIAS, return a bearer token.
  - `LoadToken(path) → token` — read a cached token from file.
  - `SaveToken(path, token)` — write a token to file with 0600 permissions.
Test criteria:
- Interceptor rejects missing token (UNAUTHENTICATED)
- Interceptor rejects invalid token (UNAUTHENTICATED)
- Interceptor rejects non-admin token (PERMISSION_DENIED)
- Token caching works (same token within 30s returns cached result)
- Token file read/write with correct permissions
- Audit log entry emitted on every RPC (check slog output)
Note: Full interceptor testing requires an MCIAS mock or test instance. Unit tests can mock the MCIAS validation call. Integration tests against a real MCIAS instance are a Phase 4 concern.
parallel: P1.1, P1.2, P1.3, P1.4 (partially; needs P0.2 for proto types)
Phase 2: Agent
The agent is the core of MCP. Tasks in this phase build on Phase 1 libraries. Some tasks can be parallelized; dependencies are noted.
P2.1: Agent skeleton and gRPC server
Scope: Wire up the agent binary: config loading, database setup, gRPC server with TLS and auth interceptor, graceful shutdown.
Depends: P0.2, P1.1, P1.4, P1.5
Deliverables:
- `cmd/mcp-agent/main.go` — cobra root command, `server` subcommand.
- `internal/agent/agent.go` — Agent struct holding registry, runtime, and config. Initializes the database, starts the gRPC server with TLS and the auth interceptor, and handles SIGINT/SIGTERM for graceful shutdown.
- Agent starts, listens on the configured address, rejects unauthenticated RPCs, and shuts down cleanly.
Test criteria: Agent starts with a test config, accepts TLS connections, rejects RPCs without a valid token. Graceful shutdown closes the database and stops the listener.
P2.2: Deploy handler
Scope: Implement the Deploy RPC on the agent.
Depends: P2.1, P1.2
Deliverables:
- `internal/agent/deploy.go` — handles DeployRequest: records the spec in the registry, iterates components, calls the runtime (pull, stop, remove, run, inspect), updates observed state and version, returns results.
- Supports single-component deploy (when the `component` field is set).
Test criteria:
- Deploy with all components records spec in registry
- Deploy with single component only touches that component
- Failed pull returns error for that component, others continue
- Registry is updated with desired_state=running and observed_state
- Version is extracted from image tag
P2.3: Lifecycle handlers (stop, start, restart)
Scope: Implement StopService, StartService, RestartService RPCs.
Depends: P2.1, P1.2
parallel: P2.2
Deliverables:
- `internal/agent/lifecycle.go`
- Stop: for each component, call runtime stop, update desired_state to `stopped`, update observed_state.
- Start: for each component, call runtime start (or run if removed), update desired_state to `running`, update observed_state.
- Restart: stop then start each component.
Test criteria:
- Stop sets desired_state=stopped, calls runtime stop
- Start sets desired_state=running, calls runtime start
- Restart cycles each component
- Returns per-component results
P2.4: Status handlers (list, live check, get status)
Scope: Implement ListServices, LiveCheck, GetServiceStatus RPCs.
Depends: P2.1, P1.2
parallel: P2.2, P2.3
Deliverables:
- `internal/agent/status.go`
- `ListServices`: read from registry, no runtime query.
- `LiveCheck`: query runtime, reconcile registry, return updated state.
- `GetServiceStatus`: live check + drift detection + recent events.
Test criteria:
- ListServices returns registry contents without touching runtime
- LiveCheck updates observed_state from runtime
- GetServiceStatus includes drift info for mismatched desired/observed
- GetServiceStatus includes recent events
P2.5: Sync handler
Scope: Implement SyncDesiredState RPC.
Depends: P2.1, P1.2
parallel: P2.2, P2.3, P2.4
Deliverables:
- `internal/agent/sync.go`
- Receives the list of ServiceSpecs from the CLI.
- For each service: create or update it in the registry, set desired_state based on the `active` flag (running if active, stopped if not).
- Runs reconciliation (discover unmanaged containers, set them to ignore).
- Returns a per-service summary of what changed.
Test criteria:
- New services are created in registry
- Existing services have specs updated
- Active=false sets desired_state=stopped for all components
- Unmanaged containers discovered and set to ignore
- Returns accurate change summaries
P2.6: File transfer handlers
Scope: Implement PushFile and PullFile RPCs.
Depends: P2.1
parallel: P2.2, P2.3, P2.4, P2.5
Deliverables:
- `internal/agent/files.go`
- Path validation: resolve `/srv/<service>/<path>`, reject `..` traversal, reject symlinks escaping the service directory.
- Push: atomic write (temp file + rename), create intermediate dirs.
- Pull: read the file, return content and permissions.
Test criteria:
- Push creates file at correct path with correct permissions
- Push creates intermediate directories
- Push is atomic (partial write doesn't leave corrupt file)
- Pull returns file content and mode
- Path traversal rejected (`../etc/passwd`)
- Symlink escape rejected
- Service directory scoping enforced
P2.7: Adopt handler
Scope: Implement AdoptContainer RPC.
Depends: P2.1, P1.2
parallel: P2.2, P2.3, P2.4, P2.5, P2.6
Deliverables:
- `internal/agent/adopt.go`
- Matches containers by the `<service>-*` prefix in the runtime.
- Creates the service if needed.
- Strips the prefix to derive the component name.
- Sets desired_state based on the current observed_state.
- Returns per-container results.
Test criteria:
- Matches containers by prefix
- Creates service when it doesn't exist
- Derives component names correctly (metacrypt-api → api, metacrypt-web → web)
- Single-component service (mc-proxy → mc-proxy) works
- Sets desired_state to running for running containers, stopped for stopped
- Returns results for each adopted container
P2.8: Monitor subsystem
Scope: Implement the continuous monitoring loop and alerting.
Depends: P2.1, P1.1, P1.2
parallel: P2.2-P2.7 (can be built alongside other agent handlers)
Deliverables:
- `internal/monitor/monitor.go` — Monitor struct with Start/Stop methods. Runs a goroutine with a ticker at the configured interval. Each tick: queries the runtime, reconciles the registry, records events, evaluates alerts.
- `internal/monitor/alerting.go` — alert evaluation: drift detection (desired != observed for managed components), flap detection (event count in window > threshold), per-component cooldown tracking, and alert command execution via `exec` (argv array, `MCP_*` env vars).
- Event pruning (delete events older than the retention period).
Test criteria:
- Monitor detects state transitions and records events
- Drift alert fires on desired/observed mismatch
- Drift alert respects cooldown (doesn't fire again within window)
- Flap alert fires when transitions exceed threshold in window
- Alert command is exec'd with correct env vars
- Event pruning removes old events, retains recent ones
- Monitor can be stopped cleanly (goroutine exits)
P2.9: Snapshot command
Scope: Implement mcp-agent snapshot for database backup.
Depends: P2.1, P1.1
parallel: P2.2-P2.8
Deliverables:
- `cmd/mcp-agent/snapshot.go` — cobra subcommand. Runs `VACUUM INTO` to create a consistent backup in `/srv/mcp/backups/`.
Test criteria:
- Creates a backup file with timestamp in name
- Backup is a valid SQLite database
- Original database is unchanged
Phase 3: CLI
All CLI commands are thin gRPC clients. Most can be built in parallel once the proto (P0.2) and servicedef/config packages (P1.3, P1.4) are ready. CLI commands can be tested against a running agent (integration) or with a mock gRPC server (unit).
P3.1: CLI skeleton
Scope: Wire up the CLI binary: config loading, gRPC connection setup, cobra command tree.
Depends: P0.2, P1.3, P1.4
Deliverables:
- `cmd/mcp/main.go` — cobra root command with a `--config` flag. Subcommand stubs for all commands.
- gRPC dial helper: reads the node address from config, establishes a TLS connection with CA verification, attaches the bearer token to metadata.
Test criteria: CLI starts, loads config, and `--help` shows all subcommands.
P3.2: Login command
Scope: Implement mcp login.
Depends: P3.1, P1.5
Deliverables:
- `cmd/mcp/login.go` — prompts for username/password (or reads them from config for unattended use), calls MCIAS, saves the token to the configured path with 0600 permissions.
Test criteria: Token is saved to the correct path with correct permissions.
P3.3: Deploy command
Scope: Implement mcp deploy.
Depends: P3.1, P1.3
parallel: P3.4, P3.5, P3.6, P3.7, P3.8, P3.9, P3.10
Deliverables:
- `cmd/mcp/deploy.go`
- Resolves the service spec: file (from `-f` or the default path) > agent registry.
- Parses the `<service>/<component>` syntax for single-component deploy.
- Pushes the spec to the agent via the Deploy RPC.
- Prints per-component results.
Test criteria:
- Reads service definition from file
- Falls back to agent registry when no file exists
- Fails with clear error when neither exists
- Single-component syntax works
- Prints results
P3.4: Lifecycle commands (stop, start, restart)
Scope: Implement mcp stop, mcp start, mcp restart.
Depends: P3.1, P1.3
parallel: P3.3, P3.5, P3.6, P3.7, P3.8, P3.9, P3.10
Deliverables:
- `cmd/mcp/lifecycle.go`
- Stop: sets `active = false` in the service definition file, calls the StopService RPC.
- Start: sets `active = true` in the service definition file, calls the StartService RPC.
- Restart: calls the RestartService RPC (does not change the active flag).
Test criteria:
- Stop updates the service definition file
- Start updates the service definition file
- Both call the correct RPC
- Restart does not modify the file
P3.5: Status commands (list, ps, status)
Scope: Implement mcp list, mcp ps, mcp status.
Depends: P3.1
parallel: P3.3, P3.4, P3.6, P3.7, P3.8, P3.9, P3.10
Deliverables:
- `cmd/mcp/status.go`
- List: calls ListServices on all nodes, formats table output.
- Ps: calls LiveCheck on all nodes, formats with uptime and version.
- Status: calls GetServiceStatus, shows drift and recent events.
Test criteria:
- Queries all registered nodes
- Formats output as readable tables
- Status highlights drift clearly
P3.6: Sync command
Scope: Implement mcp sync.
Depends: P3.1, P1.3
parallel: P3.3, P3.4, P3.5, P3.7, P3.8, P3.9, P3.10
Deliverables:
- `cmd/mcp/sync.go`
- Loads all service definitions from the services directory.
- Groups by node.
- Calls SyncDesiredState on each agent with that node's services.
- Prints summary of changes.
Test criteria:
- Loads all service definitions
- Filters by node correctly
- Pushes to correct agents
- Prints change summary
P3.7: Adopt command
Scope: Implement mcp adopt.
Depends: P3.1
parallel: P3.3, P3.4, P3.5, P3.6, P3.8, P3.9, P3.10
Deliverables:
- `cmd/mcp/adopt.go`
- Calls the AdoptContainer RPC on the agent.
- Prints adopted containers and their derived component names.
Test criteria:
- Calls RPC with service name
- Prints results
P3.8: Service commands (show, edit, export)
Scope: Implement mcp service show, mcp service edit,
mcp service export.
Depends: P3.1, P1.3
parallel: P3.3, P3.4, P3.5, P3.6, P3.7, P3.9, P3.10
Deliverables:
- `cmd/mcp/service.go`
- Show: calls ListServices, filters to the named service, prints the spec.
- Edit: if file exists, open in $EDITOR. If not, export from agent first, then open. Save to standard path.
- Export: calls ListServices, converts to TOML, writes to a file (default path or `-f`).
Test criteria:
- Show prints the correct spec
- Export writes a valid TOML file that can be loaded back
- Edit opens the correct file (or creates from agent spec)
P3.9: Transfer commands (push, pull)
Scope: Implement mcp push and mcp pull.
Depends: P3.1
parallel: P3.3, P3.4, P3.5, P3.6, P3.7, P3.8, P3.10
Deliverables:
- `cmd/mcp/transfer.go`
- Push: reads the local file, determines the service and path, calls the PushFile RPC. Default relative path = basename of the local file.
- Pull: calls PullFile RPC, writes content to local file.
Test criteria:
- Push reads file and sends correct content
- Push derives path from basename when omitted
- Pull writes file locally with correct content
P3.10: Node commands
Scope: Implement mcp node list, mcp node add, mcp node remove.
Depends: P3.1, P1.4
parallel: P3.3, P3.4, P3.5, P3.6, P3.7, P3.8, P3.9
Deliverables:
- `cmd/mcp/node.go`
- List: reads nodes from config, prints a table.
- Add: appends a `[[nodes]]` entry to the config file.
- Remove: removes the named `[[nodes]]` entry from the config file.
Test criteria:
- List shows all configured nodes
- Add creates a new entry
- Remove deletes the named entry
- Config file remains valid TOML after add/remove
Phase 4: Deployment Artifacts
Can be worked on in parallel with Phase 2 and 3.
P4.1: Systemd units
Scope: Write systemd service and timer files.
Depends: None (these are static files)
parallel: All of Phase 2 and 3
Deliverables:
- `deploy/systemd/mcp-agent.service` — from ARCHITECTURE.md
- `deploy/systemd/mcp-agent-backup.service` — snapshot oneshot
- `deploy/systemd/mcp-agent-backup.timer` — daily 02:00 UTC, 5 min jitter
Test criteria: Files match platform conventions (security hardening, correct paths, correct user).
P4.2: Example configs
Scope: Write example configuration files.
Depends: None
parallel: All of Phase 2 and 3
Deliverables:
- `deploy/examples/mcp.toml` — CLI config with all fields documented
- `deploy/examples/mcp-agent.toml` — agent config with all fields documented
Test criteria: Examples are valid TOML, loadable by the config package.
P4.3: Install script
Scope: Write the agent install script.
Depends: None
parallel: All of Phase 2 and 3
Deliverables:
- `deploy/scripts/install-agent.sh` — idempotent: create user/group, install the binary, create `/srv/mcp/`, install the example config, install systemd units, reload the daemon.
Test criteria: Script is idempotent (running twice produces the same result).
Phase 5: Integration Testing and Polish
Serial. Requires all previous phases to be complete.
P5.1: Integration test suite
Scope: End-to-end tests: CLI → agent → podman → container lifecycle.
Depends: All of Phase 2 and 3
Deliverables:
- Test harness that starts an agent with a test config and temp database.
- Tests cover: deploy, stop, start, restart, sync, adopt, push/pull, list/ps/status.
- Tests verify registry state, runtime state, and CLI output.
Test criteria: All integration tests pass. Coverage of every CLI command and agent RPC.
P5.2: Bootstrap procedure test
Scope: Test the full MCP bootstrap on a clean node with existing containers.
Depends: P5.1
Deliverables:
- Documented test procedure: start the agent, sync (discover containers), adopt, export, verify service definitions match running state.
- Verify the container rename flow (bare names → `<service>-<component>`).
P5.3: Documentation
Scope: Final docs pass.
Depends: P5.1
Deliverables:
- `CLAUDE.md` updated with the final project structure and commands
- `README.md` with a quick-start
- `RUNBOOK.md` with operational procedures
- Verify ARCHITECTURE.md matches the implementation
Parallelism Summary
Phase 0 (serial): P0.1 → P0.2
│
▼
Phase 1 (parallel): ┌─── P1.1 (registry)
├─── P1.2 (runtime)
├─── P1.3 (servicedef)
├─── P1.4 (config)
└─── P1.5 (auth)
│
┌─────┴──────┐
▼ ▼
Phase 2 (agent): P2.1 ──┐ Phase 3 (CLI): P3.1 ──┐
│ │ │ │
▼ │ ▼ │
P2.2 P2.3 Phase 4: P3.2 P3.3
P2.4 P2.5 P4.1-P4.3 P3.4 P3.5
P2.6 P2.7 (parallel P3.6 P3.7
P2.8 P2.9 with 2&3) P3.8 P3.9
│ │ P3.10
└──────────┬────────────────┘
▼
Phase 5 (serial): P5.1 → P5.2 → P5.3
Maximum parallelism: 5 engineers/agents during Phase 1, up to 8+ during Phase 2+3+4 combined.
Minimum serial path: P0.1 → P0.2 → P1.1 → P2.1 → P2.2 → P5.1 → P5.3