Commit Graph

46 Commits

Author SHA1 Message Date
f932dd64cc Add undeploy command: full inverse of deploy
Implements `mcp undeploy <service>` which tears down all infrastructure
for a service: removes mc-proxy routes, DNS records, TLS certificates,
stops and removes containers, releases allocated ports, and marks the
service inactive.

This fills the gap between `stop` (temporary pause) and `purge` (registry
cleanup). Undeploy is the complete teardown that returns the node to the
state before the service was deployed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.6.0
2026-03-27 21:45:42 -07:00
b2eaa69619 flake: install shell completions for mcp
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 21:41:25 -07:00
43789dd6be Fix route-based port mapping: use hostPort as container port
allocateRoutePorts() was using the route's port field (the mc-proxy
listener port, e.g. 443) as the container internal port in the podman
port mapping. For L7 routes, apps don't listen on the mc-proxy port —
they read $PORT (set to the assigned host port) and listen on that.

The mapping host:53204 → container:443 fails because nothing listens
on 443 inside the container. Fix: use hostPort as both the host and
container port, so $PORT = host port = container port.

Broke mcdoc in production (manually fixed, now permanently fixed).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.5.0
2026-03-27 16:50:48 -07:00
2dd0ea93fc Fix ImageExists to use skopeo instead of podman manifest inspect
podman manifest inspect only works for multi-arch manifest lists,
returning exit code 125 for regular single-arch images. Switch to
skopeo inspect which works for both.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 16:49:48 -07:00
169b3a0d4a Fix EnsureRecord to check all existing records before updating
When multiple A records exist for a service (e.g., LAN and Tailscale
IPs), check all of them for the correct value before attempting an
update. Previously only checked the first record, which could trigger
a 409 conflict if another record already had the target value.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 15:17:19 -07:00
2bda7fc138 Fix DNS record JSON parsing for MCNS response format
MCNS returns records wrapped in {"records": [...]} envelope with
uppercase field names (ID, Name, Type, Value), not bare arrays
with lowercase fields.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 15:12:43 -07:00
76247978c2 Fix protoToComponent to include routes in synced components
Routes from the proto ComponentSpec were dropped during sync, causing
the deploy flow to see empty regRoutes and skip cert provisioning,
route registration, and DNS registration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 14:39:26 -07:00
ca3bc736f6 Bump version to v0.5.0 for Phase D release
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 14:33:54 -07:00
9d9ad6588e Phase D: Automated DNS registration via MCNS
Add DNSRegistrar that creates/updates/deletes A records in MCNS
during deploy and stop. When a service has routes, the agent ensures
an A record exists in the configured zone pointing to the node's
address. On stop, the record is removed.

- Add MCNSConfig to agent config (server_url, ca_cert, token_path,
  zone, node_addr) with defaults and env overrides
- Add DNSRegistrar (internal/agent/dns.go): REST client for MCNS
  record CRUD, nil-receiver safe
- Wire into deploy flow (EnsureRecord after route registration)
- Wire into stop flow (RemoveRecord before container stop)
- 7 new tests, make all passes with 0 issues

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 14:33:41 -07:00
e4d131021e Bump version to v0.4.0 for Phase C release
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.4.0
2026-03-27 13:47:02 -07:00
8d6c060483 Update mc-proxy dependency to v1.2.0, drop replace directive
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 13:39:41 -07:00
c7e1232f98 Phase C: Automated TLS cert provisioning for L7 routes
Add CertProvisioner that requests TLS certificates from Metacrypt's CA
API during deploy. When a service has L7 routes, the agent checks for
an existing cert, re-issues if missing or within 30 days of expiry,
and writes chain+key to mc-proxy's cert directory before registering
routes.

- Add MetacryptConfig to agent config (server_url, ca_cert, mount,
  issuer, token_path) with defaults and env overrides
- Add CertProvisioner (internal/agent/certs.go): REST client for
  Metacrypt IssueCert, atomic file writes, cert expiry checking
- Wire into Agent struct and deploy flow (before route registration)
- Add hasL7Routes/l7Hostnames helpers in deploy.go
- Fix pre-existing lint issues: unreachable code in portalloc.go,
  gofmt in servicedef.go, gosec suppressions, golangci v2 config
- Update vendored mc-proxy to fix protobuf init panic
- 10 new tests, make all passes with 0 issues

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 13:31:11 -07:00
572d2fb196 Regenerate proto files for mc/ module path
Raw descriptor bytes in .pb.go files were corrupted by the sed-based
module path rename (string length changed, breaking protobuf binary
encoding). Regenerated with protoc to fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 02:54:40 -07:00
c6a84a1b80 Bump flake.nix version to match latest tag
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 02:16:45 -07:00
08b3e2a472 Migrate module path from kyle/ to mc/ org
All import paths updated to git.wntrmute.dev/mc/. Bumps mcdsl to v1.2.0,
mc-proxy to v1.1.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.3.0
2026-03-27 02:07:42 -07:00
6e30cf12f2 Mark Phase B complete in PROGRESS_V1.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 01:36:50 -07:00
c28562dbcf Merge pull request 'Phase B: Agent registers routes with mc-proxy on deploy' (#2) from phase-b-route-registration into master 2026-03-27 08:36:25 +00:00
84c487e7f8 Phase B: Agent registers routes with mc-proxy on deploy
The agent connects to mc-proxy via Unix socket and automatically
registers/removes routes during deploy and stop. This eliminates
manual mcproxyctl usage or TOML editing.

- New ProxyRouter abstraction wraps mc-proxy client library
- Deploy: after container starts, registers routes with mc-proxy
  using host ports from the registry
- Stop: removes routes from mc-proxy before stopping container
- Config: [mcproxy] section with socket path and cert_dir
- Nil-safe: if mc-proxy socket not configured, route registration
  is silently skipped (backward compatible)
- L7 routes use certs from convention path (<cert_dir>/<service>.pem)
- L4 routes use TLS passthrough (backend_tls=true)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 01:35:06 -07:00
8b1c89fdc9 Add mcp build command and deploy auto-build
Extends MCP to own the full build-push-deploy lifecycle. When deploying,
the CLI checks whether each component's image tag exists in the registry
and builds/pushes automatically if missing and build config is present.

- Add Build, Push, ImageExists to runtime.Runtime interface (podman impl)
- Add mcp build <service>[/<image>] command
- Add [build] section to CLI config (workspace path)
- Add path and [build.images] to service definitions
- Wire auto-build into mcp deploy before agent RPC
- Update ARCHITECTURE.md with runtime interface and deploy auto-build docs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.2.0
2026-03-27 01:34:25 -07:00
d7f18a5d90 Add Platform Evolution tracking to PROGRESS_V1.md
Phase A complete: route declarations, port allocation, $PORT env vars.
Phase B in progress: agent mc-proxy route registration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 01:25:26 -07:00
5a802bceb6 Merge pull request 'Add route declarations and automatic port allocation' (#1) from mcp-routes-port-allocation into master 2026-03-27 08:16:20 +00:00
777ba8a0e1 Add route declarations and automatic port allocation to MCP agent
Service definitions can now declare routes per component instead of
manual port mappings:

  [[components.routes]]
  name = "rest"
  port = 8443
  mode = "l4"

The agent allocates free host ports at deploy time and injects
$PORT/$PORT_<NAME> env vars into containers. Backward compatible:
components with old-style ports= work unchanged.

Changes:
- Proto: RouteSpec message, routes + env fields on ComponentSpec
- Servicedef: RouteDef parsing and validation from TOML
- Registry: component_routes table with host_port tracking
- Runtime: Env field on ContainerSpec, -e flag in BuildRunArgs
- Agent: PortAllocator (random 10000-60000, availability check),
  deploy wiring for route→port mapping and env injection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 01:04:47 -07:00
503c52dc26 Update service definition example for convention-driven format
Drop uses_mcdsl, full image URLs, ports, network, user, restart.
Add route declarations and service-level version. Image names and
most config are now derived from conventions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 00:19:12 -07:00
6465da3547 Add build and release lifecycle to ARCHITECTURE.md
Service definitions now include [build] config (path, uses_mcdsl,
images) so MCP owns the full build-push-deploy lifecycle, replacing
mcdeploy.toml. Documents mcp build, mcp sync auto-build, image
versioning policy (explicit tags, never :latest), and workspace
convention.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 23:31:05 -07:00
e18a3647bf Add Nix flake for mcp and mcp-agent
Exposes two packages:
- default (mcp CLI) for operator workstations
- mcp-agent for managed nodes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 22:46:36 -07:00
1e58dcce27 Implement mcp purge command for registry cleanup
Add PurgeComponent RPC to the agent service that removes stale registry
entries for components that are both gone (observed state is removed,
unknown, or exited) and unwanted (not in any current service definition).
Refuses to purge components with running or stopped containers. When all
components of a service are purged, the service row is deleted too.
Supports --dry-run to preview without modifying the database.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 22:30:45 -07:00
1afbf5e1f6 Add purge design to architecture doc
Purge removes stale registry entries — components that are no longer
in service definitions and have no running container. Designed as an
explicit, safe operation separate from sync: sync is additive (push
desired state), purge is subtractive (remove forgotten entries).

Includes safety rules (refuses to purge running containers), dry-run
mode, agent RPC definition, and rationale for why sync should not be
made destructive.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 22:22:27 -07:00
ea8a42a696 P5.2 + P5.3: Bootstrap docs, README, and RUNBOOK
- docs/bootstrap.md: step-by-step bootstrap procedure with lessons
  learned from the first deployment (NixOS sandbox issues, podman
  rootless setup, container naming, MCR auth workaround)
- README.md: quick-start guide, command reference, doc links
- RUNBOOK.md: operational procedures for operators (health checks,
  common operations, unsealing metacrypt, cert renewal, incident
  response, disaster recovery, file locations)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 15:32:22 -07:00
ff9bfc5087 Update PROGRESS_V1.md with deployment status and remaining work
Documents Phase 6 (deployment), bugs fixed during rollout,
remaining work organized by priority (operational, quality,
design, infrastructure), and current platform state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 15:27:30 -07:00
17ac0f3014 Trim whitespace from token file in CLI
Token files with trailing newlines caused gRPC "non-printable ASCII
characters" errors in the authorization header.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 15:19:27 -07:00
7133871be2 Default CLI config path to ~/.config/mcp/mcp.toml
Eliminates the need to pass --config on every command.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 15:16:34 -07:00
efa32a7712 Fix container name handling for hyphenated service names
Extract ContainerNameFor and SplitContainerName into names.go.
ContainerNameFor handles single-component services where service
name equals component name (e.g., mc-proxy → "mc-proxy" not
"mc-proxy-mc-proxy"). SplitContainerName checks known services
from the registry before falling back to naive split on "-", fixing
mc-proxy being misidentified as service "mc" component "proxy".

Also fixes podman ps JSON parsing (Command field is []string not
string) found during deployment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 15:13:20 -07:00
941dd7003a Fix design-vs-implementation gaps found in verification
Critical fixes:
- Wire monitor subsystem to agent startup (was dead code)
- Implement NodeStatus RPC (disk, memory, CPU, runtime version, uptime)
- Deploy respects active=false (sets desired_state=stopped, not always running)

Medium fixes:
- Add Started field to runtime.ContainerInfo, populate from podman inspect
- Populate ComponentInfo.started in status handlers for uptime display
- Add Monitor field to Agent struct for graceful shutdown

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.1.0
2026-03-26 12:29:04 -07:00
8f913ddf9b P2.2-P2.9, P3.2-P3.10, P4.1-P4.3: Complete Phases 2, 3, and 4
11 work units built in parallel and merged:

Agent handlers (Phase 2):
- P2.2 Deploy: pull images, stop/remove/run containers, update registry
- P2.3 Lifecycle: stop/start/restart with desired_state tracking
- P2.4 Status: list (registry), live check (runtime), get status (drift+events)
- P2.5 Sync: receive desired state, reconcile unmanaged containers
- P2.6 File transfer: push/pull scoped to /srv/<service>/, path validation
- P2.7 Adopt: match <service>-* containers, derive component names
- P2.8 Monitor: continuous watch loop, drift/flap alerting, event pruning
- P2.9 Snapshot: VACUUM INTO database backup command

CLI commands (Phase 3):
- P3.2 Login, P3.3 Deploy, P3.4 Stop/Start/Restart
- P3.5 List/Ps/Status, P3.6 Sync, P3.7 Adopt
- P3.8 Service show/edit/export, P3.9 Push/Pull, P3.10 Node list/add/remove

Deployment artifacts (Phase 4):
- Systemd units (agent service + backup timer)
- Example configs (CLI + agent)
- Install script (idempotent)

All packages: build, vet, lint (0 issues), test (all pass).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 12:21:18 -07:00
d7cc970133 Split CLI command stubs into separate files
Move each command function from main.go into its own file
(deploy.go, lifecycle.go, status.go, etc.) to enable parallel
development by multiple workers without file conflicts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 11:59:17 -07:00
53535f1e96 P2.1 + P3.1: Agent skeleton and CLI skeleton
Agent (P2.1): Agent struct with registry DB, runtime, and logger.
gRPC server with TLS 1.3 and MCIAS auth interceptor. Graceful
shutdown on SIGINT/SIGTERM. All RPCs return Unimplemented until
handlers are built in P2.2-P2.9.

CLI (P3.1): Full command tree with all 15 subcommands as stubs
(login, deploy, stop, start, restart, list, ps, status, sync,
adopt, service show/edit/export, push, pull, node list/add/remove).
gRPC dial helper with TLS, CA cert, and bearer token attachment.

Both gates for parallel Phase 2+3 work are now open.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 11:51:03 -07:00
15b8823810 P1.2-P1.5: Complete Phase 1 core libraries
Four packages built in parallel:

- P1.2 runtime: Container runtime abstraction with podman implementation.
  Interface (Pull/Run/Stop/Remove/Inspect/List), ContainerSpec/ContainerInfo
  types, CLI arg building, version extraction from image tags. 2 tests.

- P1.3 servicedef: TOML service definition file parsing. Load/Write/LoadAll,
  validation (required fields, unique component names), proto conversion.
  5 tests.

- P1.4 config: CLI and agent config loading from TOML. Duration type for
  time fields, env var overrides (MCP_*/MCP_AGENT_*), required field
  validation, sensible defaults. 7 tests.

- P1.5 auth: MCIAS integration. Token validator with 30s SHA-256 cache,
  gRPC unary interceptor (admin role enforcement, audit logging),
  Login/LoadToken/SaveToken for CLI. 9 tests.

All packages pass build, vet, lint, and test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 11:36:12 -07:00
6122123064 P1.1: Registry package with full CRUD and tests
SQLite schema (services, components, ports, volumes, cmd, events),
migrations, and complete CRUD operations. 7 tests covering:
idempotent migration, service CRUD, duplicate name rejection,
component CRUD with ports/volumes/cmd, composite PK enforcement,
cascade delete, and event insert/query/count/prune.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 11:18:35 -07:00
3f23b14ef4 P0.2: Proto definitions and code generation
Full gRPC service definition with 12 RPCs: Deploy, StopService,
StartService, RestartService, SyncDesiredState, ListServices,
GetServiceStatus, LiveCheck, AdoptContainers, PushFile, PullFile,
NodeStatus. buf lint passes. Generated Go code compiles.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 11:14:43 -07:00
eaad18116a P0.1: Repository and module setup
Go module, Makefile with standard targets, golangci-lint v2 config,
CLAUDE.md, and empty CLI/agent binaries. Build, vet, and lint all pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 11:12:52 -07:00
6a90b21a62 Add PROJECT_PLAN_V1.md and PROGRESS_V1.md
30 discrete tasks across 5 phases, with dependency graph and
parallelism analysis. Phase 1 (5 core libraries) is fully parallel.
Phases 2+3+4 (agent handlers, CLI commands, deployment artifacts)
support up to 8+ concurrent engineers/agents. Critical path is
proto → registry + runtime → agent deploy → integration tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 11:08:06 -07:00
a1bbc008b5 Update ARCHITECTURE.md with design audit findings
Incorporates all 14 items from DESIGN_AUDIT.md: node registry in CLI
config, container naming convention (<service>-<component>), active
state semantics, adopt by service prefix, EventInfo service field,
version from image tag, snapshot/backup timer, exec-style alert
commands, overlay-only bind address, RPC audit logging, /srv/ ownership,
rootless podman UID mapping docs.

Three minor fixes from final review: stale adopt syntax in bootstrap
section, explicit container naming in deploy flow, clarify that list/ps
query all registered nodes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 11:03:25 -07:00
12d8d733be Add DESIGN_AUDIT.md from second review pass
Engineering and security review of the rewritten ARCHITECTURE.md.
14 issues identified and resolved: node registry in CLI config,
active/desired_state semantics, container naming convention
(<service>-<component>), exec-style alert commands to prevent
injection, agent binds to overlay IP only (not behind MC-Proxy),
RPC audit logging, /srv/ owned by mcp user, and others.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 10:58:04 -07:00
ea7a9dcf4d Rewrite ARCHITECTURE.md incorporating review findings
Major design changes from the review:
- Merge agent and watcher into a single smart per-node daemon
- CLI is a thin client with no database; service definition files
  are the operator's source of truth for desired state
- Registry database lives on the agent, not the CLI
- Rename containers to components; components are independently
  deployable within a service (mcp deploy metacrypt/web)
- active: true/false in service definitions; desired_state values
  are running/stopped/ignore
- Server-side TLS + bearer token (not mTLS)
- Dedicated mcp user with rootless podman
- CLI commands: list (registry), ps (live), status (drift+events),
  sync (push desired state)
- Agent reports node resources (disk, memory, CPU) for future scheduling
- Agent is gRPC-only (deliberate exception to REST+gRPC parity rule)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 10:31:48 -07:00
c8d0d42ea8 Add REVIEW.md from architecture review session
Documents 12 issues found during critical review of ARCHITECTURE.md
and their resolutions: merged agent/watcher into single smart daemon,
components model for independent deploy within services, database
lives on agent not CLI, TLS+bearer (not mTLS), desired_state=ignore
for unmanaged containers, and other clarifications.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 10:27:57 -07:00
6b99937a69 Add MCP v1 architecture specification
Design spec for the Metacircular Control Plane covering master/agent
architecture, service registry with desired/observed state tracking,
container lifecycle management, service definition files, single-file
transfer scoped to /srv/<service>/, and continuous monitoring via
mcp watch with event logging and alerting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 09:42:41 -07:00