Commit Graph

72 Commits

Author SHA1 Message Date
68d670b3ed Add edge CLI scaffolding for Phase 2 testing
Temporary CLI commands for testing edge routing RPCs directly
(before the master exists):

  mcp edge list -n svc
  mcp edge setup <hostname> -n svc --backend-hostname ... --backend-port ...
  mcp edge remove <hostname> -n svc

Verified end-to-end on svc: setup provisions route in mc-proxy and
persists in agent registry, remove cleans up both, list shows routes
with cert metadata.

Finding: MCNS registers LAN IPs for .svc.mcp. hostnames, not Tailnet
IPs. The v2 master needs to register Tailnet IPs in deploy flow step 3.

These commands will be removed or replaced when the master is built
(Phase 3).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 15:04:12 -07:00
714320c018 Add edge routing and health check RPCs (Phase 2)
New agent RPCs for v2 multi-node orchestration:

- SetupEdgeRoute: provisions TLS cert from Metacrypt, resolves backend
  hostname to Tailnet IP, validates it's in 100.64.0.0/10, registers
  L7 route in mc-proxy. Rejects backend_tls=false.
- RemoveEdgeRoute: removes mc-proxy route, cleans up TLS cert, removes
  registry entry.
- ListEdgeRoutes: returns all edge routes with cert serial/expiry.
- HealthCheck: returns agent health and container count.

New database table (migration 4): edge_routes stores hostname, backend
info, and cert paths for persistence across agent restarts.

ProxyRouter gains CertPath/KeyPath helpers for consistent cert path
construction.

Security:
- Backend hostname must resolve to a Tailnet IP (100.64.0.0/10)
- backend_tls=false is rejected (no cleartext to backends)
- Cert provisioning failure fails the setup (no route to missing cert)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.9.0
2026-04-02 13:13:10 -07:00
fa8ba6fac1 Move ARCHITECTURE_V2.md to metacircular docs
The v2 architecture doc is platform-wide (covers master, agents,
edge routing, snapshots, migration across all nodes). Moved to
docs/architecture-v2.md in the metacircular workspace repo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 11:09:09 -07:00
f66758b92b Hardcode version in flake.nix
The git fetcher doesn't provide gitDescribe, so the Nix build was
falling through to shortRev and producing commit-hash versions instead
of tag-based ones.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.8.3
2026-03-30 17:46:12 -07:00
09d0d197c3 Add component-level targeting to start, stop, and restart
Allow start/stop/restart to target a single component via
<service>/<component> syntax, matching deploy/logs/purge. When a
component is specified, start/stop skip toggling the service-level
active flag. Agent-side filtering returns NotFound for unknown
components.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 17:26:05 -07:00
52914d50b0 Pass mode, backend-tls, and tls cert/key through route add
The --mode flag was defined but never wired through to the RPC.
Add tls_cert and tls_key fields to AddProxyRouteRequest so L7
routes can be created via mcp route add.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.8.2
2026-03-29 20:44:44 -07:00
bb4bee51ba Add mono-repo consideration to ARCHITECTURE_V2.md open questions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 20:40:32 -07:00
4ac8a6d60b Add ARCHITECTURE_V2.md for multi-node master/agent topology
Documents the planned v2 architecture: mcp-master on straylight
coordinates deployments across worker (rift) and edge (svc) nodes.
Includes edge routing flow, agent RPCs, migration plan, and
operational issues from v1 that motivate the redesign.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 20:37:24 -07:00
d8f45ca520 Merge explicit ports with route-allocated ports during deploy
Previously, explicit port mappings from the service definition were
ignored when routes were present. Now both are included, allowing
services to have stable external port bindings alongside dynamic
route-allocated ports.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.8.1
2026-03-29 19:28:40 -07:00
95f86157b4 Add mcp route command for managing mc-proxy routes
New top-level command with list, add, remove subcommands. Supports
-n/--node to target a specific node. Adds AddProxyRoute and
RemoveProxyRoute RPCs to the agent. Moves route listing from
mcp node routes to mcp route list.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.8.0
2026-03-29 19:10:11 -07:00
93e26d3789 Add mcp dns and mcp node routes commands
mcp dns queries MCNS via an agent to list all zones and DNS records.
mcp node routes queries mc-proxy on each node for listener/route status,
matching the mcproxyctl status output format.

New agent RPCs: ListDNSRecords, ListProxyRoutes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.7.10
2026-03-29 18:51:53 -07:00
3d2edb7c26 Fall back to podman logs when journalctl is inaccessible
Probe journalctl with -n 0 before committing to it. When the journal
is not readable (e.g. rootless podman without user journal storage),
fall back to podman logs instead of streaming the permission error.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.7.9
2026-03-29 17:54:14 -07:00
bf02935716 Add agent version to mcp node list
Thread the linker-injected version string into the Agent struct and
return it in the NodeStatus RPC. The CLI now dials each node and
displays the agent version alongside name and address.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.7.8
2026-03-29 17:49:49 -07:00
c4f0d7be8e Fix mcp logs permission error for rootless podman journald driver
Rootless podman writes container logs to the user journal, but
journalctl without --user only reads the system journal. Add --user
when the agent is running as a non-root user.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.7.7
2026-03-29 16:46:01 -07:00
4d900eafd1 Derive flake version from git rev instead of hardcoding
Eliminates the manual version bump in flake.nix on each release.
Uses self.shortRev (or dirtyShortRev) since self.gitDescribe is not
yet available in this Nix version. Makefile builds still get the full
git describe output via ldflags.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.7.6
2026-03-28 22:48:44 -07:00
38f9070c24 Add top-level mcp edit command
Shortcut for mcp service edit — opens the service definition in $EDITOR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 22:44:19 -07:00
67d0ab1d9d Bump flake.nix version to 0.7.5
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 19:40:29 -07:00
7383b370f0 Fix mcp ps showing registry version instead of runtime, error on unknown component
mcp ps now uses the actual container image and version from the runtime
instead of the registry, which could be stale after a failed deploy.

Deploy now returns an error when the component filter matches nothing
instead of silently succeeding with zero results.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.7.5
2026-03-28 19:13:02 -07:00
4c847e6de9 Fix extraneous blank lines in mcp logs output
Skip empty lines from the scanner that result from double newlines
(application slog trailing newline + container runtime newline).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.7.4
2026-03-28 18:22:38 -07:00
14b978861f Add mcp logs command for streaming container logs
New server-streaming Logs RPC streams container output to the CLI.
Supports --tail/-n, --follow/-f, --timestamps/-t, --since.

Detects journald log driver and falls back to journalctl (podman logs
can't read journald outside the originating user session). New containers
default to k8s-file via mcp user's containers.conf.

Also adds stream auth interceptor for the agent gRPC server (required
for streaming RPCs).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.7.3
2026-03-28 17:54:48 -07:00
18365cc0a8 Document system account auth model in ARCHITECTURE.md
Replaces the "admin required for all operations" model with the new
three-tier identity model: human operators for CLI, mcp-agent system
account for infrastructure automation, admin reserved for MCIAS-level
administration. Documents agent-to-service token paths and per-service
authorization policies.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 16:11:08 -07:00
86d516acf6 Drop admin requirement from agent interceptor, reject guests
The agent now accepts any authenticated user or system account, except
those with the guest role. Admin is reserved for MCIAS account management
and policy changes, not routine deploy/stop/start operations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.7.2
2026-03-28 16:07:17 -07:00
dd167b8e0b Auto-login to MCR before image push using CLI token
mcp build and mcp deploy (auto-build path) now authenticate to the
container registry using the CLI's stored MCIAS token before pushing.
MCR accepts JWTs as passwords, so this works with both human and
service account tokens. Falls back silently to existing podman auth.

Eliminates the need for a separate interactive `podman login` step.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.7.1
2026-03-28 15:13:35 -07:00
41437e3730 Use mcdsl/terminal.ReadPassword for secure password input
Replaces raw bufio.Scanner password reading (which echoed to terminal)
with the new mcdsl terminal package that suppresses echo via x/term.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.7.0
2026-03-28 11:11:35 -07:00
cedba9bf83 Fix mcp ps uptime: parse StartedAt from podman ps JSON
List() was not extracting the StartedAt field from podman's JSON
output, so LiveCheck always returned zero timestamps and the CLI
showed "-" for every container's uptime.

podman ps --format json includes StartedAt as a Unix timestamp
(int64). Parse it into ContainerInfo.Started so the existing
LiveCheck → CLI uptime display chain works.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 22:56:39 -07:00
f06ab9aeb6 Bump flake version to 0.6.0
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 21:46:23 -07:00
f932dd64cc Add undeploy command: full inverse of deploy
Implements `mcp undeploy <service>` which tears down all infrastructure
for a service: removes mc-proxy routes, DNS records, TLS certificates,
stops and removes containers, releases allocated ports, and marks the
service inactive.

This fills the gap between `stop` (temporary pause) and `purge` (registry
cleanup). Undeploy is the complete teardown that returns the node to the
state before the service was deployed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.6.0
2026-03-27 21:45:42 -07:00
b2eaa69619 flake: install shell completions for mcp
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 21:41:25 -07:00
43789dd6be Fix route-based port mapping: use hostPort as container port
allocateRoutePorts() was using the route's port field (the mc-proxy
listener port, e.g. 443) as the container internal port in the podman
port mapping. For L7 routes, apps don't listen on the mc-proxy port —
they read $PORT (set to the assigned host port) and listen on that.

The mapping host:53204 → container:443 fails because nothing listens
on 443 inside the container. Fix: use hostPort as both the host and
container port, so $PORT = host port = container port.

Broke mcdoc in production (manually fixed, now permanently fixed).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.5.0
2026-03-27 16:50:48 -07:00
2dd0ea93fc Fix ImageExists to use skopeo instead of podman manifest inspect
podman manifest inspect only works for multi-arch manifest lists,
returning exit code 125 for regular single-arch images. Switch to
skopeo inspect which works for both.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 16:49:48 -07:00
169b3a0d4a Fix EnsureRecord to check all existing records before updating
When multiple A records exist for a service (e.g., LAN and Tailscale
IPs), check all of them for the correct value before attempting an
update. Previously only checked the first record, which could trigger
a 409 conflict if another record already had the target value.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 15:17:19 -07:00
2bda7fc138 Fix DNS record JSON parsing for MCNS response format
MCNS returns records wrapped in {"records": [...]} envelope with
uppercase field names (ID, Name, Type, Value), not bare arrays
with lowercase fields.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 15:12:43 -07:00
76247978c2 Fix protoToComponent to include routes in synced components
Routes from the proto ComponentSpec were dropped during sync, causing
the deploy flow to see empty regRoutes and skip cert provisioning,
route registration, and DNS registration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 14:39:26 -07:00
ca3bc736f6 Bump version to v0.5.0 for Phase D release
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 14:33:54 -07:00
9d9ad6588e Phase D: Automated DNS registration via MCNS
Add DNSRegistrar that creates/updates/deletes A records in MCNS
during deploy and stop. When a service has routes, the agent ensures
an A record exists in the configured zone pointing to the node's
address. On stop, the record is removed.

- Add MCNSConfig to agent config (server_url, ca_cert, token_path,
  zone, node_addr) with defaults and env overrides
- Add DNSRegistrar (internal/agent/dns.go): REST client for MCNS
  record CRUD, nil-receiver safe
- Wire into deploy flow (EnsureRecord after route registration)
- Wire into stop flow (RemoveRecord before container stop)
- 7 new tests, make all passes with 0 issues

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 14:33:41 -07:00
e4d131021e Bump version to v0.4.0 for Phase C release
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.4.0
2026-03-27 13:47:02 -07:00
8d6c060483 Update mc-proxy dependency to v1.2.0, drop replace directive
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 13:39:41 -07:00
c7e1232f98 Phase C: Automated TLS cert provisioning for L7 routes
Add CertProvisioner that requests TLS certificates from Metacrypt's CA
API during deploy. When a service has L7 routes, the agent checks for
an existing cert, re-issues if missing or within 30 days of expiry,
and writes chain+key to mc-proxy's cert directory before registering
routes.

- Add MetacryptConfig to agent config (server_url, ca_cert, mount,
  issuer, token_path) with defaults and env overrides
- Add CertProvisioner (internal/agent/certs.go): REST client for
  Metacrypt IssueCert, atomic file writes, cert expiry checking
- Wire into Agent struct and deploy flow (before route registration)
- Add hasL7Routes/l7Hostnames helpers in deploy.go
- Fix pre-existing lint issues: unreachable code in portalloc.go,
  gofmt in servicedef.go, gosec suppressions, golangci v2 config
- Update vendored mc-proxy to fix protobuf init panic
- 10 new tests, make all passes with 0 issues

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 13:31:11 -07:00
572d2fb196 Regenerate proto files for mc/ module path
Raw descriptor bytes in .pb.go files were corrupted by the sed-based
module path rename (string length changed, breaking protobuf binary
encoding). Regenerated with protoc to fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 02:54:40 -07:00
c6a84a1b80 Bump flake.nix version to match latest tag
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 02:16:45 -07:00
08b3e2a472 Migrate module path from kyle/ to mc/ org
All import paths updated to git.wntrmute.dev/mc/. Bumps mcdsl to v1.2.0,
mc-proxy to v1.1.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.3.0
2026-03-27 02:07:42 -07:00
6e30cf12f2 Mark Phase B complete in PROGRESS_V1.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 01:36:50 -07:00
c28562dbcf Merge pull request 'Phase B: Agent registers routes with mc-proxy on deploy' (#2) from phase-b-route-registration into master 2026-03-27 08:36:25 +00:00
84c487e7f8 Phase B: Agent registers routes with mc-proxy on deploy
The agent connects to mc-proxy via Unix socket and automatically
registers/removes routes during deploy and stop. This eliminates
manual mcproxyctl usage or TOML editing.

- New ProxyRouter abstraction wraps mc-proxy client library
- Deploy: after container starts, registers routes with mc-proxy
  using host ports from the registry
- Stop: removes routes from mc-proxy before stopping container
- Config: [mcproxy] section with socket path and cert_dir
- Nil-safe: if mc-proxy socket not configured, route registration
  is silently skipped (backward compatible)
- L7 routes use certs from convention path (<cert_dir>/<service>.pem)
- L4 routes use TLS passthrough (backend_tls=true)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 01:35:06 -07:00
8b1c89fdc9 Add mcp build command and deploy auto-build
Extends MCP to own the full build-push-deploy lifecycle. When deploying,
the CLI checks whether each component's image tag exists in the registry
and builds/pushes automatically if missing and build config is present.

- Add Build, Push, ImageExists to runtime.Runtime interface (podman impl)
- Add mcp build <service>[/<image>] command
- Add [build] section to CLI config (workspace path)
- Add path and [build.images] to service definitions
- Wire auto-build into mcp deploy before agent RPC
- Update ARCHITECTURE.md with runtime interface and deploy auto-build docs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v0.2.0
2026-03-27 01:34:25 -07:00
d7f18a5d90 Add Platform Evolution tracking to PROGRESS_V1.md
Phase A complete: route declarations, port allocation, $PORT env vars.
Phase B in progress: agent mc-proxy route registration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 01:25:26 -07:00
5a802bceb6 Merge pull request 'Add route declarations and automatic port allocation' (#1) from mcp-routes-port-allocation into master 2026-03-27 08:16:20 +00:00
777ba8a0e1 Add route declarations and automatic port allocation to MCP agent
Service definitions can now declare routes per component instead of
manual port mappings:

  [[components.routes]]
  name = "rest"
  port = 8443
  mode = "l4"

The agent allocates free host ports at deploy time and injects
$PORT/$PORT_<NAME> env vars into containers. Backward compatible:
components with old-style ports= work unchanged.

Changes:
- Proto: RouteSpec message, routes + env fields on ComponentSpec
- Servicedef: RouteDef parsing and validation from TOML
- Registry: component_routes table with host_port tracking
- Runtime: Env field on ContainerSpec, -e flag in BuildRunArgs
- Agent: PortAllocator (random 10000-60000, availability check),
  deploy wiring for route→port mapping and env injection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 01:04:47 -07:00
503c52dc26 Update service definition example for convention-driven format
Drop uses_mcdsl, full image URLs, ports, network, user, restart.
Add route declarations and service-level version. Image names and
most config are now derived from conventions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 00:19:12 -07:00
6465da3547 Add build and release lifecycle to ARCHITECTURE.md
Service definitions now include [build] config (path, uses_mcdsl,
images) so MCP owns the full build-push-deploy lifecycle, replacing
mcdeploy.toml. Documents mcp build, mcp sync auto-build, image
versioning policy (explicit tags, never :latest), and workspace
convention.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 23:31:05 -07:00