RouteDef gains a Public field (bool) for edge routing. ServiceDef gains a
Tier field. Node validation is relaxed: a service with both node and tier
empty now defaults to tier=worker (v2 compatibility).
ToProto/FromProto updated to round-trip all new fields. Without this,
public=true in TOML was silently dropped and edge routing never triggered.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Node addresses may be Tailscale DNS names (e.g., rift.scylla-hammerhead.ts.net:9444)
but MCNS needs an IPv4 address for A records. The master now resolves
the hostname via net.LookupHost before passing it to the DNS client.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New cmd/mcp-master/ entry point following the agent pattern:
cobra CLI with --config, version, and server commands.
Makefile: add mcp-master target, update all and clean targets.
Example config: deploy/examples/mcp-master.toml with all sections.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Master struct with Run() lifecycle following the agent pattern exactly:
open DB → bootstrap nodes → create agent pool → DNS client → TLS →
auth interceptor → gRPC server → signal handler.
RPC handlers:
- Deploy: place service (tier-aware), forward to agent, register DNS
with Tailnet IP, detect public routes, validate against allowed
domains, coordinate edge routing via SetupEdgeRoute, record placement
and edge routes in master DB, return structured per-step results.
- Undeploy: undeploy on worker first, then remove edge routes, DNS,
and DB records. Best-effort cleanup on failure.
- Status: query agents for service status, aggregate with placements
and edge route info from master DB.
- ListNodes: return all nodes with placement counts.
Placement algorithm: fewest services, ties broken alphabetically.
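
The placement rule can be sketched as follows. This is an illustration only: the node type and pickNode are hypothetical names, and the tier filtering that the Deploy handler performs is assumed to have happened before this step.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a hypothetical view of a candidate node and its placement count.
type node struct {
	Name     string
	Services int
}

// pickNode returns the node with the fewest services; ties are broken by
// name in ascending alphabetical order. It reports false for an empty list.
func pickNode(nodes []node) (string, bool) {
	if len(nodes) == 0 {
		return "", false
	}
	sorted := append([]node(nil), nodes...) // leave the caller's slice untouched
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Services != sorted[j].Services {
			return sorted[i].Services < sorted[j].Services
		}
		return sorted[i].Name < sorted[j].Name
	})
	return sorted[0].Name, true
}

func main() {
	name, _ := pickNode([]node{{"rift", 3}, {"beta", 1}, {"alpha", 1}})
	fmt.Println(name)
}
```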
DNS client: extracted from agent's DNSRegistrar with explicit nodeAddr
parameter (master registers for different nodes).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AgentClient wraps a gRPC connection to a single agent with typed
forwarding methods (Deploy, UndeployService, SetupEdgeRoute, etc.).
AgentPool manages connections to multiple agents keyed by node name.
Follows the same TLS 1.3 + token interceptor pattern as cmd/mcp/dial.go
but runs server-side with the master's own MCIAS service token.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New internal/masterdb/ package for mcp-master cluster state. Separate
from the agent's registry because the schemas are fundamentally
different (cluster-wide placement vs node-local containers).
Tables: nodes, placements, edge_routes. Full CRUD with tests.
Follows the same Open/migrate pattern as internal/registry/.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Temporary CLI commands for testing edge routing RPCs directly
(before the master exists):
mcp edge list -n svc
mcp edge setup <hostname> -n svc --backend-hostname ... --backend-port ...
mcp edge remove <hostname> -n svc
Verified end-to-end on svc: setup provisions route in mc-proxy and
persists in agent registry, remove cleans up both, list shows routes
with cert metadata.
Finding: MCNS registers LAN IPs for .svc.mcp. hostnames, not Tailnet
IPs. The v2 master needs to register Tailnet IPs in deploy flow step 3.
These commands will be removed or replaced when the master is built
(Phase 3).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New agent RPCs for v2 multi-node orchestration:
- SetupEdgeRoute: provisions TLS cert from Metacrypt, resolves backend
hostname to Tailnet IP, validates it's in 100.64.0.0/10, registers
L7 route in mc-proxy. Rejects backend_tls=false.
- RemoveEdgeRoute: removes mc-proxy route, cleans up TLS cert, removes
registry entry.
- ListEdgeRoutes: returns all edge routes with cert serial/expiry.
- HealthCheck: returns agent health and container count.
New database table (migration 4): edge_routes stores hostname, backend
info, and cert paths for persistence across agent restarts.
ProxyRouter gains CertPath/KeyPath helpers for consistent cert path
construction.
Security:
- Backend hostname must resolve to a Tailnet IP (100.64.0.0/10)
- backend_tls=false is rejected (no cleartext to backends)
- Cert provisioning failure fails the setup (no route to missing cert)
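
The first two security checks above might look roughly like this. It is a sketch under assumptions: isTailnetIP and validateBackend are hypothetical names, and the real handler resolves the backend hostname before reaching this point.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// tailnetRange is the CGNAT block used for Tailnet addresses.
var tailnetRange = netip.MustParsePrefix("100.64.0.0/10")

// isTailnetIP reports whether s is an IP address inside 100.64.0.0/10.
func isTailnetIP(s string) bool {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return false
	}
	return tailnetRange.Contains(addr.Unmap())
}

// validateBackend rejects cleartext backends and non-Tailnet addresses.
func validateBackend(resolvedIP string, backendTLS bool) error {
	if !backendTLS {
		return errors.New("backend_tls=false is not allowed")
	}
	if !isTailnetIP(resolvedIP) {
		return fmt.Errorf("backend %s is not in %s", resolvedIP, tailnetRange)
	}
	return nil
}

func main() {
	fmt.Println(validateBackend("100.100.1.2", true))
	fmt.Println(validateBackend("192.168.1.10", true))
}
```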
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The v2 architecture doc is platform-wide (covers master, agents,
edge routing, snapshots, migration across all nodes). Moved to
docs/architecture-v2.md in the metacircular workspace repo.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The git fetcher doesn't provide gitDescribe, so the Nix build was
falling through to shortRev and producing commit-hash versions instead
of tag-based ones.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Allow start/stop/restart to target a single component via
<service>/<component> syntax, matching deploy/logs/purge. When a
component is specified, start/stop skip toggling the service-level
active flag. Agent-side filtering returns NotFound for unknown
components.
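
One way to split the <service>/<component> argument is sketched below. parseTarget is a hypothetical helper, and the component name "web" in the example is made up; an empty component in the result means the whole service is targeted.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseTarget splits "service" or "service/component" into its parts.
// The component is empty when only a service name was given.
func parseTarget(arg string) (string, string, error) {
	service, component, hasSlash := strings.Cut(arg, "/")
	if service == "" {
		return "", "", errors.New("empty service name")
	}
	if hasSlash && component == "" {
		return "", "", errors.New("empty component name")
	}
	return service, component, nil
}

func main() {
	svc, comp, err := parseTarget("mcdoc/web")
	fmt.Println(svc, comp, err)
}
```

A caller would toggle the service-level active flag only when the returned component is empty.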
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The --mode flag was defined but never wired through to the RPC.
Add tls_cert and tls_key fields to AddProxyRouteRequest so L7
routes can be created via mcp route add.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Documents the planned v2 architecture: mcp-master on straylight
coordinates deployments across worker (rift) and edge (svc) nodes.
Includes edge routing flow, agent RPCs, migration plan, and
operational issues from v1 that motivate the redesign.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously, explicit port mappings from the service definition were
ignored when routes were present. Now both are included, allowing
services to have stable external port bindings alongside dynamic
route-allocated ports.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New top-level command with list, add, remove subcommands. Supports
-n/--node to target a specific node. Adds AddProxyRoute and
RemoveProxyRoute RPCs to the agent. Moves route listing from
mcp node routes to mcp route list.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mcp dns queries MCNS via an agent to list all zones and DNS records.
mcp node routes queries mc-proxy on each node for listener/route status,
matching the mcproxyctl status output format.
New agent RPCs: ListDNSRecords, ListProxyRoutes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Probe journalctl with -n 0 before committing to it. When the journal
is not readable (e.g. rootless podman without user journal storage),
fall back to podman logs instead of streaming the permission error.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Thread the linker-injected version string into the Agent struct and
return it in the NodeStatus RPC. The CLI now dials each node and
displays the agent version alongside name and address.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rootless podman writes container logs to the user journal, but
journalctl without --user only reads the system journal. Add --user
when the agent is running as a non-root user.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Eliminates the manual version bump in flake.nix on each release.
Uses self.shortRev (or dirtyShortRev) since self.gitDescribe is not
yet available in this Nix version. Makefile builds still get the full
git describe output via ldflags.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mcp ps now uses the actual container image and version from the runtime
instead of the registry, which could be stale after a failed deploy.
Deploy now returns an error when the component filter matches nothing
instead of silently succeeding with zero results.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Skip empty lines from the scanner that result from double newlines
(application slog trailing newline + container runtime newline).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New server-streaming Logs RPC streams container output to the CLI.
Supports --tail/-n, --follow/-f, --timestamps/-t, --since.
Detects journald log driver and falls back to journalctl (podman logs
can't read journald outside the originating user session). New containers
default to k8s-file via mcp user's containers.conf.
Also adds stream auth interceptor for the agent gRPC server (required
for streaming RPCs).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces the "admin required for all operations" model with the new
three-tier identity model: human operators for CLI, mcp-agent system
account for infrastructure automation, admin reserved for MCIAS-level
administration. Documents agent-to-service token paths and per-service
authorization policies.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The agent now accepts any authenticated user or system account, except
those with the guest role. Admin is reserved for MCIAS account management
and policy changes, not routine deploy/stop/start operations.
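
The rule can be expressed as a small predicate. This is a sketch only: the caller type and its fields are assumptions about what token validation returns, not the agent's actual types.

```go
package main

import "fmt"

// caller is a hypothetical summary of a validated MCIAS token.
type caller struct {
	Authenticated bool
	Roles         []string
}

// mayOperate allows any authenticated human or system account unless it
// carries the guest role. The admin role is not required.
func mayOperate(c caller) bool {
	if !c.Authenticated {
		return false
	}
	for _, r := range c.Roles {
		if r == "guest" {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(mayOperate(caller{Authenticated: true, Roles: []string{"user"}}))
	fmt.Println(mayOperate(caller{Authenticated: true, Roles: []string{"guest"}}))
}
```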
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mcp build and mcp deploy (auto-build path) now authenticate to the
container registry using the CLI's stored MCIAS token before pushing.
MCR accepts JWTs as passwords, so this works with both human and
service account tokens. Falls back silently to existing podman auth.
Eliminates the need for a separate interactive `podman login` step.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces raw bufio.Scanner password reading (which echoed to terminal)
with the new mcdsl terminal package that suppresses echo via x/term.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
List() was not extracting the StartedAt field from podman's JSON
output, so LiveCheck always returned zero timestamps and the CLI
showed "-" for every container's uptime.
podman ps --format json includes StartedAt as a Unix timestamp
(int64). Parse it into ContainerInfo.Started so the existing
LiveCheck → CLI uptime display chain works.
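
The parsing step might look like the sketch below. Only the StartedAt field is taken from the description above; podmanContainer and startTimes are hypothetical names, and a zero or missing timestamp is assumed to mean "unknown".

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// podmanContainer decodes only the field this sketch needs from one entry
// of `podman ps --format json`.
type podmanContainer struct {
	StartedAt int64 `json:"StartedAt"` // Unix seconds
}

// startTimes converts each container's StartedAt into a time.Time.
// Entries without a timestamp keep the zero time.
func startTimes(data []byte) ([]time.Time, error) {
	var cs []podmanContainer
	if err := json.Unmarshal(data, &cs); err != nil {
		return nil, err
	}
	out := make([]time.Time, len(cs))
	for i, c := range cs {
		if c.StartedAt > 0 {
			out[i] = time.Unix(c.StartedAt, 0)
		}
	}
	return out, nil
}

func main() {
	ts, err := startTimes([]byte(`[{"StartedAt":1700000000},{}]`))
	fmt.Println(ts, err)
}
```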
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements `mcp undeploy <service>` which tears down all infrastructure
for a service: removes mc-proxy routes, DNS records, TLS certificates,
stops and removes containers, releases allocated ports, and marks the
service inactive.
This fills the gap between `stop` (temporary pause) and `purge` (registry
cleanup). Undeploy is the complete teardown that returns the node to the
state before the service was deployed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
allocateRoutePorts() was using the route's port field (the mc-proxy
listener port, e.g. 443) as the container internal port in the podman
port mapping. For L7 routes, apps don't listen on the mc-proxy port —
they read $PORT (set to the assigned host port) and listen on that.
The mapping host:53204 → container:443 fails because nothing listens
on 443 inside the container. Fix: use hostPort as both the host and
container port, so $PORT = host port = container port.
Broke mcdoc in production (manually fixed, now permanently fixed).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
podman manifest inspect only works for multi-arch manifest lists,
returning exit code 125 for regular single-arch images. Switch to
skopeo inspect which works for both.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When multiple A records exist for a service (e.g., LAN and Tailscale
IPs), check all of them for the correct value before attempting an
update. Previously only checked the first record, which could trigger
a 409 conflict if another record already had the target value.
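
A sketch of the corrected check, assuming the decision reduces to three outcomes. The aRecord type, planARecord, and the action strings are hypothetical and only illustrate scanning every record before deciding to update.

```go
package main

import "fmt"

// aRecord is a hypothetical, trimmed-down view of an MCNS A record.
type aRecord struct {
	ID    int64
	Value string
}

// planARecord decides what to do so that some A record holds want.
// It inspects every existing record, not only the first one.
func planARecord(existing []aRecord, want string) string {
	if len(existing) == 0 {
		return "create"
	}
	for _, r := range existing {
		if r.Value == want {
			return "noop" // already present; updating another record would conflict
		}
	}
	return "update"
}

func main() {
	recs := []aRecord{{ID: 1, Value: "192.168.88.10"}, {ID: 2, Value: "100.64.0.7"}}
	fmt.Println(planARecord(recs, "100.64.0.7"))
}
```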
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
MCNS returns records wrapped in {"records": [...]} envelope with
uppercase field names (ID, Name, Type, Value), not bare arrays
with lowercase fields.
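
Decoding the envelope could be sketched as below. The field names come from the description above; the Go types, the numeric ID, and the sample values are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dnsRecord mirrors the uppercase field names MCNS returns.
// ID is assumed to be numeric.
type dnsRecord struct {
	ID    int64  `json:"ID"`
	Name  string `json:"Name"`
	Type  string `json:"Type"`
	Value string `json:"Value"`
}

// recordList is the {"records": [...]} envelope.
type recordList struct {
	Records []dnsRecord `json:"records"`
}

// decodeRecords unwraps the envelope and returns the records inside it.
func decodeRecords(body []byte) ([]dnsRecord, error) {
	var rl recordList
	if err := json.Unmarshal(body, &rl); err != nil {
		return nil, fmt.Errorf("decode MCNS records: %w", err)
	}
	return rl.Records, nil
}

func main() {
	body := []byte(`{"records":[{"ID":1,"Name":"mcdoc","Type":"A","Value":"100.64.0.7"}]}`)
	recs, err := decodeRecords(body)
	fmt.Println(recs, err)
}
```

Decoding a bare array into the envelope type fails with a type error, which makes a response in the old shape visible instead of silently yielding no records.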
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Routes from the proto ComponentSpec were dropped during sync, causing
the deploy flow to see empty regRoutes and skip cert provisioning,
route registration, and DNS registration.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add DNSRegistrar that creates/updates/deletes A records in MCNS
during deploy and stop. When a service has routes, the agent ensures
an A record exists in the configured zone pointing to the node's
address. On stop, the record is removed.
- Add MCNSConfig to agent config (server_url, ca_cert, token_path,
zone, node_addr) with defaults and env overrides
- Add DNSRegistrar (internal/agent/dns.go): REST client for MCNS
record CRUD, nil-receiver safe
- Wire into deploy flow (EnsureRecord after route registration)
- Wire into stop flow (RemoveRecord before container stop)
- 7 new tests; `make all` passes with 0 issues
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add CertProvisioner that requests TLS certificates from Metacrypt's CA
API during deploy. When a service has L7 routes, the agent checks for
an existing cert, re-issues if missing or within 30 days of expiry,
and writes chain+key to mc-proxy's cert directory before registering
routes.
- Add MetacryptConfig to agent config (server_url, ca_cert, mount,
issuer, token_path) with defaults and env overrides
- Add CertProvisioner (internal/agent/certs.go): REST client for
Metacrypt IssueCert, atomic file writes, cert expiry checking
- Wire into Agent struct and deploy flow (before route registration)
- Add hasL7Routes/l7Hostnames helpers in deploy.go
- Fix pre-existing lint issues: unreachable code in portalloc.go,
gofmt in servicedef.go, gosec suppressions, golangci v2 config
- Update vendored mc-proxy to fix protobuf init panic
- 10 new tests; `make all` passes with 0 issues
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Raw descriptor bytes in .pb.go files were corrupted by the sed-based
module path rename (string length changed, breaking protobuf binary
encoding). Regenerated with protoc to fix.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>