Commit Graph

14 Commits

Author SHA1 Message Date
Kyle Isom
4a55972455 monitor: see unikernel VMs + use canonical container naming
Two drift-reporting bugs:
1. The monitor listed only the podman runtime, so unikernel VMs always
   showed observed=unknown (false drift). It now takes a ContainerLister
   and the agent passes a merged lister (containers + VMs), mirroring
   listAllContainers.
2. The monitor computed the lookup name as service+"-"+component, which
   is wrong when component==service (the name collapses to just the
   service, e.g. "uktest"/"mc-proxy"). It now uses the canonical
   naming.ContainerNameFor — extracted to a shared package so the agent
   and monitor can't disagree.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 12:48:31 -07:00
Kyle Isom
47ec4e60ad unikernel: isolated host-only bridge networking (Phase 2)
When the mcp-br0 bridge exists, the agent runs unikernels on it instead
of QEMU user-mode networking: each VM gets a TAP device on the bridge
and a static 10.99.0.0/24 IP (baked into the Nanos image via ops
RunConfig). With the host firewall dropping off-bridge VM traffic and no
NAT, a VM can reach only the gateway -- making mc-proxy mediation
mandatory by topology rather than convention.

- runtime/qemu.go: bridge mode (createTAP/destroyTAP, IP allocator,
  deterministic MAC, static-IP ops config, VMAddr for proxy backends).
- agent auto-enables bridge mode when /sys/class/net/mcp-br0 exists.

Verified on straylight: uktest unikernel boots on mcp-br0 at 10.99.0.2,
serves via the gateway, TAP enslaved to the bridge; bridge has no uplink
and off-bridge forwarding is dropped.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 01:07:49 -07:00
Kyle Isom
d56f224359 Add unikernel runtime: run services as Nanos VMs under QEMU/KVM
Implements the hypervisor design's Phase 1: a second runtime.Runtime
backend (QEMU) that runs each service component as a Nanos unikernel VM
instead of a podman container, selected per-component via a new
runtime = "unikernel" service-def field.

- internal/runtime/qemu.go: QEMURuntime. Pull extracts the ELF from the
  OCI image; Run does `ops build` + boots qemu-system-x86_64 with KVM,
  user-mode net port-forwards, QMP control socket and serial console log;
  Stop/Remove/Inspect/List/Logs map onto VM lifecycle + state dir.
- proto/registry/servicedef: add runtime, memory_mb, vcpus fields
  (registry migration 5).
- agent: holds both runtimes; runtimeFor() selects per component;
  listAllContainers() merges containers + VMs so drift/status see both.
  Unikernel runtime auto-enables on nodes with /dev/kvm + ops.

Validated end-to-end on straylight: a test service deploys via
`mcp deploy --direct`, boots as a Nanos unikernel, serves HTTP through
the agent port-forward, and reports running via `mcp status`/`mcp logs`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 00:54:49 -07:00
3b08caaa0a Add [master] config section to agent for registration
Heartbeat client now reads master connection settings from the agent
config ([master] section) with env var fallback. Includes address,
ca_cert, token_path, and role fields.

Agent's Run() creates and starts the heartbeat client automatically
when [master] is configured.

Tested on all three nodes: rift (master), svc (edge), orion (worker)
all registered with the master and sending heartbeats every 30s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 13:53:15 -07:00
fa4d022bc1 Add boot sequencing to agent
The agent reads [[boot.sequence]] stages from its config and starts
services in dependency order before accepting gRPC connections. Each
stage waits for its services to pass health checks before proceeding:

- tcp: TCP connect to the container's mapped port
- grpc: standard gRPC health check

Foundation stage (stage 0): blocks and retries indefinitely if health
fails — all downstream services depend on it.
Non-foundation stages: log warning and proceed on failure.

Uses the recover logic to start containers from the registry, then
health-checks to verify readiness.

Config example:
  [[boot.sequence]]
  name = "foundation"
  services = ["mcias", "mcns"]
  timeout = "120s"
  health = "tcp"

Architecture v2 Phase 4 feature.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 11:53:11 -07:00
bf02935716 Add agent version to mcp node list
Thread the linker-injected version string into the Agent struct and
return it in the NodeStatus RPC. The CLI now dials each node and
displays the agent version alongside name and address.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 17:49:49 -07:00
14b978861f Add mcp logs command for streaming container logs
New server-streaming Logs RPC streams container output to the CLI.
Supports --tail/-n, --follow/-f, --timestamps/-t, --since.

Detects journald log driver and falls back to journalctl (podman logs
can't read journald outside the originating user session). New containers
default to k8s-file via mcp user's containers.conf.

Also adds stream auth interceptor for the agent gRPC server (required
for streaming RPCs).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 17:54:48 -07:00
9d9ad6588e Phase D: Automated DNS registration via MCNS
Add DNSRegistrar that creates/updates/deletes A records in MCNS
during deploy and stop. When a service has routes, the agent ensures
an A record exists in the configured zone pointing to the node's
address. On stop, the record is removed.

- Add MCNSConfig to agent config (server_url, ca_cert, token_path,
  zone, node_addr) with defaults and env overrides
- Add DNSRegistrar (internal/agent/dns.go): REST client for MCNS
  record CRUD, nil-receiver safe
- Wire into deploy flow (EnsureRecord after route registration)
- Wire into stop flow (RemoveRecord before container stop)
- 7 new tests, make all passes with 0 issues

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 14:33:41 -07:00
c7e1232f98 Phase C: Automated TLS cert provisioning for L7 routes
Add CertProvisioner that requests TLS certificates from Metacrypt's CA
API during deploy. When a service has L7 routes, the agent checks for
an existing cert, re-issues if missing or within 30 days of expiry,
and writes chain+key to mc-proxy's cert directory before registering
routes.

- Add MetacryptConfig to agent config (server_url, ca_cert, mount,
  issuer, token_path) with defaults and env overrides
- Add CertProvisioner (internal/agent/certs.go): REST client for
  Metacrypt IssueCert, atomic file writes, cert expiry checking
- Wire into Agent struct and deploy flow (before route registration)
- Add hasL7Routes/l7Hostnames helpers in deploy.go
- Fix pre-existing lint issues: unreachable code in portalloc.go,
  gofmt in servicedef.go, gosec suppressions, golangci v2 config
- Update vendored mc-proxy to fix protobuf init panic
- 10 new tests, make all passes with 0 issues

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 13:31:11 -07:00
08b3e2a472 Migrate module path from kyle/ to mc/ org
All import paths updated to git.wntrmute.dev/mc/. Bumps mcdsl to v1.2.0,
mc-proxy to v1.1.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 02:07:42 -07:00
84c487e7f8 Phase B: Agent registers routes with mc-proxy on deploy
The agent connects to mc-proxy via Unix socket and automatically
registers/removes routes during deploy and stop. This eliminates
manual mcproxyctl usage or TOML editing.

- New ProxyRouter abstraction wraps mc-proxy client library
- Deploy: after container starts, registers routes with mc-proxy
  using host ports from the registry
- Stop: removes routes from mc-proxy before stopping container
- Config: [mcproxy] section with socket path and cert_dir
- Nil-safe: if mc-proxy socket not configured, route registration
  is silently skipped (backward compatible)
- L7 routes use certs from convention path (<cert_dir>/<service>.pem)
- L4 routes use TLS passthrough (backend_tls=true)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 01:35:06 -07:00
777ba8a0e1 Add route declarations and automatic port allocation to MCP agent
Service definitions can now declare routes per component instead of
manual port mappings:

  [[components.routes]]
  name = "rest"
  port = 8443
  mode = "l4"

The agent allocates free host ports at deploy time and injects
$PORT/$PORT_<NAME> env vars into containers. Backward compatible:
components with old-style ports= work unchanged.

Changes:
- Proto: RouteSpec message, routes + env fields on ComponentSpec
- Servicedef: RouteDef parsing and validation from TOML
- Registry: component_routes table with host_port tracking
- Runtime: Env field on ContainerSpec, -e flag in BuildRunArgs
- Agent: PortAllocator (random 10000-60000, availability check),
  deploy wiring for route→port mapping and env injection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 01:04:47 -07:00
941dd7003a Fix design-vs-implementation gaps found in verification
Critical fixes:
- Wire monitor subsystem to agent startup (was dead code)
- Implement NodeStatus RPC (disk, memory, CPU, runtime version, uptime)
- Deploy respects active=false (sets desired_state=stopped, not always running)

Medium fixes:
- Add Started field to runtime.ContainerInfo, populate from podman inspect
- Populate ComponentInfo.started in status handlers for uptime display
- Add Monitor field to Agent struct for graceful shutdown

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 12:29:04 -07:00
53535f1e96 P2.1 + P3.1: Agent skeleton and CLI skeleton
Agent (P2.1): Agent struct with registry DB, runtime, and logger.
gRPC server with TLS 1.3 and MCIAS auth interceptor. Graceful
shutdown on SIGINT/SIGTERM. All RPCs return Unimplemented until
handlers are built in P2.2-P2.9.

CLI (P3.1): Full command tree with all 15 subcommands as stubs
(login, deploy, stop, start, restart, list, ps, status, sync,
adopt, service show/edit/export, push, pull, node list/add/remove).
gRPC dial helper with TLS, CA cert, and bearer token attachment.

Both gates for parallel Phase 2+3 work are now open.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 11:51:03 -07:00