diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
index 54d7100..1b499d0 100644
--- a/ARCHITECTURE.md
+++ b/ARCHITECTURE.md
@@ -121,9 +121,26 @@ option for future security hardening.
 ## Authentication and Authorization
 
 MCP follows the platform authentication model: all auth is delegated to
-MCIAS.
+MCIAS. The auth model separates three concerns: operator intent (CLI to
+agent), infrastructure automation (agent to platform services), and
+access control (who can do what).
 
-### Agent Authentication
+### Identity Model
+
+| Identity | Type | Purpose |
+|----------|------|---------|
+| Human operator (e.g., `kyle`) | human | CLI operations: deploy, stop, start, build |
+| `mcp-agent` | system | Agent-to-service automation: certs, DNS, routes, image pull |
+| Per-service accounts (e.g., `mcq`) | system | Scoped self-management (own DNS records only) |
+| `admin` role | role | MCIAS account management, policy changes, zone creation |
+| `guest` role | role | Explicitly rejected by the agent |
+
+The `admin` role is reserved for MCIAS-level administrative operations
+(account creation, policy management, zone mutations). Routine MCP
+operations (deploy, stop, start, build) do not require admin: any
+authenticated non-guest user or system account is accepted.
+
+### Agent Authentication (CLI → Agent)
 
 The agent is a gRPC server with a unary interceptor that enforces
 authentication on every RPC:
@@ -132,10 +149,34 @@ authentication on every RPC:
    (`authorization: Bearer `).
 2. Agent extracts the token and validates it against MCIAS (cached 30s by
    SHA-256 of the token, per platform convention).
-3. Agent checks that the caller has the `admin` role. All MCP operations
-   require admin -- there is no unprivileged MCP access.
+3. Agent rejects guests (`guest` role → `PERMISSION_DENIED`). All other
+   authenticated users and system accounts are accepted.
 4. If validation fails, the RPC returns `UNAUTHENTICATED` (invalid/expired
-   token) or `PERMISSION_DENIED` (valid token, not admin).
+   token) or `PERMISSION_DENIED` (guest).
+
+### Agent Service Authentication (Agent → Platform Services)
+
+The agent authenticates to platform services using a long-lived system
+account token (`mcp-agent`). Credentials are configured per service:
+
+| Service | Credential | Operations |
+|---------|------------|------------|
+| Metacrypt | `/srv/mcp/metacrypt-token` | TLS cert provisioning (PKI issue) |
+| MCNS | `/srv/mcp/mcns-token` | DNS record create/delete (any name) |
+| mc-proxy | Unix socket (no auth) | Route registration/removal |
+| MCR | podman auth store | Image pull (JWT-as-password) |
+
+These tokens are issued by MCIAS for the `mcp-agent` system account.
+They carry no roles; authorization is handled by each service's policy
+engine:
+
+- **Metacrypt:** A policy rule grants `mcp-agent` write access to
+  `engine/pki/issue`.
+- **MCNS:** Authorization is enforced in code: the `mcp-agent` system
+  account can manage any record; other system accounts can manage only
+  records matching their username.
+- **MCR:** The default policy allows all authenticated users to push and
+  pull. MCR accepts MCIAS JWTs as passwords at the `/v2/token` endpoint.
 
 ### CLI Authentication
 
@@ -148,6 +189,15 @@ obtained by:
 
 The stored token is used for all subsequent agent RPCs until it expires.
 
+### MCR Registry Authentication
+
+`mcp build` automatically authenticates to MCR before pushing images. It
+reads the CLI's stored MCIAS token and uses it as the password for
+`podman login`. MCR's token endpoint accepts MCIAS JWTs as passwords (the
+personal-access-token pattern), so both human and system account tokens
+work. This eliminates the need for a separate interactive `podman login`
+step.
+
 ---
 
 ## Services and Components
@@ -224,6 +274,9 @@ mcp pull [local-file] Copy a file from /srv// to
 mcp node list                 List registered nodes
 mcp node add                  Register a node
 mcp node remove               Deregister a node
+
+mcp agent upgrade [node]      Build, push, and restart the agent on all nodes (or one)
+mcp agent status              Show the agent version on each node
 ```
 
 ### Service Definition Files
@@ -1144,20 +1197,84 @@ The agent's data directory follows the platform convention:
 
 ### Agent Deployment (on nodes)
 
-The agent is deployed like any other Metacircular service:
+#### Provisioning (one-time per node)
 
-1. Provision the `mcp` system user via NixOS config (with podman access
-   and subuid/subgid ranges for rootless containers).
+Each node needs a one-time setup before the agent can run. The steps are
+the same regardless of OS, but the mechanism differs:
+
+1. Create the `mcp` system user with podman access and subuid/subgid ranges.
 2. Set `/srv/` ownership to the `mcp` user (the agent creates and manages
    `/srv//` directories for all services).
 3. Create `/srv/mcp/` directory and config file.
 4. Provision TLS certificate from Metacrypt.
 5. Create an MCIAS system account for the agent (`mcp-agent`).
-6. Install the `mcp-agent` binary.
-7. Start via systemd unit.
+6. Install the initial `mcp-agent` binary to `/srv/mcp/mcp-agent`.
+7. Install and start the systemd unit.
 
-The agent runs as a systemd service. Container-first deployment is a v2
-concern -- MCP needs to be running before it can manage its own agent.
+On **NixOS** (rift), provisioning is declarative via the NixOS config.
+The NixOS config owns the infrastructure (user, systemd unit, podman,
+directories, permissions) but **not** the binary. `ExecStart` points to
+`/srv/mcp/mcp-agent`, a mutable path that MCP manages. NixOS may
+bootstrap the initial binary there, but subsequent updates come from MCP.
+
+On **Debian** (hyperborea, svc), provisioning is done via a setup script
+or Ansible playbook that creates the same layout.
+
+#### Binary Location
+
+The agent binary lives at `/srv/mcp/mcp-agent` on **all** nodes,
+regardless of OS. This unifies the update mechanism across the fleet.
+
+#### Agent Upgrades
+
+After initial provisioning, the agent binary is updated with
+`mcp agent upgrade`. The CLI:
+
+1. Cross-compiles the agent once per target architecture
+   (`GOARCH=amd64` for rift/svc, `GOARCH=arm64` for hyperborea).
+2. Copies the binary over SSH to `/srv/mcp/mcp-agent.new` on each node.
+3. Atomically swaps the binary (`mv mcp-agent.new mcp-agent`).
+4. Restarts the systemd service (`systemctl restart mcp-agent`).
+
+SSH is used instead of gRPC because:
+- It works even when the agent is broken or running an incompatible version.
+- The binary is ~17 MB, which exceeds gRPC's default 4 MB receive limit.
+- The agent does not have to coordinate replacing and restarting itself.
+
+The CLI uses `golang.org/x/crypto/ssh` for native SSH, keeping the
+entire workflow in a single binary with no external tool dependencies.
+
+#### Node Configuration
+
+Node config includes SSH and architecture info for agent management:
+
+```toml
+[[nodes]]
+name = "rift"
+address = "100.95.252.120:9444"
+ssh = "rift"         # SSH host (from ~/.ssh/config or hostname)
+arch = "amd64"       # GOARCH for cross-compilation
+
+[[nodes]]
+name = "hyperborea"
+address = "100.x.x.x:9444"
+ssh = "hyperborea"
+arch = "arm64"
+```
+
+#### Coordinated Upgrades
+
+New MCP releases often add RPCs. A v0.6.0 CLI calling an RPC that a
+v0.5.0 agent does not implement gets `Unimplemented`. Agent upgrades
+must therefore be coordinated: `mcp agent upgrade` (with no node
+argument) upgrades all nodes before the CLI is used for anything else.
+
+If a node fails to upgrade, the failure is reported and the remaining
+nodes are still upgraded. The operator can retry or investigate via SSH.
+
+#### Systemd Unit
+
+The systemd unit is the same on all nodes:
 
 ```ini
 [Unit]
@@ -1167,7 +1284,7 @@ Wants=network-online.target
 
 [Service]
 Type=simple
-ExecStart=/usr/local/bin/mcp-agent server --config /srv/mcp/mcp-agent.toml
+ExecStart=/srv/mcp/mcp-agent server --config /srv/mcp/mcp-agent.toml
 Restart=on-failure
 RestartSec=5
 
@@ -1175,17 +1292,14 @@ User=mcp
 Group=mcp
 
 NoNewPrivileges=true
-ProtectSystem=strict
-ProtectHome=true
+ProtectSystem=full
+ProtectHome=false
 PrivateTmp=true
 PrivateDevices=true
 ProtectKernelTunables=true
 ProtectKernelModules=true
-ProtectControlGroups=true
 RestrictSUIDSGID=true
-RestrictNamespaces=true
 LockPersonality=true
-MemoryDenyWriteExecute=true
 RestrictRealtime=true
 
 ReadWritePaths=/srv
@@ -1195,6 +1309,7 @@ WantedBy=multi-user.target
 
 Note: `ReadWritePaths=/srv` (not `/srv/mcp`) because the agent writes
 files to any service's `/srv//` directory on behalf of the CLI.
+`ProtectHome=false` is set because the `mcp` user's home is `/srv/mcp`.
 
 ### CLI Installation (on operator workstation)
 