Document system account auth model in ARCHITECTURE.md

Replaces the "admin required for all operations" model with the new three-tier identity model: human operators for CLI, mcp-agent system account for infrastructure automation, admin reserved for MCIAS-level administration. Documents agent-to-service token paths and per-service authorization policies.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 16:11:08 -07:00
parent 86d516acf6
commit 18365cc0a8
1 changed file with 133 additions and 18 deletions
--- a/ARCHITECTURE.md
+++ b/ARCHITECTURE.md
@@ -121,9 +121,26 @@ option for future security hardening.
 ## Authentication and Authorization
 MCP follows the platform authentication model: all auth is delegated to
-MCIAS.
+MCIAS. The auth model separates three concerns: operator intent (CLI to
 agent), infrastructure automation (agent to platform services), and
 access control (who can do what).
-### Agent Authentication
+### Identity Model
 | Identity | Type | Purpose |
 |----------|------|---------|
 | Human operator (e.g., `kyle`) | human | CLI operations: deploy, stop, start, build |
 | `mcp-agent` | system | Agent-to-service automation: certs, DNS, routes, image pull |
 | Per-service accounts (e.g., `mcq`) | system | Scoped self-management (own DNS records only) |
 | `admin` role | role | MCIAS account management, policy changes, zone creation |
 | `guest` role | role | Explicitly rejected by the agent |
 The `admin` role is reserved for MCIAS-level administrative operations
 (account creation, policy management, zone mutations). Routine MCP
 operations (deploy, stop, start, build) do not require admin — any
 authenticated non-guest user or system account is accepted.
 ### Agent Authentication (CLI → Agent)
 The agent is a gRPC server with a unary interceptor that enforces
 authentication on every RPC:
@@ -132,10 +149,34 @@ authentication on every RPC:
   (`authorization: Bearer <token>`).
 2. Agent extracts the token and validates it against MCIAS (cached 30s by
   SHA-256 of the token, per platform convention).
-3. Agent checks that the caller has the `admin` role. All MCP operations
-   require admin -- there is no unprivileged MCP access.
+3. Agent rejects guests (`guest` role → `PERMISSION_DENIED`). All other
+   authenticated users and system accounts are accepted.
 4. If validation fails, the RPC returns `UNAUTHENTICATED` (invalid/expired
-   token) or `PERMISSION_DENIED` (valid token, not admin).
+   token) or `PERMISSION_DENIED` (guest).
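The four steps above can be sketched without gRPC as a plain authorization function. This is a minimal illustration, not the agent's actual interceptor: the `Identity` shape, the `Validator` callback standing in for the MCIAS validation call, and the error values are all assumptions made for the example. Only the 30-second cache keyed by the token's SHA-256 and the guest rejection come from the text.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
	"sync"
	"time"
)

// cacheTTL mirrors the 30s validation cache described above.
const cacheTTL = 30 * time.Second

// Identity is a hypothetical shape for a validated MCIAS identity.
type Identity struct {
	Username string
	Roles    []string
}

// Stand-ins for the gRPC status codes named in the text.
var (
	ErrUnauthenticated  = errors.New("UNAUTHENTICATED")
	ErrPermissionDenied = errors.New("PERMISSION_DENIED")
)

// Validator stands in for the MCIAS token validation call (assumed).
type Validator func(token string) (Identity, error)

type cacheEntry struct {
	id      Identity
	expires time.Time
}

// Authenticator caches successful validations, keyed by SHA-256 of the token.
type Authenticator struct {
	mu       sync.Mutex
	validate Validator
	ttl      time.Duration
	cache    map[[32]byte]cacheEntry
}

func NewAuthenticator(v Validator, ttl time.Duration) *Authenticator {
	return &Authenticator{validate: v, ttl: ttl, cache: map[[32]byte]cacheEntry{}}
}

// Authorize validates the token (using the cache when fresh) and rejects guests.
func (a *Authenticator) Authorize(token string) (Identity, error) {
	if token == "" {
		return Identity{}, ErrUnauthenticated
	}
	key := sha256.Sum256([]byte(token))
	a.mu.Lock()
	entry, ok := a.cache[key]
	a.mu.Unlock()
	if !ok || !time.Now().Before(entry.expires) {
		id, err := a.validate(token)
		if err != nil {
			// Failed validations are not cached in this sketch.
			return Identity{}, ErrUnauthenticated
		}
		entry = cacheEntry{id: id, expires: time.Now().Add(a.ttl)}
		a.mu.Lock()
		a.cache[key] = entry
		a.mu.Unlock()
	}
	for _, role := range entry.id.Roles {
		if role == "guest" {
			return Identity{}, ErrPermissionDenied
		}
	}
	return entry.id, nil
}

// demoValidator is a fake MCIAS used only for this example; it counts calls.
func demoValidator(calls *int) Validator {
	return func(token string) (Identity, error) {
		*calls++
		switch token {
		case "op-token":
			return Identity{Username: "kyle"}, nil
		case "guest-token":
			return Identity{Username: "visitor", Roles: []string{"guest"}}, nil
		}
		return Identity{}, errors.New("invalid token")
	}
}

func main() {
	calls := 0
	auth := NewAuthenticator(demoValidator(&calls), cacheTTL)
	for _, tok := range []string{"op-token", "op-token", "guest-token", "bogus"} {
		_, err := auth.Authorize(tok)
		fmt.Printf("%-12s err=%v\n", tok, err)
	}
	fmt.Println("validator calls:", calls)
}
```

In a real agent the same decision would sit inside the unary interceptor, with the token read from the `authorization` metadata header.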
 ### Agent Service Authentication (Agent → Platform Services)
 The agent authenticates to platform services as the `mcp-agent` system
 account, using long-lived tokens. Each service has its own credential
 source:
 | Service | Token Path | Operations |
 |---------|------------|------------|
 | Metacrypt | `/srv/mcp/metacrypt-token` | TLS cert provisioning (PKI issue) |
 | MCNS | `/srv/mcp/mcns-token` | DNS record create/delete (any name) |
 | mc-proxy | Unix socket (no auth) | Route registration/removal |
 | MCR | podman auth store | Image pull (JWT-as-password) |
 These tokens are issued by MCIAS for the `mcp-agent` system account.
 They carry no roles — authorization is handled by each service's policy
 engine:
 - **Metacrypt:** Policy rule grants `mcp-agent` write access to
  `engine/pki/issue`.
 - **MCNS:** Code-level authorization: system account `mcp-agent` can
  manage any record; other system accounts can only manage records
  matching their username.
 - **MCR:** Default policy allows all authenticated users to push/pull.
  MCR accepts MCIAS JWTs as passwords at the `/v2/token` endpoint.
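The MCNS rule above is small enough to show as a single predicate. This is a sketch under assumptions: "matching their username" is taken to mean the record name equals the account name, compared case-insensitively because DNS names are case-insensitive. The function name and the exact matching rule are illustrative, not MCNS's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// agentAccount is the system account allowed to manage any record.
const agentAccount = "mcp-agent"

// canManageRecord reports whether a system account may create or delete
// the given record. Assumption: a per-service account "matches" a record
// when the record name equals the account name.
func canManageRecord(account, recordName string) bool {
	if account == "" || recordName == "" {
		return false
	}
	if account == agentAccount {
		return true
	}
	return strings.EqualFold(account, recordName)
}

func main() {
	fmt.Println(canManageRecord("mcp-agent", "metacrypt"))
	fmt.Println(canManageRecord("mcq", "mcq"))
	fmt.Println(canManageRecord("mcq", "mcr"))
}
```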
 ### CLI Authentication
@@ -148,6 +189,15 @@ obtained by:
 The stored token is used for all subsequent agent RPCs until it expires.
 ### MCR Registry Authentication
 `mcp build` auto-authenticates to MCR before pushing images. It reads
 the CLI's stored MCIAS token and uses it as the password for `podman
 login`. MCR's token endpoint accepts MCIAS JWTs as passwords (the
 personal-access-token pattern), so both human and system account tokens
 work. This eliminates the need for a separate interactive `podman login`
 step.
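One way to perform this login non-interactively is to feed the token to `podman login` on stdin, so it never appears in the process argument list. The sketch below only builds the command. The registry host and username are placeholders, and whether `mcp build` passes the token this way is an assumption.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// podmanLoginCmd builds a `podman login` invocation that reads the
// password (here, the MCIAS JWT) from stdin via --password-stdin.
func podmanLoginCmd(registry, username, token string) *exec.Cmd {
	cmd := exec.Command("podman", "login",
		"--username", username,
		"--password-stdin",
		registry,
	)
	cmd.Stdin = strings.NewReader(token)
	return cmd
}

func main() {
	// Placeholder values; a real caller would load the stored CLI token.
	cmd := podmanLoginCmd("registry.example.internal", "kyle", "<MCIAS JWT>")
	fmt.Println(strings.Join(cmd.Args, " "))
	// cmd.Run() would perform the login; it is omitted here so the
	// example does not require podman or a reachable registry.
}
```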
 ---
 ## Services and Components
@@ -224,6 +274,9 @@ mcp pull <service> <path> [local-file] Copy a file from /srv/<service>/<path> to
 mcp node list                          List registered nodes
 mcp node add <name> <address>          Register a node
 mcp node remove <name>                 Deregister a node
 mcp agent upgrade [node]               Build, push, and restart agent on all (or one) node(s)
 mcp agent status                       Show agent version on each node
 ```
 ### Service Definition Files
@@ -1144,20 +1197,84 @@ The agent's data directory follows the platform convention:
 ### Agent Deployment (on nodes)
-The agent is deployed like any other Metacircular service:
-1. Provision the `mcp` system user via NixOS config (with podman access
-   and subuid/subgid ranges for rootless containers).
+#### Provisioning (one-time per node)
+Each node needs a one-time setup before the agent can run. The steps are
+the same regardless of OS, but the mechanism differs:
 1. Create `mcp` system user with podman access and subuid/subgid ranges.
 2. Set `/srv/` ownership to the `mcp` user (the agent creates and manages
   `/srv/<service>/` directories for all services).
 3. Create `/srv/mcp/` directory and config file.
 4. Provision TLS certificate from Metacrypt.
 5. Create an MCIAS system account for the agent (`mcp-agent`).
-6. Install the `mcp-agent` binary.
+6. Install the initial `mcp-agent` binary to `/srv/mcp/mcp-agent`.
-7. Start via systemd unit.
+7. Install and start the systemd unit.
-The agent runs as a systemd service. Container-first deployment is a v2
-concern -- MCP needs to be running before it can manage its own agent.
+On **NixOS** (rift), provisioning is declarative via the NixOS config.
+The NixOS config owns the infrastructure (user, systemd unit, podman,
 directories, permissions) but **not** the binary. `ExecStart` points to
 `/srv/mcp/mcp-agent`, a mutable path that MCP manages. NixOS may
 bootstrap the initial binary there, but subsequent updates come from MCP.
 On **Debian** (hyperborea, svc), provisioning is done via a setup script
 or ansible playbook that creates the same layout.
 #### Binary Location
 The agent binary lives at `/srv/mcp/mcp-agent` on **all** nodes,
 regardless of OS. This unifies the update mechanism across the fleet.
 #### Agent Upgrades
 After initial provisioning, the agent binary is updated via
 `mcp agent upgrade`. The CLI:
 1. Cross-compiles the agent for each target architecture
   (`GOARCH=amd64` for rift/svc, `GOARCH=arm64` for hyperborea).
 2. SSHs to each node and pushes the binary to `/srv/mcp/mcp-agent.new`.
 3. Atomically swaps the binary (`mv mcp-agent.new mcp-agent`).
 4. Restarts the systemd service (`systemctl restart mcp-agent`).
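The stage-then-rename pattern in steps 2 and 3 can be illustrated on a local filesystem. This sketch is not the CLI's SSH code path: it writes into a temporary directory instead of a remote `/srv/mcp/`, and `installBinary` and `demo` are hypothetical names. It shows why the swap is safe: the new file is fully written under a separate name, and a rename within one directory replaces the old file in a single step.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// installBinary stages data as <dir>/mcp-agent.new, then renames it over
// <dir>/mcp-agent. Readers see either the old or the new file, never a
// partially written one.
func installBinary(dir string, data []byte) error {
	staged := filepath.Join(dir, "mcp-agent.new")
	final := filepath.Join(dir, "mcp-agent")
	if err := os.WriteFile(staged, data, 0o755); err != nil {
		return fmt.Errorf("stage binary: %w", err)
	}
	if err := os.Rename(staged, final); err != nil {
		return fmt.Errorf("swap binary: %w", err)
	}
	return nil
}

// demo installs two successive "versions" into a temp dir and returns
// the content that ends up at the final path.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "mcp-upgrade-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	for _, version := range []string{"agent v0.5.0", "agent v0.6.0"} {
		if err := installBinary(dir, []byte(version)); err != nil {
			return "", err
		}
	}
	got, err := os.ReadFile(filepath.Join(dir, "mcp-agent"))
	return string(got), err
}

func main() {
	got, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("installed:", got)
}
```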
 SSH is used instead of gRPC because:
 - It works even when the agent is broken or has an incompatible version.
 - The binary is ~17MB, which exceeds gRPC's default 4 MB per-message
   receive limit.
 - No self-restart coordination needed.
 The CLI uses `golang.org/x/crypto/ssh` for native SSH, keeping the
 entire workflow in a single binary with no external tool dependencies.
 #### Node Configuration
 Node config includes SSH and architecture info for agent management:
 ```toml
 [[nodes]]
 name = "rift"
 address = "100.95.252.120:9444"
 ssh = "rift"           # SSH host (from ~/.ssh/config or hostname)
 arch = "amd64"         # GOARCH for cross-compilation
 [[nodes]]
 name = "hyperborea"
 address = "100.x.x.x:9444"
 ssh = "hyperborea"
 arch = "arm64"
 ```
 #### Coordinated Upgrades
 New MCP releases often add new RPCs. A CLI at v0.6.0 calling an agent
 at v0.5.0 fails with `Unimplemented` for any RPC the older agent lacks.
 Agent upgrades must therefore be coordinated: run `mcp agent upgrade`
 (with no node argument) to upgrade all nodes before using the CLI for
 other operations.
 If a node fails to upgrade, it is reported but the others still proceed.
 The operator can retry or investigate via SSH.
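The "report failures but keep going" behavior can be sketched as a loop that never returns early. The `Node` type, `upgradeAll`, and the `failOn` helper are hypothetical names for illustration. The per-node upgrade is passed in as a function so the example runs without SSH.

```go
package main

import "fmt"

// Node carries the fields this sketch needs; the real node config has more.
type Node struct {
	Name string
	Arch string
}

// upgradeAll attempts every node and collects failures by node name.
// A failure on one node does not stop the remaining upgrades.
func upgradeAll(nodes []Node, upgrade func(Node) error) map[string]error {
	failed := map[string]error{}
	for _, n := range nodes {
		if err := upgrade(n); err != nil {
			failed[n.Name] = err
		}
	}
	return failed
}

// failOn returns a fake upgrade function that fails for one node and
// counts how many nodes were attempted. Used only for this example.
func failOn(name string, attempts *int) func(Node) error {
	return func(n Node) error {
		*attempts++
		if n.Name == name {
			return fmt.Errorf("ssh to %s: connection refused", n.Name)
		}
		return nil
	}
}

func main() {
	nodes := []Node{{"rift", "amd64"}, {"svc", "amd64"}, {"hyperborea", "arm64"}}
	attempts := 0
	failed := upgradeAll(nodes, failOn("svc", &attempts))
	fmt.Printf("attempted %d nodes, %d failed\n", attempts, len(failed))
	for name, err := range failed {
		fmt.Printf("  %s: %v\n", name, err)
	}
}
```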
 #### Systemd Unit
 The systemd unit is the same on all nodes:
 ```ini
 [Unit]
@@ -1167,7 +1284,7 @@ Wants=network-online.target
 [Service]
 Type=simple
-ExecStart=/usr/local/bin/mcp-agent server --config /srv/mcp/mcp-agent.toml
+ExecStart=/srv/mcp/mcp-agent server --config /srv/mcp/mcp-agent.toml
 Restart=on-failure
 RestartSec=5
@@ -1175,17 +1292,14 @@ User=mcp
 Group=mcp
 NoNewPrivileges=true
-ProtectSystem=strict
+ProtectSystem=full
-ProtectHome=true
+ProtectHome=false
 PrivateTmp=true
 PrivateDevices=true
 ProtectKernelTunables=true
 ProtectKernelModules=true
 ProtectControlGroups=true
 RestrictSUIDSGID=true
 RestrictNamespaces=true
 LockPersonality=true
 MemoryDenyWriteExecute=true
 RestrictRealtime=true
 ReadWritePaths=/srv
@@ -1195,6 +1309,7 @@ WantedBy=multi-user.target
 Note: `ReadWritePaths=/srv` (not `/srv/mcp`) because the agent writes
 files to any service's `/srv/<service>/` directory on behalf of the CLI.
 `ProtectHome=false` because the `mcp` user's home is `/srv/mcp`.
 ### CLI Installation (on operator workstation)