Document system account auth model in ARCHITECTURE.md
Replaces the "admin required for all operations" model with the new three-tier identity model: human operators for CLI, mcp-agent system account for infrastructure automation, admin reserved for MCIAS-level administration. Documents agent-to-service token paths and per-service authorization policies.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
151
ARCHITECTURE.md
@@ -121,9 +121,26 @@ option for future security hardening.
 ## Authentication and Authorization

 MCP follows the platform authentication model: all auth is delegated to
-MCIAS.
+MCIAS. The auth model separates three concerns: operator intent (CLI to
+agent), infrastructure automation (agent to platform services), and
+access control (who can do what).

-### Agent Authentication
+### Identity Model
+
+| Identity | Type | Purpose |
+|----------|------|---------|
+| Human operator (e.g., `kyle`) | human | CLI operations: deploy, stop, start, build |
+| `mcp-agent` | system | Agent-to-service automation: certs, DNS, routes, image pull |
+| Per-service accounts (e.g., `mcq`) | system | Scoped self-management (own DNS records only) |
+| `admin` role | role | MCIAS account management, policy changes, zone creation |
+| `guest` role | role | Explicitly rejected by the agent |
+
+The `admin` role is reserved for MCIAS-level administrative operations
+(account creation, policy management, zone mutations). Routine MCP
+operations (deploy, stop, start, build) do not require admin; any
+authenticated non-guest user or system account is accepted.
+
+### Agent Authentication (CLI → Agent)

 The agent is a gRPC server with a unary interceptor that enforces
 authentication on every RPC:
@@ -132,10 +149,34 @@ authentication on every RPC:
    (`authorization: Bearer <token>`).
 2. Agent extracts the token and validates it against MCIAS (cached 30s by
    SHA-256 of the token, per platform convention).
-3. Agent checks that the caller has the `admin` role. All MCP operations
-   require admin -- there is no unprivileged MCP access.
+3. Agent rejects guests (`guest` role → `PERMISSION_DENIED`). All other
+   authenticated users and system accounts are accepted.
 4. If validation fails, the RPC returns `UNAUTHENTICATED` (invalid/expired
-   token) or `PERMISSION_DENIED` (valid token, not admin).
+   token) or `PERMISSION_DENIED` (guest).

+### Agent Service Authentication (Agent → Platform Services)
+
+The agent authenticates to platform services using a long-lived system
+account token (`mcp-agent`). Each service has its own token file:
+
+| Service | Token Path | Operations |
+|---------|------------|------------|
+| Metacrypt | `/srv/mcp/metacrypt-token` | TLS cert provisioning (PKI issue) |
+| MCNS | `/srv/mcp/mcns-token` | DNS record create/delete (any name) |
+| mc-proxy | Unix socket (no auth) | Route registration/removal |
+| MCR | podman auth store | Image pull (JWT-as-password) |
+
+These tokens are issued by MCIAS for the `mcp-agent` system account.
+They carry no roles; authorization is handled by each service's policy
+engine:
+
+- **Metacrypt:** Policy rule grants `mcp-agent` write access to
+  `engine/pki/issue`.
+- **MCNS:** Code-level authorization: system account `mcp-agent` can
+  manage any record; other system accounts can only manage records
+  matching their username.
+- **MCR:** Default policy allows all authenticated users to push/pull.
+  MCR accepts MCIAS JWTs as passwords at the `/v2/token` endpoint.
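
The MCNS rule in the list above is small enough to write out. The sketch below is not MCNS code; it assumes that "matching their username" means the record name equals the account name exactly, and it covers system accounts only. The function name is hypothetical.

```go
package main

// agentAccount is the system account named in the text above.
const agentAccount = "mcp-agent"

// CanManageRecord reports whether a system account may create or delete
// the given DNS record. Assumption: a per-service account "matches" a
// record only when the record name is identical to the account name.
func CanManageRecord(account, record string) bool {
	if account == "" {
		return false // unauthenticated callers never reach this check
	}
	if account == agentAccount {
		return true // the agent may manage any record
	}
	return record == account
}
```

Under this reading, `CanManageRecord("mcq", "mcq")` is true and `CanManageRecord("mcq", "metacrypt")` is false, which matches the "own DNS records only" scope in the identity table.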
+
 ### CLI Authentication

@@ -148,6 +189,15 @@ obtained by:

 The stored token is used for all subsequent agent RPCs until it expires.

+### MCR Registry Authentication
+
+`mcp build` auto-authenticates to MCR before pushing images. It reads
+the CLI's stored MCIAS token and uses it as the password for `podman
+login`. MCR's token endpoint accepts MCIAS JWTs as passwords (the
+personal-access-token pattern), so both human and system account tokens
+work. This eliminates the need for a separate interactive `podman login`
+step.
+
 ---

 ## Services and Components
@@ -224,6 +274,9 @@ mcp pull <service> <path> [local-file] Copy a file from /srv/<service>/<path> to
 mcp node list                           List registered nodes
 mcp node add <name> <address>           Register a node
 mcp node remove <name>                  Deregister a node
+
+mcp agent upgrade [node]                Build, push, and restart agent on all (or one) node(s)
+mcp agent status                        Show agent version on each node
 ```

 ### Service Definition Files
@@ -1144,20 +1197,84 @@ The agent's data directory follows the platform convention:

 ### Agent Deployment (on nodes)

-The agent is deployed like any other Metacircular service:
+#### Provisioning (one-time per node)

-1. Provision the `mcp` system user via NixOS config (with podman access
-   and subuid/subgid ranges for rootless containers).
+Each node needs a one-time setup before the agent can run. The steps are
+the same regardless of OS, but the mechanism differs:
+
+1. Create `mcp` system user with podman access and subuid/subgid ranges.
 2. Set `/srv/` ownership to the `mcp` user (the agent creates and manages
    `/srv/<service>/` directories for all services).
 3. Create `/srv/mcp/` directory and config file.
 4. Provision TLS certificate from Metacrypt.
 5. Create an MCIAS system account for the agent (`mcp-agent`).
-6. Install the `mcp-agent` binary.
-7. Start via systemd unit.
+6. Install the initial `mcp-agent` binary to `/srv/mcp/mcp-agent`.
+7. Install and start the systemd unit.

-The agent runs as a systemd service. Container-first deployment is a v2
-concern -- MCP needs to be running before it can manage its own agent.
+On **NixOS** (rift), provisioning is declarative via the NixOS config.
+The NixOS config owns the infrastructure (user, systemd unit, podman,
+directories, permissions) but **not** the binary. `ExecStart` points to
+`/srv/mcp/mcp-agent`, a mutable path that MCP manages. NixOS may
+bootstrap the initial binary there, but subsequent updates come from MCP.
+
+On **Debian** (hyperborea, svc), provisioning is done via a setup script
+or ansible playbook that creates the same layout.
+
+#### Binary Location
+
+The agent binary lives at `/srv/mcp/mcp-agent` on **all** nodes,
+regardless of OS. This unifies the update mechanism across the fleet.
+
+#### Agent Upgrades
+
+After initial provisioning, the agent binary is updated via
+`mcp agent upgrade`. The CLI:
+
+1. Cross-compiles the agent for each target architecture
+   (`GOARCH=amd64` for rift/svc, `GOARCH=arm64` for hyperborea).
+2. SSHs to each node, pushes the binary to `/srv/mcp/mcp-agent.new`.
+3. Atomically swaps the binary (`mv mcp-agent.new mcp-agent`).
+4. Restarts the systemd service (`systemctl restart mcp-agent`).
+
+SSH is used instead of gRPC because:
+- It works even when the agent is broken or has an incompatible version.
+- The binary is ~17MB, which exceeds gRPC default message limits.
+- No self-restart coordination needed.
+
+The CLI uses `golang.org/x/crypto/ssh` for native SSH, keeping the
+entire workflow in a single binary with no external tool dependencies.
+
+#### Node Configuration
+
+Node config includes SSH and architecture info for agent management:
+
+```toml
+[[nodes]]
+name = "rift"
+address = "100.95.252.120:9444"
+ssh = "rift"          # SSH host (from ~/.ssh/config or hostname)
+arch = "amd64"        # GOARCH for cross-compilation
+
+[[nodes]]
+name = "hyperborea"
+address = "100.x.x.x:9444"
+ssh = "hyperborea"
+arch = "arm64"
+```
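
To show how the `arch` field could drive cross-compilation, the sketch below groups nodes by architecture so each target is built once. The `Node` struct mirrors the TOML keys above but is not the project's actual type, and `GOOS=linux` and `CGO_ENABLED=0` are assumptions about the build environment.

```go
package main

import "sort"

// Node mirrors the [[nodes]] table above (hypothetical type).
type Node struct {
	Name    string
	Address string
	SSH     string
	Arch    string
}

// GroupByArch returns node names keyed by GOARCH, sorted for stable output.
func GroupByArch(nodes []Node) map[string][]string {
	groups := make(map[string][]string)
	for _, n := range nodes {
		groups[n.Arch] = append(groups[n.Arch], n.Name)
	}
	for _, names := range groups {
		sort.Strings(names)
	}
	return groups
}

// BuildEnv returns the extra environment for `go build` for one
// architecture. GOOS and CGO_ENABLED are assumed values, not from the source.
func BuildEnv(arch string) []string {
	return []string{"GOOS=linux", "GOARCH=" + arch, "CGO_ENABLED=0"}
}
```

For the fleet described in this diff, the result would be two builds: one `amd64` binary shared by rift and svc, and one `arm64` binary for hyperborea.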
+
+#### Coordinated Upgrades
+
+New MCP releases often add new RPCs. A CLI at v0.6.0 calling an agent
+at v0.5.0 fails with `Unimplemented`. Therefore agent upgrades must be
+coordinated: `mcp agent upgrade` (with no node argument) upgrades all
+nodes before the CLI is used for other operations.
+
+If a node fails to upgrade, it is reported but the others still proceed.
+The operator can retry or investigate via SSH.
+
+#### Systemd Unit
+
+The systemd unit is the same on all nodes:

 ```ini
 [Unit]
@@ -1167,7 +1284,7 @@ Wants=network-online.target

 [Service]
 Type=simple
-ExecStart=/usr/local/bin/mcp-agent server --config /srv/mcp/mcp-agent.toml
+ExecStart=/srv/mcp/mcp-agent server --config /srv/mcp/mcp-agent.toml
 Restart=on-failure
 RestartSec=5

@@ -1175,17 +1292,14 @@ User=mcp
 Group=mcp

 NoNewPrivileges=true
-ProtectSystem=strict
-ProtectHome=true
+ProtectSystem=full
+ProtectHome=false
PrivateTmp=true
PrivateDevices=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictSUIDSGID=true
RestrictNamespaces=true
LockPersonality=true
MemoryDenyWriteExecute=true
RestrictRealtime=true
ReadWritePaths=/srv

@@ -1195,6 +1309,7 @@ WantedBy=multi-user.target

 Note: `ReadWritePaths=/srv` (not `/srv/mcp`) because the agent writes
 files to any service's `/srv/<service>/` directory on behalf of the CLI.
+`ProtectHome=false` because the `mcp` user's home is `/srv/mcp`.

 ### CLI Installation (on operator workstation)
