Document system account auth model in ARCHITECTURE.md
Replaces the "admin required for all operations" model with the new three-tier identity model: human operators for CLI, mcp-agent system account for infrastructure automation, admin reserved for MCIAS-level administration. Documents agent-to-service token paths and per-service authorization policies.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
151
ARCHITECTURE.md
@@ -121,9 +121,26 @@ option for future security hardening.
|
|||||||
## Authentication and Authorization
|
## Authentication and Authorization
|
||||||
|
|
||||||
MCP follows the platform authentication model: all auth is delegated to
|
MCP follows the platform authentication model: all auth is delegated to
|
||||||
MCIAS.
|
MCIAS. The auth model separates three concerns: operator intent (CLI to
|
||||||
|
agent), infrastructure automation (agent to platform services), and
|
||||||
|
access control (who can do what).
|
||||||
|
|
||||||
### Agent Authentication
|
### Identity Model
|
||||||
|
|
||||||
|
| Identity | Type | Purpose |
|
||||||
|
|----------|------|---------|
|
||||||
|
| Human operator (e.g., `kyle`) | human | CLI operations: deploy, stop, start, build |
|
||||||
|
| `mcp-agent` | system | Agent-to-service automation: certs, DNS, routes, image pull |
|
||||||
|
| Per-service accounts (e.g., `mcq`) | system | Scoped self-management (own DNS records only) |
|
||||||
|
| `admin` role | role | MCIAS account management, policy changes, zone creation |
|
||||||
|
| `guest` role | role | Explicitly rejected by the agent |
|
||||||
|
|
||||||
|
The `admin` role is reserved for MCIAS-level administrative operations
|
||||||
|
(account creation, policy management, zone mutations). Routine MCP
|
||||||
|
operations (deploy, stop, start, build) do not require admin — any
|
||||||
|
authenticated non-guest user or system account is accepted.
|
||||||
|
|
||||||
|
### Agent Authentication (CLI → Agent)
|
||||||
|
|
||||||
The agent is a gRPC server with a unary interceptor that enforces
|
The agent is a gRPC server with a unary interceptor that enforces
|
||||||
authentication on every RPC:
|
authentication on every RPC:
|
||||||
@@ -132,10 +149,34 @@ authentication on every RPC:
|
|||||||
(`authorization: Bearer <token>`).
|
(`authorization: Bearer <token>`).
|
||||||
2. Agent extracts the token and validates it against MCIAS (cached 30s by
|
2. Agent extracts the token and validates it against MCIAS (cached 30s by
|
||||||
SHA-256 of the token, per platform convention).
|
SHA-256 of the token, per platform convention).
|
||||||
3. Agent checks that the caller has the `admin` role. All MCP operations
|
3. Agent rejects guests (`guest` role → `PERMISSION_DENIED`). All other
|
||||||
require admin -- there is no unprivileged MCP access.
|
authenticated users and system accounts are accepted.
|
||||||
4. If validation fails, the RPC returns `UNAUTHENTICATED` (invalid/expired
|
4. If validation fails, the RPC returns `UNAUTHENTICATED` (invalid/expired
|
||||||
token) or `PERMISSION_DENIED` (valid token, not admin).
|
token) or `PERMISSION_DENIED` (guest).
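The new steps 2 through 4 can be sketched in Go roughly as follows. This is a minimal illustration, not the agent's actual interceptor: the `Identity` shape, the `TokenCache` type, and the `validate` callback (standing in for the MCIAS call) are hypothetical names, and caching failed validations is an assumption. Only the 30-second TTL, the SHA-256 cache key, and the guest rule come from the text.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"sync"
	"time"
)

// Decision mirrors the gRPC status names used in the text.
type Decision string

const (
	OK               Decision = "OK"
	Unauthenticated  Decision = "UNAUTHENTICATED"
	PermissionDenied Decision = "PERMISSION_DENIED"
)

// Identity is a hypothetical shape for a validated MCIAS token.
type Identity struct {
	Username string
	Roles    []string
}

type cacheEntry struct {
	id      Identity
	ok      bool
	expires time.Time
}

// TokenCache remembers validation results for 30s, keyed by the
// SHA-256 of the token so raw tokens are never used as map keys.
type TokenCache struct {
	mu       sync.Mutex
	entries  map[string]cacheEntry
	validate func(token string) (Identity, bool) // stand-in for the MCIAS call
}

func NewTokenCache(validate func(string) (Identity, bool)) *TokenCache {
	return &TokenCache{entries: map[string]cacheEntry{}, validate: validate}
}

// Lookup returns the cached result or asks the validator (assumption:
// negative results are cached too). The lock is held across the call
// for simplicity; a real implementation would likely avoid that.
func (c *TokenCache) Lookup(token string) (Identity, bool) {
	sum := sha256.Sum256([]byte(token))
	key := hex.EncodeToString(sum[:])
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, found := c.entries[key]; found && time.Now().Before(e.expires) {
		return e.id, e.ok
	}
	id, ok := c.validate(token)
	c.entries[key] = cacheEntry{id: id, ok: ok, expires: time.Now().Add(30 * time.Second)}
	return id, ok
}

// Authorize rejects missing or invalid tokens as UNAUTHENTICATED and
// guests as PERMISSION_DENIED; every other caller is accepted.
func Authorize(c *TokenCache, token string) Decision {
	if token == "" {
		return Unauthenticated
	}
	id, ok := c.Lookup(token)
	if !ok {
		return Unauthenticated
	}
	for _, role := range id.Roles {
		if role == "guest" {
			return PermissionDenied
		}
	}
	return OK
}
```

In a real gRPC server the three `Decision` values would map onto the corresponding status codes returned from the unary interceptor.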
|
||||||
|
|
||||||
|
### Agent Service Authentication (Agent → Platform Services)
|
||||||
|
|
||||||
|
The agent authenticates to platform services using a long-lived system
|
||||||
|
account token (`mcp-agent`). Each service has its own token file:
|
||||||
|
|
||||||
|
| Service | Token Path | Operations |
|
||||||
|
|---------|------------|------------|
|
||||||
|
| Metacrypt | `/srv/mcp/metacrypt-token` | TLS cert provisioning (PKI issue) |
|
||||||
|
| MCNS | `/srv/mcp/mcns-token` | DNS record create/delete (any name) |
|
||||||
|
| mc-proxy | Unix socket (no auth) | Route registration/removal |
|
||||||
|
| MCR | podman auth store | Image pull (JWT-as-password) |
|
||||||
|
|
||||||
|
These tokens are issued by MCIAS for the `mcp-agent` system account.
|
||||||
|
They carry no roles — authorization is handled by each service's policy
|
||||||
|
engine:
|
||||||
|
|
||||||
|
- **Metacrypt:** Policy rule grants `mcp-agent` write access to
|
||||||
|
`engine/pki/issue`.
|
||||||
|
- **MCNS:** Code-level authorization: system account `mcp-agent` can
|
||||||
|
manage any record; other system accounts can only manage records
|
||||||
|
matching their username.
|
||||||
|
- **MCR:** Default policy allows all authenticated users to push/pull.
|
||||||
|
MCR accepts MCIAS JWTs as passwords at the `/v2/token` endpoint.
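The MCNS rule above can be expressed as a small predicate. This is a sketch only: the text does not define what "matching their username" means, so the example assumes the first DNS label of the record must equal the account name, and `CanManageRecord` is a hypothetical function name.

```go
package main

import "strings"

// agentAccount is the system account allowed to manage any record.
const agentAccount = "mcp-agent"

// CanManageRecord reports whether a system account may create or
// delete the given DNS record. Assumption: "matching" means the
// record's first label equals the account name, case-insensitively.
func CanManageRecord(account, recordName string) bool {
	if account == "" {
		return false
	}
	if account == agentAccount {
		return true
	}
	label, _, _ := strings.Cut(strings.ToLower(recordName), ".")
	return label == strings.ToLower(account)
}
```

Under this assumption, `mcq` could manage `mcq.svc.example` but not `mcr.svc.example`, while `mcp-agent` could manage both.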
|
||||||
|
|
||||||
### CLI Authentication
|
### CLI Authentication
|
||||||
|
|
||||||
@@ -148,6 +189,15 @@ obtained by:
|
|||||||
|
|
||||||
The stored token is used for all subsequent agent RPCs until it expires.
|
The stored token is used for all subsequent agent RPCs until it expires.
|
||||||
|
|
||||||
|
### MCR Registry Authentication
|
||||||
|
|
||||||
|
`mcp build` auto-authenticates to MCR before pushing images. It reads
|
||||||
|
the CLI's stored MCIAS token and uses it as the password for `podman
|
||||||
|
login`. MCR's token endpoint accepts MCIAS JWTs as passwords (the
|
||||||
|
personal-access-token pattern), so both human and system account tokens
|
||||||
|
work. This eliminates the need for a separate interactive `podman login`
|
||||||
|
step.
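One way the auto-login could be wired up is sketched below. The function name, the registry address, and the username parameter are hypothetical; the text only says that the stored MCIAS token is used as the `podman login` password. The sketch passes the token on standard input via podman's `--password-stdin` flag so that it does not appear in the process argument list.

```go
package main

import (
	"os/exec"
	"strings"
)

// PodmanLoginCmd builds (but does not run) a `podman login` command
// that feeds the MCIAS token to podman on stdin as the password.
// registry and username are illustrative parameters.
func PodmanLoginCmd(registry, username, token string) *exec.Cmd {
	cmd := exec.Command("podman", "login",
		"--username", username,
		"--password-stdin",
		registry)
	cmd.Stdin = strings.NewReader(token)
	return cmd
}
```

A caller would then invoke `cmd.Run()` and treat a non-zero exit as an authentication failure before attempting the push.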
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Services and Components
|
## Services and Components
|
||||||
@@ -224,6 +274,9 @@ mcp pull <service> <path> [local-file] Copy a file from /srv/<service>/<path> to
|
|||||||
mcp node list List registered nodes
|
mcp node list List registered nodes
|
||||||
mcp node add <name> <address> Register a node
|
mcp node add <name> <address> Register a node
|
||||||
mcp node remove <name> Deregister a node
|
mcp node remove <name> Deregister a node
|
||||||
|
|
||||||
|
mcp agent upgrade [node] Build, push, and restart agent on all (or one) node(s)
|
||||||
|
mcp agent status Show agent version on each node
|
||||||
```
|
```
|
||||||
|
|
||||||
### Service Definition Files
|
### Service Definition Files
|
||||||
@@ -1144,20 +1197,84 @@ The agent's data directory follows the platform convention:
|
|||||||
|
|
||||||
### Agent Deployment (on nodes)
|
### Agent Deployment (on nodes)
|
||||||
|
|
||||||
The agent is deployed like any other Metacircular service:
|
#### Provisioning (one-time per node)
|
||||||
|
|
||||||
1. Provision the `mcp` system user via NixOS config (with podman access
|
Each node needs a one-time setup before the agent can run. The steps are
|
||||||
and subuid/subgid ranges for rootless containers).
|
the same regardless of OS, but the mechanism differs:
|
||||||
|
|
||||||
|
1. Create `mcp` system user with podman access and subuid/subgid ranges.
|
||||||
2. Set `/srv/` ownership to the `mcp` user (the agent creates and manages
|
2. Set `/srv/` ownership to the `mcp` user (the agent creates and manages
|
||||||
`/srv/<service>/` directories for all services).
|
`/srv/<service>/` directories for all services).
|
||||||
3. Create `/srv/mcp/` directory and config file.
|
3. Create `/srv/mcp/` directory and config file.
|
||||||
4. Provision TLS certificate from Metacrypt.
|
4. Provision TLS certificate from Metacrypt.
|
||||||
5. Create an MCIAS system account for the agent (`mcp-agent`).
|
5. Create an MCIAS system account for the agent (`mcp-agent`).
|
||||||
6. Install the `mcp-agent` binary.
|
6. Install the initial `mcp-agent` binary to `/srv/mcp/mcp-agent`.
|
||||||
7. Start via systemd unit.
|
7. Install and start the systemd unit.
|
||||||
|
|
||||||
The agent runs as a systemd service. Container-first deployment is a v2
|
On **NixOS** (rift), provisioning is declarative via the NixOS config.
|
||||||
concern -- MCP needs to be running before it can manage its own agent.
|
The NixOS config owns the infrastructure (user, systemd unit, podman,
|
||||||
|
directories, permissions) but **not** the binary. `ExecStart` points to
|
||||||
|
`/srv/mcp/mcp-agent`, a mutable path that MCP manages. NixOS may
|
||||||
|
bootstrap the initial binary there, but subsequent updates come from MCP.
|
||||||
|
|
||||||
|
On **Debian** (hyperborea, svc), provisioning is done via a setup script
|
||||||
|
or ansible playbook that creates the same layout.
|
||||||
|
|
||||||
|
#### Binary Location
|
||||||
|
|
||||||
|
The agent binary lives at `/srv/mcp/mcp-agent` on **all** nodes,
|
||||||
|
regardless of OS. This unifies the update mechanism across the fleet.
|
||||||
|
|
||||||
|
#### Agent Upgrades
|
||||||
|
|
||||||
|
After initial provisioning, the agent binary is updated via
|
||||||
|
`mcp agent upgrade`. The CLI:
|
||||||
|
|
||||||
|
1. Cross-compiles the agent for each target architecture
|
||||||
|
(`GOARCH=amd64` for rift/svc, `GOARCH=arm64` for hyperborea).
|
||||||
|
2. SSHs to each node, pushes the binary to `/srv/mcp/mcp-agent.new`.
|
||||||
|
3. Atomically swaps the binary (`mv mcp-agent.new mcp-agent`).
|
||||||
|
4. Restarts the systemd service (`systemctl restart mcp-agent`).
|
||||||
|
|
||||||
|
SSH is used instead of gRPC because:
|
||||||
|
- It works even when the agent is broken or has an incompatible version.
|
||||||
|
- The binary is ~17MB, which exceeds gRPC default message limits.
|
||||||
|
- No self-restart coordination needed.
|
||||||
|
|
||||||
|
The CLI uses `golang.org/x/crypto/ssh` for native SSH, keeping the
|
||||||
|
entire workflow in a single binary with no external tool dependencies.
|
||||||
|
|
||||||
|
#### Node Configuration
|
||||||
|
|
||||||
|
Node config includes SSH and architecture info for agent management:
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[[nodes]]
|
||||||
|
name = "rift"
|
||||||
|
address = "100.95.252.120:9444"
|
||||||
|
ssh = "rift" # SSH host (from ~/.ssh/config or hostname)
|
||||||
|
arch = "amd64" # GOARCH for cross-compilation
|
||||||
|
|
||||||
|
[[nodes]]
|
||||||
|
name = "hyperborea"
|
||||||
|
address = "100.x.x.x:9444"
|
||||||
|
ssh = "hyperborea"
|
||||||
|
arch = "arm64"
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Coordinated Upgrades
|
||||||
|
|
||||||
|
New MCP releases often add new RPCs. A CLI at v0.6.0 calling an agent
|
||||||
|
at v0.5.0 fails with `Unimplemented`. Therefore agent upgrades must be
|
||||||
|
coordinated: `mcp agent upgrade` (with no node argument) upgrades all
|
||||||
|
nodes before the CLI is used for other operations.
|
||||||
|
|
||||||
|
If a node fails to upgrade, it is reported but the others still proceed.
|
||||||
|
The operator can retry or investigate via SSH.
|
||||||
|
|
||||||
|
#### Systemd Unit
|
||||||
|
|
||||||
|
The systemd unit is the same on all nodes:
|
||||||
|
|
||||||
```ini
|
```ini
|
||||||
[Unit]
|
[Unit]
|
||||||
@@ -1167,7 +1284,7 @@ Wants=network-online.target
|
|||||||
|
|
||||||
[Service]
|
[Service]
|
||||||
Type=simple
|
Type=simple
|
||||||
ExecStart=/usr/local/bin/mcp-agent server --config /srv/mcp/mcp-agent.toml
|
ExecStart=/srv/mcp/mcp-agent server --config /srv/mcp/mcp-agent.toml
|
||||||
Restart=on-failure
|
Restart=on-failure
|
||||||
RestartSec=5
|
RestartSec=5
|
||||||
|
|
||||||
@@ -1175,17 +1292,14 @@ User=mcp
|
|||||||
Group=mcp
|
Group=mcp
|
||||||
|
|
||||||
NoNewPrivileges=true
|
NoNewPrivileges=true
|
||||||
ProtectSystem=strict
|
ProtectSystem=full
|
||||||
ProtectHome=true
|
ProtectHome=false
|
||||||
PrivateTmp=true
|
PrivateTmp=true
|
||||||
PrivateDevices=true
|
PrivateDevices=true
|
||||||
ProtectKernelTunables=true
|
ProtectKernelTunables=true
|
||||||
ProtectKernelModules=true
|
ProtectKernelModules=true
|
||||||
ProtectControlGroups=true
|
|
||||||
RestrictSUIDSGID=true
|
RestrictSUIDSGID=true
|
||||||
RestrictNamespaces=true
|
|
||||||
LockPersonality=true
|
LockPersonality=true
|
||||||
MemoryDenyWriteExecute=true
|
|
||||||
RestrictRealtime=true
|
RestrictRealtime=true
|
||||||
ReadWritePaths=/srv
|
ReadWritePaths=/srv
|
||||||
|
|
||||||
@@ -1195,6 +1309,7 @@ WantedBy=multi-user.target
|
|||||||
|
|
||||||
Note: `ReadWritePaths=/srv` (not `/srv/mcp`) because the agent writes
|
Note: `ReadWritePaths=/srv` (not `/srv/mcp`) because the agent writes
|
||||||
files to any service's `/srv/<service>/` directory on behalf of the CLI.
|
files to any service's `/srv/<service>/` directory on behalf of the CLI.
|
||||||
|
`ProtectHome=false` because the `mcp` user's home is `/srv/mcp`.
|
||||||
|
|
||||||
### CLI Installation (on operator workstation)
|
### CLI Installation (on operator workstation)
|
||||||
|
|
||||||
|
|||||||