MCP Bootstrap Procedure
How to bring MCP up on a node for the first time, including migrating existing containers from another user's podman instance.
Prerequisites
- NixOS configuration applied with configs/mcp.nix (creates mcp user with rootless podman, subuid/subgid, systemd service)
- MCIAS system account with admin role (for token validation and cert provisioning)
- Metacrypt running (for TLS certificate issuance)
Step 1: Provision TLS Certificate
Issue a cert from Metacrypt with DNS and IP SANs:
export METACRYPT_TOKEN="<admin-token>"
# From a machine that can reach Metacrypt (e.g., via loopback on rift):
curl -sk -X POST https://127.0.0.1:18443/v1/engine/request \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $METACRYPT_TOKEN" \
  -d '{
    "mount": "pki",
    "operation": "issue",
    "path": "web",
    "data": {
      "issuer": "web",
      "common_name": "mcp-agent.svc.mcp.metacircular.net",
      "profile": "server",
      "dns_names": ["mcp-agent.svc.mcp.metacircular.net"],
      "ip_addresses": ["<tailscale-ip>", "<lan-ip>"],
      "ttl": "2160h"
    }
  }' > cert-response.json
# Extract cert and key from the JSON response and install:
doas cp cert.pem /srv/mcp/certs/cert.pem
doas cp key.pem /srv/mcp/certs/key.pem
doas chown mcp:mcp /srv/mcp/certs/cert.pem /srv/mcp/certs/key.pem
doas chmod 600 /srv/mcp/certs/cert.pem /srv/mcp/certs/key.pem
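The extraction step is left implicit above. The following is a minimal sketch only, assuming jq is installed and that the response carries the certificate and key as JSON string fields. The helper name extract_pem and the field paths .cert_pem and .key_pem in the usage lines are hypothetical; inspect cert-response.json for the real paths before relying on it.

```shell
# extract_pem RESPONSE JQ_PATH OUT
# Write the string found at JQ_PATH in RESPONSE to OUT.
# On failure (for example a missing field), OUT is not created.
extract_pem() {
    response=$1 field=$2 out=$3
    tmp=$(mktemp) || return 1   # mktemp creates the file with mode 0600
    # -e: exit non-zero when the result is null or false; -r: raw string output.
    if jq -er "$field" "$response" > "$tmp"; then
        mv "$tmp" "$out"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Usage (field paths are placeholders, not a confirmed Metacrypt schema):
#   extract_pem cert-response.json '.cert_pem' cert.pem
#   extract_pem cert-response.json '.key_pem'  key.pem
```

Writing to a temporary file first means a failed extraction never leaves a key file containing the literal text null.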
Step 2: Add DNS Record
Add an A record for mcp-agent.svc.mcp.metacircular.net pointing to the
node's IP in the MCNS zone file, bump the serial, restart CoreDNS.
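Bumping the serial by hand is easy to get wrong. As a sketch, assuming the MCNS zone follows the common YYYYMMDDNN serial convention (an assumption; the zone's actual scheme is not stated here), a hypothetical next_serial helper could compute the new value:

```shell
# next_serial CURRENT [TODAY]
# Print a serial strictly greater than CURRENT in YYYYMMDDNN form.
# TODAY defaults to the current date and exists mainly for testing.
next_serial() {
    current=$1
    today=${2:-$(date +%Y%m%d)}
    floor="${today}01"              # first serial of the day
    if [ "$current" -ge "$floor" ]; then
        # Already edited today (or serial is ahead of the date): just add one.
        echo $((current + 1))
    else
        echo "$floor"
    fi
}

# Usage: next_serial 2026032503 20260326
```

The comparison against the day's first serial keeps the result monotonically increasing even when the existing serial is ahead of the calendar.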
Step 3: Write Agent Config
Create /srv/mcp/mcp-agent.toml:
[server]
grpc_addr = "<tailscale-ip>:9444"
tls_cert = "/srv/mcp/certs/cert.pem"
tls_key = "/srv/mcp/certs/key.pem"
[database]
path = "/srv/mcp/mcp.db"
[mcias]
server_url = "https://mcias.metacircular.net:8443"
service_name = "mcp-agent"
[agent]
node_name = "<node-name>"
container_runtime = "podman"
[monitor]
interval = "60s"
alert_command = []
cooldown = "15m"
flap_threshold = 3
flap_window = "10m"
retention = "30d"
[log]
level = "info"
Step 4: Install Agent Binary
scp mcp-agent <node>:/tmp/
ssh <node> "doas cp /tmp/mcp-agent /usr/local/bin/mcp-agent"
Step 5: Start the Agent
ssh <node> "doas systemctl start mcp-agent"
ssh <node> "doas systemctl status mcp-agent"
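In a script, reading status output by eye is not an option. One possible approach, sketched here with a hypothetical stays_active helper meant to run on the node, is to require that the unit reports active on several consecutive polls, which also catches an agent that starts and then exits shortly afterwards:

```shell
# stays_active UNIT [CHECKS]
# Succeed only if UNIT is active on each of CHECKS polls, one second apart
# (default 5). Fail as soon as one poll reports the unit is not active.
stays_active() {
    unit=$1 checks=${2:-5}
    while [ "$checks" -gt 0 ]; do
        systemctl is-active --quiet "$unit" || return 1
        checks=$((checks - 1))
        sleep 1
    done
    return 0
}

# Usage: stays_active mcp-agent 10 || journalctl -u mcp-agent -n 50
```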
Step 6: Configure CLI
On the operator's workstation, create ~/.config/mcp/mcp.toml and save
the MCIAS admin service account token to ~/.config/mcp/token.
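When saving the token, it helps to avoid a trailing newline and to keep the file private. A minimal sketch, using a hypothetical save_token helper (the directory default matches the path above; everything else is illustrative):

```shell
# save_token TOKEN [DIR]
# Write TOKEN to DIR/token with no trailing newline and mode 0600.
save_token() {
    token=$1
    dir=${2:-$HOME/.config/mcp}
    mkdir -p "$dir" || return 1
    # The subshell keeps the tightened umask from leaking into the caller.
    ( umask 077 && printf '%s' "$token" > "$dir/token" ) || return 1
    chmod 600 "$dir/token"   # also fixes the mode of a pre-existing file
}

# Usage (MCIAS_TOKEN is a placeholder variable):
#   save_token "$MCIAS_TOKEN"
```

printf '%s' writes the token exactly as given, so whitespace already inside the variable is not removed by this sketch.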
Step 7: Migrate Containers (if existing)
If containers are running under another user (e.g., kyle), migrate them
to the mcp user's podman. Process each service in dependency order:
Dependency order: Metacrypt → MC-Proxy → MCR → MCNS
For each service:
# 1. Stop containers under the old user
ssh <node> "podman stop <container> && podman rm <container>"
# 2. Transfer ownership of data directory
ssh <node> "doas chown -R mcp:mcp /srv/<service>"
# 3. Transfer images to mcp's podman
ssh <node> "podman save <image> -o /tmp/<service>.tar"
ssh <node> "doas su -l -s /bin/sh mcp -c 'XDG_RUNTIME_DIR=/run/user/<uid> podman load -i /tmp/<service>.tar'"
# 4. Start containers under mcp (with new naming convention)
ssh <node> "doas su -l -s /bin/sh mcp -c 'XDG_RUNTIME_DIR=/run/user/<uid> podman run -d \
  --name <service>-<component> \
  --network mcpnet \
  --restart unless-stopped \
  --user 0:0 \
  -p <ports> \
  -v /srv/<service>:/srv/<service> \
  <image> <cmd>'"
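The commands above repeat the same doas su prefix and a hand-typed UID. As a sketch, a hypothetical as_mcp wrapper, defined on the node, could derive the UID with id -u instead of hard-coding it:

```shell
# as_mcp COMMAND...
# Run COMMAND as the mcp user with XDG_RUNTIME_DIR pointing at its runtime dir.
# Arguments are joined with spaces, so this sketch does not preserve
# arguments that themselves contain whitespace or quotes.
as_mcp() {
    uid=$(id -u mcp) || return 1
    doas su -l -s /bin/sh mcp -c "XDG_RUNTIME_DIR=/run/user/$uid $*"
}

# Usage:
#   as_mcp podman load -i /tmp/metacrypt.tar
#   as_mcp podman ps
```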
Container naming convention: <service>-<component> (e.g.,
metacrypt-api, metacrypt-web, mc-proxy).
Network: Services whose components need to communicate (metacrypt
api↔web, mcr api↔web) must be on the same podman network with DNS
enabled. Create with podman network create mcpnet.
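Rootless podman networks belong to the user that created them, so the network has to be created as mcp, and rerunning a plain create fails once it exists. A small sketch of an idempotent variant, using a hypothetical ensure_network helper built on podman network exists:

```shell
# ensure_network NAME
# Create the podman network NAME only if it does not exist yet.
ensure_network() {
    podman network exists "$1" || podman network create "$1"
}

# Usage (as the mcp user, with XDG_RUNTIME_DIR set):
#   ensure_network mcpnet
```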
Config updates: If service configs reference container names for
inter-component communication (e.g., vault_grpc = "metacrypt:9443"),
update them to use the new names (e.g., vault_grpc = "metacrypt-api:9443").
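The rename can be scripted. The sketch below assumes the config stores endpoints as quoted host:port strings, as in the vault_grpc example; the helper name rename_endpoint is hypothetical, and the host names are treated as plain text containing no characters special to sed.

```shell
# rename_endpoint FILE OLD NEW
# Rewrite quoted endpoints "OLD:<port>" to "NEW:<port>" in FILE.
rename_endpoint() {
    file=$1 old=$2 new=$3
    tmp=$(mktemp) || return 1
    # Match only a quote, the old host, then ':' so that an already renamed
    # value such as "metacrypt-api:9443" is left alone on a second run.
    sed "s/\"$old:/\"$new:/g" "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage: rename_endpoint /srv/<service>/<config-file> metacrypt metacrypt-api
```

Overwriting with cat rather than mv keeps the original file's owner and mode, which matters after the chown to mcp.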
Unseal Metacrypt after migration — it starts sealed.
Step 8: Adopt Containers
mcp adopt metacrypt
mcp adopt mc-proxy
mcp adopt mcr
mcp adopt mcns
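The same sequence can be written as a loop that stops at the first failure. This is a sketch only: adopt_all is a hypothetical helper, and it assumes mcp adopt exits non-zero when adoption fails, which is not confirmed here.

```shell
# adopt_all SERVICE...
# Adopt each service in the order given; stop at the first failure.
adopt_all() {
    for svc in "$@"; do
        if ! mcp adopt "$svc"; then
            echo "adopt failed for $svc" >&2
            return 1
        fi
    done
}

# Usage, following the dependency order from Step 7:
#   adopt_all metacrypt mc-proxy mcr mcns
```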
Step 9: Export and Complete Service Definitions
mcp service export metacrypt
mcp service export mc-proxy
mcp service export mcr
mcp service export mcns
The exported files will have name + image only. Edit each file to add the full container spec: network, ports, volumes, user, restart, cmd.
Then sync to push the complete specs:
mcp sync
Step 10: Verify
mcp status
All services should show desired: running, observed: running, no drift.
Lessons Learned (from first deployment, 2026-03-26)
- NixOS systemd sandbox: ProtectHome=true blocks /run/user, which rootless podman needs. Use ProtectHome=false. ProtectSystem=strict also blocks it; use full instead.
- PATH: the agent's systemd unit needs PATH=/run/current-system/sw/bin to find podman.
- XDG_RUNTIME_DIR: must be set to /run/user/<uid> for rootless podman. Pin the UID in NixOS config to avoid drift.
- Podman ps JSON: the Command field is []string, not string.
- Container naming: mc-proxy (service with hyphen) breaks naive split on -. The agent uses registry-aware splitting.
- Token whitespace: token files with trailing newlines cause gRPC header errors. The CLI trims whitespace.
- MCR auth: rootless podman under a new user can't pull from MCR without OCI token auth. Workaround: podman save + podman load to transfer images.
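To illustrate the container naming point, here is a sketch of one way to split a container name against a list of known services. It is not the agent's actual code; it assumes "registry-aware" means matching against the set of registered service names, and split_container is a hypothetical helper.

```shell
# split_container NAME SERVICE...
# Print "service=<svc> component=<comp>" using the longest known service
# name that NAME equals or starts with (followed by '-').
# Fails if no known service matches.
split_container() {
    name=$1; shift
    best=""
    for svc in "$@"; do
        case $name in
            "$svc" | "$svc"-*)
                if [ "${#svc}" -gt "${#best}" ]; then best=$svc; fi ;;
        esac
    done
    [ -n "$best" ] || return 1
    component=${name#"$best"}   # strip the service prefix
    component=${component#-}    # strip the separating hyphen, if any
    printf 'service=%s component=%s\n' "$best" "$component"
}

# Usage:
#   split_container metacrypt-api metacrypt mc-proxy mcr mcns
#   split_container mc-proxy      metacrypt mc-proxy mcr mcns
```

A naive split on the first hyphen would read mc-proxy as service mc with component proxy; matching known names first treats it as a single service with an empty component.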