mcp/docs/bootstrap.md
Kyle Isom ea8a42a696 P5.2 + P5.3: Bootstrap docs, README, and RUNBOOK
- docs/bootstrap.md: step-by-step bootstrap procedure with lessons
  learned from the first deployment (NixOS sandbox issues, podman
  rootless setup, container naming, MCR auth workaround)
- README.md: quick-start guide, command reference, doc links
- RUNBOOK.md: operational procedures for operators (health checks,
  common operations, unsealing metacrypt, cert renewal, incident
  response, disaster recovery, file locations)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 15:32:22 -07:00

MCP Bootstrap Procedure

How to bring MCP up on a node for the first time, including migrating existing containers from another user's podman instance.

Prerequisites

  • NixOS configuration applied with configs/mcp.nix (creates mcp user with rootless podman, subuid/subgid, systemd service)
  • MCIAS system account with admin role (for token validation and cert provisioning)
  • Metacrypt running (for TLS certificate issuance)

Step 1: Provision TLS Certificate

Issue a cert from Metacrypt with DNS and IP SANs:

export METACRYPT_TOKEN="<admin-token>"

# From a machine that can reach Metacrypt (e.g., via loopback on rift):
curl -sk -X POST https://127.0.0.1:18443/v1/engine/request \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $METACRYPT_TOKEN" \
  -d '{
    "mount": "pki",
    "operation": "issue",
    "path": "web",
    "data": {
      "issuer": "web",
      "common_name": "mcp-agent.svc.mcp.metacircular.net",
      "profile": "server",
      "dns_names": ["mcp-agent.svc.mcp.metacircular.net"],
      "ip_addresses": ["<tailscale-ip>", "<lan-ip>"],
      "ttl": "2160h"
    }
  }' > cert-response.json

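# Extract the cert and key from cert-response.json into cert.pem and key.pem, then install.
# cert-response.json contains the private key, so delete it once the files are installed.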
doas cp cert.pem /srv/mcp/certs/cert.pem
doas cp key.pem /srv/mcp/certs/key.pem
doas chown mcp:mcp /srv/mcp/certs/cert.pem /srv/mcp/certs/key.pem
doas chmod 600 /srv/mcp/certs/cert.pem /srv/mcp/certs/key.pem
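
A quick sanity check after installing can catch an empty file, a non-PEM payload, or a wrong mode before the agent is started. The following is a minimal sketch, not part of MCP: the function name check_pem_install is hypothetical, and it assumes GNU coreutils stat, as on NixOS.

```shell
# check_pem_install DIR: verify that DIR/cert.pem and DIR/key.pem exist,
# look like PEM, and have mode 600. (Hypothetical helper; assumes GNU stat.)
check_pem_install() {
    dir=$1
    for f in "$dir/cert.pem" "$dir/key.pem"; do
        # -s is true only for files that exist and are non-empty.
        [ -s "$f" ] || { echo "missing or empty: $f" >&2; return 1; }
        # PEM files carry a "-----BEGIN ...-----" armor line.
        grep -q -- '-----BEGIN ' "$f" || { echo "not PEM: $f" >&2; return 1; }
        mode=$(stat -c '%a' "$f")
        [ "$mode" = "600" ] || { echo "mode $mode on $f, want 600" >&2; return 1; }
    done
    echo "ok"
}
```

Because the files are mode 600 and owned by mcp, run it as root or as the mcp user, for example check_pem_install /srv/mcp/certs.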

Step 2: Add DNS Record

Add an A record for mcp-agent.svc.mcp.metacircular.net pointing to the node's IP in the MCNS zone file, then bump the zone serial and restart CoreDNS.
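
Bumping the serial by hand is easy to get wrong. The sketch below computes the next serial under the assumption that the zone uses the common YYYYMMDDNN convention; this document does not say which convention the MCNS zone follows, and next_serial is a hypothetical helper.

```shell
# next_serial CURRENT TODAY: print the next zone serial.
# Assumes YYYYMMDDNN serials; TODAY is a YYYYMMDD date string.
next_serial() {
    current=$1
    floor="${2}00"
    if [ "$current" -ge "$floor" ]; then
        # Already bumped today (or ahead of today): just increment.
        echo $((current + 1))
    else
        # First change today: start at today's date with counter 00.
        echo "$floor"
    fi
}
# Example: next_serial 2026032601 "$(date +%Y%m%d)"
```

The result is always greater than the current serial, which is the property secondaries and caches rely on.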

Step 3: Write Agent Config

Create /srv/mcp/mcp-agent.toml:

[server]
grpc_addr = "<tailscale-ip>:9444"
tls_cert  = "/srv/mcp/certs/cert.pem"
tls_key   = "/srv/mcp/certs/key.pem"

[database]
path = "/srv/mcp/mcp.db"

[mcias]
server_url   = "https://mcias.metacircular.net:8443"
service_name = "mcp-agent"

[agent]
node_name         = "<node-name>"
container_runtime = "podman"

[monitor]
interval       = "60s"
alert_command  = []
cooldown       = "15m"
flap_threshold = 3
flap_window    = "10m"
retention      = "30d"

[log]
level = "info"
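
The template above contains angle-bracket placeholders (<tailscale-ip>, <node-name>) that must be replaced before the agent starts. As a rough check, a helper like the following (hypothetical name, plain grep) lists any line that still contains one:

```shell
# unfilled_placeholders FILE: print numbered lines that still contain a
# <lowercase-name> template value. Exit status is 0 if any were found,
# non-zero otherwise (grep's own convention).
unfilled_placeholders() {
    grep -n '<[a-z][a-z-]*>' "$1"
}
```

For example, `if unfilled_placeholders /srv/mcp/mcp-agent.toml; then echo "fill these in first" >&2; fi` prints the offending lines followed by a reminder.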

Step 4: Install Agent Binary

scp mcp-agent <node>:/tmp/
ssh <node> "doas cp /tmp/mcp-agent /usr/local/bin/mcp-agent"

Step 5: Start the Agent

ssh <node> "doas systemctl start mcp-agent"
ssh <node> "doas systemctl status mcp-agent"
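
A single status call right after start can race with the service coming up. One option is a small retry wrapper; wait_until below is a hypothetical helper, not part of MCP, and the example relies only on the standard systemctl is-active --quiet check.

```shell
# wait_until TRIES DELAY CMD...: run CMD until it succeeds, at most TRIES
# times, sleeping DELAY seconds after each failed attempt.
wait_until() {
    tries=$1
    delay=$2
    shift 2
    n=0
    while [ "$n" -lt "$tries" ]; do
        "$@" && return 0
        n=$((n + 1))
        sleep "$delay"
    done
    return 1
}
# Example (replace <node> first):
#   wait_until 10 2 ssh <node> systemctl is-active --quiet mcp-agent
```

If the wrapper returns non-zero, the journal for the mcp-agent unit is the next place to look.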

Step 6: Configure CLI

On the operator's workstation, create ~/.config/mcp/mcp.toml and save the MCIAS admin service account token to ~/.config/mcp/token.
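
Since the token is a credential, and token files with trailing newlines have caused gRPC header errors in the past, it is worth writing the file without a newline and with restrictive permissions. A minimal sketch (save_token is a hypothetical helper; MCIAS_TOKEN in the example is an assumed variable name):

```shell
# save_token TOKEN PATH: write TOKEN to PATH with no trailing newline,
# creating the parent directory if needed and forcing mode 600.
save_token() {
    (
        # The subshell keeps the umask change from leaking into the caller.
        umask 077
        mkdir -p "$(dirname "$2")" &&
        printf '%s' "$1" > "$2" &&
        chmod 600 "$2"
    )
}
# Example: save_token "$MCIAS_TOKEN" "$HOME/.config/mcp/token"
```

printf '%s' is used instead of echo because echo appends a newline.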

Step 7: Migrate Containers (if existing)

If containers are running under another user (e.g., kyle), migrate them to the mcp user's podman. Process each service in dependency order:

Dependency order: Metacrypt → MC-Proxy → MCR → MCNS

For each service:

# 1. Stop containers under the old user
ssh <node> "podman stop <container> && podman rm <container>"

# 2. Transfer ownership of data directory
ssh <node> "doas chown -R mcp:mcp /srv/<service>"

# 3. Transfer images to mcp's podman
ssh <node> "podman save <image> -o /tmp/<service>.tar"
ssh <node> "doas su -l -s /bin/sh mcp -c 'XDG_RUNTIME_DIR=/run/user/<uid> podman load -i /tmp/<service>.tar'"

# 4. Start containers under mcp (with new naming convention)
ssh <node> "doas su -l -s /bin/sh mcp -c 'XDG_RUNTIME_DIR=/run/user/<uid> podman run -d \
  --name <service>-<component> \
  --network mcpnet \
  --restart unless-stopped \
  --user 0:0 \
  -p <ports> \
  -v /srv/<service>:/srv/<service> \
  <image> <cmd>'"
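
Because steps 1 to 3 are destructive and repeated per service, it can help to print them with the placeholders filled in and review them before running anything. The sketch below only prints commands; print_migration_plan is a hypothetical helper, and the image name and UID in the example are made up.

```shell
# print_migration_plan NODE SERVICE CONTAINER IMAGE UID
# Prints (does not run) the commands for steps 1-3 of the migration.
print_migration_plan() {
    node=$1
    service=$2
    container=$3
    image=$4
    uid=$5
    as_mcp="doas su -l -s /bin/sh mcp -c"
    cat <<EOF
ssh $node "podman stop $container && podman rm $container"
ssh $node "doas chown -R mcp:mcp /srv/$service"
ssh $node "podman save $image -o /tmp/$service.tar"
ssh $node "$as_mcp 'XDG_RUNTIME_DIR=/run/user/$uid podman load -i /tmp/$service.tar'"
EOF
}
# Example: print_migration_plan rift metacrypt metacrypt example/metacrypt:latest 1000
```

Step 4 is left out on purpose: ports, volumes, and the command differ per service and have to be filled in by hand.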

Container naming convention: <service>-<component> (e.g., metacrypt-api, metacrypt-web, mc-proxy).

Network: Services whose components need to communicate (metacrypt api↔web, mcr api↔web) must be on the same podman network with DNS enabled. Rootless podman networks are per-user, so create it once as the mcp user with podman network create mcpnet.
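
Creating the network a second time fails, which is awkward when the migration is re-run. A small guard using podman network exists makes the step repeatable; ensure_network is a hypothetical wrapper name.

```shell
# ensure_network NAME: create the podman network only if it is missing.
# Run this as the user that owns the containers (here, mcp).
ensure_network() {
    podman network exists "$1" || podman network create "$1"
}
# Example: ensure_network mcpnet
```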

Config updates: If service configs reference container names for inter-component communication (e.g., vault_grpc = "metacrypt:9443"), update them to use the new names (e.g., vault_grpc = "metacrypt-api:9443").

Unseal Metacrypt after migration; it starts sealed.

Step 8: Adopt Containers

mcp adopt metacrypt
mcp adopt mc-proxy
mcp adopt mcr
mcp adopt mcns
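
The same four commands can be run as a loop that stops at the first failure, so it is obvious which service needs attention. This is a sketch only; adopt_all is a hypothetical wrapper around the mcp adopt command shown above.

```shell
# adopt_all: run `mcp adopt` for each service, in the order listed above,
# and stop at the first one that fails.
adopt_all() {
    for svc in metacrypt mc-proxy mcr mcns; do
        mcp adopt "$svc" || { echo "adopt failed: $svc" >&2; return 1; }
    done
}
```

The same pattern works for the per-service export commands in Step 9.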

Step 9: Export and Complete Service Definitions

mcp service export metacrypt
mcp service export mc-proxy
mcp service export mcr
mcp service export mcns

The exported files contain only the service name and image. Edit each file to add the full container spec: network, ports, volumes, user, restart, cmd.

Then sync to push the complete specs:

mcp sync

Step 10: Verify

mcp status

All services should show desired: running, observed: running, no drift.

Lessons Learned (from first deployment, 2026-03-26)

  • NixOS systemd sandbox: ProtectHome=true makes /run/user inaccessible, and rootless podman needs it. Use ProtectHome=false. ProtectSystem=strict also breaks it because the file system is mounted read-only; use ProtectSystem=full instead.
  • PATH: the agent's systemd unit needs PATH=/run/current-system/sw/bin to find podman.
  • XDG_RUNTIME_DIR: must be set to /run/user/<uid> for rootless podman. Pin the UID in NixOS config to avoid drift.
  • Podman ps JSON: the Command field is []string, not string.
  • Container naming: mc-proxy (service with hyphen) breaks naive split on -. The agent uses registry-aware splitting.
  • Token whitespace: token files with trailing newlines cause gRPC header errors. The CLI trims whitespace.
  • MCR auth: rootless podman under a new user can't pull from MCR without OCI token auth. Workaround: podman save + podman load to transfer images.
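
To illustrate the container naming point: splitting mc-proxy on the first hyphen yields "mc" and "proxy", whereas matching against the list of known service names does not. The agent's actual implementation is not shown in this document; the following is an illustrative sketch of the idea, and split_container is a hypothetical name.

```shell
# split_container NAME SERVICE...: print "service:component" for a container
# name by matching the known service names instead of splitting on "-".
split_container() {
    name=$1
    shift
    best=""
    for svc in "$@"; do
        case "$name" in
            "$svc" | "$svc"-*)
                # Prefer the longest matching service name.
                if [ "${#svc}" -gt "${#best}" ]; then
                    best=$svc
                fi
                ;;
        esac
    done
    # No known service matches: report failure.
    [ -n "$best" ] || return 1
    component=${name#"$best"}
    component=${component#-}
    echo "$best:$component"
}
# Example: split_container metacrypt-api metacrypt mc-proxy mcr mcns
```

A container named exactly like its service (mc-proxy) yields an empty component, and a name that matches no known service returns a non-zero status.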