# MCP Bootstrap Procedure

How to bring MCP up on a node for the first time, including migrating
existing containers from another user's podman instance.

## Prerequisites

- NixOS configuration applied with `configs/mcp.nix` (creates `mcp` user
  with rootless podman, subuid/subgid, systemd service)
- MCIAS system account with `admin` role (for token validation and cert
  provisioning)
- Metacrypt running (for TLS certificate issuance)

## Step 1: Provision TLS Certificate

Issue a cert from Metacrypt with DNS and IP SANs:

```bash
export METACRYPT_TOKEN="<admin-token>"

# From a machine that can reach Metacrypt (e.g., via loopback on rift):
curl -sk -X POST https://127.0.0.1:18443/v1/engine/request \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $METACRYPT_TOKEN" \
  -d '{
    "mount": "pki",
    "operation": "issue",
    "path": "web",
    "data": {
      "issuer": "web",
      "common_name": "mcp-agent.svc.mcp.metacircular.net",
      "profile": "server",
      "dns_names": ["mcp-agent.svc.mcp.metacircular.net"],
      "ip_addresses": ["<tailscale-ip>", "<lan-ip>"],
      "ttl": "2160h"
    }
  }' > cert-response.json

# Extract cert and key from the JSON response and install:
doas cp cert.pem /srv/mcp/certs/cert.pem
doas cp key.pem /srv/mcp/certs/key.pem
doas chown mcp:mcp /srv/mcp/certs/cert.pem /srv/mcp/certs/key.pem
doas chmod 600 /srv/mcp/certs/cert.pem /srv/mcp/certs/key.pem
```

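The extraction step itself is not spelled out above. The following is a minimal sketch of one way to do it, assuming `jq` is installed. The JSON paths `.data.cert_pem` and `.data.key_pem` are hypothetical placeholders, not confirmed Metacrypt field names; inspect `cert-response.json` first and override `CERT_FIELD` / `KEY_FIELD` to match the real layout.

```bash
#!/bin/sh
# Sketch only. ASSUMPTION: the jq paths below are hypothetical; check the
# actual structure of cert-response.json and adjust before relying on them.
CERT_FIELD="${CERT_FIELD:-.data.cert_pem}"
KEY_FIELD="${KEY_FIELD:-.data.key_pem}"

# extract_pem RESPONSE_FILE JQ_PATH OUTPUT_FILE
# jq -e exits non-zero when the path is missing or null, so a wrong
# field name fails loudly instead of writing a silently bad PEM file.
extract_pem() {
  jq -er "$2" "$1" > "$3"
}

# extract_cert_and_key RESPONSE_FILE OUTPUT_DIR
# Runs in a subshell so the restrictive umask does not leak to the caller.
extract_cert_and_key() (
  umask 077   # the private key is never world-readable, even briefly
  extract_pem "$1" "$CERT_FIELD" "$2/cert.pem" &&
  extract_pem "$1" "$KEY_FIELD" "$2/key.pem"
)
```

With the functions loaded, `extract_cert_and_key cert-response.json .` would leave `cert.pem` and `key.pem` in the current directory, ready for the `doas cp` commands above. Delete `cert-response.json` afterwards, since it contains the private key.
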
## Step 2: Add DNS Record

In the MCNS zone file, add an A record for
`mcp-agent.svc.mcp.metacircular.net` pointing to the node's IP, bump the
zone serial, and restart CoreDNS.

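Bumping the serial by hand is easy to get wrong. As a rough sketch, the helper below computes the next serial, assuming the zone uses the common `YYYYMMDDNN` date-based convention. That is an assumption, since the zone's actual scheme is not stated here. The function name `next_serial` is illustrative only.

```bash
#!/bin/sh
# next_serial CURRENT [TODAY]
# ASSUMPTION: serials follow the YYYYMMDDNN convention. TODAY defaults to
# the current date; it is a parameter mainly so the logic can be tested.
next_serial() {
  current=$1
  today=${2:-$(date +%Y%m%d)}
  floor="${today}00"            # first serial available for today
  if [ "$current" -ge "$floor" ]; then
    # Already edited today (or serial is ahead of the date): just increment.
    echo $((current + 1))
  else
    # First edit today: jump to today's base serial.
    echo "$floor"
  fi
}
```

For example, `next_serial 2026032600 20260326` prints `2026032601`. Either branch yields a value strictly greater than the current serial, which is the property DNS needs in order to notice the change.
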
## Step 3: Write Agent Config

Create `/srv/mcp/mcp-agent.toml`:

```toml
[server]
grpc_addr = "<tailscale-ip>:9444"
tls_cert = "/srv/mcp/certs/cert.pem"
tls_key = "/srv/mcp/certs/key.pem"

[database]
path = "/srv/mcp/mcp.db"

[mcias]
server_url = "https://mcias.metacircular.net:8443"
service_name = "mcp-agent"

[agent]
node_name = "<node-name>"
container_runtime = "podman"

[monitor]
interval = "60s"
alert_command = []
cooldown = "15m"
flap_threshold = 3
flap_window = "10m"
retention = "30d"

[log]
level = "info"
```

## Step 4: Install Agent Binary

```bash
scp mcp-agent <node>:/tmp/
ssh <node> "doas cp /tmp/mcp-agent /usr/local/bin/mcp-agent"
```

## Step 5: Start the Agent

```bash
ssh <node> "doas systemctl start mcp-agent"
ssh <node> "doas systemctl status mcp-agent"
```

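If the agent starts but cannot talk to podman, the systemd sandbox settings described under Lessons Learned are a likely cause. The sketch below checks them by parsing `systemctl show` output. The function name `check_sandbox` is illustrative, and the accepted values simply mirror the recommendations in this document.

```bash
#!/bin/sh
# check_sandbox: reads KEY=VALUE lines (as printed by `systemctl show`)
# on stdin and reports settings this document found to break rootless
# podman. Returns non-zero if any problem is found.
check_sandbox() {
  status=0
  while IFS='=' read -r key value; do
    case "$key=$value" in
      ProtectHome=no|ProtectHome=false)
        ;;  # matches the recommended ProtectHome=false
      ProtectHome=*)
        echo "ProtectHome=$value blocks /run/user; set ProtectHome=false"
        status=1 ;;
      ProtectSystem=strict)
        echo "ProtectSystem=strict blocks /run/user; use ProtectSystem=full"
        status=1 ;;
    esac
  done
  return "$status"
}
```

A possible invocation is `ssh <node> "systemctl show mcp-agent -p ProtectHome -p ProtectSystem" | check_sandbox`. If that passes and the agent still fails, `journalctl -u mcp-agent -n 50 --no-pager` on the node shows the most recent log lines.
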
## Step 6: Configure CLI

On the operator's workstation, create `~/.config/mcp/mcp.toml` and save
the MCIAS admin service account token to `~/.config/mcp/token`.

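Because the token is a credential, the file should not be readable by other users, and stray newlines are best avoided (see the token whitespace note under Lessons Learned). The following is a minimal sketch of a helper that does both. `save_token` is an illustrative name, not part of the MCP CLI.

```bash
#!/bin/sh
# save_token [DEST]
# Reads the token from stdin and writes it to DEST (default
# ~/.config/mcp/token) with mode 600 and no trailing whitespace.
# Runs in a subshell so the umask change stays local.
save_token() (
  dest=${1:-$HOME/.config/mcp/token}
  umask 077
  mkdir -p "$(dirname "$dest")"
  # Tokens are a single opaque string, so dropping all whitespace is safe.
  tr -d '[:space:]' > "$dest"
  chmod 600 "$dest"   # also tightens a pre-existing file
)
```

Reading from stdin keeps the token out of shell history, for example `save_token < /path/to/downloaded-token` (path hypothetical).
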
## Step 7: Migrate Containers (if existing)

If containers are running under another user (e.g., `kyle`), migrate them
to the `mcp` user's podman. Process each service in dependency order:

**Dependency order:** Metacrypt → MC-Proxy → MCR → MCNS

For each service:

```bash
# 1. Stop and remove containers under the old user
ssh <node> "podman stop <container> && podman rm <container>"

# 2. Transfer ownership of data directory
ssh <node> "doas chown -R mcp:mcp /srv/<service>"

# 3. Transfer images to mcp's podman
ssh <node> "podman save <image> -o /tmp/<service>.tar"
ssh <node> "doas su -l -s /bin/sh mcp -c 'XDG_RUNTIME_DIR=/run/user/<uid> podman load -i /tmp/<service>.tar'"

# 4. Start containers under mcp (with new naming convention)
ssh <node> "doas su -l -s /bin/sh mcp -c 'XDG_RUNTIME_DIR=/run/user/<uid> podman run -d \
  --name <service>-<component> \
  --network mcpnet \
  --restart unless-stopped \
  --user 0:0 \
  -p <ports> \
  -v /srv/<service>:/srv/<service> \
  <image> <cmd>'"
```

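Repeating these commands for four services invites copy-paste mistakes. As a sketch, the helper below prints (but does not run) the ownership and image-load commands for each service in the dependency order given above, so they can be reviewed first. The function names are illustrative. It assumes the image tarballs are named `/tmp/<service>.tar` as in step 3, and it leaves out the stop, save, and run commands because they need container and image names that only the operator knows.

```bash
#!/bin/sh
# Services in the dependency order stated in this step.
SERVICES="metacrypt mc-proxy mcr mcns"

# as_mcp UID COMMAND
# Prints COMMAND wrapped so it runs as the mcp user with XDG_RUNTIME_DIR
# set, which rootless podman requires.
as_mcp() {
  printf "doas su -l -s /bin/sh mcp -c 'XDG_RUNTIME_DIR=/run/user/%s %s'\n" "$1" "$2"
}

# migration_plan UID
# Dry run: prints the chown and image-load commands per service.
migration_plan() {
  for svc in $SERVICES; do
    echo "# --- $svc ---"
    echo "doas chown -R mcp:mcp /srv/$svc"
    as_mcp "$1" "podman load -i /tmp/$svc.tar"
  done
}
```

On the node, `migration_plan "$(id -u mcp)"` prints the plan with the real UID filled in. Printing the commands without executing them is a deliberate choice here, since ownership changes on `/srv` are awkward to undo.
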
**Container naming convention:** `<service>-<component>` (e.g.,
`metacrypt-api`, `metacrypt-web`, `mc-proxy`).

**Network:** Services whose components need to communicate (metacrypt
api↔web, mcr api↔web) must be on the same podman network with DNS
enabled. Create with `podman network create mcpnet`.

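Running `podman network create` a second time fails because the network already exists, which is inconvenient when the procedure is re-run. The sketch below makes the step idempotent using `podman network exists`, which is available in recent podman releases. `ensure_network` is an illustrative name, and it must run as the `mcp` user with `XDG_RUNTIME_DIR` set, like the other podman commands in this step.

```bash
#!/bin/sh
# ensure_network [NAME]
# Creates the podman network only if it is not already present.
ensure_network() {
  name=${1:-mcpnet}
  if podman network exists "$name"; then
    echo "network $name already exists"
  else
    podman network create "$name"
  fi
}
```

Afterwards, `podman network inspect mcpnet` can be used to confirm that DNS is enabled on the network. The exact field name in that output varies between podman versions.
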
**Config updates:** If service configs reference container names for
inter-component communication (e.g., `vault_grpc = "metacrypt:9443"`),
update them to use the new names (e.g., `vault_grpc = "metacrypt-api:9443"`).

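The rename can be scripted. The following is a minimal sketch, assuming the old name appears as a quoted `"host:port"` value like the `vault_grpc` example. `rename_endpoint` is an illustrative helper, and the config file path is whatever the service actually uses.

```bash
#!/bin/sh
# rename_endpoint FILE OLD NEW
# Rewrites "OLD:port" to "NEW:port" in FILE. The pattern is anchored on
# the opening quote and the colon, so running it twice is harmless:
# "metacrypt-api:9443" no longer matches "metacrypt: .
# ASSUMPTION: names contain only letters, digits, and hyphens, so they
# are safe to embed in a sed expression.
rename_endpoint() {
  file=$1
  old=$2
  new=$3
  tmp=$(mktemp) || return 1
  if sed "s/\"$old:/\"$new:/g" "$file" > "$tmp"; then
    # Write back through the original file to keep its owner and mode.
    cat "$tmp" > "$file"
    rc=$?
  else
    rc=1
  fi
  rm -f "$tmp"
  return "$rc"
}
```

For the example above, the call would be `rename_endpoint <config-file> metacrypt metacrypt-api`. Restart the affected component afterwards so it picks up the new address.
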
**Unseal Metacrypt** after migration; it starts sealed.

## Step 8: Adopt Containers

```bash
mcp adopt metacrypt
mcp adopt mc-proxy
mcp adopt mcr
mcp adopt mcns
```

## Step 9: Export and Complete Service Definitions

```bash
mcp service export metacrypt
mcp service export mc-proxy
mcp service export mcr
mcp service export mcns
```

The exported files contain only the name and image. Edit each file to add
the full container spec: network, ports, volumes, user, restart, cmd.

Then sync to push the complete specs:

```bash
mcp sync
```

## Step 10: Verify

```bash
mcp status
```

All services should show `desired: running`, `observed: running`, no drift.

## Lessons Learned (from first deployment, 2026-03-26)

- **NixOS systemd sandbox**: `ProtectHome=true` blocks `/run/user`, which
  rootless podman needs. Use `ProtectHome=false`. `ProtectSystem=strict`
  also blocks it; use `full` instead.
- **PATH**: the agent's systemd unit needs `PATH=/run/current-system/sw/bin`
  to find podman.
- **XDG_RUNTIME_DIR**: must be set to `/run/user/<uid>` for rootless podman.
  Pin the UID in NixOS config to avoid drift.
- **Podman ps JSON**: the `Command` field is `[]string`, not `string`.
- **Container naming**: `mc-proxy` (service with hyphen) breaks naive split
  on `-`. The agent uses registry-aware splitting.
- **Token whitespace**: token files with trailing newlines cause gRPC header
  errors. The CLI trims whitespace.
- **MCR auth**: rootless podman under a new user can't pull from MCR without
  OCI token auth. Workaround: `podman save` + `podman load` to transfer
  images.

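
To illustrate the container naming point, the sketch below splits a container name by matching it against a list of known service names and preferring the longest match, rather than splitting on the first `-`. It is an illustration of the idea only, not the agent's actual implementation. The function name and the `-` placeholder for an empty component are assumptions.

```bash
#!/bin/sh
# split_name CONTAINER SERVICE...
# Prints "<service> <component>"; the component is "-" when the container
# name is exactly the service name. Returns 1 if no known service matches.
split_name() {
  name=$1
  shift
  best=""
  for svc in "$@"; do
    case "$name" in
      "$svc"|"$svc"-*)
        # Prefer the longest matching service name, so "mc-proxy" wins
        # over a hypothetical shorter service called "mc".
        if [ "${#svc}" -gt "${#best}" ]; then
          best=$svc
        fi ;;
    esac
  done
  [ -n "$best" ] || return 1
  component=${name#"$best"}     # strip the service prefix
  component=${component#-}      # strip the separating hyphen, if any
  printf '%s %s\n' "$best" "${component:--}"
}
```

With the four services from this procedure, `split_name metacrypt-api metacrypt mc-proxy mcr mcns` prints `metacrypt api`, while `split_name mc-proxy metacrypt mc-proxy mcr mcns` prints `mc-proxy -` instead of being mis-split into `mc` and `proxy`.
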