P5.2 + P5.3: Bootstrap docs, README, and RUNBOOK

- docs/bootstrap.md: step-by-step bootstrap procedure with lessons learned from the first deployment (NixOS sandbox issues, podman rootless setup, container naming, MCR auth workaround)
- README.md: quick-start guide, command reference, doc links
- RUNBOOK.md: operational procedures for operators (health checks, common operations, unsealing metacrypt, cert renewal, incident response, disaster recovery, file locations)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

```diff
@@ -47,8 +47,8 @@
 ## Phase 5: Integration and Polish
 
 - [ ] **P5.1** Integration test suite
-- [ ] **P5.2** Bootstrap procedure test
-- [x] **P5.3** Documentation — CLAUDE.md done; README.md and RUNBOOK.md pending
+- [x] **P5.2** Bootstrap procedure — documented in `docs/bootstrap.md`
+- [x] **P5.3** Documentation — CLAUDE.md, README.md, RUNBOOK.md
 
 ## Phase 6: Deployment (completed 2026-03-26)
 
```

README.md (new file, 119 lines)

@@ -0,0 +1,119 @@

# MCP — Metacircular Control Plane

MCP is the orchestrator for the [Metacircular](https://metacircular.net)
platform. It manages container lifecycle, tracks what services run where,
and transfers files between the operator's workstation and managed nodes.

## Architecture

**CLI** (`mcp`) — thin client on the operator's workstation. Reads local
service definition files, pushes intent to agents, queries status.

**Agent** (`mcp-agent`) — per-node daemon. Manages containers via rootless
podman, stores a SQLite registry of desired/observed state, monitors for
drift, and alerts the operator.

## Quick Start

### Build

```bash
make all        # vet, lint, test, build
make mcp        # CLI only
make mcp-agent  # agent only
```

### Install the CLI

```bash
cp mcp ~/.local/bin/
mkdir -p ~/.config/mcp/services
```

Create `~/.config/mcp/mcp.toml`:

```toml
[services]
dir = "/home/<user>/.config/mcp/services"

[mcias]
server_url = "https://mcias.metacircular.net:8443"
service_name = "mcp"

[auth]
token_path = "/home/<user>/.config/mcp/token"

[[nodes]]
name = "rift"
address = "100.95.252.120:9444"
```
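
TOML has no `~` or environment-variable expansion, so the paths in `mcp.toml` have to be spelled out in full. A minimal sketch that writes the file above with your `$HOME` substituted, assuming the single-node layout shown (`write_mcp_config` is an illustrative helper, not part of MCP; its defaults are the example node values):

```bash
#!/usr/bin/env bash
# Illustrative helper (not shipped with MCP): writes ~/.config/mcp/mcp.toml
# with $HOME expanded, because TOML itself does not expand ~ or variables.
write_mcp_config() {
  local node_name=${1:-rift}
  local node_addr=${2:-100.95.252.120:9444}
  local cfg_dir="$HOME/.config/mcp"

  mkdir -p "$cfg_dir/services"
  # Unquoted heredoc delimiter: the shell substitutes $cfg_dir and the
  # node variables before the text is written to the file.
  cat > "$cfg_dir/mcp.toml" <<EOF
[services]
dir = "$cfg_dir/services"

[mcias]
server_url = "https://mcias.metacircular.net:8443"
service_name = "mcp"

[auth]
token_path = "$cfg_dir/token"

[[nodes]]
name = "$node_name"
address = "$node_addr"
EOF
}
```

Run `write_mcp_config` for the defaults, or `write_mcp_config <name> <address:port>` for another node. It overwrites an existing `mcp.toml`.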

### Authenticate

```bash
mcp login
```

### Check status

```bash
mcp status   # full picture: services, drift, events
mcp ps       # live container check with uptime
mcp list     # quick registry query
```

### Deploy a service

Write a service definition in `~/.config/mcp/services/<name>.toml`:

```toml
name = "myservice"
node = "rift"
active = true

[[components]]
name = "api"
image = "mcr.svc.mcp.metacircular.net:8443/myservice:v1.0.0"
network = "mcpnet"
user = "0:0"
restart = "unless-stopped"
ports = ["127.0.0.1:8443:8443"]
volumes = ["/srv/myservice:/srv/myservice"]
cmd = ["server", "--config", "/srv/myservice/myservice.toml"]
```

Then deploy:

```bash
mcp deploy myservice
```

## Commands

| Command | Description |
|---------|-------------|
| `mcp login` | Authenticate to MCIAS |
| `mcp deploy <service>[/<component>]` | Deploy from service definition |
| `mcp stop <service>` | Stop all components |
| `mcp start <service>` | Start all components |
| `mcp restart <service>` | Restart all components |
| `mcp list` | List services (registry) |
| `mcp ps` | Live container check |
| `mcp status [service]` | Full status with drift and events |
| `mcp sync` | Push all service definitions |
| `mcp adopt <service>` | Adopt running containers |
| `mcp service show <service>` | Print spec from agent |
| `mcp service edit <service>` | Edit definition in $EDITOR |
| `mcp service export <service>` | Export agent spec to file |
| `mcp push <file> <service> [path]` | Push file to node |
| `mcp pull <service> <path> [file]` | Pull file from node |
| `mcp node list` | List nodes |
| `mcp node add <name> <addr>` | Add a node |
| `mcp node remove <name>` | Remove a node |

## Documentation

- [ARCHITECTURE.md](ARCHITECTURE.md) — design specification
- [RUNBOOK.md](RUNBOOK.md) — operational procedures
- [PROJECT_PLAN_V1.md](PROJECT_PLAN_V1.md) — implementation plan
- [PROGRESS_V1.md](PROGRESS_V1.md) — progress and remaining work

RUNBOOK.md (new file, 305 lines)

@@ -0,0 +1,305 @@

# MCP Runbook

Operational procedures for the Metacircular Control Plane. Written for
operators at 3 AM.

## Service Overview

MCP manages container lifecycle on Metacircular nodes. Two components:

- **mcp-agent** — systemd service on each node (rift). Manages containers
  via rootless podman, stores registry in SQLite, monitors for drift.
- **mcp** — CLI on the operator's workstation (vade). Pushes desired state,
  queries status.

## Health Checks

### Quick status

```bash
mcp status
```

Shows all services, desired vs observed state, drift, and recent events.
No drift = healthy.

### Agent process

```bash
ssh rift "doas systemctl status mcp-agent"
ssh rift "doas journalctl -u mcp-agent --since '10 min ago' --no-pager"
```

### Individual service

```bash
mcp status metacrypt
```

## Common Operations

### Check what's running

```bash
mcp ps       # live check with uptime
mcp list     # from registry (no runtime query)
mcp status   # full picture with drift and events
```

### Restart a service

```bash
mcp restart metacrypt
```

Restarts all components. Does not change the `active` flag. Metacrypt
will need to be unsealed after restart.

### Stop a service

```bash
mcp stop metacrypt
```

Sets `active = false` in the service definition file and stops all
containers. The agent will not restart them.

### Start a stopped service

```bash
mcp start metacrypt
```

Sets `active = true` and starts all containers.

### Deploy an update

Edit the service definition to update the image tag, then deploy:

```bash
mcp service edit metacrypt   # opens in $EDITOR
mcp deploy metacrypt         # deploys all components
mcp deploy metacrypt/web     # deploy just the web component
```

### Push a config file to a node

```bash
mcp push metacrypt.toml metacrypt            # → /srv/metacrypt/metacrypt.toml
mcp push cert.pem metacrypt certs/cert.pem   # → /srv/metacrypt/certs/cert.pem
```

### Pull a file from a node

```bash
mcp pull metacrypt metacrypt.toml ./local-copy.toml
```

### Sync desired state

Push all service definitions to the agent without deploying:

```bash
mcp sync
```

### View service definition

```bash
mcp service show metacrypt                  # from agent registry
cat ~/.config/mcp/services/metacrypt.toml   # local file
```

### Export service definition from agent

```bash
mcp service export metacrypt
```

Writes the agent's current spec to the local service definition file.

## Unsealing Metacrypt

Metacrypt starts sealed after any restart. Unseal via the API:

```bash
curl -sk -X POST https://metacrypt.svc.mcp.metacircular.net:8443/v1/unseal \
  -H "Content-Type: application/json" \
  -d '{"password":"<unseal-password>"}'
```

Or via the web UI at `https://metacrypt.svc.mcp.metacircular.net`.
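
Typing the password into the `curl` command above leaves it in shell history. A minimal sketch that prompts for it instead and sends the body on stdin, reusing the endpoint and JSON shape from that example (`unseal_metacrypt` and `json_escape` are illustrative names; the escaping handles only backslashes and double quotes, which assumes the password contains no control characters):

```bash
#!/usr/bin/env bash
# Sketch: unseal Metacrypt without putting the password on the command
# line or in shell history. Endpoint and body match the curl example.

# json_escape: escape backslashes, then double quotes, for a JSON string.
# Control characters are NOT handled (assumption about the password).
json_escape() {
  local s=$1 bs='\' dq='"'
  s=${s//"$bs"/"$bs$bs"}   # \  -> \\
  s=${s//"$dq"/"$bs$dq"}   # "  -> \"
  printf '%s' "$s"
}

unseal_metacrypt() {
  local url=${1:-https://metacrypt.svc.mcp.metacircular.net:8443/v1/unseal}
  local pw
  read -rs -p "Unseal password: " pw   # -s: no echo; -r: keep backslashes
  echo >&2                             # end the prompt line
  # --data @- makes curl read the request body from stdin, so the
  # password does not appear in the process list either.
  printf '{"password":"%s"}' "$(json_escape "$pw")" |
    curl -sk -X POST "$url" -H "Content-Type: application/json" --data @-
}
```

Call `unseal_metacrypt` and type the password at the prompt; pass a different URL as the first argument if the API is reached another way.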

**Important:** Restarting metacrypt-api requires unsealing. To avoid this
when updating just the UI, deploy only the web component:

```bash
mcp deploy metacrypt/web
```

## Agent Management

### Restart the agent

```bash
ssh rift "doas systemctl restart mcp-agent"
```
|
Containers keep running while the agent restarts: they do not depend on
the agent process, and podman's restart policy keeps them up.

### View agent logs

```bash
ssh rift "doas journalctl -u mcp-agent -f"              # follow
ssh rift "doas journalctl -u mcp-agent --since today"   # today's logs
```

### Agent database backup

```bash
ssh rift "doas -u mcp /usr/local/bin/mcp-agent snapshot --config /srv/mcp/mcp-agent.toml"
```

Backups go to `/srv/mcp/backups/`.

### Update the agent binary

```bash
# On vade, in the mcp repo:
make clean && make mcp-agent
scp mcp-agent rift:/tmp/
ssh rift "doas systemctl stop mcp-agent && \
  doas cp /tmp/mcp-agent /usr/local/bin/mcp-agent && \
  doas systemctl start mcp-agent"
```

### Update the CLI binary

```bash
make clean && make mcp
cp mcp ~/.local/bin/
```

## Node Management

### List nodes

```bash
mcp node list
```

### Add a node

```bash
mcp node add <name> <address:port>
```

### Remove a node

```bash
mcp node remove <name>
```

## TLS Certificate Renewal

The agent's TLS cert is at `/srv/mcp/certs/cert.pem`. Check expiry:
|
```bash
# The cert is owned by mcp with mode 600 (see docs/bootstrap.md), so read it via doas.
ssh rift "doas openssl x509 -in /srv/mcp/certs/cert.pem -noout -enddate"
```
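
For scripted checks, `openssl x509 -checkend <seconds>` reports expiry through its exit status instead of printing a date to read. A minimal sketch (the function name and the 14-day default are illustrative choices):

```bash
#!/usr/bin/env bash
# Sketch: succeed (exit 0) if a certificate expires within the given
# number of days. `openssl x509 -checkend N` exits 0 when the cert is
# still valid N seconds from now, and 1 otherwise.
cert_needs_renewal() {
  local cert=$1 days=${2:-14}
  if openssl x509 -in "$cert" -noout -checkend $((days * 86400)) >/dev/null; then
    return 1   # valid beyond the window: no renewal needed
  fi
  return 0     # expires within the window (or the cert could not be read)
}
```

The same check works over ssh because the remote exit status is passed back: `ssh rift "doas openssl x509 -in /srv/mcp/certs/cert.pem -noout -checkend 1209600" || echo "agent cert expires within 14 days"`.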

To renew (requires a Metacrypt token):

```bash
export METACRYPT_TOKEN="<token>"
# The outer double quotes make the local shell expand $METACRYPT_TOKEN
# (and turn \" into ") before ssh sends the command to rift.
ssh rift "curl -sk -X POST https://127.0.0.1:18443/v1/engine/request \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer $METACRYPT_TOKEN' \
  -d '{
    \"mount\": \"pki\",
    \"operation\": \"issue\",
    \"path\": \"web\",
    \"data\": {
      \"issuer\": \"web\",
      \"common_name\": \"mcp-agent.svc.mcp.metacircular.net\",
      \"profile\": \"server\",
      \"dns_names\": [\"mcp-agent.svc.mcp.metacircular.net\"],
      \"ip_addresses\": [\"100.95.252.120\", \"192.168.88.181\"],
      \"ttl\": \"2160h\"
    }
  }'" > /tmp/cert-response.json

# Extract and install cert+key from the JSON response, then:
ssh rift "doas systemctl restart mcp-agent"
```

## Incident Procedures

### Service not running (drift detected)

1. `mcp status` — identify which service/component drifted.
2. Check agent logs: `ssh rift "doas journalctl -u mcp-agent --since '10 min ago'"`
3. Check container logs: `ssh rift "doas -u mcp podman logs <container-name>"`
4. Restart: `mcp restart <service>`
5. If metacrypt: unseal after restart.

### Agent unreachable

1. Check if the agent process is running: `ssh rift "doas systemctl status mcp-agent"`
2. If stopped: `ssh rift "doas systemctl start mcp-agent"`
3. Check logs for crash reason: `ssh rift "doas journalctl -u mcp-agent -n 50"`
4. Containers keep running independently — podman's restart policy handles them.
|
### Token expired

The MCP CLI shows `UNAUTHENTICATED` or `PERMISSION_DENIED`:

1. Check the token: the mcp-agent service account token is at `~/.config/mcp/token`.
2. Validate: `curl -sk -X POST -H "Authorization: Bearer $(cat ~/.config/mcp/token)" https://mcias.metacircular.net:8443/v1/token/validate`
3. If expired: generate a new service account token from the MCIAS admin
   dashboard and save it to `~/.config/mcp/token`.

### Database corruption

The agent's SQLite database is at `/srv/mcp/mcp.db`:

1. Stop the agent: `ssh rift "doas systemctl stop mcp-agent"`
2. Restore from backup: `ssh rift "doas -u mcp cp /srv/mcp/backups/<latest>.db /srv/mcp/mcp.db"`
3. Start the agent: `ssh rift "doas systemctl start mcp-agent"`
4. Run `mcp sync` to re-push desired state.
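
Step 2 leaves `<latest>` to the operator. A minimal sketch that picks the most recently modified snapshot, assuming only that snapshots end in `.db` as that path suggests (`latest_backup` is an illustrative name):

```bash
#!/usr/bin/env bash
# Sketch: print the newest *.db file in a backup directory, judged by
# modification time. Returns 1 if the directory holds no snapshots.
latest_backup() {
  local dir=${1:-/srv/mcp/backups} newest="" f
  for f in "$dir"/*.db; do
    [ -e "$f" ] || continue            # glob matched nothing
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest=$f
    fi
  done
  [ -n "$newest" ] || return 1         # no backups found
  printf '%s\n' "$newest"
}
```

Run it on rift as a user that can read the backup directory (for example inside `doas -u mcp bash`); `cp "$(latest_backup)" /srv/mcp/mcp.db` then covers step 2.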

If no backup exists, delete the database and rebuild the registry from the
local service definitions. With the agent still stopped (step 1 above):

1. `ssh rift "doas -u mcp rm /srv/mcp/mcp.db"`
2. `ssh rift "doas systemctl start mcp-agent"` (creates a fresh database)
3. `mcp sync` (pushes all service definitions)
|
### Disaster recovery (rift lost)

1. Provision a new machine and connect it to the overlay network.
2. Apply the NixOS config (`configs/mcp.nix` creates the mcp user, rootless
   podman, and the mcp-agent systemd service).
3. Install the mcp-agent binary.
4. Restore `/srv/` from backups (each service's backup timer creates daily snapshots).
5. Provision a TLS cert from Metacrypt.
6. Start the agent: `doas systemctl start mcp-agent`
7. Run `mcp sync` from vade to push service definitions.
8. Unseal Metacrypt.
|
## File Locations

### On rift (agent)

| Path | Purpose |
|------|---------|
| `/srv/mcp/mcp-agent.toml` | Agent config |
| `/srv/mcp/mcp.db` | Registry database |
| `/srv/mcp/certs/` | Agent TLS cert and key |
| `/srv/mcp/backups/` | Database snapshots |
| `/srv/<service>/` | Service data directories |

### On vade (CLI)

| Path | Purpose |
|------|---------|
| `~/.config/mcp/mcp.toml` | CLI config |
| `~/.config/mcp/token` | MCIAS bearer token |
| `~/.config/mcp/services/` | Service definition files |

docs/bootstrap.md (new file, 198 lines)

@@ -0,0 +1,198 @@

# MCP Bootstrap Procedure

How to bring MCP up on a node for the first time, including migrating
existing containers from another user's podman instance.

## Prerequisites

- NixOS configuration applied with `configs/mcp.nix` (creates `mcp` user
  with rootless podman, subuid/subgid, systemd service)
- MCIAS system account with `admin` role (for token validation and cert
  provisioning)
- Metacrypt running (for TLS certificate issuance)

## Step 1: Provision TLS Certificate

Issue a cert from Metacrypt with DNS and IP SANs:

```bash
export METACRYPT_TOKEN="<admin-token>"

# From a machine that can reach Metacrypt (e.g., via loopback on rift):
curl -sk -X POST https://127.0.0.1:18443/v1/engine/request \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $METACRYPT_TOKEN" \
  -d '{
    "mount": "pki",
    "operation": "issue",
    "path": "web",
    "data": {
      "issuer": "web",
      "common_name": "mcp-agent.svc.mcp.metacircular.net",
      "profile": "server",
      "dns_names": ["mcp-agent.svc.mcp.metacircular.net"],
      "ip_addresses": ["<tailscale-ip>", "<lan-ip>"],
      "ttl": "2160h"
    }
  }' > cert-response.json

# Extract cert and key from the JSON response and install:
doas cp cert.pem /srv/mcp/certs/cert.pem
doas cp key.pem /srv/mcp/certs/key.pem
doas chown mcp:mcp /srv/mcp/certs/cert.pem /srv/mcp/certs/key.pem
doas chmod 600 /srv/mcp/certs/cert.pem /srv/mcp/certs/key.pem
```

## Step 2: Add DNS Record

Add an A record for `mcp-agent.svc.mcp.metacircular.net` pointing to the
node's IP in the MCNS zone file, bump the serial, restart CoreDNS.
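
If the MCNS zone uses the common `YYYYMMDDNN` serial format (an assumption; check the existing serial in the zone file first), a minimal sketch of the next-serial calculation looks like this (`next_serial` is an illustrative name):

```bash
#!/usr/bin/env bash
# Sketch: next SOA serial under the YYYYMMDDNN convention (assumed, not
# confirmed for the MCNS zone). Same-day edits bump the two-digit counter;
# the first edit on a new day starts at <today>00. Serials never decrease.
next_serial() {
  local current=$1
  local today=${2:-$(date +%Y%m%d)}
  local base=$((10#${today}00))   # e.g. 20260326 -> 2026032600
  if (( 10#$current >= base )); then
    echo $((10#$current + 1))
  else
    echo "$base"
  fi
}
```

For example, `next_serial 2026032501 20260326` prints `2026032600`, and a second edit the same day (`next_serial 2026032600 20260326`) prints `2026032601`.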

## Step 3: Write Agent Config

Create `/srv/mcp/mcp-agent.toml`:

```toml
[server]
grpc_addr = "<tailscale-ip>:9444"
tls_cert = "/srv/mcp/certs/cert.pem"
tls_key = "/srv/mcp/certs/key.pem"

[database]
path = "/srv/mcp/mcp.db"

[mcias]
server_url = "https://mcias.metacircular.net:8443"
service_name = "mcp-agent"

[agent]
node_name = "<node-name>"
container_runtime = "podman"

[monitor]
interval = "60s"
alert_command = []
cooldown = "15m"
flap_threshold = 3
flap_window = "10m"
retention = "30d"

[log]
level = "info"
```

## Step 4: Install Agent Binary

```bash
scp mcp-agent <node>:/tmp/
ssh <node> "doas cp /tmp/mcp-agent /usr/local/bin/mcp-agent"
```

## Step 5: Start the Agent

```bash
ssh <node> "doas systemctl start mcp-agent"
ssh <node> "doas systemctl status mcp-agent"
```

## Step 6: Configure CLI

On the operator's workstation, create `~/.config/mcp/mcp.toml` and save
the MCIAS admin service account token to `~/.config/mcp/token`.
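
The token is a bearer credential, so it is worth keeping the file private. A minimal sketch for this step, assuming the token is pasted at a prompt: it does not echo the secret, creates the file with mode 600, and writes no trailing newline (`save_mcp_token` is an illustrative name; the default path matches `token_path` in `mcp.toml`):

```bash
#!/usr/bin/env bash
# Sketch: save the MCIAS token for the CLI without echoing it and with
# owner-only permissions.
save_mcp_token() {
  local token_file=${1:-$HOME/.config/mcp/token}
  local token
  read -rs -p "MCIAS token: " token   # -s: no echo; -r: keep backslashes
  echo >&2                             # end the prompt line
  mkdir -p "$(dirname "$token_file")"
  # umask 077 in a subshell: a newly created file gets mode 600 without
  # changing the caller's umask. printf '%s' writes no trailing newline.
  ( umask 077 && printf '%s' "$token" > "$token_file" )
  chmod 600 "$token_file"              # also tighten a pre-existing file
}
```

Run `save_mcp_token` and paste the token when prompted.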

## Step 7: Migrate Containers (if existing)

If containers are running under another user (e.g., `kyle`), migrate them
to the `mcp` user's podman. Process each service in dependency order:

**Dependency order:** Metacrypt → MC-Proxy → MCR → MCNS

For each service:

```bash
# 1. Stop containers under the old user
ssh <node> "podman stop <container> && podman rm <container>"

# 2. Transfer ownership of data directory
ssh <node> "doas chown -R mcp:mcp /srv/<service>"

# 3. Transfer images to mcp's podman
ssh <node> "podman save <image> -o /tmp/<service>.tar"
ssh <node> "doas su -l -s /bin/sh mcp -c 'XDG_RUNTIME_DIR=/run/user/<uid> podman load -i /tmp/<service>.tar'"

# 4. Start containers under mcp (with new naming convention)
ssh <node> "doas su -l -s /bin/sh mcp -c 'XDG_RUNTIME_DIR=/run/user/<uid> podman run -d \
  --name <service>-<component> \
  --network mcpnet \
  --restart unless-stopped \
  --user 0:0 \
  -p <ports> \
  -v /srv/<service>:/srv/<service> \
  <image> <cmd>'"
```

**Container naming convention:** `<service>-<component>` (e.g.,
`metacrypt-api`, `metacrypt-web`, `mc-proxy`).
|
**Network:** Services whose components need to communicate (metacrypt
api↔web, mcr api↔web) must be on the same podman network with DNS
enabled. Rootless podman networks are per-user, so create it as the `mcp`
user: `doas su -l -s /bin/sh mcp -c 'XDG_RUNTIME_DIR=/run/user/<uid> podman network create mcpnet'`.
|
**Config updates:** If service configs reference container names for
inter-component communication (e.g., `vault_grpc = "metacrypt:9443"`),
update them to use the new names (e.g., `vault_grpc = "metacrypt-api:9443"`).

**Unseal Metacrypt** after migration — it starts sealed.

## Step 8: Adopt Containers

```bash
mcp adopt metacrypt
mcp adopt mc-proxy
mcp adopt mcr
mcp adopt mcns
```

## Step 9: Export and Complete Service Definitions

```bash
mcp service export metacrypt
mcp service export mc-proxy
mcp service export mcr
mcp service export mcns
```

The exported files will have name + image only. Edit each file to add the
full container spec: network, ports, volumes, user, restart, cmd.
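
Steps 8 and 9 run the same two commands for every migrated service. A minimal loop sketch that keeps the Step 7 dependency order in one place and stops at the first failure (`adopt_and_export` and `SERVICES` are illustrative names):

```bash
#!/usr/bin/env bash
# Sketch: adopt and export each migrated service in dependency order,
# returning non-zero at the first failing command.
adopt_and_export() {
  local svc
  for svc in "$@"; do
    mcp adopt "$svc" || return 1
    mcp service export "$svc" || return 1
  done
}

# Dependency order from Step 7.
SERVICES=(metacrypt mc-proxy mcr mcns)
```

Run `adopt_and_export "${SERVICES[@]}"`, then edit each exported file as described above before syncing.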

Then sync to push the complete specs:

```bash
mcp sync
```

## Step 10: Verify

```bash
mcp status
```

All services should show `desired: running`, `observed: running`, no drift.

## Lessons Learned (from first deployment, 2026-03-26)

- **NixOS systemd sandbox**: `ProtectHome=true` blocks `/run/user`, which
  rootless podman needs. Use `ProtectHome=false`. `ProtectSystem=strict`
  also blocks it; use `full` instead.
- **PATH**: the agent's systemd unit needs `PATH=/run/current-system/sw/bin`
  to find podman.
- **XDG_RUNTIME_DIR**: must be set to `/run/user/<uid>` for rootless podman.
  Pin the UID in NixOS config to avoid drift.
- **Podman ps JSON**: the `Command` field is `[]string`, not `string`.
- **Container naming**: `mc-proxy` (service with hyphen) breaks naive split
  on `-`. The agent uses registry-aware splitting.
- **Token whitespace**: token files with trailing newlines cause gRPC header
  errors. The CLI trims whitespace.
- **MCR auth**: rootless podman under a new user can't pull from MCR without
  OCI token auth. Workaround: `podman save` + `podman load` to transfer
  images.
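
The container-naming lesson is easier to see with a sketch of registry-aware splitting: rather than cutting at the first `-`, match the name against the known services and keep the longest match. This is an illustration in shell, not the agent's implementation. Printing the result in the CLI's `<service>[/<component>]` form, and treating a container named exactly like its service as having no component, are assumptions.

```bash
#!/usr/bin/env bash
# Sketch: split a container name into service/component using the list of
# known services. The longest service name that equals the container name,
# or prefixes it followed by "-", wins, so "mc-proxy" is not cut into
# "mc" + "proxy". Returns 1 for names that match no known service.
split_container_name() {
  local name=$1; shift
  local svc best=""
  for svc in "$@"; do
    if [[ $name == "$svc" || $name == "$svc"-* ]] && (( ${#svc} > ${#best} )); then
      best=$svc
    fi
  done
  [[ -n $best ]] || return 1
  local component=${name#"$best"}   # strip the service prefix
  component=${component#-}          # and the separating hyphen
  if [[ -n $component ]]; then
    printf '%s/%s\n' "$best" "$component"
  else
    printf '%s\n' "$best"
  fi
}
```

For example, `split_container_name metacrypt-api metacrypt mc-proxy mcr mcns` prints `metacrypt/api`, while `split_container_name mc-proxy metacrypt mc-proxy mcr mcns` prints `mc-proxy` instead of splitting it into `mc` and `proxy`.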