MCNS Runbook
Service Overview
MCNS is an authoritative DNS server for the Metacircular platform. It listens on port 53 (UDP+TCP) for DNS queries, port 8443 for the REST management API, and port 9443 for the gRPC management API. Zone and record data is stored in SQLite. All management operations require MCIAS authentication; DNS queries are unauthenticated.
Health Checks
CLI
mcns status --addr https://localhost:8443
With a custom CA certificate:
mcns status --addr https://localhost:8443 --ca-cert /srv/mcns/certs/ca.pem
Expected output: ok
REST
curl -k https://localhost:8443/v1/health
Expected: HTTP 200.
gRPC
Use the AdminService.Health RPC on port 9443. This method is public
(no auth required).
DNS
dig @localhost svc.mcp.metacircular.net SOA +short
A valid SOA response confirms the DNS listener and database are working.
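The CLI, REST, and DNS checks above can be combined into one pass. The following is a minimal sketch, not part of MCNS: `check_health`, `MCNS_ADDR`, and `MCNS_ZONE` are hypothetical names, and the gRPC check is left out because this runbook does not name a gRPC client.

```shell
#!/bin/sh
# Sketch: run the CLI, REST, and DNS health checks and report each result.
# MCNS_ADDR and MCNS_ZONE are hypothetical variables, not MCNS settings.
MCNS_ADDR="${MCNS_ADDR:-https://localhost:8443}"
MCNS_ZONE="${MCNS_ZONE:-svc.mcp.metacircular.net}"

check_health() {
    failed=0

    # CLI check: the runbook expects the output "ok".
    if [ "$(mcns status --addr "$MCNS_ADDR" 2>/dev/null)" = "ok" ]; then
        echo "cli: ok"
    else
        echo "cli: FAIL"
        failed=1
    fi

    # REST check: the runbook expects HTTP 200 from /v1/health.
    code=$(curl -k -s -o /dev/null -w '%{http_code}' "$MCNS_ADDR/v1/health")
    if [ "$code" = "200" ]; then
        echo "rest: ok"
    else
        echo "rest: FAIL (HTTP $code)"
        failed=1
    fi

    # DNS check: a non-empty SOA answer counts as success. Lines starting
    # with ';' are dig diagnostics (for example timeouts), not answers.
    if [ -n "$(dig @localhost "$MCNS_ZONE" SOA +short 2>/dev/null | grep -v '^;')" ]; then
        echo "dns: ok"
    else
        echo "dns: FAIL"
        failed=1
    fi

    return "$failed"
}
```

After sourcing the file, `check_health` prints one line per check and returns non-zero if any check failed, so it can gate a script step.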
Common Operations
Start the Service
- Verify config exists:
  ls /srv/mcns/mcns.toml
- Start the container:
  docker compose -f deploy/docker/docker-compose-rift.yml up -d
- Verify health:
  mcns status --addr https://localhost:8443
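The service may need a moment after the container starts before the health check passes. One way to handle that is to poll the documented `mcns status` command. This is a sketch only: `wait_healthy`, its arguments, and the default attempt count and delay are assumptions, not MCNS behaviour.

```shell
# Sketch: poll `mcns status` until it prints "ok" or the attempts run out.
# wait_healthy and its defaults (10 attempts, 2s apart) are hypothetical.
wait_healthy() {
    addr="${1:-https://localhost:8443}"
    attempts="${2:-10}"
    delay="${3:-2}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        if [ "$(mcns status --addr "$addr" 2>/dev/null)" = "ok" ]; then
            echo "healthy after $i attempt(s)"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "not healthy after $attempts attempt(s)" >&2
    return 1
}
```

For example, `wait_healthy https://localhost:8443 15 2` waits up to roughly 30 seconds before giving up.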
Stop the Service
- Stop the container:
  docker compose -f deploy/docker/docker-compose-rift.yml stop mcns
- MCNS handles SIGTERM gracefully and drains in-flight requests (30s timeout).
Restart the Service
- Restart the container:
  docker compose -f deploy/docker/docker-compose-rift.yml restart mcns
- Verify health:
  mcns status --addr https://localhost:8443
Backup (Snapshot)
- Run the snapshot command:
  mcns snapshot --config /srv/mcns/mcns.toml
- The snapshot is saved to /srv/mcns/backups/mcns-YYYYMMDD-HHMMSS.db.
- Verify the snapshot file exists and has a reasonable size:
  ls -lh /srv/mcns/backups/
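The final verification step can be scripted by picking the newest snapshot and rejecting empty files. A minimal sketch follows. `latest_snapshot` is a hypothetical helper, and it treats "reasonable size" as simply non-empty, which is a weaker check than inspecting the size by eye.

```shell
# Sketch: print the newest non-empty snapshot in a backup directory.
# latest_snapshot is a hypothetical helper, not an mcns subcommand.
latest_snapshot() {
    dir="${1:-/srv/mcns/backups}"
    newest=""
    # The YYYYMMDD-HHMMSS suffix sorts chronologically, so the glob's
    # lexical order leaves the newest snapshot for last.
    for f in "$dir"/mcns-*.db; do
        if [ -s "$f" ]; then
            newest="$f"
        fi
    done
    if [ -z "$newest" ]; then
        return 1
    fi
    echo "$newest"
}
```

`latest_snapshot` with no argument reads `/srv/mcns/backups` and returns non-zero when no usable snapshot exists.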
Restore from Snapshot
- Stop the service (see above).
- Back up the current database:
  cp /srv/mcns/mcns.db /srv/mcns/mcns.db.pre-restore
- Copy the snapshot into place:
  cp /srv/mcns/backups/mcns-YYYYMMDD-HHMMSS.db /srv/mcns/mcns.db
- Start the service (see above).
- Verify the service is healthy:
  mcns status --addr https://localhost:8443
- Verify zones are accessible by querying DNS:
  dig @localhost svc.mcp.metacircular.net SOA +short
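The two copy steps in the middle of the restore are the ones most worth guarding against typos. The sketch below wraps them. `restore_snapshot` and its argument order are hypothetical, and it assumes the service has already been stopped.

```shell
# Sketch: swap a snapshot into place, keeping a copy of the current database.
# restore_snapshot and its argument order are hypothetical. The service
# must already be stopped before this runs.
restore_snapshot() {
    snapshot="${1:-}"
    db="${2:-/srv/mcns/mcns.db}"

    if [ ! -s "$snapshot" ]; then
        echo "snapshot missing or empty: $snapshot" >&2
        return 1
    fi
    # Keep the current database so the restore can be undone.
    if [ -f "$db" ]; then
        cp "$db" "$db.pre-restore" || return 1
    fi
    cp "$snapshot" "$db"
}
```

Usage: `restore_snapshot /srv/mcns/backups/mcns-YYYYMMDD-HHMMSS.db`, then start the service and run the health and DNS checks as listed.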
Log Inspection
Container logs:
docker compose -f deploy/docker/docker-compose-rift.yml logs --tail 100 mcns
Follow logs in real time:
docker compose -f deploy/docker/docker-compose-rift.yml logs -f mcns
MCNS logs to stderr as structured text (slog). Log level is configured
via [log] level in mcns.toml (debug, info, warn, error).
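When scanning logs during an incident, it can help to drop everything below warning level. The filter below is a sketch that assumes MCNS uses Go's default slog text handler, which writes the level as `level=WARN` or `level=ERROR`. `filter_problems` is a hypothetical name.

```shell
# Sketch: keep only WARN and ERROR lines from slog text output on stdin.
# Assumes the default slog text format (key=value pairs, upper-case level).
filter_problems() {
    # `|| true` keeps the exit status zero when nothing matches.
    grep -E 'level=(WARN|ERROR)' || true
}
```

For example: `docker compose -f deploy/docker/docker-compose-rift.yml logs --tail 100 mcns 2>&1 | filter_problems`.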
Incident Procedures
Database Corruption
Symptoms: server fails to start with SQLite errors, or queries return unexpected errors.
- Stop the service.
- Check for WAL/SHM files alongside the database:
  ls -la /srv/mcns/mcns.db*
- Attempt an integrity check:
  sqlite3 /srv/mcns/mcns.db "PRAGMA integrity_check;"
- If integrity check fails, restore from the most recent snapshot:
  cp /srv/mcns/mcns.db /srv/mcns/mcns.db.corrupt
  cp /srv/mcns/backups/mcns-YYYYMMDD-HHMMSS.db /srv/mcns/mcns.db
- Start the service and verify health.
- Re-create any records added after the snapshot was taken.
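The integrity check prints the single line `ok` when SQLite finds no problems, which makes it easy to turn into a pass/fail test. A minimal sketch, where `db_is_intact` is a hypothetical helper name:

```shell
# Sketch: succeed only if SQLite reports the database as intact.
# db_is_intact is a hypothetical helper; run it with the service stopped.
db_is_intact() {
    db="${1:-/srv/mcns/mcns.db}"
    # PRAGMA integrity_check prints exactly "ok" when no problems are found;
    # anything else (or a sqlite3 failure) is treated as corruption.
    result=$(sqlite3 "$db" "PRAGMA integrity_check;" 2>&1) || return 1
    [ "$result" = "ok" ]
}
```

`db_is_intact || echo "restore from snapshot"` gives a quick decision on whether to continue with the restore step.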
Certificate Expiry
Symptoms: health check fails with TLS errors, API clients get certificate errors.
- Check certificate expiry:
  openssl x509 -in /srv/mcns/certs/cert.pem -noout -enddate
- Replace the certificate and key files at the paths in mcns.toml.
- Restart the service to load the new certificate.
- Verify health:
  mcns status --addr https://localhost:8443
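Expiry can also be caught before it causes an outage. The sketch below uses the `-checkend` option of `openssl x509`, which exits non-zero if the certificate expires within the given number of seconds. `cert_valid_for_days` and the 14-day default are assumptions for illustration.

```shell
# Sketch: succeed if the certificate is still valid N days from now.
# cert_valid_for_days and the 14-day default are hypothetical.
cert_valid_for_days() {
    cert="${1:-/srv/mcns/certs/cert.pem}"
    days="${2:-14}"
    # -checkend takes seconds and exits 0 if the certificate will not
    # have expired by then.
    openssl x509 -in "$cert" -noout -checkend "$((days * 86400))" >/dev/null
}
```

For example, `cert_valid_for_days /srv/mcns/certs/cert.pem 30 || echo "renew soon"` could run from a periodic job.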
MCIAS Outage
Symptoms: management API returns 502 or authentication errors. DNS continues to work normally (DNS has no auth dependency).
- Confirm MCIAS is unreachable:
  curl -k https://svc.metacircular.net:8443/v1/health
- DNS resolution is unaffected -- no immediate action needed for DNS.
- Management operations (zone/record create/update/delete) will fail until MCIAS recovers.
- Escalate to MCIAS (see Escalation below).
DNS Not Resolving
Symptoms: dig @<server> <name> returns SERVFAIL or times out.
- Verify the service is running:
  docker compose -f deploy/docker/docker-compose-rift.yml ps mcns
- Check that port 53 is listening:
  ss -ulnp | grep ':53'
  ss -tlnp | grep ':53'
- Test an authoritative query:
  dig @localhost svc.mcp.metacircular.net SOA
- Test a forwarded query:
  dig @localhost example.com A
- If authoritative queries fail but forwarding works, the database may be corrupt (see Database Corruption above).
- If forwarding fails, check upstream connectivity:
  dig @1.1.1.1 example.com A
- Check logs for errors:
  docker compose -f deploy/docker/docker-compose-rift.yml logs --tail 50 mcns
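The query steps above form a small decision tree, which can be sketched as a script. `dns_answers` and `triage_dns` are hypothetical helpers, the 2-second timeout is an arbitrary choice, and the messages only restate the guidance in this section.

```shell
# Sketch: walk the authoritative / forwarded / upstream decision tree.
# dns_answers and triage_dns are hypothetical helpers.

# Succeeds if dig returns at least one answer line (lines starting with
# ';' are diagnostics such as timeouts, not answers).
dns_answers() {
    [ -n "$(dig "$@" +short +time=2 +tries=1 2>/dev/null | grep -v '^;')" ]
}

triage_dns() {
    auth=fail
    fwd=fail
    if dns_answers @localhost svc.mcp.metacircular.net SOA; then auth=ok; fi
    if dns_answers @localhost example.com A; then fwd=ok; fi

    if [ "$auth" = ok ] && [ "$fwd" = ok ]; then
        echo "authoritative and forwarded queries both answer"
        return 0
    fi
    if [ "$fwd" = ok ]; then
        echo "authoritative fails, forwarding works: suspect the database"
    elif dns_answers @1.1.1.1 example.com A; then
        echo "forwarding fails, upstream reachable: check MCNS logs"
    else
        echo "forwarding fails, upstream unreachable: check network connectivity"
    fi
    return 1
}
```

Running `triage_dns` on the host prints one line naming the next thing to inspect and returns non-zero when anything failed.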
Port 53 Already in Use
Symptoms: MCNS fails to start with "address already in use" on port 53.
- Identify what is using the port:
  ss -ulnp | grep ':53'
  ss -tlnp | grep ':53'
- Common culprit: systemd-resolved listening on 127.0.0.53:53.
  - If on a system with systemd-resolved, either disable it or bind MCNS to a specific IP instead of 0.0.0.0:53.
- If another DNS server is running, stop it or change the MCNS [dns] listen_addr in mcns.toml to a different address.
- Restart MCNS and verify DNS is responding.
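The first two steps can be scripted by classifying the `ss` output. The sketch below reads that output on stdin. `port53_holder` is a hypothetical helper, and it assumes `ss` shows the process name as `systemd-resolve`, since the kernel truncates process names to 15 characters.

```shell
# Sketch: classify who is listening on port 53, given `ss -ulnp` or
# `ss -tlnp` output on stdin. port53_holder is a hypothetical helper.
port53_holder() {
    # Match ":53" followed by whitespace so ports like 5353 are ignored.
    listeners=$(grep -E ':53[[:space:]]' || true)
    if [ -z "$listeners" ]; then
        echo "free"
    elif printf '%s\n' "$listeners" | grep -q 'systemd-resolve'; then
        echo "systemd-resolved"
    else
        echo "other"
    fi
}
```

For example, `ss -ulnp | port53_holder` prints `free`, `systemd-resolved`, or `other`. Run it with sufficient privileges, because `ss -p` only shows process names for sockets the caller is allowed to inspect.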
Deployment with MCP
MCNS runs on rift as a single container managed by MCP. The service
definition lives at ~/.config/mcp/services/mcns.toml on the operator's
machine. A reference copy is maintained at deploy/mcns-rift.toml in
this repository.
The container image is pulled from MCR. The container mounts /srv/mcns
and runs as --user 0:0. DNS listens on port 53 (UDP+TCP) on both
192.168.88.181 and 100.95.252.120, with the management API on 8443/9443.
Note: the operator's ~/.config/mcp/services/mcns.toml may still
reference the old CoreDNS image and needs updating to the new MCNS image.
Key Operations
- Deploy or update: mcp deploy mcns
- Restart: mcp restart mcns
- Stop: mcp stop mcns (WARNING: stops DNS for all internal zones)
- Check status: mcp ps or mcp status mcns
- View logs:
  ssh rift 'doas su - mcp -s /bin/sh -c "podman logs mcns"'
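Because stopping MCNS takes down DNS for all internal zones, an operator may want a guard against running the stop command by accident. The wrapper below is a sketch only: `stop_mcns` and its `--yes` flag are hypothetical and are not part of the `mcp` CLI.

```shell
# Sketch: require explicit confirmation before stopping MCNS.
# stop_mcns and --yes are hypothetical; only `mcp stop mcns` is real.
stop_mcns() {
    if [ "${1:-}" != "--yes" ]; then
        echo "refusing: 'mcp stop mcns' stops DNS for all internal zones; pass --yes to proceed" >&2
        return 1
    fi
    mcp stop mcns
}
```

`stop_mcns` on its own refuses and explains why, while `stop_mcns --yes` runs the documented command unchanged.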
Escalation
Escalate when:
- Database corruption cannot be resolved by restoring a snapshot.
- MCIAS is down and management operations are urgently needed.
- DNS resolution failures persist after following the procedures above.
- Any issue not covered by this runbook.
Escalation path: Kyle (platform owner).