Kyle Isom dc1816b159 Add MCP deployment section to RUNBOOK.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 22:09:18 -07:00

# RUNBOOK.md
Operational procedures for mc-proxy. Written for operators, not developers.
## Service Overview
mc-proxy is a Layer 4 TLS SNI proxy. It routes incoming TLS connections to
backend services based on the SNI hostname. It does not terminate TLS or
inspect application-layer traffic. A global firewall blocks connections by
IP, CIDR, or GeoIP country before routing.
## Health Checks
### Via gRPC (requires admin API enabled)
```bash
mc-proxy status -c /srv/mc-proxy/mc-proxy.toml
```
Expected output:
```
mc-proxy v0.1.0
uptime: 4h32m10s
connections: 1247
:443 routes=2 active=12
:8443 routes=1 active=3
:9443 routes=1 active=0
```
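For monitoring, the per-listener `active=` counters can be summed into one number. This is a minimal sketch that assumes the status output keeps the `key=value` layout shown above; `total_active` is a hypothetical helper name, not part of mc-proxy.

```bash
# total_active: read `mc-proxy status` output on stdin and print the sum of
# all per-listener active=N counters. Lines without active=N are ignored.
total_active() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^active=[0-9]+$/) {
                split($i, kv, "=")
                sum += kv[2]
            }
    } END { print sum + 0 }'
}

# Usage:
#   mc-proxy status -c /srv/mc-proxy/mc-proxy.toml | total_active
```

Reading from stdin keeps the helper independent of how the status output was obtained, so it also works on saved status files.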
### Via systemd
```bash
systemctl status mc-proxy
journalctl -u mc-proxy -n 50 --no-pager
```
### Via process
```bash
ss -tlnp | grep mc-proxy
```
Verify all configured listener ports are in LISTEN state.
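The listener check can be scripted so that a missing port is reported explicitly. The sketch below parses `ss -tln` output from stdin; `missing_listeners` is a hypothetical helper, and the ports in the usage line are the ones from this document's example deployment.

```bash
# missing_listeners PORT...: read `ss -tln` (or `ss -tlnp`) output on stdin and
# print each PORT that has no LISTEN socket. Returns non-zero if any is missing.
missing_listeners() {
    local ss_output port missing=0
    ss_output=$(cat)
    for port in "$@"; do
        # Field 4 is "Local Address:Port"; anchor the match on ":PORT" at the
        # end so that 443 does not match 8443.
        if ! printf '%s\n' "$ss_output" |
            awk -v p="$port" '$1 == "LISTEN" && $4 ~ (":" p "$") { found = 1 }
                              END { exit !found }'; then
            echo "$port"
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   ss -tln | missing_listeners 443 8443 9443
```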
## Common Operations
### Start / Stop / Restart
```bash
systemctl start mc-proxy
systemctl stop mc-proxy
systemctl restart mc-proxy
```
Stopping the service triggers graceful shutdown: new connections are refused,
in-flight connections drain for up to `shutdown_timeout` (default 30s), then
remaining connections are force-closed.
### View Logs
```bash
# Recent logs
journalctl -u mc-proxy -n 100 --no-pager
# Follow live
journalctl -u mc-proxy -f
# Filter by severity
journalctl -u mc-proxy -p err
```
### Reload GeoIP Database
Send SIGHUP to reload the GeoIP database without restarting:
```bash
systemctl kill -s HUP mc-proxy
```
Or:
```bash
kill -HUP $(pidof mc-proxy)
```
Verify in logs:
```
level=INFO msg="received SIGHUP, reloading GeoIP database"
```
### Create a Database Backup
```bash
# Manual backup
mc-proxy snapshot -c /srv/mc-proxy/mc-proxy.toml
# Manual backup to a specific path
mc-proxy snapshot -c /srv/mc-proxy/mc-proxy.toml -o /tmp/mc-proxy-backup.db
```
Automated daily backups run via the systemd timer:
```bash
# Check timer status
systemctl list-timers mc-proxy-backup.timer
# Run backup manually via systemd
systemctl start mc-proxy-backup.service
# View backup logs
journalctl -u mc-proxy-backup.service -n 20 --no-pager
```
Backups are stored in `/srv/mc-proxy/backups/` and pruned after 30 days.
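To confirm that pruning is keeping up, list any backups older than the retention window. This is a read-only sketch, assuming backup files are named `mc-proxy-<timestamp>.db`; `stale_backups` is a hypothetical helper and is not how the backup service itself prunes.

```bash
# stale_backups [DIR] [DAYS]: list backup files older than DAYS (default 30).
# Prints candidates only; nothing is deleted.
# Note: find rounds file age down to whole days, so "+30" matches files that
# are at least 31 days old.
stale_backups() {
    local dir="${1:-/srv/mc-proxy/backups}" days="${2:-30}"
    find "$dir" -maxdepth 1 -type f -name 'mc-proxy-*.db' -mtime +"$days" -print
}

# Usage:
#   stale_backups                      # default directory, 30 days
#   stale_backups /srv/mc-proxy/backups 45
```

Any output here suggests the prune step has not run recently; check the backup service logs.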
### Restore from Backup
1. Stop the service:
```bash
systemctl stop mc-proxy
```
2. Replace the database:
```bash
cp /srv/mc-proxy/backups/mc-proxy-<timestamp>.db /srv/mc-proxy/mc-proxy.db
chown mc-proxy:mc-proxy /srv/mc-proxy/mc-proxy.db
chmod 0600 /srv/mc-proxy/mc-proxy.db
```
3. Start the service:
```bash
systemctl start mc-proxy
```
4. Verify health:
```bash
mc-proxy status -c /srv/mc-proxy/mc-proxy.toml
```
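When restoring under pressure, picking the newest backup by hand is error-prone. The sketch below selects it by modification time; `latest_backup` is a hypothetical helper, and it assumes the `mc-proxy-<timestamp>.db` naming shown in step 2.

```bash
# latest_backup [DIR]: print the most recently modified backup file in DIR.
# Returns non-zero (and prints nothing) if no backup exists.
latest_backup() {
    local dir="${1:-/srv/mc-proxy/backups}" newest="" f
    for f in "$dir"/mc-proxy-*.db; do
        [ -e "$f" ] || continue     # the glob matched nothing
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest="$f"
        fi
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Usage (with the service stopped):
#   cp "$(latest_backup)" /srv/mc-proxy/mc-proxy.db
```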
### Manage Routes at Runtime (gRPC)
Routes can be added and removed at runtime via the gRPC admin API using
`grpcurl` or any gRPC client.
```bash
# grpcurl parses flags only before the address, so -d must come first.
# List routes for a listener
grpcurl -cacert ca.pem -cert client.pem -key client-key.pem \
  -d '{"listener_addr": ":443"}' \
  localhost:9090 mc_proxy.v1.ProxyAdminService/ListRoutes
# Add a route
grpcurl -cacert ca.pem -cert client.pem -key client-key.pem \
  -d '{"listener_addr": ":443", "route": {"hostname": "new.metacircular.net", "backend": "127.0.0.1:38443"}}' \
  localhost:9090 mc_proxy.v1.ProxyAdminService/AddRoute
# Remove a route
grpcurl -cacert ca.pem -cert client.pem -key client-key.pem \
  -d '{"listener_addr": ":443", "hostname": "old.metacircular.net"}' \
  localhost:9090 mc_proxy.v1.ProxyAdminService/RemoveRoute
```
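The TLS flags and the service prefix repeat in every admin call, so a small wrapper can cut down on typing mistakes. This is a sketch only: `mcproxy_admin`, `MC_PROXY_ADMIN_ADDR`, and `MC_PROXY_TLS_DIR` are hypothetical names introduced here, not settings that mc-proxy reads.

```bash
# mcproxy_admin METHOD [JSON]: call one ProxyAdminService method via grpcurl.
# Defaults: admin API on localhost:9090, client certificates in the current
# directory (both overridable through the assumed environment variables).
mcproxy_admin() {
    local method="$1" payload="${2:-}"
    local addr="${MC_PROXY_ADMIN_ADDR:-localhost:9090}"
    local tls="${MC_PROXY_TLS_DIR:-.}"
    # Reuse the positional parameters as the grpcurl argument list.
    set -- -cacert "$tls/ca.pem" -cert "$tls/client.pem" -key "$tls/client-key.pem"
    if [ -n "$payload" ]; then
        set -- "$@" -d "$payload"   # grpcurl flags must precede the address
    fi
    grpcurl "$@" "$addr" "mc_proxy.v1.ProxyAdminService/$method"
}

# Usage:
#   mcproxy_admin ListRoutes '{"listener_addr": ":443"}'
#   mcproxy_admin RemoveRoute '{"listener_addr": ":443", "hostname": "old.metacircular.net"}'
```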
### Manage Firewall Rules at Runtime (gRPC)
```bash
# grpcurl parses flags only before the address, so -d must come first.
# List rules
grpcurl -cacert ca.pem -cert client.pem -key client-key.pem \
  localhost:9090 mc_proxy.v1.ProxyAdminService/GetFirewallRules
# Block an IP
grpcurl -cacert ca.pem -cert client.pem -key client-key.pem \
  -d '{"rule": {"type": "FIREWALL_RULE_TYPE_IP", "value": "203.0.113.50"}}' \
  localhost:9090 mc_proxy.v1.ProxyAdminService/AddFirewallRule
# Block a CIDR
grpcurl -cacert ca.pem -cert client.pem -key client-key.pem \
  -d '{"rule": {"type": "FIREWALL_RULE_TYPE_CIDR", "value": "198.51.100.0/24"}}' \
  localhost:9090 mc_proxy.v1.ProxyAdminService/AddFirewallRule
# Block a country
grpcurl -cacert ca.pem -cert client.pem -key client-key.pem \
  -d '{"rule": {"type": "FIREWALL_RULE_TYPE_COUNTRY", "value": "RU"}}' \
  localhost:9090 mc_proxy.v1.ProxyAdminService/AddFirewallRule
# Remove a rule
grpcurl -cacert ca.pem -cert client.pem -key client-key.pem \
  -d '{"rule": {"type": "FIREWALL_RULE_TYPE_IP", "value": "203.0.113.50"}}' \
  localhost:9090 mc_proxy.v1.ProxyAdminService/RemoveFirewallRule
```
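The rule payloads differ only in the type enum and the value, so they can be generated rather than typed. A minimal sketch: `firewall_rule_json` is a hypothetical helper, and the short kinds `ip`, `cidr`, and `country` are names chosen here that map onto the enum values used above.

```bash
# firewall_rule_json KIND VALUE: print the JSON body for AddFirewallRule or
# RemoveFirewallRule. KIND is one of: ip, cidr, country.
# VALUE is not JSON-escaped; IPs, CIDRs, and country codes never need it.
firewall_rule_json() {
    local kind="$1" value="$2" type
    case "$kind" in
        ip)      type=FIREWALL_RULE_TYPE_IP ;;
        cidr)    type=FIREWALL_RULE_TYPE_CIDR ;;
        country) type=FIREWALL_RULE_TYPE_COUNTRY ;;
        *) echo "unknown rule kind: $kind" >&2; return 1 ;;
    esac
    printf '{"rule": {"type": "%s", "value": "%s"}}\n' "$type" "$value"
}

# Usage:
#   grpcurl -cacert ca.pem -cert client.pem -key client-key.pem \
#     -d "$(firewall_rule_json cidr 198.51.100.0/24)" \
#     localhost:9090 mc_proxy.v1.ProxyAdminService/AddFirewallRule
```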
## Deployment with MCP
mc-proxy runs on rift as a single container managed by MCP. The service
definition lives at `~/.config/mcp/services/mc-proxy.toml` on rift (reference
copy at `deploy/mc-proxy-rift.toml` in this repo). The container mounts
`/srv/mc-proxy`, which holds the config file, SQLite database, GeoIP database,
and TLS certificates for backends. It runs as `--user 0:0` under rootless
podman.
Listeners: `:443` (L7 terminating), `:8443` (L4 passthrough), `:9443` (L4
passthrough).
### Deploy or Update
```bash
mcp deploy mc-proxy
```
### Restart / Stop
```bash
mcp restart mc-proxy
mcp stop mc-proxy
```
### Check Status
```bash
mcp ps
mcp status mc-proxy
```
### View Logs
```bash
ssh rift 'doas su - mcp -s /bin/sh -c "podman logs mc-proxy"'
```
### Update Routes
Edit the config at `/srv/mc-proxy/mc-proxy.toml` on rift, then restart:
```bash
mcp restart mc-proxy
```
Routes added at runtime via the gRPC admin API are persisted in the database
and survive restarts. Editing the TOML config is only necessary for changing
listener definitions or static seed routes.
## Incident Procedures
### Proxy Not Starting
1. Check logs for the error:
```bash
journalctl -u mc-proxy -n 50 --no-pager
```
2. Common causes:
- **"database.path is required"** — config file missing or malformed.
- **"firewall: geoip_db is required"** — country blocks configured but GeoIP database missing.
- **"address already in use"** — another process holds the port.
```bash
ss -tlnp | grep ':<port>'
```
- **Permission denied on database** — check ownership:
```bash
ls -la /srv/mc-proxy/mc-proxy.db
chown mc-proxy:mc-proxy /srv/mc-proxy/mc-proxy.db
```
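The common causes above can be matched against the log mechanically. The sketch below maps the error strings listed in this section to a one-line hint; `diagnose_startup` is a hypothetical helper, and it assumes those strings appear verbatim somewhere in the log line.

```bash
# diagnose_startup: read log lines on stdin and print a hint for each line
# that contains one of the known startup errors. Other lines are ignored.
diagnose_startup() {
    local line
    while IFS= read -r line; do
        case "$line" in
            *"database.path is required"*)
                echo "config: config file missing or malformed" ;;
            *"geoip_db is required"*)
                echo "geoip: country blocks configured but GeoIP database missing" ;;
            *"address already in use"*)
                echo "port: another process holds the port" ;;
            *[Pp]"ermission denied"*)
                echo "perms: check ownership of the database file" ;;
        esac
    done
}

# Usage:
#   journalctl -u mc-proxy -n 50 --no-pager | diagnose_startup | sort -u
```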
### High Connection Count / Resource Exhaustion
1. Check active connections:
```bash
mc-proxy status -c /srv/mc-proxy/mc-proxy.toml
```
2. Check system-level connection count:
```bash
ss -tn | grep -c ':<port>'
```
3. If under attack, add firewall rules via gRPC to block the source:
```bash
grpcurl -cacert ca.pem -cert client.pem -key client-key.pem \
  -d '{"rule": {"type": "FIREWALL_RULE_TYPE_IP", "value": "<attacker-ip>"}}' \
  localhost:9090 mc_proxy.v1.ProxyAdminService/AddFirewallRule
```
4. If the traffic comes from many IPs in one region, consider a country block or a CIDR block.
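Before adding a block rule, it helps to know which sources hold the most connections. The sketch below ranks peer addresses from `ss -tn` output; `top_sources` is a hypothetical helper, and it assumes the default `ss` column layout (state first, local address fourth, peer address fifth).

```bash
# top_sources PORT [N]: read `ss -tn` output on stdin and print the N peers
# (default 10) with the most established connections to local PORT, as
# "<count> <address>" lines, highest count first.
top_sources() {
    local port="$1" count="${2:-10}"
    awk -v p="$port" '
        $1 == "ESTAB" && $4 ~ (":" p "$") {
            peer = $5
            sub(/:[0-9]+$/, "", peer)    # strip the peer port
            hits[peer]++
        }
        END { for (ip in hits) print hits[ip], ip }
    ' | sort -k1,1nr -k2,2 | head -n "$count"
}

# Usage:
#   ss -tn | top_sources 443 5
```

A single address dominating the list points to an IP rule; many addresses in one range point to a CIDR or country rule.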
### Database Corruption
1. Stop the service:
```bash
systemctl stop mc-proxy
```
2. Check database integrity:
```bash
sqlite3 /srv/mc-proxy/mc-proxy.db "PRAGMA integrity_check;"
```
3. If corrupted, restore from the most recent backup (see [Restore from Backup](#restore-from-backup)).
4. If no backups exist, delete the database and restart. The service will
re-seed from the TOML configuration:
```bash
rm /srv/mc-proxy/mc-proxy.db
systemctl start mc-proxy
```
Note: any routes or firewall rules added at runtime via gRPC will be lost.
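`PRAGMA integrity_check` prints a single row containing `ok` when it finds no problems and one row per problem otherwise, so the check in step 2 can gate the rest of the procedure. `integrity_ok` below is a hypothetical helper name.

```bash
# integrity_ok: read `PRAGMA integrity_check` output on stdin and succeed only
# if it is exactly the single line "ok".
integrity_ok() {
    [ "$(cat)" = "ok" ]
}

# Usage (with the service stopped):
#   if sqlite3 /srv/mc-proxy/mc-proxy.db "PRAGMA integrity_check;" | integrity_ok; then
#       echo "database is healthy"
#   else
#       echo "database is corrupted; restore from backup"
#   fi
```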
### GeoIP Database Stale or Missing
1. Download a fresh copy of GeoLite2-Country.mmdb from MaxMind.
2. Place it at the configured path:
```bash
cp GeoLite2-Country.mmdb /srv/mc-proxy/GeoLite2-Country.mmdb
chown mc-proxy:mc-proxy /srv/mc-proxy/GeoLite2-Country.mmdb
```
3. Reload without restart:
```bash
systemctl kill -s HUP mc-proxy
```
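Staleness can be detected ahead of an incident by checking the file's age. This is a sketch under assumptions: `geoip_needs_refresh` is a hypothetical helper, and the 30-day default is an arbitrary threshold chosen here, not a value mc-proxy enforces.

```bash
# geoip_needs_refresh FILE [MAX_DAYS]: succeed if FILE is missing or was last
# modified more than MAX_DAYS days ago (default 30, an assumed threshold).
geoip_needs_refresh() {
    local file="$1" max_days="${2:-30}"
    [ -f "$file" ] || return 0      # missing counts as needing a refresh
    # find prints the path only when the file is older than the threshold.
    [ -n "$(find "$file" -mtime +"$max_days" -print)" ]
}

# Usage:
#   if geoip_needs_refresh /srv/mc-proxy/GeoLite2-Country.mmdb; then
#       echo "GeoIP database is missing or stale"
#   fi
```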
### Certificate Expiry (gRPC Admin API)
The gRPC admin API uses TLS certificates from `/srv/mc-proxy/certs/`.
Certificates are loaded at startup; replacing them requires a restart.
1. Replace the certificates:
```bash
cp new-cert.pem /srv/mc-proxy/certs/cert.pem
cp new-key.pem /srv/mc-proxy/certs/key.pem
chown mc-proxy:mc-proxy /srv/mc-proxy/certs/*.pem
chmod 0600 /srv/mc-proxy/certs/key.pem
```
2. Restart:
```bash
systemctl restart mc-proxy
```
Note: certificate expiry does not affect the proxy listeners — they do not
terminate TLS.
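Expiry can be caught before it causes an outage with `openssl x509 -checkend`, which exits 0 only if the certificate is still valid after the given number of seconds. A minimal sketch: `cert_expires_within` is a hypothetical helper and the 30-day default is an assumption.

```bash
# cert_expires_within CERT [DAYS]: succeed if CERT expires within DAYS days
# (default 30). Also succeeds if openssl cannot read the file, so an
# unreadable certificate is flagged rather than silently passed.
cert_expires_within() {
    local cert="$1" days="${2:-30}"
    ! openssl x509 -in "$cert" -noout -checkend "$((days * 86400))" >/dev/null
}

# Usage:
#   if cert_expires_within /srv/mc-proxy/certs/cert.pem 14; then
#       echo "admin API certificate needs replacing"
#   fi
```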
### Backend Unreachable
If a backend service is down, connections to routes pointing at that backend
will fail at the dial phase and the client receives a TCP RST. mc-proxy logs
the dial failure at `warn` level.
1. Check logs for dial errors:
```bash
journalctl -u mc-proxy -n 100 --no-pager | grep "dial"
```
2. Verify the backend is running:
```bash
ss -tlnp | grep ':<backend-port>'
```
3. This is not an mc-proxy issue — fix the backend service.
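The `ss` check in step 2 only sees backends on the same host. To test a backend from the proxy's point of view, a plain TCP connect works for any address. This sketch assumes `bash` (for its `/dev/tcp` redirection) and coreutils `timeout` are available on the proxy host; `backend_reachable` is a hypothetical helper.

```bash
# backend_reachable HOST PORT: succeed if a TCP connection to HOST:PORT can be
# opened within 3 seconds. No data is sent; the connection is closed at once.
backend_reachable() {
    local host="$1" port="$2"
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Usage:
#   backend_reachable 127.0.0.1 38443 || echo "backend is down"
```

A failure here confirms the dial errors in the log without depending on any application-level protocol.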
## Escalation
If the runbook does not resolve the issue:
1. Collect logs: `journalctl -u mc-proxy --since "1 hour ago" > /tmp/mc-proxy-logs.txt`
2. Collect status: `mc-proxy status -c /srv/mc-proxy/mc-proxy.toml > /tmp/mc-proxy-status.txt`
3. Collect database state: `mc-proxy snapshot -c /srv/mc-proxy/mc-proxy.toml -o /tmp/mc-proxy-escalation.db`
4. Escalate with the collected artifacts.