# RUNBOOK.md

Operational procedures for mc-proxy. Written for operators, not developers.

## Service Overview

mc-proxy is a Layer 4 TLS SNI proxy. It routes incoming TLS connections to
backend services based on the SNI hostname. It does not terminate TLS or
inspect application-layer traffic. A global firewall blocks connections by
IP, CIDR, or GeoIP country before routing.

## Health Checks

### Via gRPC (requires admin API enabled)

```bash
mc-proxy status -c /srv/mc-proxy/mc-proxy.toml
```

Expected output:

```
mc-proxy v0.1.0
uptime: 4h32m10s
connections: 1247

:443 routes=2 active=12
:8443 routes=1 active=3
:9443 routes=1 active=0
```

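For monitoring scripts, the per-listener `active=` counts can be summed from the status output. The following is a minimal sketch that assumes the output format matches the example above; `total_active` is a hypothetical helper name, not an mc-proxy command.

```bash
# Sum the active=N fields across all listener lines of `mc-proxy status`.
# Assumes the output format shown in the example above; total_active is a
# hypothetical helper, not part of mc-proxy.
total_active() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^active=[0-9]+$/) { split($i, kv, "="); sum += kv[2] }
    }
    END { print sum + 0 }'
}
```

Usage: `mc-proxy status -c /srv/mc-proxy/mc-proxy.toml | total_active`
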
### Via systemd

```bash
systemctl status mc-proxy
journalctl -u mc-proxy -n 50 --no-pager
```

### Via process

```bash
ss -tlnp | grep mc-proxy
```

Verify all configured listener ports are in LISTEN state.

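The listener check can be scripted rather than read by eye. This is a sketch only: `missing_listeners` is a hypothetical helper that reads `ss -tln` output on stdin and prints every expected port that has no LISTEN socket. The port numbers in the usage line are taken from the example status output and may differ from your configuration.

```bash
# Print each expected port that has no LISTEN socket in `ss -tln` output
# read from stdin. missing_listeners is a hypothetical helper.
missing_listeners() {
    ss_output=$(cat)
    for port in "$@"; do
        # Column 4 of `ss -tln` is the local address, ending in :<port>.
        if ! printf '%s\n' "$ss_output" |
            awk -v p="$port" '$1 == "LISTEN" && $4 ~ (":" p "$") { found = 1 } END { exit !found }'; then
            echo "$port"
        fi
    done
}
```

Usage: `ss -tln | missing_listeners 443 8443 9443` prints nothing when every port is listening.
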
## Common Operations

### Start / Stop / Restart

```bash
systemctl start mc-proxy
systemctl stop mc-proxy
systemctl restart mc-proxy
```

Stopping the service triggers graceful shutdown: new connections are refused,
in-flight connections drain for up to `shutdown_timeout` (default 30s), then
remaining connections are force-closed.

### View Logs

```bash
# Recent logs
journalctl -u mc-proxy -n 100 --no-pager

# Follow live
journalctl -u mc-proxy -f

# Show only messages of severity err or higher
journalctl -u mc-proxy -p err
```

### Reload GeoIP Database

Send SIGHUP to reload the GeoIP database without restarting:

```bash
systemctl kill -s HUP mc-proxy
```

Or:

```bash
kill -HUP $(pidof mc-proxy)
```

Verify in logs:

```
level=INFO msg="received SIGHUP, reloading GeoIP database"
```

### Create a Database Backup

```bash
# Manual backup
mc-proxy snapshot -c /srv/mc-proxy/mc-proxy.toml

# Manual backup to a specific path
mc-proxy snapshot -c /srv/mc-proxy/mc-proxy.toml -o /tmp/mc-proxy-backup.db
```

Automated daily backups run via the systemd timer:

```bash
# Check timer status
systemctl list-timers mc-proxy-backup.timer

# Run backup manually via systemd
systemctl start mc-proxy-backup.service

# View backup logs
journalctl -u mc-proxy-backup.service -n 20 --no-pager
```

Backups are stored in `/srv/mc-proxy/backups/` and pruned after 30 days.

### Restore from Backup

1. Stop the service:
   ```bash
   systemctl stop mc-proxy
   ```
2. Replace the database:
   ```bash
   cp /srv/mc-proxy/backups/mc-proxy-<timestamp>.db /srv/mc-proxy/mc-proxy.db
   chown mc-proxy:mc-proxy /srv/mc-proxy/mc-proxy.db
   chmod 0600 /srv/mc-proxy/mc-proxy.db
   ```
3. Start the service:
   ```bash
   systemctl start mc-proxy
   ```
4. Verify health:
   ```bash
   mc-proxy status -c /srv/mc-proxy/mc-proxy.toml
   ```

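When restoring under time pressure, it can help to pick the newest backup automatically instead of typing the timestamp. This is a sketch, not part of mc-proxy: `latest_backup` is a hypothetical helper, and the `mc-proxy-*.db` file name pattern is inferred from the restore example above.

```bash
# Print the most recently modified backup in a directory, or fail if none
# exist. latest_backup is a hypothetical helper; the mc-proxy-*.db pattern
# is inferred from the restore example above.
latest_backup() {
    dir=${1:-/srv/mc-proxy/backups}
    # ls -t sorts newest first; errors from an unmatched glob are discarded.
    newest=$(ls -1t "$dir"/mc-proxy-*.db 2>/dev/null | head -n 1)
    [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

Usage in step 2: `cp "$(latest_backup)" /srv/mc-proxy/mc-proxy.db`. Selection is by modification time, so a backup file that was copied or touched later will sort as newer than its name suggests.
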
### Manage Routes at Runtime (gRPC)

Routes can be added and removed at runtime via the gRPC admin API using
`grpcurl` or any gRPC client. Note that `grpcurl` requires all flags,
including `-d`, to come before the server address.

```bash
# List routes for a listener
grpcurl -cacert ca.pem -cert client.pem -key client-key.pem \
    -d '{"listener_addr": ":443"}' \
    localhost:9090 mc_proxy.v1.ProxyAdminService/ListRoutes

# Add a route
grpcurl -cacert ca.pem -cert client.pem -key client-key.pem \
    -d '{"listener_addr": ":443", "route": {"hostname": "new.metacircular.net", "backend": "127.0.0.1:38443"}}' \
    localhost:9090 mc_proxy.v1.ProxyAdminService/AddRoute

# Remove a route
grpcurl -cacert ca.pem -cert client.pem -key client-key.pem \
    -d '{"listener_addr": ":443", "hostname": "old.metacircular.net"}' \
    localhost:9090 mc_proxy.v1.ProxyAdminService/RemoveRoute
```

### Manage Firewall Rules at Runtime (gRPC)

```bash
# List rules
grpcurl -cacert ca.pem -cert client.pem -key client-key.pem \
    localhost:9090 mc_proxy.v1.ProxyAdminService/GetFirewallRules

# Block an IP
grpcurl -cacert ca.pem -cert client.pem -key client-key.pem \
    -d '{"rule": {"type": "FIREWALL_RULE_TYPE_IP", "value": "203.0.113.50"}}' \
    localhost:9090 mc_proxy.v1.ProxyAdminService/AddFirewallRule

# Block a CIDR
grpcurl -cacert ca.pem -cert client.pem -key client-key.pem \
    -d '{"rule": {"type": "FIREWALL_RULE_TYPE_CIDR", "value": "198.51.100.0/24"}}' \
    localhost:9090 mc_proxy.v1.ProxyAdminService/AddFirewallRule

# Block a country
grpcurl -cacert ca.pem -cert client.pem -key client-key.pem \
    -d '{"rule": {"type": "FIREWALL_RULE_TYPE_COUNTRY", "value": "RU"}}' \
    localhost:9090 mc_proxy.v1.ProxyAdminService/AddFirewallRule

# Remove a rule
grpcurl -cacert ca.pem -cert client.pem -key client-key.pem \
    -d '{"rule": {"type": "FIREWALL_RULE_TYPE_IP", "value": "203.0.113.50"}}' \
    localhost:9090 mc_proxy.v1.ProxyAdminService/RemoveFirewallRule
```

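The three rule types differ only in the `type` field, so the request body can be generated from the value alone. The sketch below guesses the type from the shape of the value; `firewall_rule_json` is a hypothetical helper, and the type names are the ones used in the examples above. It does no validation beyond that guess, so a malformed value is passed through as an IP rule.

```bash
# Build the AddFirewallRule/RemoveFirewallRule request body for a value,
# guessing the rule type from its shape. firewall_rule_json is a
# hypothetical helper; the type names come from the examples above.
firewall_rule_json() {
    value=$1
    case $value in
        */*)        rule_type=FIREWALL_RULE_TYPE_CIDR ;;     # has a prefix length
        [A-Z][A-Z]) rule_type=FIREWALL_RULE_TYPE_COUNTRY ;;  # two-letter country code
        *)          rule_type=FIREWALL_RULE_TYPE_IP ;;       # anything else
    esac
    printf '{"rule": {"type": "%s", "value": "%s"}}\n' "$rule_type" "$value"
}
```

Usage: pass the result to `grpcurl` with `-d "$(firewall_rule_json 203.0.113.50)"`, placed before the server address.
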
## Incident Procedures

### Proxy Not Starting

1. Check logs for the error:
   ```bash
   journalctl -u mc-proxy -n 50 --no-pager
   ```
2. Common causes:
   - **"database.path is required"**: config file missing or malformed.
   - **"firewall: geoip_db is required"**: country blocks configured but GeoIP database missing.
   - **"address already in use"**: another process holds the port.
     ```bash
     ss -tlnp | grep ':<port>'
     ```
   - **Permission denied on database**: check and fix ownership.
     ```bash
     ls -la /srv/mc-proxy/mc-proxy.db
     chown mc-proxy:mc-proxy /srv/mc-proxy/mc-proxy.db
     ```

### High Connection Count / Resource Exhaustion

1. Check active connections:
   ```bash
   mc-proxy status -c /srv/mc-proxy/mc-proxy.toml
   ```
2. Check system-level connection count:
   ```bash
   ss -tn | grep -c ':<port>'
   ```
3. If under attack, add firewall rules via gRPC to block the source:
   ```bash
   grpcurl -cacert ca.pem -cert client.pem -key client-key.pem \
       -d '{"rule": {"type": "FIREWALL_RULE_TYPE_IP", "value": "<attacker-ip>"}}' \
       localhost:9090 mc_proxy.v1.ProxyAdminService/AddFirewallRule
   ```
4. If many IPs from one region are involved, consider a country block or CIDR block.

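To decide which source to block, it helps to rank peers by connection count. The following sketch reads `ss -tn` output on stdin and tallies peer IPs for one local port; `top_peers` is a hypothetical helper, not an mc-proxy command.

```bash
# Rank peer IPs by number of connections to a local port, reading
# `ss -tn` output from stdin. top_peers is a hypothetical helper.
top_peers() {
    awk -v p="$1" '
        # Column 4 is the local address:port, column 5 the peer address:port.
        $4 ~ (":" p "$") {
            peer = $5
            sub(/:[0-9]+$/, "", peer)   # drop the peer port, keep the IP
            count[peer]++
        }
        END { for (ip in count) print count[ip], ip }
    ' | sort -rn
}
```

Usage: `ss -tn | top_peers 443 | head` lists the busiest peers first, one `count ip` pair per line.
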
### Database Corruption

1. Stop the service:
   ```bash
   systemctl stop mc-proxy
   ```
2. Check database integrity:
   ```bash
   sqlite3 /srv/mc-proxy/mc-proxy.db "PRAGMA integrity_check;"
   ```
3. If corrupted, restore from the most recent backup (see [Restore from Backup](#restore-from-backup)).
4. If no backups exist, delete the database and restart. The service will
   re-seed from the TOML configuration:
   ```bash
   rm /srv/mc-proxy/mc-proxy.db
   systemctl start mc-proxy
   ```
   Note: any routes or firewall rules added at runtime via gRPC will be lost.

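SQLite's `PRAGMA integrity_check` prints the single line `ok` for a healthy database and a list of problems otherwise. A small sketch for use in scripts follows; `integrity_ok` is a hypothetical helper name.

```bash
# Succeed only if `PRAGMA integrity_check` output on stdin is exactly the
# single line "ok". integrity_ok is a hypothetical helper.
integrity_ok() {
    [ "$(cat)" = "ok" ]
}
```

Usage: `sqlite3 /srv/mc-proxy/mc-proxy.db "PRAGMA integrity_check;" | integrity_ok || echo "restore from backup"`
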
### GeoIP Database Stale or Missing

1. Download a fresh copy of GeoLite2-Country.mmdb from MaxMind.
2. Place it at the configured path:
   ```bash
   cp GeoLite2-Country.mmdb /srv/mc-proxy/GeoLite2-Country.mmdb
   chown mc-proxy:mc-proxy /srv/mc-proxy/GeoLite2-Country.mmdb
   ```
3. Reload without restart:
   ```bash
   systemctl kill -s HUP mc-proxy
   ```

### Certificate Expiry (gRPC Admin API)

The gRPC admin API uses TLS certificates from `/srv/mc-proxy/certs/`.
Certificates are loaded at startup; replacing them requires a restart.

1. Replace the certificates:
   ```bash
   cp new-cert.pem /srv/mc-proxy/certs/cert.pem
   cp new-key.pem /srv/mc-proxy/certs/key.pem
   chown mc-proxy:mc-proxy /srv/mc-proxy/certs/*.pem
   chmod 0600 /srv/mc-proxy/certs/key.pem
   ```
2. Restart:
   ```bash
   systemctl restart mc-proxy
   ```

Note: certificate expiry does not affect the proxy listeners, because they do
not terminate TLS.

### Backend Unreachable

If a backend service is down, connections to routes pointing at that backend
will fail at the dial phase and the client receives a TCP RST. mc-proxy logs
the dial failure at `warn` level.

1. Check logs for dial errors:
   ```bash
   journalctl -u mc-proxy -n 100 --no-pager | grep "dial"
   ```
2. Verify the backend is running:
   ```bash
   ss -tlnp | grep ':<backend-port>'
   ```
3. This is not an mc-proxy issue; fix the backend service.

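When the backend runs on another host, `ss` on the proxy machine cannot see its listener, so a direct TCP probe is more useful. This is a sketch under stated assumptions: `backend_reachable` is a hypothetical helper that relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command, and the 2-second limit is an arbitrary choice.

```bash
# Return success if a TCP connection to host:port opens within 2 seconds.
# backend_reachable is a hypothetical helper; it relies on bash's /dev/tcp
# redirection and the coreutils `timeout` command.
backend_reachable() {
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}
```

Usage: `backend_reachable 127.0.0.1 38443 && echo up || echo down`, using the backend address from the route in question.
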
## Escalation

If the runbook does not resolve the issue:

1. Collect logs: `journalctl -u mc-proxy --since "1 hour ago" > /tmp/mc-proxy-logs.txt`
2. Collect status: `mc-proxy status -c /srv/mc-proxy/mc-proxy.toml > /tmp/mc-proxy-status.txt`
3. Collect database state: `mc-proxy snapshot -c /srv/mc-proxy/mc-proxy.toml -o /tmp/mc-proxy-escalation.db`
4. Escalate with the collected artifacts.