RUNBOOK.md — eng-pad-server
1. Service Overview
eng-pad-server receives engineering notebook data from the Engineering Pad Android app via gRPC, stores it in SQLite, and serves read-only views through a web UI. It supports a single authenticated user.
Ports: 8443 (REST/HTTPS), 9443 (gRPC/TLS), 8080 (Web UI)
Data: /srv/eng-pad-server/
Config: /srv/eng-pad-server/eng-pad-server.toml
Binary: /usr/local/bin/eng-pad-server
2. Health Checks
- Check service is running:
    systemctl status eng-pad-server
- Check database health:
    eng-pad-server status -c /srv/eng-pad-server/eng-pad-server.toml
- Check web UI responds:
    curl -k https://localhost:8443/login
- Check gRPC responds:
    grpcurl -insecure localhost:9443 list
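The four checks above can be wrapped in one script for a quick pass/fail summary. This is a minimal sketch, not something shipped with the server: the function names `check` and `run_health_checks` are hypothetical, the `-s -f` flags on `curl` are added so that an HTTP error counts as a failure, and it assumes `eng-pad-server status` exits non-zero when the database is unhealthy.

```shell
#!/usr/bin/env bash
# Sketch: run each health check from this section and print a one-line verdict.
# `check` and `run_health_checks` are illustrative names, not part of eng-pad-server.

CONFIG=/srv/eng-pad-server/eng-pad-server.toml

# Run a command with its output discarded; print "ok" or "FAIL" next to the label.
check() {
    local label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        printf 'ok    %s\n' "$label"
    else
        printf 'FAIL  %s\n' "$label"
        return 1
    fi
}

# Run all four checks; the return status is the number of failed checks.
run_health_checks() {
    local failed=0
    check "service running" systemctl status eng-pad-server || failed=$((failed + 1))
    # Assumption: `status` exits non-zero when the database is unhealthy.
    check "database" eng-pad-server status -c "$CONFIG" || failed=$((failed + 1))
    check "web UI" curl -ksf https://localhost:8443/login || failed=$((failed + 1))
    check "gRPC" grpcurl -insecure localhost:9443 list || failed=$((failed + 1))
    return "$failed"
}

# Usage: run_health_checks || echo "some checks failed"
```

Each check keeps running even if an earlier one fails, so a single pass shows every failing component.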
3. Common Operations
Start / Stop / Restart
systemctl start eng-pad-server
systemctl stop eng-pad-server
systemctl restart eng-pad-server
View Logs
journalctl -u eng-pad-server -f
Manual Backup
eng-pad-server snapshot -c /srv/eng-pad-server/eng-pad-server.toml
Backups are saved to /srv/eng-pad-server/backups/.
Check Backup Timer
systemctl list-timers eng-pad-server-backup.timer
Initialize (First Time)
- Install the binary and config:
    sudo deploy/scripts/install.sh
- Edit the config file:
    sudo -u engpad vi /srv/eng-pad-server/eng-pad-server.toml
- Generate TLS certificates (or copy existing ones):
    # Self-signed for development:
    openssl req -x509 -newkey ec -pkeyopt ec_paramgen_curve:prime256v1 \
      -keyout /srv/eng-pad-server/certs/key.pem \
      -out /srv/eng-pad-server/certs/cert.pem \
      -days 3650 -nodes -subj '/CN=pad.metacircular.net'
    chown engpad:engpad /srv/eng-pad-server/certs/*.pem
    chmod 600 /srv/eng-pad-server/certs/key.pem
- Create the admin user:
    eng-pad-server init -c /srv/eng-pad-server/eng-pad-server.toml
- Start the service:
    systemctl enable --now eng-pad-server
    systemctl enable --now eng-pad-server-backup.timer
Register a FIDO2/U2F Security Key
- Log in to the web UI with password.
- Navigate to /keys.
- Enter a name for the key (e.g., "YubiKey 5").
- Click "Register" and touch the key when prompted.
Docker Deployment
cd deploy/docker
docker compose up -d
First-time setup inside the container:
docker compose exec eng-pad-server eng-pad-server init -c /srv/eng-pad-server/eng-pad-server.toml
4. Alerting
No automated alerting is configured. Monitor via:
- Process health: systemctl status eng-pad-server
- Errors: journalctl -u eng-pad-server --since "1 hour ago" | grep ERROR
- Backup age: ls -lt /srv/eng-pad-server/backups/ | head
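Because nothing alerts on stale backups, the backup-age check can be turned into a yes/no test that is easy to run by hand or from cron. The sketch below is an illustration under assumptions: `backup_is_fresh` is a hypothetical name, and the 26-hour default assumes roughly daily backups, which this runbook does not state. It relies on GNU `find`.

```shell
#!/usr/bin/env bash
# Sketch: succeed if at least one backup file is newer than a threshold.
# `backup_is_fresh` is an illustrative name; the 26-hour default is an assumption
# (daily backups plus slack), not a value taken from eng-pad-server.

backup_is_fresh() {
    local dir=${1:-/srv/eng-pad-server/backups}
    local max_age_hours=${2:-26}
    local recent
    # Find a regular file modified within the last max_age_hours; stop at the first hit.
    recent=$(find "$dir" -maxdepth 1 -type f -mmin "-$((max_age_hours * 60))" -print -quit)
    [ -n "$recent" ]
}

# Usage:
#   backup_is_fresh || echo "WARNING: no recent backup in /srv/eng-pad-server/backups"
```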
5. Incident Procedures
Service Won't Start
- Check logs:
    journalctl -u eng-pad-server -n 50 --no-pager
- Common causes:
  - Config file missing or invalid → fix config
  - TLS cert/key missing → regenerate or copy
  - Port already in use → ss -tlnp | grep 8443
  - Database locked → check for leftover processes holding the file:
      fuser /srv/eng-pad-server/eng-pad-server.db
Database Corruption
- Stop the service:
    systemctl stop eng-pad-server
- Check integrity:
    sqlite3 /srv/eng-pad-server/eng-pad-server.db "PRAGMA integrity_check"
- If corrupted, restore from backup:
    cp /srv/eng-pad-server/backups/eng-pad-server-LATEST.db /srv/eng-pad-server/eng-pad-server.db
    chown engpad:engpad /srv/eng-pad-server/eng-pad-server.db
- Restart:
    systemctl start eng-pad-server
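`PRAGMA integrity_check` prints the single line `ok` for a healthy database and a list of problems otherwise, so the integrity step can be scripted as a yes/no test before deciding whether to restore. A minimal sketch follows; `db_is_intact` is a hypothetical name, and the explicit file check is there because `sqlite3` reports `ok` for a path that does not exist.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if SQLite reports the database as intact.
# `db_is_intact` is an illustrative name. Run it with the service stopped.

db_is_intact() {
    local db=${1:-/srv/eng-pad-server/eng-pad-server.db}
    local result
    # A missing file would otherwise look like an empty, healthy database.
    [ -f "$db" ] || return 1
    # integrity_check prints exactly "ok" when no problems are found.
    result=$(sqlite3 "$db" "PRAGMA integrity_check;" 2>&1)
    [ "$result" = "ok" ]
}

# Usage:
#   systemctl stop eng-pad-server
#   db_is_intact && echo "database intact, no restore needed"
```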
Certificate Expiry
- Check expiry:
    openssl x509 -in /srv/eng-pad-server/certs/cert.pem -noout -dates
- Regenerate or renew the certificate.
- Restart the service (picks up new certs on start).
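Expiry can also be caught before it becomes an incident. The sketch below uses `openssl x509 -checkend`, which exits 0 only if the certificate is still valid after the given number of seconds. The function name `cert_valid_for_days` and the 30-day default are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: succeed if the certificate is still valid N days from now.
# `cert_valid_for_days` is an illustrative name; the 30-day default is an assumption.

cert_valid_for_days() {
    local cert=${1:-/srv/eng-pad-server/certs/cert.pem}
    local days=${2:-30}
    # -checkend N exits 0 if the certificate will not expire within N seconds.
    openssl x509 -in "$cert" -noout -checkend "$((days * 86400))" >/dev/null 2>&1
}

# Usage:
#   cert_valid_for_days /srv/eng-pad-server/certs/cert.pem 30 \
#       || echo "WARNING: certificate expires within 30 days (or cannot be read)"
```

A missing or unreadable certificate also makes the function fail, which is usually the desired behavior for a warning check.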
Disk Full
- Check disk usage:
    df -h /srv/eng-pad-server/
    du -sh /srv/eng-pad-server/*
- Prune old backups (keeps the 7 most recent):
    ls -t /srv/eng-pad-server/backups/ | tail -n +8 | xargs -I{} rm /srv/eng-pad-server/backups/{}
- Compact the database (VACUUM needs free space for a temporary copy of the database, so prune backups first):
    sqlite3 /srv/eng-pad-server/eng-pad-server.db "VACUUM"
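The prune one-liner hard-codes both the directory and the number of backups to keep. A parameterized version is sketched below; `prune_backups` is a hypothetical name, the default of 7 mirrors the `tail -n +8` in the original command, and it assumes backup filenames contain no newlines.

```shell
#!/usr/bin/env bash
# Sketch: delete all but the N newest files in the backup directory.
# `prune_backups` is an illustrative name; keep=7 mirrors the `tail -n +8` one-liner.
# Assumes backup filenames contain no newlines.

prune_backups() {
    local dir=${1:-/srv/eng-pad-server/backups}
    local keep=${2:-7}
    local name
    # ls -t sorts newest first; tail -n +K starts output at line K,
    # so skipping `keep` lines leaves only the files to delete.
    ls -t "$dir" | tail -n "+$((keep + 1))" | while IFS= read -r name; do
        echo "removing $dir/$name"
        rm -- "$dir/$name"
    done
}

# Usage: prune_backups /srv/eng-pad-server/backups 7
```

Printing each path before removal leaves a record of what was deleted in the terminal or in cron mail.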
Sync Fails from Android App
- Verify server is reachable from the device's network.
- Check gRPC port is open:
    ss -tlnp | grep 9443
- Check TLS cert is valid and trusted by the device.
- Check credentials: verify the user exists via eng-pad-server status.
- Check server logs for auth failures:
    journalctl -u eng-pad-server | grep UNAUTHENTICATED
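When triaging, a count of authentication failures over a recent window helps separate a single mistyped password from a device that fails on every sync. The helper below is a small sketch: `count_auth_failures` is a hypothetical name, and it only counts lines containing `UNAUTHENTICATED` on standard input without assuming anything else about the log format.

```shell
#!/usr/bin/env bash
# Sketch: count log lines that mention UNAUTHENTICATED.
# `count_auth_failures` is an illustrative name; it reads log text on stdin.

count_auth_failures() {
    # grep -c prints the number of matching lines (0 when none match);
    # `|| true` hides grep's non-zero exit status in the no-match case.
    grep -c UNAUTHENTICATED || true
}

# Usage:
#   journalctl -u eng-pad-server --since "1 hour ago" --no-pager | count_auth_failures
```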
6. Escalation
If the runbook doesn't resolve the issue:
- Check ARCHITECTURE.md for system design context.
- Check AUDIT.md for known security considerations.
- Review recent commits for changes that may have introduced the issue.