RUNBOOK.md — eng-pad-server
1. Service Overview
eng-pad-server receives engineering notebook data from the Engineering Pad Android app via gRPC, stores it in SQLite, and serves read-only views through a web UI. Single authenticated user.
Host: deimos.wntrmute.net
URL: https://pad.metacircular.net
Ports: 443 (nginx → 127.0.0.1:8090 on the host, mapped to the web UI on 8080 in the container), 8443 (REST/TLS), 9443 (gRPC/TLS)
Data: /srv/eng-pad-server/
Config: /srv/eng-pad-server/eng-pad-server.toml
TLS: Let's Encrypt (/etc/letsencrypt/live/pad.metacircular.net/), copied to /srv/eng-pad-server/certs/
Container: eng-pad-server (Docker, --restart unless-stopped)
2. Health Checks
- Check container is running:
  docker ps | grep eng-pad-server
- Check web UI responds:
  curl -s https://pad.metacircular.net/login | head -1
- Check container logs:
  docker logs eng-pad-server --tail 20
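The three checks above can be combined into one pass. This is a minimal sketch, not a shipped tool: `check_health` is a hypothetical helper name, and the 10-second curl timeout is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: run the health checks from this section in one pass.
# check_health is a hypothetical helper, not part of eng-pad-server.

CONTAINER="eng-pad-server"
LOGIN_URL="https://pad.metacircular.net/login"

check_health() {
    local failed=0

    # 1. The container must appear in the list of running containers.
    if docker ps --format '{{.Names}}' | grep -qx "$CONTAINER"; then
        echo "container: running"
    else
        echo "container: NOT running"
        failed=1
    fi

    # 2. The web UI must answer; -f makes curl exit non-zero on HTTP >= 400.
    if curl -fsS -o /dev/null --max-time 10 "$LOGIN_URL"; then
        echo "web UI: ok"
    else
        echo "web UI: FAILED"
        failed=1
    fi

    # 3. Print recent log lines for context (informational only).
    docker logs "$CONTAINER" --tail 20 2>&1

    return "$failed"
}
```

After sourcing the file, `check_health || echo "eng-pad-server needs attention"` gives a single pass/fail result.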
3. Common Operations
Start / Stop / Restart
docker start eng-pad-server
docker stop eng-pad-server
docker restart eng-pad-server
View Logs
docker logs eng-pad-server -f
Deploy New Version
# From local machine:
rsync -az --exclude='.git' --exclude='srv/' . deimos.wntrmute.net:/tmp/eng-pad-server-build/
ssh deimos.wntrmute.net "cd /tmp/eng-pad-server-build && \
docker build -t eng-pad-server . && \
docker stop eng-pad-server && docker rm eng-pad-server && \
docker run -d --name eng-pad-server --restart unless-stopped \
-p 127.0.0.1:8090:8080 -p 8443:8443 -p 9443:9443 \
-v /srv/eng-pad-server:/srv/eng-pad-server eng-pad-server"
Create User
docker exec -it eng-pad-server \
eng-pad-server init -c /srv/eng-pad-server/eng-pad-server.toml
Reset User Password
docker exec -it eng-pad-server \
eng-pad-server passwd <username> -c /srv/eng-pad-server/eng-pad-server.toml
Manual Backup
docker exec eng-pad-server \
eng-pad-server snapshot -c /srv/eng-pad-server/eng-pad-server.toml
Backups are saved to /srv/eng-pad-server/backups/.
Renew TLS Certificates
After certbot renews the Let's Encrypt cert:
sudo cp /etc/letsencrypt/live/pad.metacircular.net/{fullchain,privkey}.pem \
/srv/eng-pad-server/certs/
docker restart eng-pad-server
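The copy-and-restart step can be automated with a certbot deploy hook. Certbot runs executables in /etc/letsencrypt/renewal-hooks/deploy/ after a successful renewal and sets `RENEWED_LINEAGE` to the renewed live directory. The sketch below assumes a standard certbot install; the script file name and the `install_certs` helper are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical deploy hook, e.g. saved as
# /etc/letsencrypt/renewal-hooks/deploy/eng-pad-server.sh (name is an assumption).

CERT_DEST="/srv/eng-pad-server/certs"
CONTAINER="eng-pad-server"

# Copy fullchain.pem and privkey.pem from a lineage directory, then restart
# the container so it loads the new files.
install_certs() {
    local lineage="$1" dest="$2"
    cp "$lineage/fullchain.pem" "$lineage/privkey.pem" "$dest/" || return 1
    docker restart "$CONTAINER"
}

# certbot sets RENEWED_LINEAGE when it invokes a deploy hook; act only on
# the certificate this service uses.
if [ "${RENEWED_LINEAGE:-}" = "/etc/letsencrypt/live/pad.metacircular.net" ]; then
    install_certs "$RENEWED_LINEAGE" "$CERT_DEST"
fi
```

The hook does nothing when run outside certbot, because `RENEWED_LINEAGE` is unset.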
Register a FIDO2/U2F Security Key
- Log in to the web UI at https://pad.metacircular.net with password.
- Navigate to /keys.
- Enter a name for the key (e.g., "YubiKey 5").
- Click "Register" and touch the key when prompted.
4. Alerting
No automated alerting is configured. Monitor via:
- Container health:
  docker ps | grep eng-pad-server
- Errors:
  docker logs eng-pad-server --since 1h 2>&1 | grep ERROR
- Backup age:
  ls -lt /srv/eng-pad-server/backups/ | head
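The backup-age check can also be run from cron instead of by eye. This is a rough sketch: `backup_is_fresh` is a hypothetical helper, the 24-hour default is an assumption rather than a documented policy, and it relies on the `-mmin` and `-maxdepth` options of GNU find.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if at least one backup is newer than a threshold.
# backup_is_fresh and the 24-hour default are assumptions for illustration.

BACKUP_DIR="/srv/eng-pad-server/backups"

backup_is_fresh() {
    local dir="$1" max_age_hours="${2:-24}"
    local recent
    # find -mmin -N matches files modified less than N minutes ago.
    recent="$(find "$dir" -maxdepth 1 -type f \
        -mmin "-$((max_age_hours * 60))" 2>/dev/null | head -n 1)"
    [ -n "$recent" ]
}

# Example:
#   backup_is_fresh "$BACKUP_DIR" 24 || echo "WARNING: no backup in the last 24h"
```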
5. Incident Procedures
Service Won't Start
- Check logs:
  docker logs eng-pad-server --tail 50
- Common causes:
  - Config file missing or invalid → fix /srv/eng-pad-server/eng-pad-server.toml
  - TLS cert/key missing → re-copy from Let's Encrypt (see Renew TLS above)
  - Port already in use → ss -tlnp | grep -E '8443|9443|8090'
  - Database locked → check for zombie processes: fuser /srv/eng-pad-server/eng-pad-server.db
Database Corruption
- Stop the container:
  docker stop eng-pad-server
- Check integrity:
  sqlite3 /srv/eng-pad-server/eng-pad-server.db "PRAGMA integrity_check"
- If corrupted, restore from backup:
  cp /srv/eng-pad-server/backups/eng-pad-server-LATEST.db /srv/eng-pad-server/eng-pad-server.db
- Restart:
  docker start eng-pad-server
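The integrity check can be wrapped so that a restore happens only when it fails. SQLite's `PRAGMA integrity_check` returns the single row `ok` when it finds no problems. This is a sketch, and `db_is_healthy` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: turn the integrity check into a yes/no answer.
# db_is_healthy is a hypothetical helper, not part of eng-pad-server.

DB="/srv/eng-pad-server/eng-pad-server.db"

db_is_healthy() {
    local db="$1" result
    # A failure to open or read the database also counts as unhealthy.
    result="$(sqlite3 "$db" "PRAGMA integrity_check")" || return 1
    # Any output other than the single row "ok" is a list of problems.
    [ "$result" = "ok" ]
}

# Example (run with the container stopped):
#   db_is_healthy "$DB" || echo "integrity check failed; restore from backup"
```

Before copying a backup over a damaged database, it may be worth saving the damaged file under another name so it can be inspected later.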
Certificate Expiry
- Check expiry:
  openssl x509 -in /srv/eng-pad-server/certs/fullchain.pem -noout -dates
- Renew via certbot (see "Renew TLS Certificates" above).
- Restart the container (picks up new certs on start).
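To catch an upcoming expiry before it becomes an incident, `openssl x509 -checkend` gives a scriptable answer. The sketch below uses a hypothetical helper, `cert_expires_within`, and the 14-day default is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: succeed if the certificate expires within the given number of days.
# cert_expires_within and the 14-day default are assumptions for illustration.

CERT="/srv/eng-pad-server/certs/fullchain.pem"

cert_expires_within() {
    local cert="$1" days="${2:-14}"
    # -checkend N exits 0 if the cert is still valid N seconds from now and
    # non-zero otherwise. A missing or unreadable file also exits non-zero,
    # so it is reported as "expiring" too.
    ! openssl x509 -in "$cert" -noout -checkend "$((days * 86400))" >/dev/null
}

# Example:
#   cert_expires_within "$CERT" 14 && echo "WARNING: cert expires within 14 days"
```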
Disk Full
- Check disk usage:
  df -h /srv/eng-pad-server/
  du -sh /srv/eng-pad-server/*
- Prune old backups (keeps the 7 newest):
  ls -t /srv/eng-pad-server/backups/ | tail -n +8 | xargs -I{} rm /srv/eng-pad-server/backups/{}
- Compact the database:
  sqlite3 /srv/eng-pad-server/eng-pad-server.db "VACUUM"
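The pruning one-liner can be generalized so the number of backups to keep is a parameter. This is a sketch with a hypothetical helper name, `prune_backups`; it assumes backup file names contain no newlines.

```shell
#!/usr/bin/env bash
# Sketch: keep the N newest files in a directory and delete the rest.
# prune_backups is a hypothetical helper; it assumes file names without newlines.

prune_backups() {
    local dir="$1" keep="${2:-7}"
    # ls -t sorts newest first; tail -n +K starts output at line K,
    # so K = keep + 1 skips the files that should be retained.
    ls -t "$dir" | tail -n "+$((keep + 1))" | while IFS= read -r name; do
        rm -- "$dir/$name"
    done
}

# Example: prune_backups /srv/eng-pad-server/backups 7
```

Running `ls -t /srv/eng-pad-server/backups/ | tail -n +8` on its own first shows what would be deleted.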
Sync Fails from Android App
- Verify the app has the correct server URL (pad.metacircular.net:9443).
- Use "Test Connection" in the app's sync settings for a specific error.
- Check gRPC port is open:
  ss -tlnp | grep 9443
- Check firewall (must be ALLOW):
  sudo ufw status | grep 9443
- Check TLS cert is valid:
  openssl x509 -in /srv/eng-pad-server/certs/fullchain.pem -noout -dates
- Check server logs for auth failures:
  docker logs eng-pad-server 2>&1 | grep -i error
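The on-disk certificate can be valid while the running server still presents an older one, since new certs are picked up only on restart. The sketch below reads the dates of the certificate the gRPC port actually serves. `served_cert_dates` is a hypothetical helper, and it should be run from a machine that can reach port 9443.

```shell
#!/usr/bin/env bash
# Sketch: print validity dates of the certificate presented on the gRPC port.
# served_cert_dates is a hypothetical helper, not part of eng-pad-server.

served_cert_dates() {
    local endpoint="${1:-pad.metacircular.net:9443}"
    local host="${endpoint%:*}"
    # </dev/null closes stdin so s_client exits after the handshake;
    # the presented certificate is piped to x509 for the notBefore/notAfter dates.
    openssl s_client -connect "$endpoint" -servername "$host" </dev/null 2>/dev/null \
        | openssl x509 -noout -dates
}

# Example: served_cert_dates pad.metacircular.net:9443
```

If these dates differ from those of /srv/eng-pad-server/certs/fullchain.pem, the container probably has not been restarted since the last renewal.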
6. Escalation
If the runbook doesn't resolve the issue:
- Check ARCHITECTURE.md for system design context.
- Check AUDIT.md for known security considerations.
- Review recent commits for changes that may have introduced the issue.