RUNBOOK.md — eng-pad-server
1. Service Overview
eng-pad-server receives engineering notebook data from the Engineering Pad Android app via gRPC, stores it in SQLite, and serves read-only views through a web UI. Single authenticated user.
Host: deimos.wntrmute.net
URL: https://pad.metacircular.net
Ports: 443 (nginx → 127.0.0.1:8090 on the host → 8080 web UI in the container), 8443 (REST/TLS), 9443 (gRPC/TLS)
Data: /srv/eng-pad-server/
Config: /srv/eng-pad-server/eng-pad-server.toml
TLS: Let's Encrypt (/etc/letsencrypt/live/pad.metacircular.net/), copied to /srv/eng-pad-server/certs/
Container: eng-pad-server (Docker, --restart unless-stopped)
2. Health Checks
- Check container is running:
docker ps | grep eng-pad-server
- Check web UI responds:
curl -s https://pad.metacircular.net/login | head -1
- Check container logs:
docker logs eng-pad-server --tail 20
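The first two checks can be folded into one pass/fail helper, for example to run from cron. This is a minimal sketch, not part of eng-pad-server: `check_health` is a hypothetical name, and it assumes the login page answers with HTTP 200 when the service is healthy.

```shell
#!/bin/sh
# Sketch: combine the container and web UI checks into one pass/fail result.
# check_health is a hypothetical helper; it is not shipped with eng-pad-server.

CONTAINER="eng-pad-server"
LOGIN_URL="https://pad.metacircular.net/login"

check_health() {
    # docker ps lists only running containers; --format prints just the names.
    if ! docker ps --format '{{.Names}}' | grep -qx "$CONTAINER"; then
        echo "FAIL: container $CONTAINER is not running"
        return 1
    fi

    # -o /dev/null discards the body; -w prints only the HTTP status code.
    # Assumption: a healthy login page returns 200.
    status=$(curl -s -o /dev/null -w '%{http_code}' "$LOGIN_URL")
    if [ "$status" != "200" ]; then
        echo "FAIL: $LOGIN_URL returned HTTP $status"
        return 1
    fi

    echo "OK: container running, login page returned 200"
}
```

Calling `check_health` prints one line and returns non-zero on failure, so it can gate a notification command.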
3. Common Operations
Start / Stop / Restart
docker start eng-pad-server
docker stop eng-pad-server
docker restart eng-pad-server
View Logs
docker logs eng-pad-server -f
Deploy New Version
# From local machine:
rsync -az --exclude='.git' --exclude='srv/' . deimos.wntrmute.net:/tmp/eng-pad-server-build/
ssh deimos.wntrmute.net "cd /tmp/eng-pad-server-build && \
docker build -t eng-pad-server . && \
docker stop eng-pad-server && docker rm eng-pad-server && \
docker run -d --name eng-pad-server --restart unless-stopped \
-p 127.0.0.1:8090:8080 -p 8443:8443 -p 9443:9443 \
-v /srv/eng-pad-server:/srv/eng-pad-server eng-pad-server"
Create User
docker exec -it eng-pad-server \
eng-pad-server init -c /srv/eng-pad-server/eng-pad-server.toml
Reset User Password
docker exec -it eng-pad-server \
eng-pad-server passwd <username> -c /srv/eng-pad-server/eng-pad-server.toml
Manual Backup
docker exec eng-pad-server \
eng-pad-server snapshot -c /srv/eng-pad-server/eng-pad-server.toml
Backup saved to /srv/eng-pad-server/backups/.
Renew TLS Certificates
After certbot renews the Let's Encrypt cert:
sudo cp /etc/letsencrypt/live/pad.metacircular.net/{fullchain,privkey}.pem \
/srv/eng-pad-server/certs/
docker restart eng-pad-server
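The copy-and-restart step could be wrapped so that the container restarts only when a certificate file actually changed. The sketch below is an illustration under assumptions: `sync_certs` is a hypothetical helper, and it must run with enough privileges to read the Let's Encrypt directory (the manual step above uses sudo).

```shell
#!/bin/sh
# Sketch: copy renewed certs and restart only when something changed.
# sync_certs is a hypothetical helper; the caller passes both directories.

sync_certs() {
    src_dir=$1   # e.g. /etc/letsencrypt/live/pad.metacircular.net
    dst_dir=$2   # e.g. /srv/eng-pad-server/certs
    changed=0

    for f in fullchain.pem privkey.pem; do
        # cmp -s exits non-zero when the files differ or the copy is missing.
        if ! cmp -s "$src_dir/$f" "$dst_dir/$f" 2>/dev/null; then
            cp "$src_dir/$f" "$dst_dir/$f" || return 1
            changed=1
        fi
    done

    if [ "$changed" -eq 1 ]; then
        docker restart eng-pad-server
    else
        echo "certificates unchanged; no restart needed"
    fi
}
```

A script that calls `sync_certs` with the two directories shown in the comments could be passed to certbot's `--deploy-hook` option so it runs after each successful renewal.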
Register a FIDO2/U2F Security Key
- Log in to the web UI at https://pad.metacircular.net with password.
- Navigate to /keys.
- Enter a name for the key (e.g., "YubiKey 5").
- Click "Register" and touch the key when prompted.
4. Alerting
No automated alerting is configured. Monitor via:
- Container health:
docker ps --filter name=eng-pad-server
- Errors in the last hour:
docker logs eng-pad-server --since 1h 2>&1 | grep ERROR
- Backup age:
ls -lt /srv/eng-pad-server/backups/ | head
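Since nothing alerts automatically, the backup-age check can be turned into a yes/no test. The following is a sketch only: `check_backup_age` is a hypothetical helper, the 24-hour limit in the usage note is an arbitrary example, and it relies on the `-mmin` test that GNU and BSD `find` both provide.

```shell
#!/bin/sh
# Sketch: report whether any backup is newer than a given number of hours.
# check_backup_age is a hypothetical helper; the threshold is the caller's choice.

check_backup_age() {
    dir=$1
    max_hours=$2

    # -mmin -N matches files modified less than N minutes ago.
    if find "$dir" -type f -mmin "-$((max_hours * 60))" | grep -q .; then
        echo "OK: found a backup newer than ${max_hours}h in $dir"
    else
        echo "STALE: no backup newer than ${max_hours}h in $dir"
        return 1
    fi
}
```

For example, `check_backup_age /srv/eng-pad-server/backups 24` returns non-zero when the newest backup is more than a day old or the directory is empty.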
5. Incident Procedures
Service Won't Start
- Check logs:
docker logs eng-pad-server --tail 50
- Common causes:
  - Config file missing or invalid → fix /srv/eng-pad-server/eng-pad-server.toml
  - TLS cert/key missing → re-copy from Let's Encrypt (see Renew TLS above)
  - Port already in use →
  ss -tlnp | grep -E '8443|9443|8090'
  - Database locked → check for zombie processes:
  fuser /srv/eng-pad-server/eng-pad-server.db
Database Corruption
- Stop the service:
docker stop eng-pad-server
- Check integrity:
sqlite3 /srv/eng-pad-server/eng-pad-server.db "PRAGMA integrity_check"
- If corrupted, restore from backup:
cp /srv/eng-pad-server/backups/eng-pad-server-LATEST.db /srv/eng-pad-server/eng-pad-server.db
chown engpad:engpad /srv/eng-pad-server/eng-pad-server.db
- Restart:
docker start eng-pad-server
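Before copying a backup over the live database, it may be worth running the same integrity check on the backup itself. The sketch below assumes only that `sqlite3` prints the single line `ok` for a healthy file, which is the documented behavior of `PRAGMA integrity_check`; `db_is_healthy` and `restore_backup` are hypothetical helper names, and the service should already be stopped.

```shell
#!/bin/sh
# Sketch: restore from a backup only if the backup passes integrity_check.
# Both function names are hypothetical; run this with the service stopped.

db_is_healthy() {
    # PRAGMA integrity_check prints the single line "ok" when no problems are found.
    result=$(sqlite3 "$1" "PRAGMA integrity_check")
    [ "$result" = "ok" ]
}

restore_backup() {
    backup=$1
    db=$2

    if ! db_is_healthy "$backup"; then
        echo "refusing to restore: $backup failed integrity_check"
        return 1
    fi

    # Keep the damaged file for later inspection instead of overwriting it.
    mv "$db" "$db.corrupt" && cp "$backup" "$db" || return 1
    echo "restored $db from $backup"
}
```

The ownership fix from the restore step still applies after `restore_backup` succeeds, and the `.corrupt` copy can be deleted once the service is confirmed healthy.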
Certificate Expiry
- Check expiry:
openssl x509 -in /srv/eng-pad-server/certs/fullchain.pem -noout -dates
- Regenerate or renew the certificate.
- Restart the service (picks up new certs on start).
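To catch expiry ahead of time rather than after the fact, `openssl x509` also offers `-checkend`, which exits 0 only if the certificate is still valid the given number of seconds from now. A minimal sketch, with `check_cert_expiry` as a hypothetical helper name and the warning window left to the caller:

```shell
#!/bin/sh
# Sketch: warn when a certificate expires within a given number of days.
# check_cert_expiry is a hypothetical helper name.

check_cert_expiry() {
    cert=$1
    days=$2

    # -checkend N exits 0 if the cert is still valid N seconds from now.
    # A missing or unreadable file also exits non-zero and lands in the WARN branch.
    if openssl x509 -in "$cert" -noout -checkend "$((days * 86400))" >/dev/null 2>&1; then
        echo "OK: $cert is valid for at least $days more days"
    else
        echo "WARN: $cert expires within $days days or could not be read"
        return 1
    fi
}
```

Pass the same certificate path used in the expiry check above, plus a window such as 14 days.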
Disk Full
- Check disk usage:
df -h /srv/eng-pad-server/
du -sh /srv/eng-pad-server/*
- Prune old backups (keeps the 7 newest):
ls -t /srv/eng-pad-server/backups/ | tail -n +8 | xargs -I{} rm /srv/eng-pad-server/backups/{}
- Compact the database:
sqlite3 /srv/eng-pad-server/eng-pad-server.db "VACUUM"
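The prune one-liner can be written as a small function that takes the retention count as a parameter and reports what it deletes. This is a sketch of the same `ls -t | tail` technique; `prune_backups` is a hypothetical name, and like the original it assumes backup file names contain no newlines.

```shell
#!/bin/sh
# Sketch: delete all but the newest N files in a backup directory.
# prune_backups is a hypothetical helper name.

prune_backups() {
    dir=$1
    keep=$2

    # ls -t sorts newest first; tail -n +K starts output at line K,
    # so everything after the first $keep entries is selected for removal.
    ls -t "$dir" | tail -n "+$((keep + 1))" | while IFS= read -r name; do
        rm -- "$dir/$name"
        echo "removed $name"
    done
}
```

`prune_backups /srv/eng-pad-server/backups 7` matches the behavior of the command above.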
Sync Fails from Android App
- Verify server is reachable from the device's network.
- Check gRPC port is open:
ss -tlnp | grep 9443
- Check TLS cert is valid and trusted by the device.
- Check credentials: verify the user exists via eng-pad-server status (run inside the container with docker exec).
- Check server logs for auth failures:
docker logs eng-pad-server 2>&1 | grep UNAUTHENTICATED
6. Escalation
If the runbook doesn't resolve the issue:
- Check ARCHITECTURE.md for system design context.
- Check AUDIT.md for known security considerations.
- Review recent commits for changes that may have introduced the issue.