eng-pad-server/RUNBOOK.md
Kyle Isom 691301dade Update docs for Docker-on-deimos deployment, add grpc_plain_addr option
- ARCHITECTURE.md: document nginx + direct gRPC topology, add
  grpc_plain_addr config, update cert filenames to Let's Encrypt
  convention, add passwd to CLI table
- RUNBOOK.md: replace systemctl/journalctl with docker commands,
  fix cert path references, improve sync troubleshooting steps
- Example config: update cert paths, document grpc_plain_addr option
- grpcserver: add optional plaintext gRPC listener for reverse proxy
- config: add GRPCPlainAddr field

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 08:58:01 -07:00

# RUNBOOK.md — eng-pad-server
## 1. Service Overview
eng-pad-server receives engineering notebook data from the Engineering
Pad Android app via gRPC, stores it in SQLite, and serves read-only
views through a web UI. Single authenticated user.
**Host**: deimos.wntrmute.net
**URL**: https://pad.metacircular.net
**Ports**: 443 (nginx → host 127.0.0.1:8090 → container port 8080, web UI), 8443 (REST/TLS), 9443 (gRPC/TLS)
**Data**: `/srv/eng-pad-server/`
**Config**: `/srv/eng-pad-server/eng-pad-server.toml`
**TLS**: Let's Encrypt (`/etc/letsencrypt/live/pad.metacircular.net/`), copied to `/srv/eng-pad-server/certs/`
**Container**: `eng-pad-server` (Docker, `--restart unless-stopped`)
## 2. Health Checks
1. Check container is running:
```
docker ps | grep eng-pad-server
```
2. Check web UI responds:
```
curl -s https://pad.metacircular.net/login | head -1
```
3. Check container logs:
```
docker logs eng-pad-server --tail 20
```
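The checks above can be combined into a single pass/fail script. This is a minimal sketch, assuming `docker` and `curl` are available on the host; the helper names `container_running`, `http_ok`, and `health_check` are illustrative and not part of eng-pad-server.

```bash
# Hypothetical helpers; the names are illustrative only.

# Succeeds only if Docker reports the named container as running.
container_running() {
    [ "$(docker inspect -f '{{.State.Running}}' "$1" 2>/dev/null)" = "true" ]
}

# Succeeds only if the URL answers without an HTTP error status.
http_ok() {
    curl -fsS -o /dev/null --max-time 10 "$1"
}

# Runs both checks, prints one line per result, and
# returns non-zero if either check failed.
health_check() {
    hc_status=0
    if container_running "$1"; then
        echo "container: ok"
    else
        echo "container: NOT RUNNING"
        hc_status=1
    fi
    if http_ok "$2"; then
        echo "web ui: ok"
    else
        echo "web ui: FAILED"
        hc_status=1
    fi
    return "$hc_status"
}
```

For example: `health_check eng-pad-server https://pad.metacircular.net/login`. The exit status makes it usable from cron or a deploy script.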
## 3. Common Operations
### Start / Stop / Restart
```
docker start eng-pad-server
docker stop eng-pad-server
docker restart eng-pad-server
```
### View Logs
```
docker logs eng-pad-server -f
```
### Deploy New Version
```bash
# From local machine:
rsync -az --exclude='.git' --exclude='srv/' . deimos.wntrmute.net:/tmp/eng-pad-server-build/
ssh deimos.wntrmute.net "cd /tmp/eng-pad-server-build && \
docker build -t eng-pad-server . && \
docker stop eng-pad-server && docker rm eng-pad-server && \
docker run -d --name eng-pad-server --restart unless-stopped \
-p 127.0.0.1:8090:8080 -p 8443:8443 -p 9443:9443 \
-v /srv/eng-pad-server:/srv/eng-pad-server eng-pad-server"
```
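The deploy command returns as soon as `docker run` has started the container, which does not by itself show that the server is answering requests. A small retry helper can confirm that before the deploy is considered done. This is a sketch; `retry` is a hypothetical helper, not something shipped with eng-pad-server.

```bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS tries have been made.
retry() {
    retry_attempts=$1
    retry_delay=$2
    shift 2
    retry_n=1
    while ! "$@"; do
        if [ "$retry_n" -ge "$retry_attempts" ]; then
            echo "giving up after $retry_n attempts: $*" >&2
            return 1
        fi
        retry_n=$((retry_n + 1))
        sleep "$retry_delay"
    done
    return 0
}
```

After a deploy, something like `retry 10 3 curl -fsS -o /dev/null https://pad.metacircular.net/login` would wait up to roughly 30 seconds for the login page to respond.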
### Create User
```
docker exec -it eng-pad-server \
eng-pad-server init -c /srv/eng-pad-server/eng-pad-server.toml
```
### Reset User Password
```
docker exec -it eng-pad-server \
eng-pad-server passwd <username> -c /srv/eng-pad-server/eng-pad-server.toml
```
### Manual Backup
```
docker exec eng-pad-server \
eng-pad-server snapshot -c /srv/eng-pad-server/eng-pad-server.toml
```
Backup saved to `/srv/eng-pad-server/backups/`.
### Renew TLS Certificates
After certbot renews the Let's Encrypt cert:
```
sudo cp /etc/letsencrypt/live/pad.metacircular.net/{fullchain,privkey}.pem \
/srv/eng-pad-server/certs/
docker restart eng-pad-server
```
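Certbot runs executable scripts placed in `/etc/letsencrypt/renewal-hooks/deploy/` after each successful renewal and sets `RENEWED_LINEAGE` to the renewed certificate's `live/` directory. The copy-and-restart step above could be automated with a hook along these lines. This is a sketch only: the hook file name and the `copy_certs` helper are assumptions, not part of the current deployment.

```bash
# Hypothetical deploy hook, e.g. saved as
# /etc/letsencrypt/renewal-hooks/deploy/eng-pad-server.sh (name is an assumption).

# Copies the renewed chain and key from a certbot lineage directory into DEST.
copy_certs() {
    cc_src=$1
    cc_dest=$2
    for cc_f in fullchain.pem privkey.pem; do
        # Plain cp follows the symlinks certbot keeps under live/.
        cp "$cc_src/$cc_f" "$cc_dest/$cc_f" || return 1
    done
}

# certbot sets RENEWED_LINEAGE when it runs deploy hooks; do nothing otherwise.
if [ -n "${RENEWED_LINEAGE:-}" ]; then
    copy_certs "$RENEWED_LINEAGE" /srv/eng-pad-server/certs &&
        docker restart eng-pad-server
fi
```

The restart only happens if both files were copied, so a failed copy leaves the running container on its current certificate.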
### Register a FIDO2/U2F Security Key
1. Log in to the web UI at https://pad.metacircular.net with your password.
2. Navigate to `/keys`.
3. Enter a name for the key (e.g., "YubiKey 5").
4. Click "Register" and touch the key when prompted.
## 4. Alerting
No automated alerting is configured. Monitor via:
- `docker ps | grep eng-pad-server` — container health
- `docker logs eng-pad-server --since 1h 2>&1 | grep ERROR` — errors
- Backup age: `ls -lt /srv/eng-pad-server/backups/ | head`
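The backup-age item above can be turned into a pass/fail check, for example from cron. This is a minimal sketch, assuming a `find` that supports `-mmin` (GNU and BSD both do); `backup_is_fresh` is a hypothetical helper, and the 48-hour threshold in the usage line is an arbitrary example since the backup schedule is not stated here.

```bash
# backup_is_fresh DIR MAX_AGE_HOURS
# Succeeds if DIR holds at least one regular file modified
# less than MAX_AGE_HOURS hours ago.
backup_is_fresh() {
    [ -n "$(find "$1" -type f -mmin "-$(( $2 * 60 ))" 2>/dev/null | head -n 1)" ]
}
```

For example: `backup_is_fresh /srv/eng-pad-server/backups 48 || echo "eng-pad-server: no recent backup" >&2`.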
## 5. Incident Procedures
### Service Won't Start
1. Check logs:
```
docker logs eng-pad-server --tail 50
```
2. Common causes:
- Config file missing or invalid → fix `/srv/eng-pad-server/eng-pad-server.toml`
- TLS cert/key missing → re-copy from Let's Encrypt (see "Renew TLS Certificates" above)
- Port already in use → `ss -tlnp | grep -E '8443|9443|8090'`
- Database locked → check for stray processes still holding the file: `fuser /srv/eng-pad-server/eng-pad-server.db`
### Database Corruption
1. Stop the container:
```
docker stop eng-pad-server
```
2. Check integrity:
```
sqlite3 /srv/eng-pad-server/eng-pad-server.db "PRAGMA integrity_check"
```
3. If corrupted, restore from backup:
```
cp /srv/eng-pad-server/backups/eng-pad-server-LATEST.db /srv/eng-pad-server/eng-pad-server.db
```
4. Restart:
```
docker start eng-pad-server
```
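The restore step above overwrites the damaged file, which loses any chance of inspecting it later. A slightly safer variant moves it aside first and picks the newest backup by modification time. This is a sketch under assumptions: that backups are `*.db` files in the backups directory, and that the helper names `newest_backup` and `restore_db` are free to use.

```bash
# newest_backup DIR: prints the most recently modified *.db file in DIR.
# Assumes backup file names contain no newlines.
newest_backup() {
    ls -t "$1"/*.db 2>/dev/null | head -n 1
}

# restore_db BACKUP_DIR DB_PATH
# Moves the damaged database aside (kept for later inspection) and copies
# the newest backup into place. Run only while the container is stopped.
restore_db() {
    rd_src=$(newest_backup "$1")
    [ -n "$rd_src" ] || { echo "no backup found in $1" >&2; return 1; }
    mv "$2" "$2.corrupt.$(date +%Y%m%d%H%M%S)" || return 1
    cp "$rd_src" "$2"
}
```

For example: `restore_db /srv/eng-pad-server/backups /srv/eng-pad-server/eng-pad-server.db`, followed by the integrity check from step 2 against the restored file. If the database runs in WAL mode (not stated in this runbook), any `-wal` and `-shm` files next to it would need to be moved aside as well.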
### Certificate Expiry
1. Check expiry:
```
openssl x509 -in /srv/eng-pad-server/certs/fullchain.pem -noout -dates
```
2. Renew via certbot, then copy the new files into place (see "Renew TLS Certificates" above).
3. Restart the container if the copy step did not already do so; it picks up new certs only on start.
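Reading the dates by eye works during an incident, but `openssl x509 -checkend` gives a yes/no answer that can be scripted ahead of time. This is a minimal sketch; `cert_valid_for` is a hypothetical helper and the 14-day threshold in the usage line is an arbitrary example.

```bash
# cert_valid_for CERT_FILE DAYS
# Succeeds if the certificate will still be valid DAYS days from now.
cert_valid_for() {
    # -checkend takes seconds and exits non-zero if the cert expires within them.
    openssl x509 -in "$1" -noout -checkend "$(( $2 * 86400 ))" >/dev/null
}
```

For example: `cert_valid_for /srv/eng-pad-server/certs/fullchain.pem 14 || echo "cert expires within 14 days" >&2`.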
### Disk Full
1. Check disk usage:
```
df -h /srv/eng-pad-server/
du -sh /srv/eng-pad-server/*
```
2. Prune old backups, keeping the seven newest:
```
ls -t /srv/eng-pad-server/backups/ | tail -n +8 | xargs -I{} rm /srv/eng-pad-server/backups/{}
```
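3. Compact the database (`VACUUM` builds a temporary copy, so it needs free space up to about the size of the database; prune backups first):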
```
sqlite3 /srv/eng-pad-server/eng-pad-server.db "VACUUM"
```
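The pruning step above hard-codes the number of backups to keep. A parameterized version is easier to reuse and to run from cron. This is a sketch, assuming backup file names contain no newlines; `prune_backups` is a hypothetical helper.

```bash
# prune_backups DIR KEEP
# Deletes all but the KEEP most recently modified entries in DIR.
# Assumes file names contain no newlines.
prune_backups() {
    ls -t "$1" | tail -n "+$(( $2 + 1 ))" | while IFS= read -r pb_name; do
        rm -- "$1/$pb_name"
    done
}
```

`prune_backups /srv/eng-pad-server/backups 7` matches the one-liner in step 2. Reading names line by line also copes with spaces and quotes in file names, which `xargs -I{}` does not handle in every case.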
### Sync Fails from Android App
1. Verify the app has the correct server address (`pad.metacircular.net:9443`).
2. Use "Test Connection" in the app's sync settings for a specific error.
3. Check gRPC port is open: `ss -tlnp | grep 9443`
4. Check firewall: `sudo ufw status | grep 9443` (must be ALLOW).
5. Check TLS cert is valid: `openssl x509 -in /srv/eng-pad-server/certs/fullchain.pem -noout -dates`
6. Check server logs for errors, including auth failures: `docker logs eng-pad-server 2>&1 | grep -i error`
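Step 5 inspects the certificate file on disk, but the container only picks up new certs on start, so the certificate it actually serves can be older than the file. Checking what the gRPC port presents closes that gap. This is a minimal sketch; `served_cert_dates` is a hypothetical helper, and the `-alpn h2` flag is included on the assumption that the gRPC listener expects HTTP/2 negotiation.

```bash
# served_cert_dates HOST PORT
# Prints notBefore/notAfter of the certificate the server presents on HOST:PORT.
served_cert_dates() {
    # </dev/null makes s_client exit once the handshake output is printed.
    openssl s_client -connect "$1:$2" -servername "$1" -alpn h2 </dev/null 2>/dev/null |
        openssl x509 -noout -dates
}
```

For example, run `served_cert_dates pad.metacircular.net 9443` from a machine outside the firewall and compare the result with the dates from step 5. A mismatch suggests the container needs a restart.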
## 6. Escalation
If the runbook doesn't resolve the issue:
1. Check ARCHITECTURE.md for system design context.
2. Check AUDIT.md for known security considerations.
3. Review recent commits for changes that may have introduced the issue.