eng-pad-server/RUNBOOK.md
Kyle Isom 691301dade Update docs for Docker-on-deimos deployment, add grpc_plain_addr option
- ARCHITECTURE.md: document nginx + direct gRPC topology, add
  grpc_plain_addr config, update cert filenames to Let's Encrypt
  convention, add passwd to CLI table
- RUNBOOK.md: replace systemctl/journalctl with docker commands,
  fix cert path references, improve sync troubleshooting steps
- Example config: update cert paths, document grpc_plain_addr option
- grpcserver: add optional plaintext gRPC listener for reverse proxy
- config: add GRPCPlainAddr field

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 08:58:01 -07:00

RUNBOOK.md — eng-pad-server

1. Service Overview

eng-pad-server receives engineering notebook data from the Engineering Pad Android app via gRPC, stores it in SQLite, and serves read-only views through a web UI. Single authenticated user.

  • Host: deimos.wntrmute.net
  • URL: https://pad.metacircular.net
  • Ports: 443 (nginx → 127.0.0.1:8090 on the host → 8080 web UI in the container), 8443 (REST/TLS), 9443 (gRPC/TLS)
  • Data: /srv/eng-pad-server/
  • Config: /srv/eng-pad-server/eng-pad-server.toml
  • TLS: Let's Encrypt (/etc/letsencrypt/live/pad.metacircular.net/), copied to /srv/eng-pad-server/certs/
  • Container: eng-pad-server (Docker, --restart unless-stopped)

2. Health Checks

  1. Check container is running:

    docker ps | grep eng-pad-server
    
  2. Check web UI responds:

    curl -s https://pad.metacircular.net/login | head -1
    
  3. Check container logs:

    docker logs eng-pad-server --tail 20
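
The three checks above can be combined into one function. This is a minimal sketch, not part of the shipped tooling: the function name health_check and the CONTAINER and URL variables are assumptions introduced here, with defaults taken from this runbook.

```shell
#!/bin/sh
# Sketch: run the manual health checks in sequence and stop at the first failure.
# CONTAINER and URL are hypothetical variables; defaults come from this runbook.
CONTAINER="${CONTAINER:-eng-pad-server}"
URL="${URL:-https://pad.metacircular.net/login}"

health_check() {
    # 1. The container must be in the "running" state.
    state=$(docker inspect -f '{{.State.Status}}' "$CONTAINER" 2>/dev/null)
    if [ "$state" != "running" ]; then
        echo "FAIL: container $CONTAINER is '${state:-missing}'"
        return 1
    fi
    # 2. The web UI must answer; -f makes curl fail on HTTP status >= 400.
    if ! curl -fsS -o /dev/null --max-time 10 "$URL"; then
        echo "FAIL: $URL did not respond successfully"
        return 1
    fi
    # 3. Show the most recent log lines for a quick visual check.
    docker logs "$CONTAINER" --tail 5 2>&1
    echo "OK"
}
```

Calling health_check prints OK and returns 0 only when the container is running and the login page responds, so it can be used in a cron job or before and after a deploy.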
    

3. Common Operations

Start / Stop / Restart

docker start eng-pad-server
docker stop eng-pad-server
docker restart eng-pad-server

View Logs

docker logs eng-pad-server -f

Deploy New Version

# From local machine:
rsync -az --exclude='.git' --exclude='srv/' . deimos.wntrmute.net:/tmp/eng-pad-server-build/
ssh deimos.wntrmute.net "cd /tmp/eng-pad-server-build && \
  docker build -t eng-pad-server . && \
  docker stop eng-pad-server && docker rm eng-pad-server && \
  docker run -d --name eng-pad-server --restart unless-stopped \
    -p 127.0.0.1:8090:8080 -p 8443:8443 -p 9443:9443 \
    -v /srv/eng-pad-server:/srv/eng-pad-server eng-pad-server"

Create User

docker exec -it eng-pad-server \
  eng-pad-server init -c /srv/eng-pad-server/eng-pad-server.toml

Reset User Password

docker exec -it eng-pad-server \
  eng-pad-server passwd <username> -c /srv/eng-pad-server/eng-pad-server.toml

Manual Backup

docker exec eng-pad-server \
  eng-pad-server snapshot -c /srv/eng-pad-server/eng-pad-server.toml

Backup saved to /srv/eng-pad-server/backups/.

Renew TLS Certificates

After certbot renews the Let's Encrypt cert:

sudo cp /etc/letsencrypt/live/pad.metacircular.net/{fullchain,privkey}.pem \
  /srv/eng-pad-server/certs/
docker restart eng-pad-server
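
The copy step could be automated with a certbot deploy hook, which certbot runs after a successful renewal with RENEWED_LINEAGE set to the live directory. The following is a sketch under assumptions: the function name install_certs, the CERT_DIR variable, and the idea of installing it as a hook are not part of the current deployment.

```shell
#!/bin/sh
# Sketch of a certbot deploy hook body (hypothetical; not currently installed).
# RENEWED_LINEAGE is set by certbot for deploy hooks; the fallback path is the
# one used in this runbook. CERT_DIR is an assumed variable name.
LIVE_DIR="${RENEWED_LINEAGE:-/etc/letsencrypt/live/pad.metacircular.net}"
CERT_DIR="${CERT_DIR:-/srv/eng-pad-server/certs}"

install_certs() {
    for f in fullchain.pem privkey.pem; do
        # Refuse to continue if a source file is missing or empty.
        [ -s "$LIVE_DIR/$f" ] || { echo "missing: $LIVE_DIR/$f" >&2; return 1; }
        # cp follows the symlinks in live/ and writes regular files.
        cp "$LIVE_DIR/$f" "$CERT_DIR/$f" || return 1
    done
}
```

A hook script built on this would end with `install_certs && docker restart eng-pad-server`, so the container is only restarted when both files were copied.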

Register a FIDO2/U2F Security Key

  1. Log in to the web UI at https://pad.metacircular.net with your password.
  2. Navigate to /keys.
  3. Enter a name for the key (e.g., "YubiKey 5").
  4. Click "Register" and touch the key when prompted.

4. Alerting

No automated alerting is configured. Monitor via:

  • docker ps | grep eng-pad-server — container health
  • docker logs eng-pad-server --since 1h 2>&1 | grep ERROR — errors
  • Backup age: ls -lt /srv/eng-pad-server/backups/ | head
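
Since nothing alerts on stale backups, the backup-age check can be turned into a pass/fail test. This is a sketch only: the function name backup_is_fresh is hypothetical, and the 26-hour default is an assumed threshold because this runbook does not state a backup schedule.

```shell
#!/bin/sh
# Sketch: succeed only if at least one file in the backup directory was
# modified within the last max_min minutes.
# The 1560-minute (26-hour) default is an assumption, not a documented schedule.
backup_is_fresh() {
    dir="${1:-/srv/eng-pad-server/backups}"
    max_min="${2:-1560}"
    # find prints files modified less than max_min minutes ago;
    # grep -q . succeeds if find printed anything at all.
    find "$dir" -type f -mmin "-$max_min" 2>/dev/null | grep -q .
}
```

For example, `backup_is_fresh || echo "no recent backup"` could run from cron and mail its output.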

5. Incident Procedures

Service Won't Start

  1. Check logs:
    docker logs eng-pad-server --tail 50
    
  2. Common causes:
    • Config file missing or invalid → fix /srv/eng-pad-server/eng-pad-server.toml
    • TLS cert/key missing → re-copy from Let's Encrypt (see "Renew TLS Certificates" above)
    • Port already in use → ss -tlnp | grep -E '8443|9443|8090'
    • Database locked → check for stray processes still holding the file: fuser /srv/eng-pad-server/eng-pad-server.db

Database Corruption

  1. Stop the container:
    docker stop eng-pad-server
    
  2. Check integrity:
    sqlite3 /srv/eng-pad-server/eng-pad-server.db "PRAGMA integrity_check"
    
  3. If corrupted, restore from backup:
    cp /srv/eng-pad-server/backups/eng-pad-server-LATEST.db /srv/eng-pad-server/eng-pad-server.db
    
  4. Restart:
    docker start eng-pad-server
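
When restoring, it helps to pick the newest snapshot mechanically and verify it before copying it over the live database. The helper below is a sketch: the name latest_backup is hypothetical, and it assumes snapshot files end in .db and contain no newlines in their names.

```shell
#!/bin/sh
# Sketch: print the path of the most recently modified *.db file in a
# backup directory, or fail if there is none.
# Assumes snapshots end in .db (an assumption based on the example above).
latest_backup() (
    dir="${1:-/srv/eng-pad-server/backups}"
    cd "$dir" 2>/dev/null || exit 1
    # ls -t sorts newest first; an unmatched glob makes ls fail, leaving newest empty.
    newest=$(ls -t -- *.db 2>/dev/null | head -n 1)
    [ -n "$newest" ] || exit 1
    printf '%s/%s\n' "$dir" "$newest"
)
```

With the container stopped, running `sqlite3 "$(latest_backup)" "PRAGMA integrity_check"` should print ok before the file is copied into place in step 3.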
    

Certificate Expiry

  1. Check expiry:
    openssl x509 -in /srv/eng-pad-server/certs/fullchain.pem -noout -dates
    
  2. Renew via certbot, then copy the renewed files into /srv/eng-pad-server/certs/ (see "Renew TLS Certificates" above).
  3. Restart the container (picks up new certs on start).
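
To catch expiry before it becomes an incident, the date check can be expressed as a yes/no test using the -checkend option of openssl x509. This is a sketch: the function name cert_expires_within is hypothetical and the 14-day default is an assumed threshold.

```shell
#!/bin/sh
# Sketch: return 0 if the certificate expires within N days, 1 otherwise.
# The 14-day default is an assumption, not a documented policy.
cert_expires_within() {
    cert="${1:-/srv/eng-pad-server/certs/fullchain.pem}"
    days="${2:-14}"
    # -checkend exits 0 if the cert is still valid after the given number of
    # seconds, and non-zero if it will have expired (or cannot be read).
    if openssl x509 -in "$cert" -noout -checkend "$((days * 86400))" >/dev/null 2>&1; then
        return 1   # not expiring within the window
    fi
    return 0       # expiring soon, already expired, or unreadable
}
```

Note that a missing or unreadable file is reported the same way as an expiring certificate, which is usually the safer default for a warning.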

Disk Full

  1. Check disk usage:
    df -h /srv/eng-pad-server/
    du -sh /srv/eng-pad-server/*
    
  2. Prune old backups (keeps the 7 most recent):
    ls -t /srv/eng-pad-server/backups/ | tail -n +8 | xargs -I{} rm /srv/eng-pad-server/backups/{}
    
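  3. Compact the database (VACUUM builds a temporary copy, so it needs free space roughly equal to the database size; prune backups first):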
    sqlite3 /srv/eng-pad-server/eng-pad-server.db "VACUUM"
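
The pruning one-liner in step 2 can be wrapped in a function with an adjustable retention count. This is a sketch; the name prune_backups is hypothetical, and it assumes backup file names contain no newlines.

```shell
#!/bin/sh
# Sketch: delete all but the `keep` newest regular files in a backup directory.
# Runs in a subshell so the cd does not affect the caller.
prune_backups() (
    dir="${1:-/srv/eng-pad-server/backups}"
    keep="${2:-7}"
    cd "$dir" || exit 1
    # ls -t sorts newest first; tail -n +K starts output at entry K,
    # so the first $keep entries are skipped and the rest are removed.
    ls -t | tail -n "+$((keep + 1))" | while IFS= read -r f; do
        if [ -f "$f" ]; then
            rm -- "$f"
        fi
    done
)
```

Called without arguments it matches the behaviour of step 2; `prune_backups /srv/eng-pad-server/backups 3` would keep only the three newest files when space is very tight.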
    

Sync Fails from Android App

  1. Verify the app has the correct server address (pad.metacircular.net:9443).
  2. Use "Test Connection" in the app's sync settings for a specific error.
  3. Check gRPC port is open: ss -tlnp | grep 9443
  4. Check firewall: sudo ufw status | grep 9443 (must be ALLOW).
  5. Check TLS cert is valid: openssl x509 -in /srv/eng-pad-server/certs/fullchain.pem -noout -dates
  6. Check server logs for auth failures: docker logs eng-pad-server 2>&1 | grep -i error
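
Step 5 inspects the certificate file on disk. Because the container only picks up new certificates on start, the certificate actually served on the gRPC port can differ. The sketch below reads the dates from the live listener instead; the function name served_cert_dates is hypothetical.

```shell
#!/bin/sh
# Sketch: print notBefore/notAfter for the certificate served on a TLS port.
served_cert_dates() {
    host="${1:-pad.metacircular.net}"
    port="${2:-9443}"
    # </dev/null makes s_client exit after the handshake instead of waiting
    # for input; the second openssl parses the certificate it printed.
    openssl s_client -connect "$host:$port" -servername "$host" </dev/null 2>/dev/null \
        | openssl x509 -noout -dates
}
```

If the dates printed here are older than those from the file on disk, restarting the container should bring the two back in line.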

6. Escalation

If the runbook doesn't resolve the issue:

  1. Check ARCHITECTURE.md for system design context.
  2. Check AUDIT.md for known security considerations.
  3. Review recent commits for changes that may have introduced the issue.