eng-pad-server/RUNBOOK.md
Kyle Isom da148a577d Add docker-compose, RUNBOOK.md, and docker Makefile target
- docker-compose.yml: single service with data volume, ports 8443/9443/8080
- RUNBOOK.md: health checks, common operations (start/stop/backup/init),
  FIDO2 key registration, incident procedures (won't start, DB corruption,
  cert expiry, disk full, sync failures), escalation path
- Makefile: added docker target

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 21:36:27 -07:00

RUNBOOK.md — eng-pad-server

1. Service Overview

eng-pad-server receives engineering notebook data from the Engineering Pad Android app via gRPC, stores it in SQLite, and serves read-only views through a web UI. It supports a single authenticated user.

  • Ports: 8443 (REST/HTTPS), 9443 (gRPC/TLS), 8080 (Web UI)
  • Data: /srv/eng-pad-server/
  • Config: /srv/eng-pad-server/eng-pad-server.toml
  • Binary: /usr/local/bin/eng-pad-server

2. Health Checks

  1. Check service is running:

    systemctl status eng-pad-server
    
  2. Check database health:

    eng-pad-server status -c /srv/eng-pad-server/eng-pad-server.toml
    
  3. Check web UI responds:

    curl -k https://localhost:8443/login
    
  4. Check gRPC responds:

    grpcurl -insecure localhost:9443 list
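
The four checks above can be wrapped in one script that prints a line per check. This is a minimal sketch, not part of the shipped tooling: run_check and health_check_all are hypothetical helper names, and it assumes that eng-pad-server status exits non-zero on failure and that /login returns a 2xx response when the server is healthy.

```shell
#!/bin/sh
# Sketch only: wraps the health checks from this section.
# Assumption: each command exits non-zero when its check fails.

CONFIG=/srv/eng-pad-server/eng-pad-server.toml

# run_check NAME COMMAND [ARGS...]: run a command quietly, report OK or FAIL.
run_check() {
    name=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "OK   $name"
    else
        echo "FAIL $name"
        return 1
    fi
}

# health_check_all: run every check; return non-zero if any of them failed.
health_check_all() {
    failed=0
    run_check "service"  systemctl is-active --quiet eng-pad-server || failed=1
    run_check "database" eng-pad-server status -c "$CONFIG"         || failed=1
    run_check "https"    curl -ksf https://localhost:8443/login      || failed=1
    run_check "grpc"     grpcurl -insecure localhost:9443 list       || failed=1
    return "$failed"
}
```

Call health_check_all after sourcing the file. Its exit status makes it usable from cron or another monitor.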
    

3. Common Operations

Start / Stop / Restart

systemctl start eng-pad-server
systemctl stop eng-pad-server
systemctl restart eng-pad-server

View Logs

journalctl -u eng-pad-server -f

Manual Backup

eng-pad-server snapshot -c /srv/eng-pad-server/eng-pad-server.toml

The backup is saved to /srv/eng-pad-server/backups/.

Check Backup Timer

systemctl list-timers eng-pad-server-backup.timer

Initialize (First Time)

  1. Install the binary and config:

    sudo deploy/scripts/install.sh
    
  2. Edit the config file:

    sudo -u engpad vi /srv/eng-pad-server/eng-pad-server.toml
    
  3. Generate TLS certificates (or copy existing ones):

    # Self-signed for development:
    openssl req -x509 -newkey ec -pkeyopt ec_paramgen_curve:prime256v1 \
      -keyout /srv/eng-pad-server/certs/key.pem \
      -out /srv/eng-pad-server/certs/cert.pem \
      -days 3650 -nodes -subj '/CN=pad.metacircular.net'
    chown engpad:engpad /srv/eng-pad-server/certs/*.pem
    chmod 600 /srv/eng-pad-server/certs/key.pem
    
  4. Create the admin user:

    eng-pad-server init -c /srv/eng-pad-server/eng-pad-server.toml
    
  5. Start the service:

    systemctl enable --now eng-pad-server
    systemctl enable --now eng-pad-server-backup.timer
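
After step 3 it can be worth confirming that the private key really ended up with mode 600. A small sketch, assuming GNU stat (as found on typical systemd hosts); check_mode is a hypothetical helper name.

```shell
# check_mode FILE EXPECTED: succeed only if FILE has the expected octal mode.
# Assumption: GNU coreutils stat, which supports -c %a.
check_mode() {
    actual=$(stat -c %a "$1" 2>/dev/null) || { echo "missing: $1"; return 1; }
    if [ "$actual" = "$2" ]; then
        echo "ok: $1 ($actual)"
    else
        echo "wrong mode: $1 is $actual, expected $2"
        return 1
    fi
}
```

For example: check_mode /srv/eng-pad-server/certs/key.pem 600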
    

Register a FIDO2/U2F Security Key

  1. Log in to the web UI with your password.
  2. Navigate to /keys.
  3. Enter a name for the key (e.g., "YubiKey 5").
  4. Click "Register" and touch the key when prompted.

Docker Deployment

cd deploy/docker
docker compose up -d

First-time setup inside the container:

docker compose exec eng-pad-server eng-pad-server init -c /srv/eng-pad-server/eng-pad-server.toml

4. Alerting

No automated alerting is configured. Monitor via:

  • systemctl status eng-pad-server — process health
  • journalctl -u eng-pad-server --since "1 hour ago" | grep ERROR — errors
  • Backup age: ls -lt /srv/eng-pad-server/backups/ | head
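
The backup-age check can be made mechanical instead of reading ls output by eye. The sketch below is an illustration only: backup_is_fresh is a hypothetical helper, and the 26-hour threshold in the usage line assumes the backup timer runs daily, which this runbook does not state.

```shell
# backup_is_fresh DIR HOURS: succeed if DIR contains a regular file that was
# modified within the last HOURS hours.
backup_is_fresh() {
    recent=$(find "$1" -maxdepth 1 -type f -mmin "-$(( $2 * 60 ))" 2>/dev/null | head -n 1)
    [ -n "$recent" ]
}
```

For example: backup_is_fresh /srv/eng-pad-server/backups 26 || echo "WARNING: no recent backup"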

5. Incident Procedures

Service Won't Start

  1. Check logs:
    journalctl -u eng-pad-server -n 50 --no-pager
    
  2. Common causes:
    • Config file missing or invalid → fix config
    • TLS cert/key missing → regenerate or copy
    • Port already in use → ss -tlnp | grep -E '8443|9443|8080'
    • Database locked → check for zombie processes: fuser /srv/eng-pad-server/eng-pad-server.db
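
To see which of the three service ports are already bound, the output of ss can be filtered. A sketch, assuming the usual ss -tln column layout where the fourth field is the local address and port; listening_on is a hypothetical helper name.

```shell
# listening_on PORT...: read `ss -tln` output on stdin and print each of the
# given ports that appears as a listening local port.
listening_on() {
    awk -v want="$*" '
        BEGIN { n = split(want, w, " "); for (i = 1; i <= n; i++) ports[w[i]] = 1 }
        $1 == "LISTEN" {
            k = split($4, a, ":")    # the port follows the last colon of field 4
            if (a[k] in ports) print a[k]
        }' | sort -u
}
```

For example: ss -tln | listening_on 8443 9443 8080. Any port printed while the service is stopped is held by another process.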

Database Corruption

  1. Stop the service:
    systemctl stop eng-pad-server
    
  2. Check integrity (the command prints ok if the database is healthy):
    sqlite3 /srv/eng-pad-server/eng-pad-server.db "PRAGMA integrity_check"
    
  3. If corrupted, restore from backup:
    cp /srv/eng-pad-server/backups/eng-pad-server-LATEST.db /srv/eng-pad-server/eng-pad-server.db
    chown engpad:engpad /srv/eng-pad-server/eng-pad-server.db
    
  4. Restart:
    systemctl start eng-pad-server
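
The restore step can be scripted so that the backup itself is verified before it overwrites the live database. This is a sketch under assumptions: latest_backup and restore_latest are hypothetical helpers, backup files are assumed to end in .db, and file names are assumed not to contain newlines.

```shell
# latest_backup DIR: print the most recently modified *.db file in DIR.
latest_backup() {
    ls -t "$1"/*.db 2>/dev/null | head -n 1
}

# restore_latest BACKUP_DIR DB_PATH: verify the newest backup with
# PRAGMA integrity_check, then copy it over DB_PATH.
# Run this only while the service is stopped.
restore_latest() {
    src=$(latest_backup "$1")
    [ -n "$src" ] || { echo "no backups found in $1" >&2; return 1; }
    result=$(sqlite3 "$src" "PRAGMA integrity_check") || return 1
    [ "$result" = "ok" ] || { echo "backup $src failed integrity check" >&2; return 1; }
    cp "$src" "$2"
}
```

For example: restore_latest /srv/eng-pad-server/backups /srv/eng-pad-server/eng-pad-server.db, followed by the chown from step 3.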
    

Certificate Expiry

  1. Check expiry:
    openssl x509 -in /srv/eng-pad-server/certs/cert.pem -noout -dates
    
  2. Regenerate or renew the certificate.
  3. Restart the service (picks up new certs on start).
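
Expiry can be caught ahead of time with the -checkend option of openssl x509, which tests whether a certificate will still be valid a given number of seconds from now. A minimal sketch; cert_valid_for is a hypothetical helper name and the 30-day window in the usage line is an arbitrary choice.

```shell
# cert_valid_for CERT DAYS: succeed if CERT will still be valid DAYS days
# from now; fail if it expires sooner or cannot be read.
cert_valid_for() {
    openssl x509 -in "$1" -noout -checkend "$(( $2 * 86400 ))" >/dev/null 2>&1
}
```

For example: cert_valid_for /srv/eng-pad-server/certs/cert.pem 30 || echo "certificate expires within 30 days"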

Disk Full

  1. Check disk usage:
    df -h /srv/eng-pad-server/
    du -sh /srv/eng-pad-server/*
    
  2. Prune old backups (keeps the 7 most recent):
    ls -t /srv/eng-pad-server/backups/ | tail -n +8 | xargs -I{} rm /srv/eng-pad-server/backups/{}
    
  3. Compact the database (VACUUM builds a temporary copy of the database, so free up space first):
    sqlite3 /srv/eng-pad-server/eng-pad-server.db "VACUUM"
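
The pruning one-liner can be turned into a function with a configurable retention count that reports what it deletes. A sketch only: prune_backups is a hypothetical helper, and it assumes the directory holds nothing but backup files whose names contain no newlines.

```shell
# prune_backups DIR KEEP: delete all but the KEEP newest entries in DIR,
# printing each path as it is removed.
prune_backups() {
    ls -t "$1" | tail -n "+$(( $2 + 1 ))" | while IFS= read -r name; do
        echo "removing $1/$name"
        rm -- "$1/$name"
    done
}
```

For example: prune_backups /srv/eng-pad-server/backups 7 matches the retention of the command in step 2.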
    

Sync Fails from Android App

  1. Verify server is reachable from the device's network.
  2. Check gRPC port is open: ss -tlnp | grep 9443
  3. Check TLS cert is valid and trusted by the device.
  4. Check credentials: verify the user exists via eng-pad-server status.
  5. Check server logs for auth failures: journalctl -u eng-pad-server | grep UNAUTHENTICATED
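
When auth failures are suspected, a count over a time window is often easier to read than raw log lines. A small sketch; count_auth_failures is a hypothetical helper that relies only on the UNAUTHENTICATED marker mentioned in step 5.

```shell
# count_auth_failures: read log lines on stdin and print how many of them
# contain UNAUTHENTICATED. Always succeeds, even when the count is 0.
count_auth_failures() {
    grep -c UNAUTHENTICATED || true
}
```

For example: journalctl -u eng-pad-server --since "1 hour ago" | count_auth_failures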

6. Escalation

If the runbook doesn't resolve the issue:

  1. Check ARCHITECTURE.md for system design context.
  2. Check AUDIT.md for known security considerations.
  3. Review recent commits for changes that may have introduced the issue.