Update docs for Docker-on-deimos deployment, add grpc_plain_addr option
- ARCHITECTURE.md: document nginx + direct gRPC topology, add grpc_plain_addr config, update cert filenames to Let's Encrypt convention, add passwd to CLI table
- RUNBOOK.md: replace systemctl/journalctl with docker commands, fix cert path references, improve sync troubleshooting steps
- Example config: update cert paths, document grpc_plain_addr option
- grpcserver: add optional plaintext gRPC listener for reverse proxy
- config: add GRPCPlainAddr field

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RUNBOOK.md (28 changed lines)
@@ -102,8 +102,8 @@ docker restart eng-pad-server
 ## 4. Alerting
 
 No automated alerting is configured. Monitor via:
-- `systemctl status eng-pad-server` — process health
-- `journalctl -u eng-pad-server --since "1 hour ago" | grep ERROR` — errors
+- `docker ps | grep eng-pad-server` — container health
+- `docker logs eng-pad-server --since 1h 2>&1 | grep ERROR` — errors
 - Backup age: `ls -lt /srv/eng-pad-server/backups/ | head`
 
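The manual checks under "4. Alerting" can be bundled into a script for a cron job or a quick SSH session. What follows is a minimal sketch. It assumes bash with GNU `find` and `date` on the host. The function names and the 24-hour backup threshold are hypothetical and not part of eng-pad-server.

```shell
#!/usr/bin/env bash
# Sketch: the "4. Alerting" checks as reusable functions.
# Container name and backup path are taken from RUNBOOK.md;
# function names and the 24h threshold are hypothetical.

# Succeeds if Docker reports the container as running.
container_running() {
  [ "$(docker inspect -f '{{.State.Running}}' "$1" 2>/dev/null)" = "true" ]
}

# Prints the number of log lines from the last hour that contain ERROR.
recent_error_count() {
  # grep -c prints 0 but exits 1 when nothing matches; not a failure here.
  docker logs "$1" --since 1h 2>&1 | grep -c ERROR || true
}

# Prints the age in whole hours of the newest file in a backup directory.
# Needs GNU find (-printf); fails if the directory holds no files.
backup_age_hours() {
  local newest
  newest=$(find "$1" -maxdepth 1 -type f -printf '%T@\n' | sort -n | tail -n 1)
  [ -n "$newest" ] || { echo "no backups in $1" >&2; return 1; }
  # %T@ has a fractional part; strip it before integer arithmetic.
  echo $(( ($(date +%s) - ${newest%.*}) / 3600 ))
}

# Prints one status line per check.
health_report() {
  local name="${1:-eng-pad-server}"
  local dir="${2:-/srv/eng-pad-server/backups}"
  local max_age="${3:-24}"  # hypothetical; match it to the backup schedule
  local age
  if container_running "$name"; then
    echo "OK   container $name is running"
  else
    echo "FAIL container $name is not running"
  fi
  echo "INFO $(recent_error_count "$name") ERROR lines in the last hour"
  if age=$(backup_age_hours "$dir") && [ "$age" -le "$max_age" ]; then
    echo "OK   newest backup is ${age}h old"
  else
    echo "FAIL no backup newer than ${max_age}h in $dir"
  fi
}
```

To use it, source the file and run `health_report`. Any line starting with `FAIL` is the cue to look closer. Wrapping the `docker` calls in functions also makes them easy to stub when testing the script.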
 ## 5. Incident Procedures
@@ -122,9 +122,9 @@ No automated alerting is configured. Monitor via:
 
 ### Database Corruption
 
-1. Stop the service:
+1. Stop the container:
 ```
-systemctl stop eng-pad-server
+docker stop eng-pad-server
 ```
 2. Check integrity:
 ```
@@ -133,21 +133,20 @@ No automated alerting is configured. Monitor via:
 3. If corrupted, restore from backup:
 ```
 cp /srv/eng-pad-server/backups/eng-pad-server-LATEST.db /srv/eng-pad-server/eng-pad-server.db
 chown engpad:engpad /srv/eng-pad-server/eng-pad-server.db
 ```
 4. Restart:
 ```
-systemctl start eng-pad-server
+docker start eng-pad-server
 ```
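Step 3 uses `eng-pad-server-LATEST.db` as a placeholder for the newest backup. The sketch below picks that file automatically. It assumes that backups are `*.db` files directly under the backups directory, and that steps 1 and 2 have already been done. `BACKUP_DIR`, `DB_PATH` and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of steps 3-4 under "Database Corruption": restore the newest backup,
# fix ownership, start the container. Run only after step 1 (docker stop)
# and the step-2 integrity check. Assumes backups are *.db files (the runbook
# only shows the placeholder eng-pad-server-LATEST.db); BACKUP_DIR, DB_PATH
# and the function names are hypothetical.

BACKUP_DIR="${BACKUP_DIR:-/srv/eng-pad-server/backups}"
DB_PATH="${DB_PATH:-/srv/eng-pad-server/eng-pad-server.db}"

# Prints the path of the most recently modified *.db file in a directory
# (GNU find; cut -f2- keeps paths that contain spaces intact).
newest_backup() {
  find "$1" -maxdepth 1 -type f -name '*.db' -printf '%T@ %p\n' \
    | sort -n -k1,1 | tail -n 1 | cut -d' ' -f2-
}

restore_latest_backup() {
  local src
  src=$(newest_backup "$BACKUP_DIR")
  [ -n "$src" ] || { echo "no *.db backups in $BACKUP_DIR" >&2; return 1; }
  echo "restoring $src -> $DB_PATH"
  cp -- "$src" "$DB_PATH" || return 1
  # Same ownership fix as step 3; adjust if the container runs as another user.
  chown engpad:engpad "$DB_PATH" || return 1
  docker start eng-pad-server
}
```

Run `restore_latest_backup` as root, because it calls `chown`. It prints the chosen file before copying, so the choice can be checked against `ls -lt /srv/eng-pad-server/backups/ | head`.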
 
 ### Certificate Expiry
 
 1. Check expiry:
 ```
-openssl x509 -in /srv/eng-pad-server/certs/cert.pem -noout -dates
+openssl x509 -in /srv/eng-pad-server/certs/fullchain.pem -noout -dates
 ```
-2. Regenerate or renew the certificate.
-3. Restart the service (picks up new certs on start).
+2. Renew via certbot (see "Renew TLS Certificates" above).
+3. Restart the container (picks up new certs on start).
 
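Step 1 prints the dates but leaves the arithmetic to the reader. The sketch below prints the days remaining and exits non-zero when the certificate falls inside a renewal window. It assumes GNU `date` and the `openssl` CLI on the host. `CERT`, the function names and the 14-day window are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch for "Certificate Expiry": days left on the cert, and a non-zero exit
# when it falls inside a renewal window. Uses the fullchain.pem path from this
# commit; CERT, the function names and the 14-day default are hypothetical.

CERT="${CERT:-/srv/eng-pad-server/certs/fullchain.pem}"

# Prints whole days until notAfter (negative once expired). Needs GNU date -d.
cert_days_left() {
  local end
  end=$(openssl x509 -enddate -noout -in "$1") || return 1
  end=${end#notAfter=}
  echo $(( ($(date -d "$end" +%s) - $(date +%s)) / 86400 ))
}

# Succeeds if the cert is still valid $1 days from now. -checkend takes
# seconds and exits non-zero when the cert expires within that window.
cert_ok_for_days() {
  openssl x509 -checkend $(( $1 * 86400 )) -noout -in "$2" >/dev/null
}

check_cert() {
  local window="${1:-14}" days
  days=$(cert_days_left "$CERT") || return 1
  if cert_ok_for_days "$window" "$CERT"; then
    echo "OK   $CERT has $days days left"
  else
    echo "WARN $CERT has $days days left; renew, then restart the container"
    return 1
  fi
}
```

`check_cert 14` is suitable for a cron job. OpenSSL's own `-checkend` makes the pass/fail decision, so the date comparison does not depend on parsing. `cert_days_left` only feeds the message.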
 ### Disk Full
 
@@ -167,11 +166,12 @@ No automated alerting is configured. Monitor via:
 
 ### Sync Fails from Android App
 
-1. Verify server is reachable from the device's network.
-2. Check gRPC port is open: `ss -tlnp | grep 9443`
-3. Check TLS cert is valid and trusted by the device.
-4. Check credentials: verify the user exists via `eng-pad-server status`.
-5. Check server logs for auth failures: `journalctl -u eng-pad-server | grep UNAUTHENTICATED`
+1. Verify the app has the correct server URL (`pad.metacircular.net:9443`).
+2. Use "Test Connection" in the app's sync settings for a specific error.
+3. Check gRPC port is open: `ss -tlnp | grep 9443`
+4. Check firewall: `sudo ufw status | grep 9443` (must be ALLOW).
+5. Check TLS cert is valid: `openssl x509 -in /srv/eng-pad-server/certs/fullchain.pem -noout -dates`
+6. Check server logs for auth failures: `docker logs eng-pad-server 2>&1 | grep -i error`
 
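Steps 3 to 6 run on the server and can be scripted; steps 1 and 2 happen in the app. The sketch below assumes bash, `ss` from iproute2, `ufw` with `sudo`, and the `openssl` CLI. `GRPC_PORT`, `CERT` and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the server-side checks for "Sync Fails from Android App"
# (steps 3-6). Port 9443, the cert path and the container name come from
# the runbook; GRPC_PORT, CERT and the function names are hypothetical.

GRPC_PORT="${GRPC_PORT:-9443}"
CERT="${CERT:-/srv/eng-pad-server/certs/fullchain.pem}"

# Succeeds if something listens on TCP port $1: the local-address column
# of `ss -tln` ends in :PORT (covers 0.0.0.0, [::] and * binds).
port_listening() {
  ss -tln | awk -v p=":$1" '$4 ~ p "$" { found = 1 } END { exit !found }'
}

# Succeeds if ufw has an ALLOW rule for port $1 (with or without /tcp).
ufw_allows() {
  sudo ufw status | grep -Eq "^$1(/tcp)?[[:space:]].*ALLOW"
}

# Runs the checks in runbook order and prints one line per check.
sync_triage() {
  local name="${1:-eng-pad-server}"
  if port_listening "$GRPC_PORT"; then echo "OK   port $GRPC_PORT listening"
  else echo "FAIL nothing listening on port $GRPC_PORT"; fi
  if ufw_allows "$GRPC_PORT"; then echo "OK   ufw allows port $GRPC_PORT"
  else echo "FAIL no ufw ALLOW rule for port $GRPC_PORT"; fi
  echo "INFO cert $(openssl x509 -in "$CERT" -noout -enddate 2>&1)"
  echo "INFO recent error lines from $name:"
  docker logs "$name" 2>&1 | grep -i error | tail -n 5
}
```

If the port check passes but the ufw check fails, the problem is the firewall rule rather than the server process.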
 ## 6. Escalation
 