Implement Phase 1: core framework, operational tooling, and runbook

Core packages: crypto (Argon2id/AES-256-GCM), config (TOML/viper),
db (SQLite/migrations), barrier (encrypted storage), seal (state machine
with rate-limited unseal), auth (MCIAS integration with token cache),
policy (priority-based ACL engine), engine (interface + registry).

Server: HTTPS with TLS 1.2+, REST API, auth/admin middleware, htmx web UI
(init, unseal, login, dashboard pages).

CLI: cobra/viper subcommands (server, init, status, snapshot) with env
var override support (METACRYPT_ prefix).

Operational tooling: Dockerfile (multi-stage, non-root), docker-compose,
hardened systemd units (service + daily backup timer), install script,
backup script with retention pruning, production config examples.

Runbook covering installation, configuration, daily operations,
backup/restore, monitoring, troubleshooting, and security procedures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 20:43:11 -07:00
commit 4ddd32b117
60 changed files with 4644 additions and 0 deletions

RUNBOOK.md Normal file

@@ -0,0 +1,415 @@
# Metacrypt Operations Runbook
## Overview
Metacrypt is a cryptographic service for the Metacircular platform. It provides an encrypted storage barrier, engine-based cryptographic operations, and MCIAS-backed authentication. The service uses a seal/unseal model: it starts sealed after every restart and must be unsealed with a password before it can serve requests.
### Service States
| State | Description |
|---|---|
| **Uninitialized** | Fresh install. Must run `metacrypt init` or use the web UI. |
| **Sealed** | Initialized but locked. No cryptographic operations available. |
| **Initializing** | Transient state during first-time setup. |
| **Unsealed** | Fully operational. All APIs and engines available. |
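The state table can be turned into a small operator helper. The sketch below maps a state string to the next action. The function name `next_action` is hypothetical, and the lowercase state strings are assumed to match what `/v1/status` reports (`initializing` in particular is an assumption, since this runbook only shows `sealed`, `unsealed`, and `uninitialized` in API output).

```bash
# Hypothetical helper: print the operator action for a Metacrypt state.
# State names are assumed to be the lowercase strings returned by /v1/status.
next_action() {
  case "$1" in
    uninitialized) echo "run 'metacrypt init' or use the web UI" ;;
    initializing)  echo "wait for first-time setup to finish" ;;
    sealed)        echo "unseal with the seal password" ;;
    unsealed)      echo "none; service is fully operational" ;;
    *)             echo "unknown state: $1" >&2; return 1 ;;
  esac
}
```

For example, `next_action sealed` prints the unseal reminder, while an unrecognized state returns a non-zero exit code so a calling script can stop.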
### Architecture
```
Client → HTTPS (:8443) → Metacrypt Server
├── Auth (proxied to MCIAS)
├── Policy Engine (ACL rules in barrier)
├── Engine Registry (mount/unmount crypto engines)
└── Encrypted Barrier → SQLite (on disk)
```
Key hierarchy:
```
Seal Password (operator-held, never stored)
→ Argon2id → Key-Wrapping Key (KWK, ephemeral)
→ AES-256-GCM decrypt → Master Encryption Key (MEK)
→ AES-256-GCM → all barrier-stored data
```
---
## Installation
### Binary Install (systemd)
```bash
# Build
make metacrypt
# Install (as root)
sudo deploy/scripts/install.sh ./metacrypt
```
This creates:
| Path | Purpose |
|---|---|
| `/usr/local/bin/metacrypt` | Binary |
| `/etc/metacrypt/metacrypt.toml` | Configuration |
| `/etc/metacrypt/certs/` | TLS certificates |
| `/var/lib/metacrypt/` | Database and backups |
### Docker Install
```bash
# Build image
make docker
# Or use docker compose
docker compose -f deploy/docker/docker-compose.yml up -d
```
The Docker container mounts a single volume at `/data` which must contain:
| File | Required | Description |
|---|---|---|
| `metacrypt.toml` | Yes | Configuration (use `deploy/examples/metacrypt-docker.toml` as template) |
| `certs/server.crt` | Yes | TLS certificate |
| `certs/server.key` | Yes | TLS private key |
| `certs/mcias-ca.crt` | If MCIAS uses private CA | MCIAS CA certificate |
| `metacrypt.db` | No | Created automatically on first run |
To prepare a Docker volume:
```bash
docker volume create metacrypt-data
# Copy files into the volume
docker run --rm -v metacrypt-data:/data -v "$(pwd)/deploy/examples:/src" alpine \
  sh -c "cp /src/metacrypt-docker.toml /data/metacrypt.toml && mkdir -p /data/certs"
# Then copy your TLS certs into the volume the same way
```
---
## Configuration
Configuration is loaded from TOML. The config file location is determined by (in order):
1. `--config` flag
2. `METACRYPT_CONFIG` environment variable (via viper)
3. `metacrypt.toml` in the current directory
4. `/etc/metacrypt/metacrypt.toml`
All config values can be overridden via environment variables with the `METACRYPT_` prefix (e.g., `METACRYPT_SERVER_LISTEN_ADDR`).
See `deploy/examples/metacrypt.toml` for a fully commented production config.
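The lookup order and the override naming can be sketched as two shell helpers. This is an illustration of the rules described above, not the code Metacrypt runs (that logic lives in the Go binary via viper). Both function names are hypothetical, and the key-to-variable mapping assumes every key follows the pattern of the one documented example.

```bash
# Hypothetical helper: derive the override variable name for a config key.
# Assumes the pattern of the documented example:
#   server.listen_addr -> METACRYPT_SERVER_LISTEN_ADDR
env_name_for_key() {
  local key=$1
  key=${key//./_}   # dots become underscores
  printf 'METACRYPT_%s\n' "$(printf '%s' "$key" | tr '[:lower:]' '[:upper:]')"
}

# Hypothetical helper: pick the config file using the documented order.
# $1 is the value passed to --config (may be empty).
resolve_config() {
  local flag=$1
  if [ -n "$flag" ]; then
    echo "$flag"
  elif [ -n "${METACRYPT_CONFIG:-}" ]; then
    echo "$METACRYPT_CONFIG"
  elif [ -f ./metacrypt.toml ]; then
    echo "./metacrypt.toml"
  else
    echo "/etc/metacrypt/metacrypt.toml"
  fi
}
```

Calling `env_name_for_key mcias.server_url` yields `METACRYPT_MCIAS_SERVER_URL` under that assumption, which is a quick way to find the variable to set in a systemd drop-in or compose file.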
### Required Settings
- `server.listen_addr` — bind address (e.g., `:8443`)
- `server.tls_cert` / `server.tls_key` — TLS certificate and key paths
- `database.path` — SQLite database file path
- `mcias.server_url` — MCIAS authentication server URL
### TLS Certificates
Metacrypt always terminates TLS. Minimum TLS 1.2 is enforced.
To generate a self-signed certificate for testing:
```bash
openssl req -x509 -newkey ec -pkeyopt ec_paramgen_curve:P-384 \
-keyout server.key -out server.crt -days 365 -nodes \
-subj "/CN=metacrypt.local" \
-addext "subjectAltName=DNS:metacrypt.local,DNS:localhost,IP:127.0.0.1"
```
For production, use certificates from your internal CA or a public CA.
---
## First-Time Setup
### Option A: CLI (recommended for servers)
```bash
metacrypt init --config /etc/metacrypt/metacrypt.toml
```
This prompts for a seal password, generates the master encryption key, and stores the encrypted MEK in the database. The service is left in the unsealed state.
### Option B: Web UI
Start the server and navigate to `https://<host>:8443/`. If the service is uninitialized, you will be redirected to the init page.
### Seal Password Requirements
- The seal password is the root of all security. If lost, the data is unrecoverable.
- Store it in a secure location (password manager, HSM, sealed envelope in a safe).
- The password is never stored — only a salt and encrypted MEK are persisted.
- Argon2id parameters (time=3, memory=128 MiB, threads=4) are stored in the database at init time.
---
## Daily Operations
### Starting the Service
```bash
# systemd
sudo systemctl start metacrypt
# Docker
docker compose -f deploy/docker/docker-compose.yml up -d
# Manual
metacrypt server --config /etc/metacrypt/metacrypt.toml
```
The service starts **sealed**. It must be unsealed before it can serve requests.
### Unsealing
After every restart, the service must be unsealed:
```bash
# Via API
curl -sk -X POST https://localhost:8443/v1/unseal \
-H 'Content-Type: application/json' \
-d '{"password":"<seal-password>"}'
# Via web UI
# Navigate to https://<host>:8443/unseal
```
**Rate limiting**: After 5 failed unseal attempts within one minute, a 60-second lockout is enforced.
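Typing the password inline in `-d` leaves it in shell history and in the process list. One possible alternative, sketched below, prompts without echo, builds the JSON body with escaping, and feeds it to curl on stdin. The function names are hypothetical, and the escaping covers only backslashes and double quotes, not control characters.

```bash
# Escape backslashes and double quotes so the value is a valid JSON string.
# Control characters are not handled in this sketch.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  printf '%s' "$s"
}

# Build the request body expected by POST /v1/unseal.
unseal_body() {
  printf '{"password":"%s"}' "$(json_escape "$1")"
}

# Prompt without echo and send the body on stdin (-d @-), so the password
# never appears in argv or shell history. $1 = base URL (optional).
unseal_interactive() {
  local pw
  read -rs -p 'Seal password: ' pw
  echo >&2
  unseal_body "$pw" | curl -sk -X POST "${1:-https://localhost:8443}/v1/unseal" \
    -H 'Content-Type: application/json' -d @-
}
```

Run `unseal_interactive` for a local instance, or pass a base URL such as `unseal_interactive https://metacrypt.example.com:8443`.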
### Checking Status
```bash
# Remote check
metacrypt status --addr https://metacrypt.example.com:8443 --ca-cert /path/to/ca.crt
# Via API
curl -sk https://localhost:8443/v1/status
# Returns: {"state":"sealed"} or {"state":"unsealed"} etc.
```
### Sealing (Emergency)
An admin user can seal the service at any time, which zeroizes all key material in memory:
```bash
curl -sk -X POST https://localhost:8443/v1/seal \
-H "Authorization: Bearer <admin-token>"
```
This immediately makes all cryptographic operations unavailable. Use this if you suspect a compromise.
### Authentication
Metacrypt proxies authentication to MCIAS. Users log in with their MCIAS credentials:
```bash
# API login
curl -sk -X POST https://localhost:8443/v1/auth/login \
-H 'Content-Type: application/json' \
-d '{"username":"alice","password":"..."}'
# Returns: {"token":"...","expires_at":"..."}
# Use the token for subsequent requests
curl -sk https://localhost:8443/v1/auth/tokeninfo \
-H "Authorization: Bearer <token>"
```
Users with the MCIAS `admin` role automatically get admin privileges in Metacrypt.
---
## Backup and Restore
### Creating Backups
```bash
# CLI
metacrypt snapshot --config /etc/metacrypt/metacrypt.toml --output /var/lib/metacrypt/backups/metacrypt-$(date +%Y%m%d).db
# Using the backup script (with 30-day retention)
deploy/scripts/backup.sh 30
```
The backup is a consistent SQLite snapshot created with `VACUUM INTO`. The backup file contains the same encrypted data as the live database — the seal password is still required to access it.
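Retention pruning can be approximated with `find`. The sketch below is not the contents of `deploy/scripts/backup.sh`; it is a minimal illustration that assumes snapshots follow the `metacrypt-YYYYMMDD.db` naming used in the CLI example and that GNU or BSD `find` is available (`-maxdepth` and `-delete` are extensions). The function name is hypothetical.

```bash
# Hypothetical helper: delete snapshots whose age exceeds N whole days.
# $1 = backup directory, $2 = retention in days (default 30).
prune_backups() {
  local dir=$1 days=${2:-30}
  find "$dir" -maxdepth 1 -type f -name 'metacrypt-*.db' -mtime +"$days" -delete
}
```

Only files matching the snapshot pattern are touched, so unrelated files in the backup directory survive. Replace `-delete` with `-print` for a dry run.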
### Automated Backups (systemd)
```bash
sudo systemctl enable --now metacrypt-backup.timer
```
This runs a backup daily at 02:00 with up to 5 minutes of jitter.
### Restoring from Backup
1. Stop the service: `systemctl stop metacrypt`
2. Replace the database: `cp /var/lib/metacrypt/backups/metacrypt-20260314.db /var/lib/metacrypt/metacrypt.db`
3. Fix permissions: `chown metacrypt:metacrypt /var/lib/metacrypt/metacrypt.db && chmod 0600 /var/lib/metacrypt/metacrypt.db`
4. Start the service: `systemctl start metacrypt`
5. Unseal with the original seal password
**The seal password does not change between backups.** A backup restored from any point in time uses the same seal password that was set during `metacrypt init`.
---
## Monitoring
### Health Check
```bash
curl -sk https://localhost:8443/v1/status
```
Returns HTTP 200 in all states. Check the `state` field:
- `unsealed` — healthy, fully operational
- `sealed` — needs unseal, no crypto operations available
- `uninitialized` — needs init
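Because the endpoint returns HTTP 200 in every state, a probe that checks only the status code will report a sealed service as healthy. The sketch below inspects the body instead and maps it to the conventional monitoring exit codes (0 OK, 2 critical, 3 unknown). Function names are hypothetical, and the `sed` expression assumes the flat JSON object shown in this runbook.

```bash
# Extract the "state" value from a /v1/status body such as {"state":"sealed"}.
# A sed one-liner is enough for this flat object; use a JSON parser for anything richer.
state_from_json() {
  printf '%s\n' "$1" | sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Return 0 when unsealed, 2 when sealed or uninitialized, 3 for anything else.
health_code() {
  case "$(state_from_json "$1")" in
    unsealed)             return 0 ;;
    sealed|uninitialized) return 2 ;;
    *)                    return 3 ;;
  esac
}
```

A monitoring check could then run `health_code "$(curl -sk https://localhost:8443/v1/status)"` and alert on a non-zero exit code. An empty body (service down) falls into the unknown case.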
### Log Output
Metacrypt logs structured JSON to stdout. When running under systemd, logs go to the journal:
```bash
# Follow logs
journalctl -u metacrypt -f
# Recent errors
journalctl -u metacrypt --priority=err --since="1 hour ago"
```
### Key Metrics to Monitor
| What | How | Alert When |
|---|---|---|
| Service state | `GET /v1/status` | `state != "unsealed"` for more than a few minutes after restart |
| TLS certificate expiry | External cert checker | < 30 days to expiry |
| Database file size | `stat /var/lib/metacrypt/metacrypt.db` | Unexpectedly large growth |
| Backup age | `find /var/lib/metacrypt/backups -name '*.db' -mtime -2` | Command prints nothing (no backup in the last 48 hours) |
| MCIAS connectivity | Login attempt | Auth failures not caused by bad credentials |
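The backup-age row can be scripted as a check that succeeds only when a sufficiently recent snapshot exists. This is a sketch with a hypothetical function name; it relies on `-mmin` and `-quit`, which GNU and BSD `find` provide but POSIX does not.

```bash
# Succeeds if at least one *.db file in $1 was modified within the last
# $2 hours (default 48).
has_recent_backup() {
  local dir=$1 hours=${2:-48}
  [ -n "$(find "$dir" -maxdepth 1 -type f -name '*.db' -mmin -"$((hours * 60))" -print -quit)" ]
}
```

For example, `has_recent_backup /var/lib/metacrypt/backups 48 || echo "backup is stale"` can be run from cron or wrapped by a monitoring agent.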
---
## Troubleshooting
### Service won't start
| Symptom | Cause | Fix |
|---|---|---|
| `config: server.tls_cert is required` | Missing or invalid config | Check config file path and contents |
| `db: create file: permission denied` | Wrong permissions on data dir | `chown -R metacrypt:metacrypt /var/lib/metacrypt` |
| `server: tls: failed to find any PEM data` | Bad cert/key files | Verify PEM format: `openssl x509 -in server.crt -text -noout` |
### Unseal fails
| Symptom | Cause | Fix |
|---|---|---|
| `invalid password` (401) | Wrong seal password | Verify password. There is no recovery if the password is lost. |
| `too many attempts` (429) | Rate limited | Wait 60 seconds, then try again |
| `not initialized` (412) | Database is empty/new | Run `metacrypt init` |
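For automation that wraps the unseal call, the table suggests a simple decision rule: only a 429 is worth retrying, and only after the lockout has passed. The sketch below encodes that rule. The function name and the output tokens are hypothetical, and treating any 2xx status as success is an assumption, since this runbook does not state the success code.

```bash
# Hypothetical helper: decide what a script should do with the HTTP status
# returned by POST /v1/unseal. Prints one of: done, wait60, abort.
unseal_decision() {
  case "$1" in
    2??) echo done ;;    # assumed: any 2xx means the unseal succeeded
    429) echo wait60 ;;  # rate limited; wait 60 seconds before retrying
    401) echo abort ;;   # wrong password; blind retries only trigger the lockout
    412) echo abort ;;   # not initialized; run 'metacrypt init' first
    *)   echo abort ;;   # anything else needs a human
  esac
}
```

The status code can be captured with `curl -sk -o /dev/null -w '%{http_code}' ...` and passed to the helper.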
### Authentication fails
| Symptom | Cause | Fix |
|---|---|---|
| `invalid credentials` (401) | Bad username/password or MCIAS down | Verify MCIAS is reachable: `curl -sk https://mcias.metacircular.net:8443/v1/health` |
| `sealed` (503) | Service not unsealed | Unseal the service first |
| Connection refused to MCIAS | Network/TLS issue | Check `mcias.server_url` and `mcias.ca_cert` in config |
### Database Issues
```bash
# Check database integrity
sqlite3 /var/lib/metacrypt/metacrypt.db "PRAGMA integrity_check;"
# Check WAL mode
sqlite3 /var/lib/metacrypt/metacrypt.db "PRAGMA journal_mode;"
# Should return: wal
# Check file permissions
ls -la /var/lib/metacrypt/metacrypt.db
# Should be: -rw------- metacrypt metacrypt
```
---
## Security Considerations
### Seal Password
- The seal password is the single point of trust. Protect it accordingly.
- Use a strong, unique password (recommend 20+ characters or a passphrase).
- Store it in at least two independent secure locations.
- Rotate by re-initializing (requires data migration — not yet automated).
### Key Material Lifecycle
- **KWK** (Key-Wrapping Key): derived from password, used only during unseal, zeroized immediately after.
- **MEK** (Master Encryption Key): held in memory while unsealed, zeroized on seal.
- **DEKs** (Data Encryption Keys): per-engine, stored encrypted in the barrier, zeroized on seal.
Sealing the service (`POST /v1/seal`) explicitly zeroizes all key material from process memory.
### File Permissions
| Path | Mode | Owner |
|---|---|---|
| `/etc/metacrypt/metacrypt.toml` | 0640 | metacrypt:metacrypt |
| `/etc/metacrypt/certs/server.key` | 0600 | metacrypt:metacrypt |
| `/var/lib/metacrypt/metacrypt.db` | 0600 | metacrypt:metacrypt |
| `/var/lib/metacrypt/backups/` | 0700 | metacrypt:metacrypt |
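The permission table can be audited with a short script. This sketch uses GNU `stat -c '%a'` (BSD and macOS `stat` take different flags) and checks modes only, not ownership. Both function names are hypothetical.

```bash
# Succeeds if $1 has exactly the octal mode $2 (e.g. 600). Uses GNU stat.
has_mode() {
  [ "$(stat -c '%a' "$1")" = "$2" ]
}

# Audit the documented paths; prints one line per mismatch or missing path
# and returns non-zero if any were found.
audit_permissions() {
  local rc=0 entry path mode
  for entry in \
    /etc/metacrypt/metacrypt.toml:640 \
    /etc/metacrypt/certs/server.key:600 \
    /var/lib/metacrypt/metacrypt.db:600 \
    /var/lib/metacrypt/backups:700
  do
    path=${entry%%:*}
    mode=${entry##*:}
    has_mode "$path" "$mode" 2>/dev/null || { echo "mismatch: $path (want $mode)"; rc=1; }
  done
  return "$rc"
}
```

Run `sudo bash -c '...; audit_permissions'` or source the functions in a root shell, since the files are not readable by other users.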
### systemd Hardening
The provided service unit applies: `NoNewPrivileges`, `ProtectSystem=strict`, `ProtectHome`, `PrivateTmp`, `PrivateDevices`, `MemoryDenyWriteExecute`, and namespace restrictions. Only `/var/lib/metacrypt` is writable.
### Docker Security
The container runs as a non-root `metacrypt` user. The `/data` volume should be owned by the container's metacrypt UID (determined at build time). Do not run the container with `--privileged`.
---
## Operational Procedures
### Planned Restart
1. Notify users that crypto operations will be briefly unavailable
2. `systemctl restart metacrypt`
3. Unseal the service
4. Verify: `metacrypt status --addr https://localhost:8443`
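The verification step can be scripted as a polling loop that waits for the service to report `unsealed` after the operator has entered the password. This is a sketch with hypothetical function names. The status fetcher is passed in by name so the loop can be exercised without a live server, and the match assumes the compact JSON shown in this runbook (no whitespace around the colon).

```bash
# Poll until the service reports "unsealed" or the attempts run out.
# $1 = name of a command that prints the /v1/status body,
# $2 = max attempts (default 30), $3 = delay in seconds (default 2).
wait_unsealed() {
  local fetch=$1 tries=${2:-30} delay=${3:-2} i body
  for ((i = 0; i < tries; i++)); do
    body=$("$fetch" 2>/dev/null) || body=""
    case "$body" in
      *'"state":"unsealed"'*) return 0 ;;
    esac
    sleep "$delay"
  done
  return 1
}

# Example fetcher for a local instance.
fetch_status() { curl -sk https://localhost:8443/v1/status; }
```

After restarting and unsealing, `wait_unsealed fetch_status 30 2 && echo "ready"` confirms the service is serving requests before users are notified.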
### Password Rotation
There is no online password rotation in Phase 1. To change the seal password:
1. Create a backup: `metacrypt snapshot --output pre-rotation.db`
2. Stop the service
3. Re-initialize with a new database and password
4. Migrate data from the old barrier (requires custom tooling or a future `metacrypt rekey` command)
### Disaster Recovery
If the server is lost but you have a database backup and the seal password:
1. Install Metacrypt on a new server (see Installation)
2. Copy the backup database to `/var/lib/metacrypt/metacrypt.db`
3. Fix ownership and permissions: `chown metacrypt:metacrypt /var/lib/metacrypt/metacrypt.db && chmod 0600 /var/lib/metacrypt/metacrypt.db`
4. Start the service and unseal with the original password
The database backup contains the encrypted MEK and all barrier data. No additional secrets beyond the seal password are needed for recovery.
### Upgrading Metacrypt
1. Build or download the new binary
2. Create a backup: `metacrypt snapshot --output pre-upgrade.db`
3. Replace the binary: `install -m 0755 metacrypt /usr/local/bin/metacrypt`
4. Restart: `systemctl restart metacrypt`
5. Unseal and verify
Database migrations run automatically on startup.