Add web UI for SSH CA, Transit, and User engines; full security audit and remediation

Web UI: Added browser-based management for all three remaining engines
(SSH CA, Transit, User E2E). Includes gRPC client wiring, handler files,
7 HTML templates, dashboard mount forms, and conditional navigation links.
Fixed REST API routes to match design specs (SSH CA cert singular paths,
Transit PATCH for update-key-config).

Security audit: Conducted full-system audit covering crypto core, all
engine implementations, API servers, policy engine, auth, deployment, and
documentation. Identified 42 new findings (#39-#80) across all severity
levels.

Remediation of all 8 High findings:
- #68: Replaced 14 JSON-injection-vulnerable error responses with safe
  json.Encoder via writeJSONError helper
- #48: Added two-layer path traversal defense (barrier validatePath rejects
  ".." segments; engine ValidateName enforces safe name pattern)
- #39: Extended RLock through entire crypto operations in barrier
  Get/Put/Delete/List to eliminate TOCTOU race with Seal
- #40: Unified ReWrapKeys and seal_config UPDATE into single SQLite
  transaction to prevent irrecoverable data loss on crash during MEK rotation
- #49: Added resolveTTL to CA engine enforcing issuer MaxTTL ceiling on
  handleIssue and handleSignCSR
- #61: Store raw ECDH private key bytes in userState for effective
  zeroization on Seal
- #62: Fixed user engine policy resource path from mountPath to mountName()
  so policy rules match correctly
- #69: Added newPolicyChecker helper and passed service-level policy
  evaluation to all 25 typed REST handler engine.Request structs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 22:02:06 -07:00
parent 128f5abc4d
commit a80323e320
29 changed files with 5061 additions and 647 deletions
--- a/REMEDIATION.md
+++ b/REMEDIATION.md
@@ -1,354 +1,579 @@
-# Remediation Plan
+# Remediation Plan — High-Priority Audit Findings

-**Date**: 2026-03-16
-**Scope**: Audit findings #25–#38 from engine design review
+**Date**: 2026-03-17
+**Scope**: AUDIT.md findings #39, #40, #48, #49, #61, #62, #68, #69

-This document provides a concrete remediation plan for each open finding. Items
-are grouped by priority and ordered for efficient implementation (dependencies
-first).
+This plan addresses all eight High-severity findings from the 2026-03-17
+full system audit. Findings are grouped into four work items by shared root
+cause or affected subsystem. The order reflects dependency chains: #68 is a
+standalone fix that should ship first; #48 is a prerequisite for safe
+operation across all engines; #39/#40 affect the storage core; the remaining
+four affect specific engines.

 ---

-## Critical
+## Work Item 1: JSON Injection in REST Error Responses (#68)

-### #37 — `adminOnlyOperations` name collision blocks user `rotate-key`
+**Risk**: An error message containing `"` or `\` breaks the JSON response
+structure. If the error contains attacker-controlled input (e.g., a mount
+name or key name that triggers a downstream error), this enables JSON
+injection in API responses.

-**Problem**: The `adminOnlyOperations` map in `handleEngineRequest`
-(`internal/server/routes.go:265`) is a flat `map[string]bool` keyed by
-operation name. The transit engine's `rotate-key` is admin-only, but the user
-engine's `rotate-key` is user-self. Since the map is checked before engine
-dispatch, non-admin users are blocked from calling `rotate-key` on any engine
-mount — including user engine mounts where it should be allowed.
-
-**Fix**: Replace the flat map with an engine-type-qualified lookup. Two options:
-
-**Option A — Qualify the map key** (minimal change):
-
-Change the map type to include the engine type prefix:
+**Root cause**: 13 locations in `internal/server/routes.go` construct JSON
+error responses via string concatenation:

 ```go
-var adminOnlyOperations = map[string]bool{
-    "ca:import-root":          true,
-    "ca:create-issuer":        true,
-    "ca:delete-issuer":        true,
-    "ca:revoke-cert":          true,
-    "ca:delete-cert":          true,
-    "transit:create-key":      true,
-    "transit:delete-key":      true,
-    "transit:rotate-key":      true,
-    "transit:update-key-config": true,
-    "transit:trim-key":        true,
-    "sshca:create-profile":    true,
-    "sshca:update-profile":    true,
-    "sshca:delete-profile":    true,
-    "sshca:revoke-cert":       true,
-    "sshca:delete-cert":       true,
-    "user:provision":          true,
-    "user:delete-user":        true,
+http.Error(w, `{"error":"`+err.Error()+`"}`, http.StatusInternalServerError)
+```
+
+The `writeEngineError` helper (line 1704) is the most common entry point;
+most typed handlers call it.
+
+### Fix
+
+1. **Replace `writeEngineError`** with a safe JSON encoder:
+
+   ```go
+   func writeJSONError(w http.ResponseWriter, msg string, code int) {
+       w.Header().Set("Content-Type", "application/json")
+       w.WriteHeader(code)
+       _ = json.NewEncoder(w).Encode(map[string]string{"error": msg})
+   }
+   ```
+
+2. **Replace all 13 call sites** that use string concatenation with
+   `writeJSONError(w, grpcMessage(err), status)` or
+   `writeJSONError(w, err.Error(), status)`.
+
+   The `grpcMessage` helper already exists in the webserver package and
+   extracts human-readable messages from gRPC errors. Add an equivalent
+   to the REST server, and prefer it over raw `err.Error()` to avoid
+   leaking internal error details.
+
+3. **Grep for the pattern** `"error":"` in `routes.go` to confirm no
+   remaining string-concatenated JSON.
+
+### Files
+
+| File | Change |
+|------|--------|
+| `internal/server/routes.go` | Replace `writeEngineError` and all 13 inline error sites |
+
+### Verification
+
+- `go vet ./internal/server/`
+- `go test ./internal/server/`
+- Manual test: mount an engine with a name containing `"`, trigger an error,
+  verify the response is valid JSON.
+
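The difference between the two error-writing styles in Work Item 1 can be reproduced in isolation. The sketch below is a minimal standalone program, not code from the repository: `writeJSONError` follows the plan's helper, while `decodeUnsafe`, `decodeSafe`, and the sample message are hypothetical names introduced only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// writeJSONError follows the helper proposed in the plan.
func writeJSONError(w http.ResponseWriter, msg string, code int) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(code)
	_ = json.NewEncoder(w).Encode(map[string]string{"error": msg})
}

// decodeUnsafe builds the body by string concatenation (the pattern being
// removed) and decodes it the way an API client would.
func decodeUnsafe(msg string) (map[string]string, error) {
	var out map[string]string
	err := json.Unmarshal([]byte(`{"error":"`+msg+`"}`), &out)
	return out, err
}

// decodeSafe sends msg through writeJSONError and decodes the response body.
func decodeSafe(msg string) (map[string]string, error) {
	rec := httptest.NewRecorder()
	writeJSONError(rec, msg, http.StatusInternalServerError)
	var out map[string]string
	err := json.Unmarshal(rec.Body.Bytes(), &out)
	return out, err
}

func main() {
	// Hypothetical attacker-influenced error text.
	msg := `x","admin":"true`

	bad, _ := decodeUnsafe(msg)
	good, _ := decodeSafe(msg)
	fmt.Println("fields in concatenated body:", len(bad))
	fmt.Println("fields in encoded body:", len(good))
}
```

With concatenation the crafted message adds a second field to the response object; with the encoder the same text stays inside the single `error` value.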
+---
+
+## Work Item 2: Path Traversal via Unsanitized Names (#48)
+
+**Risk**: User-controlled strings (issuer names, key names, profile names,
+usernames, mount names) are concatenated directly into barrier storage
+paths. An input containing `../` traverses the barrier namespace, allowing
+reads and writes to arbitrary paths. This affects all four engines and the
+engine registry.
+
+**Root cause**: No validation exists at any layer — neither the barrier's
+`Put`/`Get`/`Delete` methods nor the engines sanitize path components.
+
+### Vulnerable locations
+
+| File | Input | Path Pattern |
+|------|-------|-------------|
+| `ca/ca.go` | issuer `name` | `mountPath + "issuers/" + name + "/"` |
+| `sshca/sshca.go` | profile `name` | `mountPath + "profiles/" + name + ".json"` |
+| `transit/transit.go` | key `name` | `mountPath + "keys/" + name + "/"` |
+| `user/user.go` | `username` | `mountPath + "users/" + username + "/"` |
+| `engine/engine.go` | mount `name` | `engine/{type}/{name}/` |
+| `policy/policy.go` | rule `ID` | `policy/rules/{id}` |
+
+### Fix
+
+Enforce validation at **two layers** (defense in depth):
+
+1. **Barrier layer** — reject paths containing `..` segments.
+
+   Add a `validatePath` check at the top of `Get`, `Put`, `Delete`, and
+   `List` in `barrier.go`:
+
+   ```go
+   var ErrInvalidPath = errors.New("barrier: invalid path")
+
+   func validatePath(p string) error {
+       for _, seg := range strings.Split(p, "/") {
+           if seg == ".." {
+               return fmt.Errorf("%w: path traversal rejected: %q", ErrInvalidPath, p)
+           }
+       }
+       return nil
+   }
+   ```
+
+   Call `validatePath` at the entry of `Get`, `Put`, `Delete`, `List`.
+   Return `ErrInvalidPath` on failure.
+
+2. **Engine/registry layer** — validate entity names at input boundaries.
+
+   Add a `ValidateName` helper to `internal/engine/`:
+
+   ```go
+   var namePattern = regexp.MustCompile(`^[a-zA-Z0-9][a-zA-Z0-9._-]*$`)
+
+   func ValidateName(name string) error {
+       if name == "" || len(name) > 128 || !namePattern.MatchString(name) {
+           return fmt.Errorf("invalid name %q: must be 1-128 alphanumeric, "+
+               "dot, hyphen, or underscore characters", name)
+       }
+       return nil
+   }
+   ```
+
+   Call `ValidateName` in:
+
+   | Location | Input validated |
+   |----------|----------------|
+   | `engine.go` `Mount()` | mount name |
+   | `ca.go` `handleCreateIssuer` | issuer name |
+   | `sshca.go` `handleCreateProfile` | profile name |
+   | `transit.go` `handleCreateKey` | key name |
+   | `user.go` `handleRegister`, `handleProvision` | username |
+   | `user.go` `handleEncrypt` | recipient usernames |
+   | `policy.go` `CreateRule` | rule ID |
+
+   Note: certificate serials are generated server-side from `crypto/rand`
+   and hex-encoded, so they are safe. Validate anyway for defense in depth.
+
+### Files
+
+| File | Change |
+|------|--------|
+| `internal/barrier/barrier.go` | Add `validatePath`, call from Get/Put/Delete/List |
+| `internal/engine/engine.go` | Add `ValidateName`, call from `Mount` |
+| `internal/engine/ca/ca.go` | Call `ValidateName` on issuer name |
+| `internal/engine/sshca/sshca.go` | Call `ValidateName` on profile name |
+| `internal/engine/transit/transit.go` | Call `ValidateName` on key name |
+| `internal/engine/user/user.go` | Call `ValidateName` on usernames |
+| `internal/policy/policy.go` | Call `ValidateName` on rule ID |
+
+### Verification
+
+- Add `TestValidatePath` to `barrier_test.go`: confirm `../` and `..` are
+  rejected; confirm normal paths pass.
+- Add `TestValidateName` to `engine_test.go`: confirm `../evil`, empty
+  string, and overlong names are rejected; confirm valid names pass.
+- `go test ./internal/barrier/ ./internal/engine/... ./internal/policy/`
+
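The two validation layers from Work Item 2 can be exercised together in a small standalone program. This is a sketch that copies `validatePath` and `ValidateName` from the plan into one `main` package for illustration (in the repository they live in separate packages); the sample paths and names are made up.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

var ErrInvalidPath = errors.New("barrier: invalid path")

// validatePath rejects any path that contains a ".." segment.
func validatePath(p string) error {
	for _, seg := range strings.Split(p, "/") {
		if seg == ".." {
			return fmt.Errorf("%w: path traversal rejected: %q", ErrInvalidPath, p)
		}
	}
	return nil
}

var namePattern = regexp.MustCompile(`^[a-zA-Z0-9][a-zA-Z0-9._-]*$`)

// ValidateName accepts 1-128 characters, starting with an alphanumeric.
func ValidateName(name string) error {
	if name == "" || len(name) > 128 || !namePattern.MatchString(name) {
		return fmt.Errorf("invalid name %q", name)
	}
	return nil
}

func main() {
	// Hypothetical barrier paths.
	for _, p := range []string{
		"engine/ca/prod/issuers/root/",
		"engine/ca/prod/../../policy/rules/x",
	} {
		fmt.Printf("path %q rejected: %v\n", p, validatePath(p) != nil)
	}
	// Hypothetical entity names.
	for _, n := range []string{"web-01", "a..b", "../evil", ""} {
		fmt.Printf("name %q rejected: %v\n", n, ValidateName(n) != nil)
	}
}
```

Note that a name such as `a..b` passes both layers: the barrier check only rejects a segment that is exactly `..`, and the name pattern forbids `/`, so a valid name can never form such a segment on its own.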
+---
+
+## Work Item 3: Barrier Concurrency and Crash Safety (#39, #40)
+
+These two findings share the barrier/seal subsystem and should be addressed
+together.
+
+### #39 — TOCTOU Race in Barrier Get/Put
+
+**Risk**: `Get` and `Put` copy the `mek` slice header and `keys` map
+reference under `RLock`, release the lock, then use the copied references
+for encryption/decryption. A concurrent `Seal()` zeroizes the underlying
+byte slices in place before nil-ing the fields, so a concurrent reader
+uses zeroized key material.
+
+**Root cause**: The lock does not cover the crypto operation. The "copy"
+is a shallow reference copy (slice header), not a deep byte copy. `Seal()`
+zeroizes the backing array, which is shared.
+
+**Current locking pattern** (`barrier.go`):
+
+```
+Get:  RLock → copy mek/keys refs → RUnlock → decrypt (uses zeroized key)
+Put:  RLock → copy mek/keys refs → RUnlock → encrypt (uses zeroized key)
+Seal: Lock  → zeroize mek bytes → nil mek → zeroize keys → nil keys → Unlock
+```
+
+**Fix**: Hold `RLock` through the entire crypto operation:
+
+```go
+func (b *AESGCMBarrier) Get(ctx context.Context, path string) ([]byte, error) {
+    if err := validatePath(path); err != nil {
+        return nil, err
+    }
+    b.mu.RLock()
+    defer b.mu.RUnlock()
+    if b.mek == nil {
+        return nil, ErrSealed
+    }
+    // query DB, resolve key, decrypt — all under RLock
+    // ...
 }
 ```

-In `handleEngineRequest`, look up `engineType + ":" + operation` instead of
-just `operation`. The `engineType` is already known from the mount registry
-(the generic endpoint resolves the mount to an engine type).
+This is the minimal, safest change. `RLock` permits concurrent readers, so
+there is no throughput regression for parallel `Get`/`Put` operations. The
+only serialization point is `Seal()`, which acquires the exclusive `Lock`
+and waits for all readers to drain — exactly the semantics we want.

-**Option B — Per-engine admin operations** (cleaner but more code):
+Apply the same pattern to `Put`, `Delete`, and `List`.

-Each engine implements an `AdminOperations() []string` method. The server
-queries the resolved engine for its admin operations instead of using a global
-map.
+**Alternative considered**: Atomic pointer swap (`atomic.Pointer[keyState]`).
+This eliminates the lock from the hot path entirely, but introduces
+complexity around deferred zeroization of the old state (readers may still
+hold references). The `RLock`-through-crypto approach is simpler and
+sufficient for Metacrypt's concurrency profile.

-**Recommendation**: Option A. It requires a one-line change to the lookup and
-a mechanical update to the map keys. The generic endpoint already resolves the
-mount to get the engine type.
+### #40 — Crash During `ReWrapKeys` Loses All Data

-**Files to change**:
- `internal/server/routes.go` — update map and lookup in `handleEngineRequest`
- `engines/sshca.md` — update `adminOnlyOperations` section
- `engines/transit.md` — update `adminOnlyOperations` section
- `engines/user.md` — update `adminOnlyOperations` section
+**Risk**: `RotateMEK` calls `barrier.ReWrapKeys(newMEK)` which commits a
+transaction re-wrapping all DEKs, then separately updates `seal_config`
+with the new encrypted MEK. A crash between these two database operations
+leaves DEKs wrapped with a MEK that is not persisted — all data is
+irrecoverable.

-**Tests**: Add test case in `internal/server/server_test.go` — non-admin user
-calling `rotate-key` via generic endpoint on a user engine mount should succeed
-(policy permitting). Same call on a transit mount should return 403.
-
---
-
-## High
-
-### #28 — HMAC output not versioned
-
-**Problem**: HMAC output is raw base64 with no key version indicator. After key
-rotation and `min_decryption_version` advancement, old HMACs are unverifiable
-because the engine doesn't know which key version produced them.
-
-**Fix**: Use the same versioned prefix format as ciphertext and signatures:
+**Current flow** (`seal.go` lines 245–313):

 ```
-metacrypt:v{version}:{base64(mac_bytes)}
+1. Generate newMEK
+2. barrier.ReWrapKeys(ctx, newMEK)   ← commits transaction (barrier_keys updated)
+3. crypto.Encrypt(kwk, newMEK, nil)  ← encrypt new MEK
+4. UPDATE seal_config SET encrypted_mek = ?  ← separate statement, not in transaction
+   *** CRASH HERE = DATA LOSS ***
+5. Swap in-memory MEK
 ```

-Update the `hmac` operation to include `key_version` in the response. Update
-internal HMAC verification to parse the version prefix and select the
-corresponding key version (subject to `min_decryption_version` enforcement).
+**Fix**: Unify steps 2–4 into a single database transaction.

-**Files to change**:
- `engines/transit.md` — update HMAC section, add HMAC output format, update
-  Cryptographic Details section
- Implementation: `internal/engine/transit/sign.go` (when implemented)
-
-### #30 — `max_key_versions` vs `min_decryption_version` unclear
-
-**Problem**: The spec doesn't define when `max_key_versions` pruning happens or
-whether it respects `min_decryption_version`. Auto-pruning on rotation could
-destroy versions that still have unrewrapped ciphertext.
-
-**Fix**: Define the behavior explicitly in `engines/transit.md`:
-
-1. `max_key_versions` pruning happens during `rotate-key`, after the new
-   version is created.
-2. Pruning **only** deletes versions **strictly less than**
-   `min_decryption_version`. If `max_key_versions` would require deleting a
-   version at or above `min_decryption_version`, the version is **retained**
-   and a warning is included in the response:
-   `"warning": "max_key_versions exceeded; advance min_decryption_version to enable pruning"`.
-3. This means `max_key_versions` is a soft limit — it is only enforceable
-   after the operator completes the rotation cycle (rotate → rewrap → advance
-   min → prune happens automatically on next rotate).
-
-This resolves the original audit finding #16 as well.
-
-**Files to change**:
- `engines/transit.md` — add `max_key_versions` behavior to Key Rotation
-  section and `rotate-key` flow
- `AUDIT.md` — mark #16 as RESOLVED with reference to the new behavior
-
-### #33 — Auto-provision creates keys for arbitrary usernames
-
-**Problem**: The encrypt flow auto-provisions recipients without validating
-that the username exists in MCIAS. Any authenticated user can create barrier
-entries for non-existent users.
-
-**Fix**: Before auto-provisioning, validate the recipient username against
-MCIAS. The engine has access to the auth system via `req.CallerInfo` context.
-Add an MCIAS user lookup:
-
-1. Add a `ValidateUsername(username string) (bool, error)` method to the auth
-   client interface. This calls the MCIAS user info endpoint to check if the
-   username exists.
-2. In the encrypt flow, before auto-provisioning a recipient, call
-   `ValidateUsername`. If the user doesn't exist in MCIAS, return an error:
-   `"recipient not found: {username}"`.
-3. Document this validation in the encrypt flow and security considerations.
-
-**Alternative** (simpler, weaker): Skip MCIAS validation but add a
-rate limit on auto-provisioning (e.g., max 10 new provisions per encrypt
-request, max 100 total auto-provisions per hour per caller). This prevents
-storage inflation but doesn't prevent phantom users.
-
-**Recommendation**: MCIAS validation. It's the correct security boundary —
-only real MCIAS users should have keypairs.
-
-**Files to change**:
- `engines/user.md` — update encrypt flow step 2, add MCIAS validation
- `internal/auth/` — add `ValidateUsername` to auth client (when implemented)
-
---
-
-## Medium
-
-### #25 — Missing `list-certs` REST route (SSH CA)
-
-**Fix**: Add to the REST endpoints table:
-
-```
-| GET | `/v1/sshca/{mount}/certs` | List cert records |
-```
-
-Add to the route registration code block:
+Add a `ReWrapKeysTx` variant of `ReWrapKeys` that runs inside a caller-supplied `*sql.Tx`:

 ```go
-r.Get("/v1/sshca/{mount}/certs", s.requireAuth(s.handleSSHCAListCerts))
-```
+// ReWrapKeysTx re-wraps all DEKs with newMEK within the given transaction.
+func (b *AESGCMBarrier) ReWrapKeysTx(ctx context.Context, tx *sql.Tx, newMEK []byte) error {
+    // Same logic as ReWrapKeys, but use tx instead of b.db.BeginTx.
+    rows, err := tx.QueryContext(ctx, "SELECT key_id, wrapped_key FROM barrier_keys")
+    if err != nil {
+        return err
+    }
+    defer rows.Close()
+    // ... decrypt with old MEK, encrypt with new MEK, UPDATE barrier_keys ...
+    return rows.Err()
+}

-**Files to change**: `engines/sshca.md`
-
-### #26 — KRL section type description error
-
-**Fix**: Change the description block from:
-
-```
-Section type: KRL_SECTION_CERT_SERIAL_LIST (0x21)
-```
-
-to:
-
-```
-Section type: KRL_SECTION_CERTIFICATES (0x01)
-  CA key blob: ssh.MarshalAuthorizedKey(caSigner.PublicKey())
-  Subsection type: KRL_SECTION_CERT_SERIAL_LIST (0x20)
-```
-
-This matches the pseudocode comments and the OpenSSH `PROTOCOL.krl` spec.
-
-**Files to change**: `engines/sshca.md`
-
-### #27 — Policy check after cert construction (SSH CA)
-
-**Fix**: Reorder the sign-host flow steps:
-
-1. Authenticate caller.
-2. Parse the supplied SSH public key.
-3. Parse TTL.
-4. **Policy check**: for each hostname, check policy on
-   `sshca/{mount}/id/{hostname}`, action `sign`.
-5. Generate serial (only after policy passes).
-6. Build `ssh.Certificate`.
-7. Sign, store, return.
-
-Same reordering for sign-user.
-
-**Files to change**: `engines/sshca.md`
-
-### #29 — `rewrap` policy action not specified
-
-**Fix**: Add `rewrap` as an explicit action in the `operationAction` mapping.
-`rewrap` maps to `decrypt` (since it requires internal access to plaintext).
-Batch variants map to the same action.
-
-Add to the authorization section in `engines/transit.md`:
-
-> The `rewrap` and `batch-rewrap` operations require the `decrypt` action —
-> rewrap internally decrypts with the old version and re-encrypts with the
-> latest, so the caller must have decrypt permission. Alternatively, a
-> dedicated `rewrap` action could be added for finer-grained control, but
-> `decrypt` is the safer default (granting `rewrap` without `decrypt` would be
-> odd since rewrap implies decrypt capability).
-
-**Recommendation**: Map to `decrypt`. Simpler, and anyone who should rewrap
-should also be able to decrypt.
-
-**Files to change**: `engines/transit.md`
-
-### #31 — Missing `get-public-key` REST route (Transit)
-
-**Fix**: Add to the REST endpoints table:
-
-```
-| GET | `/v1/transit/{mount}/keys/{name}/public-key` | Get public key |
-```
-
-Add to the route registration code block:
-
-```go
-r.Get("/v1/transit/{mount}/keys/{name}/public-key", s.requireAuth(s.handleTransitGetPublicKey))
-```
-
-**Files to change**: `engines/transit.md`
-
-### #34 — No recipient limit on encrypt (User)
-
-**Fix**: Add a compile-time constant `maxRecipients = 100` to the user engine.
-Reject requests exceeding this limit with `400 Bad Request` / `InvalidArgument`
-before any ECDH computation.
-
-Add to the encrypt flow in `engines/user.md` after step 1:
-
-> Validate that `len(recipients) <= maxRecipients` (100). Reject with error if
-> exceeded.
-
-Add to the security considerations section.
-
-**Files to change**: `engines/user.md`
-
---
-
-## Low
-
-### #32 — `exportable` flag with no export operation (Transit)
-
-**Fix**: Add an `export-key` operation to the transit engine:
-
- Auth: User+Policy (action `read`).
- Only succeeds if the key's `exportable` flag is `true`.
- Returns raw key material (base64-encoded) for the current version only.
- Asymmetric keys: returns private key in PKCS8 PEM.
- Symmetric keys: returns raw key bytes, base64-encoded.
- Add to HandleRequest dispatch, gRPC service, REST endpoints.
-
-Alternatively, if key export is never intended, remove the `exportable` flag
-from `create-key` to avoid dead code. Given that transit is meant to keep keys
-server-side, **removing the flag** may be the better choice. Document the
-decision either way.
-
-**Recommendation**: Remove `exportable`. Transit's entire value proposition is
-that keys never leave the service. If export is needed for migration, a
-dedicated admin-only `export-key` can be added later with appropriate audit
-logging (#7).
-
-**Files to change**: `engines/transit.md`
-
-### #35 — No re-encryption support for user key rotation
-
-**Fix**: Add a `re-encrypt` operation:
-
- Auth: User (self) — only the envelope recipient can re-encrypt.
- Input: old envelope.
- Flow: decrypt with current key, generate new DEK, re-encrypt, return new
-  envelope.
- The old key must still be valid at the time of re-encryption. Document the
-  workflow: re-encrypt all stored envelopes, then rotate-key.
-
-This is a quality-of-life improvement, not a security fix. The current design
-(decrypt + encrypt separately) works but requires the caller to handle
-plaintext.
-
-**Files to change**: `engines/user.md`
-
-### #36 — `UserKeyConfig` type undefined
-
-**Fix**: Add the type definition to the in-memory state section:
-
-```go
-type UserKeyConfig struct {
-    Algorithm       string    `json:"algorithm"`        // key exchange algorithm used
-    CreatedAt       time.Time `json:"created_at"`
-    AutoProvisioned bool      `json:"auto_provisioned"` // created via auto-provision
+// SwapMEK updates the in-memory MEK after a committed transaction.
+func (b *AESGCMBarrier) SwapMEK(newMEK []byte) {
+    b.mu.Lock()
+    defer b.mu.Unlock()
+    mcrypto.Zeroize(b.mek)
+    b.mek = newMEK
 }
 ```

-**Files to change**: `engines/user.md`
+Then in `RotateMEK`:

-### #38 — `ZeroizeKey` prerequisite not cross-referenced
+```go
+func (m *Manager) RotateMEK(ctx context.Context, password string) error {
+    // ... derive KWK, generate newMEK ...

-**Fix**: Add to the Implementation Steps section in both `engines/transit.md`
-and `engines/user.md`:
+    tx, err := m.db.BeginTx(ctx, nil)
+    if err != nil {
+        return err
+    }
+    defer tx.Rollback()

-> **Prerequisite**: `engine.ZeroizeKey` must exist in
-> `internal/engine/helpers.go` (created as part of the SSH CA engine
-> implementation — see `engines/sshca.md` step 1).
+    // Re-wrap all DEKs within the transaction.
+    if err := m.barrier.ReWrapKeysTx(ctx, tx, newMEK); err != nil {
+        return err
+    }

-**Files to change**: `engines/transit.md`, `engines/user.md`
+    // Update seal_config within the same transaction.
+    encNewMEK, err := crypto.Encrypt(kwk, newMEK, nil)
+    if err != nil {
+        return err
+    }
+    if _, err := tx.ExecContext(ctx,
+        "UPDATE seal_config SET encrypted_mek = ? WHERE id = 1",
+        encNewMEK,
+    ); err != nil {
+        return err
+    }
+
+    if err := tx.Commit(); err != nil {
+        return err
+    }
+
+    // Only after commit: update in-memory state.
+    m.barrier.SwapMEK(newMEK)
+    return nil
+}
+```
+
+SQLite commits are atomic across process crashes (in WAL mode and in the
+default rollback-journal mode alike), so the `barrier_keys` and
+`seal_config` updates either both commit or neither does.
+
+### Files
+
+| File | Change |
+|------|--------|
+| `internal/barrier/barrier.go` | Extend RLock scope in Get/Put/Delete/List; add `ReWrapKeysTx`, `SwapMEK` |
+| `internal/seal/seal.go` | Wrap ReWrapKeysTx + seal_config UPDATE in single transaction |
+| `internal/barrier/barrier_test.go` | Add concurrent Get/Seal stress test |
+
+### Verification
+
+- `go test -race ./internal/barrier/ ./internal/seal/`
+- Add `TestConcurrentGetSeal`: spawn goroutines doing Get while another
+  goroutine calls Seal. Run with `-race`. Verify no panics or data races.
+- Add `TestRotateMEKAtomic`: verify that `barrier_keys` and `seal_config`
+  are updated in the same transaction (mock the DB to detect transaction
+  boundaries, or verify via rollback behavior).
+
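The locking rule for #39 (hold `RLock` for the whole crypto operation so `Seal` cannot zeroize the key mid-use) can be illustrated with a toy model. Everything here is a hypothetical stand-in: `toyBarrier` is not the real `AESGCMBarrier`, and a byte-wise XOR replaces AES-GCM purely so that a read under a zeroized key is detectable.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var ErrSealed = errors.New("barrier: sealed")

// toyBarrier is a hypothetical stand-in for the real barrier.
type toyBarrier struct {
	mu  sync.RWMutex
	mek []byte
}

// get holds RLock for the entire "crypto" operation (XOR here).
func (b *toyBarrier) get(data []byte) ([]byte, error) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	if b.mek == nil {
		return nil, ErrSealed
	}
	out := make([]byte, len(data))
	for i := range data {
		out[i] = data[i] ^ b.mek[i%len(b.mek)]
	}
	return out, nil
}

// seal zeroizes the key in place under the exclusive lock.
func (b *toyBarrier) seal() {
	b.mu.Lock()
	defer b.mu.Unlock()
	for i := range b.mek {
		b.mek[i] = 0
	}
	b.mek = nil
}

// stress races readers against one seal and counts reads that produced
// output under a zeroized key. ErrSealed is an acceptable outcome.
func stress(readers int) int {
	b := &toyBarrier{mek: []byte{0xAA, 0xBB, 0xCC, 0xDD}}
	plain := []byte("secret!!")
	want, _ := b.get(plain)

	var wg sync.WaitGroup
	var mu sync.Mutex
	bad := 0
	for i := 0; i < readers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			got, err := b.get(plain)
			if err != nil {
				return
			}
			if string(got) != string(want) {
				mu.Lock()
				bad++
				mu.Unlock()
			}
		}()
	}
	b.seal()
	wg.Wait()
	return bad
}

func main() {
	fmt.Println("reads under zeroized key:", stress(200))
}
```

Running this under `go run -race` is the closest analogue to the `TestConcurrentGetSeal` check described in the verification list; every reader either completes with the intact key or observes `ErrSealed`.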
+---
+
+## Work Item 4: CA TTL Enforcement, User Engine Fixes, Policy Bypass (#49, #61, #62, #69)
+
+These four findings touch separate files with no overlap and can be
+addressed in parallel.
+
+### #49 — No TTL Ceiling in CA Certificate Issuance
+
+**Risk**: A non-admin user can request an arbitrarily long certificate
+lifetime. The issuer's `MaxTTL` exists in config but is not enforced
+during `handleIssue` or `handleSignCSR`.
+
+**Root cause**: The CA engine applies the user's requested TTL directly
+to the certificate without comparing it against `issuerConfig.MaxTTL`.
+The SSH CA engine correctly enforces this via `resolveTTL` — the CA
+engine does not.
+
+**Fix**: Add a `resolveTTL` method to the CA engine, following the SSH
+CA engine's pattern (`sshca.go` lines 902–932):
+
+```go
+func (e *CAEngine) resolveTTL(requested string, issuer *issuerState) (time.Duration, error) {
+    maxTTL, err := time.ParseDuration(issuer.config.MaxTTL)
+    if err != nil {
+        maxTTL = 2160 * time.Hour // 90 days fallback
+    }
+
+    if requested != "" {
+        ttl, err := time.ParseDuration(requested)
+        if err != nil {
+            return 0, fmt.Errorf("invalid TTL: %w", err)
+        }
+        if ttl > maxTTL {
+            return 0, fmt.Errorf("requested TTL %s exceeds issuer maximum %s", ttl, maxTTL)
+        }
+        return ttl, nil
+    }
+
+    return maxTTL, nil
+}
+```
+
+Call this in `handleIssue` and `handleSignCSR` before constructing the
+certificate. Replace the raw TTL string with the validated duration.
+
+| File | Change |
+|------|--------|
+| `internal/engine/ca/ca.go` | Add `resolveTTL`, call from `handleIssue` and `handleSignCSR` |
+| `internal/engine/ca/ca_test.go` | Add test: issue cert with TTL > MaxTTL, verify rejection |
+
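The TTL ceiling logic for #49 is easy to check in isolation. The sketch below is a free-function variant of the plan's `resolveTTL`, taking the issuer maximum as a string instead of an `*issuerState` (an assumption made so the example is self-contained); the duration values are arbitrary.

```go
package main

import (
	"fmt"
	"time"
)

// resolveTTL returns the requested TTL if it parses and does not exceed
// issuerMax. An empty request yields the maximum. An unparsable issuerMax
// falls back to 90 days, as in the plan.
func resolveTTL(requested, issuerMax string) (time.Duration, error) {
	maxTTL, err := time.ParseDuration(issuerMax)
	if err != nil {
		maxTTL = 2160 * time.Hour // 90 days fallback
	}
	if requested == "" {
		return maxTTL, nil
	}
	ttl, err := time.ParseDuration(requested)
	if err != nil {
		return 0, fmt.Errorf("invalid TTL: %w", err)
	}
	if ttl > maxTTL {
		return 0, fmt.Errorf("requested TTL %s exceeds issuer maximum %s", ttl, maxTTL)
	}
	return ttl, nil
}

func main() {
	for _, req := range []string{"24h", "9000h", ""} {
		ttl, err := resolveTTL(req, "720h")
		fmt.Printf("requested=%q ttl=%s err=%v\n", req, ttl, err)
	}
}
```

One design point worth noting: like the plan's version, this accepts zero and negative durations (`time.ParseDuration` parses `-1h`), so a lower bound may be worth adding in the real engine.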
+### #61 — Ineffective ECDH Key Zeroization
+
+**Risk**: `privKey.Bytes()` returns a copy of the private key bytes.
+Zeroizing the copy leaves the original inside `*ecdh.PrivateKey`. Go's
+`crypto/ecdh` API does not expose the internal byte slice.
+
+**Root cause**: Language/API limitation in Go's `crypto/ecdh` package.
+
+**Fix**: Store the raw private key bytes alongside the parsed key in
+`userState`, and zeroize those bytes on seal:
+
+```go
+type userState struct {
+    privKey    *ecdh.PrivateKey
+    privBytes  []byte            // raw key bytes, retained for zeroization
+    pubKey     *ecdh.PublicKey
+    config     *UserKeyConfig
+}
+```
+
+On **load from barrier** (Unseal, auto-provision):
+```go
+raw, err := b.Get(ctx, prefix+"priv.key")
+priv, err := curve.NewPrivateKey(raw)
+state.privBytes = raw  // retain for zeroization
+state.privKey = priv
+```
+
+On **Seal**:
+```go
+mcrypto.Zeroize(u.privBytes)
+u.privKey = nil
+u.privBytes = nil
+```
+
+Document the limitation: the parsed `*ecdh.PrivateKey` struct's internal
+copy cannot be zeroized from Go code. Setting `privKey = nil` makes it
+eligible for GC, but does not guarantee immediate byte overwrite. This is
+an accepted Go runtime limitation.
+
+| File | Change |
+|------|--------|
+| `internal/engine/user/user.go` | Add `privBytes` to `userState`, populate on load, zeroize on Seal |
+| `internal/engine/user/types.go` | Update `userState` struct |
+
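The root cause of #61 can be observed directly with the standard library: `(*ecdh.PrivateKey).Bytes` returns a copy, so zeroizing the returned slice leaves the key inside the struct untouched. The sketch below assumes X25519 only for illustration (the plan does not name the curve); `zeroize`, `allZero`, and `copyIsIndependent` are hypothetical helpers.

```go
package main

import (
	"crypto/ecdh"
	"crypto/rand"
	"fmt"
)

// zeroize overwrites b in place.
func zeroize(b []byte) {
	for i := range b {
		b[i] = 0
	}
}

// allZero reports whether every byte of b is zero.
func allZero(b []byte) bool {
	for _, v := range b {
		if v != 0 {
			return false
		}
	}
	return true
}

// copyIsIndependent reports whether zeroizing the slice returned by Bytes()
// leaves the key held inside *ecdh.PrivateKey intact.
func copyIsIndependent() (bool, error) {
	priv, err := ecdh.X25519().GenerateKey(rand.Reader)
	if err != nil {
		return false, err
	}
	c := priv.Bytes()
	zeroize(c)
	return allZero(c) && !allZero(priv.Bytes()), nil
}

func main() {
	ok, err := copyIsIndependent()
	if err != nil {
		fmt.Println("key generation failed:", err)
		return
	}
	fmt.Println("internal key survives zeroizing Bytes():", ok)
}
```

This is why the fix retains the raw bytes loaded from the barrier in `privBytes`: that slice is the only copy the engine can reliably overwrite.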
+### #62 — User Engine Policy Path Uses `mountPath` Instead of Mount Name
+
+**Risk**: Policy checks construct the resource path using `e.mountPath`
+(which is `engine/user/{name}/`) instead of just the mount name. Policy
+rules match against `user/{name}/recipient/{username}`, so the full mount
+path creates a mismatch like `user/engine/user/myengine//recipient/alice`.
+No policy rule will ever match.
+
+**Root cause**: Line 358 of `user.go` uses `e.mountPath` directly. The
+SSH CA and transit engines correctly use a `mountName()` helper.
+
+**Fix**: Add a `mountName()` method to the user engine:
+
+```go
+func (e *UserEngine) mountName() string {
+    // mountPath is "engine/user/{name}/"
+    parts := strings.Split(strings.TrimSuffix(e.mountPath, "/"), "/")
+    if len(parts) >= 3 {
+        return parts[2]
+    }
+    return e.mountPath
+}
+```
+
+Change line 358:
+
+```go
+resource := fmt.Sprintf("user/%s/recipient/%s", e.mountName(), r)
+```
+
+Audit all other resource path constructions in the user engine to confirm
+they also use the correct mount name.
+
+| File | Change |
+|------|--------|
+| `internal/engine/user/user.go` | Add `mountName()`, fix resource path on line 358 |
+| `internal/engine/user/user_test.go` | Add test: verify policy resource path format |
+
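The path mismatch behind #62 is easiest to see by formatting the resource both ways. In this sketch `mountName` is the plan's helper rewritten as a free function, and `resource` plus the mount and user names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// mountName extracts {name} from a mount path of the form
// "engine/user/{name}/".
func mountName(mountPath string) string {
	parts := strings.Split(strings.TrimSuffix(mountPath, "/"), "/")
	if len(parts) >= 3 {
		return parts[2]
	}
	return mountPath
}

// resource builds the policy resource path for a recipient check.
func resource(mount, username string) string {
	return fmt.Sprintf("user/%s/recipient/%s", mount, username)
}

func main() {
	mountPath := "engine/user/myengine/" // hypothetical mount

	fmt.Println("before fix:", resource(mountPath, "alice"))
	fmt.Println("after fix: ", resource(mountName(mountPath), "alice"))
}
```

The first line reproduces the malformed `user/engine/user/myengine//recipient/alice` path from the risk description; the second is the form that policy rules are written against.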
+### #69 — Typed REST Handlers Bypass Policy Engine
+
+**Risk**: 18 typed REST handlers pass `nil` for `CheckPolicy` in the
+`engine.Request`, skipping service-level policy evaluation. The generic
+`/v1/engine/request` endpoint correctly passes a `policyChecker`. Since
+the SSH CA and transit engines default to allow when no policy matches
+(findings #54 and #58), typed routes are effectively unprotected by policy.
+
+**Root cause**: Typed handlers were modeled after admin-only operations
+(which don't need policy) but applied to user-accessible operations.
+
+**Fix**: Extract the policy checker construction from
+`handleEngineRequest` into a shared helper:
+
+```go
+func (s *Server) newPolicyChecker(info *CallerInfo) engine.PolicyChecker {
+    return func(resource, action string) (string, bool) {
+        effect, matched, err := s.policy.Check(
+            info.Username, info.Roles, resource, action,
+        )
+        if err != nil || !matched {
+            return "deny", false
+        }
+        return effect, matched
+    }
+}
+```
+
+Then in each typed handler, set `CheckPolicy` on the request:
+
+```go
+req := &engine.Request{
+    Operation:   "get-cert",
+    Data:        data,
+    CallerInfo:  callerInfo,
+    CheckPolicy: s.newPolicyChecker(callerInfo),
+}
+```
+
+**18 handlers to update**:
+
+| Handler | Operation |
+|---------|-----------|
+| `handleGetCert` | `get-cert` |
+| `handleRevokeCert` | `revoke-cert` |
+| `handleDeleteCert` | `delete-cert` |
+| `handleSSHCASignHost` | `sign-host` |
+| `handleSSHCASignUser` | `sign-user` |
+| `handleSSHCAGetProfile` | `get-profile` |
+| `handleSSHCAListProfiles` | `list-profiles` |
+| `handleSSHCADeleteProfile` | `delete-profile` |
+| `handleSSHCAGetCert` | `get-cert` |
+| `handleSSHCAListCerts` | `list-certs` |
+| `handleSSHCARevokeCert` | `revoke-cert` |
+| `handleSSHCADeleteCert` | `delete-cert` |
+| `handleUserRegister` | `register` |
+| `handleUserProvision` | `provision` |
+| `handleUserListUsers` | `list-users` |
+| `handleUserGetPublicKey` | `get-public-key` |
+| `handleUserDeleteUser` | `delete-user` |
+| `handleUserDecrypt` | `decrypt` |
+
+Note: `handleUserEncrypt` already passes a policy checker — verify it
+uses the same shared helper after refactoring. Admin-only handlers
+(behind `requireAdmin` wrapper) do not need a policy checker since admin
+bypasses policy.
+
+| File | Change |
+|------|--------|
+| `internal/server/routes.go` | Add `newPolicyChecker`, pass to all 18 typed handlers |
+| `internal/server/server_test.go` | Add test: policy-denied user is rejected by typed route |
+
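The default-deny behavior of the shared checker in #69 can be sketched without the server. The `policyEngine` and `rule` types below are hypothetical stand-ins with exact-match lookup only, and the resource string is made up; only the closure's shape (deny on error or no match) follows the plan's `newPolicyChecker`.

```go
package main

import "fmt"

// PolicyChecker returns the effect and whether any rule matched.
type PolicyChecker func(resource, action string) (string, bool)

// rule and policyEngine are hypothetical stand-ins for the real policy engine.
type rule struct{ resource, action, effect string }

type policyEngine struct{ rules []rule }

// Check does exact-match lookup only; real matching is assumed to be richer.
func (p *policyEngine) Check(user string, roles []string, resource, action string) (string, bool, error) {
	for _, r := range p.rules {
		if r.resource == resource && r.action == action {
			return r.effect, true, nil
		}
	}
	return "", false, nil
}

// newPolicyChecker binds a caller to the engine and denies on error or
// when no rule matches.
func newPolicyChecker(p *policyEngine, user string, roles []string) PolicyChecker {
	return func(resource, action string) (string, bool) {
		effect, matched, err := p.Check(user, roles, resource, action)
		if err != nil || !matched {
			return "deny", false
		}
		return effect, matched
	}
}

func main() {
	p := &policyEngine{rules: []rule{{"ca/pki/id/web-01", "read", "allow"}}}
	check := newPolicyChecker(p, "alice", []string{"user"})

	effect, matched := check("ca/pki/id/web-01", "read")
	fmt.Println(effect, matched)
	effect, matched = check("ca/pki/id/web-02", "read")
	fmt.Println(effect, matched)
}
```

Because the closure reports `matched == false` alongside `"deny"`, an engine that only looks at the `matched` flag can still fall through to its own default, which is the interaction with #54 and #58 noted under Post-Remediation.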
+### Verification (Work Item 4, all findings)
+
+```bash
+go test ./internal/engine/ca/
+go test ./internal/engine/user/
+go test ./internal/server/
+go vet ./...
+```

 ---

 ## Implementation Order

-The remediation items should be implemented in this order to respect
-dependencies:
+```
+1. #68  JSON injection             (standalone, ship immediately)
+2. #48  Path traversal             (standalone, blocks safe engine operation)
+3. #39  Barrier TOCTOU race    ─┐
+   #40  ReWrapKeys crash safety ┘  (coupled, requires careful testing)
+4. #49  CA TTL enforcement     ─┐
+   #61  ECDH zeroization       │
+   #62  User policy path       │  (independent fixes, parallelizable)
+   #69  Policy bypass          ─┘
+```

-1. **#37** — `adminOnlyOperations` qualification (critical, blocks user engine
-   `rotate-key`). This is a code change to `internal/server/routes.go` plus
-   spec updates. Do first because it affects all engine implementations.
+Items 1 and 2 have no dependencies and can be done in parallel by
+different engineers.

-2. **#28, #29, #30, #31, #32** — Transit spec fixes (can be done as a single
-   spec update pass).
+Items 3 and 4 can also be done in parallel since they touch different
+subsystems (barrier/seal vs engines/server).

-3. **#25, #26, #27** — SSH CA spec fixes (single spec update pass).
+---

-4. **#33, #34, #35, #36** — User spec fixes (single spec update pass).
+## Post-Remediation

-5. **#38** — Cross-reference update (trivial, do with transit and user spec
-   fixes).
+After all eight findings are resolved:

-Items within the same group are independent and can be done in parallel.
+1. **Update AUDIT.md** — mark #39, #40, #48, #49, #61, #62, #68, #69 as
+   RESOLVED with resolution summaries.
+2. **Run the full pipeline**: `make all` (vet, lint, test, build).
+3. **Run race detector**: `go test -race ./...`
+4. **Address related medium findings** that interact with these fixes:
+   - #54 (SSH CA default-allow) and #58 (transit default-allow) — once
+     #69 is fixed, the typed handlers will pass policy checkers to the
+     engines, but the engines still default-allow when `CheckPolicy`
+     returns no match. Consider changing the engine-level default to deny
+     for non-admin callers.
+   - #72 (policy ID path traversal) — already covered by #48's
+     `ValidateName` fix on `CreateRule`.