# Remediation Plan — High-Priority Audit Findings **Date**: 2026-03-17 **Scope**: AUDIT.md findings #39, #40, #48, #49, #61, #62, #68, #69 This plan addresses all eight High-severity findings from the 2026-03-17 full system audit. Findings are grouped into four work items by shared root cause or affected subsystem. The order reflects dependency chains: #68 is a standalone fix that should ship first; #48 is a prerequisite for safe operation across all engines; #39/#40 affect the storage core; the remaining four affect specific engines. --- ## Work Item 1: JSON Injection in REST Error Responses (#68) **Risk**: An error message containing `"` or `\` breaks the JSON response structure. If the error contains attacker-controlled input (e.g., a mount name or key name that triggers a downstream error), this enables JSON injection in API responses. **Root cause**: 13 locations in `internal/server/routes.go` construct JSON error responses via string concatenation: ```go http.Error(w, `{"error":"`+err.Error()+`"}`, http.StatusInternalServerError) ``` The `writeEngineError` helper (line 1704) is the most common entry point; most typed handlers call it. ### Fix 1. **Replace `writeEngineError`** with a safe JSON encoder: ```go func writeJSONError(w http.ResponseWriter, msg string, code int) { w.Header().Set("Content-Type", "application/json") w.WriteHeader(code) _ = json.NewEncoder(w).Encode(map[string]string{"error": msg}) } ``` 2. **Replace all 13 call sites** that use string concatenation with `writeJSONError(w, grpcMessage(err), status)` or `writeJSONError(w, err.Error(), status)`. The `grpcMessage` helper already exists in the webserver package and extracts human-readable messages from gRPC errors. Add an equivalent to the REST server, and prefer it over raw `err.Error()` to avoid leaking internal error details. 3. **Grep for the pattern** `"error":"` in `routes.go` to confirm no remaining string-concatenated JSON. ### Files | File | Change | |------|--------| | `internal/server/routes.go` | Replace `writeEngineError` and all 13 inline error sites | ### Verification - `go vet ./internal/server/` - `go test ./internal/server/` - Manual test: mount an engine with a name containing `"`, trigger an error, verify the response is valid JSON. --- ## Work Item 2: Path Traversal via Unsanitized Names (#48) **Risk**: User-controlled strings (issuer names, key names, profile names, usernames, mount names) are concatenated directly into barrier storage paths. An input containing `../` traverses the barrier namespace, allowing reads and writes to arbitrary paths. This affects all four engines and the engine registry. **Root cause**: No validation exists at any layer — neither the barrier's `Put`/`Get`/`Delete` methods nor the engines sanitize path components. ### Vulnerable locations | File | Input | Path Pattern | |------|-------|-------------| | `ca/ca.go` | issuer `name` | `mountPath + "issuers/" + name + "/"` | | `sshca/sshca.go` | profile `name` | `mountPath + "profiles/" + name + ".json"` | | `transit/transit.go` | key `name` | `mountPath + "keys/" + name + "/"` | | `user/user.go` | `username` | `mountPath + "users/" + username + "/"` | | `engine/engine.go` | mount `name` | `engine/{type}/{name}/` | | `policy/policy.go` | rule `ID` | `policy/rules/{id}` | ### Fix Enforce validation at **two layers** (defense in depth): 1. **Barrier layer** — reject paths containing `..` segments. Add a `validatePath` check at the top of `Get`, `Put`, `Delete`, and `List` in `barrier.go`: ```go var ErrInvalidPath = errors.New("barrier: invalid path") func validatePath(p string) error { for _, seg := range strings.Split(p, "/") { if seg == ".." { return fmt.Errorf("%w: path traversal rejected: %q", ErrInvalidPath, p) } } return nil } ``` Call `validatePath` at the entry of `Get`, `Put`, `Delete`, `List`. Return `ErrInvalidPath` on failure. 2. **Engine/registry layer** — validate entity names at input boundaries. Add a `ValidateName` helper to `internal/engine/`: ```go var namePattern = regexp.MustCompile(`^[a-zA-Z0-9][a-zA-Z0-9._-]*$`) func ValidateName(name string) error { if name == "" || len(name) > 128 || !namePattern.MatchString(name) { return fmt.Errorf("invalid name %q: must be 1-128 alphanumeric, "+ "dot, hyphen, or underscore characters", name) } return nil } ``` Call `ValidateName` in: | Location | Input validated | |----------|----------------| | `engine.go` `Mount()` | mount name | | `ca.go` `handleCreateIssuer` | issuer name | | `sshca.go` `handleCreateProfile` | profile name | | `transit.go` `handleCreateKey` | key name | | `user.go` `handleRegister`, `handleProvision` | username | | `user.go` `handleEncrypt` | recipient usernames | | `policy.go` `CreateRule` | rule ID | Note: certificate serials are generated server-side from `crypto/rand` and hex-encoded, so they are safe. Validate anyway for defense in depth. ### Files | File | Change | |------|--------| | `internal/barrier/barrier.go` | Add `validatePath`, call from Get/Put/Delete/List | | `internal/engine/engine.go` | Add `ValidateName`, call from `Mount` | | `internal/engine/ca/ca.go` | Call `ValidateName` on issuer name | | `internal/engine/sshca/sshca.go` | Call `ValidateName` on profile name | | `internal/engine/transit/transit.go` | Call `ValidateName` on key name | | `internal/engine/user/user.go` | Call `ValidateName` on usernames | | `internal/policy/policy.go` | Call `ValidateName` on rule ID | ### Verification - Add `TestValidatePath` to `barrier_test.go`: confirm `../` and `..` are rejected; confirm normal paths pass. - Add `TestValidateName` to `engine_test.go`: confirm `../evil`, empty string, and overlong names are rejected; confirm valid names pass. - `go test ./internal/barrier/ ./internal/engine/... ./internal/policy/` --- ## Work Item 3: Barrier Concurrency and Crash Safety (#39, #40) These two findings share the barrier/seal subsystem and should be addressed together. ### #39 — TOCTOU Race in Barrier Get/Put **Risk**: `Get` and `Put` copy the `mek` slice header and `keys` map reference under `RLock`, release the lock, then use the copied references for encryption/decryption. A concurrent `Seal()` zeroizes the underlying byte slices in place before nil-ing the fields, so a concurrent reader uses zeroized key material. **Root cause**: The lock does not cover the crypto operation. The "copy" is a shallow reference copy (slice header), not a deep byte copy. `Seal()` zeroizes the backing array, which is shared. **Current locking pattern** (`barrier.go`): ``` Get: RLock → copy mek/keys refs → RUnlock → decrypt (uses zeroized key) Put: RLock → copy mek/keys refs → RUnlock → encrypt (uses zeroized key) Seal: Lock → zeroize mek bytes → nil mek → zeroize keys → nil keys → Unlock ``` **Fix**: Hold `RLock` through the entire crypto operation: ```go func (b *AESGCMBarrier) Get(ctx context.Context, path string) ([]byte, error) { if err := validatePath(path); err != nil { return nil, err } b.mu.RLock() defer b.mu.RUnlock() if b.mek == nil { return nil, ErrSealed } // query DB, resolve key, decrypt — all under RLock // ... } ``` This is the minimal, safest change. `RLock` permits concurrent readers, so there is no throughput regression for parallel `Get`/`Put` operations. The only serialization point is `Seal()`, which acquires the exclusive `Lock` and waits for all readers to drain — exactly the semantics we want. Apply the same pattern to `Put`, `Delete`, and `List`. **Alternative considered**: Atomic pointer swap (`atomic.Pointer[keyState]`). This eliminates the lock from the hot path entirely, but introduces complexity around deferred zeroization of the old state (readers may still hold references). The `RLock`-through-crypto approach is simpler and sufficient for Metacrypt's concurrency profile. ### #40 — Crash During `ReWrapKeys` Loses All Data **Risk**: `RotateMEK` calls `barrier.ReWrapKeys(newMEK)` which commits a transaction re-wrapping all DEKs, then separately updates `seal_config` with the new encrypted MEK. A crash between these two database operations leaves DEKs wrapped with a MEK that is not persisted — all data is irrecoverable. **Current flow** (`seal.go` lines 245–313): ``` 1. Generate newMEK 2. barrier.ReWrapKeys(ctx, newMEK) ← commits transaction (barrier_keys updated) 3. crypto.Encrypt(kwk, newMEK, nil) ← encrypt new MEK 4. UPDATE seal_config SET encrypted_mek = ? ← separate statement, not in transaction *** CRASH HERE = DATA LOSS *** 5. Swap in-memory MEK ``` **Fix**: Unify steps 2–4 into a single database transaction. Refactor `ReWrapKeys` to accept an optional `*sql.Tx`: ```go // ReWrapKeysTx re-wraps all DEKs with newMEK within the given transaction. func (b *AESGCMBarrier) ReWrapKeysTx(ctx context.Context, tx *sql.Tx, newMEK []byte) error { // Same logic as ReWrapKeys, but use tx instead of b.db.BeginTx. rows, err := tx.QueryContext(ctx, "SELECT key_id, wrapped_key FROM barrier_keys") // ... decrypt with old MEK, encrypt with new MEK, UPDATE barrier_keys ... } // SwapMEK updates the in-memory MEK after a committed transaction. func (b *AESGCMBarrier) SwapMEK(newMEK []byte) { b.mu.Lock() defer b.mu.Unlock() mcrypto.Zeroize(b.mek) b.mek = newMEK } ``` Then in `RotateMEK`: ```go func (m *Manager) RotateMEK(ctx context.Context, password string) error { // ... derive KWK, generate newMEK ... tx, err := m.db.BeginTx(ctx, nil) if err != nil { return err } defer tx.Rollback() // Re-wrap all DEKs within the transaction. if err := m.barrier.ReWrapKeysTx(ctx, tx, newMEK); err != nil { return err } // Update seal_config within the same transaction. encNewMEK, err := crypto.Encrypt(kwk, newMEK, nil) if err != nil { return err } if _, err := tx.ExecContext(ctx, "UPDATE seal_config SET encrypted_mek = ? WHERE id = 1", encNewMEK, ); err != nil { return err } if err := tx.Commit(); err != nil { return err } // Only after commit: update in-memory state. m.barrier.SwapMEK(newMEK) return nil } ``` SQLite in WAL mode handles this correctly — the transaction is atomic regardless of process crash. The `barrier_keys` and `seal_config` updates either both commit or neither does. ### Files | File | Change | |------|--------| | `internal/barrier/barrier.go` | Extend RLock scope in Get/Put/Delete/List; add `ReWrapKeysTx`, `SwapMEK` | | `internal/seal/seal.go` | Wrap ReWrapKeysTx + seal_config UPDATE in single transaction | | `internal/barrier/barrier_test.go` | Add concurrent Get/Seal stress test | ### Verification - `go test -race ./internal/barrier/ ./internal/seal/` - Add `TestConcurrentGetSeal`: spawn goroutines doing Get while another goroutine calls Seal. Run with `-race`. Verify no panics or data races. - Add `TestRotateMEKAtomic`: verify that `barrier_keys` and `seal_config` are updated in the same transaction (mock the DB to detect transaction boundaries, or verify via rollback behavior). --- ## Work Item 4: CA TTL Enforcement, User Engine Fixes, Policy Bypass (#49, #61, #62, #69) These four findings touch separate files with no overlap and can be addressed in parallel. ### #49 — No TTL Ceiling in CA Certificate Issuance **Risk**: A non-admin user can request an arbitrarily long certificate lifetime. The issuer's `MaxTTL` exists in config but is not enforced during `handleIssue` or `handleSignCSR`. **Root cause**: The CA engine applies the user's requested TTL directly to the certificate without comparing it against `issuerConfig.MaxTTL`. The SSH CA engine correctly enforces this via `resolveTTL` — the CA engine does not. **Fix**: Add a `resolveTTL` method to the CA engine, following the SSH CA engine's pattern (`sshca.go` lines 902–932): ```go func (e *CAEngine) resolveTTL(requested string, issuer *issuerState) (time.Duration, error) { maxTTL, err := time.ParseDuration(issuer.config.MaxTTL) if err != nil { maxTTL = 2160 * time.Hour // 90 days fallback } if requested != "" { ttl, err := time.ParseDuration(requested) if err != nil { return 0, fmt.Errorf("invalid TTL: %w", err) } if ttl > maxTTL { return 0, fmt.Errorf("requested TTL %s exceeds issuer maximum %s", ttl, maxTTL) } return ttl, nil } return maxTTL, nil } ``` Call this in `handleIssue` and `handleSignCSR` before constructing the certificate. Replace the raw TTL string with the validated duration. | File | Change | |------|--------| | `internal/engine/ca/ca.go` | Add `resolveTTL`, call from `handleIssue` and `handleSignCSR` | | `internal/engine/ca/ca_test.go` | Add test: issue cert with TTL > MaxTTL, verify rejection | ### #61 — Ineffective ECDH Key Zeroization **Risk**: `privKey.Bytes()` returns a copy of the private key bytes. Zeroizing the copy leaves the original inside `*ecdh.PrivateKey`. Go's `crypto/ecdh` API does not expose the internal byte slice. **Root cause**: Language/API limitation in Go's `crypto/ecdh` package. **Fix**: Store the raw private key bytes alongside the parsed key in `userState`, and zeroize those bytes on seal: ```go type userState struct { privKey *ecdh.PrivateKey privBytes []byte // raw key bytes, retained for zeroization pubKey *ecdh.PublicKey config *UserKeyConfig } ``` On **load from barrier** (Unseal, auto-provision): ```go raw, err := b.Get(ctx, prefix+"priv.key") priv, err := curve.NewPrivateKey(raw) state.privBytes = raw // retain for zeroization state.privKey = priv ``` On **Seal**: ```go mcrypto.Zeroize(u.privBytes) u.privKey = nil u.privBytes = nil ``` Document the limitation: the parsed `*ecdh.PrivateKey` struct's internal copy cannot be zeroized from Go code. Setting `privKey = nil` makes it eligible for GC, but does not guarantee immediate byte overwrite. This is an accepted Go runtime limitation. | File | Change | |------|--------| | `internal/engine/user/user.go` | Add `privBytes` to `userState`, populate on load, zeroize on Seal | | `internal/engine/user/types.go` | Update `userState` struct | ### #62 — User Engine Policy Path Uses `mountPath` Instead of Mount Name **Risk**: Policy checks construct the resource path using `e.mountPath` (which is `engine/user/{name}/`) instead of just the mount name. Policy rules match against `user/{name}/recipient/{username}`, so the full mount path creates a mismatch like `user/engine/user/myengine//recipient/alice`. No policy rule will ever match. **Root cause**: Line 358 of `user.go` uses `e.mountPath` directly. The SSH CA and transit engines correctly use a `mountName()` helper. **Fix**: Add a `mountName()` method to the user engine: ```go func (e *UserEngine) mountName() string { // mountPath is "engine/user/{name}/" parts := strings.Split(strings.TrimSuffix(e.mountPath, "/"), "/") if len(parts) >= 3 { return parts[2] } return e.mountPath } ``` Change line 358: ```go resource := fmt.Sprintf("user/%s/recipient/%s", e.mountName(), r) ``` Audit all other resource path constructions in the user engine to confirm they also use the correct mount name. | File | Change | |------|--------| | `internal/engine/user/user.go` | Add `mountName()`, fix resource path on line 358 | | `internal/engine/user/user_test.go` | Add test: verify policy resource path format | ### #69 — Typed REST Handlers Bypass Policy Engine **Risk**: 18 typed REST handlers pass `nil` for `CheckPolicy` in the `engine.Request`, skipping service-level policy evaluation. The generic `/v1/engine/request` endpoint correctly passes a `policyChecker`. Since engines #54 and #58 default to allow when no policy matches, typed routes are effectively unprotected by policy. **Root cause**: Typed handlers were modeled after admin-only operations (which don't need policy) but applied to user-accessible operations. **Fix**: Extract the policy checker construction from `handleEngineRequest` into a shared helper: ```go func (s *Server) newPolicyChecker(info *CallerInfo) engine.PolicyChecker { return func(resource, action string) (string, bool) { effect, matched, err := s.policy.Check( info.Username, info.Roles, resource, action, ) if err != nil || !matched { return "deny", false } return effect, matched } } ``` Then in each typed handler, set `CheckPolicy` on the request: ```go req := &engine.Request{ Operation: "get-cert", Data: data, CallerInfo: callerInfo, CheckPolicy: s.newPolicyChecker(callerInfo), } ``` **18 handlers to update**: | Handler | Operation | |---------|-----------| | `handleGetCert` | `get-cert` | | `handleRevokeCert` | `revoke-cert` | | `handleDeleteCert` | `delete-cert` | | `handleSSHCASignHost` | `sign-host` | | `handleSSHCASignUser` | `sign-user` | | `handleSSHCAGetProfile` | `get-profile` | | `handleSSHCAListProfiles` | `list-profiles` | | `handleSSHCADeleteProfile` | `delete-profile` | | `handleSSHCAGetCert` | `get-cert` | | `handleSSHCAListCerts` | `list-certs` | | `handleSSHCARevokeCert` | `revoke-cert` | | `handleSSHCADeleteCert` | `delete-cert` | | `handleUserRegister` | `register` | | `handleUserProvision` | `provision` | | `handleUserListUsers` | `list-users` | | `handleUserGetPublicKey` | `get-public-key` | | `handleUserDeleteUser` | `delete-user` | | `handleUserDecrypt` | `decrypt` | Note: `handleUserEncrypt` already passes a policy checker — verify it uses the same shared helper after refactoring. Admin-only handlers (behind `requireAdmin` wrapper) do not need a policy checker since admin bypasses policy. | File | Change | |------|--------| | `internal/server/routes.go` | Add `newPolicyChecker`, pass to all 18 typed handlers | | `internal/server/server_test.go` | Add test: policy-denied user is rejected by typed route | ### Verification (Work Item 4, all findings) ```bash go test ./internal/engine/ca/ go test ./internal/engine/user/ go test ./internal/server/ go vet ./... ``` --- ## Implementation Order ``` 1. #68 JSON injection (standalone, ship immediately) 2. #48 Path traversal (standalone, blocks safe engine operation) 3. #39 Barrier TOCTOU race ─┐ #40 ReWrapKeys crash safety ┘ (coupled, requires careful testing) 4. #49 CA TTL enforcement ─┐ #61 ECDH zeroization │ #62 User policy path │ (independent fixes, parallelizable) #69 Policy bypass ─┘ ``` Items 1 and 2 have no dependencies and can be done in parallel by different engineers. Items 3 and 4 can also be done in parallel since they touch different subsystems (barrier/seal vs engines/server). --- ## Post-Remediation (High) All eight High findings (#39, #40, #48, #49, #61, #62, #68, #69) have been resolved. See commit `a80323e`. --- --- # Remediation Plan — Medium-Priority Audit Findings **Date**: 2026-03-17 **Scope**: AUDIT.md findings #7, #10, #21, #41, #42, #43, #46, #50, #51, #53, #54, #58, #59, #63, #64, #65, #66, #67, #70, #72, #73, #74, #78, #79 This plan addresses all medium-severity findings. Findings are grouped into eight work items by subsystem. Two findings (#72, #79) are already resolved or trivial and are noted inline. **Previously resolved by High remediation**: - #72 (policy rule ID path traversal) — covered by #48's `ValidateName` on `CreateRule` and barrier-level `validatePath`. --- ## Work Item 5: Default-Deny in Engine Policy Checks (#54, #58) **Risk**: The SSH CA and transit engines default to **allow** when no policy rule matches a non-admin user's request. This contradicts the default-deny principle in ARCHITECTURE.md and the engineering standards. Now that #69 is resolved (typed REST handlers pass `CheckPolicy`), the engines receive a policy checker — but when `CheckPolicy` returns no match, they still allow. **Root cause**: Both engines treat "no matching policy rule" as implicit allow for authenticated users. ### Fix Change the policy evaluation result handling in both engines. **SSH CA** (`sshca.go`, `handleSignHost` ~line 309, `handleSignUser` ~line 443): ```go // Before (default-allow): if matched && effect == "deny" { return nil, ErrForbidden } // After (default-deny): if req.CheckPolicy != nil { effect, matched := req.CheckPolicy(resource, "sign") if !matched || effect != "allow" { return nil, ErrForbidden } } ``` **Transit** (`transit.go`, `requireUserWithPolicy` ~line 330): ```go // Before (default-allow): if matched { if effect == "deny" { return ErrForbidden } return nil } return nil // no match → allow // After (default-deny): if req.CheckPolicy == nil { return ErrForbidden } effect, matched := req.CheckPolicy(resource, action) if !matched || effect != "allow" { return ErrForbidden } return nil ``` Apply the same pattern to any other policy check sites in these engines. ### Files | File | Change | |------|--------| | `internal/engine/sshca/sshca.go` | Default-deny in `handleSignHost`, `handleSignUser` | | `internal/engine/transit/transit.go` | Default-deny in `requireUserWithPolicy` | | `internal/engine/sshca/sshca_test.go` | Add test: no policy rules → user request denied | | `internal/engine/transit/transit_test.go` | Add test: no policy rules → user request denied | --- ## Work Item 6: User Engine Concurrency and Safety (#63, #64, #65, #66, #67) Five findings in the user engine share a root cause: the engine was implemented after the others and does not follow the same concurrency patterns. ### #63 — Missing role checks on decrypt, re-encrypt, rotate-key **Fix**: Add `IsUser()` check to `handleDecrypt`, `handleReEncrypt`, and `handleRotateKey`, matching the pattern used in `handleEncrypt`: ```go if !req.CallerInfo.IsUser() { return nil, ErrForbidden } ``` ### #64 — Initialize holds no mutex **Fix**: Acquire the write lock at the top of `Initialize`: ```go func (e *UserEngine) Initialize(ctx context.Context, ...) error { e.mu.Lock() defer e.mu.Unlock() // ... existing body ... } ``` This matches the pattern in the SSH CA and transit engines. ### #65 — handleEncrypt releases lock before using state **Fix**: Hold `RLock` (not write lock) through the ECDH and encryption operations. After auto-provisioning (which needs the write lock), downgrade to a read lock for the crypto work: ```go e.mu.Lock() // ... auto-provisioning ... // Capture references while holding write lock. senderState := e.users[sender] recipientStates := ... e.mu.Unlock() // Re-acquire RLock for crypto operations. e.mu.RLock() defer e.mu.RUnlock() // verify states are still valid (not nil) if senderState.privKey == nil { return nil, ErrSealed } // ... ECDH, wrap DEK, encrypt ... ``` Alternatively, hold the write lock through the entire operation (simpler but serializes all encrypts). Given that encryption is the hot path, the downgrade approach is preferred. ### #66 — handleReEncrypt manual lock without defer **Fix**: Restructure to use `defer` for all lock releases. Split the method into a provisioning phase (write lock) and a crypto phase (read lock), each with its own `defer`: ```go func (e *UserEngine) handleReEncrypt(...) { // Phase 1: read envelope, resolve users. e.mu.RLock() // ... read state ... e.mu.RUnlock() // Phase 2: crypto operations. e.mu.RLock() defer e.mu.RUnlock() // ... decrypt, re-wrap, encrypt ... } ``` ### #67 — No sealed-state check in HandleRequest **Fix**: Add a sealed-state guard at the top of `HandleRequest`, before the operation dispatch: ```go func (e *UserEngine) HandleRequest(ctx context.Context, req *engine.Request) (*engine.Response, error) { e.mu.RLock() sealed := e.config == nil e.mu.RUnlock() if sealed { return nil, ErrSealed } switch req.Operation { // ... } } ``` ### Files | File | Change | |------|--------| | `internal/engine/user/user.go` | All five fixes above | | `internal/engine/user/user_test.go` | Add tests: guest role denied; concurrent encrypt/seal; sealed-state rejection | ### Verification ```bash go test -race ./internal/engine/user/ ``` --- ## Work Item 7: Barrier and Seal Fixes (#41, #42, #43) ### #41 — `loadKeys` errors silently swallowed during unseal **Risk**: If `loadKeys` fails to decrypt DEK entries (corrupt data, wrong MEK), errors are swallowed and the keys map is incomplete. Subsequent operations fail with confusing `ErrKeyNotFound` errors. **Fix**: Return the error from `loadKeys` instead of ignoring it. Distinguish between "table doesn't exist" (acceptable on first run before migration) and genuine decryption failures: ```go func (b *AESGCMBarrier) Unseal(mek []byte) error { // ... if err := b.loadKeys(); err != nil { // Check if this is a "no such table" error (pre-migration). if !isTableMissing(err) { b.mek = nil return fmt.Errorf("barrier: load keys: %w", err) } // Pre-migration: no barrier_keys table yet, proceed with empty map. } return nil } func isTableMissing(err error) bool { return strings.Contains(err.Error(), "no such table") } ``` ### #42 — No AAD binding on MEK encryption with KWK **Risk**: The MEK ciphertext in `seal_config` has no AAD, so there is no cryptographic binding to its storage purpose. **Fix**: Pass a constant AAD string when encrypting/decrypting the MEK: ```go var mekAAD = []byte("metacrypt/seal_config/mek") // In Init: encryptedMEK, err := crypto.Encrypt(kwk, mek, mekAAD) // In Unseal: mek, err := crypto.Decrypt(kwk, encryptedMEK, mekAAD) // In RotateMEK: newEncMEK, err := crypto.Encrypt(kwk, newMEK, mekAAD) ``` **Migration note**: Existing databases have MEKs encrypted with `nil` AAD. The unseal path must try `mekAAD` first, then fall back to `nil` for backward compatibility. After successful unseal with `nil`, re-encrypt with `mekAAD` and update `seal_config`. ### #43 — Barrier `List` SQL LIKE with unescaped prefix **Risk**: If a barrier path prefix contains `%` or `_`, the LIKE query matches unintended entries. **Fix**: Escape SQL LIKE wildcards in the prefix: ```go func escapeLIKE(s string) string { s = strings.ReplaceAll(s, `\`, `\\`) s = strings.ReplaceAll(s, `%`, `\%`) s = strings.ReplaceAll(s, `_`, `\_`) return s } // In List: rows, err := b.db.QueryContext(ctx, "SELECT path FROM barrier_entries WHERE path LIKE ? ESCAPE '\\'", escapeLIKE(prefix)+"%") ``` ### Files | File | Change | |------|--------| | `internal/barrier/barrier.go` | Fix `loadKeys` error handling; add `escapeLIKE` to `List` | | `internal/seal/seal.go` | Add `mekAAD` constant; use in Init, Unseal, RotateMEK with fallback | | `internal/barrier/barrier_test.go` | Add test: `List` with `%` and `_` in paths | | `internal/seal/seal_test.go` | Add test: unseal with AAD-encrypted MEK | --- ## Work Item 8: CA Engine Fixes (#50, #51, #70) ### #50 — Non-admin users can override key usages **Risk**: A non-admin user can request `cert sign` or `crl sign` key usage on a leaf certificate, creating a de facto CA certificate. **Fix**: Only allow admin users to override `key_usages` and `ext_key_usages`. Non-admin users get the profile defaults: ```go // In handleIssue, after profile lookup: if req.CallerInfo.IsAdmin { if v, ok := req.Data["key_usages"].([]interface{}); ok { profile.KeyUse = toStringSlice(v) } if v, ok := req.Data["ext_key_usages"].([]interface{}); ok { profile.ExtKeyUsages = toStringSlice(v) } } ``` ### #51 — Certificate renewal does not revoke original **Fix**: After issuing the renewed certificate, revoke the original by serial. The original serial is available from the request data: ```go // In handleRenew, after successful issuance: oldSerial := req.Data["serial"].(string) if err := e.revokeCertBySerial(ctx, issuerName, oldSerial, req.CallerInfo.Username); err != nil { // Log but do not fail — the new cert is already issued. e.logger.Warn("failed to revoke original during renewal", "serial", oldSerial, "error", err) } ``` ### #70 — `RenewCert` has no REST route (API sync violation) **Fix**: Add a dedicated REST route for certificate renewal: ```go r.Post("/v1/ca/{mount}/cert/{serial}/renew", s.requireAuth(s.handleRenewCert)) ``` Implement `handleRenewCert` following the typed handler pattern (with `CheckPolicy`). ### Files | File | Change | |------|--------| | `internal/engine/ca/ca.go` | Guard key usage overrides with `IsAdmin`; revoke original in `handleRenew` | | `internal/server/routes.go` | Add `POST /v1/ca/{mount}/cert/{serial}/renew` route and handler | | `internal/engine/ca/ca_test.go` | Add test: non-admin cannot set `cert sign` usage; renewal revokes original | --- ## Work Item 9: SSH CA Lock Granularity (#53) **Risk**: `HandleRequest` acquires a write lock for all operations including reads (`get-ca-pubkey`, `get-cert`, `list-certs`, `get-profile`, `list-profiles`, `get-krl`). This serializes all operations unnecessarily. **Fix**: Move locking into individual handlers. Read-only operations use `RLock`; mutating operations use `Lock`: ```go func (e *SSHCAEngine) HandleRequest(ctx context.Context, req *engine.Request) (*engine.Response, error) { // No lock here — each handler manages its own. switch req.Operation { case "get-ca-pubkey": return e.handleGetCAPubkey(ctx, req) // uses RLock internally case "sign-host": return e.handleSignHost(ctx, req) // uses Lock internally // ... } } ``` Read-only handlers: ```go func (e *SSHCAEngine) handleGetCAPubkey(ctx context.Context, req *engine.Request) (*engine.Response, error) { e.mu.RLock() defer e.mu.RUnlock() // ... } ``` Mutating handlers (`sign-host`, `sign-user`, `create-profile`, `update-profile`, `delete-profile`, `revoke-cert`, `delete-cert`): ```go func (e *SSHCAEngine) handleSignHost(ctx context.Context, req *engine.Request) (*engine.Response, error) { e.mu.Lock() defer e.mu.Unlock() // ... } ``` ### Files | File | Change | |------|--------| | `internal/engine/sshca/sshca.go` | Remove top-level lock from `HandleRequest`; add per-handler locks | --- ## Work Item 10: Transit Ciphertext Validation (#59) **Risk**: `parseVersionedData` accepts negative version numbers. A crafted ciphertext `metacrypt:v-1:...` parses as version -1, which fails the key lookup but produces a confusing error. **Fix**: Add a bounds check after parsing: ```go func parseVersionedData(s string) (int, []byte, error) { // ... existing parse logic ... version, err := strconv.Atoi(parts[1][1:]) if err != nil { return 0, nil, ErrInvalidFormat } if version < 1 { return 0, nil, fmt.Errorf("transit: %w: version must be >= 1", ErrInvalidFormat) } // ... } ``` ### Files | File | Change | |------|--------| | `internal/engine/transit/transit.go` | Add `version < 1` check in `parseVersionedData` | | `internal/engine/transit/transit_test.go` | Add test: `metacrypt:v0:...` and `metacrypt:v-1:...` rejected | --- ## Work Item 11: Database and Auth (#46, #74) ### #46 — SQLite PRAGMAs only applied to first connection **Risk**: `database/sql` may open additional connections that don't receive the `journal_mode`, `foreign_keys`, and `busy_timeout` PRAGMAs. **Fix**: Use the `_pragma` DSN parameter supported by `modernc.org/sqlite`: ```go dsn := path + "?_pragma=journal_mode(WAL)&_pragma=foreign_keys(ON)&_pragma=busy_timeout(5000)" db, err := sql.Open("sqlite", dsn) ``` Remove the manual PRAGMA execution loop. This ensures every connection opened by the pool receives the PRAGMAs. ### #74 — Token validation cache grows without bound **Risk**: The `cache` map in `auth.go` grows indefinitely. Expired entries are never proactively removed. **Fix**: Add periodic eviction. Run a background goroutine that sweeps expired entries every minute: ```go func (a *Authenticator) startEviction(ctx context.Context) { ticker := time.NewTicker(time.Minute) go func() { defer ticker.Stop() for { select { case <-ctx.Done(): return case <-ticker.C: a.evictExpired() } } }() } func (a *Authenticator) evictExpired() { now := time.Now() a.mu.Lock() defer a.mu.Unlock() for key, entry := range a.cache { if now.After(entry.expiresAt) { delete(a.cache, key) } } } ``` Call `startEviction` from the `Authenticator` constructor. Accept a `context.Context` to allow cancellation on shutdown. ### Files | File | Change | |------|--------| | `internal/db/db.go` | Switch to `_pragma` DSN parameters; remove manual PRAGMA loop | | `internal/auth/auth.go` | Add `startEviction` goroutine; call from constructor | | `internal/db/db_test.go` | Verify PRAGMAs are active on a fresh connection from the pool | | `internal/auth/auth_test.go` | Add test: expired entries are evicted after sweep | --- ## Work Item 12: Policy Engine Glob Matching (#73) **Risk**: `filepath.Match` does not support `**` for recursive directory matching. Administrators writing rules like `engine/**/certs/*` will find they don't match paths with multiple segments. **Fix**: Replace `filepath.Match` with `path.Match` (POSIX-style, no OS-specific behavior) and add support for `**` by splitting the pattern and value on `/` and matching segments: ```go import "path" func matchesAnyGlob(patterns []string, value string) bool { for _, p := range patterns { if matchGlob(p, value) { return true } } return false } func matchGlob(pattern, value string) bool { // Handle ** by checking if any suffix of value matches the rest of pattern. if strings.Contains(pattern, "**") { parts := strings.SplitN(pattern, "**", 2) prefix := parts[0] suffix := parts[1] if !strings.HasPrefix(value, prefix) { return false } remainder := value[len(prefix):] // Try matching suffix against every suffix of remainder. for i := 0; i <= len(remainder); i++ { if matched, _ := path.Match(suffix, remainder[i:]); matched { return true } // Advance to next separator. next := strings.IndexByte(remainder[i:], '/') if next < 0 { break } i += next } return false } matched, _ := path.Match(pattern, value) return matched } ``` Document in POLICY.md that `**` matches zero or more path segments, while `*` matches within a single segment. ### Files | File | Change | |------|--------| | `internal/policy/policy.go` | Replace `filepath.Match` with `path.Match` + `**` support | | `internal/policy/policy_test.go` | Add test: `engine/**/certs/*` matches `engine/ca/prod/certs/abc123` | --- ## Work Item 13: Deployment Fixes (#78, #79) ### #78 — systemd `ExecReload` sends SIGHUP with no handler **Fix**: Two options: **Option A** (recommended): Remove `ExecReload` entirely. Metacrypt does not support graceful reload — configuration changes require a restart. The `ExecReload` line creates a false expectation. **Option B**: Implement SIGHUP handling that re-reads the TOML config and applies non-breaking changes (log level, TLS cert reload). This is significant new functionality and should be a separate feature. For now, remove `ExecReload` from both service units: ```diff -ExecReload=/bin/kill -HUP $MAINPID ``` ### #79 — Dockerfiles use Go 1.23 but module requires Go 1.25 **Fix**: Update the builder base image: ```diff -FROM golang:1.23-alpine AS builder +FROM golang:1.25-alpine AS builder ``` Apply to both `Dockerfile.api` and `Dockerfile.web`. Also remove the unnecessary `gcc` and `musl-dev` installation since `CGO_ENABLED=0`. ```diff -RUN apk add --no-cache gcc musl-dev ``` ### #7 — No audit logging **Note**: This is the only medium finding that requires significant new functionality rather than a targeted fix. Audit logging should be implemented as a dedicated subsystem: 1. Define an `AuditEvent` struct with: timestamp, caller, operation, resource, outcome (success/denied/error), and metadata. 2. Write events to a structured log sink (slog with JSON output to a dedicated file at `/srv/metacrypt/audit.log`). 3. Instrument every engine `HandleRequest` to emit an event on completion. 4. Instrument policy `CreateRule`/`DeleteRule`. 5. Instrument seal/unseal operations. This is a substantial feature. Track separately from the quick-fix items in this plan. ### #10 — No extension allowlist for SSH host certificates **Note**: Host certificates typically don't use extensions (extensions are for user certificates). The fix is to ignore the `extensions` field on host signing requests rather than passing it through: ```go // In handleSignHost, after building the certificate: cert.Permissions.Extensions = nil // Host certs should not carry extensions ``` ### #21 — No rate limiting on transit cryptographic operations **Note**: This requires a rate-limiting middleware or per-caller token bucket. Best implemented as server-level middleware applied to engine request handlers: ```go func (s *Server) requireRateLimit(next http.HandlerFunc) http.HandlerFunc { return func(w http.ResponseWriter, r *http.Request) { info := TokenInfoFromContext(r.Context()) if !s.rateLimiter.Allow(info.Username) { writeJSONError(w, "rate limit exceeded", http.StatusTooManyRequests) return } next(w, r) } } ``` Track separately — this affects API design decisions (limits, quotas, per-user vs per-token). ### Files | File | Change | |------|--------| | `deploy/systemd/metacrypt.service` | Remove `ExecReload` line | | `deploy/systemd/metacrypt-web.service` | Remove `ExecReload` line | | `Dockerfile.api` | Update to `golang:1.25-alpine`; remove `gcc musl-dev` | | `Dockerfile.web` | Update to `golang:1.25-alpine`; remove `gcc musl-dev` | --- ## Implementation Order ``` 5. #54, #58 Default-deny in engines (security-critical, do first) 6. #63–#67 User engine concurrency (5 coupled fixes, one change) 7. #41–#43 Barrier/seal fixes (3 independent fixes) 8. #50, #51, #70 CA engine fixes (key usage + renewal + API sync) 9. #53 SSH CA lock granularity (standalone refactor) 10. #59 Transit version validation (one-line fix) 11. #46, #74 DB pragmas + auth cache (independent, no overlap) 12. #73 Policy glob matching (standalone) 13. #78, #79 Deployment fixes (non-code, standalone) #7 Audit logging (new feature, track separately) #10 SSH host extensions (one-line fix, ship with #9) #21 Transit rate limiting (new feature, track separately) ``` Work items 5–10 can be parallelized across engineers: - Engineer A: #5 (default-deny) + #9 (SSH CA locks) + #10 (host extensions) - Engineer B: #6 (user concurrency) + #8 (CA fixes) - Engineer C: #7 (barrier/seal) + #10 (transit version) + #11 (DB/auth) - Independent: #12 (policy), #13 (deployment) Work items #7 (audit logging) and #21 (rate limiting) are substantial new features and should be tracked as separate engineering tasks, not quick fixes. --- ## Post-Remediation (Medium) After all medium findings are resolved: 1. **Update AUDIT.md** — mark each finding as RESOLVED with summary. 2. **Run the full pipeline**: `make all` (vet, lint, test, build). 3. **Run race detector**: `go test -race ./...` — especially important for work items 6 and 9 (concurrency changes). 4. **Address remaining low findings** — 14 low-severity items remain, mostly zeroization gaps and documentation drift.