# Remediation Plan — High-Priority Audit Findings

**Date:** 2026-03-17
**Scope:** AUDIT.md findings #39, #40, #48, #49, #61, #62, #68, #69
This plan addresses all eight High-severity findings from the 2026-03-17 full system audit. Findings are grouped into four work items by shared root cause or affected subsystem. The order reflects dependency chains: #68 is a standalone fix that should ship first; #48 is a prerequisite for safe operation across all engines; #39/#40 affect the storage core; the remaining four affect specific engines.
## Work Item 1: JSON Injection in REST Error Responses (#68)
**Risk:** An error message containing `"` or `\` breaks the JSON response structure. If the error contains attacker-controlled input (e.g., a mount name or key name that triggers a downstream error), this enables JSON injection in API responses.

**Root cause:** 13 locations in `internal/server/routes.go` construct JSON error responses via string concatenation:

```go
http.Error(w, `{"error":"`+err.Error()+`"}`, http.StatusInternalServerError)
```

The `writeEngineError` helper (line 1704) is the most common entry point; most typed handlers call it.
### Fix

1. Replace `writeEngineError` with a safe JSON encoder:

   ```go
   func writeJSONError(w http.ResponseWriter, msg string, code int) {
       w.Header().Set("Content-Type", "application/json")
       w.WriteHeader(code)
       _ = json.NewEncoder(w).Encode(map[string]string{"error": msg})
   }
   ```

2. Replace all 13 call sites that use string concatenation with `writeJSONError(w, grpcMessage(err), status)` or `writeJSONError(w, err.Error(), status)`. The `grpcMessage` helper already exists in the webserver package and extracts human-readable messages from gRPC errors. Add an equivalent to the REST server, and prefer it over raw `err.Error()` to avoid leaking internal error details.

3. Grep for the pattern `"error":"` in `routes.go` to confirm no remaining string-concatenated JSON.
### Files

| File | Change |
|---|---|
| `internal/server/routes.go` | Replace `writeEngineError` and all 13 inline error sites |

### Verification

- `go vet ./internal/server/`
- `go test ./internal/server/`
- Manual test: mount an engine with a name containing `"`, trigger an error, verify the response is valid JSON.
## Work Item 2: Path Traversal via Unsanitized Names (#48)
**Risk:** User-controlled strings (issuer names, key names, profile names, usernames, mount names) are concatenated directly into barrier storage paths. An input containing `../` traverses the barrier namespace, allowing reads and writes to arbitrary paths. This affects all four engines and the engine registry.

**Root cause:** No validation exists at any layer — neither the barrier's `Put`/`Get`/`Delete` methods nor the engines sanitize path components.
### Vulnerable locations

| File | Input | Path Pattern |
|---|---|---|
| `ca/ca.go` | issuer name | `mountPath + "issuers/" + name + "/"` |
| `sshca/sshca.go` | profile name | `mountPath + "profiles/" + name + ".json"` |
| `transit/transit.go` | key name | `mountPath + "keys/" + name + "/"` |
| `user/user.go` | username | `mountPath + "users/" + username + "/"` |
| `engine/engine.go` | mount name | `engine/{type}/{name}/` |
| `policy/policy.go` | rule ID | `policy/rules/{id}` |
### Fix

Enforce validation at two layers (defense in depth):

1. **Barrier layer** — reject paths containing `..` segments. Add a `validatePath` check at the top of `Get`, `Put`, `Delete`, and `List` in `barrier.go`:

   ```go
   var ErrInvalidPath = errors.New("barrier: invalid path")

   func validatePath(p string) error {
       for _, seg := range strings.Split(p, "/") {
           if seg == ".." {
               return fmt.Errorf("%w: path traversal rejected: %q", ErrInvalidPath, p)
           }
       }
       return nil
   }
   ```

   Call `validatePath` at the entry of `Get`, `Put`, `Delete`, `List`. Return `ErrInvalidPath` on failure.

2. **Engine/registry layer** — validate entity names at input boundaries. Add a `ValidateName` helper to `internal/engine/`:

   ```go
   var namePattern = regexp.MustCompile(`^[a-zA-Z0-9][a-zA-Z0-9._-]*$`)

   func ValidateName(name string) error {
       if name == "" || len(name) > 128 || !namePattern.MatchString(name) {
           return fmt.Errorf("invalid name %q: must be 1-128 alphanumeric, "+
               "dot, hyphen, or underscore characters", name)
       }
       return nil
   }
   ```

   Call `ValidateName` in:

   | Location | Input validated |
   |---|---|
   | `engine.go` `Mount()` | mount name |
   | `ca.go` `handleCreateIssuer` | issuer name |
   | `sshca.go` `handleCreateProfile` | profile name |
   | `transit.go` `handleCreateKey` | key name |
   | `user.go` `handleRegister`, `handleProvision` | username |
   | `user.go` `handleEncrypt` | recipient usernames |
   | `policy.go` `CreateRule` | rule ID |

   Note: certificate serials are generated server-side from `crypto/rand` and hex-encoded, so they are safe. Validate anyway for defense in depth.
### Files

| File | Change |
|---|---|
| `internal/barrier/barrier.go` | Add `validatePath`, call from `Get`/`Put`/`Delete`/`List` |
| `internal/engine/engine.go` | Add `ValidateName`, call from `Mount` |
| `internal/engine/ca/ca.go` | Call `ValidateName` on issuer name |
| `internal/engine/sshca/sshca.go` | Call `ValidateName` on profile name |
| `internal/engine/transit/transit.go` | Call `ValidateName` on key name |
| `internal/engine/user/user.go` | Call `ValidateName` on usernames |
| `internal/policy/policy.go` | Call `ValidateName` on rule ID |

### Verification

- Add `TestValidatePath` to `barrier_test.go`: confirm `../` and `..` are rejected; confirm normal paths pass.
- Add `TestValidateName` to `engine_test.go`: confirm `../evil`, empty string, and overlong names are rejected; confirm valid names pass.
- `go test ./internal/barrier/ ./internal/engine/... ./internal/policy/`
## Work Item 3: Barrier Concurrency and Crash Safety (#39, #40)
These two findings share the barrier/seal subsystem and should be addressed together.
### #39 — TOCTOU Race in Barrier Get/Put
**Risk:** `Get` and `Put` copy the `mek` slice header and `keys` map reference under `RLock`, release the lock, then use the copied references for encryption/decryption. A concurrent `Seal()` zeroizes the underlying byte slices in place before nil-ing the fields, so a concurrent reader uses zeroized key material.

**Root cause:** The lock does not cover the crypto operation. The "copy" is a shallow reference copy (slice header), not a deep byte copy. `Seal()` zeroizes the backing array, which is shared.

Current locking pattern (`barrier.go`):

```
Get:  RLock → copy mek/keys refs → RUnlock → decrypt (uses zeroized key)
Put:  RLock → copy mek/keys refs → RUnlock → encrypt (uses zeroized key)
Seal: Lock  → zeroize mek bytes → nil mek → zeroize keys → nil keys → Unlock
```
**Fix:** Hold `RLock` through the entire crypto operation:

```go
func (b *AESGCMBarrier) Get(ctx context.Context, path string) ([]byte, error) {
	if err := validatePath(path); err != nil {
		return nil, err
	}
	b.mu.RLock()
	defer b.mu.RUnlock()
	if b.mek == nil {
		return nil, ErrSealed
	}
	// query DB, resolve key, decrypt — all under RLock
	// ...
}
```
This is the minimal, safest change. `RLock` permits concurrent readers, so there is no throughput regression for parallel `Get`/`Put` operations. The only serialization point is `Seal()`, which acquires the exclusive `Lock` and waits for all readers to drain — exactly the semantics we want.

Apply the same pattern to `Put`, `Delete`, and `List`.
**Alternative considered:** Atomic pointer swap (`atomic.Pointer[keyState]`). This eliminates the lock from the hot path entirely, but introduces complexity around deferred zeroization of the old state (readers may still hold references). The RLock-through-crypto approach is simpler and sufficient for Metacrypt's concurrency profile.
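The shallow-copy hazard behind #39 can be reproduced in isolation. The `barrier` type below is a toy stand-in, not the real `AESGCMBarrier`: a reference "copied" out under a lock still points at the same backing array that `Seal` zeroizes, so the snapshot goes to zero too:

```go
package main

import "fmt"

type barrier struct{ mek []byte }

// seal mimics Barrier.Seal: zeroize the MEK in place, then drop the reference.
func seal(b *barrier) {
	for i := range b.mek {
		b.mek[i] = 0
	}
	b.mek = nil
}

func main() {
	b := &barrier{mek: []byte{0xAA, 0xBB, 0xCC}}
	snapshot := b.mek // shallow copy: same backing array, only the header is copied
	seal(b)
	fmt.Printf("snapshot after Seal: %v\n", snapshot) // [0 0 0], not [170 187 204]
}
```

This is why releasing `RLock` before the crypto operation is unsafe: the "copied" key is not an independent copy at all.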
### #40 — Crash During ReWrapKeys Loses All Data
**Risk:** `RotateMEK` calls `barrier.ReWrapKeys(newMEK)`, which commits a transaction re-wrapping all DEKs, then separately updates `seal_config` with the new encrypted MEK. A crash between these two database operations leaves DEKs wrapped with a MEK that is not persisted — all data is irrecoverable.

Current flow (`seal.go` lines 245–313):

```
1. Generate newMEK
2. barrier.ReWrapKeys(ctx, newMEK)          ← commits transaction (barrier_keys updated)
3. crypto.Encrypt(kwk, newMEK, nil)         ← encrypt new MEK
4. UPDATE seal_config SET encrypted_mek = ? ← separate statement, not in transaction
   *** CRASH HERE = DATA LOSS ***
5. Swap in-memory MEK
```
**Fix:** Unify steps 2–4 into a single database transaction. Refactor `ReWrapKeys` into a transaction-scoped variant that accepts a `*sql.Tx`:
```go
// ReWrapKeysTx re-wraps all DEKs with newMEK within the given transaction.
func (b *AESGCMBarrier) ReWrapKeysTx(ctx context.Context, tx *sql.Tx, newMEK []byte) error {
	// Same logic as ReWrapKeys, but use tx instead of b.db.BeginTx.
	rows, err := tx.QueryContext(ctx, "SELECT key_id, wrapped_key FROM barrier_keys")
	// ... decrypt with old MEK, encrypt with new MEK, UPDATE barrier_keys ...
}

// SwapMEK updates the in-memory MEK after a committed transaction.
func (b *AESGCMBarrier) SwapMEK(newMEK []byte) {
	b.mu.Lock()
	defer b.mu.Unlock()
	mcrypto.Zeroize(b.mek)
	b.mek = newMEK
}
```
Then in `RotateMEK`:

```go
func (m *Manager) RotateMEK(ctx context.Context, password string) error {
	// ... derive KWK, generate newMEK ...
	tx, err := m.db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback()

	// Re-wrap all DEKs within the transaction.
	if err := m.barrier.ReWrapKeysTx(ctx, tx, newMEK); err != nil {
		return err
	}

	// Update seal_config within the same transaction.
	encNewMEK, err := crypto.Encrypt(kwk, newMEK, nil)
	if err != nil {
		return err
	}
	if _, err := tx.ExecContext(ctx,
		"UPDATE seal_config SET encrypted_mek = ? WHERE id = 1",
		encNewMEK,
	); err != nil {
		return err
	}
	if err := tx.Commit(); err != nil {
		return err
	}

	// Only after commit: update in-memory state.
	m.barrier.SwapMEK(newMEK)
	return nil
}
```
SQLite in WAL mode handles this correctly — the transaction is atomic regardless of process crash. The `barrier_keys` and `seal_config` updates either both commit or neither does.
### Files

| File | Change |
|---|---|
| `internal/barrier/barrier.go` | Extend `RLock` scope in `Get`/`Put`/`Delete`/`List`; add `ReWrapKeysTx`, `SwapMEK` |
| `internal/seal/seal.go` | Wrap `ReWrapKeysTx` + `seal_config` UPDATE in single transaction |
| `internal/barrier/barrier_test.go` | Add concurrent Get/Seal stress test |

### Verification

- `go test -race ./internal/barrier/ ./internal/seal/`
- Add `TestConcurrentGetSeal`: spawn goroutines doing `Get` while another goroutine calls `Seal`. Run with `-race`. Verify no panics or data races.
- Add `TestRotateMEKAtomic`: verify that `barrier_keys` and `seal_config` are updated in the same transaction (mock the DB to detect transaction boundaries, or verify via rollback behavior).
## Work Item 4: CA TTL Enforcement, User Engine Fixes, Policy Bypass (#49, #61, #62, #69)
These four findings touch separate files with no overlap and can be addressed in parallel.
### #49 — No TTL Ceiling in CA Certificate Issuance
**Risk:** A non-admin user can request an arbitrarily long certificate lifetime. The issuer's `MaxTTL` exists in config but is not enforced during `handleIssue` or `handleSignCSR`.

**Root cause:** The CA engine applies the user's requested TTL directly to the certificate without comparing it against `issuerConfig.MaxTTL`. The SSH CA engine correctly enforces this via `resolveTTL` — the CA engine does not.

**Fix:** Add a `resolveTTL` method to the CA engine, following the SSH CA engine's pattern (`sshca.go` lines 902–932):
```go
func (e *CAEngine) resolveTTL(requested string, issuer *issuerState) (time.Duration, error) {
	maxTTL, err := time.ParseDuration(issuer.config.MaxTTL)
	if err != nil {
		maxTTL = 2160 * time.Hour // 90 days fallback
	}
	if requested != "" {
		ttl, err := time.ParseDuration(requested)
		if err != nil {
			return 0, fmt.Errorf("invalid TTL: %w", err)
		}
		if ttl > maxTTL {
			return 0, fmt.Errorf("requested TTL %s exceeds issuer maximum %s", ttl, maxTTL)
		}
		return ttl, nil
	}
	return maxTTL, nil
}
```

Call this in `handleIssue` and `handleSignCSR` before constructing the certificate. Replace the raw TTL string with the validated duration.
| File | Change |
|---|---|
| `internal/engine/ca/ca.go` | Add `resolveTTL`, call from `handleIssue` and `handleSignCSR` |
| `internal/engine/ca/ca_test.go` | Add test: issue cert with TTL > MaxTTL, verify rejection |
### #61 — Ineffective ECDH Key Zeroization
**Risk:** `privKey.Bytes()` returns a copy of the private key bytes. Zeroizing the copy leaves the original inside `*ecdh.PrivateKey`. Go's `crypto/ecdh` API does not expose the internal byte slice.

**Root cause:** Language/API limitation in Go's `crypto/ecdh` package.

**Fix:** Store the raw private key bytes alongside the parsed key in `userState`, and zeroize those bytes on seal:
```go
type userState struct {
	privKey   *ecdh.PrivateKey
	privBytes []byte // raw key bytes, retained for zeroization
	pubKey    *ecdh.PublicKey
	config    *UserKeyConfig
}
```

On load from barrier (`Unseal`, auto-provision):

```go
raw, err := b.Get(ctx, prefix+"priv.key")
priv, err := curve.NewPrivateKey(raw)
state.privBytes = raw // retain for zeroization
state.privKey = priv
```

On `Seal`:

```go
mcrypto.Zeroize(u.privBytes)
u.privKey = nil
u.privBytes = nil
```
Document the limitation: the parsed `*ecdh.PrivateKey` struct's internal copy cannot be zeroized from Go code. Setting `privKey = nil` makes it eligible for GC, but does not guarantee immediate byte overwrite. This is an accepted Go runtime limitation.
| File | Change |
|---|---|
| `internal/engine/user/user.go` | Add `privBytes` to `userState`, populate on load, zeroize on `Seal` |
| `internal/engine/user/types.go` | Update `userState` struct |
### #62 — User Engine Policy Path Uses `mountPath` Instead of Mount Name
**Risk:** Policy checks construct the resource path using `e.mountPath` (which is `engine/user/{name}/`) instead of just the mount name. Policy rules match against `user/{name}/recipient/{username}`, so the full mount path creates a mismatch like `user/engine/user/myengine//recipient/alice`. No policy rule will ever match.

**Root cause:** Line 358 of `user.go` uses `e.mountPath` directly. The SSH CA and transit engines correctly use a `mountName()` helper.

**Fix:** Add a `mountName()` method to the user engine:
```go
func (e *UserEngine) mountName() string {
	// mountPath is "engine/user/{name}/"
	parts := strings.Split(strings.TrimSuffix(e.mountPath, "/"), "/")
	if len(parts) >= 3 {
		return parts[2]
	}
	return e.mountPath
}
```

Change line 358:

```go
resource := fmt.Sprintf("user/%s/recipient/%s", e.mountName(), r)
```

Audit all other resource path constructions in the user engine to confirm they also use the correct mount name.
| File | Change |
|---|---|
| `internal/engine/user/user.go` | Add `mountName()`, fix resource path on line 358 |
| `internal/engine/user/user_test.go` | Add test: verify policy resource path format |
### #69 — Typed REST Handlers Bypass Policy Engine
**Risk:** 18 typed REST handlers pass `nil` for `CheckPolicy` in the `engine.Request`, skipping service-level policy evaluation. The generic `/v1/engine/request` endpoint correctly passes a `policyChecker`. Since the engines default to allow when no policy matches (findings #54 and #58), typed routes are effectively unprotected by policy.

**Root cause:** Typed handlers were modeled after admin-only operations (which don't need policy) but applied to user-accessible operations.

**Fix:** Extract the policy checker construction from `handleEngineRequest` into a shared helper:
```go
func (s *Server) newPolicyChecker(info *CallerInfo) engine.PolicyChecker {
	return func(resource, action string) (string, bool) {
		effect, matched, err := s.policy.Check(
			info.Username, info.Roles, resource, action,
		)
		if err != nil || !matched {
			return "deny", false
		}
		return effect, matched
	}
}
```

Then in each typed handler, set `CheckPolicy` on the request:

```go
req := &engine.Request{
	Operation:   "get-cert",
	Data:        data,
	CallerInfo:  callerInfo,
	CheckPolicy: s.newPolicyChecker(callerInfo),
}
```
18 handlers to update:

| Handler | Operation |
|---|---|
| `handleGetCert` | `get-cert` |
| `handleRevokeCert` | `revoke-cert` |
| `handleDeleteCert` | `delete-cert` |
| `handleSSHCASignHost` | `sign-host` |
| `handleSSHCASignUser` | `sign-user` |
| `handleSSHCAGetProfile` | `get-profile` |
| `handleSSHCAListProfiles` | `list-profiles` |
| `handleSSHCADeleteProfile` | `delete-profile` |
| `handleSSHCAGetCert` | `get-cert` |
| `handleSSHCAListCerts` | `list-certs` |
| `handleSSHCARevokeCert` | `revoke-cert` |
| `handleSSHCADeleteCert` | `delete-cert` |
| `handleUserRegister` | `register` |
| `handleUserProvision` | `provision` |
| `handleUserListUsers` | `list-users` |
| `handleUserGetPublicKey` | `get-public-key` |
| `handleUserDeleteUser` | `delete-user` |
| `handleUserDecrypt` | `decrypt` |
Note: `handleUserEncrypt` already passes a policy checker — verify it uses the same shared helper after refactoring. Admin-only handlers (behind the `requireAdmin` wrapper) do not need a policy checker since admin bypasses policy.
| File | Change |
|---|---|
| `internal/server/routes.go` | Add `newPolicyChecker`, pass to all 18 typed handlers |
| `internal/server/server_test.go` | Add test: policy-denied user is rejected by typed route |
### Verification (Work Item 4, all findings)

```
go test ./internal/engine/ca/
go test ./internal/engine/user/
go test ./internal/server/
go vet ./...
```
## Implementation Order

```
1. #68 JSON injection           (standalone, ship immediately)
2. #48 Path traversal           (standalone, blocks safe engine operation)
3. #39 Barrier TOCTOU race     ─┐
   #40 ReWrapKeys crash safety  ┘ (coupled, requires careful testing)
4. #49 CA TTL enforcement      ─┐
   #61 ECDH zeroization         │
   #62 User policy path         │ (independent fixes, parallelizable)
   #69 Policy bypass           ─┘
```
Items 1 and 2 have no dependencies and can be done in parallel by different engineers.
Items 3 and 4 can also be done in parallel since they touch different subsystems (barrier/seal vs engines/server).
## Post-Remediation

After all eight findings are resolved:

- Update AUDIT.md — mark #39, #40, #48, #49, #61, #62, #68, #69 as RESOLVED with resolution summaries.
- Run the full pipeline: `make all` (vet, lint, test, build).
- Run the race detector: `go test -race ./...`
- Address related medium findings that interact with these fixes:
  - #54 (SSH CA default-allow) and #58 (transit default-allow) — once #69 is fixed, the typed handlers will pass policy checkers to the engines, but the engines still default-allow when `CheckPolicy` returns no match. Consider changing the engine-level default to deny for non-admin callers.
  - #72 (policy ID path traversal) — already covered by #48's `ValidateName` fix on `CreateRule`.