Remediation Plan — High-Priority Audit Findings
Date: 2026-03-17
Scope: AUDIT.md findings #39, #40, #48, #49, #61, #62, #68, #69
This plan addresses all eight High-severity findings from the 2026-03-17 full system audit. Findings are grouped into four work items by shared root cause or affected subsystem. The order reflects dependency chains: #68 is a standalone fix that should ship first; #48 is a prerequisite for safe operation across all engines; #39/#40 affect the storage core; the remaining four affect specific engines.
Work Item 1: JSON Injection in REST Error Responses (#68)
Risk: An error message containing `"` or `\` breaks the JSON response structure. If the error contains attacker-controlled input (e.g., a mount name or key name that triggers a downstream error), this enables JSON injection in API responses.
Root cause: 13 locations in `internal/server/routes.go` construct JSON error responses via string concatenation:

```go
http.Error(w, `{"error":"`+err.Error()+`"}`, http.StatusInternalServerError)
```

The `writeEngineError` helper (line 1704) is the most common entry point; most typed handlers call it.
Fix

- Replace `writeEngineError` with a safe JSON encoder:

  ```go
  func writeJSONError(w http.ResponseWriter, msg string, code int) {
  	w.Header().Set("Content-Type", "application/json")
  	w.WriteHeader(code)
  	_ = json.NewEncoder(w).Encode(map[string]string{"error": msg})
  }
  ```

- Replace all 13 call sites that use string concatenation with `writeJSONError(w, grpcMessage(err), status)` or `writeJSONError(w, err.Error(), status)`. The `grpcMessage` helper already exists in the webserver package and extracts human-readable messages from gRPC errors. Add an equivalent to the REST server, and prefer it over raw `err.Error()` to avoid leaking internal error details.

- Grep for the pattern `"error":"` in `routes.go` to confirm no remaining string-concatenated JSON.
Files

| File | Change |
|---|---|
| `internal/server/routes.go` | Replace `writeEngineError` and all 13 inline error sites |
Verification

- `go vet ./internal/server/`
- `go test ./internal/server/`
- Manual test: mount an engine with a name containing `"`, trigger an error, verify the response is valid JSON.
Work Item 2: Path Traversal via Unsanitized Names (#48)
Risk: User-controlled strings (issuer names, key names, profile names,
usernames, mount names) are concatenated directly into barrier storage
paths. An input containing ../ traverses the barrier namespace, allowing
reads and writes to arbitrary paths. This affects all four engines and the
engine registry.
Root cause: No validation exists at any layer — neither the barrier's
Put/Get/Delete methods nor the engines sanitize path components.
Vulnerable locations

| File | Input | Path Pattern |
|---|---|---|
| `ca/ca.go` | issuer name | `mountPath + "issuers/" + name + "/"` |
| `sshca/sshca.go` | profile name | `mountPath + "profiles/" + name + ".json"` |
| `transit/transit.go` | key name | `mountPath + "keys/" + name + "/"` |
| `user/user.go` | username | `mountPath + "users/" + username + "/"` |
| `engine/engine.go` | mount name | `engine/{type}/{name}/` |
| `policy/policy.go` | rule ID | `policy/rules/{id}` |
Fix

Enforce validation at two layers (defense in depth):

- Barrier layer — reject paths containing `..` segments.

  Add a `validatePath` check at the top of `Get`, `Put`, `Delete`, and `List` in `barrier.go`:

  ```go
  var ErrInvalidPath = errors.New("barrier: invalid path")

  func validatePath(p string) error {
  	for _, seg := range strings.Split(p, "/") {
  		if seg == ".." {
  			return fmt.Errorf("%w: path traversal rejected: %q", ErrInvalidPath, p)
  		}
  	}
  	return nil
  }
  ```

  Call `validatePath` at the entry of `Get`, `Put`, `Delete`, `List`. Return `ErrInvalidPath` on failure.

- Engine/registry layer — validate entity names at input boundaries.

  Add a `ValidateName` helper to `internal/engine/`:

  ```go
  var namePattern = regexp.MustCompile(`^[a-zA-Z0-9][a-zA-Z0-9._-]*$`)

  func ValidateName(name string) error {
  	if name == "" || len(name) > 128 || !namePattern.MatchString(name) {
  		return fmt.Errorf("invalid name %q: must be 1-128 alphanumeric, "+
  			"dot, hyphen, or underscore characters", name)
  	}
  	return nil
  }
  ```

  Call `ValidateName` in:

  | Location | Input validated |
  |---|---|
  | `engine.go` `Mount()` | mount name |
  | `ca.go` `handleCreateIssuer` | issuer name |
  | `sshca.go` `handleCreateProfile` | profile name |
  | `transit.go` `handleCreateKey` | key name |
  | `user.go` `handleRegister`, `handleProvision` | username |
  | `user.go` `handleEncrypt` | recipient usernames |
  | `policy.go` `CreateRule` | rule ID |

Note: certificate serials are generated server-side from `crypto/rand` and hex-encoded, so they are safe. Validate anyway for defense in depth.
Files

| File | Change |
|---|---|
| `internal/barrier/barrier.go` | Add `validatePath`, call from `Get`/`Put`/`Delete`/`List` |
| `internal/engine/engine.go` | Add `ValidateName`, call from `Mount` |
| `internal/engine/ca/ca.go` | Call `ValidateName` on issuer name |
| `internal/engine/sshca/sshca.go` | Call `ValidateName` on profile name |
| `internal/engine/transit/transit.go` | Call `ValidateName` on key name |
| `internal/engine/user/user.go` | Call `ValidateName` on usernames |
| `internal/policy/policy.go` | Call `ValidateName` on rule ID |
Verification

- Add `TestValidatePath` to `barrier_test.go`: confirm `../` and `..` are rejected; confirm normal paths pass.
- Add `TestValidateName` to `engine_test.go`: confirm `../evil`, empty string, and overlong names are rejected; confirm valid names pass.
- `go test ./internal/barrier/ ./internal/engine/... ./internal/policy/`
Work Item 3: Barrier Concurrency and Crash Safety (#39, #40)
These two findings share the barrier/seal subsystem and should be addressed together.
#39 — TOCTOU Race in Barrier Get/Put
Risk: Get and Put copy the mek slice header and keys map
reference under RLock, release the lock, then use the copied references
for encryption/decryption. A concurrent Seal() zeroizes the underlying
byte slices in place before nil-ing the fields, so a concurrent reader
uses zeroized key material.
Root cause: The lock does not cover the crypto operation. The "copy"
is a shallow reference copy (slice header), not a deep byte copy. Seal()
zeroizes the backing array, which is shared.
Current locking pattern (`barrier.go`):

```
Get:  RLock → copy mek/keys refs → RUnlock → decrypt (uses zeroized key)
Put:  RLock → copy mek/keys refs → RUnlock → encrypt (uses zeroized key)
Seal: Lock → zeroize mek bytes → nil mek → zeroize keys → nil keys → Unlock
```
Fix: Hold RLock through the entire crypto operation:

```go
func (b *AESGCMBarrier) Get(ctx context.Context, path string) ([]byte, error) {
	if err := validatePath(path); err != nil {
		return nil, err
	}
	b.mu.RLock()
	defer b.mu.RUnlock()
	if b.mek == nil {
		return nil, ErrSealed
	}
	// query DB, resolve key, decrypt — all under RLock
	// ...
}
```
This is the minimal, safest change. RLock permits concurrent readers, so
there is no throughput regression for parallel Get/Put operations. The
only serialization point is Seal(), which acquires the exclusive Lock
and waits for all readers to drain — exactly the semantics we want.
Apply the same pattern to Put, Delete, and List.
Alternative considered: Atomic pointer swap (atomic.Pointer[keyState]).
This eliminates the lock from the hot path entirely, but introduces
complexity around deferred zeroization of the old state (readers may still
hold references). The RLock-through-crypto approach is simpler and
sufficient for Metacrypt's concurrency profile.
#40 — Crash During ReWrapKeys Loses All Data
Risk: RotateMEK calls barrier.ReWrapKeys(newMEK) which commits a
transaction re-wrapping all DEKs, then separately updates seal_config
with the new encrypted MEK. A crash between these two database operations
leaves DEKs wrapped with a MEK that is not persisted — all data is
irrecoverable.
Current flow (seal.go lines 245–313):

```
1. Generate newMEK
2. barrier.ReWrapKeys(ctx, newMEK)           ← commits transaction (barrier_keys updated)
3. crypto.Encrypt(kwk, newMEK, nil)          ← encrypt new MEK
4. UPDATE seal_config SET encrypted_mek = ?  ← separate statement, not in transaction
   *** CRASH HERE = DATA LOSS ***
5. Swap in-memory MEK
```
Fix: Unify steps 2–4 into a single database transaction.
Refactor ReWrapKeys to accept an optional `*sql.Tx`:

```go
// ReWrapKeysTx re-wraps all DEKs with newMEK within the given transaction.
func (b *AESGCMBarrier) ReWrapKeysTx(ctx context.Context, tx *sql.Tx, newMEK []byte) error {
	// Same logic as ReWrapKeys, but use tx instead of b.db.BeginTx.
	rows, err := tx.QueryContext(ctx, "SELECT key_id, wrapped_key FROM barrier_keys")
	// ... decrypt with old MEK, encrypt with new MEK, UPDATE barrier_keys ...
}

// SwapMEK updates the in-memory MEK after a committed transaction.
func (b *AESGCMBarrier) SwapMEK(newMEK []byte) {
	b.mu.Lock()
	defer b.mu.Unlock()
	mcrypto.Zeroize(b.mek)
	b.mek = newMEK
}
```
Then in RotateMEK:

```go
func (m *Manager) RotateMEK(ctx context.Context, password string) error {
	// ... derive KWK, generate newMEK ...
	tx, err := m.db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback()

	// Re-wrap all DEKs within the transaction.
	if err := m.barrier.ReWrapKeysTx(ctx, tx, newMEK); err != nil {
		return err
	}

	// Update seal_config within the same transaction.
	encNewMEK, err := crypto.Encrypt(kwk, newMEK, nil)
	if err != nil {
		return err
	}
	if _, err := tx.ExecContext(ctx,
		"UPDATE seal_config SET encrypted_mek = ? WHERE id = 1",
		encNewMEK,
	); err != nil {
		return err
	}
	if err := tx.Commit(); err != nil {
		return err
	}

	// Only after commit: update in-memory state.
	m.barrier.SwapMEK(newMEK)
	return nil
}
```
SQLite in WAL mode handles this correctly — the transaction is atomic
regardless of process crash. The barrier_keys and seal_config updates
either both commit or neither does.
Files

| File | Change |
|---|---|
| `internal/barrier/barrier.go` | Extend RLock scope in `Get`/`Put`/`Delete`/`List`; add `ReWrapKeysTx`, `SwapMEK` |
| `internal/seal/seal.go` | Wrap `ReWrapKeysTx` + `seal_config` UPDATE in single transaction |
| `internal/barrier/barrier_test.go` | Add concurrent Get/Seal stress test |
Verification
go test -race ./internal/barrier/ ./internal/seal/- Add
TestConcurrentGetSeal: spawn goroutines doing Get while another goroutine calls Seal. Run with-race. Verify no panics or data races. - Add
TestRotateMEKAtomic: verify thatbarrier_keysandseal_configare updated in the same transaction (mock the DB to detect transaction boundaries, or verify via rollback behavior).
Work Item 4: CA TTL Enforcement, User Engine Fixes, Policy Bypass (#49, #61, #62, #69)
These four findings touch separate files with no overlap and can be addressed in parallel.
#49 — No TTL Ceiling in CA Certificate Issuance
Risk: A non-admin user can request an arbitrarily long certificate
lifetime. The issuer's MaxTTL exists in config but is not enforced
during handleIssue or handleSignCSR.
Root cause: The CA engine applies the user's requested TTL directly
to the certificate without comparing it against issuerConfig.MaxTTL.
The SSH CA engine correctly enforces this via resolveTTL — the CA
engine does not.
Fix: Add a `resolveTTL` method to the CA engine, following the SSH CA engine's pattern (sshca.go lines 902–932):

```go
func (e *CAEngine) resolveTTL(requested string, issuer *issuerState) (time.Duration, error) {
	maxTTL, err := time.ParseDuration(issuer.config.MaxTTL)
	if err != nil {
		maxTTL = 2160 * time.Hour // 90 days fallback
	}
	if requested != "" {
		ttl, err := time.ParseDuration(requested)
		if err != nil {
			return 0, fmt.Errorf("invalid TTL: %w", err)
		}
		if ttl > maxTTL {
			return 0, fmt.Errorf("requested TTL %s exceeds issuer maximum %s", ttl, maxTTL)
		}
		return ttl, nil
	}
	return maxTTL, nil
}
```
Call this in handleIssue and handleSignCSR before constructing the
certificate. Replace the raw TTL string with the validated duration.
| File | Change |
|---|---|
| `internal/engine/ca/ca.go` | Add `resolveTTL`, call from `handleIssue` and `handleSignCSR` |
| `internal/engine/ca/ca_test.go` | Add test: issue cert with TTL > MaxTTL, verify rejection |
#61 — Ineffective ECDH Key Zeroization
Risk: privKey.Bytes() returns a copy of the private key bytes.
Zeroizing the copy leaves the original inside *ecdh.PrivateKey. Go's
crypto/ecdh API does not expose the internal byte slice.
Root cause: Language/API limitation in Go's crypto/ecdh package.
Fix: Store the raw private key bytes alongside the parsed key in `userState`, and zeroize those bytes on seal:

```go
type userState struct {
	privKey   *ecdh.PrivateKey
	privBytes []byte // raw key bytes, retained for zeroization
	pubKey    *ecdh.PublicKey
	config    *UserKeyConfig
}
```

On load from barrier (`Unseal`, auto-provision):

```go
raw, err := b.Get(ctx, prefix+"priv.key")
priv, err := curve.NewPrivateKey(raw)
state.privBytes = raw // retain for zeroization
state.privKey = priv
```

On `Seal`:

```go
mcrypto.Zeroize(u.privBytes)
u.privKey = nil
u.privBytes = nil
```
Document the limitation: the parsed *ecdh.PrivateKey struct's internal
copy cannot be zeroized from Go code. Setting privKey = nil makes it
eligible for GC, but does not guarantee immediate byte overwrite. This is
an accepted Go runtime limitation.
| File | Change |
|---|---|
| `internal/engine/user/user.go` | Add `privBytes` to `userState`, populate on load, zeroize on Seal |
| `internal/engine/user/types.go` | Update `userState` struct |
#62 — User Engine Policy Path Uses mountPath Instead of Mount Name
Risk: Policy checks construct the resource path using e.mountPath
(which is engine/user/{name}/) instead of just the mount name. Policy
rules match against user/{name}/recipient/{username}, so the full mount
path creates a mismatch like user/engine/user/myengine//recipient/alice.
No policy rule will ever match.
Root cause: Line 358 of user.go uses e.mountPath directly. The
SSH CA and transit engines correctly use a mountName() helper.
Fix: Add a `mountName()` method to the user engine:

```go
func (e *UserEngine) mountName() string {
	// mountPath is "engine/user/{name}/"
	parts := strings.Split(strings.TrimSuffix(e.mountPath, "/"), "/")
	if len(parts) >= 3 {
		return parts[2]
	}
	return e.mountPath
}
```

Change line 358:

```go
resource := fmt.Sprintf("user/%s/recipient/%s", e.mountName(), r)
```
Audit all other resource path constructions in the user engine to confirm they also use the correct mount name.
| File | Change |
|---|---|
| `internal/engine/user/user.go` | Add `mountName()`, fix resource path on line 358 |
| `internal/engine/user/user_test.go` | Add test: verify policy resource path format |
#69 — Typed REST Handlers Bypass Policy Engine
Risk: 18 typed REST handlers pass nil for CheckPolicy in the
engine.Request, skipping service-level policy evaluation. The generic
/v1/engine/request endpoint correctly passes a policyChecker. Since
engines #54 and #58 default to allow when no policy matches, typed routes
are effectively unprotected by policy.
Root cause: Typed handlers were modeled after admin-only operations (which don't need policy) but applied to user-accessible operations.
Fix: Extract the policy checker construction from `handleEngineRequest` into a shared helper:

```go
func (s *Server) newPolicyChecker(info *CallerInfo) engine.PolicyChecker {
	return func(resource, action string) (string, bool) {
		effect, matched, err := s.policy.Check(
			info.Username, info.Roles, resource, action,
		)
		if err != nil || !matched {
			return "deny", false
		}
		return effect, matched
	}
}
```

Then in each typed handler, set CheckPolicy on the request:

```go
req := &engine.Request{
	Operation:   "get-cert",
	Data:        data,
	CallerInfo:  callerInfo,
	CheckPolicy: s.newPolicyChecker(callerInfo),
}
```
18 handlers to update:

| Handler | Operation |
|---|---|
| `handleGetCert` | `get-cert` |
| `handleRevokeCert` | `revoke-cert` |
| `handleDeleteCert` | `delete-cert` |
| `handleSSHCASignHost` | `sign-host` |
| `handleSSHCASignUser` | `sign-user` |
| `handleSSHCAGetProfile` | `get-profile` |
| `handleSSHCAListProfiles` | `list-profiles` |
| `handleSSHCADeleteProfile` | `delete-profile` |
| `handleSSHCAGetCert` | `get-cert` |
| `handleSSHCAListCerts` | `list-certs` |
| `handleSSHCARevokeCert` | `revoke-cert` |
| `handleSSHCADeleteCert` | `delete-cert` |
| `handleUserRegister` | `register` |
| `handleUserProvision` | `provision` |
| `handleUserListUsers` | `list-users` |
| `handleUserGetPublicKey` | `get-public-key` |
| `handleUserDeleteUser` | `delete-user` |
| `handleUserDecrypt` | `decrypt` |
Note: handleUserEncrypt already passes a policy checker — verify it
uses the same shared helper after refactoring. Admin-only handlers
(behind requireAdmin wrapper) do not need a policy checker since admin
bypasses policy.
| File | Change |
|---|---|
| `internal/server/routes.go` | Add `newPolicyChecker`, pass to all 18 typed handlers |
| `internal/server/server_test.go` | Add test: policy-denied user is rejected by typed route |
Verification (Work Item 4, all findings)

- `go test ./internal/engine/ca/`
- `go test ./internal/engine/user/`
- `go test ./internal/server/`
- `go vet ./...`
Implementation Order
```
1. #68 JSON injection        (standalone, ship immediately)
2. #48 Path traversal        (standalone, blocks safe engine operation)
3. #39 Barrier TOCTOU race     ─┐
   #40 ReWrapKeys crash safety  ┘ (coupled, requires careful testing)
4. #49 CA TTL enforcement ─┐
   #61 ECDH zeroization    │
   #62 User policy path    │ (independent fixes, parallelizable)
   #69 Policy bypass      ─┘
```
Items 1 and 2 have no dependencies and can be done in parallel by different engineers.
Items 3 and 4 can also be done in parallel since they touch different subsystems (barrier/seal vs engines/server).
Post-Remediation (High)
All eight High findings (#39, #40, #48, #49, #61, #62, #68, #69) have
been resolved. See commit a80323e.
Remediation Plan — Medium-Priority Audit Findings
Date: 2026-03-17
Scope: AUDIT.md findings #7, #10, #21, #41, #42, #43, #46, #50, #51, #53, #54, #58, #59, #63, #64, #65, #66, #67, #70, #72, #73, #74, #78, #79
This plan addresses all medium-severity findings. Findings are grouped into eight work items by subsystem. Two findings (#72, #79) are already resolved or trivial and are noted inline.
Previously resolved by High remediation:

- #72 (policy rule ID path traversal) — covered by #48's `ValidateName` on `CreateRule` and barrier-level `validatePath`.
Work Item 5: Default-Deny in Engine Policy Checks (#54, #58)
Risk: The SSH CA and transit engines default to allow when no policy
rule matches a non-admin user's request. This contradicts the default-deny
principle in ARCHITECTURE.md and the engineering standards. Now that #69 is
resolved (typed REST handlers pass CheckPolicy), the engines receive a
policy checker — but when CheckPolicy returns no match, they still allow.
Root cause: Both engines treat "no matching policy rule" as implicit allow for authenticated users.
Fix
Change the policy evaluation result handling in both engines.
SSH CA (`sshca.go`, `handleSignHost` ~line 309, `handleSignUser` ~line 443):

```go
// Before (default-allow):
if matched && effect == "deny" {
	return nil, ErrForbidden
}

// After (default-deny):
if req.CheckPolicy != nil {
	effect, matched := req.CheckPolicy(resource, "sign")
	if !matched || effect != "allow" {
		return nil, ErrForbidden
	}
}
```
Transit (`transit.go`, `requireUserWithPolicy` ~line 330):

```go
// Before (default-allow):
if matched {
	if effect == "deny" {
		return ErrForbidden
	}
	return nil
}
return nil // no match → allow

// After (default-deny):
if req.CheckPolicy == nil {
	return ErrForbidden
}
effect, matched := req.CheckPolicy(resource, action)
if !matched || effect != "allow" {
	return ErrForbidden
}
return nil
```
Apply the same pattern to any other policy check sites in these engines.
Files

| File | Change |
|---|---|
| `internal/engine/sshca/sshca.go` | Default-deny in `handleSignHost`, `handleSignUser` |
| `internal/engine/transit/transit.go` | Default-deny in `requireUserWithPolicy` |
| `internal/engine/sshca/sshca_test.go` | Add test: no policy rules → user request denied |
| `internal/engine/transit/transit_test.go` | Add test: no policy rules → user request denied |
Work Item 6: User Engine Concurrency and Safety (#63, #64, #65, #66, #67)
Five findings in the user engine share a root cause: the engine was implemented after the others and does not follow the same concurrency patterns.
#63 — Missing role checks on decrypt, re-encrypt, rotate-key
Fix: Add an `IsUser()` check to `handleDecrypt`, `handleReEncrypt`, and `handleRotateKey`, matching the pattern used in `handleEncrypt`:

```go
if !req.CallerInfo.IsUser() {
	return nil, ErrForbidden
}
```
#64 — Initialize holds no mutex
Fix: Acquire the write lock at the top of Initialize:

```go
func (e *UserEngine) Initialize(ctx context.Context, ...) error {
	e.mu.Lock()
	defer e.mu.Unlock()
	// ... existing body ...
}
```
This matches the pattern in the SSH CA and transit engines.
#65 — handleEncrypt releases lock before using state
Fix: Hold RLock (not the write lock) through the ECDH and encryption operations. After auto-provisioning (which needs the write lock), downgrade to a read lock for the crypto work:

```go
e.mu.Lock()
// ... auto-provisioning ...
// Capture references while holding the write lock.
senderState := e.users[sender]
recipientStates := ...
e.mu.Unlock()

// Re-acquire RLock for crypto operations.
e.mu.RLock()
defer e.mu.RUnlock()
// Verify states are still valid (not nil).
if senderState.privKey == nil {
	return nil, ErrSealed
}
// ... ECDH, wrap DEK, encrypt ...
```
Alternatively, hold the write lock through the entire operation (simpler but serializes all encrypts). Given that encryption is the hot path, the downgrade approach is preferred.
#66 — handleReEncrypt manual lock without defer
Fix: Restructure to release locks via defer (or tightly scoped blocks) rather than manual unlock calls scattered through the body. Split the method into an envelope-resolution phase and a crypto phase, each with its own lock scope:

```go
func (e *UserEngine) handleReEncrypt(...) {
	// Phase 1: read envelope, resolve users.
	e.mu.RLock()
	// ... read state ...
	e.mu.RUnlock()

	// Phase 2: crypto operations.
	e.mu.RLock()
	defer e.mu.RUnlock()
	// ... decrypt, re-wrap, encrypt ...
}
```
#67 — No sealed-state check in HandleRequest
Fix: Add a sealed-state guard at the top of HandleRequest, before the operation dispatch:

```go
func (e *UserEngine) HandleRequest(ctx context.Context, req *engine.Request) (*engine.Response, error) {
	e.mu.RLock()
	sealed := e.config == nil
	e.mu.RUnlock()
	if sealed {
		return nil, ErrSealed
	}
	switch req.Operation {
	// ...
	}
}
```
Files

| File | Change |
|---|---|
| `internal/engine/user/user.go` | All five fixes above |
| `internal/engine/user/user_test.go` | Add tests: guest role denied; concurrent encrypt/seal; sealed-state rejection |
Verification

- `go test -race ./internal/engine/user/`
Work Item 7: Barrier and Seal Fixes (#41, #42, #43)
#41 — loadKeys errors silently swallowed during unseal
Risk: If loadKeys fails to decrypt DEK entries (corrupt data, wrong
MEK), errors are swallowed and the keys map is incomplete. Subsequent
operations fail with confusing ErrKeyNotFound errors.
Fix: Return the error from `loadKeys` instead of ignoring it. Distinguish between "table doesn't exist" (acceptable on first run before migration) and genuine decryption failures:

```go
func (b *AESGCMBarrier) Unseal(mek []byte) error {
	// ...
	if err := b.loadKeys(); err != nil {
		// Check if this is a "no such table" error (pre-migration).
		if !isTableMissing(err) {
			b.mek = nil
			return fmt.Errorf("barrier: load keys: %w", err)
		}
		// Pre-migration: no barrier_keys table yet, proceed with empty map.
	}
	return nil
}

func isTableMissing(err error) bool {
	return strings.Contains(err.Error(), "no such table")
}
```
#42 — No AAD binding on MEK encryption with KWK
Risk: The MEK ciphertext in seal_config has no AAD, so there is no
cryptographic binding to its storage purpose.
Fix: Pass a constant AAD string when encrypting/decrypting the MEK:

```go
var mekAAD = []byte("metacrypt/seal_config/mek")

// In Init:
encryptedMEK, err := crypto.Encrypt(kwk, mek, mekAAD)

// In Unseal:
mek, err := crypto.Decrypt(kwk, encryptedMEK, mekAAD)

// In RotateMEK:
newEncMEK, err := crypto.Encrypt(kwk, newMEK, mekAAD)
```
Migration note: Existing databases have MEKs encrypted with nil AAD.
The unseal path must try mekAAD first, then fall back to nil for
backward compatibility. After successful unseal with nil, re-encrypt with
mekAAD and update seal_config.
#43 — Barrier List SQL LIKE with unescaped prefix
Risk: If a barrier path prefix contains % or _, the LIKE query
matches unintended entries.
Fix: Escape SQL LIKE wildcards in the prefix:

```go
func escapeLIKE(s string) string {
	s = strings.ReplaceAll(s, `\`, `\\`)
	s = strings.ReplaceAll(s, `%`, `\%`)
	s = strings.ReplaceAll(s, `_`, `\_`)
	return s
}

// In List:
rows, err := b.db.QueryContext(ctx,
	"SELECT path FROM barrier_entries WHERE path LIKE ? ESCAPE '\\'",
	escapeLIKE(prefix)+"%")
```
Files

| File | Change |
|---|---|
| `internal/barrier/barrier.go` | Fix `loadKeys` error handling; add `escapeLIKE` to `List` |
| `internal/seal/seal.go` | Add `mekAAD` constant; use in Init, Unseal, RotateMEK with fallback |
| `internal/barrier/barrier_test.go` | Add test: List with `%` and `_` in paths |
| `internal/seal/seal_test.go` | Add test: unseal with AAD-encrypted MEK |
Work Item 8: CA Engine Fixes (#50, #51, #70)
#50 — Non-admin users can override key usages
Risk: A non-admin user can request cert sign or crl sign key usage
on a leaf certificate, creating a de facto CA certificate.
Fix: Only allow admin users to override `key_usages` and `ext_key_usages`. Non-admin users get the profile defaults:

```go
// In handleIssue, after profile lookup:
if req.CallerInfo.IsAdmin {
	if v, ok := req.Data["key_usages"].([]interface{}); ok {
		profile.KeyUse = toStringSlice(v)
	}
	if v, ok := req.Data["ext_key_usages"].([]interface{}); ok {
		profile.ExtKeyUsages = toStringSlice(v)
	}
}
```
#51 — Certificate renewal does not revoke original
Fix: After issuing the renewed certificate, revoke the original by serial. The original serial is available from the request data:

```go
// In handleRenew, after successful issuance:
oldSerial := req.Data["serial"].(string)
if err := e.revokeCertBySerial(ctx, issuerName, oldSerial, req.CallerInfo.Username); err != nil {
	// Log but do not fail — the new cert is already issued.
	e.logger.Warn("failed to revoke original during renewal",
		"serial", oldSerial, "error", err)
}
```
#70 — RenewCert has no REST route (API sync violation)
Fix: Add a dedicated REST route for certificate renewal:

```go
r.Post("/v1/ca/{mount}/cert/{serial}/renew", s.requireAuth(s.handleRenewCert))
```

Implement `handleRenewCert` following the typed handler pattern (with `CheckPolicy`).
Files

| File | Change |
|---|---|
| `internal/engine/ca/ca.go` | Guard key usage overrides with `IsAdmin`; revoke original in `handleRenew` |
| `internal/server/routes.go` | Add POST `/v1/ca/{mount}/cert/{serial}/renew` route and handler |
| `internal/engine/ca/ca_test.go` | Add test: non-admin cannot set cert sign usage; renewal revokes original |
Work Item 9: SSH CA Lock Granularity (#53)
Risk: HandleRequest acquires a write lock for all operations including
reads (get-ca-pubkey, get-cert, list-certs, get-profile,
list-profiles, get-krl). This serializes all operations unnecessarily.
Fix: Move locking into individual handlers. Read-only operations use RLock; mutating operations use Lock:

```go
func (e *SSHCAEngine) HandleRequest(ctx context.Context, req *engine.Request) (*engine.Response, error) {
	// No lock here — each handler manages its own.
	switch req.Operation {
	case "get-ca-pubkey":
		return e.handleGetCAPubkey(ctx, req) // uses RLock internally
	case "sign-host":
		return e.handleSignHost(ctx, req) // uses Lock internally
	// ...
	}
}
```

Read-only handlers:

```go
func (e *SSHCAEngine) handleGetCAPubkey(ctx context.Context, req *engine.Request) (*engine.Response, error) {
	e.mu.RLock()
	defer e.mu.RUnlock()
	// ...
}
```

Mutating handlers (`sign-host`, `sign-user`, `create-profile`, `update-profile`, `delete-profile`, `revoke-cert`, `delete-cert`):

```go
func (e *SSHCAEngine) handleSignHost(ctx context.Context, req *engine.Request) (*engine.Response, error) {
	e.mu.Lock()
	defer e.mu.Unlock()
	// ...
}
```
Files

| File | Change |
|---|---|
| `internal/engine/sshca/sshca.go` | Remove top-level lock from `HandleRequest`; add per-handler locks |
Work Item 10: Transit Ciphertext Validation (#59)
Risk: parseVersionedData accepts negative version numbers. A crafted
ciphertext metacrypt:v-1:... parses as version -1, which fails the key
lookup but produces a confusing error.
Fix: Add a bounds check after parsing:

```go
func parseVersionedData(s string) (int, []byte, error) {
	// ... existing parse logic ...
	version, err := strconv.Atoi(parts[1][1:])
	if err != nil {
		return 0, nil, ErrInvalidFormat
	}
	if version < 1 {
		return 0, nil, fmt.Errorf("transit: %w: version must be >= 1", ErrInvalidFormat)
	}
	// ...
}
```
Files

| File | Change |
|---|---|
| `internal/engine/transit/transit.go` | Add version < 1 check in `parseVersionedData` |
| `internal/engine/transit/transit_test.go` | Add test: `metacrypt:v0:...` and `metacrypt:v-1:...` rejected |
Work Item 11: Database and Auth (#46, #74)
#46 — SQLite PRAGMAs only applied to first connection
Risk: database/sql may open additional connections that don't receive
the journal_mode, foreign_keys, and busy_timeout PRAGMAs.
Fix: Use the `_pragma` DSN parameter supported by modernc.org/sqlite:

```go
dsn := path + "?_pragma=journal_mode(WAL)&_pragma=foreign_keys(ON)&_pragma=busy_timeout(5000)"
db, err := sql.Open("sqlite", dsn)
```
Remove the manual PRAGMA execution loop. This ensures every connection opened by the pool receives the PRAGMAs.
#74 — Token validation cache grows without bound
Risk: The cache map in auth.go grows indefinitely. Expired entries
are never proactively removed.
Fix: Add periodic eviction. Run a background goroutine that sweeps expired entries every minute:

```go
func (a *Authenticator) startEviction(ctx context.Context) {
	ticker := time.NewTicker(time.Minute)
	go func() {
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				a.evictExpired()
			}
		}
	}()
}

func (a *Authenticator) evictExpired() {
	now := time.Now()
	a.mu.Lock()
	defer a.mu.Unlock()
	for key, entry := range a.cache {
		if now.After(entry.expiresAt) {
			delete(a.cache, key)
		}
	}
}
```
Call startEviction from the Authenticator constructor. Accept a
context.Context to allow cancellation on shutdown.
Files

| File | Change |
|---|---|
| `internal/db/db.go` | Switch to `_pragma` DSN parameters; remove manual PRAGMA loop |
| `internal/auth/auth.go` | Add `startEviction` goroutine; call from constructor |
| `internal/db/db_test.go` | Verify PRAGMAs are active on a fresh connection from the pool |
| `internal/auth/auth_test.go` | Add test: expired entries are evicted after sweep |
Work Item 12: Policy Engine Glob Matching (#73)
Risk: filepath.Match does not support ** for recursive directory
matching. Administrators writing rules like engine/**/certs/* will find
they don't match paths with multiple segments.
Fix: Replace `filepath.Match` with `path.Match` (POSIX-style, no OS-specific behavior) and add support for `**` by anchoring the literal prefix and trying the remainder of the pattern at each segment boundary. Note the boundary iteration must test positions both at and just after each `/` — a loop that only advances past separators silently skips the position where a suffix beginning with `/` would match:

```go
import "path"

func matchesAnyGlob(patterns []string, value string) bool {
	for _, p := range patterns {
		if matchGlob(p, value) {
			return true
		}
	}
	return false
}

func matchGlob(pattern, value string) bool {
	if strings.Contains(pattern, "**") {
		parts := strings.SplitN(pattern, "**", 2)
		prefix, suffix := parts[0], parts[1]
		if !strings.HasPrefix(value, prefix) {
			return false
		}
		remainder := value[len(prefix):]
		if suffix == "" {
			return true // trailing "**" matches any tail
		}
		// Candidate start positions: the beginning of remainder, plus
		// each '/' and the character after it (the suffix may or may
		// not begin with a separator).
		candidates := []int{0}
		for i, c := range remainder {
			if c == '/' {
				candidates = append(candidates, i, i+1)
			}
		}
		for _, i := range candidates {
			if matched, _ := path.Match(suffix, remainder[i:]); matched {
				return true
			}
		}
		return false
	}
	matched, _ := path.Match(pattern, value)
	return matched
}
```
Document in POLICY.md that ** matches zero or more path segments, while
* matches within a single segment.
Files

| File | Change |
|---|---|
| `internal/policy/policy.go` | Replace `filepath.Match` with `path.Match` + `**` support |
| `internal/policy/policy_test.go` | Add test: `engine/**/certs/*` matches `engine/ca/prod/certs/abc123` |
Work Item 13: Deployment Fixes (#78, #79)
#78 — systemd ExecReload sends SIGHUP with no handler
Fix: Two options:
Option A (recommended): Remove ExecReload entirely. Metacrypt does
not support graceful reload — configuration changes require a restart. The
ExecReload line creates a false expectation.
Option B: Implement SIGHUP handling that re-reads the TOML config and applies non-breaking changes (log level, TLS cert reload). This is significant new functionality and should be a separate feature.
For now, remove ExecReload from both service units:

```diff
-ExecReload=/bin/kill -HUP $MAINPID
```
#79 — Dockerfiles use Go 1.23 but module requires Go 1.25
Fix: Update the builder base image:

```diff
-FROM golang:1.23-alpine AS builder
+FROM golang:1.25-alpine AS builder
```
Apply to both `Dockerfile.api` and `Dockerfile.web`. Also remove the
unnecessary gcc and musl-dev installation, since `CGO_ENABLED=0` builds need
no C toolchain:

```diff
-RUN apk add --no-cache gcc musl-dev
```
#7 — No audit logging
Note: This is the only medium finding that requires significant new functionality rather than a targeted fix. Audit logging should be implemented as a dedicated subsystem:
- Define an `AuditEvent` struct with: timestamp, caller, operation, resource, outcome (success/denied/error), and metadata.
- Write events to a structured log sink (`slog` with JSON output to a dedicated file at `/srv/metacrypt/audit.log`).
- Instrument every engine `HandleRequest` to emit an event on completion.
- Instrument policy `CreateRule`/`DeleteRule`.
- Instrument seal/unseal operations.
This is a substantial feature. Track separately from the quick-fix items in this plan.
#10 — No extension allowlist for SSH host certificates
Note: Host certificates typically don't use extensions (extensions are
for user certificates). The fix is to ignore the extensions field on
host signing requests rather than passing it through:

```go
// In handleSignHost, after building the certificate:
cert.Permissions.Extensions = nil // Host certs should not carry extensions
```
#21 — No rate limiting on transit cryptographic operations
Note: This requires a rate-limiting middleware or per-caller token bucket. Best implemented as server-level middleware applied to engine request handlers:
```go
func (s *Server) requireRateLimit(next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		info := TokenInfoFromContext(r.Context())
		if !s.rateLimiter.Allow(info.Username) {
			writeJSONError(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		next(w, r)
	}
}
```
Track separately — this affects API design decisions (limits, quotas, per-user vs per-token).
Files
| File | Change |
|---|---|
| `deploy/systemd/metacrypt.service` | Remove ExecReload line |
| `deploy/systemd/metacrypt-web.service` | Remove ExecReload line |
| `Dockerfile.api` | Update to golang:1.25-alpine; remove gcc musl-dev |
| `Dockerfile.web` | Update to golang:1.25-alpine; remove gcc musl-dev |
Implementation Order
5. #54, #58 Default-deny in engines (security-critical, do first)
6. #63–#67 User engine concurrency (5 coupled fixes, one change)
7. #41–#43 Barrier/seal fixes (3 independent fixes)
8. #50, #51, #70 CA engine fixes (key usage + renewal + API sync)
9. #53 SSH CA lock granularity (standalone refactor)
10. #59 Transit version validation (one-line fix)
11. #46, #74 DB pragmas + auth cache (independent, no overlap)
12. #73 Policy glob matching (standalone)
13. #78, #79 Deployment fixes (non-code, standalone)
- #7 Audit logging (new feature, track separately)
- #10 SSH host extensions (one-line fix, ship with work item 9)
- #21 Transit rate limiting (new feature, track separately)

Work items 5–13 can be parallelized across engineers:
- Engineer A: work item 5 (default-deny) + work item 9 (SSH CA locks) + finding #10 (host extensions)
- Engineer B: work item 6 (user concurrency) + work item 8 (CA fixes)
- Engineer C: work item 7 (barrier/seal) + work item 10 (transit version) + work item 11 (DB pragmas/auth cache)
- Independent: work item 12 (policy), work item 13 (deployment)
Work items #7 (audit logging) and #21 (rate limiting) are substantial new features and should be tracked as separate engineering tasks, not quick fixes.
Post-Remediation (Medium)
After all medium findings are resolved:

- Update AUDIT.md — mark each finding as RESOLVED with a summary.
- Run the full pipeline: `make all` (vet, lint, test, build).
- Run the race detector: `go test -race ./...` — especially important for work items 6 and 9 (concurrency changes).
- Address remaining low findings — 14 low-severity items remain, mostly zeroization gaps and documentation drift.