# Remediation Plan — High-Priority Audit Findings

**Date**: 2026-03-17
**Scope**: AUDIT.md findings #39, #40, #48, #49, #61, #62, #68, #69

This plan addresses all eight High-severity findings from the 2026-03-17
full system audit. Findings are grouped into four work items by shared root
cause or affected subsystem. The order reflects dependency chains: #68 is a
standalone fix that should ship first; #48 is a prerequisite for safe
operation across all engines; #39/#40 affect the storage core; the remaining
four affect specific engines.

---

## Work Item 1: JSON Injection in REST Error Responses (#68)

**Risk**: An error message containing `"` or `\` breaks the JSON response
structure. If the error contains attacker-controlled input (e.g., a mount
name or key name that triggers a downstream error), this enables JSON
injection in API responses.

**Root cause**: 13 locations in `internal/server/routes.go` construct JSON
error responses via string concatenation:

```go
http.Error(w, `{"error":"`+err.Error()+`"}`, http.StatusInternalServerError)
```

The `writeEngineError` helper (line 1704) is the most common entry point;
most typed handlers call it.
### Fix

1. **Replace `writeEngineError`** with a safe JSON encoder:

   ```go
   func writeJSONError(w http.ResponseWriter, msg string, code int) {
       w.Header().Set("Content-Type", "application/json")
       w.WriteHeader(code)
       _ = json.NewEncoder(w).Encode(map[string]string{"error": msg})
   }
   ```

2. **Replace all 13 call sites** that use string concatenation with
   `writeJSONError(w, grpcMessage(err), status)` or
   `writeJSONError(w, err.Error(), status)`.

   The `grpcMessage` helper already exists in the webserver package and
   extracts human-readable messages from gRPC errors. Add an equivalent
   to the REST server, and prefer it over raw `err.Error()` to avoid
   leaking internal error details.

3. **Grep for the pattern** `"error":"` in `routes.go` to confirm no
   remaining string-concatenated JSON.

### Files

| File | Change |
|------|--------|
| `internal/server/routes.go` | Replace `writeEngineError` and all 13 inline error sites |

### Verification

- `go vet ./internal/server/`
- `go test ./internal/server/`
- Manual test: mount an engine with a name containing `"`, trigger an error,
  verify the response is valid JSON.
---

## Work Item 2: Path Traversal via Unsanitized Names (#48)

**Risk**: User-controlled strings (issuer names, key names, profile names,
usernames, mount names) are concatenated directly into barrier storage
paths. An input containing `../` traverses the barrier namespace, allowing
reads and writes to arbitrary paths. This affects all four engines and the
engine registry.

**Root cause**: No validation exists at any layer — neither the barrier's
`Put`/`Get`/`Delete` methods nor the engines sanitize path components.

### Vulnerable locations

| File | Input | Path Pattern |
|------|-------|--------------|
| `ca/ca.go` | issuer `name` | `mountPath + "issuers/" + name + "/"` |
| `sshca/sshca.go` | profile `name` | `mountPath + "profiles/" + name + ".json"` |
| `transit/transit.go` | key `name` | `mountPath + "keys/" + name + "/"` |
| `user/user.go` | `username` | `mountPath + "users/" + username + "/"` |
| `engine/engine.go` | mount `name` | `engine/{type}/{name}/` |
| `policy/policy.go` | rule `ID` | `policy/rules/{id}` |

### Fix

Enforce validation at **two layers** (defense in depth):

1. **Barrier layer** — reject paths containing `..` segments.

   Add a `validatePath` check at the top of `Get`, `Put`, `Delete`, and
   `List` in `barrier.go`:

   ```go
   var ErrInvalidPath = errors.New("barrier: invalid path")

   func validatePath(p string) error {
       for _, seg := range strings.Split(p, "/") {
           if seg == ".." {
               return fmt.Errorf("%w: path traversal rejected: %q", ErrInvalidPath, p)
           }
       }
       return nil
   }
   ```

   Call `validatePath` at the entry of `Get`, `Put`, `Delete`, `List`.
   Return `ErrInvalidPath` on failure.

2. **Engine/registry layer** — validate entity names at input boundaries.

   Add a `ValidateName` helper to `internal/engine/`:

   ```go
   var namePattern = regexp.MustCompile(`^[a-zA-Z0-9][a-zA-Z0-9._-]*$`)

   func ValidateName(name string) error {
       if name == "" || len(name) > 128 || !namePattern.MatchString(name) {
           return fmt.Errorf("invalid name %q: must be 1-128 alphanumeric, "+
               "dot, hyphen, or underscore characters", name)
       }
       return nil
   }
   ```

   Call `ValidateName` in:

   | Location | Input validated |
   |----------|-----------------|
   | `engine.go` `Mount()` | mount name |
   | `ca.go` `handleCreateIssuer` | issuer name |
   | `sshca.go` `handleCreateProfile` | profile name |
   | `transit.go` `handleCreateKey` | key name |
   | `user.go` `handleRegister`, `handleProvision` | username |
   | `user.go` `handleEncrypt` | recipient usernames |
   | `policy.go` `CreateRule` | rule ID |

   Note: certificate serials are generated server-side from `crypto/rand`
   and hex-encoded, so they are safe. Validate anyway for defense in depth.
### Files

| File | Change |
|------|--------|
| `internal/barrier/barrier.go` | Add `validatePath`, call from Get/Put/Delete/List |
| `internal/engine/engine.go` | Add `ValidateName`, call from `Mount` |
| `internal/engine/ca/ca.go` | Call `ValidateName` on issuer name |
| `internal/engine/sshca/sshca.go` | Call `ValidateName` on profile name |
| `internal/engine/transit/transit.go` | Call `ValidateName` on key name |
| `internal/engine/user/user.go` | Call `ValidateName` on usernames |
| `internal/policy/policy.go` | Call `ValidateName` on rule ID |

### Verification

- Add `TestValidatePath` to `barrier_test.go`: confirm `../` and `..` are
  rejected; confirm normal paths pass.
- Add `TestValidateName` to `engine_test.go`: confirm `../evil`, empty
  string, and overlong names are rejected; confirm valid names pass.
- `go test ./internal/barrier/ ./internal/engine/... ./internal/policy/`
---

## Work Item 3: Barrier Concurrency and Crash Safety (#39, #40)

These two findings share the barrier/seal subsystem and should be addressed
together.

### #39 — TOCTOU Race in Barrier Get/Put

**Risk**: `Get` and `Put` copy the `mek` slice header and `keys` map
reference under `RLock`, release the lock, then use the copied references
for encryption/decryption. A concurrent `Seal()` zeroizes the underlying
byte slices in place before nil-ing the fields, so a concurrent reader
uses zeroized key material.

**Root cause**: The lock does not cover the crypto operation. The "copy"
is a shallow reference copy (slice header), not a deep byte copy. `Seal()`
zeroizes the backing array, which is shared.
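The shallow-copy behavior is a property of Go slices themselves, demonstrated here in isolation:

```go
package main

import "fmt"

func main() {
	mek := []byte{0xAA, 0xBB, 0xCC}

	// "Copying" a slice copies only the header (pointer, len, cap).
	// Both variables still point at the same backing array.
	ref := mek

	// Seal()-style in-place zeroization through one reference...
	for i := range mek {
		mek[i] = 0
	}

	// ...is visible through the other: a reader holding ref now
	// operates on zeroized key material.
	fmt.Println(ref) // [0 0 0]
}
```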
**Current locking pattern** (`barrier.go`):

```
Get:  RLock → copy mek/keys refs → RUnlock → decrypt (uses zeroized key)
Put:  RLock → copy mek/keys refs → RUnlock → encrypt (uses zeroized key)
Seal: Lock  → zeroize mek bytes → nil mek → zeroize keys → nil keys → Unlock
```

**Fix**: Hold `RLock` through the entire crypto operation:

```go
func (b *AESGCMBarrier) Get(ctx context.Context, path string) ([]byte, error) {
    if err := validatePath(path); err != nil {
        return nil, err
    }
    b.mu.RLock()
    defer b.mu.RUnlock()
    if b.mek == nil {
        return nil, ErrSealed
    }
    // query DB, resolve key, decrypt — all under RLock
    // ...
}
```

This is the minimal, safest change. `RLock` permits concurrent readers, so
there is no throughput regression for parallel `Get`/`Put` operations. The
only serialization point is `Seal()`, which acquires the exclusive `Lock`
and waits for all readers to drain — exactly the semantics we want.

Apply the same pattern to `Put`, `Delete`, and `List`.

**Alternative considered**: Atomic pointer swap (`atomic.Pointer[keyState]`).
This eliminates the lock from the hot path entirely, but introduces
complexity around deferred zeroization of the old state (readers may still
hold references). The `RLock`-through-crypto approach is simpler and
sufficient for Metacrypt's concurrency profile.

### #40 — Crash During `ReWrapKeys` Loses All Data

**Risk**: `RotateMEK` calls `barrier.ReWrapKeys(newMEK)` which commits a
transaction re-wrapping all DEKs, then separately updates `seal_config`
with the new encrypted MEK. A crash between these two database operations
leaves DEKs wrapped with a MEK that is not persisted — all data is
irrecoverable.

**Current flow** (`seal.go` lines 245–313):

```
1. Generate newMEK
2. barrier.ReWrapKeys(ctx, newMEK)           ← commits transaction (barrier_keys updated)
3. crypto.Encrypt(kwk, newMEK, nil)          ← encrypt new MEK
4. UPDATE seal_config SET encrypted_mek = ?  ← separate statement, not in transaction
   *** CRASH HERE = DATA LOSS ***
5. Swap in-memory MEK
```

**Fix**: Unify steps 2–4 into a single database transaction.

Refactor `ReWrapKeys` to accept an optional `*sql.Tx`:

```go
// ReWrapKeysTx re-wraps all DEKs with newMEK within the given transaction.
func (b *AESGCMBarrier) ReWrapKeysTx(ctx context.Context, tx *sql.Tx, newMEK []byte) error {
    // Same logic as ReWrapKeys, but use tx instead of b.db.BeginTx.
    rows, err := tx.QueryContext(ctx, "SELECT key_id, wrapped_key FROM barrier_keys")
    // ... decrypt with old MEK, encrypt with new MEK, UPDATE barrier_keys ...
}

// SwapMEK updates the in-memory MEK after a committed transaction.
func (b *AESGCMBarrier) SwapMEK(newMEK []byte) {
    b.mu.Lock()
    defer b.mu.Unlock()
    mcrypto.Zeroize(b.mek)
    b.mek = newMEK
}
```

Then in `RotateMEK`:

```go
func (m *Manager) RotateMEK(ctx context.Context, password string) error {
    // ... derive KWK, generate newMEK ...

    tx, err := m.db.BeginTx(ctx, nil)
    if err != nil {
        return err
    }
    defer tx.Rollback()

    // Re-wrap all DEKs within the transaction.
    if err := m.barrier.ReWrapKeysTx(ctx, tx, newMEK); err != nil {
        return err
    }

    // Update seal_config within the same transaction.
    encNewMEK, err := crypto.Encrypt(kwk, newMEK, nil)
    if err != nil {
        return err
    }
    if _, err := tx.ExecContext(ctx,
        "UPDATE seal_config SET encrypted_mek = ? WHERE id = 1",
        encNewMEK,
    ); err != nil {
        return err
    }

    if err := tx.Commit(); err != nil {
        return err
    }

    // Only after commit: update in-memory state.
    m.barrier.SwapMEK(newMEK)
    return nil
}
```

SQLite in WAL mode handles this correctly — the transaction is atomic
regardless of process crash. The `barrier_keys` and `seal_config` updates
either both commit or neither does.

### Files

| File | Change |
|------|--------|
| `internal/barrier/barrier.go` | Extend RLock scope in Get/Put/Delete/List; add `ReWrapKeysTx`, `SwapMEK` |
| `internal/seal/seal.go` | Wrap ReWrapKeysTx + seal_config UPDATE in single transaction |
| `internal/barrier/barrier_test.go` | Add concurrent Get/Seal stress test |

### Verification

- `go test -race ./internal/barrier/ ./internal/seal/`
- Add `TestConcurrentGetSeal`: spawn goroutines doing Get while another
  goroutine calls Seal. Run with `-race`. Verify no panics or data races.
- Add `TestRotateMEKAtomic`: verify that `barrier_keys` and `seal_config`
  are updated in the same transaction (mock the DB to detect transaction
  boundaries, or verify via rollback behavior).
---

## Work Item 4: CA TTL Enforcement, User Engine Fixes, Policy Bypass (#49, #61, #62, #69)

These four findings touch separate files with no overlap and can be
addressed in parallel.

### #49 — No TTL Ceiling in CA Certificate Issuance

**Risk**: A non-admin user can request an arbitrarily long certificate
lifetime. The issuer's `MaxTTL` exists in config but is not enforced
during `handleIssue` or `handleSignCSR`.

**Root cause**: The CA engine applies the user's requested TTL directly
to the certificate without comparing it against `issuerConfig.MaxTTL`.
The SSH CA engine correctly enforces this via `resolveTTL` — the CA
engine does not.

**Fix**: Add a `resolveTTL` method to the CA engine, following the SSH
CA engine's pattern (`sshca.go` lines 902–932):

```go
func (e *CAEngine) resolveTTL(requested string, issuer *issuerState) (time.Duration, error) {
    maxTTL, err := time.ParseDuration(issuer.config.MaxTTL)
    if err != nil {
        maxTTL = 2160 * time.Hour // 90 days fallback
    }

    if requested != "" {
        ttl, err := time.ParseDuration(requested)
        if err != nil {
            return 0, fmt.Errorf("invalid TTL: %w", err)
        }
        if ttl > maxTTL {
            return 0, fmt.Errorf("requested TTL %s exceeds issuer maximum %s", ttl, maxTTL)
        }
        return ttl, nil
    }

    return maxTTL, nil
}
```

Call this in `handleIssue` and `handleSignCSR` before constructing the
certificate. Replace the raw TTL string with the validated duration.

| File | Change |
|------|--------|
| `internal/engine/ca/ca.go` | Add `resolveTTL`, call from `handleIssue` and `handleSignCSR` |
| `internal/engine/ca/ca_test.go` | Add test: issue cert with TTL > MaxTTL, verify rejection |

### #61 — Ineffective ECDH Key Zeroization

**Risk**: `privKey.Bytes()` returns a copy of the private key bytes.
Zeroizing the copy leaves the original inside `*ecdh.PrivateKey`. Go's
`crypto/ecdh` API does not expose the internal byte slice.

**Root cause**: Language/API limitation in Go's `crypto/ecdh` package.

**Fix**: Store the raw private key bytes alongside the parsed key in
`userState`, and zeroize those bytes on seal:

```go
type userState struct {
    privKey   *ecdh.PrivateKey
    privBytes []byte // raw key bytes, retained for zeroization
    pubKey    *ecdh.PublicKey
    config    *UserKeyConfig
}
```

On **load from barrier** (Unseal, auto-provision):

```go
raw, err := b.Get(ctx, prefix+"priv.key")
priv, err := curve.NewPrivateKey(raw)
state.privBytes = raw // retain for zeroization
state.privKey = priv
```

On **Seal**:

```go
mcrypto.Zeroize(u.privBytes)
u.privKey = nil
u.privBytes = nil
```

Document the limitation: the parsed `*ecdh.PrivateKey` struct's internal
copy cannot be zeroized from Go code. Setting `privKey = nil` makes it
eligible for GC, but does not guarantee immediate byte overwrite. This is
an accepted Go runtime limitation.
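For reference, a `Zeroize` helper of the kind used above can be as small as the following sketch (the real `mcrypto.Zeroize` in Metacrypt may differ; `runtime.KeepAlive` here only keeps the slice reachable until the wipe completes — Go offers no hard guarantee against copies made elsewhere, which is exactly the limitation noted above):

```go
package main

import (
	"fmt"
	"runtime"
)

// Zeroize overwrites a byte slice in place. Safe to call on nil or empty
// slices, since the loop simply has no iterations.
func Zeroize(b []byte) {
	for i := range b {
		b[i] = 0
	}
	runtime.KeepAlive(b)
}

func main() {
	key := []byte{1, 2, 3, 4}
	Zeroize(key)
	fmt.Println(key) // [0 0 0 0]
}
```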
| File | Change |
|------|--------|
| `internal/engine/user/user.go` | Add `privBytes` to `userState`, populate on load, zeroize on Seal |
| `internal/engine/user/types.go` | Update `userState` struct |

### #62 — User Engine Policy Path Uses `mountPath` Instead of Mount Name

**Risk**: Policy checks construct the resource path using `e.mountPath`
(which is `engine/user/{name}/`) instead of just the mount name. Policy
rules match against `user/{name}/recipient/{username}`, so the full mount
path creates a mismatch like `user/engine/user/myengine//recipient/alice`.
No policy rule will ever match.

**Root cause**: Line 358 of `user.go` uses `e.mountPath` directly. The
SSH CA and transit engines correctly use a `mountName()` helper.

**Fix**: Add a `mountName()` method to the user engine:

```go
func (e *UserEngine) mountName() string {
    // mountPath is "engine/user/{name}/"
    parts := strings.Split(strings.TrimSuffix(e.mountPath, "/"), "/")
    if len(parts) >= 3 {
        return parts[2]
    }
    return e.mountPath
}
```

Change line 358:

```go
resource := fmt.Sprintf("user/%s/recipient/%s", e.mountName(), r)
```

Audit all other resource path constructions in the user engine to confirm
they also use the correct mount name.

| File | Change |
|------|--------|
| `internal/engine/user/user.go` | Add `mountName()`, fix resource path on line 358 |
| `internal/engine/user/user_test.go` | Add test: verify policy resource path format |
### #69 — Typed REST Handlers Bypass Policy Engine

**Risk**: 18 typed REST handlers pass `nil` for `CheckPolicy` in the
`engine.Request`, skipping service-level policy evaluation. The generic
`/v1/engine/request` endpoint correctly passes a `policyChecker`. Since
engines #54 and #58 default to allow when no policy matches, typed routes
are effectively unprotected by policy.

**Root cause**: Typed handlers were modeled after admin-only operations
(which don't need policy) but applied to user-accessible operations.

**Fix**: Extract the policy checker construction from
`handleEngineRequest` into a shared helper:

```go
func (s *Server) newPolicyChecker(info *CallerInfo) engine.PolicyChecker {
    return func(resource, action string) (string, bool) {
        effect, matched, err := s.policy.Check(
            info.Username, info.Roles, resource, action,
        )
        if err != nil || !matched {
            return "deny", false
        }
        return effect, matched
    }
}
```

Then in each typed handler, set `CheckPolicy` on the request:

```go
req := &engine.Request{
    Operation:   "get-cert",
    Data:        data,
    CallerInfo:  callerInfo,
    CheckPolicy: s.newPolicyChecker(callerInfo),
}
```

**18 handlers to update**:

| Handler | Operation |
|---------|-----------|
| `handleGetCert` | `get-cert` |
| `handleRevokeCert` | `revoke-cert` |
| `handleDeleteCert` | `delete-cert` |
| `handleSSHCASignHost` | `sign-host` |
| `handleSSHCASignUser` | `sign-user` |
| `handleSSHCAGetProfile` | `get-profile` |
| `handleSSHCAListProfiles` | `list-profiles` |
| `handleSSHCADeleteProfile` | `delete-profile` |
| `handleSSHCAGetCert` | `get-cert` |
| `handleSSHCAListCerts` | `list-certs` |
| `handleSSHCARevokeCert` | `revoke-cert` |
| `handleSSHCADeleteCert` | `delete-cert` |
| `handleUserRegister` | `register` |
| `handleUserProvision` | `provision` |
| `handleUserListUsers` | `list-users` |
| `handleUserGetPublicKey` | `get-public-key` |
| `handleUserDeleteUser` | `delete-user` |
| `handleUserDecrypt` | `decrypt` |

Note: `handleUserEncrypt` already passes a policy checker — verify it
uses the same shared helper after refactoring. Admin-only handlers
(behind `requireAdmin` wrapper) do not need a policy checker since admin
bypasses policy.

| File | Change |
|------|--------|
| `internal/server/routes.go` | Add `newPolicyChecker`, pass to all 18 typed handlers |
| `internal/server/server_test.go` | Add test: policy-denied user is rejected by typed route |

### Verification (Work Item 4, all findings)

```bash
go test ./internal/engine/ca/
go test ./internal/engine/user/
go test ./internal/server/
go vet ./...
```

---
## Implementation Order

```
1. #68 JSON injection           (standalone, ship immediately)
2. #48 Path traversal           (standalone, blocks safe engine operation)
3. #39 Barrier TOCTOU race      ─┐
   #40 ReWrapKeys crash safety  ─┘ (coupled, requires careful testing)
4. #49 CA TTL enforcement       ─┐
   #61 ECDH zeroization          │
   #62 User policy path          │ (independent fixes, parallelizable)
   #69 Policy bypass            ─┘
```

Items 1 and 2 have no dependencies and can be done in parallel by
different engineers.

Items 3 and 4 can also be done in parallel since they touch different
subsystems (barrier/seal vs engines/server).

---

## Post-Remediation (High)

All eight High findings (#39, #40, #48, #49, #61, #62, #68, #69) have
been resolved. See commit `a80323e`.

---

# Remediation Plan — Medium-Priority Audit Findings

**Date**: 2026-03-17
**Scope**: AUDIT.md findings #7, #10, #21, #41, #42, #43, #46, #50, #51,
#53, #54, #58, #59, #63, #64, #65, #66, #67, #70, #72, #73, #74, #78, #79

This plan addresses all medium-severity findings. Findings are grouped into
eight work items by subsystem. Two findings (#72, #79) are already resolved
or trivial and are noted inline.

**Previously resolved by High remediation**:

- #72 (policy rule ID path traversal) — covered by #48's `ValidateName`
  on `CreateRule` and barrier-level `validatePath`.

---
## Work Item 5: Default-Deny in Engine Policy Checks (#54, #58)

**Risk**: The SSH CA and transit engines default to **allow** when no policy
rule matches a non-admin user's request. This contradicts the default-deny
principle in ARCHITECTURE.md and the engineering standards. Now that #69 is
resolved (typed REST handlers pass `CheckPolicy`), the engines receive a
policy checker — but when `CheckPolicy` returns no match, they still allow.

**Root cause**: Both engines treat "no matching policy rule" as implicit
allow for authenticated users.

### Fix

Change the policy evaluation result handling in both engines.

**SSH CA** (`sshca.go`, `handleSignHost` ~line 309, `handleSignUser`
~line 443):

```go
// Before (default-allow):
if matched && effect == "deny" {
    return nil, ErrForbidden
}

// After (default-deny):
if req.CheckPolicy != nil {
    effect, matched := req.CheckPolicy(resource, "sign")
    if !matched || effect != "allow" {
        return nil, ErrForbidden
    }
}
```

**Transit** (`transit.go`, `requireUserWithPolicy` ~line 330):

```go
// Before (default-allow):
if matched {
    if effect == "deny" {
        return ErrForbidden
    }
    return nil
}
return nil // no match → allow

// After (default-deny):
if req.CheckPolicy == nil {
    return ErrForbidden
}
effect, matched := req.CheckPolicy(resource, action)
if !matched || effect != "allow" {
    return ErrForbidden
}
return nil
```

Apply the same pattern to any other policy check sites in these engines.

### Files

| File | Change |
|------|--------|
| `internal/engine/sshca/sshca.go` | Default-deny in `handleSignHost`, `handleSignUser` |
| `internal/engine/transit/transit.go` | Default-deny in `requireUserWithPolicy` |
| `internal/engine/sshca/sshca_test.go` | Add test: no policy rules → user request denied |
| `internal/engine/transit/transit_test.go` | Add test: no policy rules → user request denied |

---
## Work Item 6: User Engine Concurrency and Safety (#63, #64, #65, #66, #67)

Five findings in the user engine share a root cause: the engine was
implemented after the others and does not follow the same concurrency
patterns.

### #63 — Missing role checks on decrypt, re-encrypt, rotate-key

**Fix**: Add `IsUser()` check to `handleDecrypt`, `handleReEncrypt`, and
`handleRotateKey`, matching the pattern used in `handleEncrypt`:

```go
if !req.CallerInfo.IsUser() {
    return nil, ErrForbidden
}
```

### #64 — Initialize holds no mutex

**Fix**: Acquire the write lock at the top of `Initialize`:

```go
func (e *UserEngine) Initialize(ctx context.Context, ...) error {
    e.mu.Lock()
    defer e.mu.Unlock()
    // ... existing body ...
}
```

This matches the pattern in the SSH CA and transit engines.

### #65 — handleEncrypt releases lock before using state

**Fix**: Hold `RLock` (not write lock) through the ECDH and encryption
operations. After auto-provisioning (which needs the write lock), downgrade
to a read lock for the crypto work:

```go
e.mu.Lock()
// ... auto-provisioning ...
// Capture references while holding write lock.
senderState := e.users[sender]
recipientStates := ...
e.mu.Unlock()

// Re-acquire RLock for crypto operations.
e.mu.RLock()
defer e.mu.RUnlock()
// verify states are still valid (not nil)
if senderState.privKey == nil {
    return nil, ErrSealed
}
// ... ECDH, wrap DEK, encrypt ...
```

Alternatively, hold the write lock through the entire operation (simpler
but serializes all encrypts). Given that encryption is the hot path, the
downgrade approach is preferred.

### #66 — handleReEncrypt manual lock without defer

**Fix**: Restructure to use `defer` for all lock releases. Split the
method into a provisioning phase (write lock) and a crypto phase (read
lock), each with its own `defer`:

```go
func (e *UserEngine) handleReEncrypt(...) {
    // Phase 1: read envelope, resolve users.
    e.mu.RLock()
    // ... read state ...
    e.mu.RUnlock()

    // Phase 2: crypto operations.
    e.mu.RLock()
    defer e.mu.RUnlock()
    // ... decrypt, re-wrap, encrypt ...
}
```

### #67 — No sealed-state check in HandleRequest

**Fix**: Add a sealed-state guard at the top of `HandleRequest`, before
the operation dispatch:

```go
func (e *UserEngine) HandleRequest(ctx context.Context, req *engine.Request) (*engine.Response, error) {
    e.mu.RLock()
    sealed := e.config == nil
    e.mu.RUnlock()
    if sealed {
        return nil, ErrSealed
    }

    switch req.Operation {
    // ...
    }
}
```

### Files

| File | Change |
|------|--------|
| `internal/engine/user/user.go` | All five fixes above |
| `internal/engine/user/user_test.go` | Add tests: guest role denied; concurrent encrypt/seal; sealed-state rejection |

### Verification

```bash
go test -race ./internal/engine/user/
```

---
## Work Item 7: Barrier and Seal Fixes (#41, #42, #43)

### #41 — `loadKeys` errors silently swallowed during unseal

**Risk**: If `loadKeys` fails to decrypt DEK entries (corrupt data, wrong
MEK), errors are swallowed and the keys map is incomplete. Subsequent
operations fail with confusing `ErrKeyNotFound` errors.

**Fix**: Return the error from `loadKeys` instead of ignoring it. Distinguish
between "table doesn't exist" (acceptable on first run before migration)
and genuine decryption failures:

```go
func (b *AESGCMBarrier) Unseal(mek []byte) error {
    // ...
    if err := b.loadKeys(); err != nil {
        // Check if this is a "no such table" error (pre-migration).
        if !isTableMissing(err) {
            b.mek = nil
            return fmt.Errorf("barrier: load keys: %w", err)
        }
        // Pre-migration: no barrier_keys table yet, proceed with empty map.
    }
    return nil
}

func isTableMissing(err error) bool {
    return strings.Contains(err.Error(), "no such table")
}
```

### #42 — No AAD binding on MEK encryption with KWK

**Risk**: The MEK ciphertext in `seal_config` has no AAD, so there is no
cryptographic binding to its storage purpose.

**Fix**: Pass a constant AAD string when encrypting/decrypting the MEK:

```go
var mekAAD = []byte("metacrypt/seal_config/mek")

// In Init:
encryptedMEK, err := crypto.Encrypt(kwk, mek, mekAAD)

// In Unseal:
mek, err := crypto.Decrypt(kwk, encryptedMEK, mekAAD)

// In RotateMEK:
newEncMEK, err := crypto.Encrypt(kwk, newMEK, mekAAD)
```

**Migration note**: Existing databases have MEKs encrypted with `nil` AAD.
The unseal path must try `mekAAD` first, then fall back to `nil` for
backward compatibility. After successful unseal with `nil`, re-encrypt with
`mekAAD` and update `seal_config`.
|
||
|
||
### #43 — Barrier `List` SQL LIKE with unescaped prefix
|
||
|
||
**Risk**: If a barrier path prefix contains `%` or `_`, the LIKE query
|
||
matches unintended entries.
|
||
|
||
**Fix**: Escape SQL LIKE wildcards in the prefix:
|
||
|
||
```go
|
||
func escapeLIKE(s string) string {
|
||
s = strings.ReplaceAll(s, `\`, `\\`)
|
||
s = strings.ReplaceAll(s, `%`, `\%`)
|
||
s = strings.ReplaceAll(s, `_`, `\_`)
|
||
return s
|
||
}
|
||
|
||
// In List:
|
||
rows, err := b.db.QueryContext(ctx,
|
||
"SELECT path FROM barrier_entries WHERE path LIKE ? ESCAPE '\\'",
|
||
escapeLIKE(prefix)+"%")
|
||
```
|
||
|
||
### Files
|
||
|
||
| File | Change |
|
||
|------|--------|
|
||
| `internal/barrier/barrier.go` | Fix `loadKeys` error handling; add `escapeLIKE` to `List` |
|
||
| `internal/seal/seal.go` | Add `mekAAD` constant; use in Init, Unseal, RotateMEK with fallback |
|
||
| `internal/barrier/barrier_test.go` | Add test: `List` with `%` and `_` in paths |
|
||
| `internal/seal/seal_test.go` | Add test: unseal with AAD-encrypted MEK |
|
||
|
||
---
|
||
|
||
## Work Item 8: CA Engine Fixes (#50, #51, #70)
|
||
|
||
### #50 — Non-admin users can override key usages
|
||
|
||
**Risk**: A non-admin user can request `cert sign` or `crl sign` key usage
|
||
on a leaf certificate, creating a de facto CA certificate.
|
||
|
||
**Fix**: Only allow admin users to override `key_usages` and
|
||
`ext_key_usages`. Non-admin users get the profile defaults:
|
||
|
||
```go
|
||
// In handleIssue, after profile lookup:
|
||
if req.CallerInfo.IsAdmin {
|
||
if v, ok := req.Data["key_usages"].([]interface{}); ok {
|
||
profile.KeyUse = toStringSlice(v)
|
||
}
|
||
if v, ok := req.Data["ext_key_usages"].([]interface{}); ok {
|
||
profile.ExtKeyUsages = toStringSlice(v)
|
||
}
|
||
}
|
||
```
|
||
|
||
### #51 — Certificate renewal does not revoke original

**Fix**: After issuing the renewed certificate, revoke the original by
serial. The original serial is available from the request data:

```go
// In handleRenew, after successful issuance:
oldSerial := req.Data["serial"].(string)
if err := e.revokeCertBySerial(ctx, issuerName, oldSerial, req.CallerInfo.Username); err != nil {
	// Log but do not fail — the new cert is already issued.
	e.logger.Warn("failed to revoke original during renewal",
		"serial", oldSerial, "error", err)
}
```

### #70 — `RenewCert` has no REST route (API sync violation)

**Fix**: Add a dedicated REST route for certificate renewal:

```go
r.Post("/v1/ca/{mount}/cert/{serial}/renew", s.requireAuth(s.handleRenewCert))
```

Implement `handleRenewCert` following the typed handler pattern (with
`CheckPolicy`).

### Files

| File | Change |
|------|--------|
| `internal/engine/ca/ca.go` | Guard key usage overrides with `IsAdmin`; revoke original in `handleRenew` |
| `internal/server/routes.go` | Add `POST /v1/ca/{mount}/cert/{serial}/renew` route and handler |
| `internal/engine/ca/ca_test.go` | Add test: non-admin cannot set `cert sign` usage; renewal revokes original |

---
## Work Item 9: SSH CA Lock Granularity (#53)

**Risk**: `HandleRequest` acquires a write lock for all operations,
including reads (`get-ca-pubkey`, `get-cert`, `list-certs`, `get-profile`,
`list-profiles`, `get-krl`). This serializes all operations unnecessarily.

**Fix**: Move locking into individual handlers. Read-only operations use
`RLock`; mutating operations use `Lock`:

```go
func (e *SSHCAEngine) HandleRequest(ctx context.Context, req *engine.Request) (*engine.Response, error) {
	// No lock here — each handler manages its own.
	switch req.Operation {
	case "get-ca-pubkey":
		return e.handleGetCAPubkey(ctx, req) // uses RLock internally
	case "sign-host":
		return e.handleSignHost(ctx, req) // uses Lock internally
	// ...
	}
}
```

Read-only handlers:

```go
func (e *SSHCAEngine) handleGetCAPubkey(ctx context.Context, req *engine.Request) (*engine.Response, error) {
	e.mu.RLock()
	defer e.mu.RUnlock()
	// ...
}
```

Mutating handlers (`sign-host`, `sign-user`, `create-profile`,
`update-profile`, `delete-profile`, `revoke-cert`, `delete-cert`):

```go
func (e *SSHCAEngine) handleSignHost(ctx context.Context, req *engine.Request) (*engine.Response, error) {
	e.mu.Lock()
	defer e.mu.Unlock()
	// ...
}
```

### Files

| File | Change |
|------|--------|
| `internal/engine/sshca/sshca.go` | Remove top-level lock from `HandleRequest`; add per-handler locks |

---
## Work Item 10: Transit Ciphertext Validation (#59)

**Risk**: `parseVersionedData` accepts negative version numbers. A crafted
ciphertext `metacrypt:v-1:...` parses as version -1, which fails the key
lookup but produces a confusing error.

**Fix**: Add a bounds check after parsing:

```go
func parseVersionedData(s string) (int, []byte, error) {
	// ... existing parse logic ...
	version, err := strconv.Atoi(parts[1][1:])
	if err != nil {
		return 0, nil, ErrInvalidFormat
	}
	if version < 1 {
		return 0, nil, fmt.Errorf("transit: %w: version must be >= 1", ErrInvalidFormat)
	}
	// ...
}
```
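For reference, a self-contained sketch of the full parse with the bounds check. The `metacrypt:vN:<base64>` layout is inferred from the finding; the real `parseVersionedData` may split and validate fields differently:

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

var ErrInvalidFormat = errors.New("invalid ciphertext format")

// parseVersionedData parses "metacrypt:vN:<base64>" and rejects
// versions below 1, as in the fix above.
func parseVersionedData(s string) (int, []byte, error) {
	parts := strings.SplitN(s, ":", 3)
	if len(parts) != 3 || parts[0] != "metacrypt" || !strings.HasPrefix(parts[1], "v") {
		return 0, nil, ErrInvalidFormat
	}
	version, err := strconv.Atoi(parts[1][1:])
	if err != nil {
		return 0, nil, ErrInvalidFormat
	}
	if version < 1 {
		return 0, nil, fmt.Errorf("transit: %w: version must be >= 1", ErrInvalidFormat)
	}
	data, err := base64.StdEncoding.DecodeString(parts[2])
	if err != nil {
		return 0, nil, ErrInvalidFormat
	}
	return version, data, nil
}

func main() {
	for _, ct := range []string{"metacrypt:v2:aGVsbG8=", "metacrypt:v0:aGVsbG8=", "metacrypt:v-1:aGVsbG8="} {
		v, data, err := parseVersionedData(ct)
		fmt.Println(v, string(data), err)
	}
}
```

Note that `strconv.Atoi` happily parses `"-1"`, which is exactly why the explicit `version < 1` check is required.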
### Files

| File | Change |
|------|--------|
| `internal/engine/transit/transit.go` | Add `version < 1` check in `parseVersionedData` |
| `internal/engine/transit/transit_test.go` | Add test: `metacrypt:v0:...` and `metacrypt:v-1:...` rejected |

---
## Work Item 11: Database and Auth (#46, #74)

### #46 — SQLite PRAGMAs only applied to first connection

**Risk**: `database/sql` may open additional connections that don't receive
the `journal_mode`, `foreign_keys`, and `busy_timeout` PRAGMAs.

**Fix**: Use the `_pragma` DSN parameter supported by `modernc.org/sqlite`:

```go
dsn := path + "?_pragma=journal_mode(WAL)&_pragma=foreign_keys(ON)&_pragma=busy_timeout(5000)"
db, err := sql.Open("sqlite", dsn)
```

Remove the manual PRAGMA execution loop. This ensures every connection
opened by the pool receives the PRAGMAs.
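The DSN carries one repeated `_pragma` key per setting, which the driver applies to every new connection. That structure can be checked with stdlib URL parsing (a sketch; `modernc.org/sqlite` performs its own DSN parsing internally, so `pragmaValues` here is only illustrative):

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// pragmaValues extracts the repeated _pragma parameters from a DSN,
// mirroring what the driver sees for each new connection.
func pragmaValues(dsn string) []string {
	q := dsn[strings.IndexByte(dsn, '?')+1:]
	vals, err := url.ParseQuery(q)
	if err != nil {
		return nil
	}
	return vals["_pragma"]
}

func main() {
	dsn := "/srv/metacrypt/metacrypt.db" +
		"?_pragma=journal_mode(WAL)&_pragma=foreign_keys(ON)&_pragma=busy_timeout(5000)"
	fmt.Println(pragmaValues(dsn)) // → [journal_mode(WAL) foreign_keys(ON) busy_timeout(5000)]
}
```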
### #74 — Token validation cache grows without bound

**Risk**: The `cache` map in `auth.go` grows indefinitely. Expired entries
are never proactively removed.

**Fix**: Add periodic eviction. Run a background goroutine that sweeps
expired entries every minute:

```go
func (a *Authenticator) startEviction(ctx context.Context) {
	ticker := time.NewTicker(time.Minute)
	go func() {
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				a.evictExpired()
			}
		}
	}()
}

func (a *Authenticator) evictExpired() {
	now := time.Now()
	a.mu.Lock()
	defer a.mu.Unlock()
	for key, entry := range a.cache {
		if now.After(entry.expiresAt) {
			delete(a.cache, key)
		}
	}
}
```

Call `startEviction` from the `Authenticator` constructor. Accept a
`context.Context` to allow cancellation on shutdown.
### Files

| File | Change |
|------|--------|
| `internal/db/db.go` | Switch to `_pragma` DSN parameters; remove manual PRAGMA loop |
| `internal/auth/auth.go` | Add `startEviction` goroutine; call from constructor |
| `internal/db/db_test.go` | Verify PRAGMAs are active on a fresh connection from the pool |
| `internal/auth/auth_test.go` | Add test: expired entries are evicted after sweep |

---
## Work Item 12: Policy Engine Glob Matching (#73)

**Risk**: `filepath.Match` does not support `**` for recursive directory
matching. Administrators writing rules like `engine/**/certs/*` will find
they don't match paths with multiple segments.

**Fix**: Replace `filepath.Match` with `path.Match` (POSIX-style, no
OS-specific behavior) and add support for `**` by splitting the pattern
on `**` and trying the remaining pattern at each segment boundary of the
value:

```go
import "path"

func matchesAnyGlob(patterns []string, value string) bool {
	for _, p := range patterns {
		if matchGlob(p, value) {
			return true
		}
	}
	return false
}

func matchGlob(pattern, value string) bool {
	// Handle ** by trying the rest of the pattern at each segment
	// boundary of the value after the literal prefix.
	if strings.Contains(pattern, "**") {
		parts := strings.SplitN(pattern, "**", 2)
		prefix, suffix := parts[0], parts[1]
		if !strings.HasPrefix(value, prefix) {
			return false
		}
		rest := value[len(prefix):]
		for {
			if matched, _ := path.Match(suffix, rest); matched {
				return true
			}
			if rest == "" {
				return false
			}
			// Resume at the next '/': the pattern text after **
			// begins with a separator, so candidate positions are
			// exactly the '/' positions.
			idx := strings.IndexByte(rest[1:], '/')
			if idx < 0 {
				return false
			}
			rest = rest[idx+1:]
		}
	}
	matched, _ := path.Match(pattern, value)
	return matched
}
```

Note the boundary scan must test positions at each separator, not after
it: the pattern text following `**` begins with `/`, so a loop that
advances past the separator before matching silently misses every match.

Document in POLICY.md that `**` matches across path segments, while `*`
matches within a single segment.
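The distinction can be demonstrated with the document's own test case. A runnable sketch (`matchGlob` reproduced from the fix above):

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchGlob is reproduced from the fix above: ** crosses segment
// boundaries, * stays within one segment.
func matchGlob(pattern, value string) bool {
	if strings.Contains(pattern, "**") {
		parts := strings.SplitN(pattern, "**", 2)
		prefix, suffix := parts[0], parts[1]
		if !strings.HasPrefix(value, prefix) {
			return false
		}
		rest := value[len(prefix):]
		for {
			if matched, _ := path.Match(suffix, rest); matched {
				return true
			}
			if rest == "" {
				return false
			}
			idx := strings.IndexByte(rest[1:], '/')
			if idx < 0 {
				return false
			}
			rest = rest[idx+1:]
		}
	}
	matched, _ := path.Match(pattern, value)
	return matched
}

func main() {
	fmt.Println(matchGlob("engine/**/certs/*", "engine/ca/prod/certs/abc123")) // → true
	fmt.Println(matchGlob("engine/*/certs/*", "engine/ca/prod/certs/abc123"))  // → false
}
```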
### Files

| File | Change |
|------|--------|
| `internal/policy/policy.go` | Replace `filepath.Match` with `path.Match` + `**` support |
| `internal/policy/policy_test.go` | Add test: `engine/**/certs/*` matches `engine/ca/prod/certs/abc123` |

---
## Work Item 13: Deployment Fixes (#78, #79)

### #78 — systemd `ExecReload` sends SIGHUP with no handler

**Fix**: Two options:

**Option A** (recommended): Remove `ExecReload` entirely. Metacrypt does
not support graceful reload — configuration changes require a restart. The
`ExecReload` line creates a false expectation.

**Option B**: Implement SIGHUP handling that re-reads the TOML config and
applies non-breaking changes (log level, TLS cert reload). This is
significant new functionality and should be a separate feature.

For now, remove `ExecReload` from both service units:

```diff
-ExecReload=/bin/kill -HUP $MAINPID
```

### #79 — Dockerfiles use Go 1.23 but module requires Go 1.25

**Fix**: Update the builder base image:

```diff
-FROM golang:1.23-alpine AS builder
+FROM golang:1.25-alpine AS builder
```

Apply to both `Dockerfile.api` and `Dockerfile.web`. Also remove the
unnecessary `gcc` and `musl-dev` installation, since `CGO_ENABLED=0`
means no C toolchain is needed:

```diff
-RUN apk add --no-cache gcc musl-dev
```
### #7 — No audit logging

**Note**: This is the only medium finding that requires significant new
functionality rather than a targeted fix. Audit logging should be
implemented as a dedicated subsystem:

1. Define an `AuditEvent` struct with: timestamp, caller, operation,
   resource, outcome (success/denied/error), and metadata.
2. Write events to a structured log sink (slog with JSON output to a
   dedicated file at `/srv/metacrypt/audit.log`).
3. Instrument every engine `HandleRequest` to emit an event on completion.
4. Instrument policy `CreateRule`/`DeleteRule`.
5. Instrument seal/unseal operations.

This is a substantial feature. Track separately from the quick-fix items
in this plan.
### #10 — No extension allowlist for SSH host certificates

**Note**: Host certificates typically don't use extensions (extensions are
for user certificates). The fix is to ignore the `extensions` field on
host signing requests rather than passing it through:

```go
// In handleSignHost, after building the certificate:
cert.Permissions.Extensions = nil // Host certs should not carry extensions
```
### #21 — No rate limiting on transit cryptographic operations

**Note**: This requires a rate-limiting middleware or per-caller token
bucket. Best implemented as server-level middleware applied to engine
request handlers:

```go
func (s *Server) requireRateLimit(next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		info := TokenInfoFromContext(r.Context())
		if !s.rateLimiter.Allow(info.Username) {
			writeJSONError(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		next(w, r)
	}
}
```

Track separately — this affects API design decisions (limits, quotas,
per-user vs per-token).
### Files

| File | Change |
|------|--------|
| `deploy/systemd/metacrypt.service` | Remove `ExecReload` line |
| `deploy/systemd/metacrypt-web.service` | Remove `ExecReload` line |
| `Dockerfile.api` | Update to `golang:1.25-alpine`; remove `gcc musl-dev` |
| `Dockerfile.web` | Update to `golang:1.25-alpine`; remove `gcc musl-dev` |

---
## Implementation Order

```
5.  #54, #58        Default-deny in engines (security-critical, do first)
6.  #63–#67         User engine concurrency (5 coupled fixes, one change)
7.  #41–#43         Barrier/seal fixes (3 independent fixes)
8.  #50, #51, #70   CA engine fixes (key usage + renewal + API sync)
9.  #53             SSH CA lock granularity (standalone refactor)
10. #59             Transit version validation (one-line fix)
11. #46, #74        DB pragmas + auth cache (independent, no overlap)
12. #73             Policy glob matching (standalone)
13. #78, #79        Deployment fixes (non-code, standalone)
    #7              Audit logging (new feature, track separately)
    #10             SSH host extensions (one-line fix, ship with work item 9)
    #21             Transit rate limiting (new feature, track separately)
```

Work items 5–13 can be parallelized across engineers:
- Engineer A: work item 5 (default-deny) + work item 9 (SSH CA locks) + finding #10 (host extensions)
- Engineer B: work item 6 (user concurrency) + work item 8 (CA fixes)
- Engineer C: work item 7 (barrier/seal) + work item 10 (transit version) + work item 11 (DB/auth)
- Independent: work item 12 (policy) and work item 13 (deployment)

Findings #7 (audit logging) and #21 (rate limiting) are substantial new
features and should be tracked as separate engineering tasks, not quick
fixes.

---
## Post-Remediation (Medium)

After all medium findings are resolved:

1. **Update AUDIT.md** — mark each finding as RESOLVED with a summary.
2. **Run the full pipeline**: `make all` (vet, lint, test, build).
3. **Run the race detector**: `go test -race ./...` — especially important
   for work items 6 and 9 (concurrency changes).
4. **Address remaining low findings** — 14 low-severity items remain,
   mostly zeroization gaps and documentation drift.