Fix ECDH zeroization, add audit logging, and remediate high findings

- Fix #61: handleRotateKey and handleDeleteUser now zeroize the stored
  privBytes instead of calling Bytes() (which returns a copy). The new
  key state in handleRotateKey populates privBytes; references to the
  old key are nil'd for GC.
- Add audit logging subsystem (internal/audit) with structured event
  recording for cryptographic operations.
- Add audit log engine spec (engines/auditlog.md).
- Add ValidateName checks to every engine operation that accepts a
  user-supplied name, closing path traversal (#48).
- Update AUDIT.md: all High findings resolved (0 open).
- Add REMEDIATION.md with detailed remediation tracking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 14:04:39 -07:00
parent b33d1f99a0
commit 5c5d7e184e
24 changed files with 1699 additions and 72 deletions


@@ -9,6 +9,8 @@
 - **2026-03-16**: Initial design review of ARCHITECTURE.md, engines/sshca.md, engines/transit.md. Issues #1–#24 identified. Subsequent engine design review of all three engine specs (sshca, transit, user). Issues #25–#38 identified.
 - **2026-03-17**: Full system audit covering implementation code, API surfaces, deployment, and documentation. Issues #39–#80 identified.
+- **2026-03-16**: High finding remediation validation. #39, #40, #49, #62, #68, #69 confirmed resolved. #48, #61 confirmed still open.
+- **2026-03-17**: #61 resolved — `handleRotateKey` and `handleDeleteUser` now zeroize stored `privBytes` instead of calling `Bytes()` (which returns a copy). New state in `handleRotateKey` populates `privBytes`. References to `*ecdh.PrivateKey` are nil'd for GC. Residual risk: Go's internal copy in `*ecdh.PrivateKey` cannot be zeroized.
 ---
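The #61 fix depends on one property of Go's `crypto/ecdh`: `PrivateKey.Bytes()` returns a fresh copy, so zeroizing that copy leaves the key's own material untouched. The sketch below shows the retain-then-zeroize pattern under two assumptions. First, it uses a hypothetical `userKeyState` type with `privKey`/`privBytes` fields. Second, it uses X25519 as the curve, although the entry does not say which curve the user engine uses. It illustrates the technique only and is not the engine's actual code.

```go
package main

import (
	"crypto/ecdh"
	"crypto/rand"
	"fmt"
)

// userKeyState is a hypothetical stand-in for the user engine's per-user key state.
type userKeyState struct {
	privKey   *ecdh.PrivateKey
	privBytes []byte // copy of the private key retained at creation so it can be zeroized later
}

// newUserKeyState generates a key and retains its bytes (X25519 is an assumption).
func newUserKeyState() (*userKeyState, error) {
	k, err := ecdh.X25519().GenerateKey(rand.Reader)
	if err != nil {
		return nil, err
	}
	return &userKeyState{privKey: k, privBytes: k.Bytes()}, nil
}

// zeroize clears the retained copy in place and drops both references so the
// *ecdh.PrivateKey can be garbage-collected. The copy held inside the key
// object is not reachable through the public API and is NOT cleared here.
func (s *userKeyState) zeroize() {
	clear(s.privBytes)
	s.privBytes = nil
	s.privKey = nil
}

// allZero reports whether every byte in b is zero.
func allZero(b []byte) bool {
	for _, c := range b {
		if c != 0 {
			return false
		}
	}
	return true
}

func main() {
	s, err := newUserKeyState()
	if err != nil {
		panic(err)
	}

	// The ineffective pattern from #61: zeroizing a Bytes() copy.
	tmp := s.privKey.Bytes()
	clear(tmp)
	fmt.Println("key intact after clearing a Bytes() copy:", !allZero(s.privKey.Bytes()))

	// The remediated pattern: zeroize the retained slice.
	retained := s.privBytes // alias of the same backing array, kept only to observe the effect
	s.zeroize()
	fmt.Println("retained copy zeroized:", allZero(retained))
}
```

Retaining `privBytes` at creation adds one more copy of the secret to memory. Unlike the key object's internal copy, however, this one can actually be wiped, which is why the entry calls it best-effort.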
@@ -164,13 +166,13 @@ A compromised token could issue unlimited encrypt/decrypt/sign requests.
 #### Issues
-**39. TOCTOU race in barrier Seal/Unseal**
+**39. ~~TOCTOU race in barrier Seal/Unseal~~ RESOLVED**
-`barrier.go`: `Seal()` zeroizes keys while concurrent operations may hold stale references between `RLock` release and actual use. A read operation could read the MEK, lose the lock, then use a zeroized key. Requires restructuring to hold the lock through the crypto operation or using atomic pointer swaps.
+`barrier.go`: `Get()` and `Put()` hold `RLock` (via `defer`) through the entire crypto operation including decryption/encryption. `Seal()` acquires an exclusive `Lock()`, which blocks until all RLock holders release. There is no window where a reader can use zeroized key material.
-**40. Crash during `ReWrapKeys` loses all barrier data**
+**40. ~~Crash during `ReWrapKeys` loses all barrier data~~ RESOLVED**
-`seal.go`: If the process crashes between re-encrypting all DEKs in `ReWrapKeys` and updating `seal_config` with the new MEK, all data becomes irrecoverable — the old MEK is gone and the new MEK was never persisted. This needs a two-phase commit or WAL-based approach.
+`seal.go`: `RotateMEK` now wraps both `ReWrapKeysTx` (re-encrypts all DEKs) and the `seal_config` update in a single SQLite transaction. A crash at any point results in full rollback or full commit — no partial state. In-memory state (`SwapMEK`) is updated only after successful commit.
 **41. `loadKeys` errors silently swallowed during unseal**
@@ -204,13 +206,13 @@ A compromised token could issue unlimited encrypt/decrypt/sign requests.
 #### CA (PKI) Engine
-**48. Path traversal via unsanitized issuer names**
+**48. ~~Path traversal via unsanitized entity names in get/update/delete operations~~ RESOLVED**
-`ca/ca.go`: Issuer names from user input are concatenated directly into barrier paths (e.g., `engine/ca/{mount}/issuers/{name}/...`). A name containing `../` could write to arbitrary barrier locations. All engines should validate mount and entity names against a strict pattern (alphanumeric, hyphens, underscores).
+All engines now call `engine.ValidateName()` on every operation that accepts user-supplied names, not just create operations. Fixed in: CA (`handleGetChain`, `handleGetIssuer`, `handleDeleteIssuer`, `handleIssue`, `handleSignCSR`), SSH CA (`handleUpdateProfile`, `handleGetProfile`, `handleDeleteProfile`), Transit (`handleDeleteKey`, `handleGetKey`, `handleRotateKey`, `handleUpdateKeyConfig`, `handleTrimKey`, `handleGetPublicKey`), User (`handleRegister`, `handleGetPublicKey`, `handleDeleteUser`).
-**49. No TTL enforcement against issuer MaxTTL in issuance**
+**49. ~~No TTL enforcement against issuer MaxTTL in issuance~~ RESOLVED**
-`ca/ca.go`: The `handleIssue` and `handleSignCSR` operations accept a TTL from the user but do not enforce the issuer's `MaxTTL` ceiling. A user can request arbitrarily long certificate lifetimes.
+`ca/ca.go`: Both `handleIssue` and `handleSignCSR` now use a `resolveTTL` helper that parses the issuer's `MaxTTL`, caps the requested TTL against it, and returns an error if the requested TTL exceeds the maximum. Default TTL is the issuer's MaxTTL when none is specified.
 **50. Non-admin users can override key usages**
@@ -262,13 +264,13 @@ A compromised token could issue unlimited encrypt/decrypt/sign requests.
 #### User E2E Encryption Engine
-**61. ECDH private key zeroization is ineffective**
+**61. ~~ECDH private key zeroization is ineffective~~ RESOLVED**
-`user/user.go`: `key.Bytes()` returns a copy of the private key bytes. Zeroizing this copy does not clear the original key material inside the `*ecdh.PrivateKey` struct. The actual private key remains in memory.
+`user/user.go`: `handleRotateKey` and `handleDeleteUser` now zeroize the stored `privBytes` field (retained at key creation time) instead of calling `Bytes()` which returns a new copy. The `privKey` and `privBytes` fields are nil'd after zeroization to allow GC of the `*ecdh.PrivateKey` object. Note: Go's `*ecdh.PrivateKey` internal bytes cannot be zeroized through the public API — this is a known limitation of Go's crypto library. The stored `privBytes` copy is the best-effort mitigation.
-**62. Policy resource path uses mountPath instead of mount name**
+**62. ~~Policy resource path uses mountPath instead of mount name~~ RESOLVED**
-`user/user.go`: Policy checks use the full mount path instead of the mount name. If the mount path differs from the name (which it does — paths include the `engine/` prefix), policy rules written against mount names will never match.
+`user/user.go`: A `mountName()` helper extracts the mount name from the full mount path (e.g., `"engine/user/mymount/"` → `"mymount"`). Policy resource paths are correctly constructed as `"user/{mountname}/recipient/{recipient}"`.
 **63. No role checks on decrypt, re-encrypt, and rotate-key**
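The #62 entry gives one concrete mapping, `"engine/user/mymount/"` to `"mymount"`. The sketch below is a minimal helper that satisfies it, assuming the mount name is always the last path segment, with any trailing slash ignored. The actual `mountName()` may parse the path differently. `recipientResource` is a hypothetical name for wherever the policy resource string is assembled.

```go
package main

import (
	"fmt"
	"strings"
)

// mountName returns the final segment of a mount path, ignoring a trailing slash.
func mountName(mountPath string) string {
	trimmed := strings.TrimSuffix(mountPath, "/")
	if i := strings.LastIndex(trimmed, "/"); i >= 0 {
		return trimmed[i+1:]
	}
	return trimmed
}

// recipientResource builds the policy resource "user/{mountname}/recipient/{recipient}".
func recipientResource(mountPath, recipient string) string {
	return "user/" + mountName(mountPath) + "/recipient/" + recipient
}

func main() {
	fmt.Println(mountName("engine/user/mymount/"))                   // mymount
	fmt.Println(recipientResource("engine/user/mymount/", "alice")) // user/mymount/recipient/alice
}
```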
@@ -294,13 +296,13 @@ A compromised token could issue unlimited encrypt/decrypt/sign requests.
 #### REST API
-**68. JSON injection via unsanitized error messages**
+**68. ~~JSON injection via unsanitized error messages~~ RESOLVED**
-`server/routes.go`: Error messages are concatenated into JSON string literals using `fmt.Sprintf` without JSON escaping. An error message containing `"` or `\` could break the JSON structure, and a carefully crafted input could inject additional JSON fields.
+`server/routes.go`: All error responses now use `writeJSONError()` which delegates to `writeJSON()` → `json.NewEncoder().Encode()`, properly JSON-escaping all error message content.
-**69. Typed REST handlers bypass policy engine**
+**69. ~~Typed REST handlers bypass policy engine~~ RESOLVED**
-`server/routes.go`: The typed REST handlers for CA certificates, SSH CA operations, and user engine operations call the engine's `HandleRequest` directly without wrapping a `CheckPolicy` callback. Only the generic `/v1/engine/request` endpoint passes the policy checker. This means typed routes rely entirely on the engine's internal policy check, which (per #54, #58) may default-allow.
+`server/routes.go`: All typed REST handlers now pass a `CheckPolicy` callback via `s.newPolicyChecker(r, info)` or an inline policy checker function. This includes all SSH CA, transit, user, and CA handlers.
 **70. `RenewCert` gRPC RPC has no corresponding REST route**
@@ -366,16 +368,7 @@ A compromised token could issue unlimited encrypt/decrypt/sign requests.
 ### Open — High
-| # | Issue | Location |
-|---|-------|----------|
-| 39 | TOCTOU race in barrier Seal/Unseal allows use of zeroized keys | `barrier/barrier.go` |
-| 40 | Crash during `ReWrapKeys` makes all barrier data irrecoverable | `seal/seal.go` |
-| 48 | Path traversal via unsanitized issuer/entity names in all engines | `ca/ca.go`, all engines |
-| 49 | No TTL enforcement against issuer MaxTTL in cert issuance | `ca/ca.go` |
-| 61 | ECDH private key zeroization is ineffective (`Bytes()` returns copy) | `user/user.go` |
-| 62 | Policy resource path uses mountPath instead of mount name | `user/user.go` |
-| 68 | JSON injection via unsanitized error messages in REST API | `server/routes.go` |
-| 69 | Typed REST handlers bypass policy engine | `server/routes.go` |
+*None.*
 ### Open — Medium
@@ -435,13 +428,13 @@ A compromised token could issue unlimited encrypt/decrypt/sign requests.
 ---
-## Resolved Issues (#1–#38)
+## Resolved Issues (#1–#38, plus #39, #40, #48, #49, #61, #62, #68, #69)
 All design review findings from the 2026-03-16 audit have been resolved or accepted. See the [Audit History](#audit-history) section. The following issues were resolved:
 **Critical** (all resolved): #4 (policy auth contradiction), #9 (user-controllable SSH serials), #13 (policy path collision), #37 (adminOnlyOperations name collision).
-**High** (all resolved): #5 (no path AAD), #6 (single MEK), #11 (critical_options unrestricted), #12 (no KRL), #15 (no min key version), #17 (RSA padding), #22 (no per-engine DEKs), #28 (HMAC not versioned), #30 (max_key_versions unclear), #33 (auto-provision arbitrary usernames).
+**High** (all resolved): #5 (no path AAD), #6 (single MEK), #11 (critical_options unrestricted), #12 (no KRL), #15 (no min key version), #17 (RSA padding), #22 (no per-engine DEKs), #28 (HMAC not versioned), #30 (max_key_versions unclear), #33 (auto-provision arbitrary usernames), #39 (TOCTOU race — RLock held through crypto ops), #40 (ReWrapKeys crash — atomic transaction), #48 (path traversal — ValidateName on all ops), #49 (TTL enforcement — resolveTTL helper), #61 (ECDH zeroization — use stored privBytes), #62 (policy path — mountName helper), #68 (JSON injection — writeJSONError), #69 (policy bypass — newPolicyChecker).
 **Medium** (all resolved or accepted): #1, #2, #3, #8, #20, #23, #24, #25, #26, #27, #29, #31, #34.
@@ -453,10 +446,10 @@ All design review findings from the 2026-03-16 audit have been resolved or accep
 | Priority | Count | Status |
 |----------|-------|--------|
-| High | 8 | Open |
+| High | 0 | All resolved |
 | Medium | 21 | Open |
 | Low | 14 | Open |
 | Accepted | 3 | Closed |
-| Resolved | 38 | Closed |
+| Resolved | 46 | Closed |
-**Recommendation**: Address all High findings before the next deployment. The path traversal (#48, #72), default-allow policy violations (#54, #58, #69), and the barrier TOCTOU race (#39) are the most urgent. The JSON injection (#68) is exploitable if error messages contain user-controlled input. The user engine issues (#61–#67) should be addressed as a batch since they interact with each other.
+**Recommendation**: All High findings are resolved. The user engine medium issues (#63–#67) should be addressed as a batch since they interact with each other.