# Security Audit Report **Date**: 2026-03-16 (design review), 2026-03-17 (full system audit) **Scope**: Full system — architecture, cryptographic core, all engine implementations, API servers (REST/gRPC), web UI, policy engine, authentication, deployment, documentation --- ## Audit History - **2026-03-16**: Initial design review of ARCHITECTURE.md, engines/sshca.md, engines/transit.md. Issues #1–#24 identified. Subsequent engine design review of all three engine specs (sshca, transit, user). Issues #25–#38 identified. - **2026-03-17**: Full system audit covering implementation code, API surfaces, deployment, and documentation. Issues #39–#80 identified. - **2026-03-16**: High finding remediation validation. #39, #40, #49, #62, #68, #69 confirmed resolved. #48, #61 confirmed still open. - **2026-03-17**: #61 resolved — `handleRotateKey` and `handleDeleteUser` now zeroize stored `privBytes` instead of calling `Bytes()` (which returns a copy). New state in `handleRotateKey` populates `privBytes`. References to `*ecdh.PrivateKey` are nil'd for GC. Residual risk: Go's internal copy in `*ecdh.PrivateKey` cannot be zeroized. --- ## Design Review Findings (#1–#38) ### ARCHITECTURE.md #### Strengths - Solid key hierarchy: password → Argon2id → KWK → MEK → per-entry encryption. Defense-in-depth. - Fail-closed design with `ErrSealed` on all operations when sealed. - Fresh nonce per write, constant-time comparisons, explicit zeroization — all correct fundamentals. - Default-deny policy engine with priority-based rule evaluation. - Issued leaf private keys never stored — good principle of least persistence. #### Issues **1. ~~TLS minimum version should be 1.3, not 1.2~~ RESOLVED** Updated all TLS configurations from `tls.VersionTLS12` to `tls.VersionTLS13`. Removed explicit cipher suite list (TLS 1.3 manages its own). **2. ~~Token cache TTL of 30 seconds is a revocation gap~~ ACCEPTED** Accepted as an explicit trade-off. 30-second cache TTL balances MCIAS load against revocation latency. **3. ~~Admin bypass in policy engine is an all-or-nothing model~~ ACCEPTED** The all-or-nothing admin model is intentional. MCIAS admin users get full access to all engines and operations. **4. ~~Policy rule creation is listed as both Admin-only and User-accessible~~ RESOLVED** Removed duplicate table. gRPC `adminRequiredMethods` now includes `ListPolicies` and `GetPolicy`. All policy CRUD is admin-only across both API surfaces. **5. ~~No integrity protection on barrier entry paths~~ RESOLVED** Barrier now passes entry path as GCM AAD on both `Put` and `Get`. Migration tool created for existing entries. **6. ~~Single MEK with no rotation mechanism~~ RESOLVED** Implemented MEK rotation and per-engine DEKs with v2 ciphertext format. **7. No audit logging** Every certificate issuance, sign operation, and policy change should be logged with caller identity, timestamp, and operation details. Without this, incident response is blind. **8. ~~Rate limiting is in-memory only~~ ACCEPTED** Accepted: Argon2id cost parameters are the primary brute-force mitigation. ### engines/sshca.md #### Strengths - Flat CA model is correct for SSH. - Default principal restriction — users can only sign certs for their own username. - `max_ttl` enforced server-side. - Key zeroization on seal, no private keys in cert records. - RSA excluded — reduces attack surface. - Signing profiles are the only path to critical options — good privilege separation. - Server-side serial generation with `crypto/rand`. #### Issues **9. ~~User-controllable serial numbers~~ RESOLVED** **10. No explicit extension allowlist for host certificates** The `extensions` field for `sign-host` accepts an arbitrary map. The engine should define a default extension set and restrict to an allowlist or require admin for non-default extensions. **11. ~~`critical_options` on user certs is a privilege escalation surface~~ RESOLVED** **12. ~~No KRL (Key Revocation List) support~~ RESOLVED** **13. ~~Policy resource path uses `ca/` prefix instead of `sshca/`~~ RESOLVED** **14. No source-address restriction by default** User certificates should ideally include `source-address` critical options. Consider a mount-level configuration for default critical options. ### engines/transit.md #### Strengths - Ciphertext format with version prefix enables clean key rotation. - `exportable` and `allow_deletion` immutable after creation. - AAD/context binding for AEAD ciphers. - Rewrap never exposes plaintext to caller. - XChaCha20-Poly1305 with 24-byte nonce — correct for random nonce safety. - `trim-key` logic is safe. Batch operations hold read lock for atomicity. - 500-item batch limit prevents resource exhaustion. #### Issues **15. ~~No minimum key version enforcement~~ RESOLVED** **16. ~~Key version pruning safety check~~ RESOLVED** **17. ~~RSA encryption without specifying padding scheme~~ RESOLVED** (RSA removed entirely) **18. ~~HMAC keys used for `sign` operation~~ RESOLVED** **19. ~~No batch encrypt/decrypt operations~~ RESOLVED** **20. ~~`read` action maps to `decrypt`~~ RESOLVED** (granular actions) **21. No rate limiting or quota on cryptographic operations** A compromised token could issue unlimited encrypt/decrypt/sign requests. ### engines/user.md #### Strengths - HKDF with per-recipient random salt prevents wrapping key reuse. - AES-256-GCM for DEK wrapping (consistent with codebase). - ECDH key agreement with info-string binding prevents key confusion. - Explicit zeroization of all intermediate secrets documented. - Envelope format includes salt per-recipient. #### Issues **22. ~~No forward secrecy for stored data~~ RESOLVED** (per-engine DEKs) **23. ~~Generic `POST /v1/engine/request` bypasses typed route middleware~~ RESOLVED** **24. ~~No CSRF protection for web UI~~ RESOLVED** **25–32.** ~~Various spec issues~~ **RESOLVED** (see detailed history below) **33. ~~Auto-provisioning creates keys for arbitrary usernames~~ RESOLVED** **34. ~~No recipient limit on encrypt~~ RESOLVED** **35. ~~No re-encryption support for key rotation~~ RESOLVED** **36–38.** ~~Various spec/cross-cutting issues~~ **RESOLVED** --- ## Full System Audit (2026-03-17) **Scope**: All implementation code, deployment, and documentation. ### Cryptographic Core #### Strengths - AES-256-GCM with 12-byte random nonces from `crypto/rand` — correct. - Argon2id with configurable parameters stored in `seal_config` — correct. - Path-bound AAD in barrier — defense against ciphertext relocation. - Per-engine DEKs with v2 ciphertext format — limits blast radius. - Constant-time comparison via `crypto/subtle` for all secret comparisons. #### Issues **39. ~~TOCTOU race in barrier Seal/Unseal~~ RESOLVED** `barrier.go`: `Get()` and `Put()` hold `RLock` (via `defer`) through the entire crypto operation including decryption/encryption. `Seal()` acquires an exclusive `Lock()`, which blocks until all RLock holders release. There is no window where a reader can use zeroized key material. **40. ~~Crash during `ReWrapKeys` loses all barrier data~~ RESOLVED** `seal.go`: `RotateMEK` now wraps both `ReWrapKeysTx` (re-encrypts all DEKs) and the `seal_config` update in a single SQLite transaction. A crash at any point results in full rollback or full commit — no partial state. In-memory state (`SwapMEK`) is updated only after successful commit. **41. `loadKeys` errors silently swallowed during unseal** `barrier.go`: If `loadKeys` fails to decrypt DEK entries (e.g., corrupt `barrier_keys` rows), errors are silently ignored and the keys map may be incomplete. Subsequent operations on engine mounts with missing DEKs will fail with confusing errors instead of failing at unseal time. **42. No AAD binding on MEK encryption with KWK** `seal.go`: The MEK is encrypted with the KWK (derived from password via Argon2id) using `crypto.Encrypt(kwk, mek, nil)`. There is no AAD binding this ciphertext to its purpose. An attacker who can swap `encrypted_mek` in `seal_config` could substitute a different ciphertext (though the practical impact is limited since the KWK is password-derived). **43. Barrier `List` uses SQL LIKE with unescaped prefix** `barrier.go`: The `List` method passes the prefix directly into a SQL `LIKE` clause without escaping `%` and `_` characters. A path containing these characters would match unintended entries. **44. System key rotation query may miss entries** `barrier.go`: `RotateKey` for the system key excludes all `engine/%` paths, but entries at shorter paths encrypted with the system key could be missed if they don't follow the expected naming convention. **45. `Zeroize` loop may be optimized away by compiler** `crypto.go`: The `Zeroize` function uses a simple `for` loop to zero memory. The Go compiler may optimize this away if the slice is not used after zeroization. Use `crypto/subtle.XORBytes` or a volatile-equivalent pattern. **46. SQLite PRAGMAs only applied to first connection** `db.go`: `PRAGMA journal_mode`, `foreign_keys`, and `busy_timeout` are applied once at open time but `database/sql` may open additional connections in its pool that don't receive these PRAGMAs. Use a `ConnInitHook` or `_pragma` DSN parameters. **47. Plaintext not zeroized after re-encryption during key rotation** `barrier.go`: During `RotateKey`, decrypted plaintext is held in a `[]byte` but not zeroized after re-encryption. This leaves plaintext in memory longer than necessary. ### Engine Implementations #### CA (PKI) Engine **48. ~~Path traversal via unsanitized entity names in get/update/delete operations~~ RESOLVED** All engines now call `engine.ValidateName()` on every operation that accepts user-supplied names, not just create operations. Fixed in: CA (`handleGetChain`, `handleGetIssuer`, `handleDeleteIssuer`, `handleIssue`, `handleSignCSR`), SSH CA (`handleUpdateProfile`, `handleGetProfile`, `handleDeleteProfile`), Transit (`handleDeleteKey`, `handleGetKey`, `handleRotateKey`, `handleUpdateKeyConfig`, `handleTrimKey`, `handleGetPublicKey`), User (`handleRegister`, `handleGetPublicKey`, `handleDeleteUser`). **49. ~~No TTL enforcement against issuer MaxTTL in issuance~~ RESOLVED** `ca/ca.go`: Both `handleIssue` and `handleSignCSR` now use a `resolveTTL` helper that parses the issuer's `MaxTTL`, caps the requested TTL against it, and returns an error if the requested TTL exceeds the maximum. Default TTL is the issuer's MaxTTL when none is specified. **50. Non-admin users can override key usages** `ca/ca.go`: The `key_usages` and `ext_key_usages` fields are accepted from non-admin users. A user could request a certificate with `cert sign` or `crl sign` key usage, potentially creating an intermediate CA certificate. **51. Certificate renewal does not revoke original** `ca/ca.go`: `handleRenew` creates a new certificate but does not revoke the original. This creates duplicate valid certificates for the same identity, which complicates revocation and weakens the security model. **52. Leaf private key in API response not zeroized** `ca/ca.go`: After marshalling the leaf private key to PEM for the API response, the in-memory key material is not zeroized. The key persists in memory until garbage collected. #### SSH CA Engine **53. HandleRequest uses exclusive write lock for all operations** `sshca/sshca.go`: All operations (including reads like `get-cert`, `list-certs`, `get-profile`) acquire a write lock (`mu.Lock()`), serializing the entire engine. Read operations should use `mu.RLock()`. **54. Host signing is default-allow without policy rules** `sshca/sshca.go`: When no policy rules match a host signing request, the engine allows it by default. This contradicts the default-deny principle established in the engineering standards and ARCHITECTURE.md. **55. SSH certificate serial collision risk** `sshca/sshca.go`: Random `uint64` serials have a birthday collision probability of ~50% at ~4 billion certificates. While far beyond typical scale, the engine should detect and retry on collision. **56. KRL is not signed** `sshca/sshca.go`: The generated KRL is not cryptographically signed. An attacker who can intercept the KRL distribution (e.g., MITM on the `GET /v1/sshca/{mount}/krl` endpoint, though TLS mitigates this) could serve a truncated KRL that omits revoked certificates. **57. PEM key bytes not zeroized after parsing in Unseal** `sshca/sshca.go`: After reading the CA private key PEM from the barrier and parsing it, the raw PEM bytes are not zeroized. #### Transit Engine **58. Default-allow for non-admin users contradicts default-deny** `transit/transit.go`: Similar to #54 — when no policy rules match a transit operation, the engine allows it. This should default to deny. **59. Negative ciphertext version not rejected** `transit/transit.go`: `parseVersionedData` does not reject negative version numbers. A crafted ciphertext with a negative version could cause unexpected behavior in version lookups. **60. ECDSA big.Int internals not fully zeroized** `transit/transit.go`: The local `zeroizeKey` clears `D` on ECDSA keys but not `PublicKey.X/Y`. While the public key is not secret, the `big.Int` internal representation may retain data from the private key computation. #### User E2E Encryption Engine **61. ~~ECDH private key zeroization is ineffective~~ RESOLVED** `user/user.go`: `handleRotateKey` and `handleDeleteUser` now zeroize the stored `privBytes` field (retained at key creation time) instead of calling `Bytes()` which returns a new copy. The `privKey` and `privBytes` fields are nil'd after zeroization to allow GC of the `*ecdh.PrivateKey` object. Note: Go's `*ecdh.PrivateKey` internal bytes cannot be zeroized through the public API — this is a known limitation of Go's crypto library. The stored `privBytes` copy is the best-effort mitigation. **62. ~~Policy resource path uses mountPath instead of mount name~~ RESOLVED** `user/user.go`: A `mountName()` helper extracts the mount name from the full mount path (e.g., `"engine/user/mymount/"` → `"mymount"`). Policy resource paths are correctly constructed as `"user/{mountname}/recipient/{recipient}"`. **63. No role checks on decrypt, re-encrypt, and rotate-key** `user/user.go`: The `handleDecrypt`, `handleReEncrypt`, and `handleRotateKey` operations have no role checks. A guest-role user (who should have restricted access per MCIAS role definitions) can perform these operations. **64. Initialize does not acquire mutex** `user/user.go`: The `Initialize` method writes to shared state without holding the mutex, creating a data race if called concurrently. **65. handleEncrypt uses stale state after releasing lock** `user/user.go`: After releasing the write lock during encryption, the handler continues to use pointers to user state that may have been modified by another goroutine. **66. handleReEncrypt uses manual lock without defer** `user/user.go`: Manual `RLock`/`Unlock` calls without `defer` — a panic between lock and unlock will leak the lock, deadlocking the engine. **67. No sealed-state check in user HandleRequest** `user/user.go`: Unlike other engines, the user engine's `HandleRequest` does not check if the engine is sealed. A request reaching the engine after seal but before the HTTP layer catches it could panic on nil map access. ### API Servers #### REST API **68. ~~JSON injection via unsanitized error messages~~ RESOLVED** `server/routes.go`: All error responses now use `writeJSONError()` which delegates to `writeJSON()` → `json.NewEncoder().Encode()`, properly JSON-escaping all error message content. **69. ~~Typed REST handlers bypass policy engine~~ RESOLVED** `server/routes.go`: All typed REST handlers now pass a `CheckPolicy` callback via `s.newPolicyChecker(r, info)` or an inline policy checker function. This includes all SSH CA, transit, user, and CA handlers. **70. `RenewCert` gRPC RPC has no corresponding REST route** `server/routes.go`: The `CAService/RenewCert` gRPC RPC exists but has no REST endpoint, violating the API sync rule. #### gRPC API **71. `PKIService/GetCRL` missing from `sealRequiredMethods`** `grpcserver/server.go`: The `GetCRL` RPC can be called even when the service is sealed. While this is arguably intentional (public endpoint), it is inconsistent with the interceptor design where all RPCs are gated. #### Policy Engine **72. Policy rule ID allows path traversal** `policy/policy.go`: Policy rule IDs are not validated. An ID containing `/` or `..` could write to arbitrary paths in the barrier, since rules are stored at `policy/rules/{id}`. **73. `filepath.Match` does not support `**` recursive globs** `policy/policy.go`: Policy resource patterns use `filepath.Match`, which does not support `**` for recursive directory matching. Administrators writing rules like `engine/**/certs/*` will find they don't match as expected. #### Authentication **74. Token validation cache grows without bound** `auth/auth.go`: The token cache has no size limit or eviction of expired entries beyond lazy expiry checks. Under sustained load with many unique tokens, this is an unbounded memory growth vector. ### Web UI **75. CSRF token not bound to user session** `webserver/csrf.go`: CSRF tokens are signed with a server-wide HMAC key but not bound to the user's session. Any valid server-generated CSRF token works for any user, reducing CSRF protection to a server-origin check rather than a session-integrity check. **76. Login cookie missing explicit expiry** `webserver/routes.go`: The `metacrypt_token` cookie has no `MaxAge` or `Expires`, making it a session cookie that persists until the browser is closed. Consider an explicit TTL matching the MCIAS token lifetime. **77. Several POST handlers missing `MaxBytesReader`** `webserver/routes.go`, `webserver/user.go`, `webserver/sshca.go`: `handlePolicyCreate`, `handlePolicyDelete`, `handleUserRegister`, `handleUserRotateKey`, SSH CA cert revoke/delete — all accept POST bodies without `MaxBytesReader`, allowing arbitrarily large request bodies. ### Deployment & Documentation **78. `ExecReload` sends SIGHUP but no handler exists** `deploy/systemd/metacrypt.service`, `deploy/systemd/metacrypt-web.service`: Both units define `ExecReload=/bin/kill -HUP $MAINPID`, but the Go binary does not handle SIGHUP. A `systemctl reload` would crash the process. **79. Dockerfiles use `golang:1.23-alpine` but `go.mod` requires Go 1.25** `Dockerfile.api`, `Dockerfile.web`: The builder stage uses Go 1.23 but the module requires Go 1.25. Builds will fail. **80. ARCHITECTURE.md system overview says "TLS 1.2+" but code enforces TLS 1.3** `ARCHITECTURE.md:33`: The ASCII diagram still says "TLS 1.2+" despite issue #1 being resolved in code. The diagram was not updated. --- ## Open Issues (Unresolved) ### Open — Critical *None.* ### Open — High *None.* ### Open — Medium | # | Issue | Location | |---|-------|----------| | 7 | No audit logging for cryptographic operations | ARCHITECTURE.md | | 10 | No extension allowlist for SSH host certificates | `sshca/sshca.go` | | 21 | No rate limiting on transit cryptographic operations | `transit/transit.go` | | 41 | `loadKeys` errors silently swallowed during unseal | `barrier/barrier.go` | | 42 | No AAD binding on MEK encryption with KWK | `seal/seal.go` | | 43 | Barrier `List` SQL LIKE with unescaped prefix | `barrier/barrier.go` | | 46 | SQLite PRAGMAs only applied to first connection | `db/db.go` | | 50 | Non-admin users can override key usages (cert sign, CRL sign) | `ca/ca.go` | | 51 | Certificate renewal does not revoke original | `ca/ca.go` | | 53 | SSH CA write-locks all operations including reads | `sshca/sshca.go` | | 54 | SSH CA host signing is default-allow (contradicts default-deny) | `sshca/sshca.go` | | 58 | Transit default-allow contradicts default-deny | `transit/transit.go` | | 59 | Negative ciphertext version not rejected in transit | `transit/transit.go` | | 63 | No role checks on user decrypt/re-encrypt/rotate | `user/user.go` | | 64 | User engine Initialize has no mutex | `user/user.go` | | 65 | handleEncrypt uses stale state after lock release | `user/user.go` | | 66 | handleReEncrypt manual lock without defer (leak risk) | `user/user.go` | | 67 | No sealed-state check in user HandleRequest | `user/user.go` | | 70 | `RenewCert` has no REST route (API sync violation) | `server/routes.go` | | 72 | Policy rule ID allows path traversal in barrier | `policy/policy.go` | | 73 | `filepath.Match` does not support `**` recursive globs | `policy/policy.go` | | 74 | Token validation cache grows without bound | `auth/auth.go` | | 78 | systemd `ExecReload` sends SIGHUP with no handler | `deploy/systemd/` | | 79 | Dockerfiles use Go 1.23 but module requires Go 1.25 | `Dockerfile.*` | ### Open — Low | # | Issue | Location | |---|-------|----------| | 14 | No source-address restriction by default in SSH certs | `sshca/sshca.go` | | 44 | System key rotation query may miss entries | `barrier/barrier.go` | | 45 | `Zeroize` loop may be optimized away by compiler | `crypto/crypto.go` | | 47 | Plaintext not zeroized after re-encryption in rotation | `barrier/barrier.go` | | 52 | Leaf private key in API response not zeroized | `ca/ca.go` | | 55 | SSH certificate serial collision risk at scale | `sshca/sshca.go` | | 56 | KRL is not cryptographically signed | `sshca/sshca.go` | | 57 | PEM key bytes not zeroized after parsing in SSH CA | `sshca/sshca.go` | | 60 | ECDSA big.Int internals not fully zeroized | `transit/transit.go` | | 71 | `GetCRL` missing from `sealRequiredMethods` | `grpcserver/server.go` | | 75 | CSRF token not bound to user session | `webserver/csrf.go` | | 76 | Login cookie missing explicit expiry | `webserver/routes.go` | | 77 | POST handlers missing `MaxBytesReader` | `webserver/` | | 80 | ARCHITECTURE.md diagram still says "TLS 1.2+" | `ARCHITECTURE.md` | ### Accepted | # | Issue | Rationale | |---|-------|-----------| | 2 | Token cache 30s revocation gap | Trade-off: MCIAS load vs revocation latency | | 3 | Admin all-or-nothing access | Intentional design | | 8 | Unseal rate limit resets on restart | Argon2id is the primary mitigation | --- ## Resolved Issues (#1–#38, plus #39, #40, #48, #49, #61, #62, #68, #69) All design review findings from the 2026-03-16 audit have been resolved or accepted. See the [Audit History](#audit-history) section. The following issues were resolved: **Critical** (all resolved): #4 (policy auth contradiction), #9 (user-controllable SSH serials), #13 (policy path collision), #37 (adminOnlyOperations name collision). **High** (all resolved): #5 (no path AAD), #6 (single MEK), #11 (critical_options unrestricted), #12 (no KRL), #15 (no min key version), #17 (RSA padding), #22 (no per-engine DEKs), #28 (HMAC not versioned), #30 (max_key_versions unclear), #33 (auto-provision arbitrary usernames), #39 (TOCTOU race — RLock held through crypto ops), #40 (ReWrapKeys crash — atomic transaction), #48 (path traversal — ValidateName on all ops), #49 (TTL enforcement — resolveTTL helper), #61 (ECDH zeroization — use stored privBytes), #62 (policy path — mountName helper), #68 (JSON injection — writeJSONError), #69 (policy bypass — newPolicyChecker). **Medium** (all resolved or accepted): #1, #2, #3, #8, #20, #23, #24, #25, #26, #27, #29, #31, #34. **Low** (all resolved): #18, #19, #32, #35, #36, #38. --- ## Priority Summary | Priority | Count | Status | |----------|-------|--------| | High | 0 | All resolved | | Medium | 21 | Open | | Low | 14 | Open | | Accepted | 3 | Closed | | Resolved | 46 | Closed | **Recommendation**: All High findings are resolved. The user engine medium issues (#63–#67) should be addressed as a batch since they interact with each other.