# Security Audit Report **Date**: 2026-03-16 (design review), 2026-03-17 (full system audit) **Scope**: Full system — architecture, cryptographic core, all engine implementations, API servers (REST/gRPC), web UI, policy engine, authentication, deployment, documentation --- ## Audit History - **2026-03-16**: Initial design review of ARCHITECTURE.md, engines/sshca.md, engines/transit.md. Issues #1–#24 identified. Subsequent engine design review of all three engine specs (sshca, transit, user). Issues #25–#38 identified. - **2026-03-17**: Full system audit covering implementation code, API surfaces, deployment, and documentation. Issues #39–#80 identified. --- ## Design Review Findings (#1–#38) ### ARCHITECTURE.md #### Strengths - Solid key hierarchy: password → Argon2id → KWK → MEK → per-entry encryption. Defense-in-depth. - Fail-closed design with `ErrSealed` on all operations when sealed. - Fresh nonce per write, constant-time comparisons, explicit zeroization — all correct fundamentals. - Default-deny policy engine with priority-based rule evaluation. - Issued leaf private keys never stored — good principle of least persistence. #### Issues **1. ~~TLS minimum version should be 1.3, not 1.2~~ RESOLVED** Updated all TLS configurations from `tls.VersionTLS12` to `tls.VersionTLS13`. Removed explicit cipher suite list (TLS 1.3 manages its own). **2. ~~Token cache TTL of 30 seconds is a revocation gap~~ ACCEPTED** Accepted as an explicit trade-off. 30-second cache TTL balances MCIAS load against revocation latency. **3. ~~Admin bypass in policy engine is an all-or-nothing model~~ ACCEPTED** The all-or-nothing admin model is intentional. MCIAS admin users get full access to all engines and operations. **4. ~~Policy rule creation is listed as both Admin-only and User-accessible~~ RESOLVED** Removed duplicate table. gRPC `adminRequiredMethods` now includes `ListPolicies` and `GetPolicy`. All policy CRUD is admin-only across both API surfaces. **5. ~~No integrity protection on barrier entry paths~~ RESOLVED** Barrier now passes entry path as GCM AAD on both `Put` and `Get`. Migration tool created for existing entries. **6. ~~Single MEK with no rotation mechanism~~ RESOLVED** Implemented MEK rotation and per-engine DEKs with v2 ciphertext format. **7. No audit logging** Every certificate issuance, sign operation, and policy change should be logged with caller identity, timestamp, and operation details. Without this, incident response is blind. **8. ~~Rate limiting is in-memory only~~ ACCEPTED** Accepted: Argon2id cost parameters are the primary brute-force mitigation. ### engines/sshca.md #### Strengths - Flat CA model is correct for SSH. - Default principal restriction — users can only sign certs for their own username. - `max_ttl` enforced server-side. - Key zeroization on seal, no private keys in cert records. - RSA excluded — reduces attack surface. - Signing profiles are the only path to critical options — good privilege separation. - Server-side serial generation with `crypto/rand`. #### Issues **9. ~~User-controllable serial numbers~~ RESOLVED** **10. No explicit extension allowlist for host certificates** The `extensions` field for `sign-host` accepts an arbitrary map. The engine should define a default extension set and restrict to an allowlist or require admin for non-default extensions. **11. ~~`critical_options` on user certs is a privilege escalation surface~~ RESOLVED** **12. ~~No KRL (Key Revocation List) support~~ RESOLVED** **13. ~~Policy resource path uses `ca/` prefix instead of `sshca/`~~ RESOLVED** **14. No source-address restriction by default** User certificates should ideally include `source-address` critical options. Consider a mount-level configuration for default critical options. ### engines/transit.md #### Strengths - Ciphertext format with version prefix enables clean key rotation. - `exportable` and `allow_deletion` immutable after creation. - AAD/context binding for AEAD ciphers. - Rewrap never exposes plaintext to caller. - XChaCha20-Poly1305 with 24-byte nonce — correct for random nonce safety. - `trim-key` logic is safe. Batch operations hold read lock for atomicity. - 500-item batch limit prevents resource exhaustion. #### Issues **15. ~~No minimum key version enforcement~~ RESOLVED** **16. ~~Key version pruning safety check~~ RESOLVED** **17. ~~RSA encryption without specifying padding scheme~~ RESOLVED** (RSA removed entirely) **18. ~~HMAC keys used for `sign` operation~~ RESOLVED** **19. ~~No batch encrypt/decrypt operations~~ RESOLVED** **20. ~~`read` action maps to `decrypt`~~ RESOLVED** (granular actions) **21. No rate limiting or quota on cryptographic operations** A compromised token could issue unlimited encrypt/decrypt/sign requests. ### engines/user.md #### Strengths - HKDF with per-recipient random salt prevents wrapping key reuse. - AES-256-GCM for DEK wrapping (consistent with codebase). - ECDH key agreement with info-string binding prevents key confusion. - Explicit zeroization of all intermediate secrets documented. - Envelope format includes salt per-recipient. #### Issues **22. ~~No forward secrecy for stored data~~ RESOLVED** (per-engine DEKs) **23. ~~Generic `POST /v1/engine/request` bypasses typed route middleware~~ RESOLVED** **24. ~~No CSRF protection for web UI~~ RESOLVED** **25–32.** ~~Various spec issues~~ **RESOLVED** (see detailed history below) **33. ~~Auto-provisioning creates keys for arbitrary usernames~~ RESOLVED** **34. ~~No recipient limit on encrypt~~ RESOLVED** **35. ~~No re-encryption support for key rotation~~ RESOLVED** **36–38.** ~~Various spec/cross-cutting issues~~ **RESOLVED** --- ## Full System Audit (2026-03-17) **Scope**: All implementation code, deployment, and documentation. ### Cryptographic Core #### Strengths - AES-256-GCM with 12-byte random nonces from `crypto/rand` — correct. - Argon2id with configurable parameters stored in `seal_config` — correct. - Path-bound AAD in barrier — defense against ciphertext relocation. - Per-engine DEKs with v2 ciphertext format — limits blast radius. - Constant-time comparison via `crypto/subtle` for all secret comparisons. #### Issues **39. TOCTOU race in barrier Seal/Unseal** `barrier.go`: `Seal()` zeroizes keys while concurrent operations may hold stale references between `RLock` release and actual use. A read operation could read the MEK, lose the lock, then use a zeroized key. Requires restructuring to hold the lock through the crypto operation or using atomic pointer swaps. **40. Crash during `ReWrapKeys` loses all barrier data** `seal.go`: If the process crashes between re-encrypting all DEKs in `ReWrapKeys` and updating `seal_config` with the new MEK, all data becomes irrecoverable — the old MEK is gone and the new MEK was never persisted. This needs a two-phase commit or WAL-based approach. **41. `loadKeys` errors silently swallowed during unseal** `barrier.go`: If `loadKeys` fails to decrypt DEK entries (e.g., corrupt `barrier_keys` rows), errors are silently ignored and the keys map may be incomplete. Subsequent operations on engine mounts with missing DEKs will fail with confusing errors instead of failing at unseal time. **42. No AAD binding on MEK encryption with KWK** `seal.go`: The MEK is encrypted with the KWK (derived from password via Argon2id) using `crypto.Encrypt(kwk, mek, nil)`. There is no AAD binding this ciphertext to its purpose. An attacker who can swap `encrypted_mek` in `seal_config` could substitute a different ciphertext (though the practical impact is limited since the KWK is password-derived). **43. Barrier `List` uses SQL LIKE with unescaped prefix** `barrier.go`: The `List` method passes the prefix directly into a SQL `LIKE` clause without escaping `%` and `_` characters. A path containing these characters would match unintended entries. **44. System key rotation query may miss entries** `barrier.go`: `RotateKey` for the system key excludes all `engine/%` paths, but entries at shorter paths encrypted with the system key could be missed if they don't follow the expected naming convention. **45. `Zeroize` loop may be optimized away by compiler** `crypto.go`: The `Zeroize` function uses a simple `for` loop to zero memory. The Go compiler may optimize this away if the slice is not used after zeroization. Use `crypto/subtle.XORBytes` or a volatile-equivalent pattern. **46. SQLite PRAGMAs only applied to first connection** `db.go`: `PRAGMA journal_mode`, `foreign_keys`, and `busy_timeout` are applied once at open time but `database/sql` may open additional connections in its pool that don't receive these PRAGMAs. Use a `ConnInitHook` or `_pragma` DSN parameters. **47. Plaintext not zeroized after re-encryption during key rotation** `barrier.go`: During `RotateKey`, decrypted plaintext is held in a `[]byte` but not zeroized after re-encryption. This leaves plaintext in memory longer than necessary. ### Engine Implementations #### CA (PKI) Engine **48. Path traversal via unsanitized issuer names** `ca/ca.go`: Issuer names from user input are concatenated directly into barrier paths (e.g., `engine/ca/{mount}/issuers/{name}/...`). A name containing `../` could write to arbitrary barrier locations. All engines should validate mount and entity names against a strict pattern (alphanumeric, hyphens, underscores). **49. No TTL enforcement against issuer MaxTTL in issuance** `ca/ca.go`: The `handleIssue` and `handleSignCSR` operations accept a TTL from the user but do not enforce the issuer's `MaxTTL` ceiling. A user can request arbitrarily long certificate lifetimes. **50. Non-admin users can override key usages** `ca/ca.go`: The `key_usages` and `ext_key_usages` fields are accepted from non-admin users. A user could request a certificate with `cert sign` or `crl sign` key usage, potentially creating an intermediate CA certificate. **51. Certificate renewal does not revoke original** `ca/ca.go`: `handleRenew` creates a new certificate but does not revoke the original. This creates duplicate valid certificates for the same identity, which complicates revocation and weakens the security model. **52. Leaf private key in API response not zeroized** `ca/ca.go`: After marshalling the leaf private key to PEM for the API response, the in-memory key material is not zeroized. The key persists in memory until garbage collected. #### SSH CA Engine **53. HandleRequest uses exclusive write lock for all operations** `sshca/sshca.go`: All operations (including reads like `get-cert`, `list-certs`, `get-profile`) acquire a write lock (`mu.Lock()`), serializing the entire engine. Read operations should use `mu.RLock()`. **54. Host signing is default-allow without policy rules** `sshca/sshca.go`: When no policy rules match a host signing request, the engine allows it by default. This contradicts the default-deny principle established in the engineering standards and ARCHITECTURE.md. **55. SSH certificate serial collision risk** `sshca/sshca.go`: Random `uint64` serials have a birthday collision probability of ~50% at ~4 billion certificates. While far beyond typical scale, the engine should detect and retry on collision. **56. KRL is not signed** `sshca/sshca.go`: The generated KRL is not cryptographically signed. An attacker who can intercept the KRL distribution (e.g., MITM on the `GET /v1/sshca/{mount}/krl` endpoint, though TLS mitigates this) could serve a truncated KRL that omits revoked certificates. **57. PEM key bytes not zeroized after parsing in Unseal** `sshca/sshca.go`: After reading the CA private key PEM from the barrier and parsing it, the raw PEM bytes are not zeroized. #### Transit Engine **58. Default-allow for non-admin users contradicts default-deny** `transit/transit.go`: Similar to #54 — when no policy rules match a transit operation, the engine allows it. This should default to deny. **59. Negative ciphertext version not rejected** `transit/transit.go`: `parseVersionedData` does not reject negative version numbers. A crafted ciphertext with a negative version could cause unexpected behavior in version lookups. **60. ECDSA big.Int internals not fully zeroized** `transit/transit.go`: The local `zeroizeKey` clears `D` on ECDSA keys but not `PublicKey.X/Y`. While the public key is not secret, the `big.Int` internal representation may retain data from the private key computation. #### User E2E Encryption Engine **61. ECDH private key zeroization is ineffective** `user/user.go`: `key.Bytes()` returns a copy of the private key bytes. Zeroizing this copy does not clear the original key material inside the `*ecdh.PrivateKey` struct. The actual private key remains in memory. **62. Policy resource path uses mountPath instead of mount name** `user/user.go`: Policy checks use the full mount path instead of the mount name. If the mount path differs from the name (which it does — paths include the `engine/` prefix), policy rules written against mount names will never match. **63. No role checks on decrypt, re-encrypt, and rotate-key** `user/user.go`: The `handleDecrypt`, `handleReEncrypt`, and `handleRotateKey` operations have no role checks. A guest-role user (who should have restricted access per MCIAS role definitions) can perform these operations. **64. Initialize does not acquire mutex** `user/user.go`: The `Initialize` method writes to shared state without holding the mutex, creating a data race if called concurrently. **65. handleEncrypt uses stale state after releasing lock** `user/user.go`: After releasing the write lock during encryption, the handler continues to use pointers to user state that may have been modified by another goroutine. **66. handleReEncrypt uses manual lock without defer** `user/user.go`: Manual `RLock`/`Unlock` calls without `defer` — a panic between lock and unlock will leak the lock, deadlocking the engine. **67. No sealed-state check in user HandleRequest** `user/user.go`: Unlike other engines, the user engine's `HandleRequest` does not check if the engine is sealed. A request reaching the engine after seal but before the HTTP layer catches it could panic on nil map access. ### API Servers #### REST API **68. JSON injection via unsanitized error messages** `server/routes.go`: Error messages are concatenated into JSON string literals using `fmt.Sprintf` without JSON escaping. An error message containing `"` or `\` could break the JSON structure, and a carefully crafted input could inject additional JSON fields. **69. Typed REST handlers bypass policy engine** `server/routes.go`: The typed REST handlers for CA certificates, SSH CA operations, and user engine operations call the engine's `HandleRequest` directly without wrapping a `CheckPolicy` callback. Only the generic `/v1/engine/request` endpoint passes the policy checker. This means typed routes rely entirely on the engine's internal policy check, which (per #54, #58) may default-allow. **70. `RenewCert` gRPC RPC has no corresponding REST route** `server/routes.go`: The `CAService/RenewCert` gRPC RPC exists but has no REST endpoint, violating the API sync rule. #### gRPC API **71. `PKIService/GetCRL` missing from `sealRequiredMethods`** `grpcserver/server.go`: The `GetCRL` RPC can be called even when the service is sealed. While this is arguably intentional (public endpoint), it is inconsistent with the interceptor design where all RPCs are gated. #### Policy Engine **72. Policy rule ID allows path traversal** `policy/policy.go`: Policy rule IDs are not validated. An ID containing `/` or `..` could write to arbitrary paths in the barrier, since rules are stored at `policy/rules/{id}`. **73. `filepath.Match` does not support `**` recursive globs** `policy/policy.go`: Policy resource patterns use `filepath.Match`, which does not support `**` for recursive directory matching. Administrators writing rules like `engine/**/certs/*` will find they don't match as expected. #### Authentication **74. Token validation cache grows without bound** `auth/auth.go`: The token cache has no size limit or eviction of expired entries beyond lazy expiry checks. Under sustained load with many unique tokens, this is an unbounded memory growth vector. ### Web UI **75. CSRF token not bound to user session** `webserver/csrf.go`: CSRF tokens are signed with a server-wide HMAC key but not bound to the user's session. Any valid server-generated CSRF token works for any user, reducing CSRF protection to a server-origin check rather than a session-integrity check. **76. Login cookie missing explicit expiry** `webserver/routes.go`: The `metacrypt_token` cookie has no `MaxAge` or `Expires`, making it a session cookie that persists until the browser is closed. Consider an explicit TTL matching the MCIAS token lifetime. **77. Several POST handlers missing `MaxBytesReader`** `webserver/routes.go`, `webserver/user.go`, `webserver/sshca.go`: `handlePolicyCreate`, `handlePolicyDelete`, `handleUserRegister`, `handleUserRotateKey`, SSH CA cert revoke/delete — all accept POST bodies without `MaxBytesReader`, allowing arbitrarily large request bodies. ### Deployment & Documentation **78. `ExecReload` sends SIGHUP but no handler exists** `deploy/systemd/metacrypt.service`, `deploy/systemd/metacrypt-web.service`: Both units define `ExecReload=/bin/kill -HUP $MAINPID`, but the Go binary does not handle SIGHUP. A `systemctl reload` would crash the process. **79. Dockerfiles use `golang:1.23-alpine` but `go.mod` requires Go 1.25** `Dockerfile.api`, `Dockerfile.web`: The builder stage uses Go 1.23 but the module requires Go 1.25. Builds will fail. **80. ARCHITECTURE.md system overview says "TLS 1.2+" but code enforces TLS 1.3** `ARCHITECTURE.md:33`: The ASCII diagram still says "TLS 1.2+" despite issue #1 being resolved in code. The diagram was not updated. --- ## Open Issues (Unresolved) ### Open — Critical *None.* ### Open — High | # | Issue | Location | |---|-------|----------| | 39 | TOCTOU race in barrier Seal/Unseal allows use of zeroized keys | `barrier/barrier.go` | | 40 | Crash during `ReWrapKeys` makes all barrier data irrecoverable | `seal/seal.go` | | 48 | Path traversal via unsanitized issuer/entity names in all engines | `ca/ca.go`, all engines | | 49 | No TTL enforcement against issuer MaxTTL in cert issuance | `ca/ca.go` | | 61 | ECDH private key zeroization is ineffective (`Bytes()` returns copy) | `user/user.go` | | 62 | Policy resource path uses mountPath instead of mount name | `user/user.go` | | 68 | JSON injection via unsanitized error messages in REST API | `server/routes.go` | | 69 | Typed REST handlers bypass policy engine | `server/routes.go` | ### Open — Medium | # | Issue | Location | |---|-------|----------| | 7 | No audit logging for cryptographic operations | ARCHITECTURE.md | | 10 | No extension allowlist for SSH host certificates | `sshca/sshca.go` | | 21 | No rate limiting on transit cryptographic operations | `transit/transit.go` | | 41 | `loadKeys` errors silently swallowed during unseal | `barrier/barrier.go` | | 42 | No AAD binding on MEK encryption with KWK | `seal/seal.go` | | 43 | Barrier `List` SQL LIKE with unescaped prefix | `barrier/barrier.go` | | 46 | SQLite PRAGMAs only applied to first connection | `db/db.go` | | 50 | Non-admin users can override key usages (cert sign, CRL sign) | `ca/ca.go` | | 51 | Certificate renewal does not revoke original | `ca/ca.go` | | 53 | SSH CA write-locks all operations including reads | `sshca/sshca.go` | | 54 | SSH CA host signing is default-allow (contradicts default-deny) | `sshca/sshca.go` | | 58 | Transit default-allow contradicts default-deny | `transit/transit.go` | | 59 | Negative ciphertext version not rejected in transit | `transit/transit.go` | | 63 | No role checks on user decrypt/re-encrypt/rotate | `user/user.go` | | 64 | User engine Initialize has no mutex | `user/user.go` | | 65 | handleEncrypt uses stale state after lock release | `user/user.go` | | 66 | handleReEncrypt manual lock without defer (leak risk) | `user/user.go` | | 67 | No sealed-state check in user HandleRequest | `user/user.go` | | 70 | `RenewCert` has no REST route (API sync violation) | `server/routes.go` | | 72 | Policy rule ID allows path traversal in barrier | `policy/policy.go` | | 73 | `filepath.Match` does not support `**` recursive globs | `policy/policy.go` | | 74 | Token validation cache grows without bound | `auth/auth.go` | | 78 | systemd `ExecReload` sends SIGHUP with no handler | `deploy/systemd/` | | 79 | Dockerfiles use Go 1.23 but module requires Go 1.25 | `Dockerfile.*` | ### Open — Low | # | Issue | Location | |---|-------|----------| | 14 | No source-address restriction by default in SSH certs | `sshca/sshca.go` | | 44 | System key rotation query may miss entries | `barrier/barrier.go` | | 45 | `Zeroize` loop may be optimized away by compiler | `crypto/crypto.go` | | 47 | Plaintext not zeroized after re-encryption in rotation | `barrier/barrier.go` | | 52 | Leaf private key in API response not zeroized | `ca/ca.go` | | 55 | SSH certificate serial collision risk at scale | `sshca/sshca.go` | | 56 | KRL is not cryptographically signed | `sshca/sshca.go` | | 57 | PEM key bytes not zeroized after parsing in SSH CA | `sshca/sshca.go` | | 60 | ECDSA big.Int internals not fully zeroized | `transit/transit.go` | | 71 | `GetCRL` missing from `sealRequiredMethods` | `grpcserver/server.go` | | 75 | CSRF token not bound to user session | `webserver/csrf.go` | | 76 | Login cookie missing explicit expiry | `webserver/routes.go` | | 77 | POST handlers missing `MaxBytesReader` | `webserver/` | | 80 | ARCHITECTURE.md diagram still says "TLS 1.2+" | `ARCHITECTURE.md` | ### Accepted | # | Issue | Rationale | |---|-------|-----------| | 2 | Token cache 30s revocation gap | Trade-off: MCIAS load vs revocation latency | | 3 | Admin all-or-nothing access | Intentional design | | 8 | Unseal rate limit resets on restart | Argon2id is the primary mitigation | --- ## Resolved Issues (#1–#38) All design review findings from the 2026-03-16 audit have been resolved or accepted. See the [Audit History](#audit-history) section. The following issues were resolved: **Critical** (all resolved): #4 (policy auth contradiction), #9 (user-controllable SSH serials), #13 (policy path collision), #37 (adminOnlyOperations name collision). **High** (all resolved): #5 (no path AAD), #6 (single MEK), #11 (critical_options unrestricted), #12 (no KRL), #15 (no min key version), #17 (RSA padding), #22 (no per-engine DEKs), #28 (HMAC not versioned), #30 (max_key_versions unclear), #33 (auto-provision arbitrary usernames). **Medium** (all resolved or accepted): #1, #2, #3, #8, #20, #23, #24, #25, #26, #27, #29, #31, #34. **Low** (all resolved): #18, #19, #32, #35, #36, #38. --- ## Priority Summary | Priority | Count | Status | |----------|-------|--------| | High | 8 | Open | | Medium | 21 | Open | | Low | 14 | Open | | Accepted | 3 | Closed | | Resolved | 38 | Closed | **Recommendation**: Address all High findings before the next deployment. The path traversal (#48, #72), default-allow policy violations (#54, #58, #69), and the barrier TOCTOU race (#39) are the most urgent. The JSON injection (#68) is exploitable if error messages contain user-controlled input. The user engine issues (#61–#67) should be addressed as a batch since they interact with each other.