Security Audit Report
Date: 2026-03-16
Scope: ARCHITECTURE.md, engines/sshca.md, engines/transit.md
ARCHITECTURE.md
Strengths
- Solid key hierarchy: password → Argon2id → KWK → MEK → per-entry encryption. Defense-in-depth.
- Fail-closed design with ErrSealed on all operations when sealed.
- Fresh nonce per write, constant-time comparisons, explicit zeroization: all correct fundamentals.
- Default-deny policy engine with priority-based rule evaluation.
- Issued leaf private keys never stored — good principle of least persistence.
Issues
1. TLS minimum version should be 1.3, not 1.2 RESOLVED
Updated all TLS configurations (HTTP server, gRPC server, web server, vault client, Go client library, CLI commands) from tls.VersionTLS12 to tls.VersionTLS13. Removed explicit cipher suite list from HTTP server (TLS 1.3 manages its own). Updated ARCHITECTURE.md TLS section and threat mitigations table.
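A minimal Go sketch of the resulting setting, assuming a helper named newServerTLSConfig (a hypothetical name, not taken from the codebase). Certificates and other fields are omitted for brevity:

```go
package main

import "crypto/tls"

// newServerTLSConfig returns a configuration that refuses any protocol
// version older than TLS 1.3. The function name is hypothetical.
func newServerTLSConfig() *tls.Config {
	return &tls.Config{
		MinVersion: tls.VersionTLS13,
		// CipherSuites is intentionally left unset: Go does not allow
		// TLS 1.3 cipher suites to be configured and selects them itself.
	}
}
```

The same MinVersion value would apply to client-side configurations such as the vault client and CLI commands.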
2. Token cache TTL of 30 seconds is a revocation gap ACCEPTED
Accepted as an explicit trade-off. The 30-second cache TTL balances MCIAS load against revocation latency. For this system's scale and threat model, the window is acceptable.
3. Admin bypass in policy engine is an all-or-nothing model ACCEPTED
The all-or-nothing admin model is intentional. MCIAS admin users get full access to all engines and operations, which is the desired behavior for this system.
4. Policy rule creation is listed as both Admin-only and User-accessible RESOLVED
The second policy table in ARCHITECTURE.md incorrectly listed User auth; removed the duplicate. gRPC adminRequiredMethods now includes ListPolicies and GetPolicy to match REST behavior. All policy CRUD is admin-only across both API surfaces.
5. No integrity protection on barrier entry paths RESOLVED
Updated crypto.Encrypt/crypto.Decrypt to accept an additionalData parameter. The barrier now passes the entry path as GCM AAD on both Put and Get, binding each ciphertext to its storage path. Seal operations pass nil (no path context). Added TestEncryptDecryptWithAAD covering correct-AAD, wrong-AAD, and nil-AAD cases. Existing barrier entries will fail to decrypt after this change — a one-off migration tool is needed to re-encrypt all entries (decrypt with nil AAD under old code, re-encrypt with path AAD).
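The path binding can be sketched in Go with the standard library's AES-GCM. This is an illustration under assumptions, not the project's crypto package: the function names, the nonce-prefixed output layout, and the example paths are all hypothetical.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
)

// encrypt seals plaintext with AES-GCM and authenticates additionalData
// (here: the barrier entry path) without storing it in the output.
// Assumed output layout: nonce || ciphertext+tag.
func encrypt(key, plaintext, additionalData []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, additionalData), nil
}

// decrypt fails unless additionalData matches the value used by encrypt,
// so a ciphertext copied to a different path no longer authenticates.
func decrypt(key, data, additionalData []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(data) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := data[:gcm.NonceSize()], data[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, additionalData)
}
```

Decrypting with a different path, or with nil AAD, returns an authentication error. That is also why entries written before the change need the one-off migration.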
6. Single MEK with no rotation mechanism RESOLVED
Implemented MEK rotation and per-engine DEKs. The v2 ciphertext format (0x02) embeds a key ID that identifies which DEK encrypted each entry. MEK rotation (POST /v1/barrier/rotate-mek) re-wraps all DEKs without re-encrypting data. DEK rotation (POST /v1/barrier/rotate-key) re-encrypts entries under a specific key. A migration endpoint converts v1 entries to v2 format. The barrier_keys table stores MEK-wrapped DEKs with version tracking.
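To illustrate why MEK rotation does not need to touch entry data, here is a rough Go sketch. It assumes DEKs are wrapped with AES-GCM and kept in a map keyed by key ID; the helper names (seal, open, rotateMEK) and the wrapped-key layout are hypothetical and do not describe the barrier_keys schema.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
)

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal wraps plaintext under key; assumed layout is nonce || ciphertext+tag.
func seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open reverses seal.
func open(key, data []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(data) < gcm.NonceSize() {
		return nil, errors.New("wrapped key too short")
	}
	return gcm.Open(nil, data[:gcm.NonceSize()], data[gcm.NonceSize():], nil)
}

// rotateMEK unwraps every DEK with the old MEK and re-wraps it with the new
// one. Entry ciphertexts are encrypted under DEKs, so they stay untouched.
func rotateMEK(oldMEK, newMEK []byte, wrapped map[string][]byte) (map[string][]byte, error) {
	out := make(map[string][]byte, len(wrapped))
	for keyID, w := range wrapped {
		dek, err := open(oldMEK, w)
		if err != nil {
			return nil, err
		}
		rewrapped, err := seal(newMEK, dek)
		for i := range dek {
			dek[i] = 0 // zeroize the plaintext DEK as soon as it is re-wrapped
		}
		if err != nil {
			return nil, err
		}
		out[keyID] = rewrapped
	}
	return out, nil
}
```

The cost of a rotation in this model scales with the number of DEKs, not with the number of stored entries.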
7. No audit logging
Acknowledged as future work, but for a cryptographic service this is a significant gap. Every certificate issuance, every sign operation, every policy change should be logged with caller identity, timestamp, and operation details. Without this, incident response is blind.
8. Rate limiting is in-memory only ACCEPTED
The in-memory rate limit protects against remote brute-force over the network, which is the realistic threat. Persisting the counter in the database would not add tamper resistance: the barrier is sealed during unseal attempts so encrypted storage is unavailable, and the unencrypted database could be reset by an attacker with disk access. An attacker who can restart the service already has local system access, making the rate limit moot regardless of persistence. Argon2id cost parameters (128 MiB memory-hard) are the primary brute-force mitigation and are stored in seal_config.
engines/sshca.md
Strengths
- Flat CA model is correct for SSH (no intermediate hierarchy needed).
- Default principal restriction (users can only sign certs for their own username) is the right default.
- max_ttl enforced server-side; good.
- Key zeroization on seal, no private keys in cert records.
Issues
9. User-controllable serial numbers RESOLVED
Removed the optional serial field from both sign-host and sign-user request data. Serials are always generated server-side using crypto/rand (64-bit). Updated flows and security considerations in sshca.md.
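Server-side serial generation of this kind takes only a few lines of Go. The function name newSerial is an assumption for illustration:

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
)

// newSerial draws a 64-bit certificate serial from the operating system's
// CSPRNG. Callers never supply a serial themselves.
func newSerial() (uint64, error) {
	var b [8]byte
	if _, err := rand.Read(b[:]); err != nil {
		return 0, err
	}
	return binary.BigEndian.Uint64(b[:]), nil
}
```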
10. No explicit extension allowlist for host certificates
The extensions field for sign-host accepts an arbitrary map. SSH extensions have security implications (e.g., permit-pty, permit-port-forwarding, permit-user-rc). Without an allowlist, a user could request extensions that grant more capabilities than intended. The engine should define a default extension set and either:
- Restrict to an allowlist, or
- Require admin for non-default extensions.
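The second option could look roughly like the following Go sketch. The default set shown here is a placeholder chosen for illustration; which extensions belong in it is a policy decision the spec would have to make, and checkExtensions is a hypothetical helper.

```go
package main

import "fmt"

// defaultExtensions is a placeholder default set, not a recommendation.
var defaultExtensions = map[string]bool{
	"permit-pty": true,
}

// checkExtensions accepts default extensions from any caller and requires
// admin for anything outside the default set.
func checkExtensions(requested map[string]string, isAdmin bool) error {
	for name := range requested {
		if defaultExtensions[name] || isAdmin {
			continue
		}
		return fmt.Errorf("extension %q requires admin", name)
	}
	return nil
}
```

Dropping the isAdmin branch turns the same function into a strict allowlist, which is the first option.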
11. critical_options on user certs is a privilege escalation surface RESOLVED
Removed critical_options from the sign-user request. Critical options can only be applied via admin-defined signing profiles, which are policy-gated (sshca/{mount}/profile/{name}, action read). Profile CRUD is admin-only. Profiles specify critical options, extensions, optional max TTL, and optional principal restrictions. Security considerations updated accordingly.
12. No KRL (Key Revocation List) support RESOLVED
Added a full KRL section to sshca.md covering: in-memory KRL generation from revoked serials, barrier persistence at engine/sshca/{mount}/krl.bin, automatic rebuild on revoke/delete/unseal, a public GET /v1/sshca/{mount}/krl endpoint with ETag and Cache-Control headers, GetKRL gRPC RPC, and a pull-based distribution model with example sshd_config and cron fetch.
13. Policy resource path uses ca/ prefix instead of sshca/ RESOLVED
Updated policy check paths in sshca.md from ca/{mount}/id/... to sshca/{mount}/id/... for both sign-host and sign-user flows, eliminating the namespace collision with the CA (PKI) engine.
14. No source-address restriction by default
User certificates should ideally include source-address critical options to limit where they can be used from. At minimum, consider a mount-level configuration for default critical options that get applied to all user certs.
engines/transit.md
Strengths
- Ciphertext format with version prefix enables clean key rotation.
- exportable and allow_deletion immutable after creation; prevents policy weakening.
- AAD/context binding for AEAD ciphers.
- Rewrap never exposes plaintext to caller.
Issues
15. No minimum key version enforcement RESOLVED
Added min_decryption_version per key (default 1). Decryption requests for versions below the minimum are rejected. New update-key-config operation (admin-only) advances the minimum (can only increase, cannot exceed current version). New trim-key operation permanently deletes versions older than the minimum. Both have corresponding gRPC RPCs and REST endpoints. The rotation cycle is documented: rotate → rewrap → advance min → trim.
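The version rules described above can be sketched in Go as follows. The struct and method names are hypothetical. One detail is an assumption of this sketch: setting the minimum to its current value is treated as a harmless no-op, not an error.

```go
package main

import "errors"

// keyConfig holds the two version counters relevant to decryption.
type keyConfig struct {
	CurrentVersion       int
	MinDecryptionVersion int // defaults to 1
}

// checkDecrypt rejects ciphertext versions below the minimum or above the
// newest existing version.
func (k *keyConfig) checkDecrypt(version int) error {
	if version < k.MinDecryptionVersion {
		return errors.New("key version below min_decryption_version")
	}
	if version > k.CurrentVersion {
		return errors.New("key version does not exist")
	}
	return nil
}

// advanceMin moves the minimum forward. It can never decrease and can never
// exceed the current version.
func (k *keyConfig) advanceMin(newMin int) error {
	if newMin < k.MinDecryptionVersion {
		return errors.New("min_decryption_version can only increase")
	}
	if newMin > k.CurrentVersion {
		return errors.New("min_decryption_version cannot exceed current version")
	}
	k.MinDecryptionVersion = newMin
	return nil
}
```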
16. Key version pruning with max_key_versions has no safety check RESOLVED
Added explicit max_key_versions behavior: auto-pruning during rotate-key only deletes versions strictly less than min_decryption_version. If the version count exceeds the limit but no eligible candidates remain, a warning is returned. This ensures pruning never destroys versions that may still have unrewrapped ciphertext. See also #30.
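A small Go sketch of the pruning rule as described here. The function name and signature are assumptions, versions is assumed to be sorted in ascending order, and maxKeyVersions is assumed to be a positive limit.

```go
package main

// pruneCandidates returns the versions that auto-pruning may delete and
// whether a warning is due. Only versions strictly below
// minDecryptionVersion are eligible, oldest first.
func pruneCandidates(versions []int, minDecryptionVersion, maxKeyVersions int) ([]int, bool) {
	excess := len(versions) - maxKeyVersions
	var doomed []int
	for _, v := range versions {
		if excess <= 0 {
			break
		}
		if v < minDecryptionVersion {
			doomed = append(doomed, v)
			excess--
		}
	}
	// If the limit is still exceeded, nothing more is eligible: warn
	// instead of deleting versions that may still protect ciphertext.
	return doomed, excess > 0
}
```

For example, with versions 1 through 5, a limit of 3, and a minimum of 2, only version 1 is deleted and the caller receives a warning.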
17. RSA encryption without specifying padding scheme RESOLVED
RSA key types (rsa-2048, rsa-4096) removed entirely from the transit engine. Asymmetric encryption belongs in the user engine (via ECDH); RSA signing offers no advantage over Ed25519/ECDSA. crypto/rsa removed from dependencies. Rationale documented in key types section and security considerations.
18. HMAC keys used for sign operation is confusing RESOLVED
sign and verify are now restricted to asymmetric key types (Ed25519, ECDSA). HMAC keys are rejected with an error — HMAC must use the dedicated hmac operation. Policy actions are already split: sign, verify, and hmac are separate granular actions, all matched by any.
19. No batch encrypt/decrypt operations RESOLVED
Added batch-encrypt, batch-decrypt, and batch-rewrap operations to the transit engine plan. Each targets a single named key with an array of items; results are returned in order with per-item errors (partial success model). An optional reference field lets callers correlate results with source records. Policy is checked once per batch. Added corresponding gRPC RPCs and REST endpoints. operationAction maps batch variants to the same granular actions as their single counterparts.
20. read action maps to decrypt and verify; semantics are misleading RESOLVED
Replaced the coarse read/write action model with granular per-operation actions: encrypt, decrypt, sign, verify, hmac for cryptographic operations; read for metadata retrieval; write for key management; admin for administrative operations. Added any action that matches all non-admin actions. Added LintRule validation that rejects unknown effects and actions. CreateRule now validates before storing. Updated operationAction mapping and all tests.
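As a rough illustration of the matching semantics, the following Go sketch covers only the action side; effect validation is left out. The helper names lintAction and actionMatches are hypothetical and are not the project's LintRule.

```go
package main

import "fmt"

// knownActions lists the granular actions named in the finding.
var knownActions = map[string]bool{
	"encrypt": true, "decrypt": true, "sign": true, "verify": true, "hmac": true,
	"read": true, "write": true, "admin": true, "any": true,
}

// lintAction rejects actions that are not part of the known set, so a typo
// in a rule fails at creation time and does not silently match nothing.
func lintAction(action string) error {
	if !knownActions[action] {
		return fmt.Errorf("unknown action %q", action)
	}
	return nil
}

// actionMatches reports whether a rule's action covers the requested one.
// "any" covers every action except "admin".
func actionMatches(ruleAction, requested string) bool {
	if ruleAction == "any" {
		return requested != "admin"
	}
	return ruleAction == requested
}
```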
21. No rate limiting or quota on cryptographic operations
A compromised or malicious user token could issue unlimited encrypt/decrypt/sign requests, potentially using the service as a cryptographic oracle. Consider per-user rate limits on transit operations.
Cross-Cutting Issues
22. No forward secrecy for stored data RESOLVED
Per-engine DEKs limit blast radius: compromise of one DEK only exposes that engine's data, not the entire barrier. MEK compromise still exposes all DEKs, but MEK rotation enables periodic re-keying. Each engine mount gets its own DEK created automatically; a "system" DEK protects non-engine data. v2 ciphertext format embeds key IDs for DEK lookup.
23. Generic POST /v1/engine/request bypasses typed route middleware RESOLVED
Added an adminOnlyOperations map to handleEngineRequest that mirrors the admin gates on typed REST routes (e.g. create-issuer, delete-cert, create-key, rotate-key, create-profile, provision). Non-admin users are rejected with 403 before policy evaluation or engine dispatch. The v1 gRPC Execute RPC is defined in the proto but not registered in the server; only v2 typed RPCs are used, so the gRPC surface is not affected. Tests cover both admin and non-admin paths through the generic endpoint.
24. No CSRF protection mentioned for web UI RESOLVED
Added signed double-submit cookie CSRF protection. A per-server HMAC secret signs random nonce-based tokens. Every form includes a {{csrfField}} hidden input; a middleware validates that the form field matches the cookie and has a valid HMAC signature on all POST/PUT/PATCH/DELETE requests. Session cookie upgraded from SameSite=Lax to SameSite=Strict. CSRF cookie is also HttpOnly, Secure, SameSite=Strict. Tests cover token generation/validation, cross-secret rejection, middleware pass/block/mismatch scenarios.
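The token scheme can be sketched in Go with the standard library. The token layout (base64url nonce, a dot, base64url HMAC-SHA256 tag), the nonce length, and the function names are assumptions made for this example, not the implementation's actual format.

```go
package main

import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"strings"
)

func signNonce(secret, nonce []byte) []byte {
	mac := hmac.New(sha256.New, secret)
	mac.Write(nonce)
	return mac.Sum(nil)
}

// newCSRFToken returns "<nonce>.<tag>", both base64url-encoded.
func newCSRFToken(secret []byte) (string, error) {
	nonce := make([]byte, 32)
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	enc := base64.RawURLEncoding
	return enc.EncodeToString(nonce) + "." + enc.EncodeToString(signNonce(secret, nonce)), nil
}

// validCSRFToken checks both halves of the scheme: the form field must equal
// the cookie (double submit), and the token must carry a valid signature
// under the server secret.
func validCSRFToken(secret []byte, cookie, form string) bool {
	if !hmac.Equal([]byte(cookie), []byte(form)) {
		return false
	}
	parts := strings.Split(cookie, ".")
	if len(parts) != 2 {
		return false
	}
	nonce, err := base64.RawURLEncoding.DecodeString(parts[0])
	if err != nil {
		return false
	}
	tag, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return false
	}
	return hmac.Equal(tag, signNonce(secret, nonce))
}
```

The signature is what separates this from a plain double-submit cookie: an attacker who can plant a cookie still cannot mint a token that verifies under the server secret.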
Engine Design Review (2026-03-16)
Scope: engines/sshca.md, engines/transit.md, engines/user.md (patched specs)
engines/sshca.md
Strengths
- RSA excluded — reduces attack surface, correct for SSH CA use case.
- Detailed Go code snippets for Initialize, sign-host, sign-user flows.
- KRL custom implementation correctly identified that x/crypto/ssh lacks KRL builders.
- Signing profiles are the only path to critical options; good privilege separation.
- Server-side serial generation with crypto/rand; no user-controllable serials.
Issues
25. Missing list-certs REST route RESOLVED
Added GET /v1/sshca/{mount}/certs to the REST endpoints table and route registration code block. API sync restored.
26. KRL section type description contradicts pseudocode RESOLVED
Fixed the description block to use KRL_SECTION_CERTIFICATES (0x01) for the outer section type, matching the pseudocode and the OpenSSH PROTOCOL.krl spec.
27. Policy check after certificate construction in sign-host RESOLVED
Reordered both sign-host and sign-user flows to perform the policy check before generating the serial and building the certificate. Serial generation now only happens after authorization succeeds.
engines/transit.md
Strengths
- XChaCha20-Poly1305 (not ChaCha20-Poly1305) — correct for random nonce safety.
- All nonce sizes, hash algorithms, and signature encodings now specified.
- trim-key logic is detailed and safe (no-op when min_decryption_version is 1).
- Batch operations hold a read lock for atomicity with respect to key rotation.
- 500-item batch limit prevents resource exhaustion.
Issues
28. HMAC output not versioned — unverifiable after key rotation RESOLVED
HMAC output now uses the same metacrypt:v{version}:{base64} format as ciphertext and signatures. Verification parses the version prefix, loads the corresponding key (subject to min_decryption_version), and uses hmac.Equal for constant-time comparison.
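A Go sketch of how versioned HMAC output might be produced and verified. The metacrypt:v{version}:{base64} layout and the use of hmac.Equal come from the text above; the choice of HMAC-SHA256, standard base64 encoding, and the hmacKey type are assumptions of this example.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// hmacKey is a hypothetical in-memory view of a versioned HMAC key.
type hmacKey struct {
	versions             map[int][]byte
	current              int
	minDecryptionVersion int
}

// compute tags msg with the current key version and records that version
// in the output prefix.
func (k *hmacKey) compute(msg []byte) string {
	mac := hmac.New(sha256.New, k.versions[k.current])
	mac.Write(msg)
	return fmt.Sprintf("metacrypt:v%d:%s", k.current, base64.StdEncoding.EncodeToString(mac.Sum(nil)))
}

// verify parses the version prefix, loads that key version, and compares
// in constant time.
func (k *hmacKey) verify(msg []byte, tag string) (bool, error) {
	parts := strings.SplitN(tag, ":", 3)
	if len(parts) != 3 || parts[0] != "metacrypt" || !strings.HasPrefix(parts[1], "v") {
		return false, errors.New("malformed HMAC output")
	}
	version, err := strconv.Atoi(parts[1][1:])
	if err != nil {
		return false, errors.New("malformed key version")
	}
	if version < k.minDecryptionVersion {
		return false, errors.New("key version below min_decryption_version")
	}
	key, ok := k.versions[version]
	if !ok {
		return false, errors.New("unknown key version")
	}
	want, err := base64.StdEncoding.DecodeString(parts[2])
	if err != nil {
		return false, errors.New("malformed base64 payload")
	}
	mac := hmac.New(sha256.New, key)
	mac.Write(msg)
	return hmac.Equal(want, mac.Sum(nil)), nil
}
```

A tag produced under version 1 keeps verifying after rotation to version 2, until the minimum is advanced past it.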
29. rewrap policy action not specified RESOLVED
rewrap and batch-rewrap now map to the decrypt action — rewrap internally decrypts and re-encrypts, so the caller must have decrypt permission. Batch variants map to the same action as their single counterparts. Documented in the authorization section.
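The mapping for the encryption-related operations can be written as a small lookup table. This Go sketch is illustrative only: the real operationAction signature is not shown in this report, and the table is limited to the operations this finding and the batch finding name.

```go
package main

// operationActions maps transit operations to the policy action a caller
// must hold. Batch variants share the action of their single counterparts,
// and rewrap requires decrypt because it decrypts internally.
var operationActions = map[string]string{
	"encrypt":       "encrypt",
	"batch-encrypt": "encrypt",
	"decrypt":       "decrypt",
	"batch-decrypt": "decrypt",
	"rewrap":        "decrypt",
	"batch-rewrap":  "decrypt",
}

// operationAction returns the required action and whether the operation is
// known to this table.
func operationAction(operation string) (string, bool) {
	action, ok := operationActions[operation]
	return action, ok
}
```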
30. max_key_versions interaction with min_decryption_version unclear RESOLVED
Added explicit max_key_versions behavior section. Pruning happens during rotate-key and only deletes versions strictly less than min_decryption_version. If the limit is exceeded but no eligible candidates remain, a warning is returned. This also resolves audit finding #16.
31. Missing get-public-key REST route RESOLVED
Added GET /v1/transit/{mount}/keys/{name}/public-key to the REST endpoints table and route registration code block. API sync restored.
32. exportable flag with no export operation RESOLVED
Removed the exportable flag from create-key. Transit's value proposition is that keys never leave the service. If export is needed for migration, a dedicated admin-only operation can be added later with audit logging.
engines/user.md
Strengths
- HKDF with per-recipient random salt — prevents wrapping key reuse across messages.
- AES-256-GCM for DEK wrapping (consistent with codebase, avoids new primitive).
- ECDH key agreement with info-string binding prevents key confusion.
- Explicit zeroization of all intermediate secrets documented.
- Envelope format includes salt per-recipient — correct for HKDF security.
Issues
33. Auto-provisioning creates keys for arbitrary usernames RESOLVED
The encrypt flow now validates recipient usernames against MCIAS via auth.ValidateUsername before auto-provisioning. Non-existent usernames are rejected with an error, preventing barrier pollution.
34. No recipient limit on encrypt RESOLVED
Added a maxRecipients = 100 limit. Requests exceeding this limit are rejected with 400 Bad Request before any ECDH computation.
35. No re-encryption support for key rotation RESOLVED
Added a re-encrypt operation that decrypts an envelope and re-encrypts it with current key pairs for all recipients. This enables safe key rotation: re-encrypt all stored envelopes first, then call rotate-key. Added to HandleRequest dispatch, gRPC service, REST endpoints, and route registration.
36. UserKeyConfig type undefined RESOLVED
Defined UserKeyConfig struct with Algorithm, CreatedAt, and AutoProvisioned fields in the in-memory state section.
Cross-Cutting Issues (Engine Designs)
37. adminOnlyOperations name collision blocks user engine rotate-key RESOLVED
Changed the adminOnlyOperations map from flat operation names to engine-type-qualified keys (engineType:operation, e.g. "transit:rotate-key"). The generic endpoint now resolves the mount's engine type via GetMount before checking the map. Added tests verifying that rotate-key on a user mount succeeds for non-admin users while rotate-key on a transit mount correctly requires admin.
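The qualified-key lookup can be sketched in Go as below. Only the "transit:rotate-key" entry is taken from the text; the requiresAdmin helper is a hypothetical name, and a real map would carry more entries.

```go
package main

// adminOnlyOperations is keyed by "engineType:operation" so that the same
// operation name can be admin-only on one engine type and open on another.
var adminOnlyOperations = map[string]bool{
	"transit:rotate-key": true,
}

// requiresAdmin is called after the mount's engine type has been resolved.
func requiresAdmin(engineType, operation string) bool {
	return adminOnlyOperations[engineType+":"+operation]
}
```

With flat keys, a single "rotate-key" entry would have gated the user engine as well, which is the collision this finding describes.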
38. engine.ZeroizeKey helper prerequisite not cross-referenced RESOLVED
Added prerequisite step to both transit and user implementation steps referencing engines/sshca.md step 1 for the engine.ZeroizeKey shared helper.
Priority Summary
| Priority | Issue | Location |
|---|---|---|
| | | ARCHITECTURE.md |
| | | sshca.md |
| | … ca/ vs sshca/) | sshca.md |
| | adminOnlyOperations name collision blocks user rotate-key | Cross-cutting |
| | | ARCHITECTURE.md |
| | | sshca.md |
| | | transit.md |
| | | transit.md |
| | critical_options not restricted | sshca.md |
| | | ARCHITECTURE.md |
| | | Cross-cutting |
| | | transit.md |
| | max_key_versions vs min_decryption_version unclear | transit.md |
| | | user.md |
| | | ARCHITECTURE.md |
| | | ARCHITECTURE.md |
| | | ARCHITECTURE.md |
| | decrypt mapped to read action | transit.md |
| | | ARCHITECTURE.md |
| | list-certs REST route | sshca.md |
| | | sshca.md |
| | | sshca.md |
| | rewrap policy action not specified | transit.md |
| | get-public-key REST route | transit.md |
| | | user.md |
| | | ARCHITECTURE.md |
| | | transit.md |
| | | transit.md |
| | | Cross-cutting |
| | exportable flag with no export operation | transit.md |
| | | user.md |
| | UserKeyConfig type undefined | user.md |
| | ZeroizeKey prerequisite not cross-referenced | Cross-cutting |