Kyle Isom 128f5abc4d Update engine specs, audit doc, and server tests for SSH CA, transit, and user engines

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-16 20:16:23 -07:00

Security Audit Report

Date: 2026-03-16

Scope: ARCHITECTURE.md, engines/sshca.md, engines/transit.md

ARCHITECTURE.md

Strengths

Solid key hierarchy: password → Argon2id → KWK → MEK → per-entry encryption. Defense-in-depth.
Fail-closed design with ErrSealed on all operations when sealed.
Fresh nonce per write, constant-time comparisons, explicit zeroization — all correct fundamentals.
Default-deny policy engine with priority-based rule evaluation.
Issued leaf private keys never stored — good principle of least persistence.

Issues

1. ~~TLS minimum version should be 1.3, not 1.2~~ RESOLVED

Updated all TLS configurations (HTTP server, gRPC server, web server, vault client, Go client library, CLI commands) from tls.VersionTLS12 to tls.VersionTLS13. Removed explicit cipher suite list from HTTP server (TLS 1.3 manages its own). Updated ARCHITECTURE.md TLS section and threat mitigations table.
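
As a minimal sketch of the configuration change described in finding 1 (the helper name `newTLSConfig` is hypothetical and does not come from the codebase), a TLS 1.3 floor in Go can look like this:

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// newTLSConfig is a hypothetical helper. It returns a configuration that
// refuses any protocol version older than TLS 1.3. CipherSuites is left
// unset because Go does not make TLS 1.3 cipher suites configurable, so an
// explicit list would only apply to versions that are already excluded.
func newTLSConfig() *tls.Config {
	return &tls.Config{MinVersion: tls.VersionTLS13}
}

func main() {
	cfg := newTLSConfig()
	fmt.Println("TLS 1.3 minimum:", cfg.MinVersion == tls.VersionTLS13)
}
```

The same value would be shared by every listener and client named in the finding so that no surface silently falls back to TLS 1.2.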

2. ~~Token cache TTL of 30 seconds is a revocation gap~~ ACCEPTED

Accepted as an explicit trade-off. The 30-second cache TTL balances MCIAS load against revocation latency. For this system's scale and threat model, the window is acceptable.

3. ~~Admin bypass in policy engine is an all-or-nothing model~~ ACCEPTED

The all-or-nothing admin model is intentional by design. MCIAS admin users get full access to all engines and operations. This is the desired behavior for this system.

4. ~~Policy rule creation is listed as both Admin-only and User-accessible~~ RESOLVED

The second policy table in ARCHITECTURE.md incorrectly listed User auth; removed the duplicate. gRPC adminRequiredMethods now includes ListPolicies and GetPolicy to match REST behavior. All policy CRUD is admin-only across both API surfaces.

5. ~~No integrity protection on barrier entry paths~~ RESOLVED

Updated crypto.Encrypt/crypto.Decrypt to accept an additionalData parameter. The barrier now passes the entry path as GCM AAD on both Put and Get, binding each ciphertext to its storage path. Seal operations pass nil (no path context). Added TestEncryptDecryptWithAAD covering correct-AAD, wrong-AAD, and nil-AAD cases. Existing barrier entries will fail to decrypt after this change — a one-off migration tool is needed to re-encrypt all entries (decrypt with nil AAD under old code, re-encrypt with path AAD).
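
The path binding in finding 5 can be illustrated with the standard library's AES-GCM. This is a sketch under stated assumptions, not the project's `crypto` package: the lowercase function names, the nonce-prefixed layout, and the example paths are all hypothetical, and the version byte of the real ciphertext format is not modelled.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-GCM AEAD from a 16, 24, or 32 byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encrypt seals plaintext with a fresh random nonce and binds the result to
// additionalData. Output layout (an assumption): nonce || ciphertext || tag.
func encrypt(key, plaintext, additionalData []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, additionalData), nil
}

// decrypt fails unless additionalData matches the value used by encrypt.
func decrypt(key, data, additionalData []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(data) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := data[:gcm.NonceSize()], data[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, additionalData)
}

func main() {
	key := make([]byte, 32) // all-zero demo key, for illustration only
	ct, err := encrypt(key, []byte("secret"), []byte("engine/a/entry"))
	if err != nil {
		panic(err)
	}
	// Moving the ciphertext to another path changes the AAD and must fail.
	_, err = decrypt(key, ct, []byte("engine/b/entry"))
	fmt.Println("wrong path rejected:", err != nil)
}
```

Because a nil AAD and a path AAD authenticate differently, entries written before the change cannot be opened with the path supplied, which is why the finding calls for a one-off migration.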

6. ~~Single MEK with no rotation mechanism~~ RESOLVED

Implemented MEK rotation and per-engine DEKs. The v2 ciphertext format (0x02) embeds a key ID that identifies which DEK encrypted each entry. MEK rotation (POST /v1/barrier/rotate-mek) re-wraps all DEKs without re-encrypting data. DEK rotation (POST /v1/barrier/rotate-key) re-encrypts entries under a specific key. A migration endpoint converts v1 entries to v2 format. The barrier_keys table stores MEK-wrapped DEKs with version tracking.

7. No audit logging

Acknowledged as future work, but for a cryptographic service this is a significant gap. Every certificate issuance, every sign operation, every policy change should be logged with caller identity, timestamp, and operation details. Without this, incident response is blind.

8. ~~Rate limiting is in-memory only~~ ACCEPTED

The in-memory rate limit protects against remote brute-force attempts over the network, which is the realistic threat. Persisting the counter in the database would not add tamper resistance: the barrier is sealed during unseal attempts so encrypted storage is unavailable, and the unencrypted database could be reset by an attacker with disk access. An attacker who can restart the service already has local system access, making the rate limit moot regardless of persistence. The primary brute-force mitigation is the Argon2id cost parameters (128 MiB memory cost), which are stored in seal_config.

engines/sshca.md

Strengths

Flat CA model is correct for SSH (no intermediate hierarchy needed).
Default principal restriction (users can only sign certs for their own username) is the right default.
max_ttl enforced server-side — good.
Key zeroization on seal, no private keys in cert records.

Issues

9. ~~User-controllable serial numbers~~ RESOLVED

Removed the optional serial field from both sign-host and sign-user request data. Serials are always generated server-side using crypto/rand (64-bit). Updated flows and security considerations in sshca.md.
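
A minimal sketch of server-side serial generation as described in finding 9. The function name `newSerial` is hypothetical, and the big-endian interpretation of the random bytes is an arbitrary choice made for the example.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// newSerial draws 8 bytes from the operating system CSPRNG and returns them
// as a 64-bit certificate serial. The caller never supplies the value.
func newSerial() (uint64, error) {
	var b [8]byte
	if _, err := rand.Read(b[:]); err != nil {
		return 0, err
	}
	return binary.BigEndian.Uint64(b[:]), nil
}

func main() {
	serial, err := newSerial()
	if err != nil {
		panic(err)
	}
	fmt.Printf("serial: %d\n", serial)
}
```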

10. No explicit extension allowlist for host certificates

The extensions field for sign-host accepts an arbitrary map. SSH extensions have security implications (e.g., permit-pty, permit-port-forwarding, permit-user-rc). Without an allowlist, a user could request extensions that grant more capabilities than intended. The engine should define a default extension set and either:

Restrict to an allowlist, or
Require admin for non-default extensions.
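
The two options above can be combined in a single check. The sketch below is one possible shape, with a hypothetical function name and an illustrative allowlist; the actual default extension set is a decision the engine spec still has to make.

```go
package main

import (
	"fmt"
	"sort"
)

// disallowedExtensions returns the requested extension names that are not in
// the allowlist, sorted for stable error messages. Admin callers may request
// anything, so they always get an empty result.
func disallowedExtensions(requested map[string]string, allowed map[string]bool, isAdmin bool) []string {
	if isAdmin {
		return nil
	}
	var rejected []string
	for name := range requested {
		if !allowed[name] {
			rejected = append(rejected, name)
		}
	}
	sort.Strings(rejected)
	return rejected
}

func main() {
	// Illustrative allowlist only; not taken from sshca.md.
	allowed := map[string]bool{"permit-pty": true}
	requested := map[string]string{"permit-pty": "", "permit-port-forwarding": ""}

	if bad := disallowedExtensions(requested, allowed, false); len(bad) > 0 {
		fmt.Println("rejected extensions:", bad)
	}
}
```

A signing handler would call this before building the certificate and return an error when the result is non-empty.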

11. ~~critical_options on user certs is a privilege escalation surface~~ RESOLVED

Removed critical_options from the sign-user request. Critical options can only be applied via admin-defined signing profiles, which are policy-gated (sshca/{mount}/profile/{name}, action read). Profile CRUD is admin-only. Profiles specify critical options, extensions, optional max TTL, and optional principal restrictions. Security considerations updated accordingly.

12. ~~No KRL (Key Revocation List) support~~ RESOLVED

Added a full KRL section to sshca.md covering: in-memory KRL generation from revoked serials, barrier persistence at engine/sshca/{mount}/krl.bin, automatic rebuild on revoke/delete/unseal, a public GET /v1/sshca/{mount}/krl endpoint with ETag and Cache-Control headers, GetKRL gRPC RPC, and a pull-based distribution model with example sshd_config and cron fetch.

13. ~~Policy resource path uses ca/ prefix instead of sshca/~~ RESOLVED

Updated policy check paths in sshca.md from ca/{mount}/id/... to sshca/{mount}/id/... for both sign-host and sign-user flows, eliminating the namespace collision with the CA (PKI) engine.

14. No source-address restriction by default

User certificates should ideally include source-address critical options to limit where they can be used from. At minimum, consider a mount-level configuration for default critical options that get applied to all user certs.

engines/transit.md

Strengths

Ciphertext format with version prefix enables clean key rotation.
exportable and allow_deletion immutable after creation — prevents policy weakening.
AAD/context binding for AEAD ciphers.
Rewrap never exposes plaintext to caller.

Issues

15. ~~No minimum key version enforcement~~ RESOLVED

Added min_decryption_version per key (default 1). Decryption requests for versions below the minimum are rejected. New update-key-config operation (admin-only) advances the minimum (can only increase, cannot exceed current version). New trim-key operation permanently deletes versions older than the minimum. Both have corresponding gRPC RPCs and REST endpoints. The rotation cycle is documented: rotate → rewrap → advance min → trim.
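
The rules in finding 15 reduce to two small checks. The following sketch uses hypothetical type, method, and error names. Treating an unchanged minimum as a no-op rather than an error is an assumption, since the spec text only says the value cannot decrease.

```go
package main

import (
	"errors"
	"fmt"
)

// keyState is a hypothetical stand-in for a transit key's version metadata.
type keyState struct {
	CurrentVersion       int
	MinDecryptionVersion int
}

var (
	errMinDecrease   = errors.New("min_decryption_version can only increase")
	errMinTooHigh    = errors.New("min_decryption_version cannot exceed current version")
	errVersionTooOld = errors.New("key version is below min_decryption_version")
	errVersionFuture = errors.New("key version does not exist")
)

// advanceMin models update-key-config: the minimum may move forward only,
// and never past the current version.
func (k *keyState) advanceMin(newMin int) error {
	switch {
	case newMin < k.MinDecryptionVersion:
		return errMinDecrease
	case newMin > k.CurrentVersion:
		return errMinTooHigh
	}
	k.MinDecryptionVersion = newMin
	return nil
}

// checkDecrypt rejects ciphertext versions below the configured minimum.
func (k *keyState) checkDecrypt(version int) error {
	switch {
	case version < k.MinDecryptionVersion:
		return errVersionTooOld
	case version > k.CurrentVersion:
		return errVersionFuture
	}
	return nil
}

func main() {
	k := &keyState{CurrentVersion: 3, MinDecryptionVersion: 1}
	fmt.Println("v1 before advance:", k.checkDecrypt(1))
	fmt.Println("advance to 3:", k.advanceMin(3))
	fmt.Println("v1 after advance:", k.checkDecrypt(1))
}
```

In the documented cycle, `advanceMin` would run only after all stored ciphertext has been rewrapped, and trimming would then delete the versions below the new minimum.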

16. ~~Key version pruning with max_key_versions has no safety check~~ RESOLVED

Added explicit max_key_versions behavior: auto-pruning during rotate-key only deletes versions strictly less than min_decryption_version. If the version count exceeds the limit but no eligible candidates remain, a warning is returned. This ensures pruning never destroys versions that may still have unrewrapped ciphertext. See also #30.
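
One way to read the pruning rule in finding 16 is sketched below. The function name is hypothetical, a non-positive `maxVersions` is assumed to mean "no limit", and the warning is raised whenever the limit is still exceeded after pruning, which is slightly broader than the "no eligible candidates" wording.

```go
package main

import (
	"fmt"
	"sort"
)

// planPrune returns the versions that auto-pruning may delete, oldest first,
// and whether the key still exceeds maxVersions afterwards. Only versions
// strictly below minDecryptionVersion are ever eligible.
func planPrune(versions []int, maxVersions, minDecryptionVersion int) (toDelete []int, warn bool) {
	excess := len(versions) - maxVersions
	if maxVersions <= 0 || excess <= 0 {
		return nil, false
	}
	sorted := append([]int(nil), versions...) // copy so the caller's slice is untouched
	sort.Ints(sorted)
	for _, v := range sorted {
		if len(toDelete) == excess || v >= minDecryptionVersion {
			break
		}
		toDelete = append(toDelete, v)
	}
	return toDelete, len(toDelete) < excess
}

func main() {
	del, warn := planPrune([]int{1, 2, 3, 4, 5}, 3, 2)
	fmt.Println("delete:", del, "warn:", warn)
}
```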

17. ~~RSA encryption without specifying padding scheme~~ RESOLVED

RSA key types (rsa-2048, rsa-4096) removed entirely from the transit engine. Asymmetric encryption belongs in the user engine (via ECDH); RSA signing offers no advantage over Ed25519/ECDSA. crypto/rsa removed from dependencies. Rationale documented in key types section and security considerations.

18. ~~HMAC keys used for sign operation is confusing~~ RESOLVED

sign and verify are now restricted to asymmetric key types (Ed25519, ECDSA). HMAC keys are rejected with an error — HMAC must use the dedicated hmac operation. Policy actions are already split: sign, verify, and hmac are separate granular actions, all matched by any.

19. ~~No batch encrypt/decrypt operations~~ RESOLVED

Added batch-encrypt, batch-decrypt, and batch-rewrap operations to the transit engine plan. Each targets a single named key with an array of items; results are returned in order with per-item errors (partial success model). An optional reference field lets callers correlate results with source records. Policy is checked once per batch. Added corresponding gRPC RPCs and REST endpoints. operationAction maps batch variants to the same granular actions as their single counterparts.
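
The partial success model in finding 19 can be sketched as follows. All type and function names are hypothetical, `reverse` is a stand-in for a real cryptographic operation, and the 500-item cap is the batch limit noted in the engine design review of transit.md.

```go
package main

import (
	"errors"
	"fmt"
)

const maxBatchItems = 500

// batchItem carries one input plus an optional caller-chosen reference.
type batchItem struct {
	Reference string
	Input     []byte
}

// batchResult mirrors one item; Error is set instead of Output on failure.
type batchResult struct {
	Reference string
	Output    []byte
	Error     string
}

// runBatch applies op to every item in order. A failing item records its
// error and does not abort the rest of the batch.
func runBatch(items []batchItem, op func([]byte) ([]byte, error)) ([]batchResult, error) {
	if len(items) > maxBatchItems {
		return nil, fmt.Errorf("batch of %d items exceeds limit of %d", len(items), maxBatchItems)
	}
	results := make([]batchResult, len(items))
	for i, item := range items {
		results[i].Reference = item.Reference
		out, err := op(item.Input)
		if err != nil {
			results[i].Error = err.Error()
			continue
		}
		results[i].Output = out
	}
	return results, nil
}

// reverse is a placeholder operation that fails on empty input.
func reverse(in []byte) ([]byte, error) {
	if len(in) == 0 {
		return nil, errors.New("empty input")
	}
	out := make([]byte, len(in))
	for i, b := range in {
		out[len(in)-1-i] = b
	}
	return out, nil
}

func main() {
	items := []batchItem{{Reference: "row-1", Input: []byte("ab")}, {Reference: "row-2"}}
	results, err := runBatch(items, reverse)
	if err != nil {
		panic(err)
	}
	for _, r := range results {
		fmt.Printf("%s output=%q error=%q\n", r.Reference, r.Output, r.Error)
	}
}
```

The policy check is deliberately absent from the loop: per the finding, it happens once for the whole batch before any item is processed.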

20. ~~read action maps to decrypt and verify — semantics are misleading~~ RESOLVED

Replaced the coarse read/write action model with granular per-operation actions: encrypt, decrypt, sign, verify, hmac for cryptographic operations; read for metadata retrieval; write for key management; admin for administrative operations. Added any action that matches all non-admin actions. Added LintRule validation that rejects unknown effects and actions. CreateRule now validates before storing. Updated operationAction mapping and all tests.

21. No rate limiting or quota on cryptographic operations

A compromised or malicious user token could issue unlimited encrypt/decrypt/sign requests, potentially using the service as a cryptographic oracle. Consider per-user rate limits on transit operations.

Cross-Cutting Issues

22. ~~No forward secrecy for stored data~~ RESOLVED: Per-engine DEKs limit blast radius — compromise of one DEK only exposes that engine's data, not the entire barrier. MEK compromise still exposes all DEKs, but MEK rotation enables periodic re-keying. Each engine mount gets its own DEK created automatically; a "system" DEK protects non-engine data. v2 ciphertext format embeds key IDs for DEK lookup.

23. ~~Generic POST /v1/engine/request bypasses typed route middleware~~ RESOLVED: Added an adminOnlyOperations map to handleEngineRequest that mirrors the admin gates on typed REST routes (e.g. create-issuer, delete-cert, create-key, rotate-key, create-profile, provision). Non-admin users are rejected with 403 before policy evaluation or engine dispatch. The v1 gRPC Execute RPC is defined in the proto but not registered in the server — only v2 typed RPCs are used, so the gRPC surface is not affected. Tests cover both admin and non-admin paths through the generic endpoint.

24. ~~No CSRF protection mentioned for web UI~~ RESOLVED: Added signed double-submit cookie CSRF protection. A per-server HMAC secret signs random nonce-based tokens. Every form includes a {{csrfField}} hidden input; a middleware validates that the form field matches the cookie and has a valid HMAC signature on all POST/PUT/PATCH/DELETE requests. Session cookie upgraded from SameSite=Lax to SameSite=Strict. CSRF cookie is also HttpOnly, Secure, SameSite=Strict. Tests cover token generation/validation, cross-secret rejection, middleware pass/block/mismatch scenarios.
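
A signed double-submit token of the kind described in finding 24 can be sketched as below. The token layout (`nonce.signature`, both base64url), the use of HMAC-SHA256, and every function name are assumptions made for illustration; the finding does not specify them.

```go
package main

import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"fmt"
	"strings"
)

var enc = base64.RawURLEncoding

// sign computes HMAC-SHA256 over the nonce with the per-server secret.
func sign(secret, nonce []byte) []byte {
	mac := hmac.New(sha256.New, secret)
	mac.Write(nonce)
	return mac.Sum(nil)
}

// newToken returns "<nonce>.<signature>" for use in both cookie and form.
func newToken(secret []byte) (string, error) {
	nonce := make([]byte, 32)
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	return enc.EncodeToString(nonce) + "." + enc.EncodeToString(sign(secret, nonce)), nil
}

// validToken checks that the signature matches the nonce under secret.
func validToken(secret []byte, token string) bool {
	nonceB64, sigB64, ok := strings.Cut(token, ".")
	if !ok {
		return false
	}
	nonce, err := enc.DecodeString(nonceB64)
	if err != nil {
		return false
	}
	sig, err := enc.DecodeString(sigB64)
	if err != nil {
		return false
	}
	return hmac.Equal(sig, sign(secret, nonce))
}

// validRequest is what a middleware would call on state-changing requests:
// the form field must equal the cookie, and the token must be signed.
func validRequest(secret []byte, cookieToken, formToken string) bool {
	if subtle.ConstantTimeCompare([]byte(cookieToken), []byte(formToken)) != 1 {
		return false
	}
	return validToken(secret, formToken)
}

func main() {
	secret := []byte("per-server secret, generated at startup")
	token, err := newToken(secret)
	if err != nil {
		panic(err)
	}
	fmt.Println("matching cookie and form:", validRequest(secret, token, token))
	fmt.Println("forged form value:", validRequest(secret, token, "forged"))
}
```

The signature is what distinguishes this from a plain double-submit cookie: an attacker who can set a cookie on the domain still cannot mint a token that verifies under the server's secret.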

Engine Design Review (2026-03-16)

Scope: engines/sshca.md, engines/transit.md, engines/user.md (patched specs)

engines/sshca.md

Strengths

RSA excluded — reduces attack surface, correct for SSH CA use case.
Detailed Go code snippets for Initialize, sign-host, sign-user flows.
Custom KRL implementation: the spec correctly identifies that x/crypto/ssh lacks KRL builders.
Signing profiles are the only path to critical options — good privilege separation.
Server-side serial generation with crypto/rand — no user-controllable serials.

Issues

25. ~~Missing list-certs REST route~~ RESOLVED

Added GET /v1/sshca/{mount}/certs to the REST endpoints table and route registration code block. API sync restored.

26. ~~KRL section type description contradicts pseudocode~~ RESOLVED

Fixed the description block to use KRL_SECTION_CERTIFICATES (0x01) for the outer section type, matching the pseudocode and the OpenSSH PROTOCOL.krl spec.

27. ~~Policy check after certificate construction in sign-host~~ RESOLVED

Reordered both sign-host and sign-user flows to perform the policy check before generating the serial and building the certificate. Serial generation now only happens after authorization succeeds.

engines/transit.md

Strengths

XChaCha20-Poly1305 (not ChaCha20-Poly1305) — correct for random nonce safety.
All nonce sizes, hash algorithms, and signature encodings now specified.
trim-key logic is detailed and safe (no-op when min_decryption_version is 1).
Batch operations hold a read lock for atomicity with respect to key rotation.
500-item batch limit prevents resource exhaustion.

Issues

28. ~~HMAC output not versioned — unverifiable after key rotation~~ RESOLVED

HMAC output now uses the same metacrypt:v{version}:{base64} format as ciphertext and signatures. Verification parses the version prefix, loads the corresponding key (subject to min_decryption_version), and uses hmac.Equal for constant-time comparison.
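
The versioned HMAC format from finding 28 can be sketched as follows. The `metacrypt:v{version}:{base64}` layout and the use of `hmac.Equal` come from the finding; SHA-256, standard base64, and the function names are assumptions.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// computeHMAC tags msg with the given key version.
func computeHMAC(key []byte, version int, msg []byte) string {
	mac := hmac.New(sha256.New, key)
	mac.Write(msg)
	return fmt.Sprintf("metacrypt:v%d:%s", version, base64.StdEncoding.EncodeToString(mac.Sum(nil)))
}

// verifyHMAC parses the version prefix, enforces minVersion, loads the
// matching key from keys, and compares in constant time.
func verifyHMAC(keys map[int][]byte, minVersion int, msg []byte, tag string) (bool, error) {
	parts := strings.SplitN(tag, ":", 3)
	if len(parts) != 3 || parts[0] != "metacrypt" || !strings.HasPrefix(parts[1], "v") {
		return false, errors.New("malformed HMAC value")
	}
	version, err := strconv.Atoi(parts[1][1:])
	if err != nil {
		return false, errors.New("malformed key version")
	}
	if version < minVersion {
		return false, errors.New("key version below min_decryption_version")
	}
	key, ok := keys[version]
	if !ok {
		return false, errors.New("unknown key version")
	}
	want, err := base64.StdEncoding.DecodeString(parts[2])
	if err != nil {
		return false, errors.New("malformed HMAC encoding")
	}
	mac := hmac.New(sha256.New, key)
	mac.Write(msg)
	return hmac.Equal(mac.Sum(nil), want), nil
}

func main() {
	keys := map[int][]byte{1: []byte("old key"), 2: []byte("new key")}
	tag := computeHMAC(keys[1], 1, []byte("hello"))

	// A tag made before rotation still verifies while version 1 is allowed.
	ok, err := verifyHMAC(keys, 1, []byte("hello"), tag)
	fmt.Println(ok, err)
}
```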

29. ~~rewrap policy action not specified~~ RESOLVED

rewrap and batch-rewrap now map to the decrypt action — rewrap internally decrypts and re-encrypts, so the caller must have decrypt permission. Batch variants map to the same action as their single counterparts. Documented in the authorization section.
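
Taken together, findings 18, 20, and 29 describe an operation-to-action table plus an `any` wildcard. The sketch below shows one possible shape. The map name follows the `operationAction` mentioned in the findings, but its contents here are partial, the `ruleAllows` helper is hypothetical, and mapping `update-key-config` to the `admin` action is an assumption.

```go
package main

import "fmt"

// operationAction maps each transit operation to the policy action a caller
// needs. Batch variants share the action of their single counterparts, and
// rewrap requires decrypt because it decrypts internally.
var operationAction = map[string]string{
	"encrypt":           "encrypt",
	"batch-encrypt":     "encrypt",
	"decrypt":           "decrypt",
	"batch-decrypt":     "decrypt",
	"rewrap":            "decrypt",
	"batch-rewrap":      "decrypt",
	"sign":              "sign",
	"verify":            "verify",
	"hmac":              "hmac",
	"update-key-config": "admin", // assumption: admin-only operations map to "admin"
}

// ruleAllows reports whether a rule granting ruleAction covers operation.
// The "any" action matches every action except admin. Unknown operations
// are denied.
func ruleAllows(ruleAction, operation string) bool {
	required, ok := operationAction[operation]
	if !ok {
		return false
	}
	if ruleAction == "any" {
		return required != "admin"
	}
	return ruleAction == required
}

func main() {
	fmt.Println("decrypt rule covers rewrap:", ruleAllows("decrypt", "rewrap"))
	fmt.Println("encrypt rule covers rewrap:", ruleAllows("encrypt", "rewrap"))
	fmt.Println("any rule covers update-key-config:", ruleAllows("any", "update-key-config"))
}
```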

30. ~~max_key_versions interaction with min_decryption_version unclear~~ RESOLVED

Added explicit max_key_versions behavior section. Pruning happens during rotate-key and only deletes versions strictly less than min_decryption_version. If the limit is exceeded but no eligible candidates remain, a warning is returned. This also resolves audit finding #16.

31. ~~Missing get-public-key REST route~~ RESOLVED

Added GET /v1/transit/{mount}/keys/{name}/public-key to the REST endpoints table and route registration code block. API sync restored.

32. ~~exportable flag with no export operation~~ RESOLVED

Removed the exportable flag from create-key. Transit's value proposition is that keys never leave the service. If export is needed for migration, a dedicated admin-only operation can be added later with audit logging.

engines/user.md

Strengths

HKDF with per-recipient random salt — prevents wrapping key reuse across messages.
AES-256-GCM for DEK wrapping (consistent with codebase, avoids new primitive).
ECDH key agreement with info-string binding prevents key confusion.
Explicit zeroization of all intermediate secrets documented.
Envelope format includes salt per-recipient — correct for HKDF security.

Issues

33. ~~Auto-provisioning creates keys for arbitrary usernames~~ RESOLVED

The encrypt flow now validates recipient usernames against MCIAS via auth.ValidateUsername before auto-provisioning. Non-existent usernames are rejected with an error, preventing barrier pollution.

34. ~~No recipient limit on encrypt~~ RESOLVED

Added a maxRecipients = 100 limit. Requests exceeding this limit are rejected with 400 Bad Request before any ECDH computation.

35. ~~No re-encryption support for key rotation~~ RESOLVED

Added a re-encrypt operation that decrypts an envelope and re-encrypts it with current key pairs for all recipients. This enables safe key rotation: re-encrypt all stored envelopes first, then call rotate-key. Added to HandleRequest dispatch, gRPC service, REST endpoints, and route registration.

36. ~~UserKeyConfig type undefined~~ RESOLVED

Defined UserKeyConfig struct with Algorithm, CreatedAt, and AutoProvisioned fields in the in-memory state section.

Cross-Cutting Issues (Engine Designs)

37. ~~adminOnlyOperations name collision blocks user engine rotate-key~~ RESOLVED

Changed the adminOnlyOperations map from flat operation names to engine-type-qualified keys (engineType:operation, e.g. "transit:rotate-key"). The generic endpoint now resolves the mount's engine type via GetMount before checking the map. Added tests verifying that rotate-key on a user mount succeeds for non-admin users while rotate-key on a transit mount correctly requires admin.
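
A minimal sketch of the engine-type-qualified lookup from finding 37. Only `"transit:rotate-key"` is quoted in the finding; the other entries, the `requiresAdmin` helper, and the engine type strings are illustrative assumptions.

```go
package main

import "fmt"

// adminOnlyOperations is keyed by "engineType:operation" so that the same
// operation name can carry different requirements on different engines.
var adminOnlyOperations = map[string]bool{
	"transit:rotate-key":   true,
	"transit:create-key":   true,
	"sshca:create-profile": true,
}

// requiresAdmin is called after the mount's engine type has been resolved.
func requiresAdmin(engineType, operation string) bool {
	return adminOnlyOperations[engineType+":"+operation]
}

func main() {
	fmt.Println("transit rotate-key needs admin:", requiresAdmin("transit", "rotate-key"))
	fmt.Println("user rotate-key needs admin:", requiresAdmin("user", "rotate-key"))
}
```

With flat keys, both lookups above would have hit the same `rotate-key` entry, which is the collision the finding describes.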

38. ~~engine.ZeroizeKey helper prerequisite not cross-referenced~~ RESOLVED

Added prerequisite step to both transit and user implementation steps referencing engines/sshca.md step 1 for the engine.ZeroizeKey shared helper.

Priority Summary

| Priority | Issue | Location |
|---|---|---|
| ~~Critical~~ | ~~#4: Policy auth contradiction (admin vs user)~~ RESOLVED | ARCHITECTURE.md |
| ~~Critical~~ | ~~#9: User-controllable SSH cert serials~~ RESOLVED | sshca.md |
| ~~Critical~~ | ~~#13: Policy path collision (`ca/` vs `sshca/`)~~ RESOLVED | sshca.md |
| ~~Critical~~ | ~~#37: `adminOnlyOperations` name collision blocks user `rotate-key`~~ RESOLVED | Cross-cutting |
| ~~High~~ | ~~#5: No path AAD in barrier encryption~~ RESOLVED | ARCHITECTURE.md |
| ~~High~~ | ~~#12: No KRL distribution for SSH revocation~~ RESOLVED | sshca.md |
| ~~High~~ | ~~#15: No min key version for transit rotation~~ RESOLVED | transit.md |
| ~~High~~ | ~~#17: RSA padding scheme unspecified~~ RESOLVED | transit.md |
| ~~High~~ | ~~#11: `critical_options` not restricted~~ RESOLVED | sshca.md |
| ~~High~~ | ~~#6: Single MEK with no rotation~~ RESOLVED | ARCHITECTURE.md |
| ~~High~~ | ~~#22: No forward secrecy / per-engine DEKs~~ RESOLVED | Cross-cutting |
| ~~High~~ | ~~#28: HMAC output not versioned~~ RESOLVED | transit.md |
| ~~High~~ | ~~#30: `max_key_versions` vs `min_decryption_version` unclear~~ RESOLVED | transit.md |
| ~~High~~ | ~~#33: Auto-provision creates keys for arbitrary usernames~~ RESOLVED | user.md |
| ~~Medium~~ | ~~#2: Token cache revocation gap~~ ACCEPTED | ARCHITECTURE.md |
| ~~Medium~~ | ~~#3: Admin all-or-nothing access~~ ACCEPTED | ARCHITECTURE.md |
| ~~Medium~~ | ~~#8: Unseal rate limit resets on restart~~ ACCEPTED | ARCHITECTURE.md |
| ~~Medium~~ | ~~#20: `decrypt` mapped to `read` action~~ RESOLVED | transit.md |
| ~~Medium~~ | ~~#24: No CSRF protection for web UI~~ RESOLVED | ARCHITECTURE.md |
| ~~Medium~~ | ~~#25: Missing `list-certs` REST route~~ RESOLVED | sshca.md |
| ~~Medium~~ | ~~#26: KRL section type description error~~ RESOLVED | sshca.md |
| ~~Medium~~ | ~~#27: Policy check after cert construction~~ RESOLVED | sshca.md |
| ~~Medium~~ | ~~#29: `rewrap` policy action not specified~~ RESOLVED | transit.md |
| ~~Medium~~ | ~~#31: Missing `get-public-key` REST route~~ RESOLVED | transit.md |
| ~~Medium~~ | ~~#34: No recipient limit on encrypt~~ RESOLVED | user.md |
| ~~Medium~~ | ~~#23: Generic endpoint bypasses typed route middleware~~ RESOLVED | Cross-cutting |
| ~~Low~~ | ~~#1: TLS 1.2 vs 1.3~~ RESOLVED | ARCHITECTURE.md |
| ~~Low~~ | ~~#19: No batch transit operations~~ RESOLVED | transit.md |
| ~~Low~~ | ~~#18: HMAC/sign semantic confusion~~ RESOLVED | transit.md |
| ~~Low~~ | ~~#32: `exportable` flag with no export operation~~ RESOLVED | transit.md |
| ~~Low~~ | ~~#35: No re-encryption support for user key rotation~~ RESOLVED | user.md |
| ~~Low~~ | ~~#36: `UserKeyConfig` type undefined~~ RESOLVED | user.md |
| ~~Low~~ | ~~#38: `ZeroizeKey` prerequisite not cross-referenced~~ RESOLVED | Cross-cutting |