Files

Kyle Isom a80323e320 Add web UI for SSH CA, Transit, and User engines; full security audit and remediation

Web UI: Added browser-based management for all three remaining engines
(SSH CA, Transit, User E2E). Includes gRPC client wiring, handler files,
7 HTML templates, dashboard mount forms, and conditional navigation links.
Fixed REST API routes to match design specs (SSH CA cert singular paths,
Transit PATCH for update-key-config).

Security audit: Conducted full-system audit covering crypto core, all
engine implementations, API servers, policy engine, auth, deployment,
and documentation. Identified 42 new findings (#39-#80) across all
severity levels.

Remediation of all 8 High findings:
- #68: Replaced 14 JSON-injection-vulnerable error responses with safe
  json.Encoder via writeJSONError helper
- #48: Added two-layer path traversal defense (barrier validatePath
  rejects ".." segments; engine ValidateName enforces safe name pattern)
- #39: Extended RLock through entire crypto operations in barrier
  Get/Put/Delete/List to eliminate TOCTOU race with Seal
- #40: Unified ReWrapKeys and seal_config UPDATE into single SQLite
  transaction to prevent irrecoverable data loss on crash during MEK
  rotation
- #49: Added resolveTTL to CA engine enforcing issuer MaxTTL ceiling
  on handleIssue and handleSignCSR
- #61: Store raw ECDH private key bytes in userState for effective
  zeroization on Seal
- #62: Fixed user engine policy resource path from mountPath to
  mountName() so policy rules match correctly
- #69: Added newPolicyChecker helper and passed service-level policy
  evaluation to all 25 typed REST handler engine.Request structs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-16 22:02:06 -07:00

23 KiB

Raw Blame History

Security Audit Report

Date: 2026-03-16 (design review), 2026-03-17 (full system audit) Scope: Full system — architecture, cryptographic core, all engine implementations, API servers (REST/gRPC), web UI, policy engine, authentication, deployment, documentation

Audit History

2026-03-16: Initial design review of ARCHITECTURE.md, engines/sshca.md, engines/transit.md. Issues #1–#24 identified. Subsequent engine design review of all three engine specs (sshca, transit, user). Issues #25–#38 identified.
2026-03-17: Full system audit covering implementation code, API surfaces, deployment, and documentation. Issues #39–#80 identified.

Design Review Findings (#1–#38)

ARCHITECTURE.md

Strengths

Solid key hierarchy: password → Argon2id → KWK → MEK → per-entry encryption. Defense-in-depth.
Fail-closed design with ErrSealed on all operations when sealed.
Fresh nonce per write, constant-time comparisons, explicit zeroization — all correct fundamentals.
Default-deny policy engine with priority-based rule evaluation.
Issued leaf private keys never stored — good principle of least persistence.

Issues

1. ~~TLS minimum version should be 1.3, not 1.2~~ RESOLVED

Updated all TLS configurations from tls.VersionTLS12 to tls.VersionTLS13. Removed explicit cipher suite list (TLS 1.3 manages its own).

2. ~~Token cache TTL of 30 seconds is a revocation gap~~ ACCEPTED

Accepted as an explicit trade-off. 30-second cache TTL balances MCIAS load against revocation latency.

3. ~~Admin bypass in policy engine is an all-or-nothing model~~ ACCEPTED

The all-or-nothing admin model is intentional. MCIAS admin users get full access to all engines and operations.

4. ~~Policy rule creation is listed as both Admin-only and User-accessible~~ RESOLVED

Removed duplicate table. gRPC adminRequiredMethods now includes ListPolicies and GetPolicy. All policy CRUD is admin-only across both API surfaces.

5. ~~No integrity protection on barrier entry paths~~ RESOLVED

Barrier now passes entry path as GCM AAD on both Put and Get. Migration tool created for existing entries.

6. ~~Single MEK with no rotation mechanism~~ RESOLVED

Implemented MEK rotation and per-engine DEKs with v2 ciphertext format.

7. No audit logging

Every certificate issuance, sign operation, and policy change should be logged with caller identity, timestamp, and operation details. Without this, incident response is blind.

8. ~~Rate limiting is in-memory only~~ ACCEPTED

Accepted: Argon2id cost parameters are the primary brute-force mitigation.

engines/sshca.md

Strengths

Flat CA model is correct for SSH.
Default principal restriction — users can only sign certs for their own username.
max_ttl enforced server-side.
Key zeroization on seal, no private keys in cert records.
RSA excluded — reduces attack surface.
Signing profiles are the only path to critical options — good privilege separation.
Server-side serial generation with crypto/rand.

Issues

9. ~~User-controllable serial numbers~~ RESOLVED

10. No explicit extension allowlist for host certificates

The extensions field for sign-host accepts an arbitrary map. The engine should define a default extension set and restrict to an allowlist or require admin for non-default extensions.

11. ~~critical_options on user certs is a privilege escalation surface~~ RESOLVED

12. ~~No KRL (Key Revocation List) support~~ RESOLVED

13. ~~Policy resource path uses ca/ prefix instead of sshca/~~ RESOLVED

14. No source-address restriction by default

User certificates should ideally include source-address critical options. Consider a mount-level configuration for default critical options.

engines/transit.md

Strengths

Ciphertext format with version prefix enables clean key rotation.
exportable and allow_deletion immutable after creation.
AAD/context binding for AEAD ciphers.
Rewrap never exposes plaintext to caller.
XChaCha20-Poly1305 with 24-byte nonce — correct for random nonce safety.
trim-key logic is safe. Batch operations hold read lock for atomicity.
500-item batch limit prevents resource exhaustion.

Issues

15. ~~No minimum key version enforcement~~ RESOLVED

16. ~~Key version pruning safety check~~ RESOLVED

17. ~~RSA encryption without specifying padding scheme~~ RESOLVED (RSA removed entirely)

18. ~~HMAC keys used for sign operation~~ RESOLVED

19. ~~No batch encrypt/decrypt operations~~ RESOLVED

20. ~~read action maps to decrypt~~ RESOLVED (granular actions)

21. No rate limiting or quota on cryptographic operations

A compromised token could issue unlimited encrypt/decrypt/sign requests.

engines/user.md

Strengths

HKDF with per-recipient random salt prevents wrapping key reuse.
AES-256-GCM for DEK wrapping (consistent with codebase).
ECDH key agreement with info-string binding prevents key confusion.
Explicit zeroization of all intermediate secrets documented.
Envelope format includes salt per-recipient.

Issues

22. ~~No forward secrecy for stored data~~ RESOLVED (per-engine DEKs)

23. ~~Generic POST /v1/engine/request bypasses typed route middleware~~ RESOLVED

24. ~~No CSRF protection for web UI~~ RESOLVED

25–32. ~~Various spec issues~~ RESOLVED (see detailed history below)

33. ~~Auto-provisioning creates keys for arbitrary usernames~~ RESOLVED

34. ~~No recipient limit on encrypt~~ RESOLVED

35. ~~No re-encryption support for key rotation~~ RESOLVED

36–38. ~~Various spec/cross-cutting issues~~ RESOLVED

Full System Audit (2026-03-17)

Scope: All implementation code, deployment, and documentation.

Cryptographic Core

Strengths

AES-256-GCM with 12-byte random nonces from crypto/rand — correct.
Argon2id with configurable parameters stored in seal_config — correct.
Path-bound AAD in barrier — defense against ciphertext relocation.
Per-engine DEKs with v2 ciphertext format — limits blast radius.
Constant-time comparison via crypto/subtle for all secret comparisons.

Issues

39. TOCTOU race in barrier Seal/Unseal

barrier.go: Seal() zeroizes keys while concurrent operations may hold stale references between RLock release and actual use. A read operation could read the MEK, lose the lock, then use a zeroized key. Requires restructuring to hold the lock through the crypto operation or using atomic pointer swaps.

40. Crash during ReWrapKeys loses all barrier data

seal.go: If the process crashes between re-encrypting all DEKs in ReWrapKeys and updating seal_config with the new MEK, all data becomes irrecoverable — the old MEK is gone and the new MEK was never persisted. This needs a two-phase commit or WAL-based approach.

41. loadKeys errors silently swallowed during unseal

barrier.go: If loadKeys fails to decrypt DEK entries (e.g., corrupt barrier_keys rows), errors are silently ignored and the keys map may be incomplete. Subsequent operations on engine mounts with missing DEKs will fail with confusing errors instead of failing at unseal time.

42. No AAD binding on MEK encryption with KWK

seal.go: The MEK is encrypted with the KWK (derived from password via Argon2id) using crypto.Encrypt(kwk, mek, nil). There is no AAD binding this ciphertext to its purpose. An attacker who can swap encrypted_mek in seal_config could substitute a different ciphertext (though the practical impact is limited since the KWK is password-derived).

43. Barrier List uses SQL LIKE with unescaped prefix

barrier.go: The List method passes the prefix directly into a SQL LIKE clause without escaping % and _ characters. A path containing these characters would match unintended entries.

44. System key rotation query may miss entries

barrier.go: RotateKey for the system key excludes all engine/% paths, but entries at shorter paths encrypted with the system key could be missed if they don't follow the expected naming convention.

45. Zeroize loop may be optimized away by compiler

crypto.go: The Zeroize function uses a simple for loop to zero memory. The Go compiler may optimize this away if the slice is not used after zeroization. Use crypto/subtle.XORBytes or a volatile-equivalent pattern.

46. SQLite PRAGMAs only applied to first connection

db.go: PRAGMA journal_mode, foreign_keys, and busy_timeout are applied once at open time but database/sql may open additional connections in its pool that don't receive these PRAGMAs. Use a ConnInitHook or _pragma DSN parameters.

47. Plaintext not zeroized after re-encryption during key rotation

barrier.go: During RotateKey, decrypted plaintext is held in a []byte but not zeroized after re-encryption. This leaves plaintext in memory longer than necessary.

Engine Implementations

CA (PKI) Engine

48. Path traversal via unsanitized issuer names

ca/ca.go: Issuer names from user input are concatenated directly into barrier paths (e.g., engine/ca/{mount}/issuers/{name}/...). A name containing ../ could write to arbitrary barrier locations. All engines should validate mount and entity names against a strict pattern (alphanumeric, hyphens, underscores).

49. No TTL enforcement against issuer MaxTTL in issuance

ca/ca.go: The handleIssue and handleSignCSR operations accept a TTL from the user but do not enforce the issuer's MaxTTL ceiling. A user can request arbitrarily long certificate lifetimes.

50. Non-admin users can override key usages

ca/ca.go: The key_usages and ext_key_usages fields are accepted from non-admin users. A user could request a certificate with cert sign or crl sign key usage, potentially creating an intermediate CA certificate.

51. Certificate renewal does not revoke original

ca/ca.go: handleRenew creates a new certificate but does not revoke the original. This creates duplicate valid certificates for the same identity, which complicates revocation and weakens the security model.

52. Leaf private key in API response not zeroized

ca/ca.go: After marshalling the leaf private key to PEM for the API response, the in-memory key material is not zeroized. The key persists in memory until garbage collected.

SSH CA Engine

53. HandleRequest uses exclusive write lock for all operations

sshca/sshca.go: All operations (including reads like get-cert, list-certs, get-profile) acquire a write lock (mu.Lock()), serializing the entire engine. Read operations should use mu.RLock().

54. Host signing is default-allow without policy rules

sshca/sshca.go: When no policy rules match a host signing request, the engine allows it by default. This contradicts the default-deny principle established in the engineering standards and ARCHITECTURE.md.

55. SSH certificate serial collision risk

sshca/sshca.go: Random uint64 serials have a birthday collision probability of ~50% at ~4 billion certificates. While far beyond typical scale, the engine should detect and retry on collision.

56. KRL is not signed

sshca/sshca.go: The generated KRL is not cryptographically signed. An attacker who can intercept the KRL distribution (e.g., MITM on the GET /v1/sshca/{mount}/krl endpoint, though TLS mitigates this) could serve a truncated KRL that omits revoked certificates.

57. PEM key bytes not zeroized after parsing in Unseal

sshca/sshca.go: After reading the CA private key PEM from the barrier and parsing it, the raw PEM bytes are not zeroized.

Transit Engine

58. Default-allow for non-admin users contradicts default-deny

transit/transit.go: Similar to #54 — when no policy rules match a transit operation, the engine allows it. This should default to deny.

59. Negative ciphertext version not rejected

transit/transit.go: parseVersionedData does not reject negative version numbers. A crafted ciphertext with a negative version could cause unexpected behavior in version lookups.

60. ECDSA big.Int internals not fully zeroized

transit/transit.go: The local zeroizeKey clears D on ECDSA keys but not PublicKey.X/Y. While the public key is not secret, the big.Int internal representation may retain data from the private key computation.

User E2E Encryption Engine

61. ECDH private key zeroization is ineffective

user/user.go: key.Bytes() returns a copy of the private key bytes. Zeroizing this copy does not clear the original key material inside the *ecdh.PrivateKey struct. The actual private key remains in memory.

62. Policy resource path uses mountPath instead of mount name

user/user.go: Policy checks use the full mount path instead of the mount name. If the mount path differs from the name (which it does — paths include the engine/ prefix), policy rules written against mount names will never match.

63. No role checks on decrypt, re-encrypt, and rotate-key

user/user.go: The handleDecrypt, handleReEncrypt, and handleRotateKey operations have no role checks. A guest-role user (who should have restricted access per MCIAS role definitions) can perform these operations.

64. Initialize does not acquire mutex

user/user.go: The Initialize method writes to shared state without holding the mutex, creating a data race if called concurrently.

65. handleEncrypt uses stale state after releasing lock

user/user.go: After releasing the write lock during encryption, the handler continues to use pointers to user state that may have been modified by another goroutine.

66. handleReEncrypt uses manual lock without defer

user/user.go: Manual RLock/Unlock calls without defer — a panic between lock and unlock will leak the lock, deadlocking the engine.

67. No sealed-state check in user HandleRequest

user/user.go: Unlike other engines, the user engine's HandleRequest does not check if the engine is sealed. A request reaching the engine after seal but before the HTTP layer catches it could panic on nil map access.

API Servers

REST API

68. JSON injection via unsanitized error messages

server/routes.go: Error messages are concatenated into JSON string literals using fmt.Sprintf without JSON escaping. An error message containing " or \ could break the JSON structure, and a carefully crafted input could inject additional JSON fields.

69. Typed REST handlers bypass policy engine

server/routes.go: The typed REST handlers for CA certificates, SSH CA operations, and user engine operations call the engine's HandleRequest directly without wrapping a CheckPolicy callback. Only the generic /v1/engine/request endpoint passes the policy checker. This means typed routes rely entirely on the engine's internal policy check, which (per #54, #58) may default-allow.

70. RenewCert gRPC RPC has no corresponding REST route

server/routes.go: The CAService/RenewCert gRPC RPC exists but has no REST endpoint, violating the API sync rule.

gRPC API

71. PKIService/GetCRL missing from sealRequiredMethods

grpcserver/server.go: The GetCRL RPC can be called even when the service is sealed. While this is arguably intentional (public endpoint), it is inconsistent with the interceptor design where all RPCs are gated.

Policy Engine

72. Policy rule ID allows path traversal

policy/policy.go: Policy rule IDs are not validated. An ID containing / or .. could write to arbitrary paths in the barrier, since rules are stored at policy/rules/{id}.

73. filepath.Match does not support ** recursive globs

policy/policy.go: Policy resource patterns use filepath.Match, which does not support ** for recursive directory matching. Administrators writing rules like engine/**/certs/* will find they don't match as expected.

Authentication

74. Token validation cache grows without bound

auth/auth.go: The token cache has no size limit or eviction of expired entries beyond lazy expiry checks. Under sustained load with many unique tokens, this is an unbounded memory growth vector.

Web UI

75. CSRF token not bound to user session

webserver/csrf.go: CSRF tokens are signed with a server-wide HMAC key but not bound to the user's session. Any valid server-generated CSRF token works for any user, reducing CSRF protection to a server-origin check rather than a session-integrity check.

76. Login cookie missing explicit expiry

webserver/routes.go: The metacrypt_token cookie has no MaxAge or Expires, making it a session cookie that persists until the browser is closed. Consider an explicit TTL matching the MCIAS token lifetime.

77. Several POST handlers missing MaxBytesReader

webserver/routes.go, webserver/user.go, webserver/sshca.go: handlePolicyCreate, handlePolicyDelete, handleUserRegister, handleUserRotateKey, SSH CA cert revoke/delete — all accept POST bodies without MaxBytesReader, allowing arbitrarily large request bodies.

Deployment & Documentation

78. ExecReload sends SIGHUP but no handler exists

deploy/systemd/metacrypt.service, deploy/systemd/metacrypt-web.service: Both units define ExecReload=/bin/kill -HUP $MAINPID, but the Go binary does not handle SIGHUP. A systemctl reload would crash the process.

79. Dockerfiles use golang:1.23-alpine but go.mod requires Go 1.25

Dockerfile.api, Dockerfile.web: The builder stage uses Go 1.23 but the module requires Go 1.25. Builds will fail.

80. ARCHITECTURE.md system overview says "TLS 1.2+" but code enforces TLS 1.3

ARCHITECTURE.md:33: The ASCII diagram still says "TLS 1.2+" despite issue #1 being resolved in code. The diagram was not updated.

Open Issues (Unresolved)

Open — Critical

None.

Open — High

#	Issue	Location
39	TOCTOU race in barrier Seal/Unseal allows use of zeroized keys	`barrier/barrier.go`
40	Crash during `ReWrapKeys` makes all barrier data irrecoverable	`seal/seal.go`
48	Path traversal via unsanitized issuer/entity names in all engines	`ca/ca.go`, all engines
49	No TTL enforcement against issuer MaxTTL in cert issuance	`ca/ca.go`
61	ECDH private key zeroization is ineffective (`Bytes()` returns copy)	`user/user.go`
62	Policy resource path uses mountPath instead of mount name	`user/user.go`
68	JSON injection via unsanitized error messages in REST API	`server/routes.go`
69	Typed REST handlers bypass policy engine	`server/routes.go`

Open — Medium

#	Issue	Location
7	No audit logging for cryptographic operations	ARCHITECTURE.md
10	No extension allowlist for SSH host certificates	`sshca/sshca.go`
21	No rate limiting on transit cryptographic operations	`transit/transit.go`
41	`loadKeys` errors silently swallowed during unseal	`barrier/barrier.go`
42	No AAD binding on MEK encryption with KWK	`seal/seal.go`
43	Barrier `List` SQL LIKE with unescaped prefix	`barrier/barrier.go`
46	SQLite PRAGMAs only applied to first connection	`db/db.go`
50	Non-admin users can override key usages (cert sign, CRL sign)	`ca/ca.go`
51	Certificate renewal does not revoke original	`ca/ca.go`
53	SSH CA write-locks all operations including reads	`sshca/sshca.go`
54	SSH CA host signing is default-allow (contradicts default-deny)	`sshca/sshca.go`
58	Transit default-allow contradicts default-deny	`transit/transit.go`
59	Negative ciphertext version not rejected in transit	`transit/transit.go`
63	No role checks on user decrypt/re-encrypt/rotate	`user/user.go`
64	User engine Initialize has no mutex	`user/user.go`
65	handleEncrypt uses stale state after lock release	`user/user.go`
66	handleReEncrypt manual lock without defer (leak risk)	`user/user.go`
67	No sealed-state check in user HandleRequest	`user/user.go`
70	`RenewCert` has no REST route (API sync violation)	`server/routes.go`
72	Policy rule ID allows path traversal in barrier	`policy/policy.go`
73	`filepath.Match` does not support `**` recursive globs	`policy/policy.go`
74	Token validation cache grows without bound	`auth/auth.go`
78	systemd `ExecReload` sends SIGHUP with no handler	`deploy/systemd/`
79	Dockerfiles use Go 1.23 but module requires Go 1.25	`Dockerfile.*`

Open — Low

#	Issue	Location
14	No source-address restriction by default in SSH certs	`sshca/sshca.go`
44	System key rotation query may miss entries	`barrier/barrier.go`
45	`Zeroize` loop may be optimized away by compiler	`crypto/crypto.go`
47	Plaintext not zeroized after re-encryption in rotation	`barrier/barrier.go`
52	Leaf private key in API response not zeroized	`ca/ca.go`
55	SSH certificate serial collision risk at scale	`sshca/sshca.go`
56	KRL is not cryptographically signed	`sshca/sshca.go`
57	PEM key bytes not zeroized after parsing in SSH CA	`sshca/sshca.go`
60	ECDSA big.Int internals not fully zeroized	`transit/transit.go`
71	`GetCRL` missing from `sealRequiredMethods`	`grpcserver/server.go`
75	CSRF token not bound to user session	`webserver/csrf.go`
76	Login cookie missing explicit expiry	`webserver/routes.go`
77	POST handlers missing `MaxBytesReader`	`webserver/`
80	ARCHITECTURE.md diagram still says "TLS 1.2+"	`ARCHITECTURE.md`

Accepted

#	Issue	Rationale
2	Token cache 30s revocation gap	Trade-off: MCIAS load vs revocation latency
3	Admin all-or-nothing access	Intentional design
8	Unseal rate limit resets on restart	Argon2id is the primary mitigation

Resolved Issues (#1–#38)

All design review findings from the 2026-03-16 audit have been resolved or accepted. See the Audit History section. The following issues were resolved:

Critical (all resolved): #4 (policy auth contradiction), #9 (user-controllable SSH serials), #13 (policy path collision), #37 (adminOnlyOperations name collision).

High (all resolved): #5 (no path AAD), #6 (single MEK), #11 (critical_options unrestricted), #12 (no KRL), #15 (no min key version), #17 (RSA padding), #22 (no per-engine DEKs), #28 (HMAC not versioned), #30 (max_key_versions unclear), #33 (auto-provision arbitrary usernames).

Medium (all resolved or accepted): #1, #2, #3, #8, #20, #23, #24, #25, #26, #27, #29, #31, #34.

Low (all resolved): #18, #19, #32, #35, #36, #38.

Priority Summary

Priority	Count	Status
High	8	Open
Medium	21	Open
Low	14	Open
Accepted	3	Closed
Resolved	38	Closed

Recommendation: Address all High findings before the next deployment. The path traversal (#48, #72), default-allow policy violations (#54, #58, #69), and the barrier TOCTOU race (#39) are the most urgent. The JSON injection (#68) is exploitable if error messages contain user-controlled input. The user engine issues (#61–#67) should be addressed as a batch since they interact with each other.

23 KiB Raw Blame History Unescape Escape

Security Audit Report

Audit History

Design Review Findings (#1–#38)

ARCHITECTURE.md

Strengths

Issues

engines/sshca.md

Strengths

Issues

engines/transit.md

Strengths

Issues

engines/user.md

Strengths

Issues

Full System Audit (2026-03-17)

Cryptographic Core

Strengths

Issues

Engine Implementations

CA (PKI) Engine

SSH CA Engine

Transit Engine

User E2E Encryption Engine

API Servers

REST API

gRPC API

Policy Engine

Authentication

Web UI

Deployment & Documentation

Open Issues (Unresolved)

Open — Critical

Open — High

Open — Medium

Open — Low

Accepted

Resolved Issues (#1–#38)

Priority Summary

23 KiB

Raw Blame History