# Security Audit Report

**Date**: 2026-03-16
**Scope**: ARCHITECTURE.md, engines/sshca.md, engines/transit.md

---

## ARCHITECTURE.md

### Strengths

- Solid key hierarchy: password → Argon2id → KWK → MEK → per-entry encryption. Defense-in-depth.
- Fail-closed design with `ErrSealed` on all operations when sealed.
- Fresh nonce per write, constant-time comparisons, explicit zeroization — all correct fundamentals.
- Default-deny policy engine with priority-based rule evaluation.
- Issued leaf private keys never stored — good principle of least persistence.

### Issues

**1. ~~TLS minimum version should be 1.3, not 1.2~~ RESOLVED**

Updated all TLS configurations (HTTP server, gRPC server, web server, vault client, Go client library, CLI commands) from `tls.VersionTLS12` to `tls.VersionTLS13`. Removed explicit cipher suite list from HTTP server (TLS 1.3 manages its own). Updated ARCHITECTURE.md TLS section and threat mitigations table.
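As an illustration of the change described above, a minimal sketch of a TLS 1.3-only configuration follows. The helper name `newServerTLSConfig` is hypothetical and is not taken from the codebase; only `tls.Config`, `MinVersion`, and `tls.VersionTLS13` come from Go's standard library.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// newServerTLSConfig (hypothetical helper) returns a configuration that
// refuses any protocol version older than TLS 1.3. No CipherSuites list is
// set: Go does not make TLS 1.3 cipher suites configurable, so an explicit
// list would have no effect on 1.3 connections.
func newServerTLSConfig() *tls.Config {
	return &tls.Config{
		MinVersion: tls.VersionTLS13,
	}
}

func main() {
	cfg := newServerTLSConfig()
	fmt.Println("TLS 1.3 is the minimum:", cfg.MinVersion == tls.VersionTLS13)
}
```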

**2. ~~Token cache TTL of 30 seconds is a revocation gap~~ ACCEPTED**

Accepted as an explicit trade-off. The 30-second cache TTL balances MCIAS load against revocation latency. For this system's scale and threat model, the window is acceptable.

**3. ~~Admin bypass in policy engine is an all-or-nothing model~~ ACCEPTED**

The all-or-nothing admin model is intentional by design. MCIAS admin users get full access to all engines and operations. This is the desired behavior for this system.

**4. ~~Policy rule creation is listed as both Admin-only and User-accessible~~ RESOLVED**

The second policy table in ARCHITECTURE.md incorrectly listed User auth; removed the duplicate. gRPC `adminRequiredMethods` now includes `ListPolicies` and `GetPolicy` to match REST behavior. All policy CRUD is admin-only across both API surfaces.

**5. ~~No integrity protection on barrier entry paths~~ RESOLVED**

Updated `crypto.Encrypt`/`crypto.Decrypt` to accept an `additionalData` parameter. The barrier now passes the entry path as GCM AAD on both `Put` and `Get`, binding each ciphertext to its storage path. Seal operations pass `nil` (no path context). Added `TestEncryptDecryptWithAAD` covering correct-AAD, wrong-AAD, and nil-AAD cases. Existing barrier entries will fail to decrypt after this change, so a one-off migration tool is needed to re-encrypt all entries (decrypt with nil AAD under the old code, then re-encrypt with the path as AAD).
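The path binding can be sketched with Go's standard AES-GCM as shown below. This is not the project's `crypto` package: the function names, the `nonce || ciphertext` layout, the 32-byte key, and the example paths are all assumptions made for illustration. `migrateEntry` mirrors the one-off migration described above.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encrypt seals plaintext and authenticates additionalData alongside it.
// Assumed layout for this sketch: nonce || ciphertext+tag.
func encrypt(key, plaintext, additionalData []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, additionalData), nil
}

// decrypt fails unless additionalData matches what was passed to encrypt.
func decrypt(key, sealed, additionalData []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, body := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, body, additionalData)
}

// migrateEntry re-binds a legacy entry (sealed with nil AAD) to its path.
func migrateEntry(key []byte, path string, old []byte) ([]byte, error) {
	plaintext, err := decrypt(key, old, nil)
	if err != nil {
		return nil, fmt.Errorf("decrypt legacy entry %q: %w", path, err)
	}
	return encrypt(key, plaintext, []byte(path))
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		fmt.Println("error:", err)
		return
	}
	path := "engine/ca/prod/config" // hypothetical storage path
	sealed, err := encrypt(key, []byte("secret"), []byte(path))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	_, err = decrypt(key, sealed, []byte("engine/ca/other/config"))
	fmt.Println("decrypt under a different path fails:", err != nil)
}
```

Because the path is authenticated rather than stored, an attacker who copies a ciphertext to a different path in the database produces an entry that no longer decrypts.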

**6. ~~Single MEK with no rotation mechanism~~ RESOLVED**

Implemented MEK rotation and per-engine DEKs. The v2 ciphertext format (`0x02`) embeds a key ID that identifies which DEK encrypted each entry. MEK rotation (`POST /v1/barrier/rotate-mek`) re-wraps all DEKs without re-encrypting data. DEK rotation (`POST /v1/barrier/rotate-key`) re-encrypts entries under a specific key. A migration endpoint converts v1 entries to v2 format. The `barrier_keys` table stores MEK-wrapped DEKs with version tracking.
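To show why MEK rotation does not need to touch entry data, here is a rough sketch of re-wrapping DEKs. The helper names, the in-memory map standing in for the `barrier_keys` table, and the AES-GCM `nonce || ciphertext` wrapping layout are assumptions for illustration, not the actual implementation.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal wraps data under key (assumed layout: nonce || ciphertext+tag).
func seal(key, data []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, data, nil), nil
}

// open reverses seal.
func open(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("wrapped key too short")
	}
	n := gcm.NonceSize()
	return gcm.Open(nil, sealed[:n], sealed[n:], nil)
}

// rewrapDEKs unwraps each DEK with oldMEK and wraps it under newMEK.
// Entry ciphertexts are never read or rewritten; only the wrapped keys change.
func rewrapDEKs(oldMEK, newMEK []byte, wrapped map[string][]byte) (map[string][]byte, error) {
	out := make(map[string][]byte, len(wrapped))
	for id, w := range wrapped {
		dek, err := open(oldMEK, w)
		if err != nil {
			return nil, fmt.Errorf("unwrap DEK %q: %w", id, err)
		}
		rewrapped, err := seal(newMEK, dek)
		if err != nil {
			return nil, fmt.Errorf("wrap DEK %q: %w", id, err)
		}
		out[id] = rewrapped
	}
	return out, nil
}

func main() {
	keys := make([][]byte, 3) // old MEK, new MEK, one DEK
	for i := range keys {
		keys[i] = make([]byte, 32)
		if _, err := rand.Read(keys[i]); err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	wrapped, err := seal(keys[0], keys[2])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	out, err := rewrapDEKs(keys[0], keys[1], map[string][]byte{"system": wrapped})
	fmt.Println("rewrapped keys:", len(out), "error:", err)
}
```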

**7. No audit logging**

Acknowledged as future work, but for a cryptographic service this is a significant gap. Every certificate issuance, every sign operation, every policy change should be logged with caller identity, timestamp, and operation details. Without this, incident response is blind.
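One possible shape for such a record is sketched below as a JSON line per operation. The `AuditEvent` type, its field names, and the example values are hypothetical; the design documents do not define an audit schema yet.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// AuditEvent is a hypothetical record carrying the fields named above:
// caller identity, timestamp, and operation details.
type AuditEvent struct {
	Time      time.Time `json:"time"`
	Caller    string    `json:"caller"`
	Operation string    `json:"operation"`
	Resource  string    `json:"resource"`
	Outcome   string    `json:"outcome"`
}

// auditLine renders one event as a single JSON line, stamped in UTC.
func auditLine(caller, operation, resource, outcome string) (string, error) {
	ev := AuditEvent{
		Time:      time.Now().UTC(),
		Caller:    caller,
		Operation: operation,
		Resource:  resource,
		Outcome:   outcome,
	}
	b, err := json.Marshal(ev)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// parseAuditLine decodes a line produced by auditLine, as an incident
// responder's tooling might.
func parseAuditLine(line string) (AuditEvent, error) {
	var ev AuditEvent
	err := json.Unmarshal([]byte(line), &ev)
	return ev, err
}

func main() {
	line, err := auditLine("alice", "sign-user", "sshca/prod/id/alice", "allowed")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(line)
}
```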

**8. ~~Rate limiting is in-memory only~~ ACCEPTED**

The in-memory rate limit protects against remote brute-force over the network, which is the realistic threat. Persisting the counter in the database would not add tamper resistance: the barrier is sealed during unseal attempts so encrypted storage is unavailable, and the unencrypted database could be reset by an attacker with disk access. An attacker who can restart the service already has local system access, making the rate limit moot regardless of persistence. The Argon2id cost parameters (128 MiB memory cost) are the primary brute-force mitigation and are stored in `seal_config`.

---

## engines/sshca.md

### Strengths

- Flat CA model is correct for SSH (no intermediate hierarchy needed).
- Default principal restriction (users can only sign certs for their own username) is the right default.
- `max_ttl` enforced server-side — good.
- Key zeroization on seal, no private keys in cert records.

### Issues

**9. ~~User-controllable serial numbers~~ RESOLVED**

Removed the optional `serial` field from both `sign-host` and `sign-user` request data. Serials are always generated server-side using `crypto/rand` (64-bit). Updated flows and security considerations in sshca.md.
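Server-side serial generation of this kind can be sketched in a few lines. The function name `newSerial` is an assumption; `crypto/rand` and the 64-bit width come from the resolution above.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// newSerial draws a 64-bit certificate serial from the operating system's
// CSPRNG. The caller never supplies or influences the value.
func newSerial() (uint64, error) {
	var buf [8]byte
	if _, err := rand.Read(buf[:]); err != nil {
		return 0, fmt.Errorf("generate serial: %w", err)
	}
	return binary.BigEndian.Uint64(buf[:]), nil
}

func main() {
	serial, err := newSerial()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("serial: %d\n", serial)
}
```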

**10. No explicit extension allowlist for host certificates**

The `extensions` field for `sign-host` accepts an arbitrary map. SSH extensions have security implications (e.g., `permit-pty`, `permit-port-forwarding`, `permit-user-rc`). Without an allowlist, a user could request extensions that grant more capabilities than intended. The engine should define a default extension set and either:
- Restrict to an allowlist, or
- Require admin for non-default extensions.
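The second option could look roughly like the sketch below. The contents of `defaultExtensions` and the function name are hypothetical; the engine would need to decide which extensions belong in the default set. Calling the check with `isAdmin` fixed to `false` gives the stricter first option.

```go
package main

import (
	"fmt"
	"sort"
)

// defaultExtensions is a hypothetical default set; the real list is a
// design decision for the engine.
var defaultExtensions = map[string]bool{
	"permit-pty": true,
}

// checkExtensions rejects any requested extension outside the default set
// unless the caller is an admin.
func checkExtensions(requested map[string]string, isAdmin bool) error {
	var rejected []string
	for name := range requested {
		if !defaultExtensions[name] && !isAdmin {
			rejected = append(rejected, name)
		}
	}
	if len(rejected) > 0 {
		sort.Strings(rejected) // stable error text regardless of map order
		return fmt.Errorf("extensions not permitted: %v", rejected)
	}
	return nil
}

func main() {
	req := map[string]string{"permit-pty": "", "permit-port-forwarding": ""}
	fmt.Println("user: ", checkExtensions(req, false))
	fmt.Println("admin:", checkExtensions(req, true))
}
```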

**11. ~~`critical_options` on user certs is a privilege escalation surface~~ RESOLVED**

Removed `critical_options` from the `sign-user` request. Critical options can only be applied via admin-defined signing profiles, which are policy-gated (`sshca/{mount}/profile/{name}`, action `read`). Profile CRUD is admin-only. Profiles specify critical options, extensions, optional max TTL, and optional principal restrictions. Security considerations updated accordingly.

**12. ~~No KRL (Key Revocation List) support~~ RESOLVED**

Added a full KRL section to sshca.md covering: in-memory KRL generation from revoked serials, barrier persistence at `engine/sshca/{mount}/krl.bin`, automatic rebuild on revoke/delete/unseal, a public `GET /v1/sshca/{mount}/krl` endpoint with ETag and Cache-Control headers, `GetKRL` gRPC RPC, and a pull-based distribution model with example sshd_config and cron fetch.
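A rough sketch of how a KRL endpoint might serve ETag and Cache-Control headers follows. The handler name, the SHA-256-based ETag, the `max-age` value, and the mount name in the request path are assumptions; the exact-match `If-None-Match` comparison is a simplification of the full HTTP rules.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// krlHandler serves a prebuilt KRL blob. The ETag is derived from the
// content, so it changes whenever the KRL is rebuilt.
func krlHandler(krl []byte) http.Handler {
	sum := sha256.Sum256(krl)
	etag := `"` + hex.EncodeToString(sum[:]) + `"`
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("ETag", etag)
		w.Header().Set("Cache-Control", "public, max-age=60") // assumed value
		if r.Header.Get("If-None-Match") == etag {
			w.WriteHeader(http.StatusNotModified)
			return
		}
		w.Header().Set("Content-Type", "application/octet-stream")
		w.Write(krl)
	})
}

// fetch performs one in-process request and returns the status and ETag.
func fetch(h http.Handler, ifNoneMatch string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, "/v1/sshca/prod/krl", nil)
	if ifNoneMatch != "" {
		req.Header.Set("If-None-Match", ifNoneMatch)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	res := rec.Result()
	return res.StatusCode, res.Header.Get("ETag")
}

func main() {
	h := krlHandler([]byte("example KRL bytes"))
	status, etag := fetch(h, "")
	fmt.Println("first fetch:", status)
	status, _ = fetch(h, etag)
	fmt.Println("conditional fetch:", status)
}
```

With this shape, a cron job that polls the endpoint only downloads the body when the revocation set has actually changed.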

**13. ~~Policy resource path uses `ca/` prefix instead of `sshca/`~~ RESOLVED**

Updated policy check paths in sshca.md from `ca/{mount}/id/...` to `sshca/{mount}/id/...` for both `sign-host` and `sign-user` flows, eliminating the namespace collision with the CA (PKI) engine.

**14. No source-address restriction by default**

User certificates should ideally include `source-address` critical options to limit where they can be used from. At minimum, consider a mount-level configuration for default critical options that get applied to all user certs.
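A mount-level default could be merged into each certificate along the lines of the sketch below. The function name, the example CIDR, and the rule that a signing profile overrides the mount default are all assumptions; the opposite precedence may be the safer choice.

```go
package main

import "fmt"

// mergeCriticalOptions starts from the mount-level defaults and lets the
// signing profile override or extend them. The precedence is an assumption
// of this sketch.
func mergeCriticalOptions(mountDefaults, profile map[string]string) map[string]string {
	merged := make(map[string]string, len(mountDefaults)+len(profile))
	for name, value := range mountDefaults {
		merged[name] = value
	}
	for name, value := range profile {
		merged[name] = value
	}
	return merged
}

func main() {
	defaults := map[string]string{"source-address": "10.0.0.0/8"} // example value
	profile := map[string]string{"force-command": "/usr/bin/true"}
	fmt.Println(mergeCriticalOptions(defaults, profile))
}
```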

---

## engines/transit.md

### Strengths

- Ciphertext format with version prefix enables clean key rotation.
- `exportable` and `allow_deletion` immutable after creation — prevents policy weakening.
- AAD/context binding for AEAD ciphers.
- Rewrap never exposes plaintext to caller.

### Issues

**15. ~~No minimum key version enforcement~~ RESOLVED**

Added `min_decryption_version` per key (default 1). Decryption requests for versions below the minimum are rejected. A new admin-only `update-key-config` operation advances the minimum; it can only increase and cannot exceed the current version. A new `trim-key` operation permanently deletes versions older than the minimum. Both have corresponding gRPC RPCs and REST endpoints. The documented rotation cycle is: rotate → rewrap → advance min → trim.
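The two invariants on the minimum version can be sketched as follows. The `keyState` type and its method names are hypothetical stand-ins for the engine's key metadata.

```go
package main

import (
	"errors"
	"fmt"
)

// keyState is a hypothetical view of one transit key's version metadata.
type keyState struct {
	currentVersion       int
	minDecryptionVersion int
}

// advanceMin enforces both rules: the minimum never decreases and never
// exceeds the current version.
func (k *keyState) advanceMin(newMin int) error {
	if newMin < k.minDecryptionVersion {
		return errors.New("min_decryption_version can only increase")
	}
	if newMin > k.currentVersion {
		return errors.New("min_decryption_version cannot exceed the current version")
	}
	k.minDecryptionVersion = newMin
	return nil
}

// checkDecrypt rejects ciphertext versions below the minimum.
func (k *keyState) checkDecrypt(version int) error {
	if version < k.minDecryptionVersion {
		return fmt.Errorf("key version %d is below the minimum %d", version, k.minDecryptionVersion)
	}
	if version > k.currentVersion {
		return fmt.Errorf("key version %d does not exist", version)
	}
	return nil
}

func main() {
	k := &keyState{currentVersion: 3, minDecryptionVersion: 1}
	fmt.Println("decrypt v1 before:", k.checkDecrypt(1))
	fmt.Println("advance to 2:", k.advanceMin(2))
	fmt.Println("decrypt v1 after:", k.checkDecrypt(1))
}
```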

**16. Key version pruning with `max_key_versions` has no safety check**

If `max_key_versions` is set and data encrypted under an old version has not been rewrapped, pruning that version makes the data permanently unrecoverable. At least one of the following safeguards should exist:
- A warning/confirmation mechanism, or
- A way to scan for ciphertext referencing a version before pruning, or
- At minimum, clear documentation that pruning is destructive.

**17. ~~RSA encryption without specifying padding scheme~~ RESOLVED**

RSA key types (`rsa-2048`, `rsa-4096`) removed entirely from the transit engine. Asymmetric encryption belongs in the user engine (via ECDH); RSA signing offers no advantage over Ed25519/ECDSA. `crypto/rsa` removed from dependencies. Rationale documented in key types section and security considerations.

**18. ~~Using HMAC keys for the `sign` operation is confusing~~ RESOLVED**

`sign` and `verify` are now restricted to asymmetric key types (Ed25519, ECDSA). HMAC keys are rejected with an error — HMAC must use the dedicated `hmac` operation. Policy actions are already split: `sign`, `verify`, and `hmac` are separate granular actions, all matched by `any`.

**19. ~~No batch encrypt/decrypt operations~~ RESOLVED**

Added `batch-encrypt`, `batch-decrypt`, and `batch-rewrap` operations to the transit engine plan. Each targets a single named key with an array of items; results are returned in order with per-item errors (partial success model). An optional `reference` field lets callers correlate results with source records. Policy is checked once per batch. Added corresponding gRPC RPCs and REST endpoints. `operationAction` maps batch variants to the same granular actions as their single counterparts.
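The partial-success model might be represented as in the sketch below. The type names, field names, and the `fakeEncrypt` stand-in are assumptions; a real implementation would call the key's cipher instead.

```go
package main

import (
	"errors"
	"fmt"
)

// BatchItem and BatchResult are hypothetical request and response shapes.
type BatchItem struct {
	Reference string // optional caller-supplied correlation value
	Plaintext string
}

type BatchResult struct {
	Reference  string
	Ciphertext string
	Error      string // set only when this item failed
}

// batchEncrypt processes every item, preserving input order. One failing
// item does not abort the rest of the batch.
func batchEncrypt(items []BatchItem, encrypt func(string) (string, error)) []BatchResult {
	results := make([]BatchResult, len(items))
	for i, item := range items {
		results[i].Reference = item.Reference
		ct, err := encrypt(item.Plaintext)
		if err != nil {
			results[i].Error = err.Error()
			continue
		}
		results[i].Ciphertext = ct
	}
	return results
}

// fakeEncrypt stands in for the real cipher and fails on empty input.
func fakeEncrypt(plaintext string) (string, error) {
	if plaintext == "" {
		return "", errors.New("empty plaintext")
	}
	return "enc(" + plaintext + ")", nil
}

func main() {
	items := []BatchItem{
		{Reference: "row-1", Plaintext: "a"},
		{Reference: "row-2", Plaintext: ""},
		{Reference: "row-3", Plaintext: "c"},
	}
	for _, r := range batchEncrypt(items, fakeEncrypt) {
		fmt.Printf("%s ciphertext=%q error=%q\n", r.Reference, r.Ciphertext, r.Error)
	}
}
```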

**20. ~~`read` action maps to `decrypt` and `verify` — semantics are misleading~~ RESOLVED**

Replaced the coarse `read`/`write` action model with granular per-operation actions: `encrypt`, `decrypt`, `sign`, `verify`, `hmac` for cryptographic operations; `read` for metadata retrieval; `write` for key management; `admin` for administrative operations. Added `any` action that matches all non-admin actions. Added `LintRule` validation that rejects unknown effects and actions. `CreateRule` now validates before storing. Updated `operationAction` mapping and all tests.
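The matching and validation rules described above can be sketched as follows. The action names come from the text; the effect names `allow` and `deny`, and the function names, are assumptions that may differ from the real `LintRule`.

```go
package main

import "fmt"

// knownActions lists the granular actions named in the resolution.
var knownActions = map[string]bool{
	"encrypt": true, "decrypt": true, "sign": true, "verify": true, "hmac": true,
	"read": true, "write": true, "admin": true, "any": true,
}

// knownEffects is an assumption; the source does not name the effects.
var knownEffects = map[string]bool{"allow": true, "deny": true}

// lintRule rejects rules that use an unknown effect or action, so a typo
// cannot silently create a rule that never matches.
func lintRule(effect string, actions []string) error {
	if !knownEffects[effect] {
		return fmt.Errorf("unknown effect %q", effect)
	}
	for _, action := range actions {
		if !knownActions[action] {
			return fmt.Errorf("unknown action %q", action)
		}
	}
	return nil
}

// actionMatches reports whether a rule action covers the requested action.
// "any" covers every action except "admin".
func actionMatches(ruleAction, requested string) bool {
	if ruleAction == "any" {
		return requested != "admin"
	}
	return ruleAction == requested
}

func main() {
	fmt.Println(lintRule("allow", []string{"encrypt", "decrypt"}))
	fmt.Println(lintRule("allow", []string{"decrpyt"}))
	fmt.Println(actionMatches("any", "hmac"), actionMatches("any", "admin"))
}
```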

**21. No rate limiting or quota on cryptographic operations**

A compromised or malicious user token could issue unlimited encrypt/decrypt/sign requests, potentially using the service as a cryptographic oracle. Consider per-user rate limits on transit operations.
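A per-user limit could start as simply as the fixed-window counter sketched below. The type, the limit, and the window length are hypothetical; a production limiter would also need eviction of idle users and probably a smoother algorithm such as a token bucket.

```go
package main

import (
	"fmt"
	"sync"
)

type window struct {
	start int64 // Unix seconds when the window opened
	count int
}

// userLimiter is a hypothetical fixed-window limiter keyed by username.
type userLimiter struct {
	mu         sync.Mutex
	limit      int
	windowSecs int64
	windows    map[string]*window
}

func newUserLimiter(limit int, windowSecs int64) *userLimiter {
	return &userLimiter{limit: limit, windowSecs: windowSecs, windows: make(map[string]*window)}
}

// allow records one operation for user at nowUnix and reports whether it
// fits within the user's budget for the current window.
func (l *userLimiter) allow(user string, nowUnix int64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.limit <= 0 {
		return false
	}
	w, ok := l.windows[user]
	if !ok || nowUnix-w.start >= l.windowSecs {
		l.windows[user] = &window{start: nowUnix, count: 1}
		return true
	}
	if w.count >= l.limit {
		return false
	}
	w.count++
	return true
}

func main() {
	limiter := newUserLimiter(2, 60)
	for i := 0; i < 3; i++ {
		fmt.Println("alice request", i+1, "allowed:", limiter.allow("alice", 1000))
	}
}
```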

---

## Cross-Cutting Issues

**22. ~~No forward secrecy for stored data~~ RESOLVED**: Per-engine DEKs limit blast radius — compromise of one DEK only exposes that engine's data, not the entire barrier. MEK compromise still exposes all DEKs, but MEK rotation enables periodic re-keying. Each engine mount gets its own DEK created automatically; a `"system"` DEK protects non-engine data. v2 ciphertext format embeds key IDs for DEK lookup.

**23. ~~Generic `POST /v1/engine/request` bypasses typed route middleware~~ RESOLVED**: Added an `adminOnlyOperations` map to `handleEngineRequest` that mirrors the admin gates on typed REST routes (e.g. `create-issuer`, `delete-cert`, `create-key`, `rotate-key`, `create-profile`, `provision`). Non-admin users are rejected with 403 before policy evaluation or engine dispatch. The v1 gRPC `Execute` RPC is defined in the proto but not registered in the server — only v2 typed RPCs are used, so the gRPC surface is not affected. Tests cover both admin and non-admin paths through the generic endpoint.
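The gate on the generic endpoint can be pictured as in the sketch below. The operation names are the examples listed above; the function name and the status-code return value are assumptions, and the real map presumably contains more entries.

```go
package main

import (
	"fmt"
	"net/http"
)

// adminOnlyOperations holds the example operations named in the resolution.
var adminOnlyOperations = map[string]bool{
	"create-issuer":  true,
	"delete-cert":    true,
	"create-key":     true,
	"rotate-key":     true,
	"create-profile": true,
	"provision":      true,
}

// gateGenericRequest (hypothetical) returns the status a request should
// receive before policy evaluation or engine dispatch takes place.
func gateGenericRequest(operation string, isAdmin bool) int {
	if adminOnlyOperations[operation] && !isAdmin {
		return http.StatusForbidden
	}
	return http.StatusOK
}

func main() {
	fmt.Println("user create-key:", gateGenericRequest("create-key", false))
	fmt.Println("admin create-key:", gateGenericRequest("create-key", true))
	fmt.Println("user encrypt:", gateGenericRequest("encrypt", false))
}
```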

**24. ~~No CSRF protection mentioned for web UI~~ RESOLVED**: Added signed double-submit cookie CSRF protection. A per-server HMAC secret signs random nonce-based tokens. Every form includes a `{{csrfField}}` hidden input; a middleware validates that the form field matches the cookie and has a valid HMAC signature on all POST/PUT/PATCH/DELETE requests. Session cookie upgraded from `SameSite=Lax` to `SameSite=Strict`. CSRF cookie is also `HttpOnly`, `Secure`, `SameSite=Strict`. Tests cover token generation/validation, cross-secret rejection, middleware pass/block/mismatch scenarios.
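A signed double-submit token of this kind can be sketched with the standard library as shown below. The token encoding (`nonce.signature` in unpadded base64url), the 32-byte nonce, and the function names are assumptions rather than the web server's actual format.

```go
package main

import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"fmt"
	"strings"
)

// sign computes HMAC-SHA256 over the nonce with the per-server secret.
func sign(secret, nonce []byte) []byte {
	mac := hmac.New(sha256.New, secret)
	mac.Write(nonce)
	return mac.Sum(nil)
}

// newCSRFToken returns "nonce.signature"; the same value is set as the
// cookie and rendered into the hidden form field.
func newCSRFToken(secret []byte) (string, error) {
	nonce := make([]byte, 32)
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	enc := base64.RawURLEncoding
	return enc.EncodeToString(nonce) + "." + enc.EncodeToString(sign(secret, nonce)), nil
}

// validCSRF requires that the form field equals the cookie and that the
// token carries a valid signature under this server's secret.
func validCSRF(secret []byte, cookieToken, formToken string) bool {
	if subtle.ConstantTimeCompare([]byte(cookieToken), []byte(formToken)) != 1 {
		return false
	}
	parts := strings.SplitN(formToken, ".", 2)
	if len(parts) != 2 {
		return false
	}
	enc := base64.RawURLEncoding
	nonce, err := enc.DecodeString(parts[0])
	if err != nil {
		return false
	}
	sig, err := enc.DecodeString(parts[1])
	if err != nil {
		return false
	}
	return hmac.Equal(sig, sign(secret, nonce))
}

func main() {
	secret := []byte("per-server secret (example only)")
	token, err := newCSRFToken(secret)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("matching cookie and field:", validCSRF(secret, token, token))
	fmt.Println("token from another server:", validCSRF([]byte("other"), token, token))
}
```

The signature is what distinguishes this from a plain double-submit cookie: an attacker who can plant a cookie on the victim's browser still cannot mint a token that verifies under the server's secret.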

---

## Priority Summary

| Priority | Issue | Location |
|----------|-------|----------|
| ~~**Critical**~~ | ~~#4 — Policy auth contradiction (admin vs user)~~ **RESOLVED** | ARCHITECTURE.md |
| ~~**Critical**~~ | ~~#9 — User-controllable SSH cert serials~~ **RESOLVED** | sshca.md |
| ~~**Critical**~~ | ~~#13 — Policy path collision (`ca/` vs `sshca/`)~~ **RESOLVED** | sshca.md |
| ~~**High**~~ | ~~#5 — No path AAD in barrier encryption~~ **RESOLVED** | ARCHITECTURE.md |
| ~~**High**~~ | ~~#12 — No KRL distribution for SSH revocation~~ **RESOLVED** | sshca.md |
| ~~**High**~~ | ~~#15 — No min key version for transit rotation~~ **RESOLVED** | transit.md |
| ~~**High**~~ | ~~#17 — RSA padding scheme unspecified~~ **RESOLVED** | transit.md |
| ~~**High**~~ | ~~#11 — `critical_options` not restricted~~ **RESOLVED** | sshca.md |
| ~~**High**~~ | ~~#6 — Single MEK with no rotation~~ **RESOLVED** | ARCHITECTURE.md |
| ~~**High**~~ | ~~#22 — No forward secrecy / per-engine DEKs~~ **RESOLVED** | Cross-cutting |
| ~~**Medium**~~ | ~~#2 — Token cache revocation gap~~ **ACCEPTED** | ARCHITECTURE.md |
| ~~**Medium**~~ | ~~#3 — Admin all-or-nothing access~~ **ACCEPTED** | ARCHITECTURE.md |
| ~~**Medium**~~ | ~~#8 — Unseal rate limit resets on restart~~ **ACCEPTED** | ARCHITECTURE.md |
| ~~**Medium**~~ | ~~#20 — `decrypt` mapped to `read` action~~ **RESOLVED** | transit.md |
| ~~**Medium**~~ | ~~#23 — Generic endpoint bypasses typed route middleware~~ **RESOLVED** | Cross-cutting |
| ~~**Medium**~~ | ~~#24 — No CSRF protection for web UI~~ **RESOLVED** | Cross-cutting |
| ~~**Low**~~ | ~~#1 — TLS 1.2 vs 1.3~~ **RESOLVED** | ARCHITECTURE.md |
| ~~**Low**~~ | ~~#19 — No batch transit operations~~ **RESOLVED** | transit.md |
| ~~**Low**~~ | ~~#18 — HMAC/sign semantic confusion~~ **RESOLVED** | transit.md |