Remediation Plan

Date: 2026-03-16
Scope: Audit findings #25–#38 from engine design review

This document provides a concrete remediation plan for each open finding. Items are grouped by priority and ordered for efficient implementation (dependencies first).


Critical

#37 — adminOnlyOperations name collision blocks user rotate-key

Problem: The adminOnlyOperations map in handleEngineRequest (internal/server/routes.go:265) is a flat map[string]bool keyed by operation name. The transit engine's rotate-key is admin-only, but the user engine's rotate-key is user-self. Since the map is checked before engine dispatch, non-admin users are blocked from calling rotate-key on any engine mount — including user engine mounts where it should be allowed.

Fix: Replace the flat map with an engine-type-qualified lookup. Two options:

Option A — Qualify the map key (minimal change):

Change the map type to include the engine type prefix:

var adminOnlyOperations = map[string]bool{
    "ca:import-root":            true,
    "ca:create-issuer":          true,
    "ca:delete-issuer":          true,
    "ca:revoke-cert":            true,
    "ca:delete-cert":            true,
    "transit:create-key":        true,
    "transit:delete-key":        true,
    "transit:rotate-key":        true,
    "transit:update-key-config": true,
    "transit:trim-key":          true,
    "sshca:create-profile":      true,
    "sshca:update-profile":      true,
    "sshca:delete-profile":      true,
    "sshca:revoke-cert":         true,
    "sshca:delete-cert":         true,
    "user:provision":            true,
    "user:delete-user":          true,
}

In handleEngineRequest, look up engineType + ":" + operation instead of just operation. The engineType is already known from the mount registry (the generic endpoint resolves the mount to an engine type).
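
As a minimal sketch of the qualified lookup, assuming the map shape above: the helper name isAdminOnly is hypothetical (the real check would sit inline in handleEngineRequest), and the map is abbreviated to a few entries.

```go
package main

import "fmt"

// adminOnlyOperations is keyed by "engineType:operation" (abbreviated here).
var adminOnlyOperations = map[string]bool{
	"transit:rotate-key": true,
	"user:provision":     true,
	"user:delete-user":   true,
}

// isAdminOnly is a hypothetical helper that builds the qualified key from the
// engine type resolved for the mount and the requested operation.
func isAdminOnly(engineType, operation string) bool {
	return adminOnlyOperations[engineType+":"+operation]
}

func main() {
	fmt.Println(isAdminOnly("transit", "rotate-key")) // true
	fmt.Println(isAdminOnly("user", "rotate-key"))    // false
}
```

With the qualified key, rotate-key stays admin-only on transit mounts while the same operation name on a user mount falls through to the normal policy check.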

Option B — Per-engine admin operations (cleaner but more code):

Each engine implements an AdminOperations() []string method. The server queries the resolved engine for its admin operations instead of using a global map.
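
For comparison, Option B could look roughly like the sketch below. The Engine interface is reduced to the single method named above. The concrete types transitEngine and userEngine and the helper requiresAdmin are illustrative names, not existing code.

```go
package main

import "fmt"

// Engine is a hypothetical slice of the engine interface, reduced to the
// method Option B would add.
type Engine interface {
	AdminOperations() []string
}

type transitEngine struct{}

func (transitEngine) AdminOperations() []string {
	return []string{"create-key", "delete-key", "rotate-key", "update-key-config", "trim-key"}
}

type userEngine struct{}

func (userEngine) AdminOperations() []string {
	return []string{"provision", "delete-user"}
}

// requiresAdmin asks the resolved engine whether the operation is admin-only.
func requiresAdmin(e Engine, operation string) bool {
	for _, op := range e.AdminOperations() {
		if op == operation {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(requiresAdmin(transitEngine{}, "rotate-key")) // true
	fmt.Println(requiresAdmin(userEngine{}, "rotate-key"))    // false
}
```

This keeps each engine's admin list next to its implementation, at the cost of touching every engine type.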

Recommendation: Option A. It requires a one-line change to the lookup and a mechanical update to the map keys. The generic endpoint already resolves the mount to get the engine type.

Files to change:

  • internal/server/routes.go — update map and lookup in handleEngineRequest
  • engines/sshca.md — update adminOnlyOperations section
  • engines/transit.md — update adminOnlyOperations section
  • engines/user.md — update adminOnlyOperations section

Tests: Add a test case in internal/server/server_test.go: a non-admin user calling rotate-key via the generic endpoint on a user engine mount should succeed (policy permitting). The same call on a transit mount should return 403.


High

#28 — HMAC output not versioned

Problem: HMAC output is raw base64 with no key version indicator. After key rotation and min_decryption_version advancement, old HMACs are unverifiable because the engine doesn't know which key version produced them.

Fix: Use the same versioned prefix format as ciphertext and signatures:

metacrypt:v{version}:{base64(mac_bytes)}

Update the hmac operation to include key_version in the response. Update internal HMAC verification to parse the version prefix and select the corresponding key version (subject to min_decryption_version enforcement).
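
The format and the verification path can be sketched as follows. This is an illustration under stated assumptions, not the engine's implementation: HMAC-SHA256, standard base64, the in-memory keys map, and all function names are choices made for the example.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// formatHMAC renders a MAC as metacrypt:v{version}:{base64(mac_bytes)}.
func formatHMAC(version int, mac []byte) string {
	return fmt.Sprintf("metacrypt:v%d:%s", version, base64.StdEncoding.EncodeToString(mac))
}

// parseHMAC splits a versioned MAC and enforces min_decryption_version.
func parseHMAC(tagged string, minDecryptionVersion int) (int, []byte, error) {
	parts := strings.SplitN(tagged, ":", 3)
	if len(parts) != 3 || parts[0] != "metacrypt" || !strings.HasPrefix(parts[1], "v") {
		return 0, nil, errors.New("malformed HMAC: missing version prefix")
	}
	v, err := strconv.ParseUint(parts[1][1:], 10, 31)
	if err != nil || v == 0 {
		return 0, nil, errors.New("malformed HMAC: invalid key version")
	}
	version := int(v)
	if version < minDecryptionVersion {
		return 0, nil, fmt.Errorf("key version %d is below min_decryption_version %d", version, minDecryptionVersion)
	}
	mac, err := base64.StdEncoding.DecodeString(parts[2])
	if err != nil {
		return 0, nil, errors.New("malformed HMAC: invalid base64")
	}
	return version, mac, nil
}

// macSHA256 computes HMAC-SHA256; the hash choice is an assumption of this sketch.
func macSHA256(key, data []byte) []byte {
	m := hmac.New(sha256.New, key)
	m.Write(data)
	return m.Sum(nil)
}

// verifyHMAC selects the key version named in the prefix and compares in constant time.
func verifyHMAC(keys map[int][]byte, minDecryptionVersion int, data []byte, tagged string) (bool, error) {
	version, mac, err := parseHMAC(tagged, minDecryptionVersion)
	if err != nil {
		return false, err
	}
	key, ok := keys[version]
	if !ok {
		return false, fmt.Errorf("key version %d not found", version)
	}
	return hmac.Equal(mac, macSHA256(key, data)), nil
}

func main() {
	keys := map[int][]byte{1: []byte("old-key"), 2: []byte("new-key")}
	tagged := formatHMAC(1, macSHA256(keys[1], []byte("message")))

	ok, err := verifyHMAC(keys, 1, []byte("message"), tagged)
	fmt.Println(ok, err) // true <nil>

	_, err = verifyHMAC(keys, 2, []byte("message"), tagged)
	fmt.Println(err) // key version 1 is below min_decryption_version 2
}
```

Because the version travels with the MAC, a tag produced before rotation remains verifiable until min_decryption_version passes its version.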

Files to change:

  • engines/transit.md — update HMAC section, add HMAC output format, update Cryptographic Details section
  • Implementation: internal/engine/transit/sign.go (when implemented)

#30 — max_key_versions vs min_decryption_version unclear

Problem: The spec doesn't define when max_key_versions pruning happens or whether it respects min_decryption_version. Auto-pruning on rotation could destroy versions that still have unrewrapped ciphertext.

Fix: Define the behavior explicitly in engines/transit.md:

  1. max_key_versions pruning happens during rotate-key, after the new version is created.
  2. Pruning only deletes versions strictly less than min_decryption_version. If max_key_versions would require deleting a version at or above min_decryption_version, the version is retained and a warning is included in the response: "warning": "max_key_versions exceeded; advance min_decryption_version to enable pruning".
  3. This means max_key_versions is a soft limit — it is only enforceable after the operator completes the rotation cycle (rotate → rewrap → advance min → prune happens automatically on next rotate).

This resolves the original audit finding #16 as well.
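
The pruning rule above can be sketched as a pure function. The name pruneVersions, deleting the oldest versions first, and treating a non-positive max_key_versions as "no limit" are assumptions of this example.

```go
package main

import (
	"fmt"
	"sort"
)

// pruneVersions decides which key versions survive a rotate-key call.
// It deletes the oldest versions first, but only those strictly below
// minDecryptionVersion. warn reports that the limit is still exceeded.
func pruneVersions(versions []int, maxKeyVersions, minDecryptionVersion int) (keep, deleted []int, warn bool) {
	sorted := append([]int(nil), versions...)
	sort.Ints(sorted)
	if maxKeyVersions <= 0 { // assumption: zero or negative means unlimited
		return sorted, nil, false
	}
	excess := len(sorted) - maxKeyVersions
	for _, v := range sorted {
		if excess > 0 && v < minDecryptionVersion {
			deleted = append(deleted, v)
			excess--
			continue
		}
		keep = append(keep, v)
	}
	return keep, deleted, excess > 0
}

func main() {
	// Limit of 3, but only version 1 is below min_decryption_version.
	fmt.Println(pruneVersions([]int{1, 2, 3, 4, 5}, 3, 2)) // [2 3 4 5] [1] true

	// After the operator advances min_decryption_version to 4.
	fmt.Println(pruneVersions([]int{1, 2, 3, 4, 5}, 3, 4)) // [3 4 5] [1 2] false
}
```

When warn is true, the rotate-key response would carry the warning string from item 2.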

Files to change:

  • engines/transit.md — add max_key_versions behavior to Key Rotation section and rotate-key flow
  • AUDIT.md — mark #16 as RESOLVED with reference to the new behavior

#33 — Auto-provision creates keys for arbitrary usernames

Problem: The encrypt flow auto-provisions recipients without validating that the username exists in MCIAS. Any authenticated user can create barrier entries for non-existent users.

Fix: Before auto-provisioning, validate the recipient username against MCIAS. The engine has access to the auth system via req.CallerInfo context. Add an MCIAS user lookup:

  1. Add a ValidateUsername(username string) (bool, error) method to the auth client interface. This calls the MCIAS user info endpoint to check if the username exists.
  2. In the encrypt flow, before auto-provisioning a recipient, call ValidateUsername. If the user doesn't exist in MCIAS, return an error: "recipient not found: {username}".
  3. Document this validation in the encrypt flow and security considerations.
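
A minimal sketch of the validation step, assuming the ValidateUsername signature from step 1. The UsernameValidator interface name, the staticDirectory stand-in for the MCIAS-backed client, and checkRecipients are hypothetical.

```go
package main

import "fmt"

// UsernameValidator is the hypothetical slice of the auth client interface
// that the user engine would depend on.
type UsernameValidator interface {
	ValidateUsername(username string) (bool, error)
}

// staticDirectory stands in for the MCIAS-backed client in this sketch.
type staticDirectory map[string]bool

func (d staticDirectory) ValidateUsername(username string) (bool, error) {
	return d[username], nil
}

// checkRecipients validates every recipient that would need auto-provisioning
// before any keypair is created, so a rejected request leaves no barrier
// entries behind. Lookup errors fail closed.
func checkRecipients(v UsernameValidator, provisioned map[string]bool, recipients []string) error {
	for _, r := range recipients {
		if provisioned[r] {
			continue // already has a keypair, nothing to auto-provision
		}
		ok, err := v.ValidateUsername(r)
		if err != nil {
			return fmt.Errorf("validate recipient %q: %w", r, err)
		}
		if !ok {
			return fmt.Errorf("recipient not found: %s", r)
		}
	}
	return nil
}

func main() {
	dir := staticDirectory{"alice": true, "bob": true}
	provisioned := map[string]bool{"alice": true}

	fmt.Println(checkRecipients(dir, provisioned, []string{"alice", "bob"}))
	// <nil>
	fmt.Println(checkRecipients(dir, provisioned, []string{"bob", "mallory"}))
	// recipient not found: mallory
}
```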

Alternative (simpler, weaker): Skip MCIAS validation but add a rate limit on auto-provisioning (e.g., max 10 new provisions per encrypt request, max 100 total auto-provisions per hour per caller). This prevents storage inflation but doesn't prevent phantom users.

Recommendation: MCIAS validation. It's the correct security boundary — only real MCIAS users should have keypairs.

Files to change:

  • engines/user.md — update encrypt flow step 2, add MCIAS validation
  • internal/auth/ — add ValidateUsername to auth client (when implemented)

Medium

#25 — Missing list-certs REST route (SSH CA)

Fix: Add to the REST endpoints table:

| GET | `/v1/sshca/{mount}/certs` | List cert records |

Add to the route registration code block:

r.Get("/v1/sshca/{mount}/certs", s.requireAuth(s.handleSSHCAListCerts))

Files to change: engines/sshca.md

#26 — KRL section type description error

Fix: Change the description block from:

Section type: KRL_SECTION_CERT_SERIAL_LIST (0x21)

to:

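Section type: KRL_SECTION_CERTIFICATES (0x01)
  CA key blob: caSigner.PublicKey().Marshal()
  Subsection type: KRL_SECTION_CERT_SERIAL_LIST (0x20)

This matches the pseudocode comments and the OpenSSH PROTOCOL.krl spec. The CA key blob must be the SSH wire-format encoding returned by PublicKey.Marshal(); ssh.MarshalAuthorizedKey returns the authorized_keys text line, which is not a valid ca_key field.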

Files to change: engines/sshca.md

#27 — Policy check after cert construction (SSH CA)

Fix: Reorder the sign-host flow steps:

  1. Authenticate caller.
  2. Parse the supplied SSH public key.
  3. Parse TTL.
  4. Policy check: for each hostname, check policy on sshca/{mount}/id/{hostname}, action sign.
  5. Generate serial (only after policy passes).
  6. Build ssh.Certificate.
  7. Sign, store, return.

Same reordering for sign-user.
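
The reordered steps 4 and 5 can be sketched as below. The PolicyChecker callback, the helper names, and the random 64-bit serial are assumptions of this example. Steps 1 to 3 and certificate construction are omitted.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// PolicyChecker is a hypothetical callback: it reports whether the caller may
// perform action on resource.
type PolicyChecker func(resource, action string) bool

// authorizeHosts checks policy for every hostname before any state is created.
func authorizeHosts(check PolicyChecker, mount string, hostnames []string) error {
	for _, h := range hostnames {
		resource := "sshca/" + mount + "/id/" + h
		if !check(resource, "sign") {
			return fmt.Errorf("forbidden: %s", resource)
		}
	}
	return nil
}

// newSerial draws a random 64-bit serial (the serial scheme is an assumption).
func newSerial() (uint64, error) {
	var b [8]byte
	if _, err := rand.Read(b[:]); err != nil {
		return 0, err
	}
	return binary.BigEndian.Uint64(b[:]), nil
}

// authorizeThenSerial runs the policy check first and generates a serial only
// if every hostname is allowed.
func authorizeThenSerial(check PolicyChecker, mount string, hostnames []string) (uint64, error) {
	if err := authorizeHosts(check, mount, hostnames); err != nil {
		return 0, err // no serial consumed, no certificate built
	}
	return newSerial()
}

func main() {
	allowWeb := func(resource, action string) bool {
		return resource == "sshca/ssh/id/web1" && action == "sign"
	}

	_, err := authorizeThenSerial(allowWeb, "ssh", []string{"web1"})
	fmt.Println(err) // <nil>

	_, err = authorizeThenSerial(allowWeb, "ssh", []string{"web1", "db1"})
	fmt.Println(err) // forbidden: sshca/ssh/id/db1
}
```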

Files to change: engines/sshca.md

#29 — rewrap policy action not specified

Fix: Add rewrap as an explicit action in the operationAction mapping. rewrap maps to decrypt (since it requires internal access to plaintext). Batch variants map to the same action.

Add to the authorization section in engines/transit.md:

The rewrap and batch-rewrap operations require the decrypt action — rewrap internally decrypts with the old version and re-encrypts with the latest, so the caller must have decrypt permission. Alternatively, a dedicated rewrap action could be added for finer-grained control, but decrypt is the safer default (granting rewrap without decrypt would be odd since rewrap implies decrypt capability).

Recommendation: Map to decrypt. Simpler, and anyone who should rewrap should also be able to decrypt.
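
As an illustration of the recommended mapping, the sketch below shows an abbreviated operationAction table. Only the rewrap and batch-rewrap entries come from this plan. The other entries and the actionFor helper are assumed for context.

```go
package main

import "fmt"

// operationAction maps a transit operation to the policy action it requires.
// Abbreviated: the encrypt and decrypt rows are assumed for illustration.
var operationAction = map[string]string{
	"encrypt":      "encrypt",
	"decrypt":      "decrypt",
	"rewrap":       "decrypt", // rewrap decrypts internally, so it needs decrypt
	"batch-rewrap": "decrypt", // batch variants map to the same action
}

// actionFor is a hypothetical lookup helper; ok is false for unmapped operations.
func actionFor(operation string) (action string, ok bool) {
	action, ok = operationAction[operation]
	return action, ok
}

func main() {
	fmt.Println(actionFor("rewrap"))       // decrypt true
	fmt.Println(actionFor("batch-rewrap")) // decrypt true
}
```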

Files to change: engines/transit.md

#31 — Missing get-public-key REST route (Transit)

Fix: Add to the REST endpoints table:

| GET | `/v1/transit/{mount}/keys/{name}/public-key` | Get public key |

Add to the route registration code block:

r.Get("/v1/transit/{mount}/keys/{name}/public-key", s.requireAuth(s.handleTransitGetPublicKey))

Files to change: engines/transit.md

#34 — No recipient limit on encrypt (User)

Fix: Add a compile-time constant maxRecipients = 100 to the user engine. Reject requests exceeding this limit with 400 Bad Request / InvalidArgument before any ECDH computation.

Add to the encrypt flow in engines/user.md after step 1:

Validate that len(recipients) <= maxRecipients (100). Reject with error if exceeded.

Add to the security considerations section.
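
A minimal sketch of the check, assuming it runs before any ECDH computation or auto-provisioning. The function name validateRecipientCount and the error text are illustrative.

```go
package main

import "fmt"

// maxRecipients caps the number of recipients per encrypt request.
const maxRecipients = 100

// validateRecipientCount rejects oversized requests up front; the caller would
// translate the error to 400 Bad Request / InvalidArgument.
func validateRecipientCount(recipients []string) error {
	if len(recipients) > maxRecipients {
		return fmt.Errorf("too many recipients: %d (max %d)", len(recipients), maxRecipients)
	}
	return nil
}

func main() {
	fmt.Println(validateRecipientCount(make([]string, 100))) // <nil>
	fmt.Println(validateRecipientCount(make([]string, 101))) // too many recipients: 101 (max 100)
}
```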

Files to change: engines/user.md


Low

#32 — exportable flag with no export operation (Transit)

Fix: Add an export-key operation to the transit engine:

  • Auth: User+Policy (action read).
  • Only succeeds if the key's exportable flag is true.
  • Returns key material for the current version only.
  • Asymmetric keys: returns the private key as PKCS#8 PEM.
  • Symmetric keys: returns the raw key bytes, base64-encoded.
  • Add to HandleRequest dispatch, gRPC service, REST endpoints.

Alternatively, if key export is never intended, remove the exportable flag from create-key to avoid dead code. Given that transit is meant to keep keys server-side, removing the flag may be the better choice. Document the decision either way.

Recommendation: Remove exportable. Transit's entire value proposition is that keys never leave the service. If export is needed for migration, a dedicated admin-only export-key can be added later with appropriate audit logging (#7).

Files to change: engines/transit.md

#35 — No re-encryption support for user key rotation

Fix: Add a re-encrypt operation:

  • Auth: User (self) — only the envelope recipient can re-encrypt.
  • Input: old envelope.
  • Flow: decrypt with current key, generate new DEK, re-encrypt, return new envelope.
  • The old key must still be valid at the time of re-encryption. Document the workflow: re-encrypt all stored envelopes, then rotate-key.

This is a quality-of-life improvement, not a security fix. The current design (decrypt + encrypt separately) works but requires the caller to handle plaintext.

Files to change: engines/user.md

#36 — UserKeyConfig type undefined

Fix: Add the type definition to the in-memory state section:

type UserKeyConfig struct {
    Algorithm       string    `json:"algorithm"`        // key exchange algorithm used
    CreatedAt       time.Time `json:"created_at"`
    AutoProvisioned bool      `json:"auto_provisioned"` // created via auto-provision
}

Files to change: engines/user.md

#38 — ZeroizeKey prerequisite not cross-referenced

Fix: Add to the Implementation Steps section in both engines/transit.md and engines/user.md:

Prerequisite: engine.ZeroizeKey must exist in internal/engine/helpers.go (created as part of the SSH CA engine implementation — see engines/sshca.md step 1).

Files to change: engines/transit.md, engines/user.md


Implementation Order

The remediation items should be implemented in this order to respect dependencies:

  1. #37: adminOnlyOperations qualification (critical, blocks user engine rotate-key). This is a code change to internal/server/routes.go plus spec updates. Do it first because it affects all engine implementations.

  2. #28, #29, #30, #31, #32 — Transit spec fixes (can be done as a single spec update pass).

  3. #25, #26, #27 — SSH CA spec fixes (single spec update pass).

  4. #33, #34, #35, #36 — User spec fixes (single spec update pass).

  5. #38 — Cross-reference update (trivial, do with transit and user spec fixes).

Items within the same group are independent and can be done in parallel.