# Remediation Plan

**Date**: 2026-03-16

**Scope**: Audit findings #25–#38 from engine design review

This document provides a concrete remediation plan for each open finding. Items
are grouped by priority and ordered for efficient implementation (dependencies
first).

---
## Critical
### #37 — `adminOnlyOperations` name collision blocks user `rotate-key`
**Problem**: The `adminOnlyOperations` map in `handleEngineRequest`
(`internal/server/routes.go:265`) is a flat `map[string]bool` keyed by
operation name. The transit engine's `rotate-key` is admin-only, but the user
engine's `rotate-key` is user-self. Since the map is checked before engine
dispatch, non-admin users are blocked from calling `rotate-key` on any engine
mount — including user engine mounts where it should be allowed.
**Fix**: Replace the flat map with an engine-type-qualified lookup. Two options:
**Option A — Qualify the map key** (minimal change):
Qualify each map key with the engine type as a prefix (the map type stays `map[string]bool`):
```go
var adminOnlyOperations = map[string]bool{
	"ca:import-root":            true,
	"ca:create-issuer":          true,
	"ca:delete-issuer":          true,
	"ca:revoke-cert":            true,
	"ca:delete-cert":            true,
	"transit:create-key":        true,
	"transit:delete-key":        true,
	"transit:rotate-key":        true,
	"transit:update-key-config": true,
	"transit:trim-key":          true,
	"sshca:create-profile":      true,
	"sshca:update-profile":      true,
	"sshca:delete-profile":      true,
	"sshca:revoke-cert":         true,
	"sshca:delete-cert":         true,
	"user:provision":            true,
	"user:delete-user":          true,
}
```
In `handleEngineRequest`, look up `engineType + ":" + operation` instead of
just `operation`. The `engineType` is already known from the mount registry
(the generic endpoint resolves the mount to an engine type).
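
The qualified lookup described above could look roughly like the following. This is a minimal sketch: `isAdminOnly` is a hypothetical helper name (not from the codebase), and the map is abbreviated to two entries for illustration.

```go
package main

// adminOnlyOperations is keyed by "<engineType>:<operation>".
// Abbreviated for illustration; the full key set is listed in the plan.
var adminOnlyOperations = map[string]bool{
	"transit:rotate-key": true,
	"user:provision":     true,
}

// isAdminOnly is a hypothetical helper: it reports whether the operation
// is admin-only for the engine type that the mount resolved to.
func isAdminOnly(engineType, operation string) bool {
	return adminOnlyOperations[engineType+":"+operation]
}
```

With this shape, `isAdminOnly("transit", "rotate-key")` is true while `isAdminOnly("user", "rotate-key")` is false, so the user engine's `rotate-key` falls through to the normal policy check.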
**Option B — Per-engine admin operations** (cleaner but more code):
Each engine implements an `AdminOperations() []string` method. The server
queries the resolved engine for its admin operations instead of using a global
map.
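
For comparison, Option B might be sketched as below. Only the `AdminOperations() []string` method comes from this plan; the `Engine` interface, the two engine types, and `requiresAdmin` are pared-down, hypothetical stand-ins.

```go
package main

// Engine is a hypothetical, reduced view of the engine interface.
type Engine interface {
	AdminOperations() []string
}

// transitEngine and userEngine are illustrative stand-ins for the real engines.
type transitEngine struct{}

func (transitEngine) AdminOperations() []string {
	return []string{"create-key", "delete-key", "rotate-key", "update-key-config", "trim-key"}
}

type userEngine struct{}

func (userEngine) AdminOperations() []string {
	return []string{"provision", "delete-user"}
}

// requiresAdmin asks the resolved engine whether the operation is admin-only.
func requiresAdmin(e Engine, operation string) bool {
	for _, op := range e.AdminOperations() {
		if op == operation {
			return true
		}
	}
	return false
}
```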
**Recommendation**: Option A. It requires a one-line change to the lookup and
a mechanical update to the map keys. The generic endpoint already resolves the
mount to get the engine type.
**Files to change**:
- `internal/server/routes.go` — update map and lookup in `handleEngineRequest`
- `engines/sshca.md` — update `adminOnlyOperations` section
- `engines/transit.md` — update `adminOnlyOperations` section
- `engines/user.md` — update `adminOnlyOperations` section
**Tests**: Add test case in `internal/server/server_test.go` — non-admin user
calling `rotate-key` via generic endpoint on a user engine mount should succeed
(policy permitting). Same call on a transit mount should return 403.

---
## High
### #28 — HMAC output not versioned
**Problem**: HMAC output is raw base64 with no key version indicator. After key
rotation and `min_decryption_version` advancement, old HMACs are unverifiable
because the engine doesn't know which key version produced them.
**Fix**: Use the same versioned prefix format as ciphertext and signatures:
```
metacrypt:v{version}:{base64(mac_bytes)}
```
Update the `hmac` operation to include `key_version` in the response. Update
internal HMAC verification to parse the version prefix and select the
corresponding key version (subject to `min_decryption_version` enforcement).
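
A minimal sketch of formatting and parsing the versioned HMAC string follows. The function names are hypothetical, and two details are assumptions rather than spec: standard (padded) base64, and key versions starting at 1.

```go
package main

import (
	"encoding/base64"
	"errors"
	"strconv"
	"strings"
)

const hmacPrefix = "metacrypt:v"

// formatHMAC renders metacrypt:v{version}:{base64(mac_bytes)}.
// Assumption: standard padded base64.
func formatHMAC(version int, mac []byte) string {
	return hmacPrefix + strconv.Itoa(version) + ":" + base64.StdEncoding.EncodeToString(mac)
}

// parseHMAC extracts the key version and MAC bytes, rejecting versions
// below minDecryptionVersion. Assumption: versions start at 1.
func parseHMAC(s string, minDecryptionVersion int) (int, []byte, error) {
	if !strings.HasPrefix(s, hmacPrefix) {
		return 0, nil, errors.New("hmac: missing metacrypt:v prefix")
	}
	parts := strings.SplitN(strings.TrimPrefix(s, hmacPrefix), ":", 2)
	if len(parts) != 2 {
		return 0, nil, errors.New("hmac: missing version separator")
	}
	version, err := strconv.Atoi(parts[0])
	if err != nil || version < 1 {
		return 0, nil, errors.New("hmac: invalid key version")
	}
	if version < minDecryptionVersion {
		return 0, nil, errors.New("hmac: key version below min_decryption_version")
	}
	mac, err := base64.StdEncoding.DecodeString(parts[1])
	if err != nil {
		return 0, nil, errors.New("hmac: invalid base64 payload")
	}
	return version, mac, nil
}
```

The caller would then recompute the MAC with the parsed key version and compare it against the decoded bytes using `hmac.Equal` from `crypto/hmac`.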
**Files to change**:
- `engines/transit.md` — update HMAC section, add HMAC output format, update
Cryptographic Details section
- Implementation: `internal/engine/transit/sign.go` (when implemented)
### #30 — `max_key_versions` vs `min_decryption_version` unclear
**Problem**: The spec doesn't define when `max_key_versions` pruning happens or
whether it respects `min_decryption_version`. Auto-pruning on rotation could
destroy versions that still have unrewrapped ciphertext.
**Fix**: Define the behavior explicitly in `engines/transit.md`:
1. `max_key_versions` pruning happens during `rotate-key`, after the new
version is created.
2. Pruning **only** deletes versions **strictly less than**
`min_decryption_version`. If `max_key_versions` would require deleting a
version at or above `min_decryption_version`, the version is **retained**
and a warning is included in the response:
`"warning": "max_key_versions exceeded; advance min_decryption_version to enable pruning"`.
3. This means `max_key_versions` is a soft limit — it is only enforceable
after the operator completes the rotation cycle (rotate → rewrap → advance
min → prune happens automatically on next rotate).

This resolves the original audit finding #16 as well.
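
The pruning rule can be sketched as a pure function over version numbers. This is illustrative only: `pruneVersions` is a hypothetical name, versions are assumed to be pruned oldest first, and treating `max_key_versions <= 0` as "no limit" is an assumption the spec would need to confirm.

```go
package main

import "sort"

const pruneWarning = "max_key_versions exceeded; advance min_decryption_version to enable pruning"

// pruneVersions returns the versions to keep after a rotation, plus a
// warning when the soft limit cannot be met.
func pruneVersions(versions []int, maxKeyVersions, minDecryptionVersion int) (kept []int, warning string) {
	sorted := append([]int(nil), versions...)
	sort.Ints(sorted)
	if maxKeyVersions <= 0 { // assumption: non-positive means unlimited
		return sorted, ""
	}
	excess := len(sorted) - maxKeyVersions
	drop := 0
	// Drop oldest versions, but only those strictly below min_decryption_version.
	for excess > 0 && drop < len(sorted) && sorted[drop] < minDecryptionVersion {
		drop++
		excess--
	}
	if excess > 0 {
		warning = pruneWarning
	}
	return sorted[drop:], warning
}
```

For versions 1 through 5 with `max_key_versions = 3`, a `min_decryption_version` of 3 prunes versions 1 and 2 with no warning, while a `min_decryption_version` of 2 prunes only version 1 and returns the warning.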
**Files to change**:
- `engines/transit.md` — add `max_key_versions` behavior to Key Rotation
section and `rotate-key` flow
- `AUDIT.md` — mark #16 as RESOLVED with reference to the new behavior
### #33 — Auto-provision creates keys for arbitrary usernames
**Problem**: The encrypt flow auto-provisions recipients without validating
that the username exists in MCIAS. Any authenticated user can create barrier
entries for non-existent users.
**Fix**: Before auto-provisioning, validate the recipient username against
MCIAS. The engine has access to the auth system via `req.CallerInfo` context.
Add an MCIAS user lookup:
1. Add a `ValidateUsername(username string) (bool, error)` method to the auth
client interface. This calls the MCIAS user info endpoint to check if the
username exists.
2. In the encrypt flow, before auto-provisioning a recipient, call
`ValidateUsername`. If the user doesn't exist in MCIAS, return an error:
`"recipient not found: {username}"`.
3. Document this validation in the encrypt flow and security considerations.

**Alternative** (simpler, weaker): Skip MCIAS validation but add a
rate limit on auto-provisioning (e.g., max 10 new provisions per encrypt
request, max 100 total auto-provisions per hour per caller). This prevents
storage inflation but doesn't prevent phantom users.
**Recommendation**: MCIAS validation. It's the correct security boundary —
only real MCIAS users should have keypairs.
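
The recommended validation step might be wired in along these lines. Only the `ValidateUsername` signature and the error text come from this plan; `UserValidator`, `checkRecipients`, the `hasKey` callback, and the `staticUsers` stand-in for MCIAS are hypothetical.

```go
package main

import "fmt"

// UserValidator is the part of the auth client interface this sketch needs.
type UserValidator interface {
	ValidateUsername(username string) (bool, error)
}

// checkRecipients runs before auto-provisioning. hasKey is a hypothetical
// callback reporting whether a recipient already has a keypair.
func checkRecipients(auth UserValidator, recipients []string, hasKey func(string) bool) error {
	for _, r := range recipients {
		if hasKey(r) {
			continue // already provisioned, nothing to validate
		}
		ok, err := auth.ValidateUsername(r)
		if err != nil {
			return fmt.Errorf("validate recipient %q: %w", r, err)
		}
		if !ok {
			return fmt.Errorf("recipient not found: %s", r)
		}
	}
	return nil
}

// staticUsers is an in-memory stand-in for the MCIAS lookup.
type staticUsers map[string]bool

func (s staticUsers) ValidateUsername(u string) (bool, error) { return s[u], nil }
```

Checking every recipient before provisioning any of them keeps a request with one unknown username from leaving partial barrier entries behind.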
**Files to change**:
- `engines/user.md` — update encrypt flow step 2, add MCIAS validation
- `internal/auth/` — add `ValidateUsername` to auth client (when implemented)
---
## Medium
### #25 — Missing `list-certs` REST route (SSH CA)
**Fix**: Add to the REST endpoints table:
```
| GET | `/v1/sshca/{mount}/certs` | List cert records |
```
Add to the route registration code block:
```go
r.Get("/v1/sshca/{mount}/certs", s.requireAuth(s.handleSSHCAListCerts))
```
**Files to change**: `engines/sshca.md`
### #26 — KRL section type description error
**Fix**: Change the description block from:
```
Section type: KRL_SECTION_CERT_SERIAL_LIST (0x21)
```
to:
```
Section type: KRL_SECTION_CERTIFICATES (0x01)
CA key blob: caSigner.PublicKey().Marshal()
Subsection type: KRL_SECTION_CERT_SERIAL_LIST (0x20)
```
This matches the pseudocode comments and the OpenSSH `PROTOCOL.krl` spec.
**Files to change**: `engines/sshca.md`
### #27 — Policy check after cert construction (SSH CA)
**Fix**: Reorder the sign-host flow steps:
1. Authenticate caller.
2. Parse the supplied SSH public key.
3. Parse TTL.
4. **Policy check**: for each hostname, check policy on
`sshca/{mount}/id/{hostname}`, action `sign`.
5. Generate serial (only after policy passes).
6. Build `ssh.Certificate`.
7. Sign, store, return.

Same reordering for sign-user.

**Files to change**: `engines/sshca.md`
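
The reordered flow can be illustrated with a sketch that shows only the ordering of the policy check and serial generation. `PolicyChecker`, `allowList`, `authorizeSignHost`, and the `nextSerial` callback are hypothetical names; the resource path and the `sign` action follow step 4.

```go
package main

import "fmt"

// PolicyChecker is a hypothetical view of the policy engine.
type PolicyChecker interface {
	Allowed(resource, action string) bool
}

// allowList is a toy policy for illustration.
type allowList map[string]bool

func (a allowList) Allowed(resource, action string) bool {
	return action == "sign" && a[resource]
}

// authorizeSignHost checks every hostname before any state is consumed.
func authorizeSignHost(p PolicyChecker, mount string, hostnames []string) error {
	for _, h := range hostnames {
		resource := fmt.Sprintf("sshca/%s/id/%s", mount, h)
		if !p.Allowed(resource, "sign") {
			return fmt.Errorf("policy denies sign on %s", resource)
		}
	}
	return nil
}

// signHost draws a serial only after policy passes; certificate
// construction and signing (steps 6 and 7) are omitted.
func signHost(p PolicyChecker, mount string, hostnames []string, nextSerial func() uint64) (uint64, error) {
	if err := authorizeSignHost(p, mount, hostnames); err != nil {
		return 0, err
	}
	return nextSerial(), nil
}
```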
### #29 — `rewrap` policy action not specified
**Fix**: Add `rewrap` as an explicit action in the `operationAction` mapping.
`rewrap` maps to `decrypt` (since it requires internal access to plaintext).
Batch variants map to the same action.
Add to the authorization section in `engines/transit.md`:
> The `rewrap` and `batch-rewrap` operations require the `decrypt` action —
> rewrap internally decrypts with the old version and re-encrypts with the
> latest, so the caller must have decrypt permission. Alternatively, a
> dedicated `rewrap` action could be added for finer-grained control, but
> `decrypt` is the safer default (granting `rewrap` without `decrypt` would be
> odd since rewrap implies decrypt capability).
**Recommendation**: Map to `decrypt`. Simpler, and anyone who should rewrap
should also be able to decrypt.
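
As a sketch of the recommended mapping, assuming `operationAction` is a `map[string]string` from operation name to policy action (its real shape is not shown in this plan), and with `actionFor` as a hypothetical accessor:

```go
package main

// operationAction maps an operation to the policy action it requires.
// Assumption: the real mapping has this shape. Only the rewrap entries
// are prescribed by this plan; the decrypt entry is illustrative.
var operationAction = map[string]string{
	"decrypt":      "decrypt",
	"rewrap":       "decrypt",
	"batch-rewrap": "decrypt",
}

// actionFor returns the policy action for an operation, if one is mapped.
func actionFor(operation string) (string, bool) {
	action, ok := operationAction[operation]
	return action, ok
}
```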
**Files to change**: `engines/transit.md`
### #31 — Missing `get-public-key` REST route (Transit)
**Fix**: Add to the REST endpoints table:
```
| GET | `/v1/transit/{mount}/keys/{name}/public-key` | Get public key |
```
Add to the route registration code block:
```go
r.Get("/v1/transit/{mount}/keys/{name}/public-key", s.requireAuth(s.handleTransitGetPublicKey))
```
**Files to change**: `engines/transit.md`
### #34 — No recipient limit on encrypt (User)
**Fix**: Add a compile-time constant `maxRecipients = 100` to the user engine.
Reject requests exceeding this limit with `400 Bad Request` / `InvalidArgument`
before any ECDH computation.
Add to the encrypt flow in `engines/user.md` after step 1:
> Validate that `len(recipients) <= maxRecipients` (100). Reject with error if
> exceeded.
Add to the security considerations section.
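
The limit check itself is small. A minimal sketch, where `validateRecipientCount` is a hypothetical name and the caller is assumed to translate the error into `400 Bad Request` / `InvalidArgument`:

```go
package main

import "fmt"

// maxRecipients caps the number of recipients per encrypt request.
const maxRecipients = 100

// validateRecipientCount must run before any ECDH computation.
func validateRecipientCount(recipients []string) error {
	if len(recipients) > maxRecipients {
		return fmt.Errorf("too many recipients: %d exceeds limit of %d", len(recipients), maxRecipients)
	}
	return nil
}
```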
**Files to change**: `engines/user.md`

---
## Low
### #32 — `exportable` flag with no export operation (Transit)
**Fix**: Add an `export-key` operation to the transit engine:
- Auth: User+Policy (action `read`).
- Only succeeds if the key's `exportable` flag is `true`.
- Returns raw key material (base64-encoded) for the current version only.
- Asymmetric keys: returns private key in PKCS8 PEM.
- Symmetric keys: returns raw key bytes, base64-encoded.
- Add to HandleRequest dispatch, gRPC service, REST endpoints.

Alternatively, if key export is never intended, remove the `exportable` flag
from `create-key` to avoid dead code. Given that transit is meant to keep keys
server-side, **removing the flag** may be the better choice. Document the
decision either way.
**Recommendation**: Remove `exportable`. Transit's entire value proposition is
that keys never leave the service. If export is needed for migration, a
dedicated admin-only `export-key` can be added later with appropriate audit
logging (#7).
**Files to change**: `engines/transit.md`
### #35 — No re-encryption support for user key rotation
**Fix**: Add a `re-encrypt` operation:
- Auth: User (self) — only the envelope recipient can re-encrypt.
- Input: old envelope.
- Flow: decrypt with current key, generate new DEK, re-encrypt, return new
envelope.
- The old key must still be valid at the time of re-encryption. Document the
workflow: re-encrypt all stored envelopes, then rotate-key.

This is a quality-of-life improvement, not a security fix. The current design
(decrypt + encrypt separately) works but requires the caller to handle
plaintext.
**Files to change**: `engines/user.md`
### #36 — `UserKeyConfig` type undefined
**Fix**: Add the type definition to the in-memory state section:
```go
type UserKeyConfig struct {
	Algorithm       string    `json:"algorithm"`        // key exchange algorithm used
	CreatedAt       time.Time `json:"created_at"`
	AutoProvisioned bool      `json:"auto_provisioned"` // created via auto-provision
}
```
**Files to change**: `engines/user.md`
### #38 — `ZeroizeKey` prerequisite not cross-referenced
**Fix**: Add to the Implementation Steps section in both `engines/transit.md`
and `engines/user.md`:
> **Prerequisite**: `engine.ZeroizeKey` must exist in
> `internal/engine/helpers.go` (created as part of the SSH CA engine
> implementation — see `engines/sshca.md` step 1).
**Files to change**: `engines/transit.md`, `engines/user.md`

---
## Implementation Order
The remediation items should be implemented in this order to respect
dependencies:
1. **#37** — `adminOnlyOperations` qualification (critical, blocks user engine
`rotate-key`). This is a code change to `internal/server/routes.go` plus
spec updates. Do first because it affects all engine implementations.
2. **#28, #29, #30, #31, #32** — Transit spec fixes (can be done as a single
spec update pass).
3. **#25, #26, #27** — SSH CA spec fixes (single spec update pass).
4. **#33, #34, #35, #36** — User spec fixes (single spec update pass).
5. **#38** — Cross-reference update (trivial, do with transit and user spec
fixes).
Items within the same group are independent and can be done in parallel.