Add web UI for SSH CA, Transit, and User engines; full security audit and remediation
Web UI: Added browser-based management for all three remaining engines (SSH CA, Transit, User E2E). Includes gRPC client wiring, handler files, 7 HTML templates, dashboard mount forms, and conditional navigation links. Fixed REST API routes to match design specs (SSH CA cert singular paths, Transit PATCH for update-key-config). Security audit: Conducted full-system audit covering crypto core, all engine implementations, API servers, policy engine, auth, deployment, and documentation. Identified 42 new findings (#39-#80) across all severity levels. Remediation of all 8 High findings: - #68: Replaced 14 JSON-injection-vulnerable error responses with safe json.Encoder via writeJSONError helper - #48: Added two-layer path traversal defense (barrier validatePath rejects ".." segments; engine ValidateName enforces safe name pattern) - #39: Extended RLock through entire crypto operations in barrier Get/Put/Delete/List to eliminate TOCTOU race with Seal - #40: Unified ReWrapKeys and seal_config UPDATE into single SQLite transaction to prevent irrecoverable data loss on crash during MEK rotation - #49: Added resolveTTL to CA engine enforcing issuer MaxTTL ceiling on handleIssue and handleSignCSR - #61: Store raw ECDH private key bytes in userState for effective zeroization on Seal - #62: Fixed user engine policy resource path from mountPath to mountName() so policy rules match correctly - #69: Added newPolicyChecker helper and passed service-level policy evaluation to all 25 typed REST handler engine.Request structs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
847
REMEDIATION.md
847
REMEDIATION.md
@@ -1,354 +1,579 @@
|
||||
# Remediation Plan
|
||||
# Remediation Plan — High-Priority Audit Findings
|
||||
|
||||
**Date**: 2026-03-16
|
||||
**Scope**: Audit findings #25–#38 from engine design review
|
||||
**Date**: 2026-03-17
|
||||
**Scope**: AUDIT.md findings #39, #40, #48, #49, #61, #62, #68, #69
|
||||
|
||||
This document provides a concrete remediation plan for each open finding. Items
|
||||
are grouped by priority and ordered for efficient implementation (dependencies
|
||||
first).
|
||||
This plan addresses all eight High-severity findings from the 2026-03-17
|
||||
full system audit. Findings are grouped into four work items by shared root
|
||||
cause or affected subsystem. The order reflects dependency chains: #68 is a
|
||||
standalone fix that should ship first; #48 is a prerequisite for safe
|
||||
operation across all engines; #39/#40 affect the storage core; the remaining
|
||||
four affect specific engines.
|
||||
|
||||
---
|
||||
|
||||
## Critical
|
||||
## Work Item 1: JSON Injection in REST Error Responses (#68)
|
||||
|
||||
### #37 — `adminOnlyOperations` name collision blocks user `rotate-key`
|
||||
**Risk**: An error message containing `"` or `\` breaks the JSON response
|
||||
structure. If the error contains attacker-controlled input (e.g., a mount
|
||||
name or key name that triggers a downstream error), this enables JSON
|
||||
injection in API responses.
|
||||
|
||||
**Problem**: The `adminOnlyOperations` map in `handleEngineRequest`
|
||||
(`internal/server/routes.go:265`) is a flat `map[string]bool` keyed by
|
||||
operation name. The transit engine's `rotate-key` is admin-only, but the user
|
||||
engine's `rotate-key` is user-self. Since the map is checked before engine
|
||||
dispatch, non-admin users are blocked from calling `rotate-key` on any engine
|
||||
mount — including user engine mounts where it should be allowed.
|
||||
|
||||
**Fix**: Replace the flat map with an engine-type-qualified lookup. Two options:
|
||||
|
||||
**Option A — Qualify the map key** (minimal change):
|
||||
|
||||
Change the map type to include the engine type prefix:
|
||||
**Root cause**: 13 locations in `internal/server/routes.go` construct JSON
|
||||
error responses via string concatenation:
|
||||
|
||||
```go
|
||||
var adminOnlyOperations = map[string]bool{
|
||||
"ca:import-root": true,
|
||||
"ca:create-issuer": true,
|
||||
"ca:delete-issuer": true,
|
||||
"ca:revoke-cert": true,
|
||||
"ca:delete-cert": true,
|
||||
"transit:create-key": true,
|
||||
"transit:delete-key": true,
|
||||
"transit:rotate-key": true,
|
||||
"transit:update-key-config": true,
|
||||
"transit:trim-key": true,
|
||||
"sshca:create-profile": true,
|
||||
"sshca:update-profile": true,
|
||||
"sshca:delete-profile": true,
|
||||
"sshca:revoke-cert": true,
|
||||
"sshca:delete-cert": true,
|
||||
"user:provision": true,
|
||||
"user:delete-user": true,
|
||||
http.Error(w, `{"error":"`+err.Error()+`"}`, http.StatusInternalServerError)
|
||||
```
|
||||
|
||||
The `writeEngineError` helper (line 1704) is the most common entry point;
|
||||
most typed handlers call it.
|
||||
|
||||
### Fix
|
||||
|
||||
1. **Replace `writeEngineError`** with a safe JSON encoder:
|
||||
|
||||
```go
|
||||
func writeJSONError(w http.ResponseWriter, msg string, code int) {
|
||||
w.Header().Set("Content-Type", "application/json")
|
||||
w.WriteHeader(code)
|
||||
_ = json.NewEncoder(w).Encode(map[string]string{"error": msg})
|
||||
}
|
||||
```
|
||||
|
||||
2. **Replace all 13 call sites** that use string concatenation with
|
||||
`writeJSONError(w, grpcMessage(err), status)` or
|
||||
`writeJSONError(w, err.Error(), status)`.
|
||||
|
||||
The `grpcMessage` helper already exists in the webserver package and
|
||||
extracts human-readable messages from gRPC errors. Add an equivalent
|
||||
to the REST server, and prefer it over raw `err.Error()` to avoid
|
||||
leaking internal error details.
|
||||
|
||||
3. **Grep for the pattern** `"error":"` in `routes.go` to confirm no
|
||||
remaining string-concatenated JSON.
|
||||
|
||||
### Files
|
||||
|
||||
| File | Change |
|
||||
|------|--------|
|
||||
| `internal/server/routes.go` | Replace `writeEngineError` and all 13 inline error sites |
|
||||
|
||||
### Verification
|
||||
|
||||
- `go vet ./internal/server/`
|
||||
- `go test ./internal/server/`
|
||||
- Manual test: mount an engine with a name containing `"`, trigger an error,
|
||||
verify the response is valid JSON.
|
||||
|
||||
---
|
||||
|
||||
## Work Item 2: Path Traversal via Unsanitized Names (#48)
|
||||
|
||||
**Risk**: User-controlled strings (issuer names, key names, profile names,
|
||||
usernames, mount names) are concatenated directly into barrier storage
|
||||
paths. An input containing `../` traverses the barrier namespace, allowing
|
||||
reads and writes to arbitrary paths. This affects all four engines and the
|
||||
engine registry.
|
||||
|
||||
**Root cause**: No validation exists at any layer — neither the barrier's
|
||||
`Put`/`Get`/`Delete` methods nor the engines sanitize path components.
|
||||
|
||||
### Vulnerable locations
|
||||
|
||||
| File | Input | Path Pattern |
|
||||
|------|-------|-------------|
|
||||
| `ca/ca.go` | issuer `name` | `mountPath + "issuers/" + name + "/"` |
|
||||
| `sshca/sshca.go` | profile `name` | `mountPath + "profiles/" + name + ".json"` |
|
||||
| `transit/transit.go` | key `name` | `mountPath + "keys/" + name + "/"` |
|
||||
| `user/user.go` | `username` | `mountPath + "users/" + username + "/"` |
|
||||
| `engine/engine.go` | mount `name` | `engine/{type}/{name}/` |
|
||||
| `policy/policy.go` | rule `ID` | `policy/rules/{id}` |
|
||||
|
||||
### Fix
|
||||
|
||||
Enforce validation at **two layers** (defense in depth):
|
||||
|
||||
1. **Barrier layer** — reject paths containing `..` segments.
|
||||
|
||||
Add a `validatePath` check at the top of `Get`, `Put`, `Delete`, and
|
||||
`List` in `barrier.go`:
|
||||
|
||||
```go
|
||||
var ErrInvalidPath = errors.New("barrier: invalid path")
|
||||
|
||||
func validatePath(p string) error {
|
||||
for _, seg := range strings.Split(p, "/") {
|
||||
if seg == ".." {
|
||||
return fmt.Errorf("%w: path traversal rejected: %q", ErrInvalidPath, p)
|
||||
}
|
||||
}
|
||||
return nil
|
||||
}
|
||||
```
|
||||
|
||||
Call `validatePath` at the entry of `Get`, `Put`, `Delete`, `List`.
|
||||
Return `ErrInvalidPath` on failure.
|
||||
|
||||
2. **Engine/registry layer** — validate entity names at input boundaries.
|
||||
|
||||
Add a `ValidateName` helper to `internal/engine/`:
|
||||
|
||||
```go
|
||||
var namePattern = regexp.MustCompile(`^[a-zA-Z0-9][a-zA-Z0-9._-]*$`)
|
||||
|
||||
func ValidateName(name string) error {
|
||||
if name == "" || len(name) > 128 || !namePattern.MatchString(name) {
|
||||
return fmt.Errorf("invalid name %q: must be 1-128 alphanumeric, "+
|
||||
"dot, hyphen, or underscore characters", name)
|
||||
}
|
||||
return nil
|
||||
}
|
||||
```
|
||||
|
||||
Call `ValidateName` in:
|
||||
|
||||
| Location | Input validated |
|
||||
|----------|----------------|
|
||||
| `engine.go` `Mount()` | mount name |
|
||||
| `ca.go` `handleCreateIssuer` | issuer name |
|
||||
| `sshca.go` `handleCreateProfile` | profile name |
|
||||
| `transit.go` `handleCreateKey` | key name |
|
||||
| `user.go` `handleRegister`, `handleProvision` | username |
|
||||
| `user.go` `handleEncrypt` | recipient usernames |
|
||||
| `policy.go` `CreateRule` | rule ID |
|
||||
|
||||
Note: certificate serials are generated server-side from `crypto/rand`
|
||||
and hex-encoded, so they are safe. Validate anyway for defense in depth.
|
||||
|
||||
### Files
|
||||
|
||||
| File | Change |
|
||||
|------|--------|
|
||||
| `internal/barrier/barrier.go` | Add `validatePath`, call from Get/Put/Delete/List |
|
||||
| `internal/engine/engine.go` | Add `ValidateName`, call from `Mount` |
|
||||
| `internal/engine/ca/ca.go` | Call `ValidateName` on issuer name |
|
||||
| `internal/engine/sshca/sshca.go` | Call `ValidateName` on profile name |
|
||||
| `internal/engine/transit/transit.go` | Call `ValidateName` on key name |
|
||||
| `internal/engine/user/user.go` | Call `ValidateName` on usernames |
|
||||
| `internal/policy/policy.go` | Call `ValidateName` on rule ID |
|
||||
|
||||
### Verification
|
||||
|
||||
- Add `TestValidatePath` to `barrier_test.go`: confirm `../` and `..` are
|
||||
rejected; confirm normal paths pass.
|
||||
- Add `TestValidateName` to `engine_test.go`: confirm `../evil`, empty
|
||||
string, and overlong names are rejected; confirm valid names pass.
|
||||
- `go test ./internal/barrier/ ./internal/engine/... ./internal/policy/`
|
||||
|
||||
---
|
||||
|
||||
## Work Item 3: Barrier Concurrency and Crash Safety (#39, #40)
|
||||
|
||||
These two findings share the barrier/seal subsystem and should be addressed
|
||||
together.
|
||||
|
||||
### #39 — TOCTOU Race in Barrier Get/Put
|
||||
|
||||
**Risk**: `Get` and `Put` copy the `mek` slice header and `keys` map
|
||||
reference under `RLock`, release the lock, then use the copied references
|
||||
for encryption/decryption. A concurrent `Seal()` zeroizes the underlying
|
||||
byte slices in place before nil-ing the fields, so a concurrent reader
|
||||
uses zeroized key material.
|
||||
|
||||
**Root cause**: The lock does not cover the crypto operation. The "copy"
|
||||
is a shallow reference copy (slice header), not a deep byte copy. `Seal()`
|
||||
zeroizes the backing array, which is shared.
|
||||
|
||||
**Current locking pattern** (`barrier.go`):
|
||||
|
||||
```
|
||||
Get: RLock → copy mek/keys refs → RUnlock → decrypt (uses zeroized key)
|
||||
Put: RLock → copy mek/keys refs → RUnlock → encrypt (uses zeroized key)
|
||||
Seal: Lock → zeroize mek bytes → nil mek → zeroize keys → nil keys → Unlock
|
||||
```
|
||||
|
||||
**Fix**: Hold `RLock` through the entire crypto operation:
|
||||
|
||||
```go
|
||||
func (b *AESGCMBarrier) Get(ctx context.Context, path string) ([]byte, error) {
|
||||
if err := validatePath(path); err != nil {
|
||||
return nil, err
|
||||
}
|
||||
b.mu.RLock()
|
||||
defer b.mu.RUnlock()
|
||||
if b.mek == nil {
|
||||
return nil, ErrSealed
|
||||
}
|
||||
// query DB, resolve key, decrypt — all under RLock
|
||||
// ...
|
||||
}
|
||||
```
|
||||
|
||||
In `handleEngineRequest`, look up `engineType + ":" + operation` instead of
|
||||
just `operation`. The `engineType` is already known from the mount registry
|
||||
(the generic endpoint resolves the mount to an engine type).
|
||||
This is the minimal, safest change. `RLock` permits concurrent readers, so
|
||||
there is no throughput regression for parallel `Get`/`Put` operations. The
|
||||
only serialization point is `Seal()`, which acquires the exclusive `Lock`
|
||||
and waits for all readers to drain — exactly the semantics we want.
|
||||
|
||||
**Option B — Per-engine admin operations** (cleaner but more code):
|
||||
Apply the same pattern to `Put`, `Delete`, and `List`.
|
||||
|
||||
Each engine implements an `AdminOperations() []string` method. The server
|
||||
queries the resolved engine for its admin operations instead of using a global
|
||||
map.
|
||||
**Alternative considered**: Atomic pointer swap (`atomic.Pointer[keyState]`).
|
||||
This eliminates the lock from the hot path entirely, but introduces
|
||||
complexity around deferred zeroization of the old state (readers may still
|
||||
hold references). The `RLock`-through-crypto approach is simpler and
|
||||
sufficient for Metacrypt's concurrency profile.
|
||||
|
||||
**Recommendation**: Option A. It requires a one-line change to the lookup and
|
||||
a mechanical update to the map keys. The generic endpoint already resolves the
|
||||
mount to get the engine type.
|
||||
### #40 — Crash During `ReWrapKeys` Loses All Data
|
||||
|
||||
**Files to change**:
|
||||
- `internal/server/routes.go` — update map and lookup in `handleEngineRequest`
|
||||
- `engines/sshca.md` — update `adminOnlyOperations` section
|
||||
- `engines/transit.md` — update `adminOnlyOperations` section
|
||||
- `engines/user.md` — update `adminOnlyOperations` section
|
||||
**Risk**: `RotateMEK` calls `barrier.ReWrapKeys(newMEK)` which commits a
|
||||
transaction re-wrapping all DEKs, then separately updates `seal_config`
|
||||
with the new encrypted MEK. A crash between these two database operations
|
||||
leaves DEKs wrapped with a MEK that is not persisted — all data is
|
||||
irrecoverable.
|
||||
|
||||
**Tests**: Add test case in `internal/server/server_test.go` — non-admin user
|
||||
calling `rotate-key` via generic endpoint on a user engine mount should succeed
|
||||
(policy permitting). Same call on a transit mount should return 403.
|
||||
|
||||
---
|
||||
|
||||
## High
|
||||
|
||||
### #28 — HMAC output not versioned
|
||||
|
||||
**Problem**: HMAC output is raw base64 with no key version indicator. After key
|
||||
rotation and `min_decryption_version` advancement, old HMACs are unverifiable
|
||||
because the engine doesn't know which key version produced them.
|
||||
|
||||
**Fix**: Use the same versioned prefix format as ciphertext and signatures:
|
||||
**Current flow** (`seal.go` lines 245–313):
|
||||
|
||||
```
|
||||
metacrypt:v{version}:{base64(mac_bytes)}
|
||||
1. Generate newMEK
|
||||
2. barrier.ReWrapKeys(ctx, newMEK) ← commits transaction (barrier_keys updated)
|
||||
3. crypto.Encrypt(kwk, newMEK, nil) ← encrypt new MEK
|
||||
4. UPDATE seal_config SET encrypted_mek = ? ← separate statement, not in transaction
|
||||
*** CRASH HERE = DATA LOSS ***
|
||||
5. Swap in-memory MEK
|
||||
```
|
||||
|
||||
Update the `hmac` operation to include `key_version` in the response. Update
|
||||
internal HMAC verification to parse the version prefix and select the
|
||||
corresponding key version (subject to `min_decryption_version` enforcement).
|
||||
**Fix**: Unify steps 2–4 into a single database transaction.
|
||||
|
||||
**Files to change**:
|
||||
- `engines/transit.md` — update HMAC section, add HMAC output format, update
|
||||
Cryptographic Details section
|
||||
- Implementation: `internal/engine/transit/sign.go` (when implemented)
|
||||
|
||||
### #30 — `max_key_versions` vs `min_decryption_version` unclear
|
||||
|
||||
**Problem**: The spec doesn't define when `max_key_versions` pruning happens or
|
||||
whether it respects `min_decryption_version`. Auto-pruning on rotation could
|
||||
destroy versions that still have unrewrapped ciphertext.
|
||||
|
||||
**Fix**: Define the behavior explicitly in `engines/transit.md`:
|
||||
|
||||
1. `max_key_versions` pruning happens during `rotate-key`, after the new
|
||||
version is created.
|
||||
2. Pruning **only** deletes versions **strictly less than**
|
||||
`min_decryption_version`. If `max_key_versions` would require deleting a
|
||||
version at or above `min_decryption_version`, the version is **retained**
|
||||
and a warning is included in the response:
|
||||
`"warning": "max_key_versions exceeded; advance min_decryption_version to enable pruning"`.
|
||||
3. This means `max_key_versions` is a soft limit — it is only enforceable
|
||||
after the operator completes the rotation cycle (rotate → rewrap → advance
|
||||
min → prune happens automatically on next rotate).
|
||||
|
||||
This resolves the original audit finding #16 as well.
|
||||
|
||||
**Files to change**:
|
||||
- `engines/transit.md` — add `max_key_versions` behavior to Key Rotation
|
||||
section and `rotate-key` flow
|
||||
- `AUDIT.md` — mark #16 as RESOLVED with reference to the new behavior
|
||||
|
||||
### #33 — Auto-provision creates keys for arbitrary usernames
|
||||
|
||||
**Problem**: The encrypt flow auto-provisions recipients without validating
|
||||
that the username exists in MCIAS. Any authenticated user can create barrier
|
||||
entries for non-existent users.
|
||||
|
||||
**Fix**: Before auto-provisioning, validate the recipient username against
|
||||
MCIAS. The engine has access to the auth system via `req.CallerInfo` context.
|
||||
Add an MCIAS user lookup:
|
||||
|
||||
1. Add a `ValidateUsername(username string) (bool, error)` method to the auth
|
||||
client interface. This calls the MCIAS user info endpoint to check if the
|
||||
username exists.
|
||||
2. In the encrypt flow, before auto-provisioning a recipient, call
|
||||
`ValidateUsername`. If the user doesn't exist in MCIAS, return an error:
|
||||
`"recipient not found: {username}"`.
|
||||
3. Document this validation in the encrypt flow and security considerations.
|
||||
|
||||
**Alternative** (simpler, weaker): Skip MCIAS validation but add a
|
||||
rate limit on auto-provisioning (e.g., max 10 new provisions per encrypt
|
||||
request, max 100 total auto-provisions per hour per caller). This prevents
|
||||
storage inflation but doesn't prevent phantom users.
|
||||
|
||||
**Recommendation**: MCIAS validation. It's the correct security boundary —
|
||||
only real MCIAS users should have keypairs.
|
||||
|
||||
**Files to change**:
|
||||
- `engines/user.md` — update encrypt flow step 2, add MCIAS validation
|
||||
- `internal/auth/` — add `ValidateUsername` to auth client (when implemented)
|
||||
|
||||
---
|
||||
|
||||
## Medium
|
||||
|
||||
### #25 — Missing `list-certs` REST route (SSH CA)
|
||||
|
||||
**Fix**: Add to the REST endpoints table:
|
||||
|
||||
```
|
||||
| GET | `/v1/sshca/{mount}/certs` | List cert records |
|
||||
```
|
||||
|
||||
Add to the route registration code block:
|
||||
Refactor `ReWrapKeys` to accept an optional `*sql.Tx`:
|
||||
|
||||
```go
|
||||
r.Get("/v1/sshca/{mount}/certs", s.requireAuth(s.handleSSHCAListCerts))
|
||||
```
|
||||
// ReWrapKeysTx re-wraps all DEKs with newMEK within the given transaction.
|
||||
func (b *AESGCMBarrier) ReWrapKeysTx(ctx context.Context, tx *sql.Tx, newMEK []byte) error {
|
||||
// Same logic as ReWrapKeys, but use tx instead of b.db.BeginTx.
|
||||
rows, err := tx.QueryContext(ctx, "SELECT key_id, wrapped_key FROM barrier_keys")
|
||||
// ... decrypt with old MEK, encrypt with new MEK, UPDATE barrier_keys ...
|
||||
}
|
||||
|
||||
**Files to change**: `engines/sshca.md`
|
||||
|
||||
### #26 — KRL section type description error
|
||||
|
||||
**Fix**: Change the description block from:
|
||||
|
||||
```
|
||||
Section type: KRL_SECTION_CERT_SERIAL_LIST (0x21)
|
||||
```
|
||||
|
||||
to:
|
||||
|
||||
```
|
||||
Section type: KRL_SECTION_CERTIFICATES (0x01)
|
||||
CA key blob: ssh.MarshalAuthorizedKey(caSigner.PublicKey())
|
||||
Subsection type: KRL_SECTION_CERT_SERIAL_LIST (0x20)
|
||||
```
|
||||
|
||||
This matches the pseudocode comments and the OpenSSH `PROTOCOL.krl` spec.
|
||||
|
||||
**Files to change**: `engines/sshca.md`
|
||||
|
||||
### #27 — Policy check after cert construction (SSH CA)
|
||||
|
||||
**Fix**: Reorder the sign-host flow steps:
|
||||
|
||||
1. Authenticate caller.
|
||||
2. Parse the supplied SSH public key.
|
||||
3. Parse TTL.
|
||||
4. **Policy check**: for each hostname, check policy on
|
||||
`sshca/{mount}/id/{hostname}`, action `sign`.
|
||||
5. Generate serial (only after policy passes).
|
||||
6. Build `ssh.Certificate`.
|
||||
7. Sign, store, return.
|
||||
|
||||
Same reordering for sign-user.
|
||||
|
||||
**Files to change**: `engines/sshca.md`
|
||||
|
||||
### #29 — `rewrap` policy action not specified
|
||||
|
||||
**Fix**: Add `rewrap` as an explicit action in the `operationAction` mapping.
|
||||
`rewrap` maps to `decrypt` (since it requires internal access to plaintext).
|
||||
Batch variants map to the same action.
|
||||
|
||||
Add to the authorization section in `engines/transit.md`:
|
||||
|
||||
> The `rewrap` and `batch-rewrap` operations require the `decrypt` action —
|
||||
> rewrap internally decrypts with the old version and re-encrypts with the
|
||||
> latest, so the caller must have decrypt permission. Alternatively, a
|
||||
> dedicated `rewrap` action could be added for finer-grained control, but
|
||||
> `decrypt` is the safer default (granting `rewrap` without `decrypt` would be
|
||||
> odd since rewrap implies decrypt capability).
|
||||
|
||||
**Recommendation**: Map to `decrypt`. Simpler, and anyone who should rewrap
|
||||
should also be able to decrypt.
|
||||
|
||||
**Files to change**: `engines/transit.md`
|
||||
|
||||
### #31 — Missing `get-public-key` REST route (Transit)
|
||||
|
||||
**Fix**: Add to the REST endpoints table:
|
||||
|
||||
```
|
||||
| GET | `/v1/transit/{mount}/keys/{name}/public-key` | Get public key |
|
||||
```
|
||||
|
||||
Add to the route registration code block:
|
||||
|
||||
```go
|
||||
r.Get("/v1/transit/{mount}/keys/{name}/public-key", s.requireAuth(s.handleTransitGetPublicKey))
|
||||
```
|
||||
|
||||
**Files to change**: `engines/transit.md`
|
||||
|
||||
### #34 — No recipient limit on encrypt (User)
|
||||
|
||||
**Fix**: Add a compile-time constant `maxRecipients = 100` to the user engine.
|
||||
Reject requests exceeding this limit with `400 Bad Request` / `InvalidArgument`
|
||||
before any ECDH computation.
|
||||
|
||||
Add to the encrypt flow in `engines/user.md` after step 1:
|
||||
|
||||
> Validate that `len(recipients) <= maxRecipients` (100). Reject with error if
|
||||
> exceeded.
|
||||
|
||||
Add to the security considerations section.
|
||||
|
||||
**Files to change**: `engines/user.md`
|
||||
|
||||
---
|
||||
|
||||
## Low
|
||||
|
||||
### #32 — `exportable` flag with no export operation (Transit)
|
||||
|
||||
**Fix**: Add an `export-key` operation to the transit engine:
|
||||
|
||||
- Auth: User+Policy (action `read`).
|
||||
- Only succeeds if the key's `exportable` flag is `true`.
|
||||
- Returns raw key material (base64-encoded) for the current version only.
|
||||
- Asymmetric keys: returns private key in PKCS8 PEM.
|
||||
- Symmetric keys: returns raw key bytes, base64-encoded.
|
||||
- Add to HandleRequest dispatch, gRPC service, REST endpoints.
|
||||
|
||||
Alternatively, if key export is never intended, remove the `exportable` flag
|
||||
from `create-key` to avoid dead code. Given that transit is meant to keep keys
|
||||
server-side, **removing the flag** may be the better choice. Document the
|
||||
decision either way.
|
||||
|
||||
**Recommendation**: Remove `exportable`. Transit's entire value proposition is
|
||||
that keys never leave the service. If export is needed for migration, a
|
||||
dedicated admin-only `export-key` can be added later with appropriate audit
|
||||
logging (#7).
|
||||
|
||||
**Files to change**: `engines/transit.md`
|
||||
|
||||
### #35 — No re-encryption support for user key rotation
|
||||
|
||||
**Fix**: Add a `re-encrypt` operation:
|
||||
|
||||
- Auth: User (self) — only the envelope recipient can re-encrypt.
|
||||
- Input: old envelope.
|
||||
- Flow: decrypt with current key, generate new DEK, re-encrypt, return new
|
||||
envelope.
|
||||
- The old key must still be valid at the time of re-encryption. Document the
|
||||
workflow: re-encrypt all stored envelopes, then rotate-key.
|
||||
|
||||
This is a quality-of-life improvement, not a security fix. The current design
|
||||
(decrypt + encrypt separately) works but requires the caller to handle
|
||||
plaintext.
|
||||
|
||||
**Files to change**: `engines/user.md`
|
||||
|
||||
### #36 — `UserKeyConfig` type undefined
|
||||
|
||||
**Fix**: Add the type definition to the in-memory state section:
|
||||
|
||||
```go
|
||||
type UserKeyConfig struct {
|
||||
Algorithm string `json:"algorithm"` // key exchange algorithm used
|
||||
CreatedAt time.Time `json:"created_at"`
|
||||
AutoProvisioned bool `json:"auto_provisioned"` // created via auto-provision
|
||||
// SwapMEK updates the in-memory MEK after a committed transaction.
|
||||
func (b *AESGCMBarrier) SwapMEK(newMEK []byte) {
|
||||
b.mu.Lock()
|
||||
defer b.mu.Unlock()
|
||||
mcrypto.Zeroize(b.mek)
|
||||
b.mek = newMEK
|
||||
}
|
||||
```
|
||||
|
||||
**Files to change**: `engines/user.md`
|
||||
Then in `RotateMEK`:
|
||||
|
||||
### #38 — `ZeroizeKey` prerequisite not cross-referenced
|
||||
```go
|
||||
func (m *Manager) RotateMEK(ctx context.Context, password string) error {
|
||||
// ... derive KWK, generate newMEK ...
|
||||
|
||||
**Fix**: Add to the Implementation Steps section in both `engines/transit.md`
|
||||
and `engines/user.md`:
|
||||
tx, err := m.db.BeginTx(ctx, nil)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
defer tx.Rollback()
|
||||
|
||||
> **Prerequisite**: `engine.ZeroizeKey` must exist in
|
||||
> `internal/engine/helpers.go` (created as part of the SSH CA engine
|
||||
> implementation — see `engines/sshca.md` step 1).
|
||||
// Re-wrap all DEKs within the transaction.
|
||||
if err := m.barrier.ReWrapKeysTx(ctx, tx, newMEK); err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
**Files to change**: `engines/transit.md`, `engines/user.md`
|
||||
// Update seal_config within the same transaction.
|
||||
encNewMEK, err := crypto.Encrypt(kwk, newMEK, nil)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
if _, err := tx.ExecContext(ctx,
|
||||
"UPDATE seal_config SET encrypted_mek = ? WHERE id = 1",
|
||||
encNewMEK,
|
||||
); err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
if err := tx.Commit(); err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
// Only after commit: update in-memory state.
|
||||
m.barrier.SwapMEK(newMEK)
|
||||
return nil
|
||||
}
|
||||
```
|
||||
|
||||
SQLite in WAL mode handles this correctly — the transaction is atomic
|
||||
regardless of process crash. The `barrier_keys` and `seal_config` updates
|
||||
either both commit or neither does.
|
||||
|
||||
### Files
|
||||
|
||||
| File | Change |
|
||||
|------|--------|
|
||||
| `internal/barrier/barrier.go` | Extend RLock scope in Get/Put/Delete/List; add `ReWrapKeysTx`, `SwapMEK` |
|
||||
| `internal/seal/seal.go` | Wrap ReWrapKeysTx + seal_config UPDATE in single transaction |
|
||||
| `internal/barrier/barrier_test.go` | Add concurrent Get/Seal stress test |
|
||||
|
||||
### Verification
|
||||
|
||||
- `go test -race ./internal/barrier/ ./internal/seal/`
|
||||
- Add `TestConcurrentGetSeal`: spawn goroutines doing Get while another
|
||||
goroutine calls Seal. Run with `-race`. Verify no panics or data races.
|
||||
- Add `TestRotateMEKAtomic`: verify that `barrier_keys` and `seal_config`
|
||||
are updated in the same transaction (mock the DB to detect transaction
|
||||
boundaries, or verify via rollback behavior).
|
||||
|
||||
---
|
||||
|
||||
## Work Item 4: CA TTL Enforcement, User Engine Fixes, Policy Bypass (#49, #61, #62, #69)
|
||||
|
||||
These four findings touch separate files with no overlap and can be
|
||||
addressed in parallel.
|
||||
|
||||
### #49 — No TTL Ceiling in CA Certificate Issuance
|
||||
|
||||
**Risk**: A non-admin user can request an arbitrarily long certificate
|
||||
lifetime. The issuer's `MaxTTL` exists in config but is not enforced
|
||||
during `handleIssue` or `handleSignCSR`.
|
||||
|
||||
**Root cause**: The CA engine applies the user's requested TTL directly
|
||||
to the certificate without comparing it against `issuerConfig.MaxTTL`.
|
||||
The SSH CA engine correctly enforces this via `resolveTTL` — the CA
|
||||
engine does not.
|
||||
|
||||
**Fix**: Add a `resolveTTL` method to the CA engine, following the SSH
|
||||
CA engine's pattern (`sshca.go` lines 902–932):
|
||||
|
||||
```go
|
||||
func (e *CAEngine) resolveTTL(requested string, issuer *issuerState) (time.Duration, error) {
|
||||
maxTTL, err := time.ParseDuration(issuer.config.MaxTTL)
|
||||
if err != nil {
|
||||
maxTTL = 2160 * time.Hour // 90 days fallback
|
||||
}
|
||||
|
||||
if requested != "" {
|
||||
ttl, err := time.ParseDuration(requested)
|
||||
if err != nil {
|
||||
return 0, fmt.Errorf("invalid TTL: %w", err)
|
||||
}
|
||||
if ttl > maxTTL {
|
||||
return 0, fmt.Errorf("requested TTL %s exceeds issuer maximum %s", ttl, maxTTL)
|
||||
}
|
||||
return ttl, nil
|
||||
}
|
||||
|
||||
return maxTTL, nil
|
||||
}
|
||||
```
|
||||
|
||||
Call this in `handleIssue` and `handleSignCSR` before constructing the
|
||||
certificate. Replace the raw TTL string with the validated duration.
|
||||
|
||||
| File | Change |
|
||||
|------|--------|
|
||||
| `internal/engine/ca/ca.go` | Add `resolveTTL`, call from `handleIssue` and `handleSignCSR` |
|
||||
| `internal/engine/ca/ca_test.go` | Add test: issue cert with TTL > MaxTTL, verify rejection |
|
||||
|
||||
### #61 — Ineffective ECDH Key Zeroization
|
||||
|
||||
**Risk**: `privKey.Bytes()` returns a copy of the private key bytes.
|
||||
Zeroizing the copy leaves the original inside `*ecdh.PrivateKey`. Go's
|
||||
`crypto/ecdh` API does not expose the internal byte slice.
|
||||
|
||||
**Root cause**: Language/API limitation in Go's `crypto/ecdh` package.
|
||||
|
||||
**Fix**: Store the raw private key bytes alongside the parsed key in
|
||||
`userState`, and zeroize those bytes on seal:
|
||||
|
||||
```go
|
||||
type userState struct {
|
||||
privKey *ecdh.PrivateKey
|
||||
privBytes []byte // raw key bytes, retained for zeroization
|
||||
pubKey *ecdh.PublicKey
|
||||
config *UserKeyConfig
|
||||
}
|
||||
```
|
||||
|
||||
On **load from barrier** (Unseal, auto-provision):
|
||||
```go
|
||||
raw, err := b.Get(ctx, prefix+"priv.key")
|
||||
priv, err := curve.NewPrivateKey(raw)
|
||||
state.privBytes = raw // retain for zeroization
|
||||
state.privKey = priv
|
||||
```
|
||||
|
||||
On **Seal**:
|
||||
```go
|
||||
mcrypto.Zeroize(u.privBytes)
|
||||
u.privKey = nil
|
||||
u.privBytes = nil
|
||||
```
|
||||
|
||||
Document the limitation: the parsed `*ecdh.PrivateKey` struct's internal
|
||||
copy cannot be zeroized from Go code. Setting `privKey = nil` makes it
|
||||
eligible for GC, but does not guarantee immediate byte overwrite. This is
|
||||
an accepted Go runtime limitation.
|
||||
|
||||
| File | Change |
|
||||
|------|--------|
|
||||
| `internal/engine/user/user.go` | Add `privBytes` to `userState`, populate on load, zeroize on Seal |
|
||||
| `internal/engine/user/types.go` | Update `userState` struct |
|
||||
|
||||
### #62 — User Engine Policy Path Uses `mountPath` Instead of Mount Name
|
||||
|
||||
**Risk**: Policy checks construct the resource path using `e.mountPath`
|
||||
(which is `engine/user/{name}/`) instead of just the mount name. Policy
|
||||
rules match against `user/{name}/recipient/{username}`, so the full mount
|
||||
path creates a mismatch like `user/engine/user/myengine//recipient/alice`.
|
||||
No policy rule will ever match.
|
||||
|
||||
**Root cause**: Line 358 of `user.go` uses `e.mountPath` directly. The
|
||||
SSH CA and transit engines correctly use a `mountName()` helper.
|
||||
|
||||
**Fix**: Add a `mountName()` method to the user engine:
|
||||
|
||||
```go
|
||||
func (e *UserEngine) mountName() string {
|
||||
// mountPath is "engine/user/{name}/"
|
||||
parts := strings.Split(strings.TrimSuffix(e.mountPath, "/"), "/")
|
||||
if len(parts) >= 3 {
|
||||
return parts[2]
|
||||
}
|
||||
return e.mountPath
|
||||
}
|
||||
```
|
||||
|
||||
Change line 358:
|
||||
|
||||
```go
|
||||
resource := fmt.Sprintf("user/%s/recipient/%s", e.mountName(), r)
|
||||
```
|
||||
|
||||
Audit all other resource path constructions in the user engine to confirm
|
||||
they also use the correct mount name.
|
||||
|
||||
| File | Change |
|
||||
|------|--------|
|
||||
| `internal/engine/user/user.go` | Add `mountName()`, fix resource path on line 358 |
|
||||
| `internal/engine/user/user_test.go` | Add test: verify policy resource path format |
|
||||
|
||||
### #69 — Typed REST Handlers Bypass Policy Engine
|
||||
|
||||
**Risk**: 18 typed REST handlers pass `nil` for `CheckPolicy` in the
|
||||
`engine.Request`, skipping service-level policy evaluation. The generic
|
||||
`/v1/engine/request` endpoint correctly passes a `policyChecker`. Since
|
||||
engines #54 and #58 default to allow when no policy matches, typed routes
|
||||
are effectively unprotected by policy.
|
||||
|
||||
**Root cause**: Typed handlers were modeled after admin-only operations
|
||||
(which don't need policy) but applied to user-accessible operations.
|
||||
|
||||
**Fix**: Extract the policy checker construction from
|
||||
`handleEngineRequest` into a shared helper:
|
||||
|
||||
```go
|
||||
func (s *Server) newPolicyChecker(info *CallerInfo) engine.PolicyChecker {
|
||||
return func(resource, action string) (string, bool) {
|
||||
effect, matched, err := s.policy.Check(
|
||||
info.Username, info.Roles, resource, action,
|
||||
)
|
||||
if err != nil || !matched {
|
||||
return "deny", false
|
||||
}
|
||||
return effect, matched
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Then in each typed handler, set `CheckPolicy` on the request:
|
||||
|
||||
```go
|
||||
req := &engine.Request{
|
||||
Operation: "get-cert",
|
||||
Data: data,
|
||||
CallerInfo: callerInfo,
|
||||
CheckPolicy: s.newPolicyChecker(callerInfo),
|
||||
}
|
||||
```
|
||||
|
||||
**18 handlers to update**:
|
||||
|
||||
| Handler | Operation |
|
||||
|---------|-----------|
|
||||
| `handleGetCert` | `get-cert` |
|
||||
| `handleRevokeCert` | `revoke-cert` |
|
||||
| `handleDeleteCert` | `delete-cert` |
|
||||
| `handleSSHCASignHost` | `sign-host` |
|
||||
| `handleSSHCASignUser` | `sign-user` |
|
||||
| `handleSSHCAGetProfile` | `get-profile` |
|
||||
| `handleSSHCAListProfiles` | `list-profiles` |
|
||||
| `handleSSHCADeleteProfile` | `delete-profile` |
|
||||
| `handleSSHCAGetCert` | `get-cert` |
|
||||
| `handleSSHCAListCerts` | `list-certs` |
|
||||
| `handleSSHCARevokeCert` | `revoke-cert` |
|
||||
| `handleSSHCADeleteCert` | `delete-cert` |
|
||||
| `handleUserRegister` | `register` |
|
||||
| `handleUserProvision` | `provision` |
|
||||
| `handleUserListUsers` | `list-users` |
|
||||
| `handleUserGetPublicKey` | `get-public-key` |
|
||||
| `handleUserDeleteUser` | `delete-user` |
|
||||
| `handleUserDecrypt` | `decrypt` |
|
||||
|
||||
Note: `handleUserEncrypt` already passes a policy checker — verify it
|
||||
uses the same shared helper after refactoring. Admin-only handlers
|
||||
(behind `requireAdmin` wrapper) do not need a policy checker since admin
|
||||
bypasses policy.
|
||||
|
||||
| File | Change |
|
||||
|------|--------|
|
||||
| `internal/server/routes.go` | Add `newPolicyChecker`, pass to all 18 typed handlers |
|
||||
| `internal/server/server_test.go` | Add test: policy-denied user is rejected by typed route |
|
||||
|
||||
### Verification (Work Item 4, all findings)
|
||||
|
||||
```bash
|
||||
go test ./internal/engine/ca/
|
||||
go test ./internal/engine/user/
|
||||
go test ./internal/server/
|
||||
go vet ./...
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Implementation Order
|
||||
|
||||
The remediation items should be implemented in this order to respect
|
||||
dependencies:
|
||||
```
|
||||
1. #68 JSON injection (standalone, ship immediately)
|
||||
2. #48 Path traversal (standalone, blocks safe engine operation)
|
||||
3. #39 Barrier TOCTOU race ─┐
|
||||
#40 ReWrapKeys crash safety ┘ (coupled, requires careful testing)
|
||||
4. #49 CA TTL enforcement ─┐
|
||||
#61 ECDH zeroization │
|
||||
#62 User policy path │ (independent fixes, parallelizable)
|
||||
#69 Policy bypass ─┘
|
||||
```
|
||||
|
||||
1. **#37** — `adminOnlyOperations` qualification (critical, blocks user engine
|
||||
`rotate-key`). This is a code change to `internal/server/routes.go` plus
|
||||
spec updates. Do first because it affects all engine implementations.
|
||||
Items 1 and 2 have no dependencies and can be done in parallel by
|
||||
different engineers.
|
||||
|
||||
2. **#28, #29, #30, #31, #32** — Transit spec fixes (can be done as a single
|
||||
spec update pass).
|
||||
Items 3 and 4 can also be done in parallel since they touch different
|
||||
subsystems (barrier/seal vs engines/server).
|
||||
|
||||
3. **#25, #26, #27** — SSH CA spec fixes (single spec update pass).
|
||||
---
|
||||
|
||||
4. **#33, #34, #35, #36** — User spec fixes (single spec update pass).
|
||||
## Post-Remediation
|
||||
|
||||
5. **#38** — Cross-reference update (trivial, do with transit and user spec
|
||||
fixes).
|
||||
After all eight findings are resolved:
|
||||
|
||||
Items within the same group are independent and can be done in parallel.
|
||||
1. **Update AUDIT.md** — mark #39, #40, #48, #49, #61, #62, #68, #69 as
|
||||
RESOLVED with resolution summaries.
|
||||
2. **Run the full pipeline**: `make all` (vet, lint, test, build).
|
||||
3. **Run race detector**: `go test -race ./...`
|
||||
4. **Address related medium findings** that interact with these fixes:
|
||||
- #54 (SSH CA default-allow) and #58 (transit default-allow) — once
|
||||
#69 is fixed, the typed handlers will pass policy checkers to the
|
||||
engines, but the engines still default-allow when `CheckPolicy`
|
||||
returns no match. Consider changing the engine-level default to deny
|
||||
for non-admin callers.
|
||||
- #72 (policy ID path traversal) — already covered by #48's
|
||||
`ValidateName` fix on `CreateRule`.
|
||||
|
||||
Reference in New Issue
Block a user