Fix ECDH zeroization, add audit logging, and remediate high findings

- Fix #61: handleRotateKey and handleDeleteUser now zeroize stored
  privBytes instead of calling Bytes() (which returns a copy). New
  state populates privBytes; old references nil'd for GC.
- Add audit logging subsystem (internal/audit) with structured event
  recording for cryptographic operations.
- Add audit log engine spec (engines/auditlog.md).
- Add ValidateName checks across all engines for path traversal (#48).
- Update AUDIT.md: all High findings resolved (0 open).
- Add REMEDIATION.md with detailed remediation tracking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 14:04:39 -07:00
parent b33d1f99a0
commit 5c5d7e184e
24 changed files with 1699 additions and 72 deletions

engines/auditlog.md Normal file

@@ -0,0 +1,513 @@
# Audit Logging Design
## Overview
Metacrypt is a cryptographic service for a homelab/personal infrastructure
platform. Audit logging gives the operator visibility into what happened,
when, and by whom — essential for a service that issues certificates, signs
SSH keys, and manages encryption keys, even at homelab scale.
The design prioritizes simplicity and operational clarity over enterprise
features. There is one operator. There is no SIEM. The audit log should be
a structured, append-only file that can be read with `jq`, tailed with
`journalctl`, and rotated with `logrotate`. It should not require a
database, a separate service, or additional infrastructure.
## Goals
1. **Record all security-relevant operations** — who did what, when, and
whether it succeeded.
2. **Separate audit events from operational logs** — operational logs
(`slog.Info`) are for debugging; audit events are for accountability.
3. **Zero additional dependencies** — use Go's `log/slog` with a dedicated
handler writing to a file or stdout.
4. **No performance overhead that matters at homelab scale** — synchronous
writes are fine. This is not a high-throughput system.
5. **Queryable with standard tools** — one JSON object per line, greppable,
`jq`-friendly.
## Non-Goals
- Tamper-evident chaining (hash chains, Merkle trees). The operator has
root access to the machine; tamper evidence against the operator is
theatre. If the threat model changes, this can be added later.
- Remote log shipping. If needed, `journalctl` or `filebeat` can ship
the file externally.
- Log aggregation across services. Each Metacircular service logs
independently.
- Structured querying (SQL, full-text search). `jq` and `grep` are
sufficient.
## Event Model
Every audit event is a single JSON line with these fields:
```json
{
  "time": "2026-03-17T04:15:42.577Z",
  "level": "AUDIT",
  "msg": "operation completed",
  "caller": "kyle",
  "roles": ["admin"],
  "operation": "issue",
  "engine": "ca",
  "mount": "pki",
  "resource": "ca/pki/id/example.com",
  "outcome": "success",
  "detail": {"serial": "01:02:03", "issuer": "default", "cn": "example.com"}
}
```
### Required Fields
| Field | Type | Description |
|-------|------|-------------|
| `time` | RFC 3339 | When the event occurred |
| `level` | string | Always `"AUDIT"` — distinguishes from operational logs |
| `msg` | string | Human-readable summary |
| `caller` | string | MCIAS username, or `"anonymous"` for unauthenticated ops |
| `operation` | string | Engine operation name (e.g., `issue`, `sign-user`, `encrypt`) |
| `outcome` | string | `"success"`, `"denied"`, or `"error"` |
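The required-field rules in the table above could be checked before an event is written. The following is a minimal sketch, not part of the design: `validateEvent` and the trimmed-down `Event` copy are hypothetical names used only for illustration, and `time`, `level`, and `msg` are left out because slog fills them in.

```go
package main

import (
	"errors"
	"fmt"
)

// Event is a trimmed copy of the audit event holding only the required
// fields that the caller supplies (hypothetical, for illustration).
type Event struct {
	Caller    string
	Operation string
	Outcome   string
}

// validateEvent is a hypothetical helper. It returns nil for a well-formed
// event, or a joined error naming every rule the event breaks.
func validateEvent(e Event) error {
	var errs []error
	if e.Caller == "" {
		errs = append(errs, errors.New(`caller is required (use "anonymous" if unauthenticated)`))
	}
	if e.Operation == "" {
		errs = append(errs, errors.New("operation is required"))
	}
	switch e.Outcome {
	case "success", "denied", "error":
		// One of the three documented outcomes.
	default:
		errs = append(errs, fmt.Errorf("outcome %q is not one of success, denied, error", e.Outcome))
	}
	return errors.Join(errs...)
}
```

A check like this probably belongs in tests rather than on the hot path, since a malformed audit event is a programming error rather than a runtime condition.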
### Optional Fields
| Field | Type | Description |
|-------|------|-------------|
| `roles` | []string | Caller's MCIAS roles |
| `engine` | string | Engine type (`ca`, `sshca`, `transit`, `user`) |
| `mount` | string | Mount name |
| `resource` | string | Policy resource path evaluated |
| `detail` | object | Operation-specific metadata (see below) |
| `error` | string | Error message on `"error"` or `"denied"` outcomes |
### Detail Fields by Operation Category
**Certificate operations** (CA):
- `serial`, `issuer`, `cn`, `profile`, `ttl`
**SSH CA operations**:
- `serial`, `cert_type` (`user`/`host`), `principals`, `profile`, `key_id`
**Transit operations**:
- `key` (key name), `key_version`, `batch_size` (for batch ops)
**User E2E operations**:
- `recipients` (list), `sender`
**Policy operations**:
- `rule_id`, `effect`
**System operations** (seal/unseal/init):
- No detail fields; the operation name is sufficient.
### What NOT to Log
- Plaintext, ciphertext, signatures, HMACs, envelopes, or any
cryptographic material.
- Private keys, public keys, or key bytes.
- Passwords, tokens, or credentials.
- Full request/response bodies.
The audit log records **what happened**, not **what the data was**.
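One way to keep forbidden material out of `detail` is an allowlist built from the detail fields listed earlier in this section, so that any other key is dropped before logging. This is a sketch under that assumption; `allowedDetailKeys` and `filterDetail` are hypothetical names and are not part of the design.

```go
package main

// allowedDetailKeys lists the detail fields named in this document,
// grouped by operation category.
var allowedDetailKeys = map[string]bool{
	"serial": true, "issuer": true, "cn": true, "profile": true, "ttl": true, // CA
	"cert_type": true, "principals": true, "key_id": true, // SSH CA
	"key": true, "key_version": true, "batch_size": true, // Transit
	"recipients": true, "sender": true, // User E2E
	"rule_id": true, "effect": true, // Policy
}

// filterDetail returns a copy of detail containing only allowlisted keys.
// It does not inspect values, so it cannot catch a secret stored under an
// allowed key.
func filterDetail(detail map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(detail))
	for k, v := range detail {
		if allowedDetailKeys[k] {
			out[k] = v
		}
	}
	return out
}
```

An allowlist fails closed: a new detail field has to be added deliberately, which matches the intent of recording what happened rather than what the data was.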
## Architecture
### Audit Logger
A thin wrapper around `slog.Logger` with a dedicated handler:
```go
// Package audit provides structured audit event logging.
package audit

import (
	"context"
	"log/slog"
)

// Logger writes structured audit events.
type Logger struct {
	logger *slog.Logger
}

// New creates an audit logger that writes to the given handler.
func New(h slog.Handler) *Logger {
	return &Logger{logger: slog.New(h)}
}

// Event represents a single audit event.
type Event struct {
	Caller    string
	Roles     []string
	Operation string
	Engine    string
	Mount     string
	Resource  string
	Outcome   string // "success", "denied", "error"
	Error     string
	Detail    map[string]interface{}
}

// Log writes an audit event.
func (l *Logger) Log(ctx context.Context, e Event) {
	attrs := []slog.Attr{
		slog.String("caller", e.Caller),
		slog.String("operation", e.Operation),
		slog.String("outcome", e.Outcome),
	}
	if len(e.Roles) > 0 {
		attrs = append(attrs, slog.Any("roles", e.Roles))
	}
	if e.Engine != "" {
		attrs = append(attrs, slog.String("engine", e.Engine))
	}
	if e.Mount != "" {
		attrs = append(attrs, slog.String("mount", e.Mount))
	}
	if e.Resource != "" {
		attrs = append(attrs, slog.String("resource", e.Resource))
	}
	if e.Error != "" {
		attrs = append(attrs, slog.String("error", e.Error))
	}
	if len(e.Detail) > 0 {
		attrs = append(attrs, slog.Any("detail", e.Detail))
	}
	// LevelAudit sorts above Error, so level filtering never drops it.
	// The handler must relabel it "AUDIT"; slog's default label is "ERROR+4".
	l.logger.LogAttrs(ctx, LevelAudit, "operation completed", attrs...)
}

// LevelAudit is a custom slog level for audit events.
const LevelAudit = slog.Level(12) // above Error (8)
```
The custom level ensures audit events are never suppressed by log level
filtering (operators may set `level = "warn"` to quiet debug noise, but
audit events must always be emitted).
### Output Configuration
Two modes, controlled by a config option:
```toml
[audit]
# "file" writes to a dedicated audit log file.
# "stdout" writes to stdout alongside operational logs (for journalctl).
# Empty string disables audit logging.
mode = "file"
path = "/srv/metacrypt/audit.log"
```
**File mode**: Opens the file append-only with `0600` permissions. Uses
`slog.NewJSONHandler` writing to the file. The file can be rotated with
`logrotate` using `copytruncate`: because the file is opened with
`O_APPEND`, writes after a truncation land at the new end of the file, so
the logger needs no reopen logic. Go's `slog.JSONHandler` does not buffer;
each event is passed to the file in a single write as it is logged.
**Stdout mode**: Uses `slog.NewJSONHandler` writing to `os.Stdout`. Events
are interleaved with operational logs but distinguishable by the `"AUDIT"`
level. Suitable for systemd/journalctl capture where all output goes to
the journal.
**Disabled**: No audit logger is created. The `Logger` is nil-safe — all
methods are no-ops on a nil receiver.
```go
func (l *Logger) Log(ctx context.Context, e Event) {
	if l == nil {
		return
	}
	// ...
}
```
### Integration Points
The audit logger is created at startup and injected into the components
that need it:
```
cmd/metacrypt/server.go
  └── audit.New(handler)
        ├── server.Server          (REST handlers)
        ├── grpcserver.GRPCServer  (gRPC interceptor)
        ├── seal.Manager           (seal/unseal/init)
        └── policy.Engine          (rule create/delete)
```
Engine operations are logged at the **server layer** (REST handlers and
gRPC interceptors), not inside the engines themselves. This keeps the
engines focused on business logic and avoids threading the audit logger
through every engine method.
### Instrumentation
#### REST API (`internal/server/`)
Instrument `handleEngineRequest` and every typed handler. The audit event
is emitted **after** the operation completes (success or failure):
```go
func (s *Server) handleGetCert(w http.ResponseWriter, r *http.Request) {
	// ... existing handler logic ...
	s.audit.Log(r.Context(), audit.Event{
		Caller:    info.Username,
		Roles:     info.Roles,
		Operation: "get-cert",
		Engine:    "ca",
		Mount:     mountName,
		Outcome:   "success",
		Detail:    map[string]interface{}{"serial": serial},
	})
}
```
On error:
```go
s.audit.Log(r.Context(), audit.Event{
	Caller:    info.Username,
	Roles:     info.Roles,
	Operation: "get-cert",
	Engine:    "ca",
	Mount:     mountName,
	Outcome:   "error",
	Error:     err.Error(),
})
```
To avoid duplicating this in every handler, use a helper:
```go
func (s *Server) auditEngineOp(r *http.Request, info *auth.TokenInfo,
	op, engineType, mount, outcome string, detail map[string]interface{}, err error) {
	e := audit.Event{
		Caller:    info.Username,
		Roles:     info.Roles,
		Operation: op,
		Engine:    engineType,
		Mount:     mount,
		Outcome:   outcome,
		Detail:    detail,
	}
	if err != nil {
		e.Error = err.Error()
	}
	s.audit.Log(r.Context(), e)
}
```
#### gRPC API (`internal/grpcserver/`)
Add an audit interceptor that fires after each RPC completes. This is
cleaner than instrumenting every handler individually:
```go
func (g *GRPCServer) auditInterceptor(
	ctx context.Context,
	req interface{},
	info *grpc.UnaryServerInfo,
	handler grpc.UnaryHandler,
) (interface{}, error) {
	resp, err := handler(ctx, req)

	// Extract caller info from context (set by auth interceptor).
	caller := callerFromContext(ctx)

	outcome := "success"
	if err != nil {
		outcome = "error"
	}

	g.audit.Log(ctx, audit.Event{
		Caller:    caller.Username,
		Roles:     caller.Roles,
		Operation: path.Base(info.FullMethod), // e.g., "IssueCert"
		Resource:  info.FullMethod,
		Outcome:   outcome,
		Error:     errString(err),
	})
	return resp, err
}
```
Register this interceptor **after** the auth interceptor in the chain so
that caller info is available.
#### Seal/Unseal (`internal/seal/`)
Instrument `Init`, `Unseal`, `Seal`, and `RotateMEK`:
```go
// In Manager.Unseal, after success:
m.audit.Log(ctx, audit.Event{
	Caller:    "operator", // unseal is not authenticated
	Operation: "unseal",
	Outcome:   "success",
})

// On failure:
m.audit.Log(ctx, audit.Event{
	Caller:    "operator",
	Operation: "unseal",
	Outcome:   "denied",
	Error:     "invalid password",
})
```
#### Policy (`internal/policy/`)
Instrument `CreateRule` and `DeleteRule`:
```go
// In Engine.CreateRule, after success:
e.audit.Log(ctx, audit.Event{
	Caller:    callerUsername, // passed from the handler
	Operation: "create-policy",
	Outcome:   "success",
	Detail:    map[string]interface{}{"rule_id": rule.ID, "effect": rule.Effect},
})
```
### Operations to Audit
| Category | Operations | Outcome on deny |
|----------|------------|-----------------|
| System | `init`, `unseal`, `seal`, `rotate-mek`, `rotate-key`, `migrate` | `denied` or `error` |
| CA | `import-root`, `create-issuer`, `delete-issuer`, `issue`, `sign-csr`, `renew`, `revoke-cert`, `delete-cert` | `denied` |
| SSH CA | `sign-host`, `sign-user`, `create-profile`, `update-profile`, `delete-profile`, `revoke-cert`, `delete-cert` | `denied` |
| Transit | `create-key`, `delete-key`, `rotate-key`, `update-key-config`, `trim-key`, `encrypt`, `decrypt`, `rewrap`, `sign`, `verify`, `hmac` | `denied` |
| User | `register`, `provision`, `encrypt`, `decrypt`, `re-encrypt`, `rotate-key`, `delete-user` | `denied` |
| Policy | `create-policy`, `delete-policy` | N/A (admin-only) |
| Auth | `login` (success and failure) | `denied` |
**Read-only operations** (`get-cert`, `list-certs`, `get-profile`,
`list-profiles`, `get-key`, `list-keys`, `list-users`, `get-public-key`,
`status`) are **not audited** by default. They generate operational log
entries via the existing HTTP/gRPC logging middleware but do not produce
audit events. This keeps the audit log focused on state-changing operations.
If the operator wants read auditing, a config flag can enable it:
```toml
[audit]
include_reads = false # default
```
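The read-only exclusion and the `include_reads` flag reduce to a single predicate at the call site. A minimal sketch, assuming operation names match the list above exactly; `readOnlyOps` and `shouldAudit` are hypothetical names.

```go
package main

// readOnlyOps holds the operations this document lists as read-only.
var readOnlyOps = map[string]bool{
	"get-cert":       true,
	"list-certs":     true,
	"get-profile":    true,
	"list-profiles":  true,
	"get-key":        true,
	"list-keys":      true,
	"list-users":     true,
	"get-public-key": true,
	"status":         true,
}

// shouldAudit reports whether an operation produces an audit event.
// State-changing operations always do; reads only when includeReads is set.
func shouldAudit(op string, includeReads bool) bool {
	return includeReads || !readOnlyOps[op]
}
```

Because unknown operation names are treated as state-changing, a newly added operation is audited by default until someone marks it read-only.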
## File Layout
```
internal/
  audit/
    audit.go       # Logger, Event, LevelAudit
    audit_test.go  # Tests
```
One source file plus its test, two small types (`Logger` and `Event`), no
interfaces. The audit logger is a concrete struct passed by pointer.
Nil-safe for disabled mode.
## Configuration
Add to `config.go`:
```go
type AuditConfig struct {
	Mode         string `toml:"mode"`          // "file", "stdout", ""
	Path         string `toml:"path"`          // file path (mode=file)
	IncludeReads bool   `toml:"include_reads"` // audit read operations
}
```
Add to example config:
```toml
[audit]
mode = "file"
path = "/srv/metacrypt/audit.log"
include_reads = false
```
## Implementation Steps
1. **Create `internal/audit/audit.go`**: `Logger`, `Event`, `LevelAudit`,
`New(handler)`, nil-safe `Log` method.
2. **Add `AuditConfig` to config** — mode, path, include_reads. Validate
that `path` is set when `mode = "file"`.
3. **Create audit logger in `cmd/metacrypt/server.go`** — based on config,
open file or use stdout. Pass to Server, GRPCServer, SealManager,
PolicyEngine.
4. **Add `audit *audit.Logger` field** to `Server`, `GRPCServer`,
`seal.Manager`, `policy.Engine`. Update constructors.
5. **Instrument REST handlers** — add `auditEngineOp` helper to `Server`.
Call after every mutating operation in typed handlers and
`handleEngineRequest`.
6. **Instrument gRPC** — add audit interceptor to the interceptor chain.
7. **Instrument seal/unseal** — emit events in `Init`, `Unseal`, `Seal`,
`RotateMEK`.
8. **Instrument policy** — emit events in `CreateRule`, `DeleteRule`.
9. **Instrument login** — emit events in the auth login handler (both
REST and gRPC).
10. **Update ARCHITECTURE.md** — document audit logging in the Security
Model section. Remove from Future Work.
11. **Update example configs** — add `[audit]` section.
12. **Add tests** — verify events are emitted for success, denied, and
error outcomes. Verify nil logger is safe. Verify read operations are
excluded by default.
## Querying the Audit Log
```bash
# All events for a user:
jq 'select(.caller == "kyle")' /srv/metacrypt/audit.log
# All certificate issuances:
jq 'select(.operation == "issue")' /srv/metacrypt/audit.log
# All denied operations:
jq 'select(.outcome == "denied")' /srv/metacrypt/audit.log
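# All SSH CA events since a fixed timestamp (plain string comparison, so
# it assumes every event uses the same UTC format as the example above):
jq 'select(.engine == "sshca" and .time > "2026-03-17T03:00:00Z")' /srv/metacrypt/audit.log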
# Count operations by type:
jq -r '.operation' /srv/metacrypt/audit.log | sort | uniq -c | sort -rn
# Failed unseal attempts:
jq 'select(.operation == "unseal" and .outcome == "denied")' /srv/metacrypt/audit.log
```
## Rotation
For file mode, use logrotate:
```
/srv/metacrypt/audit.log {
    daily
    rotate 90
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
```
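`copytruncate` avoids the need for a signal-based reopen mechanism. The
Go `slog.JSONHandler` does not buffer writes, so nothing is held back inside
the process. `logrotate` itself documents a small window between the copy
and the truncate in which an event written at that moment can be lost; at
homelab write rates this is unlikely to matter.
At homelab scale with moderate usage, 90 days of audit logs should be well
under 100 MB, even before compression.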