Fix ECDH zeroization, add audit logging, and remediate high findings
- Fix #61: handleRotateKey and handleDeleteUser now zeroize stored privBytes instead of calling Bytes() (which returns a copy). New state populates privBytes; old references nil'd for GC.
- Add audit logging subsystem (internal/audit) with structured event recording for cryptographic operations.
- Add audit log engine spec (engines/auditlog.md).
- Add ValidateName checks across all engines for path traversal (#48).
- Update AUDIT.md: all High findings resolved (0 open).
- Add REMEDIATION.md with detailed remediation tracking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
513
engines/auditlog.md
Normal file
@@ -0,0 +1,513 @@
# Audit Logging Design

## Overview

Metacrypt is a cryptographic service for a homelab/personal infrastructure
platform. Audit logging gives the operator visibility into what happened,
when, and by whom. That visibility is essential for a service that issues
certificates, signs SSH keys, and manages encryption keys, even at homelab
scale.

The design prioritizes simplicity and operational clarity over enterprise
features. There is one operator. There is no SIEM. The audit log should be
a structured, append-only file that can be read with `jq`, tailed with
`journalctl`, and rotated with `logrotate`. It should not require a
database, a separate service, or additional infrastructure.

## Goals

1. **Record all security-relevant operations**: who did what, when, and
   whether it succeeded.
2. **Separate audit events from operational logs**: operational logs
   (`slog.Info`) are for debugging; audit events are for accountability.
3. **Zero additional dependencies**: use Go's `log/slog` with a dedicated
   handler writing to a file or stdout.
4. **No performance overhead that matters at homelab scale**: synchronous
   writes are fine. This is not a high-throughput system.
5. **Queryable with standard tools**: one JSON object per line, greppable,
   `jq`-friendly.

## Non-Goals

- Tamper-evident chaining (hash chains, Merkle trees). The operator has
  root access to the machine; tamper evidence against the operator is
  theatre. If the threat model changes, this can be added later.
- Remote log shipping. If needed, `journalctl` or `filebeat` can ship
  the file externally.
- Log aggregation across services. Each Metacircular service logs
  independently.
- Structured querying (SQL, full-text search). `jq` and `grep` are
  sufficient.

## Event Model

Every audit event is a single JSON line with these fields:

```json
{
  "time": "2026-03-17T04:15:42.577Z",
  "level": "AUDIT",
  "msg": "operation completed",
  "caller": "kyle",
  "roles": ["admin"],
  "operation": "issue",
  "engine": "ca",
  "mount": "pki",
  "resource": "ca/pki/id/example.com",
  "outcome": "success",
  "detail": {"serial": "01:02:03", "issuer": "default", "cn": "example.com"}
}
```

### Required Fields

| Field | Type | Description |
|-------|------|-------------|
| `time` | RFC 3339 | When the event occurred |
| `level` | string | Always `"AUDIT"`; distinguishes audit events from operational logs |
| `msg` | string | Human-readable summary |
| `caller` | string | MCIAS username, or `"anonymous"` for unauthenticated ops |
| `operation` | string | Engine operation name (e.g., `issue`, `sign-user`, `encrypt`) |
| `outcome` | string | `"success"`, `"denied"`, or `"error"` |

### Optional Fields

| Field | Type | Description |
|-------|------|-------------|
| `roles` | []string | Caller's MCIAS roles |
| `engine` | string | Engine type (`ca`, `sshca`, `transit`, `user`) |
| `mount` | string | Mount name |
| `resource` | string | Policy resource path evaluated |
| `detail` | object | Operation-specific metadata (see below) |
| `error` | string | Error message on `"error"` or `"denied"` outcomes |

### Detail Fields by Operation Category

**Certificate operations** (CA):
- `serial`, `issuer`, `cn`, `profile`, `ttl`

**SSH CA operations**:
- `serial`, `cert_type` (`user`/`host`), `principals`, `profile`, `key_id`

**Transit operations**:
- `key` (key name), `key_version`, `batch_size` (for batch ops)

**User E2E operations**:
- `recipients` (list), `sender`

**Policy operations**:
- `rule_id`, `effect`

**System operations** (seal/unseal/init):
- No detail fields; the operation name is sufficient.

### What NOT to Log

- Plaintext, ciphertext, signatures, HMACs, envelopes, or any
  cryptographic material.
- Private keys, public keys, or key bytes.
- Passwords, tokens, or credentials.
- Full request/response bodies.

The audit log records **what happened**, not **what the data was**.

## Architecture

### Audit Logger

A thin wrapper around `slog.Logger` with a dedicated handler:

```go
// Package audit provides structured audit event logging.
package audit

import (
	"context"
	"log/slog"
)

// Logger writes structured audit events.
type Logger struct {
	logger *slog.Logger
}

// New creates an audit logger that writes to the given handler.
func New(h slog.Handler) *Logger {
	return &Logger{logger: slog.New(h)}
}

// Event represents a single audit event.
type Event struct {
	Caller    string
	Roles     []string
	Operation string
	Engine    string
	Mount     string
	Resource  string
	Outcome   string // "success", "denied", "error"
	Error     string
	Detail    map[string]interface{}
}

// Log writes an audit event.
func (l *Logger) Log(ctx context.Context, e Event) {
	attrs := []slog.Attr{
		slog.String("caller", e.Caller),
		slog.String("operation", e.Operation),
		slog.String("outcome", e.Outcome),
	}
	if len(e.Roles) > 0 {
		attrs = append(attrs, slog.Any("roles", e.Roles))
	}
	if e.Engine != "" {
		attrs = append(attrs, slog.String("engine", e.Engine))
	}
	if e.Mount != "" {
		attrs = append(attrs, slog.String("mount", e.Mount))
	}
	if e.Resource != "" {
		attrs = append(attrs, slog.String("resource", e.Resource))
	}
	if e.Error != "" {
		attrs = append(attrs, slog.String("error", e.Error))
	}
	if len(e.Detail) > 0 {
		attrs = append(attrs, slog.Any("detail", e.Detail))
	}

	// Use a custom level that sorts above Error. The handler must map it to
	// the "AUDIT" label (for example with ReplaceAttr); slog's default
	// rendering of this level is "ERROR+4".
	l.logger.LogAttrs(ctx, LevelAudit, "operation completed", attrs...)
}

// LevelAudit is a custom slog level for audit events.
const LevelAudit = slog.Level(12) // above Error (8)
```

Because `LevelAudit` sits above `slog.LevelError`, audit events are never
suppressed by log level filtering (operators may set `level = "warn"` to
quiet debug noise, but audit events must always be emitted).

### Output Configuration

Two modes, controlled by a config option:

```toml
[audit]
# "file" writes to a dedicated audit log file.
# "stdout" writes to stdout alongside operational logs (for journalctl).
# Empty string disables audit logging.
mode = "file"
path = "/srv/metacrypt/audit.log"
```

**File mode**: Opens the file append-only with `0600` permissions. Uses
`slog.NewJSONHandler` writing to the file. The file can be rotated with
`logrotate` in `copytruncate` mode: the logger keeps its file descriptor
and does not re-open the file, and because the file is opened in append
mode, writes continue at the new end after truncation. Go's
`slog.JSONHandler` does not buffer.

**Stdout mode**: Uses `slog.NewJSONHandler` writing to `os.Stdout`. Events
are interleaved with operational logs but distinguishable by the `"AUDIT"`
level. Suitable for systemd/journalctl capture where all output goes to
the journal.

**Disabled**: No audit logger is created. The `Logger` is nil-safe: all
methods are no-ops on a nil receiver.

```go
func (l *Logger) Log(ctx context.Context, e Event) {
	if l == nil {
		return
	}
	// ...
}
```

### Integration Points

The audit logger is created at startup and injected into the components
that need it:

```
cmd/metacrypt/server.go
  └── audit.New(handler)
        ├── server.Server (REST handlers)
        ├── grpcserver.GRPCServer (gRPC interceptor)
        ├── seal.Manager (seal/unseal/init)
        └── policy.Engine (rule create/delete)
```

Engine operations are logged at the **server layer** (REST handlers and
gRPC interceptors), not inside the engines themselves. This keeps the
engines focused on business logic and avoids threading the audit logger
through every engine method.

### Instrumentation

#### REST API (`internal/server/`)

Instrument `handleEngineRequest` and every typed handler. The audit event
is emitted **after** the operation completes (success or failure):

```go
func (s *Server) handleGetCert(w http.ResponseWriter, r *http.Request) {
	// ... existing handler logic ...

	s.audit.Log(r.Context(), audit.Event{
		Caller:    info.Username,
		Roles:     info.Roles,
		Operation: "get-cert",
		Engine:    "ca",
		Mount:     mountName,
		Outcome:   "success",
		Detail:    map[string]interface{}{"serial": serial},
	})
}
```

On error:

```go
s.audit.Log(r.Context(), audit.Event{
	Caller:    info.Username,
	Roles:     info.Roles,
	Operation: "get-cert",
	Engine:    "ca",
	Mount:     mountName,
	Outcome:   "error",
	Error:     err.Error(),
})
```

To avoid duplicating this in every handler, use a helper:

```go
func (s *Server) auditEngineOp(r *http.Request, info *auth.TokenInfo,
	op, engineType, mount, outcome string, detail map[string]interface{}, err error) {
	e := audit.Event{
		Caller:    info.Username,
		Roles:     info.Roles,
		Operation: op,
		Engine:    engineType,
		Mount:     mount,
		Outcome:   outcome,
		Detail:    detail,
	}
	if err != nil {
		e.Error = err.Error()
	}
	s.audit.Log(r.Context(), e)
}
```

#### gRPC API (`internal/grpcserver/`)

Add an audit interceptor that fires after each RPC completes. This is
cleaner than instrumenting every handler individually:

```go
func (g *GRPCServer) auditInterceptor(
	ctx context.Context,
	req interface{},
	info *grpc.UnaryServerInfo,
	handler grpc.UnaryHandler,
) (interface{}, error) {
	resp, err := handler(ctx, req)

	// Extract caller info from context (set by auth interceptor).
	caller := callerFromContext(ctx)

	outcome := "success"
	if err != nil {
		outcome = "error"
	}

	g.audit.Log(ctx, audit.Event{
		Caller:    caller.Username,
		Roles:     caller.Roles,
		Operation: path.Base(info.FullMethod), // e.g., "IssueCert"
		Resource:  info.FullMethod,
		Outcome:   outcome,
		Error:     errString(err),
	})

	return resp, err
}
```

Register this interceptor **after** the auth interceptor in the chain so
that caller info is available.

#### Seal/Unseal (`internal/seal/`)

Instrument `Init`, `Unseal`, `Seal`, and `RotateMEK`:

```go
// In Manager.Unseal, after success:
m.audit.Log(ctx, audit.Event{
	Caller:    "operator", // unseal is not authenticated
	Operation: "unseal",
	Outcome:   "success",
})

// On failure:
m.audit.Log(ctx, audit.Event{
	Caller:    "operator",
	Operation: "unseal",
	Outcome:   "denied",
	Error:     "invalid password",
})
```

#### Policy (`internal/policy/`)

Instrument `CreateRule` and `DeleteRule`:

```go
// In Engine.CreateRule, after success:
e.audit.Log(ctx, audit.Event{
	Caller:    callerUsername, // passed from the handler
	Operation: "create-policy",
	Outcome:   "success",
	Detail:    map[string]interface{}{"rule_id": rule.ID, "effect": rule.Effect},
})
```

### Operations to Audit

| Category | Operations | Outcome on deny |
|----------|------------|-----------------|
| System | `init`, `unseal`, `seal`, `rotate-mek`, `rotate-key`, `migrate` | `denied` or `error` |
| CA | `import-root`, `create-issuer`, `delete-issuer`, `issue`, `sign-csr`, `renew`, `revoke-cert`, `delete-cert` | `denied` |
| SSH CA | `sign-host`, `sign-user`, `create-profile`, `update-profile`, `delete-profile`, `revoke-cert`, `delete-cert` | `denied` |
| Transit | `create-key`, `delete-key`, `rotate-key`, `update-key-config`, `trim-key`, `encrypt`, `decrypt`, `rewrap`, `sign`, `verify`, `hmac` | `denied` |
| User | `register`, `provision`, `encrypt`, `decrypt`, `re-encrypt`, `rotate-key`, `delete-user` | `denied` |
| Policy | `create-policy`, `delete-policy` | N/A (admin-only) |
| Auth | `login` (success and failure) | `denied` |

**Read-only operations** (`get-cert`, `list-certs`, `get-profile`,
`list-profiles`, `get-key`, `list-keys`, `list-users`, `get-public-key`,
`status`) are **not audited** by default. They generate operational log
entries via the existing HTTP/gRPC logging middleware but do not produce
audit events. This keeps the audit log focused on state-changing operations.

If the operator wants read auditing, a config flag can enable it:

```toml
[audit]
include_reads = false  # default
```
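
One way the read filter could be applied at the server layer is a lookup against the read-only operation names listed above. A minimal sketch, assuming the check runs before an event is built; `readOnlyOps` and `shouldAudit` are hypothetical names:

```go
package main

import "fmt"

// readOnlyOps holds the operations the design treats as read-only.
var readOnlyOps = map[string]bool{
	"get-cert":       true,
	"list-certs":     true,
	"get-profile":    true,
	"list-profiles":  true,
	"get-key":        true,
	"list-keys":      true,
	"list-users":     true,
	"get-public-key": true,
	"status":         true,
}

// shouldAudit (hypothetical) reports whether an operation should produce an
// audit event. Mutating operations are always audited; read-only ones are
// audited only when include_reads is enabled.
func shouldAudit(op string, includeReads bool) bool {
	if readOnlyOps[op] {
		return includeReads
	}
	return true
}

func main() {
	fmt.Println(shouldAudit("issue", false))    // prints true
	fmt.Println(shouldAudit("get-cert", false)) // prints false
	fmt.Println(shouldAudit("get-cert", true))  // prints true
}
```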

## File Layout

```
internal/
  audit/
    audit.go       # Logger, Event, LevelAudit
    audit_test.go  # Tests
```

One file, one type, no interfaces. The audit logger is a concrete struct
passed by pointer. Nil-safe for disabled mode.

## Configuration

Add to `config.go`:

```go
type AuditConfig struct {
	Mode         string `toml:"mode"`          // "file", "stdout", ""
	Path         string `toml:"path"`          // file path (mode=file)
	IncludeReads bool   `toml:"include_reads"` // audit read operations
}
```

Add to example config:

```toml
[audit]
mode = "file"
path = "/srv/metacrypt/audit.log"
include_reads = false
```
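
The design calls for validating that `path` is set when `mode = "file"`. A sketch of what that check might look like on the struct above; the `Validate` method name and the rejection of unknown modes are assumptions, not something the design specifies:

```go
package main

import (
	"errors"
	"fmt"
)

// AuditConfig mirrors the struct from config.go.
type AuditConfig struct {
	Mode         string `toml:"mode"`
	Path         string `toml:"path"`
	IncludeReads bool   `toml:"include_reads"`
}

// Validate (hypothetical method) enforces that file mode has a path and
// rejects modes other than "file", "stdout", and the empty string.
func (c AuditConfig) Validate() error {
	switch c.Mode {
	case "", "stdout":
		return nil
	case "file":
		if c.Path == "" {
			return errors.New(`audit: path must be set when mode = "file"`)
		}
		return nil
	default:
		return fmt.Errorf("audit: unknown mode %q", c.Mode)
	}
}

func main() {
	ok := AuditConfig{Mode: "file", Path: "/srv/metacrypt/audit.log"}
	bad := AuditConfig{Mode: "file"}
	fmt.Println(ok.Validate())  // prints <nil>
	fmt.Println(bad.Validate()) // prints the missing-path error
}
```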

## Implementation Steps

1. **Create `internal/audit/audit.go`**: `Logger`, `Event`, `LevelAudit`,
   `New(handler)`, nil-safe `Log` method.

2. **Add `AuditConfig` to config**: mode, path, include_reads. Validate
   that `path` is set when `mode = "file"`.

3. **Create audit logger in `cmd/metacrypt/server.go`**: based on config,
   open file or use stdout. Pass to Server, GRPCServer, SealManager,
   PolicyEngine.

4. **Add `audit *audit.Logger` field** to `Server`, `GRPCServer`,
   `seal.Manager`, `policy.Engine`. Update constructors.

5. **Instrument REST handlers**: add `auditEngineOp` helper to `Server`.
   Call after every mutating operation in typed handlers and
   `handleEngineRequest`.

6. **Instrument gRPC**: add audit interceptor to the interceptor chain.

7. **Instrument seal/unseal**: emit events in `Init`, `Unseal`, `Seal`,
   `RotateMEK`.

8. **Instrument policy**: emit events in `CreateRule`, `DeleteRule`.

9. **Instrument login**: emit events in the auth login handler (both
   REST and gRPC).

10. **Update ARCHITECTURE.md**: document audit logging in the Security
    Model section. Remove from Future Work.

11. **Update example configs**: add `[audit]` section.

12. **Add tests**: verify events are emitted for success, denied, and
    error outcomes. Verify nil logger is safe. Verify read operations are
    excluded by default.

## Querying the Audit Log

```bash
# All events for a user:
jq 'select(.caller == "kyle")' /srv/metacrypt/audit.log

# All certificate issuances:
jq 'select(.operation == "issue")' /srv/metacrypt/audit.log

# All denied operations:
jq 'select(.outcome == "denied")' /srv/metacrypt/audit.log

# All SSH CA events in the last hour:
jq 'select(.engine == "sshca" and .time > "2026-03-17T03:00:00Z")' /srv/metacrypt/audit.log

# Count operations by type:
jq -r '.operation' /srv/metacrypt/audit.log | sort | uniq -c | sort -rn

# Failed unseal attempts:
jq 'select(.operation == "unseal" and .outcome == "denied")' /srv/metacrypt/audit.log
```

## Rotation

For file mode, use logrotate:

```
/srv/metacrypt/audit.log {
    daily
    rotate 90
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
```

`copytruncate` avoids the need for a signal-based reopen mechanism:
logrotate copies the file and truncates the original in place, and a file
opened in append mode keeps writing at the new end. The Go
`slog.JSONHandler` does not buffer, so nothing is held back inside the
process. The only exposure is the short window between logrotate's copy
and its truncate, in which an event written at that moment can be lost.

At homelab scale with moderate usage, 90 days of uncompressed audit logs
will be well under 100 MB.