# Audit Logging Design

## Overview

Metacrypt is a cryptographic service for a homelab/personal infrastructure
platform. Audit logging gives the operator visibility into what happened,
when, and by whom — essential for a service that issues certificates, signs
SSH keys, and manages encryption keys, even at homelab scale.

The design prioritizes simplicity and operational clarity over enterprise
features. There is one operator. There is no SIEM. The audit log should be
a structured, append-only file that can be read with `jq`, followed with
`tail -f` (or with `journalctl` in stdout mode), and rotated with
`logrotate`. It should not require a database, a separate service, or
additional infrastructure.

## Goals

1. **Record all security-relevant operations** — who did what, when, and
   whether it succeeded.
2. **Separate audit events from operational logs** — operational logs
   (`slog.Info`) are for debugging; audit events are for accountability.
3. **Zero additional dependencies** — use Go's `log/slog` with a dedicated
   handler writing to a file or stdout.
4. **No performance overhead that matters at homelab scale** — synchronous
   writes are fine. This is not a high-throughput system.
5. **Queryable with standard tools** — one JSON object per line, greppable,
   `jq`-friendly.

## Non-Goals

- Tamper-evident chaining (hash chains, Merkle trees). The operator has
  root access to the machine; tamper evidence against the operator is
  theatre. If the threat model changes, this can be added later.
- Remote log shipping. If needed, an external shipper such as `filebeat`
  (or `systemd-journal-upload` in stdout mode) can forward the log.
- Log aggregation across services. Each Metacircular service logs
  independently.
- Structured querying (SQL, full-text search). `jq` and `grep` are
  sufficient.

## Event Model

Every audit event is a single JSON line with these fields (pretty-printed
here for readability):

```json
{
  "time": "2026-03-17T04:15:42.577Z",
  "level": "AUDIT",
  "msg": "operation completed",
  "caller": "kyle",
  "roles": ["admin"],
  "operation": "issue",
  "engine": "ca",
  "mount": "pki",
  "resource": "ca/pki/id/example.com",
  "outcome": "success",
  "detail": {"serial": "01:02:03", "issuer": "default", "cn": "example.com"}
}
```

### Required Fields

| Field | Type | Description |
|-------|------|-------------|
| `time` | RFC 3339 | When the event occurred |
| `level` | string | Always `"AUDIT"` — distinguishes from operational logs |
| `msg` | string | Human-readable summary |
| `caller` | string | MCIAS username; `"operator"` for unauthenticated seal operations such as unseal; `"anonymous"` for other unauthenticated ops |
| `operation` | string | Engine operation name (e.g., `issue`, `sign-user`, `encrypt`) |
| `outcome` | string | `"success"`, `"denied"`, or `"error"` |

### Optional Fields

| Field | Type | Description |
|-------|------|-------------|
| `roles` | []string | Caller's MCIAS roles |
| `engine` | string | Engine type (`ca`, `sshca`, `transit`, `user`) |
| `mount` | string | Mount name |
| `resource` | string | Policy resource path evaluated |
| `detail` | object | Operation-specific metadata (see below) |
| `error` | string | Error message on `"error"` or `"denied"` outcomes |

### Detail Fields by Operation Category

**Certificate operations** (CA):
- `serial`, `issuer`, `cn`, `profile`, `ttl`

**SSH CA operations**:
- `serial`, `cert_type` (`user`/`host`), `principals`, `profile`, `key_id`

**Transit operations**:
- `key` (key name), `key_version`, `batch_size` (for batch ops)

**User E2E operations**:
- `recipients` (list), `sender`

**Policy operations**:
- `rule_id`, `effect`

**System operations** (seal/unseal/init):
- No detail fields; the operation name is sufficient.

### What NOT to Log

- Plaintext, ciphertext, signatures, HMACs, envelopes, or any
  cryptographic material.
- Private keys, public keys, or key bytes.
- Passwords, tokens, or credentials.
- Full request/response bodies.

The audit log records **what happened**, not **what the data was**.

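
These rules are easiest to keep if they are enforced mechanically rather than by reviewing every call site. One possible approach, not part of the design as written, is to allowlist `detail` keys before an event is written. The following is a minimal sketch; `allowedDetailKeys` and `sanitizeDetail` are hypothetical names, and the key list is copied from the detail-field lists above:

```go
package main

import (
	"fmt"
	"sort"
)

// allowedDetailKeys is a hypothetical allowlist built from the
// "Detail Fields by Operation Category" lists. Any other key is dropped.
var allowedDetailKeys = map[string]bool{
	"serial": true, "issuer": true, "cn": true, "profile": true, "ttl": true,
	"cert_type": true, "principals": true, "key_id": true,
	"key": true, "key_version": true, "batch_size": true,
	"recipients": true, "sender": true,
	"rule_id": true, "effect": true,
}

// sanitizeDetail returns a copy of detail containing only allowlisted keys,
// plus the sorted names of the keys it dropped (useful for a warning log).
func sanitizeDetail(detail map[string]any) (map[string]any, []string) {
	clean := make(map[string]any, len(detail))
	var dropped []string
	for k, v := range detail {
		if allowedDetailKeys[k] {
			clean[k] = v
		} else {
			dropped = append(dropped, k)
		}
	}
	sort.Strings(dropped)
	return clean, dropped
}

func main() {
	clean, dropped := sanitizeDetail(map[string]any{
		"key":        "backup",
		"plaintext":  "aGVsbG8=", // must never reach the audit log
		"ciphertext": "<opaque>",
	})
	fmt.Println(clean, dropped)
}
```

An allowlist fails closed: a new detail field stays out of the log until someone deliberately adds it, which matches the "what happened, not what the data was" rule.
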
## Architecture

### Audit Logger

A thin wrapper around `slog.Logger` with a dedicated handler:

```go
// Package audit provides structured audit event logging.
package audit

import (
	"context"
	"log/slog"
)

// Logger writes structured audit events.
type Logger struct {
	logger *slog.Logger
}

// New creates an audit logger that writes to the given handler.
func New(h slog.Handler) *Logger {
	return &Logger{logger: slog.New(h)}
}

// Event represents a single audit event.
type Event struct {
	Caller    string
	Roles     []string
	Operation string
	Engine    string
	Mount     string
	Resource  string
	Outcome   string // "success", "denied", "error"
	Error     string
	Detail    map[string]interface{}
}

// Log writes an audit event. It is a no-op on a nil *Logger, which is how
// disabled audit logging is represented.
func (l *Logger) Log(ctx context.Context, e Event) {
	if l == nil {
		return
	}

	attrs := []slog.Attr{
		slog.String("caller", e.Caller),
		slog.String("operation", e.Operation),
		slog.String("outcome", e.Outcome),
	}
	if len(e.Roles) > 0 {
		attrs = append(attrs, slog.Any("roles", e.Roles))
	}
	if e.Engine != "" {
		attrs = append(attrs, slog.String("engine", e.Engine))
	}
	if e.Mount != "" {
		attrs = append(attrs, slog.String("mount", e.Mount))
	}
	if e.Resource != "" {
		attrs = append(attrs, slog.String("resource", e.Resource))
	}
	if e.Error != "" {
		attrs = append(attrs, slog.String("error", e.Error))
	}
	if len(e.Detail) > 0 {
		attrs = append(attrs, slog.Any("detail", e.Detail))
	}

	// Log at LevelAudit (above Error) so level filtering never drops the
	// event. The "AUDIT" label itself comes from the handler's ReplaceAttr.
	l.logger.LogAttrs(ctx, LevelAudit, "operation completed", attrs...)
}

// LevelAudit is a custom slog level for audit events. At 12 it sits above
// slog.LevelError (8), so no handler threshold at or below Error drops it.
const LevelAudit = slog.Level(12)
```

The custom level ensures audit events are never suppressed by log level
filtering: operators may set `level = "warn"` to quiet debug noise, but
audit events must always be emitted. Note that slog renders an unnamed
level relative to the nearest built-in one, so level 12 appears as
`"ERROR+4"` by default; emitting the `"AUDIT"` label shown in the event
model requires a `ReplaceAttr` function in the handler options (so that
audit events must always be emitted).
### Output Configuration

Two modes, controlled by a config option:

```toml
[audit]
# "file" writes to a dedicated audit log file.
# "stdout" writes to stdout alongside operational logs (for journalctl).
# Empty string disables audit logging.
mode = "file"
path = "/srv/metacrypt/audit.log"
```

**File mode**: Opens the file with `O_APPEND` (append-only), creating it
with `0600` permissions if it does not exist, and uses
`slog.NewJSONHandler` writing to it. The logger does not reopen the file;
rotation relies on logrotate's `copytruncate` (see Rotation below). Because
of `O_APPEND`, writes after a truncation land at the new end of file rather
than at the old offset. `slog.JSONHandler` does not buffer: each record is
written to the file with a single `Write` call.

**Stdout mode**: Uses `slog.NewJSONHandler` writing to `os.Stdout`. Events
are interleaved with operational logs but distinguishable by the `"AUDIT"`
level. Suitable for systemd/journalctl capture where all output goes to
the journal.

**Disabled**: No audit logger is created. The `Logger` is nil-safe — all
methods are no-ops on a nil receiver.

```go
func (l *Logger) Log(ctx context.Context, e Event) {
	if l == nil {
		return
	}
	// ...
}
```
### Integration Points

The audit logger is created at startup and injected into the components
that need it:

```
cmd/metacrypt/server.go
  └── audit.New(handler)
        ├── server.Server          (REST handlers)
        ├── grpcserver.GRPCServer  (gRPC interceptor)
        ├── seal.Manager           (seal/unseal/init)
        └── policy.Engine          (rule create/delete)
```

Engine operations are logged at the **server layer** (REST handlers and
gRPC interceptors), not inside the engines themselves. This keeps the
engines focused on business logic and avoids threading the audit logger
through every engine method.

### Instrumentation

#### REST API (`internal/server/`)

Instrument `handleEngineRequest` and every typed handler. The audit event
is emitted **after** the operation completes (success or failure):

```go
func (s *Server) handleGetCert(w http.ResponseWriter, r *http.Request) {
	// ... existing handler logic ...

	s.audit.Log(r.Context(), audit.Event{
		Caller:    info.Username,
		Roles:     info.Roles,
		Operation: "get-cert",
		Engine:    "ca",
		Mount:     mountName,
		Outcome:   "success",
		Detail:    map[string]interface{}{"serial": serial},
	})
}
```

Because `get-cert` is a read-only operation, this event would in practice
be emitted only when `include_reads` is enabled (see Operations to Audit);
mutating handlers follow the same pattern unconditionally.

On error:

```go
s.audit.Log(r.Context(), audit.Event{
	Caller:    info.Username,
	Roles:     info.Roles,
	Operation: "get-cert",
	Engine:    "ca",
	Mount:     mountName,
	Outcome:   "error",
	Error:     err.Error(),
})
```

To avoid duplicating this in every handler, use a helper:

```go
// auditEngineOp records an engine operation. info may be nil for
// unauthenticated requests, in which case the caller is "anonymous".
func (s *Server) auditEngineOp(r *http.Request, info *auth.TokenInfo,
	op, engineType, mount, outcome string, detail map[string]interface{}, err error) {
	e := audit.Event{
		Caller:    "anonymous",
		Operation: op,
		Engine:    engineType,
		Mount:     mount,
		Outcome:   outcome,
		Detail:    detail,
	}
	if info != nil {
		e.Caller = info.Username
		e.Roles = info.Roles
	}
	if err != nil {
		e.Error = err.Error()
	}
	s.audit.Log(r.Context(), e)
}
```
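#### gRPC API (`internal/grpcserver/`)

Add an audit interceptor that fires after each RPC completes. This is
cleaner than instrumenting every handler individually:

```go
import (
	"context"
	"path"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
	// ...plus the project's internal/audit package.
)

// auditInterceptor records one audit event per unary RPC. callerFromContext
// is assumed to return the caller info stored by the auth interceptor.
func (g *GRPCServer) auditInterceptor(
	ctx context.Context,
	req interface{},
	info *grpc.UnaryServerInfo,
	handler grpc.UnaryHandler,
) (interface{}, error) {
	resp, err := handler(ctx, req)

	// Extract caller info from context (set by auth interceptor).
	caller := callerFromContext(ctx)

	// Authorization failures map to "denied"; any other error is "error".
	// status.Code(nil) is codes.OK.
	outcome := "success"
	switch status.Code(err) {
	case codes.OK:
	case codes.PermissionDenied, codes.Unauthenticated:
		outcome = "denied"
	default:
		outcome = "error"
	}

	g.audit.Log(ctx, audit.Event{
		Caller:    caller.Username,
		Roles:     caller.Roles,
		Operation: path.Base(info.FullMethod), // e.g., "IssueCert"
		Resource:  info.FullMethod,
		Outcome:   outcome,
		Error:     errString(err),
	})

	return resp, err
}

// errString returns err.Error(), or "" for a nil error.
func errString(err error) string {
	if err == nil {
		return ""
	}
	return err.Error()
}
```

Register this interceptor **after** the auth interceptor in the chain so
that caller info is available. Two consequences follow. Requests that the
auth interceptor itself rejects never reach this interceptor, so if those
should be recorded as `denied`, the auth interceptor must emit them. And as
written, the interceptor audits every RPC, including read-only ones, so it
must also consult `include_reads` to match the REST behaviour.
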
#### Seal/Unseal (`internal/seal/`)

Instrument `Init`, `Unseal`, `Seal`, and `RotateMEK`:

```go
// In Manager.Unseal, after success:
m.audit.Log(ctx, audit.Event{
	Caller:    "operator", // unseal is not authenticated
	Operation: "unseal",
	Outcome:   "success",
})

// On failure:
m.audit.Log(ctx, audit.Event{
	Caller:    "operator",
	Operation: "unseal",
	Outcome:   "denied",
	Error:     "invalid password",
})
```
#### Policy (`internal/policy/`)

Instrument `CreateRule` and `DeleteRule`:

```go
// In Engine.CreateRule, after success:
e.audit.Log(ctx, audit.Event{
	Caller:    callerUsername, // passed from the handler
	Operation: "create-policy",
	Outcome:   "success",
	Detail:    map[string]interface{}{"rule_id": rule.ID, "effect": rule.Effect},
})
```

### Operations to Audit

| Category | Operations | Outcome on deny |
|----------|------------|-----------------|
| System | `init`, `unseal`, `seal`, `rotate-mek`, `rotate-key`, `migrate` | `denied` or `error` |
| CA | `import-root`, `create-issuer`, `delete-issuer`, `issue`, `sign-csr`, `renew`, `revoke-cert`, `delete-cert` | `denied` |
| SSH CA | `sign-host`, `sign-user`, `create-profile`, `update-profile`, `delete-profile`, `revoke-cert`, `delete-cert` | `denied` |
| Transit | `create-key`, `delete-key`, `rotate-key`, `update-key-config`, `trim-key`, `encrypt`, `decrypt`, `rewrap`, `sign`, `verify`, `hmac` | `denied` |
| User | `register`, `provision`, `encrypt`, `decrypt`, `re-encrypt`, `rotate-key`, `delete-user` | `denied` |
| Policy | `create-policy`, `delete-policy` | N/A (admin-only) |
| Auth | `login` (success and failure) | `denied` |

**Read-only operations** (`get-cert`, `list-certs`, `get-profile`,
`list-profiles`, `get-key`, `list-keys`, `list-users`, `get-public-key`,
`status`) are **not audited** by default. They generate operational log
entries via the existing HTTP/gRPC logging middleware but do not produce
audit events. This keeps the audit log focused on state-changing operations.

If the operator wants read auditing, a config flag can enable it:

```toml
[audit]
include_reads = false # default
```
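
A minimal sketch of the read filter, assuming the operation names above are what handlers pass as `op`; `readOps` and `shouldAudit` are hypothetical names. gRPC method names such as `IssueCert` would first need mapping to these names:

```go
package main

import "fmt"

// readOps lists the read-only operations named above. Keeping them in one
// table lets REST and gRPC share the include_reads decision.
var readOps = map[string]bool{
	"get-cert": true, "list-certs": true, "get-profile": true,
	"list-profiles": true, "get-key": true, "list-keys": true,
	"list-users": true, "get-public-key": true, "status": true,
}

// shouldAudit reports whether op should produce an audit event: mutating
// operations always do, reads only when includeReads is set.
func shouldAudit(op string, includeReads bool) bool {
	return includeReads || !readOps[op]
}

func main() {
	for _, op := range []string{"issue", "get-cert"} {
		fmt.Println(op, shouldAudit(op, false), shouldAudit(op, true))
	}
}
```

Defaulting unknown operations to "audited" means a newly added mutating operation is logged even if nobody updates the table.
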
## File Layout

```
internal/
  audit/
    audit.go        # Logger, Event, LevelAudit
    audit_test.go   # Tests
```

One source file, two types (`Logger` and `Event`), no interfaces. The audit
logger is a concrete struct passed by pointer; a nil pointer represents
disabled mode.

## Configuration

Add to `config.go`:

```go
type AuditConfig struct {
	Mode         string `toml:"mode"`          // "file", "stdout", ""
	Path         string `toml:"path"`          // file path (mode=file)
	IncludeReads bool   `toml:"include_reads"` // audit read operations
}
```

Add to example config:

```toml
[audit]
mode = "file"
path = "/srv/metacrypt/audit.log"
include_reads = false
```
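
A minimal sketch of the config validation described in the implementation steps (path required when `mode = "file"`), additionally rejecting unknown modes. The `Validate` method name is an assumption, not an existing API:

```go
package main

import (
	"errors"
	"fmt"
)

// AuditConfig matches the struct above.
type AuditConfig struct {
	Mode         string `toml:"mode"`
	Path         string `toml:"path"`
	IncludeReads bool   `toml:"include_reads"`
}

// Validate checks that mode is one of the documented values and that a
// path is present in file mode.
func (c AuditConfig) Validate() error {
	switch c.Mode {
	case "", "stdout":
		return nil
	case "file":
		if c.Path == "" {
			return errors.New(`audit: path is required when mode = "file"`)
		}
		return nil
	default:
		return fmt.Errorf(`audit: invalid mode %q (want "file", "stdout", or "")`, c.Mode)
	}
}

func main() {
	fmt.Println(AuditConfig{Mode: "file"}.Validate())
	fmt.Println(AuditConfig{Mode: "file", Path: "/srv/metacrypt/audit.log"}.Validate())
}
```

Rejecting an unknown mode at startup avoids a typo such as `mode = "flie"` silently disabling auditing.
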
## Implementation Steps

1. **Create `internal/audit/audit.go`** — `Logger`, `Event`, `LevelAudit`,
   `New(handler)`, nil-safe `Log` method.

2. **Add `AuditConfig` to config** — mode, path, include_reads. Validate
   that `path` is set when `mode = "file"`.

3. **Create audit logger in `cmd/metacrypt/server.go`** — based on config,
   open file or use stdout. Pass to Server, GRPCServer, SealManager,
   PolicyEngine.

4. **Add `audit *audit.Logger` field** to `Server`, `GRPCServer`,
   `seal.Manager`, `policy.Engine`. Update constructors.

5. **Instrument REST handlers** — add `auditEngineOp` helper to `Server`.
   Call after every mutating operation in typed handlers and
   `handleEngineRequest`.

6. **Instrument gRPC** — add audit interceptor to the interceptor chain.

7. **Instrument seal/unseal** — emit events in `Init`, `Unseal`, `Seal`,
   `RotateMEK`.

8. **Instrument policy** — emit events in `CreateRule`, `DeleteRule`.

9. **Instrument login** — emit events in the auth login handler (both
   REST and gRPC).

10. **Update ARCHITECTURE.md** — document audit logging in the Security
    Model section. Remove from Future Work.

11. **Update example configs** — add `[audit]` section.

12. **Add tests** — verify events are emitted for success, denied, and
    error outcomes. Verify nil logger is safe. Verify read operations are
    excluded by default.

## Querying the Audit Log

```bash
# All events for a user:
jq 'select(.caller == "kyle")' /srv/metacrypt/audit.log

# All certificate issuances:
jq 'select(.operation == "issue")' /srv/metacrypt/audit.log

# All denied operations:
jq 'select(.outcome == "denied")' /srv/metacrypt/audit.log

# All SSH CA events since a given time (a string comparison, so this
# assumes the timestamps are written in UTC):
jq 'select(.engine == "sshca" and .time > "2026-03-17T03:00:00Z")' /srv/metacrypt/audit.log

# Count operations by type:
jq -r '.operation' /srv/metacrypt/audit.log | sort | uniq -c | sort -rn

# Failed unseal attempts:
jq 'select(.operation == "unseal" and .outcome == "denied")' /srv/metacrypt/audit.log
```

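## Rotation

For file mode, use logrotate:

```
/srv/metacrypt/audit.log {
    daily
    rotate 90
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
```

`copytruncate` avoids the need for a signal-based reopen mechanism:
logrotate copies the file and then truncates the original in place, and
because the audit file is opened append-only (`O_APPEND`), later writes
start at the new end of file. `slog.JSONHandler` does not buffer, so
nothing is held in memory across a rotation. However, as the logrotate
documentation notes, events written between the copy and the truncate can
be lost. At homelab write rates this window is small but not zero; if that
is unacceptable, the logger would need a reopen mechanism (e.g. on
`SIGHUP`) instead.
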
At homelab scale with moderate usage, 90 days of audit logs will be well
under 100 MB, even uncompressed.
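
As a rough check, assume about 300 bytes per event (roughly the size of the example event above written as one compact line) and 1,000 audited operations per day. Both numbers are illustrative assumptions, not measurements:

$$
300\ \tfrac{\text{B}}{\text{event}} \times 1000\ \tfrac{\text{events}}{\text{day}} \times 90\ \text{days} = 2.7 \times 10^{7}\ \text{B} \approx 27\ \text{MB}
$$

Under these assumptions the 100 MB bound leaves headroom for roughly 3.7 times that event rate before compression is even considered.
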