# Audit Logging Design

## Overview

Metacrypt is a cryptographic service for a homelab/personal infrastructure platform. Audit logging gives the operator visibility into what happened, when, and by whom. That visibility is essential for a service that issues certificates, signs SSH keys, and manages encryption keys, even at homelab scale.

The design prioritizes simplicity and operational clarity over enterprise features. There is one operator. There is no SIEM. The audit log should be a structured, append-only file that can be read with `jq`, followed with `tail -f` (or `journalctl` in stdout mode), and rotated with `logrotate`. It should not require a database, a separate service, or additional infrastructure.

## Goals

1. **Record all security-relevant operations**: who did what, when, and whether it succeeded.
2. **Separate audit events from operational logs**: operational logs (`slog.Info`) are for debugging; audit events are for accountability.
3. **Zero additional dependencies**: use Go's `log/slog` with a dedicated handler writing to a file or stdout.
4. **No performance overhead that matters at homelab scale**: synchronous writes are fine. This is not a high-throughput system.
5. **Queryable with standard tools**: one JSON object per line, greppable, `jq`-friendly.

## Non-Goals

- Tamper-evident chaining (hash chains, Merkle trees). The operator has root access to the machine; tamper evidence against the operator is theatre. If the threat model changes, this can be added later.
- Remote log shipping. If needed, `journalctl` or `filebeat` can ship the file externally.
- Log aggregation across services. Each Metacircular service logs independently.
- Structured querying (SQL, full-text search). `jq` and `grep` are sufficient.

## Event Model

Every audit event is a single JSON line with these fields (pretty-printed here for readability):

```json
{
  "time": "2026-03-17T04:15:42.577Z",
  "level": "AUDIT",
  "msg": "operation completed",
  "caller": "kyle",
  "roles": ["admin"],
  "operation": "issue",
  "engine": "ca",
  "mount": "pki",
  "resource": "ca/pki/id/example.com",
  "outcome": "success",
  "detail": {"serial": "01:02:03", "issuer": "default", "cn": "example.com"}
}
```

### Required Fields

| Field | Type | Description |
|-------|------|-------------|
| `time` | string (RFC 3339) | When the event occurred |
| `level` | string | Always `"AUDIT"`; distinguishes audit events from operational logs |
| `msg` | string | Human-readable summary |
| `caller` | string | MCIAS username, or `"anonymous"` for unauthenticated ops |
| `operation` | string | Engine operation name (e.g., `issue`, `sign-user`, `encrypt`) |
| `outcome` | string | `"success"`, `"denied"`, or `"error"` |

### Optional Fields

| Field | Type | Description |
|-------|------|-------------|
| `roles` | []string | Caller's MCIAS roles |
| `engine` | string | Engine type (`ca`, `sshca`, `transit`, `user`) |
| `mount` | string | Mount name |
| `resource` | string | Policy resource path evaluated |
| `detail` | object | Operation-specific metadata (see below) |
| `error` | string | Error message on `"error"` or `"denied"` outcomes |

### Detail Fields by Operation Category

**Certificate operations** (CA):
- `serial`, `issuer`, `cn`, `profile`, `ttl`

**SSH CA operations**:
- `serial`, `cert_type` (`user`/`host`), `principals`, `profile`, `key_id`

**Transit operations**:
- `key` (key name), `key_version`, `batch_size` (for batch ops)

**User E2E operations**:
- `recipients` (list), `sender`

**Policy operations**:
- `rule_id`, `effect`

**System operations** (seal/unseal/init):
- No detail fields; the operation name is sufficient.

### What NOT to Log

- Plaintext, ciphertext, signatures, HMACs, envelopes, or any cryptographic material.
- Private keys, public keys, or key bytes.
- Passwords, tokens, or credentials.
- Full request/response bodies.

The audit log records **what happened**, not **what the data was**.

## Architecture

### Audit Logger

A thin wrapper around `slog.Logger` with a dedicated handler:

```go
// Package audit provides structured audit event logging.
package audit

import (
	"context"
	"log/slog"
)

// Logger writes structured audit events.
type Logger struct {
	logger *slog.Logger
}

// New creates an audit logger that writes to the given handler.
func New(h slog.Handler) *Logger {
	return &Logger{logger: slog.New(h)}
}

// Event represents a single audit event.
type Event struct {
	Caller    string
	Roles     []string
	Operation string
	Engine    string
	Mount     string
	Resource  string
	Outcome   string // "success", "denied", "error"
	Error     string
	Detail    map[string]interface{}
}

// Log writes an audit event.
func (l *Logger) Log(ctx context.Context, e Event) {
	// Required fields are always emitted.
	attrs := []slog.Attr{
		slog.String("caller", e.Caller),
		slog.String("operation", e.Operation),
		slog.String("outcome", e.Outcome),
	}
	// Optional fields are emitted only when set.
	if len(e.Roles) > 0 {
		attrs = append(attrs, slog.Any("roles", e.Roles))
	}
	if e.Engine != "" {
		attrs = append(attrs, slog.String("engine", e.Engine))
	}
	if e.Mount != "" {
		attrs = append(attrs, slog.String("mount", e.Mount))
	}
	if e.Resource != "" {
		attrs = append(attrs, slog.String("resource", e.Resource))
	}
	if e.Error != "" {
		attrs = append(attrs, slog.String("error", e.Error))
	}
	if len(e.Detail) > 0 {
		attrs = append(attrs, slog.Any("detail", e.Detail))
	}
	// LevelAudit sorts above Error, so no level filter drops it. The handler
	// is responsible for labelling it "AUDIT" in the output.
	l.logger.LogAttrs(ctx, LevelAudit, "operation completed", attrs...)
}

// LevelAudit is a custom slog level for audit events.
const LevelAudit = slog.Level(12) // above Error (8)
```

The custom level ensures audit events are never suppressed by log level filtering (operators may set `level = "warn"` to quiet debug noise, but audit events must always be emitted). Because 12 is above `slog.LevelError` (8), it passes any threshold the operator can configure. By default `slog` renders this unnamed level as `"ERROR+4"`, so the handler needs a `ReplaceAttr` hook in its `slog.HandlerOptions` to emit the `"AUDIT"` label shown in the event model.

### Output Configuration

Two modes, controlled by a config option:

```toml
[audit]
# "file" writes to a dedicated audit log file.
# "stdout" writes to stdout alongside operational logs (for journalctl).
# Empty string disables audit logging.
mode = "file"
path = "/srv/metacrypt/audit.log"
```

**File mode**: Opens the file in append mode (`O_APPEND`) with `0600` permissions and uses `slog.NewJSONHandler` to write to it. The logger holds one file descriptor for the life of the process and does not reopen the file, so rotation relies on `logrotate`'s `copytruncate` (see Rotation): after truncation, append-mode writes continue at the new end of the file. `slog.JSONHandler` does not buffer; each event reaches the file in a single write call.

**Stdout mode**: Uses `slog.NewJSONHandler` writing to `os.Stdout`. Events are interleaved with operational logs but distinguishable by the `"AUDIT"` level. Suitable for systemd/journalctl capture where all output goes to the journal.

**Disabled**: No audit logger is created. The `Logger` is nil-safe: all methods are no-ops on a nil receiver.

```go
func (l *Logger) Log(ctx context.Context, e Event) {
	if l == nil {
		return // audit logging disabled
	}
	// ...
}
```

### Integration Points

The audit logger is created at startup and injected into the components that need it:

```
cmd/metacrypt/server.go
  └── audit.New(handler)
        ├── server.Server          (REST handlers)
        ├── grpcserver.GRPCServer  (gRPC interceptor)
        ├── seal.Manager           (seal/unseal/init)
        └── policy.Engine          (rule create/delete)
```

Engine operations are logged at the **server layer** (REST handlers and gRPC interceptors), not inside the engines themselves. This keeps the engines focused on business logic and avoids threading the audit logger through every engine method.

### Instrumentation

#### REST API (`internal/server/`)

Instrument `handleEngineRequest` and every typed handler. The audit event is emitted **after** the operation completes (success or failure). The example below uses `get-cert`, a read that is audited only when `include_reads` is enabled; mutating handlers follow the same pattern.

```go
func (s *Server) handleGetCert(w http.ResponseWriter, r *http.Request) {
	// ... existing handler logic ...

	s.audit.Log(r.Context(), audit.Event{
		Caller:    info.Username,
		Roles:     info.Roles,
		Operation: "get-cert",
		Engine:    "ca",
		Mount:     mountName,
		Outcome:   "success",
		Detail:    map[string]interface{}{"serial": serial},
	})
}
```

On error:

```go
s.audit.Log(r.Context(), audit.Event{
	Caller:    info.Username,
	Roles:     info.Roles,
	Operation: "get-cert",
	Engine:    "ca",
	Mount:     mountName,
	Outcome:   "error",
	Error:     err.Error(),
})
```

To avoid duplicating this in every handler, use a helper:

```go
// auditEngineOp emits one audit event for an engine operation.
// err may be nil; when set, its message is recorded in the event.
func (s *Server) auditEngineOp(r *http.Request, info *auth.TokenInfo,
	op, engineType, mount, outcome string, detail map[string]interface{}, err error) {
	e := audit.Event{
		Caller:    info.Username,
		Roles:     info.Roles,
		Operation: op,
		Engine:    engineType,
		Mount:     mount,
		Outcome:   outcome,
		Detail:    detail,
	}
	if err != nil {
		e.Error = err.Error()
	}
	s.audit.Log(r.Context(), e)
}
```

#### gRPC API (`internal/grpcserver/`)

Add an audit interceptor that fires after each RPC completes. This is cleaner than instrumenting every handler individually:

```go
func (g *GRPCServer) auditInterceptor(
	ctx context.Context,
	req interface{},
	info *grpc.UnaryServerInfo,
	handler grpc.UnaryHandler,
) (interface{}, error) {
	resp, err := handler(ctx, req)

	// Extract caller info from context (set by auth interceptor).
	// callerFromContext and errString are helpers assumed to live in this package.
	caller := callerFromContext(ctx)

	// Any handler error is recorded as "error" here.
	outcome := "success"
	if err != nil {
		outcome = "error"
	}

	g.audit.Log(ctx, audit.Event{
		Caller:    caller.Username,
		Roles:     caller.Roles,
		Operation: path.Base(info.FullMethod), // e.g., "IssueCert"
		Resource:  info.FullMethod,
		Outcome:   outcome,
		Error:     errString(err),
	})

	return resp, err
}
```

Register this interceptor **after** the auth interceptor in the chain so that caller info is available. Requests that the auth interceptor rejects never reach this interceptor, so authentication failures have to be audited in the auth path itself.

#### Seal/Unseal (`internal/seal/`)

Instrument `Init`, `Unseal`, `Seal`, and `RotateMEK`:

```go
// In Manager.Unseal, after success:
m.audit.Log(ctx, audit.Event{
	Caller:    "operator", // unseal is not authenticated
	Operation: "unseal",
	Outcome:   "success",
})

// On failure:
m.audit.Log(ctx, audit.Event{
	Caller:    "operator",
	Operation: "unseal",
	Outcome:   "denied",
	Error:     "invalid password",
})
```

#### Policy (`internal/policy/`)

Instrument `CreateRule` and `DeleteRule`:

```go
// In Engine.CreateRule, after success:
e.audit.Log(ctx, audit.Event{
	Caller:    callerUsername, // passed from the handler
	Operation: "create-policy",
	Outcome:   "success",
	Detail:    map[string]interface{}{"rule_id": rule.ID, "effect": rule.Effect},
})
```

### Operations to Audit

| Category | Operations | Outcome on deny |
|----------|------------|-----------------|
| System | `init`, `unseal`, `seal`, `rotate-mek`, `rotate-key`, `migrate` | `denied` or `error` |
| CA | `import-root`, `create-issuer`, `delete-issuer`, `issue`, `sign-csr`, `renew`, `revoke-cert`, `delete-cert` | `denied` |
| SSH CA | `sign-host`, `sign-user`, `create-profile`, `update-profile`, `delete-profile`, `revoke-cert`, `delete-cert` | `denied` |
| Transit | `create-key`, `delete-key`, `rotate-key`, `update-key-config`, `trim-key`, `encrypt`, `decrypt`, `rewrap`, `sign`, `verify`, `hmac` | `denied` |
| User | `register`, `provision`, `encrypt`, `decrypt`, `re-encrypt`, `rotate-key`, `delete-user` | `denied` |
| Policy | `create-policy`, `delete-policy` | N/A (admin-only) |
| Auth | `login` (success and failure) | `denied` |

**Read-only operations** (`get-cert`, `list-certs`, `get-profile`, `list-profiles`, `get-key`, `list-keys`, `list-users`, `get-public-key`, `status`) are **not audited** by default. They generate operational log entries via the existing HTTP/gRPC logging middleware but do not produce audit events. This keeps the audit log focused on state-changing operations.

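The read-only exclusion above could be implemented as a small set lookup at the server layer. This is a sketch under assumptions: `readOnlyOps` and `shouldAudit` are hypothetical names, and the `includeReads` parameter stands in for whatever opt-in switch the configuration provides.

```go
package main

// readOnlyOps lists the operations the design treats as read-only.
var readOnlyOps = map[string]bool{
	"get-cert":       true,
	"list-certs":     true,
	"get-profile":    true,
	"list-profiles":  true,
	"get-key":        true,
	"list-keys":      true,
	"list-users":     true,
	"get-public-key": true,
	"status":         true,
}

// shouldAudit reports whether an operation should produce an audit event.
// State-changing operations are always audited; reads only when the
// hypothetical includeReads switch is on.
func shouldAudit(op string, includeReads bool) bool {
	return includeReads || !readOnlyOps[op]
}
```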
If the operator wants read auditing, a config flag can enable it:

```toml
[audit]
include_reads = false  # default
```

## File Layout

```
internal/
  audit/
    audit.go       # Logger, Event, LevelAudit
    audit_test.go  # Tests
```

One source file, two concrete types (`Logger` and `Event`), no interfaces. The audit logger is a concrete struct passed by pointer. Nil-safe for disabled mode.

## Configuration

Add to `config.go`:

```go
type AuditConfig struct {
	Mode         string `toml:"mode"`          // "file", "stdout", ""
	Path         string `toml:"path"`          // file path (mode=file)
	IncludeReads bool   `toml:"include_reads"` // audit read operations
}
```

Add to example config:

```toml
[audit]
mode = "file"
path = "/srv/metacrypt/audit.log"
include_reads = false
```

## Implementation Steps

1. **Create `internal/audit/audit.go`**: `Logger`, `Event`, `LevelAudit`, `New(handler)`, nil-safe `Log` method.
2. **Add `AuditConfig` to config**: mode, path, include_reads. Validate that `path` is set when `mode = "file"`.
3. **Create audit logger in `cmd/metacrypt/server.go`**: based on config, open file or use stdout. Pass to Server, GRPCServer, SealManager, PolicyEngine.
4. **Add `audit *audit.Logger` field** to `Server`, `GRPCServer`, `seal.Manager`, `policy.Engine`. Update constructors.
5. **Instrument REST handlers**: add `auditEngineOp` helper to `Server`. Call after every mutating operation in typed handlers and `handleEngineRequest`.
6. **Instrument gRPC**: add audit interceptor to the interceptor chain.
7. **Instrument seal/unseal**: emit events in `Init`, `Unseal`, `Seal`, `RotateMEK`.
8. **Instrument policy**: emit events in `CreateRule`, `DeleteRule`.
9. **Instrument login**: emit events in the auth login handler (both REST and gRPC).
10. **Update ARCHITECTURE.md**: document audit logging in the Security Model section. Remove from Future Work.
11. **Update example configs**: add `[audit]` section.
12. **Add tests**: verify events are emitted for success, denied, and error outcomes. Verify nil logger is safe. Verify read operations are excluded by default.

## Querying the Audit Log

```bash
# All events for a user:
jq 'select(.caller == "kyle")' /srv/metacrypt/audit.log

# All certificate issuances:
jq 'select(.operation == "issue")' /srv/metacrypt/audit.log

# All denied operations:
jq 'select(.outcome == "denied")' /srv/metacrypt/audit.log

# All SSH CA events in the last hour:
jq 'select(.engine == "sshca" and .time > "2026-03-17T03:00:00Z")' /srv/metacrypt/audit.log

# Count operations by type:
jq -r '.operation' /srv/metacrypt/audit.log | sort | uniq -c | sort -rn

# Failed unseal attempts:
jq 'select(.operation == "unseal" and .outcome == "denied")' /srv/metacrypt/audit.log
```

## Rotation

For file mode, use logrotate:

```
/srv/metacrypt/audit.log {
    daily
    rotate 90
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
```

`copytruncate` avoids the need for a signal-based reopen mechanism: logrotate copies the file and then truncates the original in place, and the process keeps writing to the same descriptor. This only works cleanly when the file was opened in append mode. `slog.JSONHandler` does not buffer, so nothing is held in memory across a rotation, but events written in the brief window between the copy and the truncate can be lost. At homelab write rates that risk is acceptable. At homelab scale with moderate usage, 90 days of audit logs will be well under 100 MB, even uncompressed.