Audit Logging Design
Overview
Metacrypt is a cryptographic service for a homelab/personal infrastructure platform. Audit logging gives the operator visibility into what happened, when, and by whom — essential for a service that issues certificates, signs SSH keys, and manages encryption keys, even at homelab scale.
The design prioritizes simplicity and operational clarity over enterprise
features. There is one operator. There is no SIEM. The audit log should be
a structured, append-only file that can be read with jq, tailed with
journalctl, and rotated with logrotate. It should not require a
database, a separate service, or additional infrastructure.
Goals
- Record all security-relevant operations: who did what, when, and whether it succeeded.
- Separate audit events from operational logs: operational logs (slog.Info) are for debugging; audit events are for accountability.
- Zero additional dependencies: use Go's log/slog with a dedicated handler writing to a file or stdout.
- No performance overhead that matters at homelab scale: synchronous writes are fine. This is not a high-throughput system.
- Queryable with standard tools: one JSON object per line, greppable, jq-friendly.
Non-Goals
- Tamper-evident chaining (hash chains, Merkle trees). The operator has root access to the machine; tamper evidence against the operator is theatre. If the threat model changes, this can be added later.
- Remote log shipping. If needed, journalctl or filebeat can ship the file externally.
- Log aggregation across services. Each Metacircular service logs independently.
- Structured querying (SQL, full-text search). jq and grep are sufficient.
Event Model
Every audit event is a single JSON line with these fields:
{
"time": "2026-03-17T04:15:42.577Z",
"level": "AUDIT",
"msg": "operation completed",
"caller": "kyle",
"roles": ["admin"],
"operation": "issue",
"engine": "ca",
"mount": "pki",
"resource": "ca/pki/id/example.com",
"outcome": "success",
"detail": {"serial": "01:02:03", "issuer": "default", "cn": "example.com"}
}
Required Fields
| Field | Type | Description |
|---|---|---|
| time | RFC 3339 | When the event occurred |
| level | string | Always "AUDIT"; distinguishes audit events from operational logs |
| msg | string | Human-readable summary |
| caller | string | MCIAS username, or "anonymous" for unauthenticated ops |
| operation | string | Engine operation name (e.g., issue, sign-user, encrypt) |
| outcome | string | "success", "denied", or "error" |
Optional Fields
| Field | Type | Description |
|---|---|---|
| roles | []string | Caller's MCIAS roles |
| engine | string | Engine type (ca, sshca, transit, user) |
| mount | string | Mount name |
| resource | string | Policy resource path evaluated |
| detail | object | Operation-specific metadata (see below) |
| error | string | Error message on "error" or "denied" outcomes |
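The field tables above can be read as a schema for each line of the log. The following is a minimal sketch of a consumer that decodes one line and checks the required fields; the `Record` type, the `parseRecord` helper, and the exact validation rules are illustrative assumptions, not part of the design.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Record mirrors the documented JSON fields. The type name and the
// validation rules below are illustrative assumptions.
type Record struct {
	Time      time.Time      `json:"time"`
	Level     string         `json:"level"`
	Msg       string         `json:"msg"`
	Caller    string         `json:"caller"`
	Operation string         `json:"operation"`
	Outcome   string         `json:"outcome"`
	Roles     []string       `json:"roles,omitempty"`
	Engine    string         `json:"engine,omitempty"`
	Mount     string         `json:"mount,omitempty"`
	Resource  string         `json:"resource,omitempty"`
	Detail    map[string]any `json:"detail,omitempty"`
	Error     string         `json:"error,omitempty"`
}

// parseRecord decodes one audit line and checks the required fields.
func parseRecord(line []byte) (Record, error) {
	var r Record
	if err := json.Unmarshal(line, &r); err != nil {
		return r, fmt.Errorf("decode audit line: %w", err)
	}
	if r.Time.IsZero() || r.Msg == "" || r.Caller == "" || r.Operation == "" {
		return r, fmt.Errorf("audit line is missing a required field")
	}
	if r.Level != "AUDIT" {
		return r, fmt.Errorf("unexpected level %q", r.Level)
	}
	switch r.Outcome {
	case "success", "denied", "error":
		return r, nil
	default:
		return r, fmt.Errorf("unexpected outcome %q", r.Outcome)
	}
}

func main() {
	line := []byte(`{"time":"2026-03-17T04:15:42.577Z","level":"AUDIT",` +
		`"msg":"operation completed","caller":"kyle","roles":["admin"],` +
		`"operation":"issue","engine":"ca","mount":"pki","outcome":"success"}`)
	r, err := parseRecord(line)
	if err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println(r.Caller, r.Operation, r.Outcome) // prints: kyle issue success
}
```

A check like this could also back a test that asserts every emitted event carries the required fields.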
Detail Fields by Operation Category
Certificate operations (CA):
- serial, issuer, cn, profile, ttl
SSH CA operations:
- serial, cert_type (user/host), principals, profile, key_id
Transit operations:
- key (key name), key_version, batch_size (for batch ops)
User E2E operations:
- recipients (list), sender
Policy operations:
- rule_id, effect
System operations (seal/unseal/init):
- No detail fields; the operation name is sufficient.
What NOT to Log
- Plaintext, ciphertext, signatures, HMACs, envelopes, or any cryptographic material.
- Private keys, public keys, or key bytes.
- Passwords, tokens, or credentials.
- Full request/response bodies.
The audit log records what happened, not what the data was.
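One way to make the rule above hard to violate by accident is to filter the detail map against an allowlist before it is logged. This is only a sketch: `allowedDetailKeys` and `sanitizeDetail` are hypothetical names, and the design as written does not require such a filter. The allowlist is taken from the detail fields listed earlier.

```go
package main

import "fmt"

// allowedDetailKeys lists the detail fields named in this document.
// Treating it as an allowlist is an assumption of this sketch.
var allowedDetailKeys = map[string]bool{
	"serial": true, "issuer": true, "cn": true, "profile": true, "ttl": true,
	"cert_type": true, "principals": true, "key_id": true,
	"key": true, "key_version": true, "batch_size": true,
	"recipients": true, "sender": true,
	"rule_id": true, "effect": true,
}

// sanitizeDetail returns a copy of in that contains only allowlisted keys,
// so that a stray "plaintext" or "token" entry never reaches the log.
func sanitizeDetail(in map[string]any) map[string]any {
	out := make(map[string]any, len(in))
	for k, v := range in {
		if allowedDetailKeys[k] {
			out[k] = v
		}
	}
	return out
}

func main() {
	detail := map[string]any{
		"serial":    "01:02:03",
		"plaintext": "must never be logged",
	}
	fmt.Println(sanitizeDetail(detail)) // prints: map[serial:01:02:03]
}
```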
Architecture
Audit Logger
A thin wrapper around slog.Logger with a dedicated handler:
// Package audit provides structured audit event logging.
package audit

import (
	"context"
	"log/slog"
)

// Logger writes structured audit events.
type Logger struct {
	logger *slog.Logger
}

// New creates an audit logger that writes to the given handler.
func New(h slog.Handler) *Logger {
	return &Logger{logger: slog.New(h)}
}

// Event represents a single audit event.
type Event struct {
	Caller    string
	Roles     []string
	Operation string
	Engine    string
	Mount     string
	Resource  string
	Outcome   string // "success", "denied", "error"
	Error     string
	Detail    map[string]interface{}
}

// Log writes an audit event.
func (l *Logger) Log(ctx context.Context, e Event) {
	attrs := []slog.Attr{
		slog.String("caller", e.Caller),
		slog.String("operation", e.Operation),
		slog.String("outcome", e.Outcome),
	}
	if len(e.Roles) > 0 {
		attrs = append(attrs, slog.Any("roles", e.Roles))
	}
	if e.Engine != "" {
		attrs = append(attrs, slog.String("engine", e.Engine))
	}
	if e.Mount != "" {
		attrs = append(attrs, slog.String("mount", e.Mount))
	}
	if e.Resource != "" {
		attrs = append(attrs, slog.String("resource", e.Resource))
	}
	if e.Error != "" {
		attrs = append(attrs, slog.String("error", e.Error))
	}
	if len(e.Detail) > 0 {
		attrs = append(attrs, slog.Any("detail", e.Detail))
	}
	// LevelAudit sits above Error, so level filtering never suppresses it.
	// slog prints it as "ERROR+4" unless the handler's ReplaceAttr hook
	// renames it to "AUDIT".
	l.logger.LogAttrs(ctx, LevelAudit, "operation completed", attrs...)
}

// LevelAudit is a custom slog level for audit events.
const LevelAudit = slog.Level(12) // above Error (8), the highest built-in level
The custom level ensures audit events are never suppressed by log level
filtering (operators may set level = "warn" to quiet debug noise, but
audit events must always be emitted).
Output Configuration
Two modes, controlled by a config option:
[audit]
# "file" writes to a dedicated audit log file.
# "stdout" writes to stdout alongside operational logs (for journalctl).
# Empty string disables audit logging.
mode = "file"
path = "/srv/metacrypt/audit.log"
File mode: Opens the file append-only (O_APPEND) with 0600 permissions and
wraps it in slog.NewJSONHandler. The logger does not re-open the file on
its own, so rotation relies on logrotate's copytruncate option: the file is
truncated in place, and because it was opened with O_APPEND, later writes
land at the new end of the file. slog.JSONHandler does not buffer; each
event is written with a single Write call.
Stdout mode: Uses slog.NewJSONHandler writing to os.Stdout. Events
are interleaved with operational logs but distinguishable by the "AUDIT"
level. Suitable for systemd/journalctl capture where all output goes to
the journal.
Disabled: No audit logger is created. The Logger is nil-safe — all
methods are no-ops on a nil receiver.
func (l *Logger) Log(ctx context.Context, e Event) {
	if l == nil {
		return
	}
	// ...
}
Integration Points
The audit logger is created at startup and injected into the components that need it:
cmd/metacrypt/server.go
  └── audit.New(handler)
        ├── server.Server (REST handlers)
        ├── grpcserver.GRPCServer (gRPC interceptor)
        ├── seal.Manager (seal/unseal/init)
        └── policy.Engine (rule create/delete)
Engine operations are logged at the server layer (REST handlers and gRPC interceptors), not inside the engines themselves. This keeps the engines focused on business logic and avoids threading the audit logger through every engine method.
Instrumentation
REST API (internal/server/)
Instrument handleEngineRequest and every typed handler. The audit event
is emitted after the operation completes (success or failure):
func (s *Server) handleGetCert(w http.ResponseWriter, r *http.Request) {
	// ... existing handler logic ...
	s.audit.Log(r.Context(), audit.Event{
		Caller:    info.Username,
		Roles:     info.Roles,
		Operation: "get-cert",
		Engine:    "ca",
		Mount:     mountName,
		Outcome:   "success",
		Detail:    map[string]interface{}{"serial": serial},
	})
}
On error:
s.audit.Log(r.Context(), audit.Event{
	Caller:    info.Username,
	Roles:     info.Roles,
	Operation: "get-cert",
	Engine:    "ca",
	Mount:     mountName,
	Outcome:   "error",
	Error:     err.Error(),
})
To avoid duplicating this in every handler, use a helper:
func (s *Server) auditEngineOp(r *http.Request, info *auth.TokenInfo,
	op, engineType, mount, outcome string, detail map[string]interface{}, err error) {
	e := audit.Event{
		Caller:    info.Username,
		Roles:     info.Roles,
		Operation: op,
		Engine:    engineType,
		Mount:     mount,
		Outcome:   outcome,
		Detail:    detail,
	}
	if err != nil {
		e.Error = err.Error()
	}
	s.audit.Log(r.Context(), e)
}
gRPC API (internal/grpcserver/)
Add an audit interceptor that fires after each RPC completes. This is cleaner than instrumenting every handler individually:
func (g *GRPCServer) auditInterceptor(
	ctx context.Context,
	req interface{},
	info *grpc.UnaryServerInfo,
	handler grpc.UnaryHandler,
) (interface{}, error) {
	resp, err := handler(ctx, req)
	// Extract caller info from context (set by auth interceptor).
	caller := callerFromContext(ctx)
	outcome := "success"
	if err != nil {
		outcome = "error"
	}
	g.audit.Log(ctx, audit.Event{
		Caller:    caller.Username,
		Roles:     caller.Roles,
		Operation: path.Base(info.FullMethod), // e.g., "IssueCert"
		Resource:  info.FullMethod,
		Outcome:   outcome,
		Error:     errString(err),
	})
	return resp, err
}
Register this interceptor after the auth interceptor in the chain so that caller info is available.
Seal/Unseal (internal/seal/)
Instrument Init, Unseal, Seal, and RotateMEK:
// In Manager.Unseal, after success:
m.audit.Log(ctx, audit.Event{
	Caller:    "operator", // unseal is not authenticated
	Operation: "unseal",
	Outcome:   "success",
})

// On failure:
m.audit.Log(ctx, audit.Event{
	Caller:    "operator",
	Operation: "unseal",
	Outcome:   "denied",
	Error:     "invalid password",
})
Policy (internal/policy/)
Instrument CreateRule and DeleteRule:
// In Engine.CreateRule, after success:
e.audit.Log(ctx, audit.Event{
	Caller:    callerUsername, // passed from the handler
	Operation: "create-policy",
	Outcome:   "success",
	Detail:    map[string]interface{}{"rule_id": rule.ID, "effect": rule.Effect},
})
Operations to Audit
| Category | Operations | Outcome on deny |
|---|---|---|
| System | init, unseal, seal, rotate-mek, rotate-key, migrate | denied or error |
| CA | import-root, create-issuer, delete-issuer, issue, sign-csr, renew, revoke-cert, delete-cert | denied |
| SSH CA | sign-host, sign-user, create-profile, update-profile, delete-profile, revoke-cert, delete-cert | denied |
| Transit | create-key, delete-key, rotate-key, update-key-config, trim-key, encrypt, decrypt, rewrap, sign, verify, hmac | denied |
| User | register, provision, encrypt, decrypt, re-encrypt, rotate-key, delete-user | denied |
| Policy | create-policy, delete-policy | N/A (admin-only) |
| Auth | login (success and failure) | denied |
Read-only operations (get-cert, list-certs, get-profile,
list-profiles, get-key, list-keys, list-users, get-public-key,
status) are not audited by default. They generate operational log
entries via the existing HTTP/gRPC logging middleware but do not produce
audit events. This keeps the audit log focused on state-changing operations.
If the operator wants read auditing, a config flag can enable it:
[audit]
include_reads = false # default
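The read-operation filter could be as small as a set lookup. In this sketch, `readOps` and `shouldAudit` are hypothetical names; the operation list is the one given above.

```go
package main

import "fmt"

// readOps holds the read-only operations named in this section.
var readOps = map[string]bool{
	"get-cert": true, "list-certs": true,
	"get-profile": true, "list-profiles": true,
	"get-key": true, "list-keys": true,
	"list-users": true, "get-public-key": true,
	"status": true,
}

// shouldAudit reports whether op should produce an audit event.
// State-changing operations are always audited; reads only when
// include_reads is enabled.
func shouldAudit(op string, includeReads bool) bool {
	return includeReads || !readOps[op]
}

func main() {
	fmt.Println(shouldAudit("issue", false))    // prints: true
	fmt.Println(shouldAudit("get-cert", false)) // prints: false
	fmt.Println(shouldAudit("get-cert", true))  // prints: true
}
```

Any operation not in the set is treated as state-changing, so a newly added operation is audited by default rather than silently skipped.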
File Layout
internal/
  audit/
    audit.go       # Logger, Event, LevelAudit
    audit_test.go  # Tests
One file, two concrete types (Logger and Event), no interfaces. The audit logger is a concrete struct passed by pointer and is nil-safe for disabled mode.
Configuration
Add to config.go:
type AuditConfig struct {
	Mode         string `toml:"mode"`          // "file", "stdout", ""
	Path         string `toml:"path"`          // file path (mode=file)
	IncludeReads bool   `toml:"include_reads"` // audit read operations
}
Add to example config:
[audit]
mode = "file"
path = "/srv/metacrypt/audit.log"
include_reads = false
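Validation of this section might look like the following. The `Validate` method and its error messages are assumptions; the only rule stated in this document is that `path` must be set when `mode = "file"`, and rejecting unknown modes is an extra check added here.

```go
package main

import (
	"errors"
	"fmt"
)

// AuditConfig mirrors the struct shown above.
type AuditConfig struct {
	Mode         string `toml:"mode"`
	Path         string `toml:"path"`
	IncludeReads bool   `toml:"include_reads"`
}

// Validate (hypothetical) rejects unknown modes and a missing file path.
func (c AuditConfig) Validate() error {
	switch c.Mode {
	case "", "stdout":
		return nil
	case "file":
		if c.Path == "" {
			return errors.New(`audit: path is required when mode = "file"`)
		}
		return nil
	default:
		return fmt.Errorf("audit: unknown mode %q", c.Mode)
	}
}

func main() {
	ok := AuditConfig{Mode: "file", Path: "/srv/metacrypt/audit.log"}
	bad := AuditConfig{Mode: "file"}
	fmt.Println(ok.Validate())
	fmt.Println(bad.Validate())
}
```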
Implementation Steps
- Create internal/audit/audit.go: Logger, Event, LevelAudit, New(handler), nil-safe Log method.
- Add AuditConfig to config: mode, path, include_reads. Validate that path is set when mode = "file".
- Create audit logger in cmd/metacrypt/server.go: based on config, open file or use stdout. Pass to Server, GRPCServer, SealManager, PolicyEngine.
- Add audit *audit.Logger field to Server, GRPCServer, seal.Manager, policy.Engine. Update constructors.
- Instrument REST handlers: add auditEngineOp helper to Server. Call after every mutating operation in typed handlers and handleEngineRequest.
- Instrument gRPC: add audit interceptor to the interceptor chain.
- Instrument seal/unseal: emit events in Init, Unseal, Seal, RotateMEK.
- Instrument policy: emit events in CreateRule, DeleteRule.
- Instrument login: emit events in the auth login handler (both REST and gRPC).
- Update ARCHITECTURE.md: document audit logging in the Security Model section. Remove from Future Work.
- Update example configs: add [audit] section.
- Add tests: verify events are emitted for success, denied, and error outcomes. Verify nil logger is safe. Verify read operations are excluded by default.
Querying the Audit Log
# All events for a user:
jq 'select(.caller == "kyle")' /srv/metacrypt/audit.log
# All certificate issuances:
jq 'select(.operation == "issue")' /srv/metacrypt/audit.log
# All denied operations:
jq 'select(.outcome == "denied")' /srv/metacrypt/audit.log
# All SSH CA events in the last hour:
jq 'select(.engine == "sshca" and .time > "2026-03-17T03:00:00Z")' /srv/metacrypt/audit.log
# Count operations by type:
jq -r '.operation' /srv/metacrypt/audit.log | sort | uniq -c | sort -rn
# Failed unseal attempts:
jq 'select(.operation == "unseal" and .outcome == "denied")' /srv/metacrypt/audit.log
Rotation
For file mode, use logrotate:
/srv/metacrypt/audit.log {
    daily
    rotate 90
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
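copytruncate avoids the need for a signal-based reopen mechanism. Go's
slog.JSONHandler does not buffer writes, so nothing is held back inside the
process. logrotate does document a very small window between the copy and
the truncate in which an event written at that moment can be lost; at
homelab scale this is an accepted trade-off.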
At homelab scale with moderate usage, 90 days of uncompressed audit logs will be well under 100 MB.