# Audit Logging Design

## Overview

Metacrypt is a cryptographic service for a homelab/personal infrastructure
platform. Audit logging gives the operator visibility into what happened,
when, and by whom — essential for a service that issues certificates, signs
SSH keys, and manages encryption keys, even at homelab scale.

The design prioritizes simplicity and operational clarity over enterprise
features. There is one operator. There is no SIEM. The audit log should be
a structured, append-only stream of JSON lines that can be read with `jq`,
followed with `tail -f` (or `journalctl` in stdout mode), and rotated with
`logrotate`. It should not require a database, a separate service, or
additional infrastructure.

## Goals

1. **Record all security-relevant operations** — who did what, when, and
   whether it succeeded.
2. **Separate audit events from operational logs** — operational logs
   (`slog.Info`) are for debugging; audit events are for accountability.
3. **Zero additional dependencies** — use Go's `log/slog` with a dedicated
   handler writing to a file or stdout.
4. **No performance overhead that matters at homelab scale** — synchronous
   writes are fine. This is not a high-throughput system.
5. **Queryable with standard tools** — one JSON object per line, greppable,
   `jq`-friendly.

## Non-Goals

- Tamper-evident chaining (hash chains, Merkle trees). The operator has
  root access to the machine; tamper evidence against the operator is
  theatre. If the threat model changes, this can be added later.
- Remote log shipping. If needed, journald forwarding (stdout mode) or a
  file shipper such as `filebeat` (file mode) can send events elsewhere.
- Log aggregation across services. Each Metacircular service logs
  independently.
- Structured querying (SQL, full-text search). `jq` and `grep` are
  sufficient.

## Event Model

Every audit event is a single JSON line with these fields:

```json
{
  "time":      "2026-03-17T04:15:42.577Z",
  "level":     "AUDIT",
  "msg":       "operation completed",
  "caller":    "kyle",
  "roles":     ["admin"],
  "operation": "issue",
  "engine":    "ca",
  "mount":     "pki",
  "resource":  "ca/pki/id/example.com",
  "outcome":   "success",
  "detail":    {"serial": "01:02:03", "issuer": "default", "cn": "example.com"}
}
```

### Required Fields

| Field | Type | Description |
|-------|------|-------------|
| `time` | RFC 3339 | When the event occurred |
| `level` | string | Always `"AUDIT"` — distinguishes from operational logs |
| `msg` | string | Human-readable summary |
| `caller` | string | MCIAS username; `"anonymous"` for unauthenticated API ops; `"operator"` for unauthenticated seal operations |
| `operation` | string | Engine operation name (e.g., `issue`, `sign-user`, `encrypt`) |
| `outcome` | string | `"success"`, `"denied"`, or `"error"` |

### Optional Fields

| Field | Type | Description |
|-------|------|-------------|
| `roles` | []string | Caller's MCIAS roles |
| `engine` | string | Engine type (`ca`, `sshca`, `transit`, `user`) |
| `mount` | string | Mount name |
| `resource` | string | Policy resource path evaluated |
| `detail` | object | Operation-specific metadata (see below) |
| `error` | string | Error message on `"error"` or `"denied"` outcomes |

### Detail Fields by Operation Category

**Certificate operations** (CA):
- `serial`, `issuer`, `cn`, `profile`, `ttl`

**SSH CA operations**:
- `serial`, `cert_type` (`user`/`host`), `principals`, `profile`, `key_id`

**Transit operations**:
- `key` (key name), `key_version`, `batch_size` (for batch ops)

**User E2E operations**:
- `recipients` (list), `sender`

**Policy operations**:
- `rule_id`, `effect`

**System operations** (seal/unseal/init):
- No detail fields; the operation name is sufficient.

### What NOT to Log

- Plaintext, ciphertext, signatures, HMACs, envelopes, or any
  cryptographic material.
- Private keys, public keys, or key bytes.
- Passwords, tokens, or credentials.
- Full request/response bodies.

The audit log records **what happened**, not **what the data was**.
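One way to enforce the rule above is an allow-list applied to `detail` before an event is logged. The following is a minimal sketch under that assumption: `allowedDetail` and `filterDetail` are hypothetical names, and the key sets are copied from the per-engine categories listed earlier.

```go
package main

import "fmt"

// allowedDetail lists the detail keys permitted per engine type, taken from
// the operation categories in this document. Name and shape are illustrative.
var allowedDetail = map[string]map[string]bool{
    "ca":      {"serial": true, "issuer": true, "cn": true, "profile": true, "ttl": true},
    "sshca":   {"serial": true, "cert_type": true, "principals": true, "profile": true, "key_id": true},
    "transit": {"key": true, "key_version": true, "batch_size": true},
    "user":    {"recipients": true, "sender": true},
}

// filterDetail returns a copy of detail that keeps only allow-listed keys
// for the given engine. Unknown engines yield an empty map.
func filterDetail(engine string, detail map[string]any) map[string]any {
    out := make(map[string]any)
    for k, v := range detail {
        if allowedDetail[engine][k] {
            out[k] = v
        }
    }
    return out
}

func main() {
    d := filterDetail("transit", map[string]any{
        "key": "backup", "key_version": 3, "plaintext": "secret",
    })
    // "plaintext" is not on the transit allow-list, so it is dropped.
    fmt.Println(len(d), d["key"], d["plaintext"])
}
```

An allow-list fails closed: a new field is dropped until someone decides it is safe to record.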

## Architecture

### Audit Logger

A thin wrapper around `slog.Logger` with a dedicated handler:

```go
// Package audit provides structured audit event logging.
package audit

import (
    "context"
    "log/slog"
)

// Logger writes structured audit events.
type Logger struct {
    logger *slog.Logger
}

// New creates an audit logger that writes to the given handler.
func New(h slog.Handler) *Logger {
    return &Logger{logger: slog.New(h)}
}

// Event represents a single audit event.
type Event struct {
    Caller    string
    Roles     []string
    Operation string
    Engine    string
    Mount     string
    Resource  string
    Outcome   string // "success", "denied", "error"
    Error     string
    Detail    map[string]interface{}
}

// Log writes an audit event.
func (l *Logger) Log(ctx context.Context, e Event) {
    attrs := []slog.Attr{
        slog.String("caller", e.Caller),
        slog.String("operation", e.Operation),
        slog.String("outcome", e.Outcome),
    }
    if len(e.Roles) > 0 {
        attrs = append(attrs, slog.Any("roles", e.Roles))
    }
    if e.Engine != "" {
        attrs = append(attrs, slog.String("engine", e.Engine))
    }
    if e.Mount != "" {
        attrs = append(attrs, slog.String("mount", e.Mount))
    }
    if e.Resource != "" {
        attrs = append(attrs, slog.String("resource", e.Resource))
    }
    if e.Error != "" {
        attrs = append(attrs, slog.String("error", e.Error))
    }
    if len(e.Detail) > 0 {
        attrs = append(attrs, slog.Any("detail", e.Detail))
    }

    // Use a custom level that sorts above Error; the handler relabels it "AUDIT".
    l.logger.LogAttrs(ctx, LevelAudit, "operation completed", attrs...)
}

// LevelAudit is a custom slog level for audit events.
const LevelAudit = slog.Level(12) // above Error (8), so ordinary level filters do not drop it
```

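The custom level sits above `slog.LevelError`, so audit events are not
suppressed by level filtering (operators may set `level = "warn"` to quiet
debug noise, but audit events must always be emitted). Note that `slog`
renders an unnamed level 12 as `"ERROR+4"`; to get the `"AUDIT"` label shown
in the event model, the handler needs a `ReplaceAttr` function that rewrites
the level attribute.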
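By default `slog` prints a level of 12 as `"ERROR+4"`, so the `"AUDIT"` label has to come from the handler. A minimal sketch of one way to do this with `slog.HandlerOptions.ReplaceAttr` follows; `newAuditHandler` and `renderEvent` are hypothetical helper names used only for illustration.

```go
package main

import (
    "context"
    "fmt"
    "io"
    "log/slog"
    "strings"
)

// LevelAudit mirrors the custom level defined in the audit package.
const LevelAudit = slog.Level(12)

// newAuditHandler returns a JSON handler that renders LevelAudit as "AUDIT".
func newAuditHandler(w io.Writer) slog.Handler {
    return slog.NewJSONHandler(w, &slog.HandlerOptions{
        Level: LevelAudit, // this handler only ever sees audit events
        ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
            // Rewrite only the built-in, top-level level attribute.
            if len(groups) == 0 && a.Key == slog.LevelKey {
                if lvl, ok := a.Value.Any().(slog.Level); ok && lvl == LevelAudit {
                    return slog.String(slog.LevelKey, "AUDIT")
                }
            }
            return a
        },
    })
}

// renderEvent logs one event into memory and returns the resulting JSON line.
func renderEvent(caller, op, outcome string) string {
    var sb strings.Builder
    logger := slog.New(newAuditHandler(&sb))
    logger.LogAttrs(context.Background(), LevelAudit, "operation completed",
        slog.String("caller", caller),
        slog.String("operation", op),
        slog.String("outcome", outcome),
    )
    return sb.String()
}

func main() {
    fmt.Print(renderEvent("kyle", "issue", "success"))
}
```

The same handler can be passed to `audit.New` in either file or stdout mode.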

### Output Configuration

Two modes, controlled by a config option:

```toml
[audit]
# "file" writes to a dedicated audit log file.
# "stdout" writes to stdout alongside operational logs (for journalctl).
# Empty string disables audit logging.
mode = "file"
path = "/srv/metacrypt/audit.log"
```

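**File mode**: Opens the file with `O_APPEND` and `0600` permissions and
passes it to `slog.NewJSONHandler`. The logger does not re-open the file on
its own: if `logrotate` renames the file, the process keeps writing to the
renamed file until it restarts or re-opens the path. Rotation therefore
relies on `copytruncate` (see Rotation), which truncates the file in place;
because of `O_APPEND`, later writes land at the new end of the file.
`slog.JSONHandler` does not buffer, and each event is written with a single
`Write` call.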

**Stdout mode**: Uses `slog.NewJSONHandler` writing to `os.Stdout`. Events
are interleaved with operational logs but distinguishable by the `"AUDIT"`
level. Suitable for systemd/journalctl capture where all output goes to
the journal.

**Disabled**: No audit logger is created. The `Logger` is nil-safe — all
methods are no-ops on a nil receiver.

```go
func (l *Logger) Log(ctx context.Context, e Event) {
    if l == nil {
        return
    }
    // ...
}
```
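The three modes can be resolved in one place at startup. The sketch below assumes a hypothetical helper, `openAuditWriter`, that returns a nil writer for the disabled mode so the caller can leave its `*audit.Logger` nil.

```go
package main

import (
    "fmt"
    "io"
    "log/slog"
    "os"
)

// openAuditWriter returns the destination for audit events, or a nil writer
// when auditing is disabled. The function name is an illustrative assumption.
func openAuditWriter(mode, path string) (io.Writer, error) {
    switch mode {
    case "":
        return nil, nil // disabled: caller keeps a nil *audit.Logger
    case "stdout":
        return os.Stdout, nil
    case "file":
        // O_APPEND makes every write land at the current end of file,
        // which keeps writes correct after the file is truncated in place.
        f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o600)
        if err != nil {
            return nil, err
        }
        return f, nil
    default:
        return nil, fmt.Errorf("audit: unknown mode %q", mode)
    }
}

func main() {
    w, err := openAuditWriter("stdout", "")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    if w == nil {
        return // auditing disabled
    }
    slog.New(slog.NewJSONHandler(w, nil)).Info("audit writer ready")
}
```

Returning the file through an explicit `return f, nil` avoids handing back a non-nil interface that wraps a nil `*os.File` on error.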

### Integration Points

The audit logger is created at startup and injected into the components
that need it:

```
cmd/metacrypt/server.go
  └── audit.New(handler)
        ├── server.Server        (REST handlers)
        ├── grpcserver.GRPCServer (gRPC interceptor)
        ├── seal.Manager         (seal/unseal/init)
        └── policy.Engine        (rule create/delete)
```

Engine operations are logged at the **server layer** (REST handlers and
gRPC interceptors), not inside the engines themselves. This keeps the
engines focused on business logic and avoids threading the audit logger
through every engine method.

### Instrumentation

#### REST API (`internal/server/`)

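Instrument `handleEngineRequest` and every typed handler that performs an
audited operation (mutating operations by default, plus reads when
`include_reads` is enabled). The audit event is emitted **after** the
operation completes (success or failure):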

```go
func (s *Server) handleGetCert(w http.ResponseWriter, r *http.Request) {
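    // ... existing handler logic ...
    // Note: get-cert is a read operation, so with the default configuration
    // this event is emitted only when include_reads is enabled.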

    s.audit.Log(r.Context(), audit.Event{
        Caller:    info.Username,
        Roles:     info.Roles,
        Operation: "get-cert",
        Engine:    "ca",
        Mount:     mountName,
        Outcome:   "success",
        Detail:    map[string]interface{}{"serial": serial},
    })
}
```

On error:

```go
s.audit.Log(r.Context(), audit.Event{
    Caller:    info.Username,
    Roles:     info.Roles,
    Operation: "get-cert",
    Engine:    "ca",
    Mount:     mountName,
    Outcome:   "error",
    Error:     err.Error(),
})
```

To avoid duplicating this in every handler, use a helper:

```go
func (s *Server) auditEngineOp(r *http.Request, info *auth.TokenInfo,
    op, engineType, mount, outcome string, detail map[string]interface{}, err error) {
    e := audit.Event{
        Caller:    info.Username,
        Roles:     info.Roles,
        Operation: op,
        Engine:    engineType,
        Mount:     mount,
        Outcome:   outcome,
        Detail:    detail,
    }
    if err != nil {
        e.Error = err.Error()
    }
    s.audit.Log(r.Context(), e)
}
```
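The helper takes `outcome` as a string, so each call site still has to choose between `"success"`, `"denied"`, and `"error"`. A small sketch of deriving it from the handler's error follows; `ErrDenied` is a hypothetical sentinel standing in for whatever error the policy check actually returns.

```go
package main

import (
    "errors"
    "fmt"
)

// ErrDenied is a hypothetical sentinel returned when policy rejects a request.
var ErrDenied = errors.New("permission denied")

// errDiskFull is an arbitrary non-policy failure used in the example.
var errDiskFull = errors.New("disk full")

// outcomeFor maps a handler error onto the three audit outcomes.
func outcomeFor(err error) string {
    switch {
    case err == nil:
        return "success"
    case errors.Is(err, ErrDenied):
        return "denied"
    default:
        return "error"
    }
}

func main() {
    wrapped := fmt.Errorf("issue cert: %w", ErrDenied)
    fmt.Println(outcomeFor(nil), outcomeFor(wrapped), outcomeFor(errDiskFull))
    // prints "success denied error"
}
```

Using `errors.Is` means a denial is still recognized after the error has been wrapped with context.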

#### gRPC API (`internal/grpcserver/`)

Add an audit interceptor that fires after each RPC completes. This is
cleaner than instrumenting every handler individually:

```go
func (g *GRPCServer) auditInterceptor(
    ctx context.Context,
    req interface{},
    info *grpc.UnaryServerInfo,
    handler grpc.UnaryHandler,
) (interface{}, error) {
    resp, err := handler(ctx, req)

    // Extract caller info from context (set by auth interceptor).
    caller := callerFromContext(ctx)

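    // Map gRPC status codes onto audit outcomes. status and codes are
    // google.golang.org/grpc/status and google.golang.org/grpc/codes; this
    // assumes denials are returned as PermissionDenied or Unauthenticated.
    outcome := "success"
    switch status.Code(err) {
    case codes.OK:
        // outcome stays "success"
    case codes.PermissionDenied, codes.Unauthenticated:
        outcome = "denied"
    default:
        outcome = "error"
    }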

    g.audit.Log(ctx, audit.Event{
        Caller:    caller.Username,
        Roles:     caller.Roles,
        Operation: path.Base(info.FullMethod), // e.g., "IssueCert"
        Resource:  info.FullMethod,
        Outcome:   outcome,
        Error:     errString(err),
    })

    return resp, err
}
```

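Register this interceptor **after** the auth interceptor in the chain so
that caller info is available. One consequence: a request that the auth
interceptor rejects never reaches the audit interceptor, so authentication
failures have to be audited in the auth interceptor itself.
`callerFromContext` and `errString` are small helpers not shown here.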
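`path.Base(info.FullMethod)` yields names like `IssueCert`, while the REST layer records `issue`. If both transports should produce the same `operation` values, a lookup table is one option. The sketch below is an assumption: the method names and the `metacrypt.v1.CAService` service path are hypothetical examples.

```go
package main

import (
    "fmt"
    "path"
)

// methodOps maps gRPC method names to the operation names used by the REST
// layer. The method names are hypothetical examples.
var methodOps = map[string]string{
    "IssueCert":  "issue",
    "SignCSR":    "sign-csr",
    "RevokeCert": "revoke-cert",
}

// operationFor returns the audit operation for a gRPC FullMethod. The second
// result is false when the method has no mapping, for example a read-only
// RPC that should not produce an audit event.
func operationFor(fullMethod string) (string, bool) {
    op, ok := methodOps[path.Base(fullMethod)]
    return op, ok
}

func main() {
    fmt.Println(operationFor("/metacrypt.v1.CAService/IssueCert"))
    fmt.Println(operationFor("/metacrypt.v1.CAService/GetCert"))
}
```

An interceptor could skip logging when the second result is false, which also keeps unmapped read RPCs out of the audit log.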

#### Seal/Unseal (`internal/seal/`)

Instrument `Init`, `Unseal`, `Seal`, and `RotateMEK`:

```go
// In Manager.Unseal, after success:
m.audit.Log(ctx, audit.Event{
    Caller:    "operator", // unseal is not authenticated
    Operation: "unseal",
    Outcome:   "success",
})

// On failure:
m.audit.Log(ctx, audit.Event{
    Caller:    "operator",
    Operation: "unseal",
    Outcome:   "denied",
    Error:     "invalid password",
})
```

#### Policy (`internal/policy/`)

Instrument `CreateRule` and `DeleteRule`:

```go
// In Engine.CreateRule, after success:
e.audit.Log(ctx, audit.Event{
    Caller:    callerUsername, // passed from the handler
    Operation: "create-policy",
    Outcome:   "success",
    Detail:    map[string]interface{}{"rule_id": rule.ID, "effect": rule.Effect},
})
```

### Operations to Audit

| Category | Operations | Outcome on deny |
|----------|------------|-----------------|
| System | `init`, `unseal`, `seal`, `rotate-mek`, `rotate-key`, `migrate` | `denied` or `error` |
| CA | `import-root`, `create-issuer`, `delete-issuer`, `issue`, `sign-csr`, `renew`, `revoke-cert`, `delete-cert` | `denied` |
| SSH CA | `sign-host`, `sign-user`, `create-profile`, `update-profile`, `delete-profile`, `revoke-cert`, `delete-cert` | `denied` |
| Transit | `create-key`, `delete-key`, `rotate-key`, `update-key-config`, `trim-key`, `encrypt`, `decrypt`, `rewrap`, `sign`, `verify`, `hmac` | `denied` |
| User | `register`, `provision`, `encrypt`, `decrypt`, `re-encrypt`, `rotate-key`, `delete-user` | `denied` |
| Policy | `create-policy`, `delete-policy` | N/A (admin-only) |
| Auth | `login` (success and failure) | `denied` |

**Read-only operations** (`get-cert`, `list-certs`, `get-profile`,
`list-profiles`, `get-key`, `list-keys`, `list-users`, `get-public-key`,
`status`) are **not audited** by default. They generate operational log
entries via the existing HTTP/gRPC logging middleware but do not produce
audit events. This keeps the audit log focused on state-changing operations.

If the operator wants read auditing, a config flag can enable it:

```toml
[audit]
include_reads = false  # default
```
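The read filter reduces to a set lookup. A minimal sketch, assuming hypothetical names `readOps` and `shouldAudit`, with the operation list taken from the paragraph above:

```go
package main

import "fmt"

// readOps is the set of read-only operations that are not audited by default.
var readOps = map[string]bool{
    "get-cert": true, "list-certs": true, "get-profile": true,
    "list-profiles": true, "get-key": true, "list-keys": true,
    "list-users": true, "get-public-key": true, "status": true,
}

// shouldAudit reports whether an operation produces an audit event given
// the include_reads setting.
func shouldAudit(op string, includeReads bool) bool {
    return includeReads || !readOps[op]
}

func main() {
    fmt.Println(shouldAudit("issue", false))    // true: state-changing
    fmt.Println(shouldAudit("get-cert", false)) // false: read, reads disabled
    fmt.Println(shouldAudit("get-cert", true))  // true: reads enabled
}
```

Operations missing from the set are treated as state-changing, so a newly added operation is audited unless it is explicitly marked as a read.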

## File Layout

```
internal/
  audit/
    audit.go          # Logger, Event, LevelAudit
    audit_test.go     # Tests
```

One source file with two small types (`Logger` and `Event`) and no
interfaces. The audit logger is a concrete struct passed by pointer and is
nil-safe for disabled mode.

## Configuration

Add to `config.go`:

```go
type AuditConfig struct {
    Mode         string `toml:"mode"`          // "file", "stdout", ""
    Path         string `toml:"path"`          // file path (mode=file)
    IncludeReads bool   `toml:"include_reads"` // audit read operations
}
```

Add to example config:

```toml
[audit]
mode = "file"
path = "/srv/metacrypt/audit.log"
include_reads = false
```
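The config needs one validation rule: `path` must be set when `mode = "file"`. A sketch of a `Validate` method follows; the method name and error wording are assumptions, and rejecting unknown modes is an extra check not stated elsewhere in this document.

```go
package main

import (
    "errors"
    "fmt"
)

// AuditConfig repeats the struct above so the example is self-contained.
type AuditConfig struct {
    Mode         string `toml:"mode"`
    Path         string `toml:"path"`
    IncludeReads bool   `toml:"include_reads"`
}

// Validate checks that the mode is known and that file mode has a path.
func (c AuditConfig) Validate() error {
    switch c.Mode {
    case "", "stdout":
        return nil
    case "file":
        if c.Path == "" {
            return errors.New(`audit: path is required when mode is "file"`)
        }
        return nil
    default:
        return fmt.Errorf("audit: unknown mode %q", c.Mode)
    }
}

func main() {
    fmt.Println(AuditConfig{Mode: "file"}.Validate())
    fmt.Println(AuditConfig{Mode: "file", Path: "/srv/metacrypt/audit.log"}.Validate())
}
```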

## Implementation Steps

1. **Create `internal/audit/audit.go`** — `Logger`, `Event`, `LevelAudit`,
   `New(handler)`, nil-safe `Log` method.

2. **Add `AuditConfig` to config** — mode, path, include_reads. Validate
   that `path` is set when `mode = "file"`.

3. **Create audit logger in `cmd/metacrypt/server.go`** — based on config,
   open file or use stdout. Pass to Server, GRPCServer, SealManager,
   PolicyEngine.

4. **Add `audit *audit.Logger` field** to `Server`, `GRPCServer`,
   `seal.Manager`, `policy.Engine`. Update constructors.

5. **Instrument REST handlers** — add `auditEngineOp` helper to `Server`.
   Call after every mutating operation in typed handlers and
   `handleEngineRequest`.

6. **Instrument gRPC** — add audit interceptor to the interceptor chain.

7. **Instrument seal/unseal** — emit events in `Init`, `Unseal`, `Seal`,
   `RotateMEK`.

8. **Instrument policy** — emit events in `CreateRule`, `DeleteRule`.

9. **Instrument login** — emit events in the auth login handler (both
   REST and gRPC).

10. **Update ARCHITECTURE.md** — document audit logging in the Security
    Model section. Remove from Future Work.

11. **Update example configs** — add `[audit]` section.

12. **Add tests** — verify events are emitted for success, denied, and
    error outcomes. Verify nil logger is safe. Verify read operations are
    excluded by default.
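For step 12, tests can capture output in a buffer and decode each line as JSON. The sketch below shows the core check as a standalone program; `missingRequired` is a hypothetical helper, and the required field list comes from the Required Fields table.

```go
package main

import (
    "bytes"
    "context"
    "encoding/json"
    "fmt"
    "log/slog"
)

// requiredFields lists the fields every audit event must carry.
var requiredFields = []string{"time", "level", "msg", "caller", "operation", "outcome"}

// missingRequired decodes one audit line and returns the required fields
// that are absent from it.
func missingRequired(line []byte) ([]string, error) {
    var ev map[string]any
    if err := json.Unmarshal(line, &ev); err != nil {
        return nil, err
    }
    var missing []string
    for _, f := range requiredFields {
        if _, ok := ev[f]; !ok {
            missing = append(missing, f)
        }
    }
    return missing, nil
}

func main() {
    // Log one event into memory, as a test would, then check its shape.
    var buf bytes.Buffer
    logger := slog.New(slog.NewJSONHandler(&buf, nil))
    logger.LogAttrs(context.Background(), slog.Level(12), "operation completed",
        slog.String("caller", "kyle"),
        slog.String("operation", "issue"),
        slog.String("outcome", "success"),
    )
    missing, err := missingRequired(buf.Bytes())
    fmt.Println(len(missing), err)
}
```

In a real `audit_test.go` the same check would run against output produced through `audit.Logger.Log`, once per outcome.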

## Querying the Audit Log

```bash
# All events for a user:
jq 'select(.caller == "kyle")' /srv/metacrypt/audit.log

# All certificate issuances:
jq 'select(.operation == "issue")' /srv/metacrypt/audit.log

# All denied operations:
jq 'select(.outcome == "denied")' /srv/metacrypt/audit.log

# All SSH CA events since a given timestamp (plain string comparison;
# assumes every timestamp uses the same UTC format):
jq 'select(.engine == "sshca" and .time > "2026-03-17T03:00:00Z")' /srv/metacrypt/audit.log

# Count operations by type:
jq -r '.operation' /srv/metacrypt/audit.log | sort | uniq -c | sort -rn

# Failed unseal attempts:
jq 'select(.operation == "unseal" and .outcome == "denied")' /srv/metacrypt/audit.log
```

## Rotation

For file mode, use logrotate:

```
/srv/metacrypt/audit.log {
    daily
    rotate 90
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
```

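`copytruncate` avoids the need for a signal-based reopen mechanism: the
file is copied and then truncated in place while Metacrypt keeps its
descriptor open. This relies on the file being opened in append mode.
`slog.JSONHandler` does not buffer, so nothing is held back in process
memory, but an event written in the short window between the copy and the
truncate can be lost. At homelab write rates this risk is small.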
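If that window ever matters, the alternative is a writer that can re-open its path after `logrotate` renames the file, triggered for example by a signal. The sketch below is not part of this design; `reopenWriter` and `demoRotation` are hypothetical names, and the rename in the demo stands in for what `logrotate` does without `copytruncate`.

```go
package main

import (
    "fmt"
    "os"
    "path/filepath"
    "sync"
)

// reopenWriter is an io.Writer over a log file that can be re-opened after
// the file has been renamed by a rotation tool.
type reopenWriter struct {
    mu   sync.Mutex
    path string
    f    *os.File
}

func newReopenWriter(path string) (*reopenWriter, error) {
    w := &reopenWriter{path: path}
    if err := w.Reopen(); err != nil {
        return nil, err
    }
    return w, nil
}

// Reopen opens path in append mode and then closes the previous file, if any.
func (w *reopenWriter) Reopen() error {
    w.mu.Lock()
    defer w.mu.Unlock()
    f, err := os.OpenFile(w.path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o600)
    if err != nil {
        return err
    }
    if w.f != nil {
        w.f.Close()
    }
    w.f = f
    return nil
}

func (w *reopenWriter) Write(p []byte) (int, error) {
    w.mu.Lock()
    defer w.mu.Unlock()
    return w.f.Write(p)
}

// demoRotation writes a line, simulates a rename-based rotation, re-opens,
// writes again, and returns the contents of the rotated and current files.
func demoRotation() (string, string, error) {
    dir, err := os.MkdirTemp("", "audit")
    if err != nil {
        return "", "", err
    }
    defer os.RemoveAll(dir)

    path := filepath.Join(dir, "audit.log")
    w, err := newReopenWriter(path)
    if err != nil {
        return "", "", err
    }
    fmt.Fprintln(w, "before")
    if err := os.Rename(path, path+".1"); err != nil {
        return "", "", err
    }
    if err := w.Reopen(); err != nil { // a real service might do this on SIGHUP
        return "", "", err
    }
    fmt.Fprintln(w, "after")
    w.f.Close()

    rotated, err := os.ReadFile(path + ".1")
    if err != nil {
        return "", "", err
    }
    current, err := os.ReadFile(path)
    if err != nil {
        return "", "", err
    }
    return string(rotated), string(current), nil
}

func main() {
    rotated, current, err := demoRotation()
    fmt.Printf("%q %q %v\n", rotated, current, err)
}
```

This trades the copy-and-truncate window for extra code and a `postrotate` hook, which is why the design defaults to `copytruncate`.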

At homelab scale with moderate usage, 90 days of uncompressed audit logs
will be well under 100 MB.