Update engine specs, audit doc, and server tests for SSH CA, transit, and user engines

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 20:16:23 -07:00
parent 7237b2951e
commit 128f5abc4d
6 changed files with 1309 additions and 182 deletions
--- a/docs/engineering-standards.md
+++ b/docs/engineering-standards.md
@@ -1,12 +1,31 @@
 # Metacircular Dynamics — Engineering Standards

+Source: https://metacircular.net/roam/20260314210051-metacircular_dynamics.html
+
 This document describes the standard repository layout, tooling, and software
-development lifecycle (SDLC) for services built at Metacircular Dynamics. It is
-derived from the conventions established in Metacrypt and codifies them as the
-baseline for all new and existing services.
+development lifecycle (SDLC) for services built at Metacircular Dynamics. It
+incorporates the platform-wide project guidelines and codifies the conventions
+established in Metacrypt as the baseline for all services.
+
+## Platform Rules
+
+These four rules apply to every Metacircular service:
+
+1. **Data Storage**: All service data goes in `/srv/<service>/` to enable
+   straightforward migration across systems.
+2. **Deployment Architecture**: Services must ship systemd unit files, but
+   their design is container-first to support deployment via the
+   Metacircular Control Plane (MCP).
+3. **Identity Management**: Services must integrate with MCIAS (Metacircular
+   Identity and Access Service) for user management and access control. There
+   are three role levels: `admin` (full administrative access), `user` (full
+   non-administrative access), and `guest` (service-dependent restrictions).
+4. **API Design**: Services expose both gRPC and REST interfaces, kept in
+   sync. Web UIs are built with htmx.

 ## Table of Contents

+0. [Platform Rules](#platform-rules)
 1. [Repository Layout](#repository-layout)
 2. [Language & Toolchain](#language--toolchain)
 3. [Build System](#build-system)
@@ -559,10 +578,35 @@ Services handle `SIGINT` and `SIGTERM`, shutting down cleanly:

 | File | Purpose | Audience |
 |------|---------|----------|
+| `README.md` | Project overview, quick-start, and contributor guide | Everyone |
 | `CLAUDE.md` | AI-assisted development context | Claude Code |
 | `ARCHITECTURE.md` | Full system specification | Engineers |
+| `RUNBOOK.md` | Operational procedures and incident response | Operators |
 | `deploy/examples/<service>.toml` | Example configuration | Operators |

+### Suggested Files
+
+These are not required for every project but should be created where applicable:
+
+| File | When to Include | Purpose |
+|------|-----------------|---------|
+| `AUDIT.md` | Services handling cryptography, secrets, PII, or auth | Security audit findings with issue tracking and resolution status |
+| `POLICY.md` | Services with fine-grained access control | Policy engine documentation: rule structure, evaluation algorithm, resource paths, action classification, common patterns |
+
+### README.md
+
+The README is the front door. A new engineer or user should be able to
+understand what the service does and get it running from this file alone.
+It should contain:
+
+- Project name and one-paragraph description.
+- Quick-start instructions (build, configure, run).
+- Link to `ARCHITECTURE.md` for full technical details.
+- Link to `RUNBOOK.md` for operational procedures.
+- License and contribution notes (if applicable).
+
+Keep it concise. The README is not the spec — that's `ARCHITECTURE.md`.
+
 ### CLAUDE.md

 This file provides context for AI-assisted development. It should contain:
@@ -596,6 +640,56 @@ This is the canonical specification for the service. It should cover:
 This document is the source of truth. When the code and the spec disagree,
 one of them has a bug.

+### RUNBOOK.md
+
+The runbook is written for operators, not developers. It covers what to do
+when things go wrong and how to perform routine maintenance. It should
+contain:
+
+1. **Service overview** — what the service does, in one paragraph.
+2. **Health checks** — how to verify the service is healthy (endpoints,
+   CLI commands, expected responses).
+3. **Common operations** — start, stop, restart, seal/unseal, backup,
+   restore, log inspection.
+4. **Alerting** — what alerts exist, what they mean, and how to respond.
+5. **Incident procedures** — step-by-step playbooks for known failure
+   modes (database corruption, certificate expiry, MCIAS outage, disk
+   full, etc.).
+6. **Escalation** — when and how to escalate beyond the runbook.
+
+Write runbook entries as numbered steps, not prose. An operator at 3 AM
+should be able to follow them without thinking.
+
+### AUDIT.md (Suggested)
+
+For services that handle cryptography, secrets, PII, or authentication,
+maintain a security audit log. Each finding gets a numbered entry with:
+
+- Description of the issue.
+- Severity (critical, high, medium, low).
+- Resolution status: open, resolved (with summary), or accepted (with
+  rationale for accepting the risk).
+
+A priority summary table at the bottom of the file provides a scannable
+overview. Resolved and accepted items are struck through but retained for
+history. See Metacrypt's `AUDIT.md` for the reference format.
+
+### POLICY.md (Suggested)
+
+For services with a policy engine or fine-grained access control, document
+the policy model separately from the architecture spec. It should cover:
+
+- Rule structure (fields, types, semantics).
+- Evaluation algorithm (match logic, priority, default effect).
+- Resource path conventions and glob patterns.
+- Action classification.
+- API endpoints for policy CRUD.
+- Common policy patterns with examples.
+- Role summary (what each MCIAS role gets by default).
+
+This document is aimed at administrators who need to write policy rules,
+not engineers who need to understand the implementation.
+
 ### Engine/Feature Design Documents

 For services with a modular architecture, each module gets its own design