Update engine specs, audit doc, and server tests for SSH CA, transit, and user engines

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 20:16:23 -07:00
parent 7237b2951e
commit 128f5abc4d
6 changed files with 1309 additions and 182 deletions


@@ -1,12 +1,31 @@
# Metacircular Dynamics — Engineering Standards
Source: https://metacircular.net/roam/20260314210051-metacircular_dynamics.html

This document describes the standard repository layout, tooling, and software
development lifecycle (SDLC) for services built at Metacircular Dynamics. It
incorporates the platform-wide project guidelines and codifies the conventions
established in Metacrypt as the baseline for all services.
## Platform Rules
These four rules apply to every Metacircular service:
1. **Data Storage**: All service data goes in `/srv/<service>/` to enable
straightforward migration across systems.
2. **Deployment Architecture**: Services must ship systemd unit files, but are
designed container-first so they can be deployed via the Metacircular
Control Plane (MCP).
3. **Identity Management**: Services must integrate with MCIAS (Metacircular
Identity and Access Service) for user management and access control. MCIAS
defines three role levels: `admin` (full administrative access), `user` (full
non-administrative access), and `guest` (service-dependent restrictions).
4. **API Design**: Services expose both gRPC and REST interfaces, kept in
sync. Web UIs are built with htmx.
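
Rules 1 and 3 can be illustrated with a minimal Go sketch. It assumes a service that models MCIAS roles as plain strings. The `Role` type, its helper methods, and `DataDir` are hypothetical names and are not an MCIAS client API. The sketch also assumes that `admin` includes everything `user` can do.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Role models the three MCIAS role levels. The type and method names are
// hypothetical; they are not part of any MCIAS client library.
type Role string

const (
	RoleAdmin Role = "admin" // full administrative access
	RoleUser  Role = "user"  // full non-administrative access
	RoleGuest Role = "guest" // restrictions decided by each service
)

// IsAdmin reports whether r may perform administrative operations.
func (r Role) IsAdmin() bool { return r == RoleAdmin }

// HasFullAccess reports whether r has full non-administrative access.
// Assumption: admin includes everything user can do. Guests get false,
// leaving each service to grant them whatever it chooses.
func (r Role) HasFullAccess() bool { return r == RoleAdmin || r == RoleUser }

// DataDir returns the storage root required by rule 1: /srv/<service>.
func DataDir(service string) string {
	return filepath.Join("/srv", service)
}

func main() {
	fmt.Println(DataDir("metacrypt"))
	for _, r := range []Role{RoleAdmin, RoleUser, RoleGuest} {
		fmt.Printf("%-5s admin=%t full=%t\n", r, r.IsAdmin(), r.HasFullAccess())
	}
}
```

Keeping guest handling out of `HasFullAccess` reflects rule 3: each service decides what a guest may do.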
## Table of Contents
0. [Platform Rules](#platform-rules)
1. [Repository Layout](#repository-layout)
2. [Language & Toolchain](#language--toolchain)
3. [Build System](#build-system)
@@ -559,10 +578,35 @@ Services handle `SIGINT` and `SIGTERM`, shutting down cleanly:
| File | Purpose | Audience |
|------|---------|----------|
| `README.md` | Project overview, quick-start, and contributor guide | Everyone |
| `CLAUDE.md` | AI-assisted development context | Claude Code |
| `ARCHITECTURE.md` | Full system specification | Engineers |
| `RUNBOOK.md` | Operational procedures and incident response | Operators |
| `deploy/examples/<service>.toml` | Example configuration | Operators |
### Suggested Files
These are not required for every project but should be created where applicable:
| File | When to Include | Purpose |
|------|-----------------|---------|
| `AUDIT.md` | Services handling cryptography, secrets, PII, or auth | Security audit findings with issue tracking and resolution status |
| `POLICY.md` | Services with fine-grained access control | Policy engine documentation: rule structure, evaluation algorithm, resource paths, action classification, common patterns |
### README.md
The README is the front door. A new engineer or user should be able to
understand what the service does and get it running from this file alone.
It should contain:
- Project name and one-paragraph description.
- Quick-start instructions (build, configure, run).
- Link to `ARCHITECTURE.md` for full technical details.
- Link to `RUNBOOK.md` for operational procedures.
- License and contribution notes (if applicable).

Keep it concise. The README is not the spec; `ARCHITECTURE.md` is.
### CLAUDE.md
This file provides context for AI-assisted development. It should contain:
@@ -596,6 +640,56 @@ This is the canonical specification for the service. It should cover:
This document is the source of truth. When the code and the spec disagree,
one of them has a bug.
### RUNBOOK.md
The runbook is written for operators, not developers. It covers what to do
when things go wrong and how to perform routine maintenance. It should
contain:
1. **Service overview** — what the service does, in one paragraph.
2. **Health checks** — how to verify the service is healthy (endpoints,
CLI commands, expected responses).
3. **Common operations** — start, stop, restart, seal/unseal, backup,
restore, log inspection.
4. **Alerting** — what alerts exist, what they mean, and how to respond.
5. **Incident procedures** — step-by-step playbooks for known failure
modes (database corruption, certificate expiry, MCIAS outage, disk
full, etc.).
6. **Escalation** — when and how to escalate beyond the runbook.

Write runbook entries as numbered steps, not prose. An operator at 3 AM
should be able to follow them without thinking.
### AUDIT.md (Suggested)
For services that handle cryptography, secrets, PII, or authentication,
maintain a security audit log. Each finding gets a numbered entry with:
- Description of the issue.
- Severity (critical, high, medium, low).
- Resolution status: open, resolved (with summary), or accepted (with
rationale for accepting the risk).

A priority summary table at the bottom of the file provides a scannable
overview. Resolved and accepted items are struck through but retained for history.
See Metacrypt's `AUDIT.md` for the reference format.
### POLICY.md (Suggested)
For services with a policy engine or fine-grained access control, document
the policy model separately from the architecture spec. It should cover:
- Rule structure (fields, types, semantics).
- Evaluation algorithm (match logic, priority, default effect).
- Resource path conventions and glob patterns.
- Action classification.
- API endpoints for policy CRUD.
- Common policy patterns with examples.
- Role summary (what each MCIAS role gets by default).

This document is aimed at administrators who need to write policy rules,
not engineers who need to understand the implementation.
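
The evaluation algorithm can be made concrete with a Go sketch of one possible policy model. Each rule carries a priority, an effect, and optional role, resource-glob, and action filters. The highest-priority matching rule wins, deny wins ties, and the default effect is deny. All type names, fields, and resource paths here are hypothetical and are not Metacrypt's actual policy engine. Glob matching uses Go's standard `path.Match`.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// Effect is the outcome of a policy decision.
type Effect string

const (
	Allow Effect = "allow"
	Deny  Effect = "deny"
)

// Rule is a hypothetical rule structure. An empty filter slice matches anything.
type Rule struct {
	Priority  int      // higher values are evaluated first
	Effect    Effect   // applied when every filter matches
	Roles     []string // MCIAS roles, e.g. "admin", "user", "guest"
	Resources []string // resource path globs in path.Match syntax
	Actions   []string // action names, e.g. "read", "write"
}

// matches reports whether value matches any entry in filter. When glob is
// true it uses path.Match; malformed patterns never match.
func matches(filter []string, value string, glob bool) bool {
	if len(filter) == 0 {
		return true
	}
	for _, f := range filter {
		if glob {
			if ok, err := path.Match(f, value); err == nil && ok {
				return true
			}
		} else if f == value {
			return true
		}
	}
	return false
}

// Evaluate returns the effect of the highest-priority matching rule.
// Deny wins ties at equal priority and is the default when nothing matches.
func Evaluate(rules []Rule, role, resource, action string) Effect {
	ordered := append([]Rule(nil), rules...)
	sort.SliceStable(ordered, func(i, j int) bool {
		if ordered[i].Priority != ordered[j].Priority {
			return ordered[i].Priority > ordered[j].Priority
		}
		return ordered[i].Effect == Deny && ordered[j].Effect != Deny
	})
	for _, r := range ordered {
		if matches(r.Roles, role, false) &&
			matches(r.Resources, resource, true) &&
			matches(r.Actions, action, false) {
			return r.Effect
		}
	}
	return Deny
}

func main() {
	// Hypothetical rules and resource paths, for illustration only.
	rules := []Rule{
		{Priority: 100, Effect: Allow, Roles: []string{"admin"}},
		{Priority: 60, Effect: Deny, Resources: []string{"transit/root"}},
		{Priority: 50, Effect: Allow, Roles: []string{"user"},
			Resources: []string{"transit/*"}, Actions: []string{"read"}},
	}
	fmt.Println(Evaluate(rules, "user", "transit/keys", "read"))
	fmt.Println(Evaluate(rules, "user", "transit/root", "read"))
	fmt.Println(Evaluate(rules, "guest", "transit/keys", "read"))
	fmt.Println(Evaluate(rules, "admin", "transit/root", "write"))
}
```

In `path.Match`, `*` does not cross `/`, so `transit/*` does not cover `transit/keys/foo`. A real engine that needs recursive matching would have to add it explicitly, and `POLICY.md` should state whichever glob semantics the service actually uses.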
### Engine/Feature Design Documents
For services with a modular architecture, each module gets its own design