metacrypt/docs/engineering-standards.md
Kyle Isom bbe382dc10 Migrate module path from kyle/ to mc/ org
All import paths updated to git.wntrmute.dev/mc/. Bumps mcdsl to v1.2.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 02:05:59 -07:00

Metacircular Dynamics — Engineering Standards

Source: https://metacircular.net/roam/20260314210051-metacircular_dynamics.html

This document describes the standard repository layout, tooling, and software development lifecycle (SDLC) for services built at Metacircular Dynamics. It incorporates the platform-wide project guidelines and codifies the conventions established in Metacrypt as the baseline for all services.

Platform Rules

These four rules apply to every Metacircular service:

  1. Data Storage: All service data goes in /srv/<service>/ to enable straightforward migration across systems.
  2. Deployment Architecture: Services require systemd unit files but prioritize container-first design to support deployment via the Metacircular Control Plane (MCP).
  3. Identity Management: Services must integrate with MCIAS (Metacircular Identity and Access Service) for user management and access control. Three role levels: admin (full administrative access), user (full non-administrative access), guest (service-dependent restrictions).
  4. API Design: Services expose both gRPC and REST interfaces, kept in sync. Web UIs are built with htmx.

Table of Contents

  1. Platform Rules
  2. Repository Layout
  3. Language & Toolchain
  4. Build System
  5. API Design
  6. Authentication & Authorization
  7. Database Conventions
  8. Configuration
  9. Web UI
  10. Testing
  11. Linting & Static Analysis
  12. Deployment
  13. Documentation
  14. Security
  15. Development Workflow

Repository Layout

Every service follows a consistent directory structure. Adjust the service-specific directories (e.g. engines/ in Metacrypt) as appropriate, but the top-level skeleton is fixed.

.
├── cmd/
│   ├── <service>/              CLI entry point (server, subcommands)
│   └── <service>-web/          Web UI entry point (if separate binary)
├── internal/
│   ├── auth/                   MCIAS integration (token validation, caching)
│   ├── config/                 TOML configuration loading & validation
│   ├── db/                     Database setup, schema migrations
│   ├── server/                 REST API server, routes, middleware
│   ├── grpcserver/             gRPC server, interceptors, service handlers
│   ├── webserver/              Web UI server, template routes, HTMX handlers
│   └── <domain>/               Service-specific packages
├── proto/<service>/
│   ├── v1/                     Legacy proto definitions (if applicable)
│   └── v2/                     Current proto definitions
├── gen/<service>/
│   ├── v1/                     Generated Go gRPC/protobuf code
│   └── v2/
├── web/
│   ├── embed.go                //go:embed directive for templates and static
│   ├── templates/              Go HTML templates
│   └── static/                 CSS, JS (htmx)
├── deploy/
│   ├── docker/                 Docker Compose configuration
│   ├── examples/               Example config files
│   ├── scripts/                Install, backup, migration scripts
│   └── systemd/                systemd unit files and timers
├── docs/                       Internal engineering documentation
├── Dockerfile.api              API server container (if split binary)
├── Dockerfile.web              Web UI container (if split binary)
├── Makefile
├── buf.yaml                    Protobuf linting & breaking-change config
├── .golangci.yaml              Linter configuration
├── .gitignore
├── CLAUDE.md                   AI-assisted development instructions
├── ARCHITECTURE.md             Full system specification
└── <service>.toml.example      Example configuration

Key Principles

  • cmd/ contains only CLI wiring (cobra commands, flag parsing). No business logic.
  • internal/ contains all service logic. Nothing in internal/ is importable by other modules; the Go toolchain enforces this for any directory named internal.
  • proto/ is the source of truth for gRPC definitions. Generated code lives in gen/, never edited by hand.
  • deploy/ contains everything needed to run the service in production. A new engineer should be able to deploy from this directory alone.
  • web/ is embedded into the binary via //go:embed. No external file dependencies at runtime.

What Does Not Belong in the Repository

  • Runtime data (databases, certificates, logs) — these live in /srv/<service>
  • Real configuration files with secrets — only examples are committed
  • IDE configuration (.idea/, .vscode/) — per-developer, not shared
  • Vendored dependencies — Go module proxy handles this

Language & Toolchain

| Tool | Version | Purpose |
|------|---------|---------|
| Go | 1.25+ | Primary language |
| protoc + protoc-gen-go | Latest | Protobuf/gRPC code generation |
| buf | Latest | Proto linting and breaking-change detection |
| golangci-lint | v2 | Static analysis and linting |
| Docker | Latest | Container builds |

Go Conventions

  • Pure-Go dependencies where possible. Avoid CGo — it complicates cross-compilation and container builds. Use modernc.org/sqlite instead of mattn/go-sqlite3.
  • CGO_ENABLED=0 for all production builds. Statically linked binaries deploy cleanly to Alpine containers.
  • Stripped binaries: Build with -trimpath -ldflags="-s -w" to remove debug symbols and reduce image size.
  • Version injection: Pass git describe --tags --always --dirty via -X main.version=... at build time. Every binary must report its version.
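
The version-injection convention can be sketched as follows. This is a minimal illustration, assuming the variable is named version in package main (to match -X main.version=...); the "dev" fallback and the versionLine helper are hypothetical and not taken from Metacrypt.

```go
package main

import "fmt"

// version is overwritten at link time, for example:
//
//	go build -ldflags="-X main.version=$(git describe --tags --always --dirty)"
//
// The "dev" default (an assumption) covers plain `go build` and `go run`.
var version = "dev"

// versionLine formats the string a version subcommand would print.
// The helper name is hypothetical.
func versionLine(name string) string {
	return fmt.Sprintf("%s %s", name, version)
}

func main() {
	fmt.Println(versionLine("myservice"))
}
```

The -X flag only sets package-level string variables, so version must stay a plain string rather than a constant or a struct field.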

Module Path

Services hosted on git.wntrmute.dev use:

git.wntrmute.dev/mc/<service>

Build System

Every repository has a Makefile with these standard targets:

.PHONY: build test vet lint proto proto-lint clean docker all <service>

LDFLAGS := -trimpath -ldflags="-s -w -X main.version=$(shell git describe --tags --always --dirty)"

<service>:
	go build $(LDFLAGS) -o <service> ./cmd/<service>

build:
	go build ./...

test:
	go test ./...

vet:
	go vet ./...

lint:
	golangci-lint run ./...

proto:
	protoc --go_out=. --go_opt=module=<module> \
		--go-grpc_out=. --go-grpc_opt=module=<module> \
		proto/<service>/v2/*.proto

proto-lint:
	buf lint
	buf breaking --against '.git#branch=master,subdir=proto'

clean:
	rm -f <service>

docker:
	docker build -t <service> -f Dockerfile.api .

all: vet lint test <service>

Target Semantics

| Target | When to Run | CI Gate? |
|--------|-------------|----------|
| vet | Every change | Yes |
| lint | Every change | Yes |
| test | Every change | Yes |
| proto-lint | Any proto change | Yes |
| proto | After editing .proto files | No (manual) |
| all | Pre-push verification | Yes |

The all target is the CI pipeline: vet → lint → test → build. If any step fails, the pipeline stops.


API Design

Services expose two synchronized API surfaces:

gRPC (Primary)

  • Proto definitions live in proto/<service>/v2/.
  • Use strongly-typed, per-operation RPCs. Avoid generic "execute" patterns.
  • Use google.protobuf.Timestamp for all time fields (not RFC 3339 strings).
  • Run buf lint and buf breaking against master before merging proto changes.

REST (Secondary)

  • JSON over HTTPS. Routes live in internal/server/routes.go.
  • Use chi for routing (lightweight, stdlib-compatible).
  • Standard error format: {"error": "description"}.
  • Standard HTTP status codes: 401 (unauthenticated), 403 (unauthorized), 412 (precondition failed), 503 (service unavailable).

API Sync Rule

Every REST endpoint must have a corresponding gRPC RPC, and vice versa. When adding, removing, or changing an endpoint in either surface, the other must be updated in the same change. This is enforced in code review.

gRPC Interceptors

Access control is enforced via interceptor maps, not per-handler checks:

| Map | Effect |
|-----|--------|
| sealRequiredMethods | Returns UNAVAILABLE if the service is sealed/locked |
| authRequiredMethods | Validates MCIAS bearer token, populates caller info |
| adminRequiredMethods | Requires admin role on the caller |

Adding a new RPC means adding it to the correct interceptor maps. Forgetting this is a security defect.


Authentication & Authorization

Authentication

All services delegate authentication to MCIAS (Metacircular Identity and Access Service). No service maintains its own user database.

  • Client sends credentials to the service's /v1/auth/login endpoint.
  • The service forwards them to MCIAS via the client library (git.wntrmute.dev/mc/mcias/clients/go).
  • On success, MCIAS returns a bearer token. The service returns it to the client and optionally sets it as a cookie for the web UI.
  • Subsequent requests include the token via Authorization: Bearer <token> header or cookie.
  • Token validation calls MCIAS ValidateToken(). Results should be cached (keyed by SHA-256 of the token) with a short TTL (30 seconds or less).

Authorization

Three role levels:

| Role | Meaning |
|------|---------|
| admin | Full access to everything. Policy bypass. |
| user | Access governed by policy rules. Default deny. |
| guest | Service-dependent restrictions. Default deny. |

Admin detection is based solely on the MCIAS admin role. The service never promotes users locally.

Services that need fine-grained access control should implement a policy engine (priority-based ACL rules stored in encrypted storage, default deny, admin bypass). See Metacrypt's implementation as the reference.
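
The evaluation order described above (admin bypass, then rules by priority, then default deny) can be sketched as follows. This is not Metacrypt's implementation: the rule fields, the "lower number wins" priority convention, and the use of path.Match for resource globs are all assumptions made for the example, and encrypted rule storage is omitted.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

type effect int

const (
	deny effect = iota
	allow
)

// rule is a hypothetical ACL entry.
type rule struct {
	Priority int    // lower values are evaluated first (assumption)
	Effect   effect // allow or deny
	Role     string // MCIAS role the rule applies to
	Resource string // glob pattern, matched with path.Match
	Action   string // e.g. "read" or "write"
}

type request struct {
	Role     string
	Resource string
	Action   string
}

// evaluate returns the effect of the first matching rule in priority order.
func evaluate(rules []rule, req request) effect {
	if req.Role == "admin" {
		return allow // admin bypass
	}
	sorted := append([]rule(nil), rules...) // do not reorder the caller's slice
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Priority < sorted[j].Priority
	})
	for _, r := range sorted {
		if r.Role != req.Role || r.Action != req.Action {
			continue
		}
		// A malformed pattern is treated as a non-match (fail closed).
		if ok, err := path.Match(r.Resource, req.Resource); err != nil || !ok {
			continue
		}
		return r.Effect
	}
	return deny // default deny
}

func main() {
	rules := []rule{
		{Priority: 20, Effect: allow, Role: "user", Resource: "secrets/prod/*", Action: "read"},
		{Priority: 10, Effect: deny, Role: "user", Resource: "secrets/prod/root", Action: "read"},
	}
	fmt.Println(evaluate(rules, request{"user", "secrets/prod/db", "read"}) == allow)
	fmt.Println(evaluate(rules, request{"user", "secrets/prod/root", "read"}) == allow)
}
```

Because path.Match does not let * cross a slash, a pattern such as secrets/prod/* covers only direct children; a real engine would need to document whichever glob semantics it chooses.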


Database Conventions

SQLite

SQLite is the default database for Metacircular services. It is simple to operate, requires no external processes, and backs up cleanly with VACUUM INTO.

Connection settings (applied at open time):

PRAGMA journal_mode = WAL;
PRAGMA foreign_keys = ON;
PRAGMA busy_timeout = 5000;

File permissions: 0600. Created by the service on first run.

Migrations

  • Migrations are Go functions registered in internal/db/ and run sequentially at startup.
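  • Each migration is idempotent: use CREATE TABLE IF NOT EXISTS, and guard column additions with an existence check (for example via PRAGMA table_info), because SQLite's ALTER TABLE ... ADD COLUMN has no IF NOT EXISTS clause.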
  • Applied migrations are tracked in a schema_migrations table.
  • Never modify a migration that has been deployed. Add a new one.

Backup

Every service must provide a snapshot CLI command that creates a consistent backup using VACUUM INTO. Automated backups run via a systemd timer (daily, with retention pruning).


Configuration

Format

TOML. Parsed with go-toml/v2. Environment variable overrides via SERVICENAME_* (e.g. METACRYPT_SERVER_LISTEN_ADDR).

Standard Sections

[server]
listen_addr = ":8443"           # HTTPS API
grpc_addr   = ":9443"           # gRPC (optional; disabled if unset)
tls_cert    = "/srv/<service>/certs/cert.pem"
tls_key     = "/srv/<service>/certs/key.pem"

[web]
listen_addr  = "127.0.0.1:8080" # Web UI (optional; disabled if unset)
vault_grpc   = "127.0.0.1:9443" # gRPC address of the API server
vault_ca_cert = ""               # CA cert for verifying API server TLS

[database]
path = "/srv/<service>/<service>.db"

[mcias]
server_url = "https://mcias.metacircular.net:8443"
ca_cert    = ""                  # Custom CA for MCIAS TLS

[log]
level = "info"                   # debug, info, warn, error

Validation

Required fields are validated at startup. The service refuses to start if any are missing. Do not silently default required values.
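
A startup check of this kind might be sketched as below. The struct and the choice of which fields count as required are assumptions for the example (the TLS paths are stated as required elsewhere in this document); errors.Join is used so that every missing field is reported at once.

```go
package main

import (
	"errors"
	"fmt"
)

// config is a trimmed, hypothetical view of the standard sections.
type config struct {
	ListenAddr string // [server] listen_addr
	TLSCert    string // [server] tls_cert
	TLSKey     string // [server] tls_key
	DBPath     string // [database] path
	MCIASURL   string // [mcias] server_url
}

// validate returns an error naming every missing required field,
// or nil when the configuration is complete.
func (c config) validate() error {
	var errs []error
	require := func(name, value string) {
		if value == "" {
			errs = append(errs, fmt.Errorf("config: %s is required", name))
		}
	}
	require("server.listen_addr", c.ListenAddr)
	require("server.tls_cert", c.TLSCert)
	require("server.tls_key", c.TLSKey)
	require("database.path", c.DBPath)
	require("mcias.server_url", c.MCIASURL)
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	cfg := config{ListenAddr: ":8443"}
	if err := cfg.validate(); err != nil {
		fmt.Println(err)
	}
}
```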

Data Directory

All runtime data lives in /srv/<service>/:

/srv/<service>/
├── <service>.toml        Configuration
├── <service>.db          SQLite database
├── certs/                TLS certificates
└── backups/              Database snapshots

This convention enables straightforward service migration between hosts: copy /srv/<service>/ and the binary.


Web UI

Technology

  • Go html/template for server-side rendering. No JavaScript frameworks.
  • htmx for dynamic interactions (form submission, partial page updates) without full page reloads.
  • Templates and static files are embedded in the binary via //go:embed.

Structure

  • web/templates/layout.html — shared HTML skeleton, navigation, CSS/JS includes. All page templates extend this.
  • Page templates: one .html file per page/feature.
  • web/static/ — CSS, htmx. Keep this minimal.

Architecture

The web UI runs as a separate binary (<service>-web) that communicates with the API server via its gRPC interface. This separation means:

  • The web UI has no direct database access.
  • The API server enforces all authorization.
  • The web UI can be deployed independently or omitted entirely.

Security

  • CSRF protection via signed double-submit cookies on all mutating requests (POST/PUT/PATCH/DELETE).
  • Session cookie: HttpOnly, Secure, SameSite=Strict.
  • All user input is escaped by html/template (the default).
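
A signed double-submit token can be sketched with the standard library alone. In this illustration the token is a random value plus its HMAC-SHA256 signature; the server sets it as a cookie and also embeds it in the form or a request header, then checks on each mutating request that both copies match and that the signature verifies. The function names and token layout are assumptions, not Metacrypt's format, and binding the token to a session is left out.

```go
package main

import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"fmt"
	"strings"
)

var b64 = base64.RawURLEncoding

// sign returns the HMAC-SHA256 of msg under key.
func sign(key, msg []byte) []byte {
	m := hmac.New(sha256.New, key)
	m.Write(msg)
	return m.Sum(nil)
}

// newCSRFToken returns "<random>.<signature>", both base64url-encoded.
func newCSRFToken(key []byte) (string, error) {
	nonce := make([]byte, 32)
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	return b64.EncodeToString(nonce) + "." + b64.EncodeToString(sign(key, nonce)), nil
}

// validCSRF checks that the cookie and the submitted value are identical
// and that the token was signed with the server's key.
func validCSRF(key []byte, cookieValue, submitted string) bool {
	if cookieValue == "" ||
		subtle.ConstantTimeCompare([]byte(cookieValue), []byte(submitted)) != 1 {
		return false
	}
	noncePart, sigPart, ok := strings.Cut(cookieValue, ".")
	if !ok {
		return false
	}
	nonce, err1 := b64.DecodeString(noncePart)
	sig, err2 := b64.DecodeString(sigPart)
	if err1 != nil || err2 != nil {
		return false
	}
	return hmac.Equal(sig, sign(key, nonce))
}

func main() {
	key := []byte("example-only-key-load-a-real-one")
	token, err := newCSRFToken(key)
	if err != nil {
		fmt.Println("token:", err)
		return
	}
	fmt.Println(validCSRF(key, token, token))
	fmt.Println(validCSRF(key, token, "forged"))
}
```

The signature is what distinguishes this from a plain double-submit cookie: an attacker who can plant a cookie on the victim's browser still cannot mint a token that verifies.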

Testing

Philosophy

Tests are written using the Go standard library testing package. No test frameworks (testify, gomega, etc.) — the standard library is sufficient and keeps dependencies minimal.

Patterns

func TestFeatureName(t *testing.T) {
    // Setup: use t.TempDir() for isolated file system state.
    dir := t.TempDir()
    database, err := db.Open(filepath.Join(dir, "test.db"))
    if err != nil {
        t.Fatalf("open db: %v", err)
    }
    defer func() { _ = database.Close() }()
    if err := db.Migrate(database); err != nil {
        t.Fatalf("migrate db: %v", err)
    }

    // Exercise the code under test.
    // ...

    // Assert with t.Fatal (not t.Error) for precondition failures.
    if !bytes.Equal(got, want) {
        t.Fatalf("got %q, want %q", got, want)
    }
}

Guidelines

  • Use t.TempDir() for all file-system state. Never write to fixed paths. Cleanup is automatic.
  • Use errors.Is for error assertions, not string comparison.
  • No mocks for databases. Tests use real SQLite databases created in temp directories. This catches migration bugs that mocks would hide.
  • Test files live alongside the code they test: barrier.go and barrier_test.go in the same package.
  • Test helpers call t.Helper() so failures report the caller's line.

What to Test

| Layer | Test Strategy |
|-------|---------------|
| Crypto primitives | Roundtrip encryption/decryption, wrong-key rejection, edge cases |
| Storage (barrier, DB) | CRUD operations, sealed-state rejection, concurrent access |
| API handlers | Request/response correctness, auth enforcement, error codes |
| Policy engine | Rule matching, priority ordering, default deny, admin bypass |
| CLI commands | Flag parsing, output format (lightweight) |

Linting & Static Analysis

Configuration

Every repository includes a .golangci.yaml with this philosophy: fail loudly for security and correctness; everything else is a warning.

Required Linters

| Linter | Category | Purpose |
|--------|----------|---------|
| errcheck | Correctness | Unhandled errors are silent failures |
| govet | Correctness | Printf mismatches, unreachable code, suspicious constructs |
| ineffassign | Correctness | Dead writes hide logic bugs |
| unused | Correctness | Unused variables and functions |
| errorlint | Error handling | Proper errors.Is/errors.As usage |
| gosec | Security | Hardcoded secrets, weak RNG, insecure crypto, SQL injection |
| staticcheck | Security | Deprecated APIs, mutex misuse, deep analysis |
| revive | Style | Go naming conventions, error return ordering |
| gofmt | Formatting | Standard Go formatting (configured as a formatter in golangci-lint v2) |
| goimports | Formatting | Import grouping and ordering (configured as a formatter in golangci-lint v2) |

Settings

  • errcheck: check-type-assertions: true (catch x.(*T) without ok check).
  • govet: all analyzers enabled except shadow (too noisy for idiomatic Go).
  • gosec: severity and confidence set to medium. Exclude G104 (overlaps with errcheck).
  • max-issues-per-linter: 0 — report everything. No caps.
  • Test files: allow G101 (hardcoded credentials) for test fixtures.

Deployment

Container-First

Services are designed for container deployment but must also run as native systemd services. Both paths are first-class.

Docker

Multi-stage builds:

  1. Builder: golang:1.25-alpine (matching the Go 1.25+ toolchain requirement). Compile with CGO_ENABLED=0, strip symbols.
  2. Runtime: alpine:3.21. Non-root user (<service>), minimal attack surface.

If the service has separate API and web binaries, use separate Dockerfiles (Dockerfile.api, Dockerfile.web) and a docker-compose.yml that wires them together with a shared data volume.

systemd

Every service ships with:

| File | Purpose |
|------|---------|
| <service>.service | Main service unit (API server) |
| <service>-web.service | Web UI unit (if applicable) |
| <service>-backup.service | Oneshot backup unit |
| <service>-backup.timer | Daily backup timer (02:00 UTC, 5-minute jitter) |

Security Hardening

All service units must include these security directives:

NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
PrivateDevices=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictSUIDSGID=true
RestrictNamespaces=true
LockPersonality=true
MemoryDenyWriteExecute=true
RestrictRealtime=true
ReadWritePaths=/srv/<service>

The web UI unit should use ReadOnlyPaths=/srv/<service> instead of ReadWritePaths — it has no reason to write to the data directory.

Install Script

deploy/scripts/install.sh handles:

  1. Create system user/group (idempotent).
  2. Install binary to /usr/local/bin/.
  3. Create /srv/<service>/ directory structure.
  4. Install example config if none exists.
  5. Install systemd units and reload the daemon.

TLS

  • Minimum TLS version: 1.3. No exceptions, no fallback cipher suites. Go's TLS 1.3 implementation manages cipher selection automatically.
  • Timeouts: read 30s, write 30s, idle 120s.
  • Certificate and key paths are required configuration — the service refuses to start without them.

Graceful Shutdown

Services handle SIGINT and SIGTERM, shutting down cleanly:

  1. Stop accepting new connections.
  2. Drain in-flight requests (with a timeout).
  3. Clean up resources (close databases, zeroize secrets if applicable).
  4. Exit.

Documentation

Required Files

| File | Purpose | Audience |
|------|---------|----------|
| README.md | Project overview, quick-start, and contributor guide | Everyone |
| CLAUDE.md | AI-assisted development context | Claude Code |
| ARCHITECTURE.md | Full system specification | Engineers |
| RUNBOOK.md | Operational procedures and incident response | Operators |
| deploy/examples/<service>.toml | Example configuration | Operators |

Suggested Files

These are not required for every project but should be created where applicable:

| File | When to Include | Purpose |
|------|-----------------|---------|
| AUDIT.md | Services handling cryptography, secrets, PII, or auth | Security audit findings with issue tracking and resolution status |
| POLICY.md | Services with fine-grained access control | Policy engine documentation: rule structure, evaluation algorithm, resource paths, action classification, common patterns |

README.md

The README is the front door. A new engineer or user should be able to understand what the service does and get it running from this file alone. It should contain:

  • Project name and one-paragraph description.
  • Quick-start instructions (build, configure, run).
  • Link to ARCHITECTURE.md for full technical details.
  • Link to RUNBOOK.md for operational procedures.
  • License and contribution notes (if applicable).

Keep it concise. The README is not the spec — that's ARCHITECTURE.md.

CLAUDE.md

This file provides context for AI-assisted development. It should contain:

  • Project overview (one paragraph).
  • Build, test, and lint commands.
  • High-level architecture summary.
  • Project structure with directory descriptions.
  • Ignored directories (runtime data, generated code).
  • Critical rules (e.g. API sync requirements).

Keep it concise. AI tools read this on every interaction.

ARCHITECTURE.md

This is the canonical specification for the service. It should cover:

  1. System overview with a layered architecture diagram.
  2. Cryptographic design (if applicable): algorithms, key hierarchy.
  3. State machines and lifecycle (if applicable).
  4. Storage design.
  5. Authentication and authorization model.
  6. API surface (REST and gRPC, with tables of every endpoint).
  7. Web interface routes.
  8. Database schema (every table, every column).
  9. Configuration reference.
  10. Deployment guide.
  11. Security model: threat mitigations table and security invariants.
  12. Future work.

This document is the source of truth. When the code and the spec disagree, one of them has a bug.

RUNBOOK.md

The runbook is written for operators, not developers. It covers what to do when things go wrong and how to perform routine maintenance. It should contain:

  1. Service overview — what the service does, in one paragraph.
  2. Health checks — how to verify the service is healthy (endpoints, CLI commands, expected responses).
  3. Common operations — start, stop, restart, seal/unseal, backup, restore, log inspection.
  4. Alerting — what alerts exist, what they mean, and how to respond.
  5. Incident procedures — step-by-step playbooks for known failure modes (database corruption, certificate expiry, MCIAS outage, disk full, etc.).
  6. Escalation — when and how to escalate beyond the runbook.

Write runbook entries as numbered steps, not prose. An operator at 3 AM should be able to follow them without thinking.

AUDIT.md (Suggested)

For services that handle cryptography, secrets, PII, or authentication, maintain a security audit log. Each finding gets a numbered entry with:

  • Description of the issue.
  • Severity (critical, high, medium, low).
  • Resolution status: open, resolved (with summary), or accepted (with rationale for accepting the risk).

Include a priority summary table at the bottom of the file to give a scannable overview. Strike through resolved and accepted items but retain them for history. See Metacrypt's AUDIT.md for the reference format.

POLICY.md (Suggested)

For services with a policy engine or fine-grained access control, document the policy model separately from the architecture spec. It should cover:

  • Rule structure (fields, types, semantics).
  • Evaluation algorithm (match logic, priority, default effect).
  • Resource path conventions and glob patterns.
  • Action classification.
  • API endpoints for policy CRUD.
  • Common policy patterns with examples.
  • Role summary (what each MCIAS role gets by default).

This document is aimed at administrators who need to write policy rules, not engineers who need to understand the implementation.

Engine/Feature Design Documents

For services with a modular architecture, each module gets its own design document (e.g. engines/sshca.md). These are detailed implementation plans that include:

  • Overview and core concepts.
  • Data model and storage layout.
  • Lifecycle (initialization, teardown).
  • Operations table with auth requirements.
  • API definitions (gRPC and REST).
  • Implementation steps (file-by-file).
  • Security considerations.
  • References to existing code patterns to follow.

Write these before writing code. They are the blueprint, not the afterthought.


Security

General Principles

  • Default deny. Unauthenticated requests are rejected. Unauthorized requests are rejected. If in doubt, deny.
  • Fail closed. If the service cannot verify authorization, it denies the request. If the database is unavailable, the service is unavailable.
  • Least privilege. Service processes run as non-root. systemd units restrict filesystem access, syscalls, and capabilities.
  • No local user databases. Authentication is always delegated to MCIAS.

Cryptographic Standards

| Purpose | Algorithm | Notes |
|---------|-----------|-------|
| Symmetric encryption | AES-256-GCM | 12-byte random nonce per operation |
| Symmetric alternative | XChaCha20-Poly1305 | 24-byte nonce; for contexts where random nonces must stay collision-safe at high message volumes (it is not nonce-misuse resistant) |
| Key derivation | Argon2id | Memory-hard; tune params to hardware |
| Asymmetric signing | Ed25519, ECDSA (P-256, P-384) | Prefer Ed25519 |
| CSPRNG | crypto/rand | All keys, nonces, salts, tokens |
| Constant-time comparison | crypto/subtle | All secret comparisons |

  • Never use RSA for new designs. Ed25519 and ECDSA are faster, produce smaller keys, and have simpler security models.
  • Zeroize secrets from memory when they are no longer needed. Overwrite byte slices with zeros, nil out pointers.
  • Never log secrets. Keys, passwords, tokens, and plaintext must never appear in log output.

Web Security

  • CSRF tokens on all mutating requests.
  • SameSite=Strict on all cookies.
  • html/template for automatic escaping.
  • Validate all input at system boundaries.

Development Workflow

Local Development

# Build and run both servers locally:
make devserver

# Or build everything and run the full pipeline:
make all

The devserver target builds both binaries and runs them against a local config in srv/. The srv/ directory is gitignored — it holds your local database, certificates, and configuration.

Pre-Push Checklist

Before pushing a branch:

make all          # vet → lint → test → build
make proto-lint   # if proto files changed

Proto Changes

  1. Edit .proto files in proto/<service>/v2/.
  2. Run make proto to regenerate Go code.
  3. Run make proto-lint to check for linting violations and breaking changes.
  4. Update REST routes to match the new/changed RPCs.
  5. Update gRPC interceptor maps for any new RPCs.
  6. Update ARCHITECTURE.md API tables.

Adding a New Feature

  1. Design first. Write or update the relevant design document. For a new engine or major subsystem, create a new doc in docs/ or engines/.
  2. Implement. Follow existing patterns — the design doc should reference specific files and line numbers.
  3. Test. Write tests alongside the implementation.
  4. Update docs. Update ARCHITECTURE.md, CLAUDE.md, and route tables.
  5. Verify. Run make all.

CLI Commands

Every service uses cobra for CLI commands. Standard subcommands:

| Command | Purpose |
|---------|---------|
| server | Start the service |
| init | First-time setup (if applicable) |
| status | Query a running instance's health |
| snapshot | Create a database backup |

Add service-specific subcommands as needed (e.g. migrate-aad, unseal). Each command lives in its own file in cmd/<service>/.