# Metacircular Dynamics: Engineering Standards

This document describes the standard repository layout, tooling, and software development lifecycle (SDLC) for services built at Metacircular Dynamics. It is derived from the conventions established in Metacrypt and codifies them as the baseline for all new and existing services.

## Table of Contents

1. [Repository Layout](#repository-layout)
2. [Language & Toolchain](#language--toolchain)
3. [Build System](#build-system)
4. [API Design](#api-design)
5. [Authentication & Authorization](#authentication--authorization)
6. [Database Conventions](#database-conventions)
7. [Configuration](#configuration)
8. [Web UI](#web-ui)
9. [Testing](#testing)
10. [Linting & Static Analysis](#linting--static-analysis)
11. [Deployment](#deployment)
12. [Documentation](#documentation)
13. [Security](#security)
14. [Development Workflow](#development-workflow)

---

## Repository Layout

Every service follows a consistent directory structure. Adjust the service-specific directories (e.g. `engines/` in Metacrypt) as appropriate, but the top-level skeleton is fixed.

```
.
├── cmd/
│   ├── /                   CLI entry point (server, subcommands)
│   └── -web/               Web UI entry point (if separate binary)
├── internal/
│   ├── auth/               MCIAS integration (token validation, caching)
│   ├── config/             TOML configuration loading & validation
│   ├── db/                 Database setup, schema migrations
│   ├── server/             REST API server, routes, middleware
│   ├── grpcserver/         gRPC server, interceptors, service handlers
│   ├── webserver/          Web UI server, template routes, HTMX handlers
│   └── /                   Service-specific packages
├── proto//
│   ├── v1/                 Legacy proto definitions (if applicable)
│   └── v2/                 Current proto definitions
├── gen//
│   ├── v1/                 Generated Go gRPC/protobuf code
│   └── v2/
├── web/
│   ├── embed.go            //go:embed directive for templates and static
│   ├── templates/          Go HTML templates
│   └── static/             CSS, JS (htmx)
├── deploy/
│   ├── docker/             Docker Compose configuration
│   ├── examples/           Example config files
│   ├── scripts/            Install, backup, migration scripts
│   └── systemd/            systemd unit files and timers
├── docs/                   Internal engineering documentation
├── Dockerfile.api          API server container (if split binary)
├── Dockerfile.web          Web UI container (if split binary)
├── Makefile
├── buf.yaml                Protobuf linting & breaking-change config
├── .golangci.yaml          Linter configuration
├── .gitignore
├── CLAUDE.md               AI-assisted development instructions
├── ARCHITECTURE.md         Full system specification
└── .toml.example           Example configuration
```

### Key Principles

- **`cmd/`** contains only CLI wiring (cobra commands, flag parsing). No business logic.
- **`internal/`** contains all service logic. Nothing in `internal/` is importable by other modules; the Go toolchain enforces this.
- **`proto/`** is the source of truth for gRPC definitions. Generated code lives in `gen/` and is never edited by hand.
- **`deploy/`** contains everything needed to run the service in production. A new engineer should be able to deploy from this directory alone.
- **`web/`** is embedded into the binary via `//go:embed`. No external file dependencies at runtime.

### What Does Not Belong in the Repository

- Runtime data (databases, certificates, logs): these live in `/srv/`
- Real configuration files with secrets: only examples are committed
- IDE configuration (`.idea/`, `.vscode/`): per-developer, not shared
- Vendored dependencies: the Go module proxy handles this

---

## Language & Toolchain

| Tool | Version | Purpose |
|------|---------|---------|
| Go | 1.25+ | Primary language |
| protoc + protoc-gen-go | Latest | Protobuf/gRPC code generation |
| buf | Latest | Proto linting and breaking-change detection |
| golangci-lint | v2 | Static analysis and linting |
| Docker | Latest | Container builds |

### Go Conventions

- **Pure-Go dependencies** where possible. Avoid CGo, which complicates cross-compilation and container builds. Use `modernc.org/sqlite` instead of `mattn/go-sqlite3`.
- **`CGO_ENABLED=0`** for all production builds. Statically linked binaries deploy cleanly to Alpine containers.
- **Stripped binaries**: Build with `-trimpath -ldflags="-s -w"` to remove debug symbols and reduce image size.
- **Version injection**: Pass `git describe --tags --always --dirty` via `-X main.version=...` at build time. Every binary must report its version.

### Module Path

Services hosted on `git.wntrmute.dev` use:

```
git.wntrmute.dev/kyle/
```

---

## Build System

Every repository has a Makefile with these standard targets:

```makefile
.PHONY: build test vet lint proto proto-lint clean docker all

LDFLAGS := -trimpath -ldflags="-s -w -X main.version=$(shell git describe --tags --always --dirty)"

:
	go build $(LDFLAGS) -o ./cmd/

build:
	go build ./...

test:
	go test ./...

vet:
	go vet ./...

lint:
	golangci-lint run ./...

proto:
	protoc --go_out=. --go_opt=module= \
		--go-grpc_out=. --go-grpc_opt=module= \
		proto//v2/*.proto

proto-lint:
	buf lint
	buf breaking --against '.git#branch=master,subdir=proto'

clean:
	rm -f

docker:
	docker build -t -f Dockerfile.api .

# build runs last so that all matches the documented vet → lint → test → build pipeline.
all: vet lint test build
```

### Target Semantics

| Target | When to Run | CI Gate? |
|--------|-------------|----------|
| `vet` | Every change | Yes |
| `lint` | Every change | Yes |
| `test` | Every change | Yes |
| `proto-lint` | Any proto change | Yes |
| `proto` | After editing `.proto` files | No (manual) |
| `all` | Pre-push verification | Yes |

The `all` target is the CI pipeline: `vet → lint → test → build`. If any step fails, the pipeline stops.

---

## API Design

Services expose two synchronized API surfaces:

### gRPC (Primary)

- Proto definitions live in `proto//v2/`.
- Use strongly-typed, per-operation RPCs. Avoid generic "execute" patterns.
- Use `google.protobuf.Timestamp` for all time fields (not RFC 3339 strings).
- Run `buf lint` and `buf breaking` against master before merging proto changes.

### REST (Secondary)

- JSON over HTTPS. Routes live in `internal/server/routes.go`.
- Use `chi` for routing (lightweight, stdlib-compatible).
- Standard error format: `{"error": "description"}`.
- Standard HTTP status codes: `401` (unauthenticated), `403` (unauthorized), `412` (precondition failed), `503` (service unavailable).

### API Sync Rule

**Every REST endpoint must have a corresponding gRPC RPC, and vice versa.** When adding, removing, or changing an endpoint in either surface, the other must be updated in the same change. This is enforced in code review.

### gRPC Interceptors

Access control is enforced via interceptor maps, not per-handler checks:

| Map | Effect |
|-----|--------|
| `sealRequiredMethods` | Returns `UNAVAILABLE` if the service is sealed/locked |
| `authRequiredMethods` | Validates MCIAS bearer token, populates caller info |
| `adminRequiredMethods` | Requires admin role on the caller |

Adding a new RPC means adding it to the correct interceptor maps. Forgetting this is a security defect.

---

## Authentication & Authorization

### Authentication

All services delegate authentication to **MCIAS** (Metacircular Identity and Access Service). No service maintains its own user database.

- Client sends credentials to the service's `/v1/auth/login` endpoint.
- The service forwards them to MCIAS via the client library (`git.wntrmute.dev/kyle/mcias/clients/go`).
- On success, MCIAS returns a bearer token. The service returns it to the client and optionally sets it as a cookie for the web UI.
- Subsequent requests include the token via `Authorization: Bearer ` header or cookie.
- Token validation calls MCIAS `ValidateToken()`. Results should be cached (keyed by SHA-256 of the token) with a short TTL (30 seconds or less).

### Authorization

Three role levels:

| Role | Meaning |
|------|---------|
| `admin` | Full access to everything. Policy bypass. |
| `user` | Access governed by policy rules. Default deny. |
| `guest` | Service-dependent restrictions. Default deny. |

Admin detection is based solely on the MCIAS `admin` role. The service never promotes users locally.

Services that need fine-grained access control should implement a policy engine (priority-based ACL rules stored in encrypted storage, default deny, admin bypass). See Metacrypt's implementation as the reference.

---

## Database Conventions

### SQLite

SQLite is the default database for Metacircular services. It is simple to operate, requires no external processes, and backs up cleanly with `VACUUM INTO`.

Connection settings (applied at open time):

```sql
PRAGMA journal_mode = WAL;
PRAGMA foreign_keys = ON;
PRAGMA busy_timeout = 5000;
```

File permissions: `0600`. Created by the service on first run.

### Migrations

- Migrations are Go functions registered in `internal/db/` and run sequentially at startup.
- Each migration is idempotent. Use `CREATE TABLE IF NOT EXISTS` for tables. SQLite does not support `ALTER TABLE ... ADD COLUMN IF NOT EXISTS`, so guard column additions by checking `PRAGMA table_info` first.
- Applied migrations are tracked in a `schema_migrations` table.
- Never modify a migration that has been deployed. Add a new one.

### Backup

Every service must provide a `snapshot` CLI command that creates a consistent backup using `VACUUM INTO`.
Automated backups run via a systemd timer (daily, with retention pruning).

---

## Configuration

### Format

TOML. Parsed with `go-toml/v2`. Environment variable overrides via `SERVICENAME_*` (e.g. `METACRYPT_SERVER_LISTEN_ADDR`).

### Standard Sections

```toml
[server]
listen_addr = ":8443"           # HTTPS API
grpc_addr   = ":9443"           # gRPC (optional; disabled if unset)
tls_cert    = "/srv//certs/cert.pem"
tls_key     = "/srv//certs/key.pem"

[web]
listen_addr   = "127.0.0.1:8080"   # Web UI (optional; disabled if unset)
vault_grpc    = "127.0.0.1:9443"   # gRPC address of the API server
vault_ca_cert = ""                 # CA cert for verifying API server TLS

[database]
path = "/srv//.db"

[mcias]
server_url = "https://mcias.metacircular.net:8443"
ca_cert    = ""                    # Custom CA for MCIAS TLS

[log]
level = "info"                     # debug, info, warn, error
```

### Validation

Required fields are validated at startup. The service refuses to start if any are missing. Do not silently default required values.

### Data Directory

All runtime data lives in `/srv//`:

```
/srv//
├── .toml       Configuration
├── .db         SQLite database
├── certs/      TLS certificates
└── backups/    Database snapshots
```

This convention enables straightforward service migration between hosts: copy `/srv//` and the binary.

---

## Web UI

### Technology

- **Go `html/template`** for server-side rendering. No JavaScript frameworks.
- **htmx** for dynamic interactions (form submission, partial page updates) without full page reloads.
- Templates and static files are embedded in the binary via `//go:embed`.

### Structure

- `web/templates/layout.html`: shared HTML skeleton, navigation, CSS/JS includes. All page templates extend this.
- Page templates: one `.html` file per page/feature.
- `web/static/`: CSS, htmx. Keep this minimal.

### Architecture

The web UI runs as a separate binary (`-web`) that communicates with the API server via its gRPC interface. This separation means:

- The web UI has no direct database access.
- The API server enforces all authorization.
- The web UI can be deployed independently or omitted entirely.

### Security

- CSRF protection via signed double-submit cookies on all mutating requests (POST/PUT/PATCH/DELETE).
- Session cookie: `HttpOnly`, `Secure`, `SameSite=Strict`.
- All user input is escaped by `html/template` (the default).

---

## Testing

### Philosophy

Tests are written using the Go standard library `testing` package. No test frameworks (testify, gomega, etc.); the standard library is sufficient and keeps dependencies minimal.

### Patterns

```go
func TestFeatureName(t *testing.T) {
	// Setup: use t.TempDir() for isolated file system state.
	dir := t.TempDir()
	database, err := db.Open(filepath.Join(dir, "test.db"))
	if err != nil {
		t.Fatalf("open db: %v", err)
	}
	defer func() { _ = database.Close() }()

	// Assumes db.Migrate returns an error; checking it keeps errcheck quiet
	// and stops the test early if the schema could not be applied.
	if err := db.Migrate(database); err != nil {
		t.Fatalf("migrate: %v", err)
	}

	// Exercise the code under test.
	// ...

	// Assert with t.Fatal (not t.Error) for precondition failures.
	if !bytes.Equal(got, want) {
		t.Fatalf("got %q, want %q", got, want)
	}
}
```

### Guidelines

- **Use `t.TempDir()`** for all file-system state. Never write to fixed paths. Cleanup is automatic.
- **Use `errors.Is`** for error assertions, not string comparison.
- **No mocks for databases.** Tests use real SQLite databases created in temp directories. This catches migration bugs that mocks would hide.
- **Test files** live alongside the code they test: `barrier.go` and `barrier_test.go` in the same package.
- **Test helpers** call `t.Helper()` so failures report the caller's line.

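The `errors.Is` and `t.Helper()` guidelines above can be combined in a single assertion helper. The following is a minimal sketch, not code from any Metacircular service: `ErrSealed`, `readSealed`, and `requireErrorIs` are hypothetical names invented for illustration, and everything sits in `package main` only so the sketch compiles on its own. In a real repository the helper and the test would live in a `_test.go` file next to the code under test.

```go
package main

import (
	"errors"
	"fmt"
	"testing"
)

// ErrSealed is a hypothetical sentinel error used only in this sketch.
var ErrSealed = errors.New("sealed")

// readSealed is a hypothetical operation that always fails, wrapping the
// sentinel with %w so callers can still match it with errors.Is.
func readSealed(path string) error {
	return fmt.Errorf("read %q: %w", path, ErrSealed)
}

// requireErrorIs stops the test unless err wraps target.
// t.Helper() makes the failure point at the caller's line, not this one.
func requireErrorIs(t testing.TB, err, target error) {
	t.Helper()
	if !errors.Is(err, target) {
		t.Fatalf("got error %v, want %v", err, target)
	}
}

// TestReadSealed asserts on the error's identity, not on its message text.
func TestReadSealed(t *testing.T) {
	requireErrorIs(t, readSealed("secret/key"), ErrSealed)
}

func main() {
	// Outside of "go test", show that the wrapped error still matches.
	err := readSealed("secret/key")
	fmt.Println(errors.Is(err, ErrSealed))
}
```

Matching on the sentinel instead of the message means the wrapping text can change without breaking the test.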
### What to Test

| Layer | Test Strategy |
|-------|---------------|
| Crypto primitives | Roundtrip encryption/decryption, wrong-key rejection, edge cases |
| Storage (barrier, DB) | CRUD operations, sealed-state rejection, concurrent access |
| API handlers | Request/response correctness, auth enforcement, error codes |
| Policy engine | Rule matching, priority ordering, default deny, admin bypass |
| CLI commands | Flag parsing, output format (lightweight) |

---

## Linting & Static Analysis

### Configuration

Every repository includes a `.golangci.yaml` with this philosophy: **fail loudly for security and correctness; everything else is a warning.**

### Required Linters

| Linter | Category | Purpose |
|--------|----------|---------|
| `errcheck` | Correctness | Unhandled errors are silent failures |
| `govet` | Correctness | Printf mismatches, unreachable code, suspicious constructs |
| `ineffassign` | Correctness | Dead writes hide logic bugs |
| `unused` | Correctness | Unused variables and functions |
| `errorlint` | Error handling | Proper `errors.Is`/`errors.As` usage |
| `gosec` | Security | Hardcoded secrets, weak RNG, insecure crypto, SQL injection |
| `staticcheck` | Security | Deprecated APIs, mutex misuse, deep analysis |
| `revive` | Style | Go naming conventions, error return ordering |
| `gofmt` | Formatting | Standard Go formatting |
| `goimports` | Formatting | Import grouping and ordering |

### Settings

- `errcheck`: `check-type-assertions: true` (catch `x.(*T)` without ok check).
- `govet`: all analyzers enabled except `shadow` (too noisy for idiomatic Go).
- `gosec`: severity and confidence set to `medium`. Exclude `G104` (overlaps with errcheck).
- `gofmt`, `goimports`: golangci-lint v2 treats these as formatters, so enable them under the `formatters` section instead of `linters`.
- `max-issues-per-linter: 0`: report everything. No caps.
- Test files: allow `G101` (hardcoded credentials) for test fixtures.

---

## Deployment

### Container-First

Services are designed for container deployment but must also run as native systemd services. Both paths are first-class.

### Docker

Multi-stage builds:

1. **Builder**: `golang:1.25-alpine` (matching the Go 1.25+ toolchain requirement). Compile with `CGO_ENABLED=0`, strip symbols.
2. **Runtime**: `alpine:3.21`. Non-root user (``), minimal attack surface.

If the service has separate API and web binaries, use separate Dockerfiles (`Dockerfile.api`, `Dockerfile.web`) and a `docker-compose.yml` that wires them together with a shared data volume.

### systemd

Every service ships with:

| File | Purpose |
|------|---------|
| `.service` | Main service unit (API server) |
| `-web.service` | Web UI unit (if applicable) |
| `-backup.service` | Oneshot backup unit |
| `-backup.timer` | Daily backup timer (02:00 UTC, 5-minute jitter) |

#### Security Hardening

All service units must include these security directives:

```ini
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
PrivateDevices=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictSUIDSGID=true
RestrictNamespaces=true
LockPersonality=true
MemoryDenyWriteExecute=true
RestrictRealtime=true
ReadWritePaths=/srv/
```

The web UI unit should use `ReadOnlyPaths=/srv/` instead of `ReadWritePaths`, because it has no reason to write to the data directory.

### Install Script

`deploy/scripts/install.sh` handles:

1. Create system user/group (idempotent).
2. Install binary to `/usr/local/bin/`.
3. Create `/srv//` directory structure.
4. Install example config if none exists.
5. Install systemd units and reload the daemon.

### TLS

- **Minimum TLS version: 1.3.** No exceptions, no fallback cipher suites. Go's TLS 1.3 implementation manages cipher selection automatically.
- **Timeouts**: read 30s, write 30s, idle 120s.
- Certificate and key paths are required configuration. The service refuses to start without them.

### Graceful Shutdown

Services handle `SIGINT` and `SIGTERM`, shutting down cleanly:

1. Stop accepting new connections.
2. Drain in-flight requests (with a timeout).
3. Clean up resources (close databases, zeroize secrets if applicable).
4. Exit.

---

## Documentation

### Required Files

| File | Purpose | Audience |
|------|---------|----------|
| `CLAUDE.md` | AI-assisted development context | Claude Code |
| `ARCHITECTURE.md` | Full system specification | Engineers |
| `deploy/examples/.toml` | Example configuration | Operators |

### CLAUDE.md

This file provides context for AI-assisted development. It should contain:

- Project overview (one paragraph).
- Build, test, and lint commands.
- High-level architecture summary.
- Project structure with directory descriptions.
- Ignored directories (runtime data, generated code).
- Critical rules (e.g. API sync requirements).

Keep it concise. AI tools read this on every interaction.

### ARCHITECTURE.md

This is the canonical specification for the service. It should cover:

1. System overview with a layered architecture diagram.
2. Cryptographic design (if applicable): algorithms, key hierarchy.
3. State machines and lifecycle (if applicable).
4. Storage design.
5. Authentication and authorization model.
6. API surface (REST and gRPC, with tables of every endpoint).
7. Web interface routes.
8. Database schema (every table, every column).
9. Configuration reference.
10. Deployment guide.
11. Security model: threat mitigations table and security invariants.
12. Future work.

This document is the source of truth. When the code and the spec disagree, one of them has a bug.

### Engine/Feature Design Documents

For services with a modular architecture, each module gets its own design document (e.g. `engines/sshca.md`). These are detailed implementation plans that include:

- Overview and core concepts.
- Data model and storage layout.
- Lifecycle (initialization, teardown).
- Operations table with auth requirements.
- API definitions (gRPC and REST).
- Implementation steps (file-by-file).
- Security considerations.
- References to existing code patterns to follow.

Write these before writing code.
They are the blueprint, not the afterthought.

---

## Security

### General Principles

- **Default deny.** Unauthenticated requests are rejected. Unauthorized requests are rejected. If in doubt, deny.
- **Fail closed.** If the service cannot verify authorization, it denies the request. If the database is unavailable, the service is unavailable.
- **Least privilege.** Service processes run as non-root. systemd units restrict filesystem access, syscalls, and capabilities.
- **No local user databases.** Authentication is always delegated to MCIAS.

### Cryptographic Standards

| Purpose | Algorithm | Notes |
|---------|-----------|-------|
| Symmetric encryption | AES-256-GCM | 12-byte random nonce per operation |
| Symmetric alternative | XChaCha20-Poly1305 | 24-byte nonce keeps random-nonce collisions negligible at high volume; it is not nonce-misuse resistant |
| Key derivation | Argon2id | Memory-hard; tune params to hardware |
| Asymmetric signing | Ed25519, ECDSA (P-256, P-384) | Prefer Ed25519 |
| CSPRNG | `crypto/rand` | All keys, nonces, salts, tokens |
| Constant-time comparison | `crypto/subtle` | All secret comparisons |

- **Never use RSA for new designs.** Ed25519 and ECDSA are faster, produce smaller keys, and have simpler security models.
- **Zeroize secrets** from memory when they are no longer needed. Overwrite byte slices with zeros, nil out pointers.
- **Never log secrets.** Keys, passwords, tokens, and plaintext must never appear in log output.

### Web Security

- CSRF tokens on all mutating requests.
- `SameSite=Strict` on all cookies.
- `html/template` for automatic escaping.
- Validate all input at system boundaries.

---

## Development Workflow

### Local Development

```bash
# Build and run both servers locally:
make devserver

# Or build everything and run the full pipeline:
make all
```

The `devserver` target builds both binaries and runs them against a local config in `srv/`. The `srv/` directory is gitignored; it holds your local database, certificates, and configuration.

### Pre-Push Checklist

Before pushing a branch:

```bash
make all         # vet → lint → test → build
make proto-lint  # if proto files changed
```

### Proto Changes

1. Edit `.proto` files in `proto//v2/`.
2. Run `make proto` to regenerate Go code.
3. Run `make proto-lint` to check for linting violations and breaking changes.
4. Update REST routes to match the new/changed RPCs.
5. Update gRPC interceptor maps for any new RPCs.
6. Update `ARCHITECTURE.md` API tables.

### Adding a New Feature

1. **Design first.** Write or update the relevant design document. For a new engine or major subsystem, create a new doc in `docs/` or `engines/`.
2. **Implement.** Follow existing patterns; the design doc should reference specific files and line numbers.
3. **Test.** Write tests alongside the implementation.
4. **Update docs.** Update `ARCHITECTURE.md`, `CLAUDE.md`, and route tables.
5. **Verify.** Run `make all`.

### CLI Commands

Every service uses cobra for CLI commands. Standard subcommands:

| Command | Purpose |
|---------|---------|
| `server` | Start the service |
| `init` | First-time setup (if applicable) |
| `status` | Query a running instance's health |
| `snapshot` | Create a database backup |

Add service-specific subcommands as needed (e.g. `migrate-aad`, `unseal`). Each command lives in its own file in `cmd//`.
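The `server` subcommand is where the TLS settings and the graceful-shutdown sequence from the Deployment section come together. The following is a minimal sketch of that core using only the standard library, not the implementation used by any existing service. The cobra wiring is omitted, and the names `newServer` and `run`, the `:8443` listen address in `main`, and the 10-second drain timeout are assumptions chosen for illustration.

```go
package main

import (
	"context"
	"crypto/tls"
	"fmt"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// newServer applies the TLS 1.3 floor and the read/write/idle timeouts
// listed in the Deployment section.
func newServer(addr string, h http.Handler) *http.Server {
	return &http.Server{
		Addr:         addr,
		Handler:      h,
		TLSConfig:    &tls.Config{MinVersion: tls.VersionTLS13},
		ReadTimeout:  30 * time.Second,
		WriteTimeout: 30 * time.Second,
		IdleTimeout:  120 * time.Second,
	}
}

// run serves until ctx is cancelled, then stops accepting connections and
// drains in-flight requests for at most drain (an assumed parameter).
func run(ctx context.Context, srv *http.Server, certFile, keyFile string, drain time.Duration) error {
	errCh := make(chan error, 1)
	go func() { errCh <- srv.ListenAndServeTLS(certFile, keyFile) }()

	select {
	case err := <-errCh:
		return err // startup failure, e.g. an unreadable certificate
	case <-ctx.Done():
	}

	shutdownCtx, cancel := context.WithTimeout(context.Background(), drain)
	defer cancel()
	if err := srv.Shutdown(shutdownCtx); err != nil {
		return fmt.Errorf("drain: %w", err)
	}
	// Resource cleanup (closing the database, zeroizing secrets) would go here.
	return nil
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: server <cert.pem> <key.pem>")
		return
	}
	// Cancel ctx on SIGINT or SIGTERM so run can begin the drain.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	srv := newServer(":8443", http.NotFoundHandler())
	if err := run(ctx, srv, os.Args[1], os.Args[2], 10*time.Second); err != nil {
		fmt.Fprintln(os.Stderr, "server:", err)
	}
}
```

Keeping `newServer` and `run` separate from the command wiring matches the rule that `cmd/` holds no business logic: the cobra command would only parse flags and call into a package under `internal/`.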