Metacircular Dynamics — Engineering Standards
Source: https://metacircular.net/roam/20260314210051-metacircular_dynamics.html
This document describes the standard repository layout, tooling, and software development lifecycle (SDLC) for services built at Metacircular Dynamics. It incorporates the platform-wide project guidelines and codifies the conventions established in Metacrypt as the baseline for all services.
Platform Rules
These four rules apply to every Metacircular service:
- Data Storage: All service data goes in /srv/<service>/ to enable straightforward migration across systems.
- Deployment Architecture: Services require systemd unit files but prioritize container-first design to support deployment via the Metacircular Control Plane (MCP).
- Identity Management: Services must integrate with MCIAS (Metacircular Identity and Access Service) for user management and access control. Three role levels: admin (full administrative access), user (full non-administrative access), guest (service-dependent restrictions).
- API Design: Services expose both gRPC and REST interfaces, kept in sync. Web UIs are built with htmx.
Table of Contents
- Platform Rules
- Repository Layout
- Language & Toolchain
- Build System
- API Design
- Authentication & Authorization
- Database Conventions
- Configuration
- Web UI
- Testing
- Linting & Static Analysis
- Deployment
- Documentation
- Security
- Development Workflow
Repository Layout
Every service follows a consistent directory structure. Adjust the
service-specific directories (e.g. engines/ in Metacrypt) as appropriate,
but the top-level skeleton is fixed.
.
├── cmd/
│ ├── <service>/ CLI entry point (server, subcommands)
│ └── <service>-web/ Web UI entry point (if separate binary)
├── internal/
│ ├── auth/ MCIAS integration (token validation, caching)
│ ├── config/ TOML configuration loading & validation
│ ├── db/ Database setup, schema migrations
│ ├── server/ REST API server, routes, middleware
│ ├── grpcserver/ gRPC server, interceptors, service handlers
│ ├── webserver/ Web UI server, template routes, HTMX handlers
│ └── <domain>/ Service-specific packages
├── proto/<service>/
│ ├── v1/ Legacy proto definitions (if applicable)
│ └── v2/ Current proto definitions
├── gen/<service>/
│ ├── v1/ Generated Go gRPC/protobuf code
│ └── v2/
├── web/
│ ├── embed.go //go:embed directive for templates and static
│ ├── templates/ Go HTML templates
│ └── static/ CSS, JS (htmx)
├── deploy/
│ ├── docker/ Docker Compose configuration
│ ├── examples/ Example config files
│ ├── scripts/ Install, backup, migration scripts
│ └── systemd/ systemd unit files and timers
├── docs/ Internal engineering documentation
├── Dockerfile.api API server container (if split binary)
├── Dockerfile.web Web UI container (if split binary)
├── Makefile
├── buf.yaml Protobuf linting & breaking-change config
├── .golangci.yaml Linter configuration
├── .gitignore
├── CLAUDE.md AI-assisted development instructions
├── ARCHITECTURE.md Full system specification
└── <service>.toml.example Example configuration
Key Principles
- cmd/ contains only CLI wiring (cobra commands, flag parsing). No business logic.
- internal/ contains all service logic. Nothing in internal/ can be imported from outside this repository; the Go toolchain enforces this for any directory named internal.
- proto/ is the source of truth for gRPC definitions. Generated code lives in gen/ and is never edited by hand.
- deploy/ contains everything needed to run the service in production. A new engineer should be able to deploy from this directory alone.
- web/ is embedded into the binary via //go:embed. No external file dependencies at runtime.
What Does Not Belong in the Repository
- Runtime data (databases, certificates, logs): these live in /srv/<service>.
- Real configuration files with secrets: only examples are committed.
- IDE configuration (.idea/, .vscode/): per-developer, not shared.
- Vendored dependencies: the Go module proxy handles this.
Language & Toolchain
| Tool | Version | Purpose |
|---|---|---|
| Go | 1.25+ | Primary language |
| protoc + protoc-gen-go + protoc-gen-go-grpc | Latest | Protobuf/gRPC code generation |
| buf | Latest | Proto linting and breaking-change detection |
| golangci-lint | v2 | Static analysis and linting |
| Docker | Latest | Container builds |
Go Conventions
- Pure-Go dependencies where possible. Avoid CGo: it complicates cross-compilation and container builds. Use modernc.org/sqlite instead of mattn/go-sqlite3.
- CGO_ENABLED=0 for all production builds. Statically linked binaries deploy cleanly to Alpine containers.
- Stripped binaries: Build with -trimpath -ldflags="-s -w". -trimpath removes local file-system paths from the binary; -s -w drop the symbol table and DWARF debug information, reducing image size.
- Version injection: Pass git describe --tags --always --dirty via -X main.version=... at build time. Every binary must report its version.
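Here is a minimal sketch of the version-injection convention. It assumes the variable lives in package main and is named version, because that is the symbol -X main.version=... targets. The versionString helper and the "dev" fallback are illustrative, not an existing API.

```go
package main

import (
	"fmt"
	"runtime"
)

// version is replaced at link time, for example:
//
//	go build -ldflags="-X main.version=$(git describe --tags --always --dirty)" ./cmd/<service>
//
// -X only applies to package-level string variables, so this must not be a const.
var version = "dev"

// versionString is what a hypothetical --version flag or startup log line prints.
func versionString(service string) string {
	return fmt.Sprintf("%s %s (%s)", service, version, runtime.Version())
}
```

An unflagged `go build` leaves the binary reporting "dev", so an unreleased build is easy to spot.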
Module Path
Services hosted on git.wntrmute.dev use:
git.wntrmute.dev/kyle/<service>
Build System
Every repository has a Makefile with these standard targets:
# <service> is phony so make always invokes go build, whose own cache decides what to rebuild.
.PHONY: build test vet lint proto proto-lint clean docker all <service>
LDFLAGS := -trimpath -ldflags="-s -w -X main.version=$(shell git describe --tags --always --dirty)"

<service>:
	CGO_ENABLED=0 go build $(LDFLAGS) -o <service> ./cmd/<service>

build:
	go build ./...

test:
	go test ./...

vet:
	go vet ./...

lint:
	golangci-lint run ./...

proto:
	protoc --go_out=. --go_opt=module=<module> \
		--go-grpc_out=. --go-grpc_opt=module=<module> \
		proto/<service>/v2/*.proto

proto-lint:
	buf lint
	buf breaking --against '.git#branch=master,subdir=proto'

clean:
	rm -f <service>

docker:
	docker build -t <service> -f Dockerfile.api .

all: vet lint test <service>
Target Semantics
| Target | When to Run | CI Gate? |
|---|---|---|
| vet | Every change | Yes |
| lint | Every change | Yes |
| test | Every change | Yes |
| proto-lint | Any proto change | Yes |
| proto | After editing .proto files | No (manual) |
| all | Pre-push verification | Yes |
The all target is the CI pipeline: vet → lint → test → build. If any
step fails, the pipeline stops.
API Design
Services expose two synchronized API surfaces:
gRPC (Primary)
- Proto definitions live in proto/<service>/v2/.
- Use strongly-typed, per-operation RPCs. Avoid generic "execute" patterns.
- Use google.protobuf.Timestamp for all time fields (not RFC 3339 strings).
- Run buf lint and buf breaking against master before merging proto changes.
REST (Secondary)
- JSON over HTTPS. Routes live in internal/server/routes.go.
- Use chi for routing (lightweight, stdlib-compatible).
- Standard error format: {"error": "description"}.
- Standard HTTP status codes: 401 (unauthenticated), 403 (unauthorized), 412 (precondition failed), 503 (service unavailable).
API Sync Rule
Every REST endpoint must have a corresponding gRPC RPC, and vice versa. When adding, removing, or changing an endpoint in either surface, the other must be updated in the same change. This is enforced in code review.
gRPC Interceptors
Access control is enforced via interceptor maps, not per-handler checks:
| Map | Effect |
|---|---|
| sealRequiredMethods | Returns UNAVAILABLE if the service is sealed/locked |
| authRequiredMethods | Validates MCIAS bearer token, populates caller info |
| adminRequiredMethods | Requires admin role on the caller |
Adding a new RPC means adding it to the correct interceptor maps. Forgetting this is a security defect.
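The decision that the interceptor maps encode can be sketched without the grpc-go dependency. In a real service this logic would run inside a grpc.UnaryServerInterceptor keyed on info.FullMethod and return status errors with codes.Unavailable, codes.Unauthenticated, and codes.PermissionDenied. Here, plain errors stand in for those. The method names, the Caller type, and the check order (seal, then auth, then admin) are assumptions.

```go
package main

import "errors"

// Plain errors stand in for gRPC status codes in this sketch.
var (
	errUnavailable      = errors.New("unavailable: service is sealed")         // codes.Unavailable
	errUnauthenticated  = errors.New("unauthenticated")                        // codes.Unauthenticated
	errPermissionDenied = errors.New("permission denied: admin role required") // codes.PermissionDenied
)

// Illustrative full method names; real ones come from the generated code.
const (
	methodGetSecret = "/example.v2.SecretService/GetSecret"
	methodRotateKey = "/example.v2.AdminService/RotateKey"
)

var (
	sealRequiredMethods  = map[string]bool{methodGetSecret: true, methodRotateKey: true}
	authRequiredMethods  = map[string]bool{methodGetSecret: true, methodRotateKey: true}
	adminRequiredMethods = map[string]bool{methodRotateKey: true}
)

// Caller is the identity attached after a bearer token validates.
type Caller struct {
	Username string
	Roles    []string
}

func (c *Caller) isAdmin() bool {
	for _, r := range c.Roles {
		if r == "admin" {
			return true
		}
	}
	return false
}

// checkAccess applies the maps in order: seal, then auth, then admin.
// caller is nil when the request carried no valid token.
func checkAccess(method string, sealed bool, caller *Caller) error {
	if sealRequiredMethods[method] && sealed {
		return errUnavailable
	}
	if authRequiredMethods[method] && caller == nil {
		return errUnauthenticated
	}
	// A nil caller on an admin method is denied even if the method was
	// left out of authRequiredMethods: fail closed.
	if adminRequiredMethods[method] && (caller == nil || !caller.isAdmin()) {
		return errPermissionDenied
	}
	return nil
}
```

A method missing from every map passes all three checks. That is why an omitted map entry is a security defect rather than a cosmetic bug.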
Authentication & Authorization
Authentication
All services delegate authentication to MCIAS (Metacircular Identity and Access Service). No service maintains its own user database.
- Client sends credentials to the service's /v1/auth/login endpoint.
- The service forwards them to MCIAS via the client library (git.wntrmute.dev/kyle/mcias/clients/go).
- On success, MCIAS returns a bearer token. The service returns it to the client and optionally sets it as a cookie for the web UI.
- Subsequent requests include the token via the Authorization: Bearer <token> header or the cookie.
- Token validation calls MCIAS ValidateToken(). Results should be cached (keyed by SHA-256 of the token) with a short TTL (30 seconds or less).
Authorization
Three role levels:
| Role | Meaning |
|---|---|
| admin | Full access to everything. Policy bypass. |
| user | Access governed by policy rules. Default deny. |
| guest | Service-dependent restrictions. Default deny. |
Admin detection is based solely on the MCIAS admin role. The service never
promotes users locally.
Services that need fine-grained access control should implement a policy engine (priority-based ACL rules stored in encrypted storage, default deny, admin bypass). See Metacrypt's implementation as the reference.
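Here is a minimal sketch of the policy-engine semantics named above: admin bypass, priority ordering, first match wins, and default deny. The Rule fields, the use of path.Match for resource globs, and the choice that a higher priority value is evaluated first are all assumptions. Metacrypt's implementation and POLICY.md remain the reference, and encrypted rule storage is out of scope.

```go
package main

import (
	"path"
	"sort"
)

// Rule is an illustrative ACL rule; the real schema lives in POLICY.md.
type Rule struct {
	Priority int      // higher values are evaluated first (assumption)
	Roles    []string // MCIAS roles the rule applies to
	Resource string   // glob pattern matched with path.Match
	Actions  []string // e.g. "read", "write"
	Allow    bool     // effect when the rule matches
}

func contains(xs []string, s string) bool {
	for _, x := range xs {
		if x == s {
			return true
		}
	}
	return false
}

// Allowed reports whether a caller with roles may perform action on resource.
func Allowed(rules []Rule, roles []string, resource, action string) bool {
	if contains(roles, "admin") {
		return true // admin bypass
	}
	ordered := append([]Rule(nil), rules...)
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].Priority > ordered[j].Priority
	})
	for _, r := range ordered {
		if !contains(r.Actions, action) {
			continue
		}
		roleMatch := false
		for _, role := range roles {
			if contains(r.Roles, role) {
				roleMatch = true
				break
			}
		}
		if !roleMatch {
			continue
		}
		matched, err := path.Match(r.Resource, resource)
		if err != nil {
			return false // malformed pattern: fail closed
		}
		if matched {
			return r.Allow // first match wins
		}
	}
	return false // default deny
}
```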
Database Conventions
SQLite
SQLite is the default database for Metacircular services. It is simple to
operate, requires no external processes, and backs up cleanly with
VACUUM INTO.
Connection settings (applied at open time):
PRAGMA journal_mode = WAL;
PRAGMA foreign_keys = ON;
PRAGMA busy_timeout = 5000;
File permissions: 0600. Created by the service on first run.
Migrations
- Migrations are Go functions registered in internal/db/ and run sequentially at startup.
- Each migration is idempotent: use CREATE TABLE IF NOT EXISTS and CREATE INDEX IF NOT EXISTS. SQLite's ALTER TABLE ... ADD COLUMN has no IF NOT EXISTS clause, so a column-adding migration should check pragma_table_info before altering the table.
- Applied migrations are tracked in a schema_migrations table.
- Never modify a migration that has been deployed. Add a new one.
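The following sketches a sequential migration runner on database/sql. The Migration type and the schema_migrations columns are hypothetical. Each migration runs in one transaction together with the row that records it. SQLite DDL is transactional, so a failed migration leaves neither a half-applied schema nor a bogus record. Only the pure pending helper can be exercised without a SQLite driver registered.

```go
package main

import (
	"database/sql"
	"fmt"
)

// Migration is an illustrative shape for a registered migration.
type Migration struct {
	Version int
	Name    string
	Up      func(tx *sql.Tx) error
}

// pending returns migrations not yet recorded, preserving registration order.
func pending(all []Migration, applied map[int]bool) []Migration {
	var out []Migration
	for _, m := range all {
		if !applied[m.Version] {
			out = append(out, m)
		}
	}
	return out
}

// Migrate applies pending migrations in order, each in its own transaction.
func Migrate(db *sql.DB, all []Migration) error {
	if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS schema_migrations (
		version    INTEGER PRIMARY KEY,
		name       TEXT NOT NULL,
		applied_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)`); err != nil {
		return fmt.Errorf("create schema_migrations: %w", err)
	}
	rows, err := db.Query(`SELECT version FROM schema_migrations`)
	if err != nil {
		return fmt.Errorf("read schema_migrations: %w", err)
	}
	applied := make(map[int]bool)
	for rows.Next() {
		var v int
		if err := rows.Scan(&v); err != nil {
			_ = rows.Close()
			return fmt.Errorf("scan version: %w", err)
		}
		applied[v] = true
	}
	if err := rows.Err(); err != nil { // rows is closed once Next returns false
		return fmt.Errorf("iterate schema_migrations: %w", err)
	}
	for _, m := range pending(all, applied) {
		tx, err := db.Begin()
		if err != nil {
			return err
		}
		if err := m.Up(tx); err != nil {
			_ = tx.Rollback()
			return fmt.Errorf("migration %d (%s): %w", m.Version, m.Name, err)
		}
		if _, err := tx.Exec(`INSERT INTO schema_migrations (version, name) VALUES (?, ?)`,
			m.Version, m.Name); err != nil {
			_ = tx.Rollback()
			return fmt.Errorf("record migration %d: %w", m.Version, err)
		}
		if err := tx.Commit(); err != nil {
			return fmt.Errorf("commit migration %d: %w", m.Version, err)
		}
	}
	return nil
}
```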
Backup
Every service must provide a snapshot CLI command that creates a consistent
backup using VACUUM INTO. Automated backups run via a systemd timer
(daily, with retention pruning).
Configuration
Format
TOML. Parsed with go-toml/v2. Environment variable overrides via
SERVICENAME_* (e.g. METACRYPT_SERVER_LISTEN_ADDR).
Standard Sections
[server]
listen_addr = ":8443" # HTTPS API
grpc_addr = ":9443" # gRPC (optional; disabled if unset)
tls_cert = "/srv/<service>/certs/cert.pem"
tls_key = "/srv/<service>/certs/key.pem"
[web]
listen_addr = "127.0.0.1:8080" # Web UI (optional; disabled if unset)
vault_grpc = "127.0.0.1:9443" # gRPC address of the API server
vault_ca_cert = "" # CA cert for verifying API server TLS
[database]
path = "/srv/<service>/<service>.db"
[mcias]
server_url = "https://mcias.metacircular.net:8443"
ca_cert = "" # Custom CA for MCIAS TLS
[log]
level = "info" # debug, info, warn, error
Validation
Required fields are validated at startup. The service refuses to start if any are missing. Do not silently default required values.
Data Directory
All runtime data lives in /srv/<service>/:
/srv/<service>/
├── <service>.toml Configuration
├── <service>.db SQLite database
├── certs/ TLS certificates
└── backups/ Database snapshots
This convention enables straightforward service migration between hosts:
copy /srv/<service>/ and the binary.
Web UI
Technology
- Go html/template for server-side rendering. No JavaScript frameworks.
- htmx for dynamic interactions (form submission, partial page updates) without full page reloads.
- Templates and static files are embedded in the binary via //go:embed.
Structure
- web/templates/layout.html: shared HTML skeleton, navigation, CSS/JS includes. All page templates extend this.
- Page templates: one .html file per page/feature.
- web/static/: CSS, htmx. Keep this minimal.
Architecture
The web UI runs as a separate binary (<service>-web) that communicates
with the API server via its gRPC interface. This separation means:
- The web UI has no direct database access.
- The API server enforces all authorization.
- The web UI can be deployed independently or omitted entirely.
Security
- CSRF protection via signed double-submit cookies on all mutating requests (POST/PUT/PATCH/DELETE).
- Session cookie: HttpOnly, Secure, SameSite=Strict.
- All user input is escaped by html/template (the default).
Testing
Philosophy
Tests are written using the Go standard library testing package. No test
frameworks (testify, gomega, etc.) — the standard library is sufficient and
keeps dependencies minimal.
Patterns
// Imports: bytes, path/filepath, testing, and the service's internal/db package.
func TestFeatureName(t *testing.T) {
	// Setup: use t.TempDir() for isolated file system state.
	dir := t.TempDir()
	database, err := db.Open(filepath.Join(dir, "test.db"))
	if err != nil {
		// Use t.Fatal (not t.Error) for precondition failures.
		t.Fatalf("open db: %v", err)
	}
	defer func() { _ = database.Close() }()
	if err := db.Migrate(database); err != nil {
		t.Fatalf("migrate: %v", err)
	}
	// Exercise the code under test, producing got and want.
	// ...
	if !bytes.Equal(got, want) {
		t.Fatalf("got %q, want %q", got, want)
	}
}
Guidelines
- Use t.TempDir() for all file-system state. Never write to fixed paths. Cleanup is automatic.
- Use errors.Is for error assertions, not string comparison.
- No mocks for databases. Tests use real SQLite databases created in temp directories. This catches migration bugs that mocks would hide.
- Test files live alongside the code they test: barrier.go and barrier_test.go in the same package.
- Test helpers call t.Helper() so failures report the caller's line.
What to Test
| Layer | Test Strategy |
|---|---|
| Crypto primitives | Roundtrip encryption/decryption, wrong-key rejection, edge cases |
| Storage (barrier, DB) | CRUD operations, sealed-state rejection, concurrent access |
| API handlers | Request/response correctness, auth enforcement, error codes |
| Policy engine | Rule matching, priority ordering, default deny, admin bypass |
| CLI commands | Flag parsing, output format (lightweight) |
Linting & Static Analysis
Configuration
Every repository includes a .golangci.yaml with this philosophy:
fail loudly for security and correctness; everything else is a warning.
Required Linters
| Linter | Category | Purpose |
|---|---|---|
| errcheck | Correctness | Unhandled errors are silent failures |
| govet | Correctness | Printf mismatches, unreachable code, suspicious constructs |
| ineffassign | Correctness | Dead writes hide logic bugs |
| unused | Correctness | Unused variables and functions |
| errorlint | Error handling | Proper errors.Is/errors.As usage |
| gosec | Security | Hardcoded secrets, weak RNG, insecure crypto, SQL injection |
| staticcheck | Security | Deprecated APIs, mutex misuse, deep analysis |
| revive | Style | Go naming conventions, error return ordering |
| gofmt | Formatting | Standard Go formatting |
| goimports | Formatting | Import grouping and ordering |

In golangci-lint v2, gofmt and goimports are enabled under the formatters section of .golangci.yaml rather than under linters.
Settings
- errcheck: check-type-assertions: true (catch x.(*T) without ok check).
- govet: all analyzers enabled except shadow (too noisy for idiomatic Go).
- gosec: severity and confidence set to medium. Exclude G104 (overlaps with errcheck).
- max-issues-per-linter: 0 (report everything; no caps).
- Test files: allow G101 (hardcoded credentials) for test fixtures.
Deployment
Container-First
Services are designed for container deployment but must also run as native systemd services. Both paths are first-class.
Docker
Multi-stage builds:
- Builder: golang:1.25-alpine (matching the Go 1.25+ toolchain requirement). Compile with CGO_ENABLED=0, strip symbols.
- Runtime: alpine:3.21. Non-root user (<service>), minimal attack surface.
If the service has separate API and web binaries, use separate Dockerfiles
(Dockerfile.api, Dockerfile.web) and a docker-compose.yml that wires
them together with a shared data volume.
systemd
Every service ships with:
| File | Purpose |
|---|---|
| <service>.service | Main service unit (API server) |
| <service>-web.service | Web UI unit (if applicable) |
| <service>-backup.service | Oneshot backup unit |
| <service>-backup.timer | Daily backup timer (02:00 UTC, 5-minute jitter) |
Security Hardening
All service units must include these security directives:
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
PrivateDevices=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictSUIDSGID=true
RestrictNamespaces=true
LockPersonality=true
MemoryDenyWriteExecute=true
RestrictRealtime=true
ReadWritePaths=/srv/<service>
The web UI unit should use ReadOnlyPaths=/srv/<service> instead of
ReadWritePaths — it has no reason to write to the data directory.
Install Script
deploy/scripts/install.sh handles:
- Create system user/group (idempotent).
- Install binary to /usr/local/bin/.
- Create /srv/<service>/ directory structure.
- Install example config if none exists.
- Install systemd units and reload the daemon.
TLS
- Minimum TLS version: 1.3. No exceptions, no fallback cipher suites. Go's TLS 1.3 implementation manages cipher selection automatically.
- Timeouts: read 30s, write 30s, idle 120s.
- Certificate and key paths are required configuration — the service refuses to start without them.
Graceful Shutdown
Services handle SIGINT and SIGTERM, shutting down cleanly:
- Stop accepting new connections.
- Drain in-flight requests (with a timeout).
- Clean up resources (close databases, zeroize secrets if applicable).
- Exit.
Documentation
Required Files
| File | Purpose | Audience |
|---|---|---|
| README.md | Project overview, quick-start, and contributor guide | Everyone |
| CLAUDE.md | AI-assisted development context | Claude Code |
| ARCHITECTURE.md | Full system specification | Engineers |
| RUNBOOK.md | Operational procedures and incident response | Operators |
| deploy/examples/<service>.toml | Example configuration | Operators |
Suggested Files
These are not required for every project but should be created where applicable:
| File | When to Include | Purpose |
|---|---|---|
| AUDIT.md | Services handling cryptography, secrets, PII, or auth | Security audit findings with issue tracking and resolution status |
| POLICY.md | Services with fine-grained access control | Policy engine documentation: rule structure, evaluation algorithm, resource paths, action classification, common patterns |
README.md
The README is the front door. A new engineer or user should be able to understand what the service does and get it running from this file alone. It should contain:
- Project name and one-paragraph description.
- Quick-start instructions (build, configure, run).
- Link to ARCHITECTURE.md for full technical details.
- Link to RUNBOOK.md for operational procedures.
- License and contribution notes (if applicable).
Keep it concise. The README is not the spec — that's ARCHITECTURE.md.
CLAUDE.md
This file provides context for AI-assisted development. It should contain:
- Project overview (one paragraph).
- Build, test, and lint commands.
- High-level architecture summary.
- Project structure with directory descriptions.
- Ignored directories (runtime data, generated code).
- Critical rules (e.g. API sync requirements).
Keep it concise. AI tools read this on every interaction.
ARCHITECTURE.md
This is the canonical specification for the service. It should cover:
- System overview with a layered architecture diagram.
- Cryptographic design (if applicable): algorithms, key hierarchy.
- State machines and lifecycle (if applicable).
- Storage design.
- Authentication and authorization model.
- API surface (REST and gRPC, with tables of every endpoint).
- Web interface routes.
- Database schema (every table, every column).
- Configuration reference.
- Deployment guide.
- Security model: threat mitigations table and security invariants.
- Future work.
This document is the source of truth. When the code and the spec disagree, one of them has a bug.
RUNBOOK.md
The runbook is written for operators, not developers. It covers what to do when things go wrong and how to perform routine maintenance. It should contain:
- Service overview — what the service does, in one paragraph.
- Health checks — how to verify the service is healthy (endpoints, CLI commands, expected responses).
- Common operations — start, stop, restart, seal/unseal, backup, restore, log inspection.
- Alerting — what alerts exist, what they mean, and how to respond.
- Incident procedures — step-by-step playbooks for known failure modes (database corruption, certificate expiry, MCIAS outage, disk full, etc.).
- Escalation — when and how to escalate beyond the runbook.
Write runbook entries as numbered steps, not prose. An operator at 3 AM should be able to follow them without thinking.
AUDIT.md (Suggested)
For services that handle cryptography, secrets, PII, or authentication, maintain a security audit log. Each finding gets a numbered entry with:
- Description of the issue.
- Severity (critical, high, medium, low).
- Resolution status: open, resolved (with summary), or accepted (with rationale for accepting the risk).
A priority summary table at the bottom of the file provides a scannable overview.
Resolved and accepted items are struck through but retained for history.
See Metacrypt's AUDIT.md for the reference format.
POLICY.md (Suggested)
For services with a policy engine or fine-grained access control, document the policy model separately from the architecture spec. It should cover:
- Rule structure (fields, types, semantics).
- Evaluation algorithm (match logic, priority, default effect).
- Resource path conventions and glob patterns.
- Action classification.
- API endpoints for policy CRUD.
- Common policy patterns with examples.
- Role summary (what each MCIAS role gets by default).
This document is aimed at administrators who need to write policy rules, not engineers who need to understand the implementation.
Engine/Feature Design Documents
For services with a modular architecture, each module gets its own design
document (e.g. engines/sshca.md). These are detailed implementation plans
that include:
- Overview and core concepts.
- Data model and storage layout.
- Lifecycle (initialization, teardown).
- Operations table with auth requirements.
- API definitions (gRPC and REST).
- Implementation steps (file-by-file).
- Security considerations.
- References to existing code patterns to follow.
Write these before writing code. They are the blueprint, not the afterthought.
Security
General Principles
- Default deny. Unauthenticated requests are rejected. Unauthorized requests are rejected. If in doubt, deny.
- Fail closed. If the service cannot verify authorization, it denies the request. If the database is unavailable, the service is unavailable.
- Least privilege. Service processes run as non-root. systemd units restrict filesystem access, syscalls, and capabilities.
- No local user databases. Authentication is always delegated to MCIAS.
Cryptographic Standards
| Purpose | Algorithm | Notes |
|---|---|---|
| Symmetric encryption | AES-256-GCM | 12-byte random nonce per operation |
| Symmetric alternative | XChaCha20-Poly1305 | 24-byte nonce, so random nonces stay collision-safe at message volumes where AES-GCM's 12-byte random nonces would not (not nonce-misuse resistant) |
| Key derivation | Argon2id | Memory-hard; tune params to hardware |
| Asymmetric signing | Ed25519, ECDSA (P-256, P-384) | Prefer Ed25519 |
| CSPRNG | crypto/rand | All keys, nonces, salts, tokens |
| Constant-time comparison | crypto/subtle | All secret comparisons |
- Never use RSA for new designs. Ed25519 and ECDSA offer much smaller keys and signatures and faster key generation and signing, and they avoid RSA's padding-mode pitfalls.
- Zeroize secrets from memory when they are no longer needed. Overwrite byte slices with zeros, nil out pointers.
- Never log secrets. Keys, passwords, tokens, and plaintext must never appear in log output.
Web Security
- CSRF tokens on all mutating requests.
- SameSite=Strict on all cookies.
- html/template for automatic escaping.
- Validate all input at system boundaries.
Development Workflow
Local Development
# Build and run both servers locally:
make devserver
# Or build everything and run the full pipeline:
make all
The devserver target builds both binaries and runs them against a local
config in srv/. The srv/ directory is gitignored — it holds your local
database, certificates, and configuration.
Pre-Push Checklist
Before pushing a branch:
make all # vet → lint → test → build
make proto-lint # if proto files changed
Proto Changes
- Edit .proto files in proto/<service>/v2/.
- Run make proto to regenerate Go code.
- Run make proto-lint to check for linting violations and breaking changes.
- Update REST routes to match the new/changed RPCs.
- Update gRPC interceptor maps for any new RPCs.
- Update ARCHITECTURE.md API tables.
Adding a New Feature
- Design first. Write or update the relevant design document. For a new engine or major subsystem, create a new doc in docs/ or engines/.
- Implement. Follow existing patterns; the design doc should reference specific files and line numbers.
- Test. Write tests alongside the implementation.
- Update docs. Update ARCHITECTURE.md, CLAUDE.md, and route tables.
- Verify. Run make all.
CLI Commands
Every service uses cobra for CLI commands. Standard subcommands:
| Command | Purpose |
|---|---|
| server | Start the service |
| init | First-time setup (if applicable) |
| status | Query a running instance's health |
| snapshot | Create a database backup |
Add service-specific subcommands as needed (e.g. migrate-aad, unseal).
Each command lives in its own file in cmd/<service>/.