Initial import.
12
.gitignore
vendored
Normal file
@@ -0,0 +1,12 @@
# infrastructure / secrets
/ca

# project directories: these are separate git repos
/mcat
/mcias
/mc-proxy
/mcr
/metacrypt
/mcdsl
/mcns

76
CLAUDE.md
Normal file
@@ -0,0 +1,76 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

Metacircular is a multi-service personal infrastructure platform. This root repository is a workspace container — each subdirectory is a separate Git repo (gitignored here). The authoritative platform-wide standards live in `engineering-standards.md`.

## Project Map

| Directory | Purpose | Language |
|-----------|---------|----------|
| `mcias/` | Identity and Access Service — central SSO/IAM, all other services delegate auth here | Go |
| `metacrypt/` | Cryptographic service engine — encrypted secrets, PKI/CA, SSH CA, transit encryption | Go |
| `mc-proxy/` | TLS proxy and router — L4 passthrough or L7 terminating, PROXY protocol, firewall | Go |
| `mcr/` | OCI container registry — integrated with MCIAS for auth and policy-based push/pull | Go |
| `mcat/` | MCIAS login policy tester — lightweight web app to test and audit login policies | Go |
| `mcdsl/` | Standard library — shared packages for auth, db, config, TLS servers, CSRF, snapshots | Go |
| `ca/` | PKI infrastructure and secrets for dev/test (not source code, gitignored) | — |

Each subproject has its own `CLAUDE.md`, `ARCHITECTURE.md`, `Makefile`, and `go.mod`. When working in a subproject, read its own CLAUDE.md first.

## Service Dependencies

MCIAS is the root dependency — every other service authenticates through it. No service maintains its own user database. The dependency graph:

```
mcias (standalone — no MCIAS dependency)
├── metacrypt  (uses MCIAS for auth)
├── mc-proxy   (uses MCIAS for admin auth)
├── mcr        (uses MCIAS for auth + policy)
└── mcat       (tests MCIAS login policies)
```

## Standard Build Commands (all subprojects)

```bash
make all        # vet → lint → test → build (the CI pipeline)
make build      # go build ./...
make test       # go test ./...
make vet        # go vet ./...
make lint       # golangci-lint run ./...
make proto      # regenerate gRPC code from .proto files
make proto-lint # buf lint + buf breaking
make devserver  # build and run locally against srv/ config
make docker     # build container image
make clean      # remove binaries
```

Run a single test: `go test ./internal/auth/ -run TestTokenValidation`

## Critical Rules

1. **REST/gRPC sync**: Every REST endpoint must have a corresponding gRPC RPC, updated in the same change.
2. **gRPC interceptor maps**: New RPCs must be added to `authRequiredMethods`, `adminRequiredMethods`, and/or `sealRequiredMethods`. Forgetting this is a security defect.
3. **No CGo in production**: All builds use `CGO_ENABLED=0`. Use `modernc.org/sqlite`, not `mattn/go-sqlite3`.
4. **No test frameworks**: Use stdlib `testing` only. Real SQLite in `t.TempDir()`, no mocks for databases.
5. **Default deny**: Unauthenticated and unauthorized requests are always rejected. Admin detection comes solely from the MCIAS `admin` role.
6. **Proto versioning**: Start at v1. Only create v2 for breaking changes. Non-breaking additions go in-place.

## Architecture Patterns

- **Seal/Unseal**: Metacrypt starts sealed and requires a password to unlock (Vault-like pattern). Key hierarchy: Password → Argon2id → KWK → MEK → per-engine DEKs.
- **Web UI separation**: Web UIs run as separate binaries communicating with the API server via gRPC. No direct DB access from the web tier.
- **Config**: TOML with env var overrides (`SERVICENAME_*`). All runtime data in `/srv/<service>/`.
- **Policy engines**: Priority-based ACL rules, default deny, admin bypass. See metacrypt's implementation as reference.
- **Auth flow**: Client → service `/v1/auth/login` → MCIAS client library → MCIAS validates → bearer token returned. Token validation cached 30s keyed by SHA-256 of token.

## Tech Stack

- Go 1.25+, chi router, cobra CLI, go-toml/v2
- SQLite via modernc.org/sqlite (pure Go), WAL mode, foreign keys on
- gRPC + protobuf, buf for linting
- htmx + Go html/template for web UIs
- golangci-lint v2 with errcheck, gosec, staticcheck, revive
- TLS 1.3 minimum, AES-256-GCM, Argon2id, Ed25519
104
README.md
Normal file
@@ -0,0 +1,104 @@
# Metacircular Dynamics

Metacircular Dynamics is a self-hosted personal infrastructure platform. The
name comes from the tradition of metacircular evaluators in Lisp — a system
defined in terms of itself — by way of SICP and Common Lisp projects that
preceded this work. The infrastructure is metacircular in the same sense: the
platform manages, secures, and hosts its own services.

Every component is self-hosted, every dependency is controlled, and the entire
stack is operable by one person. No cloud providers, no third-party auth, no
external databases. The platform is designed for a small number of machines — a
personal homelab or a handful of VPSes — not for hyperscale.

All services are written in Go and follow shared
[engineering standards](engineering-standards.md). Full platform documentation
lives in [docs/metacircular.md](docs/metacircular.md).

## Components

| Component | Purpose | Status |
|-----------|---------|--------|
| **MCIAS** | Identity and access — the root of trust. SSO, token issuance, role management, login policy. Every other service delegates auth here. | Implemented |
| **Metacrypt** | Cryptographic services — PKI/CA, transit encryption, encrypted secret storage behind a seal/unseal barrier. Issues TLS certificates for the platform. | Implemented |
| **MCR** | Container registry — OCI-compliant image storage with MCIAS auth and policy-controlled push/pull. | Implemented |
| **MC-Proxy** | Node ingress — TLS proxy and router. L4 passthrough or L7 terminating (per-route), PROXY protocol, firewall with rate limiting and GeoIP. | Implemented |
| **MCNS** | Networking — DNS and address management for the platform. | Planned |
| **MCP** | Control plane — operator-driven deployment, service registry, data transfer, master/agent container lifecycle. | Planned |

Shared library: **MCDSL** — standard library for all services (auth, db,
config, TLS server, CSRF, snapshots).

Supporting tool: **MCAT** — lightweight web app for testing MCIAS login
policies.

## Architecture

```
MCIAS (standalone — the root of trust)
├── Metacrypt  (auth via MCIAS; provides certs to all services)
├── MCR        (auth via MCIAS; stores images pulled by MCP)
├── MCNS       (auth via MCIAS; provides DNS for the platform)
├── MCP        (auth via MCIAS; orchestrates everything; owns service registry)
└── MC-Proxy   (pre-auth; routes traffic to services behind it)
```

Each machine is an **MC Node**. On every node, **MC-Proxy** accepts outside
connections and routes by TLS SNI — either relaying raw TCP (L4) or
terminating TLS and reverse proxying HTTP/2 (L7), per-route. **MCP Agent** on
each node receives commands from **MCP Master** (which runs on the operator's
workstation) and manages containers via the local runtime. Core infrastructure
(MCIAS, Metacrypt, MCR) runs on nodes like any other workload.

```
      ┌──────────────────┐          ┌──────────────┐
      │    Core Infra    │          │  MCP Master  │
      │   (e.g. MCIAS)   │          │              │
      └────────┬─────────┘          └──────┬───────┘
               │                           │ C2
Outside  ┌─────▼───────────────────────────▼───────────┐
Client ─▶│                   MC Node                    │
         │  ┌───────────┐                               │
         │  │ MC-Proxy  │──┬──────┬──────┐              │
         │  └───────────┘  │      │      │              │
         │             ┌───▼┐  ┌──▼─┐  ┌─▼──┐  ┌─────┐  │
         │             │ α  │  │ β  │  │ γ  │  │ MCP │  │
         │             └────┘  └────┘  └────┘  │Agent│  │
         │                                     └──┬──┘  │
         │                                 ┌──────▼──┐  │
         │                                 │Container│  │
         │                                 │ Runtime │  │
         │                                 └─────────┘  │
         └──────────────────────────────────────────────┘
```

## Design Principles

- **Sovereignty** — self-hosted end to end; no SaaS dependencies
- **Simplicity** — SQLite over Postgres, stdlib testing, pure Go, htmx, single binaries
- **Consistency** — every service follows identical patterns (layout, config, auth, deployment)
- **Security as structure** — default deny, TLS 1.3 minimum, interceptor-map auth, encrypted-at-rest secrets
- **Design before code** — ARCHITECTURE.md is the spec, written before implementation

## Tech Stack

Go 1.25+, SQLite (modernc.org/sqlite), chi router, gRPC + protobuf, htmx +
Go html/template, golangci-lint v2, Ed25519/Argon2id/AES-256-GCM, TLS 1.3,
container-first deployment (Docker + systemd).

## Repository Structure

This root repository is a workspace container. Each subdirectory is a separate
Git repo with its own `CLAUDE.md`, `ARCHITECTURE.md`, `Makefile`, and `go.mod`:

```
metacircular/
├── mcias/      Identity and Access Service
├── metacrypt/  Cryptographic service engine
├── mcr/        Container registry
├── mc-proxy/   TLS proxy and router
├── mcat/       Login policy tester
├── mcdsl/      Standard library (shared packages)
├── ca/         PKI infrastructure (dev/test, not source code)
└── docs/       Platform-wide documentation
```
927
docs/metacircular.md
Normal file
@@ -0,0 +1,927 @@
# Metacircular Infrastructure

## Background

Metacircular Dynamics is a personal infrastructure platform. The name comes
from the tradition of metacircular evaluators in Lisp — a system defined in
terms of itself — by way of SICP and Common Lisp projects that preceded this
work. The infrastructure is metacircular in the same sense: the platform
manages, secures, and hosts its own services.

The goal is sovereign infrastructure. Every component is self-hosted, every
dependency is controlled, and the entire stack is operable by one person. There
are no cloud provider dependencies, no third-party auth providers, no external
databases. When a Metacircular node boots, it connects to Metacircular services
for identity, certificates, container images, and workload scheduling.

All services are written in Go and follow a shared set of engineering standards
(see `engineering-standards.md`). The platform is designed for a small number of
machines — a personal homelab or a handful of VPSes — not for hyperscale.

## Philosophy

**Sovereignty.** You own the whole stack. Identity, certificates, secrets,
container images, DNS, networking — all self-hosted. No SaaS dependency means
no vendor lock-in, no surprise deprecations, and no trust delegation to third
parties.

**Simplicity over sophistication.** SQLite over Postgres. Stdlib `testing` over
test frameworks. Pure Go over CGo. htmx over React. Single-binary deployments
over microservice orchestrators. The right tool is the simplest one that solves
the problem without creating a new one.

**Consistency as leverage.** Every service follows identical patterns: the same
directory layout, the same Makefile targets, the same config format, the same
auth integration, the same deployment model. Knowledge of one service transfers
instantly to all others. A new service can be stood up by copying the skeleton.

**Security as structure.** Security is not a feature bolted on after the fact.
Default deny is the starting posture. TLS 1.3 is the minimum, not a goal.
Interceptor maps make "forgot to add auth" a visible, reviewable omission
rather than a silent runtime failure. Secrets are encrypted at rest behind a
seal/unseal barrier. Every service delegates identity to a single root of
trust.

**Design before code.** The architecture document is written before
implementation begins. It is the spec, not the afterthought. When the code and
the spec disagree, one of them has a bug.

## High-Level Overview

Metacircular infrastructure is built from six core components, plus a shared
standard library (**MCDSL**) that provides the common patterns all services
depend on (auth integration, database setup, config loading, TLS server
bootstrapping, CSRF, snapshots):

- **MCIAS** — Identity and access. The root of trust for all other services.
  Handles authentication, token issuance, role management, and login policy
  enforcement. Every other component delegates auth here.

- **Metacrypt** — Cryptographic services. PKI/CA, SSH CA, transit encryption,
  and encrypted secret storage behind a Vault-inspired seal/unseal barrier.
  Issues the TLS certificates that every other service depends on.

- **MCR** — Container registry. OCI-compliant image storage. MCP directs nodes
  to pull images from MCR. Policy-controlled push/pull integrated with MCIAS.

- **MCNS** — Networking. DNS and address management for the platform.

- **MCP** — Control plane. The orchestrator. A master/agent architecture that
  manages workload scheduling, container lifecycle, service registry, data
  transfer, and node state across the platform.

- **MC-Proxy** — Node ingress. A TLS proxy and router that sits on every node,
  accepts outside connections, and routes them to the correct service — either
  as raw TCP passthrough or via TLS-terminating HTTP/2 reverse proxy.

These components form a dependency graph rooted at MCIAS:

```
MCIAS (standalone — the root of trust)
├── Metacrypt  (uses MCIAS for auth; provides certs to all services)
├── MCR        (uses MCIAS for auth; stores images pulled by MCP)
├── MCNS       (uses MCIAS for auth; provides DNS for the platform)
├── MCP        (uses MCIAS for auth; orchestrates everything; owns service registry)
└── MC-Proxy   (pre-auth; routes traffic to services behind it)
```

### The Node Model

The unit of deployment is the **MC Node** — a machine (physical or virtual)
that participates in the Metacircular platform.

```
      ┌──────────────────┐          ┌──────────────┐
      │  System / Core   │          │     MCP      │
      │  Infrastructure  │          │    Master    │
      │  (e.g. MCIAS)    │          │              │
      └────────┬─────────┘          └──────┬───────┘
               │                           │ C2
               │                           │
Outside  ┌─────▼───────────────────────────▼───────────┐
Client ─▶│                   MC Node                    │
         │                                              │
         │  ┌───────────┐                               │
         │  │ MC-Proxy  │──┬──────┬──────┐              │
         │  └───────────┘  │      │      │              │
         │             ┌───▼┐  ┌──▼─┐  ┌─▼──┐  ┌─────┐  │
         │             │ α  │  │ β  │  │ γ  │  │ MCP │  │
         │             └────┘  └────┘  └────┘  │Agent│  │
         │                                     └──┬──┘  │
         │                                 ┌──────▼──┐  │
         │                                 │ Docker/ │  │
         │                                 │  etc.   │  │
         │                                 └─────────┘  │
         └──────────────────────────────────────────────┘
```

Outside clients connect to **MC-Proxy**, which inspects the TLS SNI hostname
and routes to the correct service (α, β, γ) — either as a raw TCP relay or
via TLS-terminating HTTP/2 reverse proxy, per-route. The **MCP Agent** on each
node receives C2 commands from the **MCP Master** (running on the operator's
workstation) and manages local container lifecycle via the container runtime.
Core infrastructure services (MCIAS, Metacrypt, MCR) run on nodes like any
other workload.

### The Network Model

Metacircular nodes are connected via an **encrypted overlay network** — a
self-managed WireGuard mesh, Tailscale, or similar. No component has a hard
dependency on a specific overlay implementation; the platform requires only
that nodes can reach each other over encrypted links.

```
              Public Internet
                     │
           ┌─────────▼──────────┐
           │   Edge MC-Proxy    │  VPS (public IP)
           │        :443        │
           └─────────┬──────────┘
                     │ PROXY protocol v2
   ┌─────────────────▼────────────────────────────┐
   │      Encrypted Overlay (e.g. WireGuard)      │
   │                                              │
┌──┴───────────┐  ┌──────────┐  ┌──────────┐  ┌─┴──────────┐
│    Origin    │  │  Node B  │  │  Node C  │  │  Operator  │
│   MC-Proxy   │  │  (MCP    │  │          │  │ Workstation│
│  + services  │  │  agent)  │  │  (MCP    │  │   (MCP     │
│ (MCP agent)  │  │          │  │  agent)  │  │  Master)   │
└──────────────┘  └──────────┘  └──────────┘  └────────────┘
```

**External traffic** flows from the internet through an edge MC-Proxy (on a
public VPS), which forwards via PROXY protocol over the overlay to an origin
MC-Proxy on the private network. The overlay preserves the real client IP
across the hop.

**Internal traffic** (MCP C2, inter-service communication, MCNS DNS) flows
directly over the overlay. MCP's C2 channel is gRPC over whatever link exists
between master and agent — the overlay provides the transport.

The overlay network itself is a candidate for future Metacircular management
(a self-hosted WireGuard mesh manager), consistent with the sovereignty
principle of minimizing third-party dependencies.

---

## System Catalog

### MCIAS — Metacircular Identity and Access Service

MCIAS is the root of trust for the entire platform. Every other service
delegates authentication to it; no service maintains its own user database.

**What it provides:**

- **Authentication.** Username/password with optional TOTP and FIDO2/WebAuthn.
  Credentials are verified by MCIAS and a signed JWT bearer token is returned.
  Services validate tokens by calling back to MCIAS (cached 30s by SHA-256 of
  the token).

- **Role-based access.** Three roles — `admin` (full access, policy bypass),
  `user` (policy-governed), `guest` (service-dependent restrictions). Admin
  detection comes solely from the MCIAS `admin` role; services never promote
  users locally.

- **Account types.** Human accounts (interactive users) and system accounts
  (service-to-service). Both authenticate the same way; system accounts enable
  automated workflows.

- **Login policy.** Priority-based ACL rules that control who can log into
  which services. Rules can target roles, account types, service names, and
  tags. This allows operators to restrict access per-service (e.g., deny
  `guest` from services tagged `env:restricted`) without changing the
  services themselves.

- **Token lifecycle.** Issuance, validation, renewal, and revocation.
  Ed25519-signed JWTs. Short expiry with renewal support.

**How other services integrate:** Every service includes an `[mcias]` config
section with the MCIAS server URL, a `service_name`, and optional `tags`. At
login time, the service forwards credentials to MCIAS along with this context.
MCIAS evaluates login policy against the service context, verifies credentials,
and returns a bearer token. The MCIAS Go client library
(`git.wntrmute.dev/kyle/mcias/clients/go`) handles this flow.

**Status:** Implemented. v1.0.0 complete.

---

### Metacrypt — Cryptographic Service Engine

Metacrypt provides cryptographic resources to the platform through a modular
engine architecture, backed by an encrypted storage barrier inspired by
HashiCorp Vault.

**What it provides:**

- **PKI / Certificate Authority.** X.509 certificate issuance. Root and
  intermediate CAs, certificate signing, CRL management, ACME protocol
  support. This is how every service in the platform gets its TLS
  certificates.

- **SSH CA.** (Planned.) SSH certificate signing for host and user
  certificates, replacing static SSH key management.

- **Transit encryption.** (Planned.) Encrypt and decrypt data without exposing
  keys to the caller. Envelope encryption for services that need to protect
  data at rest without managing their own key material.

- **User-to-user encryption.** (Planned.) End-to-end encryption between users,
  with key management handled by Metacrypt.

**Seal/unseal model:** Metacrypt starts sealed. An operator provides a password
which derives (via Argon2id) a key-wrapping key, which decrypts the master
encryption key (MEK), which in turn unwraps per-engine data encryption keys
(DEKs). Each engine mount gets its own DEK, limiting blast radius — compromise
of one engine's key does not expose another's data.

```
Password → Argon2id → KWK → [decrypt] → MEK → [unwrap] → per-engine DEKs
```
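
A minimal sketch of walking this hierarchy with AES-256-GCM at each link, with one loud caveat: the real KDF is Argon2id (golang.org/x/crypto/argon2); SHA-256 stands in below only to keep the sketch dependency-free, and all names are illustrative:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/sha256"
	"errors"
)

// deriveKWK stands in for the password KDF. Production uses Argon2id;
// SHA-256 here is NOT a safe substitute, it just yields a 32-byte key.
func deriveKWK(password, salt []byte) []byte {
	sum := sha256.Sum256(append(salt, password...))
	return sum[:]
}

// wrap is the seal-side step: AES-256-GCM encrypt plain under key.
func wrap(key, nonce, plain []byte) []byte {
	block, err := aes.NewCipher(key)
	if err != nil {
		panic(err)
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		panic(err)
	}
	return gcm.Seal(nil, nonce, plain, nil)
}

// unwrap decrypts one link in the hierarchy: KWK→MEK, then MEK→DEK.
func unwrap(key, nonce, blob []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	plain, err := gcm.Open(nil, nonce, blob, nil)
	if err != nil {
		return nil, errors.New("unseal failed: wrong password or corrupt blob")
	}
	return plain, nil
}

// Unseal walks Password → KDF → KWK → MEK → per-engine DEKs.
func Unseal(password, salt, mekNonce, mekBlob []byte, dekNonces, dekBlobs map[string][]byte) (map[string][]byte, error) {
	kwk := deriveKWK(password, salt)
	mek, err := unwrap(kwk, mekNonce, mekBlob)
	if err != nil {
		return nil, err
	}
	deks := make(map[string][]byte, len(dekBlobs))
	for mount, blob := range dekBlobs {
		dek, err := unwrap(mek, dekNonces[mount], blob)
		if err != nil {
			return nil, err
		}
		deks[mount] = dek
	}
	return deks, nil
}
```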

**Engine architecture:** Engines are pluggable providers that register with a
central registry. Each engine mount has a type, a name, its own DEK, and its
own configuration. The engine interface handles initialization, seal/unseal
lifecycle, and request routing. New engine types plug in without modifying the
core.

**Policy:** Fine-grained ACL rules control which users can perform which
operations on which engine mounts. Priority-based evaluation, default deny,
admin bypass. See Metacrypt's `POLICY.md` for the full model.

**Status:** Implemented. CA engine complete with ACME support. SSH CA, transit,
and user-to-user engines planned.

---

### MCR — Metacircular Container Registry

MCR is an OCI Distribution Spec-compliant container registry. It stores and
serves the container images that MCP deploys across the platform.

**What it provides:**

- **OCI-compliant image storage.** Pull, push, tag, and delete container
  images. Content-addressed by SHA-256 digest. Manifests and tags in SQLite,
  blobs on the filesystem.

- **Authenticated access.** No anonymous access. MCR uses the OCI token
  authentication flow: clients hit `/v2/`, receive a 401 with a token
  endpoint, authenticate via MCIAS, and use the returned JWT for subsequent
  requests.

- **Policy-controlled push/pull.** Fine-grained ACL rules govern who can push
  to or pull from which repositories. Integrated with MCIAS roles.

- **Garbage collection.** Unreferenced blobs are cleaned up via the admin CLI
  (`mcrctl`).

**How it fits in:** MCP directs nodes to pull images from MCR. When a workload
is scheduled, MCP tells the node's agent which image to pull and where to get
it. MCR sits behind an MC-Proxy instance for TLS routing.
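
The client side of that token flow hinges on parsing the `WWW-Authenticate` challenge from the 401. A rough sketch (hostname and parameter values are illustrative, and the comma-split is a simplification that assumes no commas inside quoted values):

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseChallenge extracts realm, service, and scope from a header like:
//   Bearer realm="https://registry.example/token",service="mcr",scope="repository:app:pull"
func parseChallenge(header string) (map[string]string, error) {
	const prefix = "Bearer "
	if !strings.HasPrefix(header, prefix) {
		return nil, fmt.Errorf("not a Bearer challenge: %q", header)
	}
	params := make(map[string]string)
	for _, part := range strings.Split(header[len(prefix):], ",") {
		k, v, ok := strings.Cut(strings.TrimSpace(part), "=")
		if !ok {
			continue
		}
		params[k] = strings.Trim(v, `"`)
	}
	return params, nil
}

// tokenURL builds the request URL for the token endpoint; the client
// then authenticates there (via MCIAS credentials) to obtain the JWT.
func tokenURL(params map[string]string) (string, error) {
	u, err := url.Parse(params["realm"])
	if err != nil {
		return "", err
	}
	q := url.Values{}
	q.Set("service", params["service"])
	q.Set("scope", params["scope"])
	u.RawQuery = q.Encode()
	return u.String(), nil
}
```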

**Status:** Implemented. Phase 12 (web UI) complete.

---

### MC-Proxy — TLS Proxy and Router

MC-Proxy is the ingress layer for every MC Node. It accepts TLS connections,
extracts the SNI hostname, and routes to the correct backend. Each route is
independently configured as either **L4 passthrough** (raw TCP relay, no TLS
termination) or **L7 terminating** (terminates TLS, reverse proxies HTTP/2 and
HTTP/1.1 including gRPC). Both modes coexist on the same listener.

**What it provides:**

- **SNI-based routing.** A route table maps hostnames to backend addresses.
  Exact match, case-insensitive. Multiple listeners can bind different ports,
  each with its own route table, all sharing the same global firewall.

- **Dual-mode proxying.** L4 routes relay raw TCP — backends see the original
  TLS handshake, MC-Proxy adds nothing. L7 routes terminate TLS at the proxy
  and reverse proxy HTTP/2 to backends (plaintext h2c or re-encrypted TLS),
  with header injection (`X-Forwarded-For`, `X-Real-IP`), gRPC streaming
  support, and trailer forwarding.

- **Global firewall.** Every connection is evaluated before routing: per-IP
  rate limiting, IP/CIDR blocks, and GeoIP country blocks (MaxMind GeoLite2).
  Blocked connections get a TCP RST — no error messages, no TLS alerts.

- **PROXY protocol.** Listeners can accept v1/v2 headers from upstream proxies
  to learn the real client IP. Routes can send v2 headers to downstream
  backends. This enables multi-hop deployments — a public edge MC-Proxy on a
  VPS forwarding over the encrypted overlay to a private origin MC-Proxy —
  while preserving the real client IP for firewall evaluation and logging.

- **Runtime management.** Routes and firewall rules can be updated at runtime
  via a gRPC admin API on a Unix domain socket (filesystem permissions for
  access control, no network exposure). State is persisted to SQLite with
  write-through semantics.

**How it fits in:** MC-Proxy is pre-auth infrastructure. It sits in front of
everything on a node. Outside clients connect to MC-Proxy on well-known ports
(443, 8443, etc.) and MC-Proxy routes to the correct backend based on the
hostname the client is trying to reach. A typical production deployment uses
two instances — an edge proxy on a public VPS and an origin proxy on the
private network, connected over the overlay with PROXY protocol preserving
client IPs across the hop.

**Status:** Implemented.

---

### MCNS — Metacircular Networking Service

MCNS provides DNS for the platform. It manages two internal zones and serves
as the name resolution layer for the Metacircular network. Service discovery
(which services run where) is owned by MCP; MCNS translates those assignments
into DNS records.

**What it will provide:**

- **Internal DNS.** MCNS is authoritative for the internal zones of the
  Metacircular network. Three zones serve different purposes:

  | Zone | Example | Purpose |
  |------|---------|---------|
  | `*.metacircular.net` | `metacrypt.metacircular.net` | External, public-facing. Managed outside MCNS (existing DNS). Points to edge MC-Proxy. |
  | `*.mcp.metacircular.net` | `vade.mcp.metacircular.net` | Node addresses. Maps node names to their network addresses (e.g. Tailscale IPs). |
  | `*.svc.mcp.metacircular.net` | `metacrypt.svc.mcp.metacircular.net` | Internal service addresses. Maps service names to the node and port where they currently run. |

  The `*.mcp.metacircular.net` and `*.svc.mcp.metacircular.net` zones are
  managed by MCNS. The external `*.metacircular.net` zone is managed separately
  (existing DNS infrastructure) and is mostly static.

- **MCP integration.** MCP pushes DNS record updates to MCNS after deploy and
  migrate operations. When MCP starts service α on node X, it calls the MCNS
  API to set `α.svc.mcp.metacircular.net` to X's address. Services and clients
  using internal DNS names automatically resolve to the right place without
  config changes.

- **Record management API.** Authenticated via MCIAS. MCP is the primary
  consumer for dynamic updates. Operators can also manage records directly
  for static entries (node addresses, aliases).

**How it fits in:** MCNS answers "what is the address of X?" MCP answers "where
is service α running?" and pushes the answer to MCNS. This separation means
services can use stable DNS names in their configs (e.g.,
`mcias.svc.mcp.metacircular.net` in `[mcias] server_url`) that survive
migration without config changes.
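
MCNS is not yet implemented, so purely as an illustration, the deploy-time record push might carry a payload like this (every type, field, and value below is hypothetical, not a real MCNS API):

```go
package main

import "encoding/json"

// RecordUpdate models the described behavior — "set
// α.svc.mcp.metacircular.net to node X's address" — as a JSON payload
// for a hypothetical MCIAS-authenticated record management endpoint.
type RecordUpdate struct {
	Name    string `json:"name"`  // FQDN to update
	Type    string `json:"type"`  // record type, e.g. "A"
	Value   string `json:"value"` // node address (e.g. an overlay IP)
	TTLSecs int    `json:"ttl"`   // record TTL in seconds
}

// marshalUpdate renders the update payload as JSON.
func marshalUpdate(u RecordUpdate) (string, error) {
	b, err := json.Marshal(u)
	return string(b), err
}
```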

**Status:** Not yet implemented.

---
|
|
||||||
|
### MCP — Metacircular Control Plane
MCP is the orchestrator. It manages what runs where across the platform. The
deployment model is operator-driven: the user says "deploy service α" and MCP
handles the rest. MCP Master runs on the operator's workstation; agents run on
each managed node.

**What it will provide:**

- **Service registry.** MCP is the source of truth for what is running where.
  It tracks every service, which node it's on, and its current state. Other
  components that need to find a service (including MC-Proxy for route table
  updates) query MCP's registry.

- **Deploy.** The operator says "deploy α". MCP checks if α is already running
  somewhere. If it is, MCP pulls the new container image on that node and
  restarts the service in place. If it isn't running, MCP selects a node
  (the operator can pin to a specific node but shouldn't have to), transfers
  the initial config, pulls the image from MCR, starts the container, and
  pushes a DNS update to MCNS (`α.svc.mcp.metacircular.net` → node address).

- **Migrate.** Move a service from one node to another. MCP snapshots the
  service's `/srv/<service>/` directory on the source node (as a tar.zst
  image), transfers it to the destination, extracts it, starts the service,
  stops it on the source, and updates MCNS so DNS points to the new location.
  The `/srv/<service>/` convention makes this uniform across all services.

- **Data transfer.** The C2 channel supports file-level operations between
  master and agents: copy or fetch individual files (push a config, pull a
  log), and transfer tar.zst archives for bulk snapshot/restore of service
  data directories. This is the foundation for both migration and backup.

- **Service snapshots.** To snapshot `/srv/<service>/`, the agent runs
  `VACUUM INTO` to create a consistent database copy, then builds a tar.zst
  that includes the full directory but **excludes** live database files
  (`*.db`, `*.db-wal`, `*.db-shm`) and the `backups/` directory. The
  temporary VACUUM INTO copy is injected into the archive as `<service>.db`.
  The result is a clean, minimal archive that extracts directly into a
  working service directory on the destination.

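A minimal shell sketch of the snapshot procedure described above. It is illustrative only: a scratch tree stands in for `/srv/alpha/`, `cp` stands in for the real `sqlite3 ... "VACUUM INTO ..."` call, and gzip stands in for zstd so the sketch runs anywhere GNU tar is available.

```shell
# Illustrative snapshot of a service directory, modeled on the steps above.
set -eu
ROOT=$(mktemp -d)
SVC=alpha
SRC="$ROOT/srv/$SVC"
mkdir -p "$SRC/backups" "$SRC/certs"
echo 'example = true' > "$SRC/$SVC.toml"
touch "$SRC/$SVC.db" "$SRC/$SVC.db-wal" "$SRC/$SVC.db-shm"

# 1. Make a consistent copy of the live database. The real agent would run
#    sqlite3 "$SRC/$SVC.db" "VACUUM INTO '$SRC/$SVC.db.snapshot'";
#    cp keeps this sketch dependency-free.
cp "$SRC/$SVC.db" "$SRC/$SVC.db.snapshot"

# 2. Archive the directory, excluding live DB files and backups/, and
#    inject the consistent copy as <service>.db (GNU tar --transform).
OUT="$ROOT/$SVC.tar.gz"
tar -C "$ROOT/srv" -czf "$OUT" \
    --exclude="$SVC/*.db" \
    --exclude="$SVC/*.db-wal" \
    --exclude="$SVC/*.db-shm" \
    --exclude="$SVC/backups" \
    --transform="s|$SVC/$SVC.db.snapshot|$SVC/$SVC.db|" \
    "$SVC"

tar -tzf "$OUT"
```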
- **Container lifecycle.** Start, stop, restart, and update containers on
  nodes. MCP Master issues commands; agents on each node execute them against
  the local container runtime (Docker, etc.).

- **Master/agent architecture.** MCP Master runs on the operator's machine.
  Agents run on every managed node, receiving C2 (command and control) from
  Master, reporting node status, and managing local workloads. The C2 channel
  is authenticated via MCIAS. The master does not need to be always-on —
  agents keep running their workloads independently; the master is needed only
  to issue new commands.

- **Node management.** Track which nodes are in the platform, their health,
  available resources, and running workloads.

- **Scheduling.** When placing a new service, MCP selects a node based on
  available resources and any operator-specified constraints. The operator can
  override with an explicit node, but the default is MCP's choice.

**How it fits in:** MCP is the piece that ties everything together. MCIAS
provides identity, Metacrypt provides certificates, MCR provides images, MCNS
provides DNS, MC-Proxy provides ingress — MCP orchestrates all of it, owns the
map of what is running where, and pushes updates to MCNS so DNS stays current.
It is the system that makes the infrastructure metacircular: the control plane
deploys and manages the very services it depends on.

**Container-first design:** All Metacircular services are built as containers
(multi-stage Docker builds, Alpine runtime, non-root) specifically so that MCP
can deploy them. The systemd unit files exist as a fallback and for bootstrap —
the long-term deployment model is MCP-managed containers.

**Status:** Not yet implemented.

---

### MCAT — MCIAS Login Policy Tester
MCAT is a lightweight diagnostic tool, not a core infrastructure component. It
presents a web login form, forwards credentials to MCIAS with a configurable
`service_name` and `tags`, and shows whether the login was accepted or denied
by policy. This lets operators verify that login policy rules behave as
expected without touching the target service.

**Status:** Implemented.

---

## Bootstrap Sequence

Bringing up a Metacircular platform from scratch requires careful ordering
because of the circular dependencies — the infrastructure manages itself, but
must exist before it can do so. The key challenge is that nearly every service
needs TLS certificates (from Metacrypt) and authentication (from MCIAS), but
those services themselves need to be running first.

During bootstrap, all services run as **systemd units** on a single bootstrap
node. MCP takes over lifecycle management as the final step.

### Prerequisites
Before any service starts, the operator needs:

- **The bootstrap node** — a machine (VPS, homelab server, etc.) with the
  overlay network configured and reachable.
- **Seed PKI** — MCIAS and Metacrypt need TLS certs to start, but Metacrypt
  isn't running yet to issue them. The root CA is generated manually using
  `github.com/kisom/cert` and stored in the `ca/` directory in the workspace.
  Initial service certificates are issued from this root. The root CA is then
  imported into Metacrypt once it's running, so Metacrypt becomes the
  authoritative CA for the platform going forward.
- **TOML config files** — each service needs its config in `/srv/<service>/`.
  During bootstrap these are written manually. Later, MCP handles config
  distribution.

### Startup Order
```
Phase 0: Seed PKI
  Operator creates or obtains initial TLS certificates for MCIAS
  and Metacrypt. Places them in /srv/mcias/certs/ and
  /srv/metacrypt/certs/.

Phase 1: Identity
  ┌──────────────────────────────────────────────────────┐
  │ MCIAS starts (systemd)                               │
  │ - No dependencies on other Metacircular services     │
  │ - Uses seed TLS certificates                         │
  │ - Operator creates initial admin account             │
  │ - Operator creates system accounts for other services│
  └──────────────────────────────────────────────────────┘

Phase 2: Cryptographic Services
  ┌──────────────────────────────────────────────────────┐
  │ Metacrypt starts (systemd)                           │
  │ - Authenticates against MCIAS                        │
  │ - Uses seed TLS certificates initially               │
  │ - Operator initializes and unseals                   │
  │ - Operator creates CA engine, imports root CA from   │
  │   ca/, creates issuers                               │
  │ - Can now issue certificates for all other services  │
  │ - Reissue MCIAS and Metacrypt certs from own CA      │
  │   (replace seed certs with Metacrypt-issued certs)   │
  └──────────────────────────────────────────────────────┘

Phase 3: Ingress
  ┌──────────────────────────────────────────────────────┐
  │ MC-Proxy starts (systemd)                            │
  │ - Static route table from TOML config                │
  │ - Routes external traffic to MCIAS, Metacrypt        │
  │ - No MCIAS auth (pre-auth infrastructure)            │
  │ - TLS certs for L7 routes from Metacrypt             │
  └──────────────────────────────────────────────────────┘

Phase 4: Container Registry
  ┌──────────────────────────────────────────────────────┐
  │ MCR starts (systemd)                                 │
  │ - Authenticates against MCIAS                        │
  │ - TLS certificates from Metacrypt                    │
  │ - Operator pushes container images for all services  │
  │   (including MCIAS, Metacrypt, MC-Proxy themselves)  │
  └──────────────────────────────────────────────────────┘

Phase 5: DNS
  ┌──────────────────────────────────────────────────────┐
  │ MCNS starts (systemd)                                │
  │ - Authenticates against MCIAS                        │
  │ - Operator configures initial DNS records            │
  │   (node addresses, service names)                    │
  └──────────────────────────────────────────────────────┘

Phase 6: Control Plane
  ┌──────────────────────────────────────────────────────┐
  │ MCP Agent starts on bootstrap node (systemd)         │
  │ MCP Master starts on operator workstation            │
  │ - Authenticates against MCIAS                        │
  │ - Master registers the bootstrap node                │
  │ - Master imports running services into its registry  │
  │ - From here, MCP owns the service map                │
  │ - Services can be redeployed as MCP-managed          │
  │   containers (replacing the systemd units)           │
  └──────────────────────────────────────────────────────┘
```

### The Seed Certificate Problem
The circular dependency between MCIAS, Metacrypt, and TLS is resolved by
bootstrapping with a **manually generated root CA**:

1. The operator generates a root CA using `github.com/kisom/cert`. This root
   and initial service certificates live in the `ca/` directory.
2. MCIAS and Metacrypt start with certificates issued from this external root.
3. Metacrypt comes up. The operator imports the root CA into Metacrypt's CA
   engine, making Metacrypt the authoritative issuer under the same root.
4. Metacrypt can now issue and renew certificates for all services. The `ca/`
   directory remains as the offline backup of the root material.

This is a one-time process. The root CA is generated once, imported once, and
from that point forward Metacrypt is the sole CA. MCP handles certificate
provisioning for all services.

### Adding a New Node

Once the platform is bootstrapped, adding a node is straightforward:

1. Provision the machine and connect it to the overlay network.
2. Install the MCP agent binary.
3. Configure the agent with the MCP Master address and MCIAS credentials
   (system account for the node).
4. Start the agent. It authenticates with MCIAS, connects to Master, and
   reports as available.
5. The operator deploys workloads to it via MCP. MCP handles image pulls,
   config transfer, certificate provisioning, and DNS updates.

### Disaster Recovery

If the bootstrap node is lost, recovery follows the same sequence as initial
bootstrap — but with data restored from backups:

1. Start MCIAS on a new node, restore its database from the most recent
   `VACUUM INTO` snapshot.
2. Start Metacrypt, restore its database. Unseal with the original password.
   The entire key hierarchy and all issued certificates are recovered.
3. Bring up the remaining services in order, restoring their databases.
4. Start MCP, which rebuilds its registry from the running services.
5. Update DNS (MCNS or external) to point to the new node.

Every service's `snapshot` CLI command and daily backup timer exist specifically
to make this recovery possible. The `/srv/<service>/` convention means each
service's entire state is a single directory to back up and restore.

---
## Certificate Lifecycle

Every service in the platform requires TLS certificates, and Metacrypt is the
CA that issues them. This section describes how certificates flow from
Metacrypt to services, how they are renewed, and how the pieces fit together.

### PKI Structure

Metacrypt implements a **two-tier PKI**:

```
Root CA (self-signed, generated at engine initialization)
├── Issuer "infra"    (intermediate CA for infrastructure services)
├── Issuer "services" (intermediate CA for application services)
└── Issuer "clients"  (intermediate CA for client certificates)
```

The root CA signs intermediate CAs ("issuers"), which in turn sign leaf
certificates. Each issuer is scoped to a purpose. The root CA certificate is
the trust anchor — services and clients need it (or the relevant issuer chain)
to verify certificates presented by other services.

### ACME Protocol
### ACME Protocol

Metacrypt implements an **ACME server** (RFC 8555) with External Account
Binding (EAB). This is the same protocol used by Let's Encrypt, meaning any
standard ACME client can obtain certificates from Metacrypt.

The ACME flow:

1. Client authenticates with MCIAS and requests EAB credentials from Metacrypt.
2. Client registers an ACME account using the EAB credentials.
3. Client places a certificate order (one or more domain names).
4. Metacrypt creates authorization challenges (HTTP-01 and DNS-01 supported).
5. Client fulfills the challenge (places a file for HTTP-01, or a DNS TXT
   record for DNS-01).
6. Metacrypt validates the challenge and issues the certificate.
7. Client downloads the certificate chain and private key.

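Because Metacrypt speaks standard ACME with EAB, an off-the-shelf client can exercise this flow. A sketch using lego, where the server URL, domain, and EAB values are placeholders:

```
$ lego --server https://metacrypt.svc.mcp.metacircular.net/acme/directory \
       --eab --kid "<eab-key-id>" --hmac "<eab-hmac-key>" \
       --email ops@metacircular.net \
       --domains alpha.svc.mcp.metacircular.net \
       --http run
```

The EAB key ID and HMAC key would come from step 1 of the flow above.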
A **Go client library** (`metacrypt/clients/go`) wraps this entire flow:
MCIAS login, EAB fetch, account registration, challenge fulfillment, and
certificate download. Services that integrate this library can obtain and
renew certificates programmatically.

### How Services Get Certificates Today

Currently, certificates are provisioned through Metacrypt's **REST API or web
UI** and placed into each service's `/srv/<service>/certs/` directory. This is
a manual process — the operator issues a certificate, downloads it, and
deploys the files. The ACME client library exists but is not yet integrated
into any service.

### How It Will Work With MCP
MCP is the natural place to automate certificate provisioning:

- **Initial deploy.** When MCP deploys a new service, it can provision a
  certificate from Metacrypt (via the ACME client library or the REST API),
  transfer the cert and key to the node as part of the config push to
  `/srv/<service>/certs/`, and start the service with valid TLS material.

- **Renewal.** MCP knows what services are running and when their certificates
  expire. It can renew certificates before expiry by re-running the ACME flow
  (or calling Metacrypt's `renew` operation) and pushing updated files to the
  node. The service restarts with the new certificate.

- **Migration.** When MCP migrates a service, the certificate in
  `/srv/<service>/certs/` moves with the tar.zst snapshot. If the service's
  hostname changes (new node, new DNS name), MCP provisions a new certificate
  for the new name.

- **MC-Proxy L7 routes.** MC-Proxy's L7 mode requires certificate/key pairs
  for TLS termination. MCP (or the operator) can provision these from
  Metacrypt and push them to MC-Proxy's cert directory. MC-Proxy's
  architecture doc lists ACME integration and Metacrypt key storage as future
  work.

### Trust Distribution

Every service and client that validates TLS certificates needs the root CA
certificate (or the relevant issuer chain). Metacrypt serves these publicly
without authentication:

- `GET /v1/pki/{mount}/ca` — root CA certificate (PEM)
- `GET /v1/pki/{mount}/ca/chain` — full chain: issuer + root (PEM)
- `GET /v1/pki/{mount}/issuer/{name}` — specific issuer certificate (PEM)

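On the consuming side, a service turns that PEM into a trust anchor using the Go standard library. The sketch below is self-contained: the self-signed CA stands in for the PEM a client would fetch from the `/ca` endpoint or read from its configured `ca_cert` path.

```go
// Sketch: loading the platform root CA into an x509.CertPool, as a service
// would before dialing MCIAS or another service over TLS.
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"time"
)

// poolFromPEM builds a trust anchor pool from a PEM-encoded CA certificate.
func poolFromPEM(caPEM []byte) (*x509.CertPool, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, fmt.Errorf("no certificates found in PEM input")
	}
	return pool, nil
}

// selfSignedCA generates a throwaway CA so the example needs no network.
func selfSignedCA() ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "Metacircular Root CA (example)"},
		NotBefore:             time.Now(),
		NotAfter:              time.Now().Add(24 * time.Hour),
		IsCA:                  true,
		KeyUsage:              x509.KeyUsageCertSign,
		BasicConstraintsValid: true,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

func main() {
	caPEM, err := selfSignedCA()
	if err != nil {
		panic(err)
	}
	if _, err := poolFromPEM(caPEM); err != nil {
		panic(err)
	}
	fmt.Println("root CA loaded into trust pool")
}
```

The resulting pool would be set as `RootCAs` in a `tls.Config` when dialing other services.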
During bootstrap, the root CA cert is distributed manually (or via the `ca/`
directory in the workspace). Once MCP is running, it can distribute the CA
cert as part of service deployment. Services reference the CA cert path in
their `[mcias]` config section (`ca_cert`) to verify connections to MCIAS and
other services.

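A sketch of what such a section might look like; only `server_url` and `ca_cert` are named in this document, and the paths shown are illustrative:

```toml
[mcias]
# Stable internal DNS name; survives migration of the MCIAS service.
server_url = "https://mcias.svc.mcp.metacircular.net"
# Platform root CA used to verify the TLS connection to MCIAS.
ca_cert = "/srv/alpha/certs/ca.pem"
```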
---

## End-to-End Deploy Workflow
This traces a deployment from code change to running service, showing how every
component participates. The example deploys a new version of service α that is
already running on Node B.

### 1. Build and Push

The operator builds a new container image and pushes it to MCR:

```
Operator workstation (vade)
  $ docker build -t mcr.metacircular.net/α:v1.2.0 .
  $ docker push mcr.metacircular.net/α:v1.2.0
      │
      ▼
MC-Proxy (edge) ──overlay──→ MC-Proxy (origin) ──→ MCR
                                                    │
                                              Authenticates
                                              via MCIAS
                                                    │
                                              Policy check:
                                              can this user
                                              push to α?
                                                    │
                                              Image stored
                                              (blobs + manifest)
```

The `docker push` goes through MC-Proxy (SNI routing to MCR), authenticates
via the OCI token flow (which delegates to MCIAS), and is checked against
MCR's push policy. The image is stored content-addressed in MCR.

### 2. Deploy

The operator tells MCP to deploy:

```
Operator workstation (vade)
  $ mcp deploy α              # or: mcp deploy α --image v1.2.0
      │
MCP Master
      │
      ├── Registry lookup: α is running on Node B
      │
      ├── C2 (gRPC over overlay) to Node B agent:
      │     "pull mcr.metacircular.net/α:v1.2.0 and restart"
      │
      ▼
MCP Agent (Node B)
      │
      ├── Pull image from MCR
      │     (authenticates via MCIAS, same OCI flow)
      │
      ├── Stop running container
      │
      ├── Start new container from updated image
      │     - Mounts /srv/α/ (config, database, certs all persist)
      │     - Service starts, authenticates to MCIAS, resumes operation
      │
      └── Report status back to Master
```

Since α is already running on Node B, this is an in-place update. The
`/srv/α/` directory is untouched — config, database, and certificates persist
across the container restart.

### 3. First-Time Deploy

If α has never been deployed, MCP does more work:

```
Operator workstation (vade)
  $ mcp deploy α --config α.toml
      │
MCP Master
      │
      ├── Registry lookup: α is not running anywhere
      │
      ├── Scheduling: select Node C (best fit)
      │
      ├── Provision TLS certificate from Metacrypt
      │     (ACME flow or REST API)
      │
      ├── C2 to Node C agent:
      │     1. Create /srv/α/ directory structure
      │     2. Transfer config file (α.toml → /srv/α/α.toml)
      │     3. Transfer TLS cert+key → /srv/α/certs/
      │     4. Transfer root CA cert → /srv/α/certs/ca.pem
      │     5. Pull image from MCR
      │     6. Start container
      │
      ├── Update service registry: α → Node C
      │
      ├── Push DNS update to MCNS:
      │     α.svc.mcp.metacircular.net → Node C address
      │
      └── (Optionally) update MC-Proxy route table
            if α needs external ingress
```

### 4. Migration

Moving α from Node B to Node C:

```
Operator workstation (vade)
  $ mcp migrate α --to node-c   # or let MCP choose the destination
      │
MCP Master
      │
      ├── C2 to Node B agent:
      │     1. Stop α container
      │     2. Snapshot /srv/α/ → tar.zst archive
      │     3. Transfer tar.zst to Master (or directly to Node C)
      │
      ├── C2 to Node C agent:
      │     1. Receive tar.zst archive
      │     2. Extract to /srv/α/
      │     3. Pull container image from MCR (if not cached)
      │     4. Start container
      │     5. Report status
      │
      ├── Update service registry: α → Node C
      │
      ├── Push DNS update to MCNS:
      │     α.svc.mcp.metacircular.net → Node C address
      │
      └── (If α had external ingress) update MC-Proxy route
            or rely on DNS change
```

### What Each Component Does
| Step | MCIAS | Metacrypt | MCR | MC-Proxy | MCP | MCNS |
|------|-------|-----------|-----|----------|-----|------|
| Build/push image | Authenticates push | — | Stores image, enforces push policy | Routes traffic to MCR | — | — |
| Deploy (update) | Authenticates pull, authenticates service on start | — | Serves image to agent | Routes traffic to service | Coordinates: registry lookup, C2 to agent | — |
| Deploy (new) | Authenticates pull, authenticates service on start | Issues TLS certificate | Serves image to agent | Routes traffic to service (if external) | Coordinates: scheduling, cert provisioning, config transfer, DNS update | Updates DNS records |
| Migrate | Authenticates service on new node | Issues new cert (if hostname changes) | Serves image (if not cached) | Routes traffic to new location | Coordinates: snapshot, transfer, DNS update | Updates DNS records |
| Steady state | Validates tokens for every authenticated request | Serves CA certs publicly, renews certs | Serves image pulls | Routes all external traffic | Tracks service health, holds registry | Serves DNS queries |

---

## Future Ideas
Components and capabilities that may be worth building but have no immediate
timeline. Listed here to capture the thinking; none are committed.

### Observability — Log Collection and Health Monitoring

Every service already produces structured logs (`log/slog`) and exposes health
checks (gRPC `Health.Check` or REST status endpoints). What's missing is
aggregation — today, debugging a cross-service issue means SSH'ing into each
node and reading local logs.

A collector could:

- Gather structured logs from services on each node and forward them to a
  central store.
- Periodically health-check local services and report status.
- Feed health data into MCP so it can make informed decisions (restart
  unhealthy services, avoid scheduling on degraded nodes, alert the operator).

This might be a standalone service or an MCP agent capability, depending on
weight. If it's just "tail logs and hit health endpoints," it fits in the
agent. If it grows to include indexing, querying, retention policies, and
alerting rules, it's its own service.

### Object Store

The platform has structured storage (SQLite), blob storage scoped to container
images (MCR), and encrypted key-value storage (Metacrypt's barrier). It does
not have general-purpose object/blob storage.

Potential uses:

- **Centralized backups.** Service snapshots currently live on each node in
  `/srv/<service>/backups/`. A central object store gives MCP somewhere to push
  tar.zst snapshots for offsite retention.
- **Artifact storage.** Build outputs, large files, anything that doesn't fit
  in a database row.
- **Data sharing between services.** Files that need to move between services
  outside the MCP C2 channel.

Prior art: [Nebula](https://metacircular.net/pages/nebula.html), a
content-addressable data store with capability-based security (SHA-256
addressed blobs, UUID entries for versioning, proxy references for revocable
access). Prototyped in multiple languages. The capability model is interesting
but may be more sophistication than the platform needs — a simpler
authenticated blob store with MCIAS integration might suffice.

### Overlay Network Management

The platform currently relies on an external overlay network (WireGuard,
Tailscale, or similar) for node-to-node connectivity. A self-hosted WireGuard
mesh manager would bring the overlay under Metacircular's control:

- Automate key exchange and peer configuration when MCP adds a node.
- Manage IP allocation within the mesh (potentially absorbing part of MCNS's
  scope).
- Remove the dependency on Tailscale's coordination servers.

This is a natural extension of the sovereignty principle but is low priority
while the mesh is small enough to manage by hand.

### Hypervisor / Isolation

A deeper exploration of environment isolation, message-passing between
services, and access mediation at a level below containers. Prior art:
[hypervisor concept](https://metacircular.net/pages/hypervisor.html). The
current platform achieves these goals through containers + MCIAS + policy
engines. A hypervisor layer would push isolation down to the OS level —
interesting for security but significant in scope. More relevant if the
platform ever moves beyond containers to VM-based workloads.

### Prior Art: SYSGOV

[SYSGOV](https://metacircular.net/pages/lisp-dcos.html) was an earlier
exploration of system management in Lisp, with SYSPLAN (desired state
enforcement) and SYSMON (service management). Many of its research questions —
C2 communication, service discovery, secure config distribution, failure
handling — are directly addressed by MCP's design. MCP is the spiritual
successor, reimplemented in Go with the benefit of the Metacircular platform
underneath it.

# Metacircular Dynamics — Engineering Standards
Source: https://metacircular.net/roam/20260314210051-metacircular_dynamics.html

This document describes the standard repository layout, tooling, and software
development lifecycle (SDLC) for services built at Metacircular Dynamics. It
incorporates the platform-wide project guidelines and codifies the conventions
established in Metacrypt as the baseline for all services.

## Platform Rules

These four rules apply to every Metacircular service:

1. **Data Storage**: All service data goes in `/srv/<service>/` to enable
   straightforward migration across systems.
2. **Deployment Architecture**: Services require systemd unit files but
   prioritize container-first design to support deployment via the
   Metacircular Control Plane (MCP).
3. **Identity Management**: Services must integrate with MCIAS (Metacircular
   Identity and Access Service) for user management and access control. Three
   role levels: `admin` (full administrative access), `user` (full
   non-administrative access), `guest` (service-dependent restrictions).
4. **API Design**: Services expose both gRPC and REST interfaces, kept in
   sync. Web UIs are built with htmx.

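The three role levels in rule 3 can be read as a strict hierarchy (admin includes user access, user includes guest access). That ordering is an interpretation of the descriptions above, not something MCIAS's API is documented to expose; the sketch below only illustrates the idea.

```go
// Sketch of the three platform role levels as an ordered hierarchy.
// The Role type and RoleAtLeast helper are illustrative, not MCIAS's API.
package main

import "fmt"

type Role int

const (
	Guest Role = iota // service-dependent restrictions
	User              // full non-administrative access
	Admin             // full administrative access
)

// RoleAtLeast reports whether the role held grants at least the access
// level required.
func RoleAtLeast(have, want Role) bool { return have >= want }

func main() {
	fmt.Println(RoleAtLeast(Admin, User)) // true: admin includes user access
	fmt.Println(RoleAtLeast(Guest, User)) // false: guests are restricted
}
```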
## Table of Contents

0. [Platform Rules](#platform-rules)
1. [Repository Layout](#repository-layout)
2. [Language & Toolchain](#language--toolchain)
3. [Build System](#build-system)
4. [API Design](#api-design)
5. [Authentication & Authorization](#authentication--authorization)
6. [Database Conventions](#database-conventions)
7. [Configuration](#configuration)
8. [Web UI](#web-ui)
9. [Testing](#testing)
10. [Linting & Static Analysis](#linting--static-analysis)
11. [Deployment](#deployment)
12. [Documentation](#documentation)
13. [Security](#security)
14. [Development Workflow](#development-workflow)

---

## Repository Layout

Every service follows a consistent directory structure. Adjust the
service-specific directories (e.g. `engines/` in Metacrypt) as appropriate,
but the top-level skeleton is fixed.

```
.
├── cmd/
│   ├── <service>/             CLI entry point (server, subcommands)
│   └── <service>-web/         Web UI entry point (if separate binary)
├── internal/
│   ├── auth/                  MCIAS integration (token validation, caching)
│   ├── config/                TOML configuration loading & validation
│   ├── db/                    Database setup, schema migrations
│   ├── server/                REST API server, routes, middleware
│   ├── grpcserver/            gRPC server, interceptors, service handlers
│   ├── webserver/             Web UI server, template routes, HTMX handlers
│   └── <domain>/              Service-specific packages
├── proto/<service>/
│   └── v<N>/                  Current proto definitions (start at v1;
│                              increment only on breaking changes)
├── gen/<service>/
│   └── v<N>/                  Generated Go gRPC/protobuf code
├── web/
│   ├── embed.go               //go:embed directive for templates and static
│   ├── templates/             Go HTML templates
│   └── static/                CSS, JS (htmx)
├── deploy/
│   ├── docker/                Docker Compose configuration
│   ├── examples/              Example config files
│   ├── scripts/               Install, backup, migration scripts
│   └── systemd/               systemd unit files and timers
├── docs/                      Internal engineering documentation
├── Dockerfile.api             API server container (if split binary)
├── Dockerfile.web             Web UI container (if split binary)
├── Makefile
├── buf.yaml                   Protobuf linting & breaking-change config
├── .golangci.yaml             Linter configuration
├── .gitignore
├── CLAUDE.md                  AI-assisted development instructions
├── ARCHITECTURE.md            Full system specification
└── <service>.toml.example     Example configuration
```

### Key Principles

- **`cmd/`** contains only CLI wiring (cobra commands, flag parsing). No
  business logic.
- **`internal/`** contains all service logic. Nothing in `internal/` is
  importable by other modules — this is enforced by Go's module system.
- **`proto/`** is the source of truth for gRPC definitions. Generated code
  lives in `gen/`, never edited by hand. Versions start at `v1`; a new
  version directory is only created when a breaking change is required — not
  as a naming convention or initial setup step.
- **`deploy/`** contains everything needed to run the service in production.
  A new engineer should be able to deploy from this directory alone.
- **`web/`** is embedded into the binary via `//go:embed`. No external file
  dependencies at runtime.

### What Does Not Belong in the Repository

- Runtime data (databases, certificates, logs) — these live in `/srv/<service>`
- Real configuration files with secrets — only examples are committed
- IDE configuration (`.idea/`, `.vscode/`) — per-developer, not shared
- Vendored dependencies — the Go module proxy handles this

---

## Language & Toolchain

| Tool | Version | Purpose |
|------|---------|---------|
| Go | 1.25+ | Primary language |
| protoc + protoc-gen-go | Latest | Protobuf/gRPC code generation |
| buf | Latest | Proto linting and breaking-change detection |
| golangci-lint | v2 | Static analysis and linting |
| Docker | Latest | Container builds |

### Go Conventions

- **Pure-Go dependencies** where possible. Avoid CGo — it complicates
  cross-compilation and container builds. Use `modernc.org/sqlite` instead
  of `mattn/go-sqlite3`.
- **`CGO_ENABLED=0`** for all production builds. Statically linked binaries
  deploy cleanly to Alpine containers.
- **Stripped binaries**: build with `-trimpath -ldflags="-s -w"` to remove
  debug symbols and reduce image size.
- **Version injection**: pass `git describe --tags --always --dirty` via
  `-X main.version=...` at build time. Every binary must report its version.
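On the Go side, version injection only needs a package-level string that the linker overwrites. A minimal sketch (the `metacrypt` prefix and `versionString` helper are illustrative, not a prescribed API):

```go
package main

import "fmt"

// version defaults to "dev"; release builds overwrite it at link time:
//
//	go build -trimpath -ldflags="-s -w -X main.version=$(git describe --tags --always --dirty)"
var version = "dev"

// versionString is what a `version` subcommand or startup log would print.
func versionString() string {
	return fmt.Sprintf("metacrypt %s", version)
}

func main() {
	fmt.Println(versionString())
}
```

Because `-X` only sets uninitialized-at-link-time string variables in the named package, the variable must be a plain `var` of type `string` in `package main` (or wherever the flag points).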

### Module Path

Services hosted on `git.wntrmute.dev` use:

```
git.wntrmute.dev/kyle/<service>
```

---

## Build System

Every repository has a Makefile with these standard targets:

```makefile
.PHONY: build test vet lint proto proto-lint clean docker all

LDFLAGS := -trimpath -ldflags="-s -w -X main.version=$(shell git describe --tags --always --dirty)"

<service>:
	go build $(LDFLAGS) -o <service> ./cmd/<service>

build:
	go build ./...

test:
	go test ./...

vet:
	go vet ./...

lint:
	golangci-lint run ./...

proto:
	protoc --go_out=. --go_opt=module=<module> \
		--go-grpc_out=. --go-grpc_opt=module=<module> \
		proto/<service>/v<N>/*.proto

proto-lint:
	buf lint
	buf breaking --against '.git#branch=master,subdir=proto'

clean:
	rm -f <service>

docker:
	docker build -t <service> -f Dockerfile.api .

all: vet lint test <service>
```

### Target Semantics

| Target | When to Run | CI Gate? |
|--------|-------------|----------|
| `vet` | Every change | Yes |
| `lint` | Every change | Yes |
| `test` | Every change | Yes |
| `proto-lint` | Any proto change | Yes |
| `proto` | After editing `.proto` files | No (manual) |
| `all` | Pre-push verification | Yes |

The `all` target is the CI pipeline: `vet → lint → test → build`. If any
step fails, the pipeline stops.

---

## API Design

Services expose two synchronized API surfaces:

### gRPC (Primary)

- Proto definitions live in `proto/<service>/v<N>/`, where N starts at 1.
- **Versioning policy**: proto packages are versioned to protect existing
  clients from breaking changes. A new version directory (`v2/`, `v3/`, …)
  is only introduced when a breaking change is unavoidable. Non-breaking
  additions (new fields, new RPCs) are made in place in the current version.
- Use strongly typed, per-operation RPCs. Avoid generic "execute" patterns.
- Use `google.protobuf.Timestamp` for all time fields (not RFC 3339 strings).
- Run `buf lint` and `buf breaking` against master before merging proto
  changes.
### REST (Secondary)

- JSON over HTTPS. Routes live in `internal/server/routes.go`.
- Use `chi` for routing (lightweight, stdlib-compatible).
- Standard error format: `{"error": "description"}`.
- Standard HTTP status codes: `401` (unauthenticated), `403` (unauthorized),
  `412` (precondition failed), `503` (service unavailable).

### API Sync Rule

**Every REST endpoint must have a corresponding gRPC RPC, and vice versa.**
When adding, removing, or changing an endpoint in either surface, the other
must be updated in the same change. This is enforced in code review.

### gRPC Interceptors

Access control is enforced via interceptor maps, not per-handler checks:

| Map | Effect |
|-----|--------|
| `sealRequiredMethods` | Returns `UNAVAILABLE` if the service is sealed/locked |
| `authRequiredMethods` | Validates the MCIAS bearer token, populates caller info |
| `adminRequiredMethods` | Requires the admin role on the caller |

Adding a new RPC means adding it to the correct interceptor maps. Forgetting
this is a security defect.
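Stripped of the gRPC plumbing, the interceptor chain reduces to map lookups keyed by full method name. A sketch of that shape — the method names are hypothetical, the real maps are consulted from unary/stream interceptors, and real code returns `status.Error` codes rather than plain errors:

```go
package main

import (
	"errors"
	"fmt"
)

// Method sets, keyed by full gRPC method name.
var (
	sealRequiredMethods  = map[string]bool{"/metacrypt.v1.KV/Get": true}
	authRequiredMethods  = map[string]bool{"/metacrypt.v1.KV/Get": true, "/metacrypt.v1.Policy/Put": true}
	adminRequiredMethods = map[string]bool{"/metacrypt.v1.Policy/Put": true}
)

var (
	errUnavailable      = errors.New("UNAVAILABLE: service is sealed")
	errUnauthenticated  = errors.New("UNAUTHENTICATED: missing or invalid token")
	errPermissionDenied = errors.New("PERMISSION_DENIED: admin role required")
)

// checkAccess mirrors what the interceptor chain does for one call:
// seal state first, then authentication, then the admin requirement.
func checkAccess(method string, sealed, authed, admin bool) error {
	if sealRequiredMethods[method] && sealed {
		return errUnavailable
	}
	if authRequiredMethods[method] && !authed {
		return errUnauthenticated
	}
	if adminRequiredMethods[method] && !admin {
		return errPermissionDenied
	}
	return nil
}

func main() {
	fmt.Println(checkAccess("/metacrypt.v1.KV/Get", false, true, false))
}
```

The point of the map-based design is that the checks are data, not scattered code: a review of the three maps against the proto file is a complete audit of which RPCs are protected.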

---

## Authentication & Authorization

### Authentication

All services delegate authentication to **MCIAS** (Metacircular Identity and
Access Service). No service maintains its own user database.

- The client sends credentials to the service's `/v1/auth/login` endpoint.
- The service forwards them to MCIAS via the client library
  (`git.wntrmute.dev/kyle/mcias/clients/go`).
- On success, MCIAS returns a bearer token. The service returns it to the
  client and optionally sets it as a cookie for the web UI.
- Subsequent requests include the token via the `Authorization: Bearer <token>`
  header or a cookie.
- Token validation calls MCIAS `ValidateToken()`. Results should be cached
  (keyed by SHA-256 of the token) with a short TTL (30 seconds or less).

### Authorization

Three role levels:

| Role | Meaning |
|------|---------|
| `admin` | Full access to everything. Policy bypass. |
| `user` | Access governed by policy rules. Default deny. |
| `guest` | Service-dependent restrictions. Default deny. |

Admin detection is based solely on the MCIAS `admin` role. The service never
promotes users locally.

Services that need fine-grained access control should implement a policy
engine (priority-based ACL rules stored in encrypted storage, default deny,
admin bypass). See Metacrypt's implementation as the reference.

---

## Database Conventions

### SQLite

SQLite is the default database for Metacircular services. It is simple to
operate, requires no external processes, and backs up cleanly with
`VACUUM INTO`.

Connection settings (applied at open time):

```sql
PRAGMA journal_mode = WAL;
PRAGMA foreign_keys = ON;
PRAGMA busy_timeout = 5000;
```

File permissions: `0600`. Created by the service on first run.

### Migrations

- Migrations are Go functions registered in `internal/db/` and run
  sequentially at startup.
- Each migration is idempotent where the dialect allows it — e.g.
  `CREATE TABLE IF NOT EXISTS`. Note that SQLite's
  `ALTER TABLE ... ADD COLUMN` has no `IF NOT EXISTS` clause, so column
  additions rely on the migration tracking table (or a `PRAGMA table_info`
  check) to run exactly once.
- Applied migrations are tracked in a `schema_migrations` table.
- Never modify a migration that has been deployed. Add a new one.
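The runner behind these rules is small. A sketch of the pattern — illustrated with an `execer` interface and an in-memory fake so the block is self-contained; real code runs against `*sql.DB` and persists applied ids in `schema_migrations`:

```go
package main

import "fmt"

// execer is the slice of *sql.DB this sketch needs.
type execer interface {
	Exec(query string) error
}

type migration struct {
	id   int
	stmt string
}

// migrate applies pending migrations in registration order, recording
// each applied id. The applied map stands in for schema_migrations.
func migrate(db execer, applied map[int]bool, migs []migration) error {
	for _, m := range migs {
		if applied[m.id] {
			continue // already deployed; never re-run or modify it
		}
		if err := db.Exec(m.stmt); err != nil {
			return fmt.Errorf("migration %d: %w", m.id, err)
		}
		applied[m.id] = true
	}
	return nil
}

// fakeDB records statements so the sketch runs without a SQLite driver.
type fakeDB struct{ stmts []string }

func (f *fakeDB) Exec(q string) error { f.stmts = append(f.stmts, q); return nil }

func main() {
	db := &fakeDB{}
	applied := map[int]bool{1: true} // migration 1 already deployed
	migs := []migration{
		{1, "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY)"},
		{2, "ALTER TABLE users ADD COLUMN email TEXT"}, // tracking makes this run once
	}
	if err := migrate(db, applied, migs); err != nil {
		panic(err)
	}
	fmt.Println(len(db.stmts)) // only the pending migration ran
}
```

Running sequentially and skipping applied ids is what makes "add a new migration, never edit an old one" safe: the tracking table, not the DDL text, is the source of truth for what has run.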

### Backup

Every service must provide a `snapshot` CLI command that creates a consistent
backup using `VACUUM INTO`. Automated backups run via a systemd timer
(daily, with retention pruning).

---

## Configuration

### Format

TOML, parsed with `go-toml/v2`. Environment variable overrides use the
`SERVICENAME_*` convention (e.g. `METACRYPT_SERVER_LISTEN_ADDR`).

### Standard Sections

```toml
[server]
listen_addr = ":8443"            # HTTPS API
grpc_addr = ":9443"              # gRPC (optional; disabled if unset)
tls_cert = "/srv/<service>/certs/cert.pem"
tls_key = "/srv/<service>/certs/key.pem"

[web]
listen_addr = "127.0.0.1:8080"   # Web UI (optional; disabled if unset)
vault_grpc = "127.0.0.1:9443"    # gRPC address of the API server
vault_ca_cert = ""               # CA cert for verifying API server TLS

[database]
path = "/srv/<service>/<service>.db"

[mcias]
server_url = "https://mcias.metacircular.net:8443"
ca_cert = ""                     # Custom CA for MCIAS TLS
service_name = "<service>"       # This service's identity, as registered in MCIAS
tags = []                        # Tags sent with every login request (e.g. ["env:restricted"])
                                 # MCIAS evaluates auth:login policy against these tags,
                                 # enabling per-service login restrictions via policy rules.

[log]
level = "info"                   # debug, info, warn, error
```

#### Service context and login policy

`service_name` and `tags` in `[mcias]` are sent with every `POST /v1/auth/login`
request. MCIAS evaluates the `auth:login` action with the resource set to
`{service_name, tags}`. This allows operators to write deny rules that restrict
which roles or account types can log into specific services.

Example: deny `guest` and `viewer` human accounts from any service tagged
`env:restricted`:

```json
{
  "effect": "deny",
  "roles": ["guest", "viewer"],
  "account_types": ["human"],
  "actions": ["auth:login"],
  "required_tags": ["env:restricted"]
}
```

A service can also be targeted by name instead of (or in addition to) tags:

```json
{
  "effect": "deny",
  "roles": ["guest"],
  "actions": ["auth:login"],
  "service_names": ["meta-money-printer"]
}
```

MCIAS enforces the policy after credentials are verified; a policy-denied
login returns HTTP 403 (not 401) so the client can distinguish a bad password
from a service access restriction.

### Validation

Required fields are validated at startup. The service refuses to start if
any are missing. Do not silently default required values.

### Data Directory

All runtime data lives in `/srv/<service>/`:

```
/srv/<service>/
├── <service>.toml   Configuration
├── <service>.db     SQLite database
├── certs/           TLS certificates
└── backups/         Database snapshots
```

This convention enables straightforward service migration between hosts:
copy `/srv/<service>/` and the binary.

---

## Web UI

### Technology

- **Go `html/template`** for server-side rendering. No JavaScript frameworks.
- **htmx** for dynamic interactions (form submission, partial page updates)
  without full page reloads.
- Templates and static files are embedded in the binary via `//go:embed`.

### Structure

- `web/templates/layout.html` — shared HTML skeleton, navigation, CSS/JS
  includes. All page templates extend this.
- Page templates: one `.html` file per page/feature.
- `web/static/` — CSS, htmx. Keep this minimal.

### Architecture

The web UI runs as a separate binary (`<service>-web`) that communicates
with the API server via its gRPC interface. This separation means:

- The web UI has no direct database access.
- The API server enforces all authorization.
- The web UI can be deployed independently or omitted entirely.
### Security

- CSRF protection via signed double-submit cookies on all mutating requests
  (POST/PUT/PATCH/DELETE).
- Session cookie: `HttpOnly`, `Secure`, `SameSite=Strict`.
- All user input is escaped by `html/template` (the default).
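The core of the signed double-submit scheme fits in two functions. A sketch of the signing and verification step only (cookie issuance, per-session token generation with `crypto/rand`, and middleware wiring are omitted; the helper names are assumptions):

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign produces the cookie value for a CSRF token: the token plus an
// HMAC over it. Signing stops an attacker who can plant cookies (e.g.
// via a subdomain) from forging a matching cookie/form pair.
func sign(key []byte, token string) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(token))
	return token + "." + hex.EncodeToString(mac.Sum(nil))
}

// verify checks the token submitted with the form against the signed
// cookie value, in constant time via hmac.Equal.
func verify(key []byte, cookieValue, formToken string) bool {
	return hmac.Equal([]byte(cookieValue), []byte(sign(key, formToken)))
}

func main() {
	key := []byte("server-side secret")
	cookie := sign(key, "random-per-session-token")
	fmt.Println(verify(key, cookie, "random-per-session-token"))
	fmt.Println(verify(key, cookie, "attacker-guess"))
}
```

The middleware applies `verify` to every POST/PUT/PATCH/DELETE before the handler runs; GET requests never need the check because they must not mutate state.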

---

## Testing

### Philosophy

Tests are written using the Go standard library `testing` package. No test
frameworks (testify, gomega, etc.) — the standard library is sufficient and
keeps dependencies minimal.

### Patterns

```go
func TestFeatureName(t *testing.T) {
	// Setup: use t.TempDir() for isolated file system state.
	dir := t.TempDir()
	database, err := db.Open(filepath.Join(dir, "test.db"))
	if err != nil {
		t.Fatalf("open db: %v", err)
	}
	defer func() { _ = database.Close() }()
	if err := db.Migrate(database); err != nil {
		t.Fatalf("migrate: %v", err)
	}

	// Exercise the code under test.
	// ...

	// Assert with t.Fatal (not t.Error) for precondition failures.
	if !bytes.Equal(got, want) {
		t.Fatalf("got %q, want %q", got, want)
	}
}
```

### Guidelines

- **Use `t.TempDir()`** for all file-system state. Never write to fixed
  paths. Cleanup is automatic.
- **Use `errors.Is`** for error assertions, not string comparison.
- **No mocks for databases.** Tests use real SQLite databases created in
  temp directories. This catches migration bugs that mocks would hide.
- **Test files** live alongside the code they test: `barrier.go` and
  `barrier_test.go` in the same package.
- **Test helpers** call `t.Helper()` so failures report the caller's line.

### What to Test

| Layer | Test Strategy |
|-------|---------------|
| Crypto primitives | Roundtrip encryption/decryption, wrong-key rejection, edge cases |
| Storage (barrier, DB) | CRUD operations, sealed-state rejection, concurrent access |
| API handlers | Request/response correctness, auth enforcement, error codes |
| Policy engine | Rule matching, priority ordering, default deny, admin bypass |
| CLI commands | Flag parsing, output format (lightweight) |

---

## Linting & Static Analysis

### Configuration

Every repository includes a `.golangci.yaml` with this philosophy:
**fail loudly for security and correctness; everything else is a warning.**

### Required Linters

| Linter | Category | Purpose |
|--------|----------|---------|
| `errcheck` | Correctness | Unhandled errors are silent failures |
| `govet` | Correctness | Printf mismatches, unreachable code, suspicious constructs |
| `ineffassign` | Correctness | Dead writes hide logic bugs |
| `unused` | Correctness | Unused variables and functions |
| `errorlint` | Error handling | Proper `errors.Is`/`errors.As` usage |
| `gosec` | Security | Hardcoded secrets, weak RNG, insecure crypto, SQL injection |
| `staticcheck` | Security | Deprecated APIs, mutex misuse, deep analysis |
| `revive` | Style | Go naming conventions, error return ordering |
| `gofmt` | Formatting | Standard Go formatting |
| `goimports` | Formatting | Import grouping and ordering |

### Settings

- `errcheck`: `check-type-assertions: true` (catch `x.(*T)` without an ok check).
- `govet`: all analyzers enabled except `shadow` (too noisy for idiomatic Go).
- `gosec`: severity and confidence set to `medium`. Exclude `G104` (overlaps
  with errcheck).
- `max-issues-per-linter: 0` — report everything. No caps.
- Test files: allow `G101` (hardcoded credentials) for test fixtures.

---

## Deployment

### Container-First

Services are designed for container deployment but must also run as native
systemd services. Both paths are first-class.

### Docker

Multi-stage builds:

1. **Builder**: `golang:1.25-alpine`. Compile with `CGO_ENABLED=0`, strip
   symbols.
2. **Runtime**: `alpine:3.21`. Non-root user (`<service>`), minimal attack
   surface.

If the service has separate API and web binaries, use separate Dockerfiles
(`Dockerfile.api`, `Dockerfile.web`) and a `docker-compose.yml` that wires
them together with a shared data volume.

### systemd

Every service ships with:

| File | Purpose |
|------|---------|
| `<service>.service` | Main service unit (API server) |
| `<service>-web.service` | Web UI unit (if applicable) |
| `<service>-backup.service` | Oneshot backup unit |
| `<service>-backup.timer` | Daily backup timer (02:00 UTC, 5-minute jitter) |

#### Security Hardening

All service units must include these security directives:

```ini
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
PrivateDevices=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictSUIDSGID=true
RestrictNamespaces=true
LockPersonality=true
MemoryDenyWriteExecute=true
RestrictRealtime=true
ReadWritePaths=/srv/<service>
```

The web UI unit should use `ReadOnlyPaths=/srv/<service>` instead of
`ReadWritePaths` — it has no reason to write to the data directory.

### Install Script

`deploy/scripts/install.sh` handles:

1. Create the system user/group (idempotent).
2. Install the binary to `/usr/local/bin/`.
3. Create the `/srv/<service>/` directory structure.
4. Install the example config if none exists.
5. Install systemd units and reload the daemon.

### TLS

- **Minimum TLS version: 1.3.** No exceptions, no fallback cipher suites.
  Go's TLS 1.3 implementation manages cipher selection automatically.
- **Timeouts**: read 30s, write 30s, idle 120s.
- Certificate and key paths are required configuration — the service refuses
  to start without them.

### Graceful Shutdown

Services handle `SIGINT` and `SIGTERM`, shutting down cleanly:

1. Stop accepting new connections.
2. Drain in-flight requests (with a timeout).
3. Clean up resources (close databases, zeroize secrets if applicable).
4. Exit.
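The steps above map directly onto `signal.NotifyContext` and `http.Server.Shutdown`. A self-contained sketch that sends itself `SIGTERM` so it runs to completion (Unix-only because of `syscall.Kill`; a real service just waits for the operator's signal):

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func run() string {
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	srv := &http.Server{Addr: "127.0.0.1:0", Handler: http.NewServeMux()}
	go func() { _ = srv.ListenAndServe() }()

	// Simulate an operator stopping the service, so this sketch exits on its own.
	go func() {
		time.Sleep(50 * time.Millisecond)
		_ = syscall.Kill(os.Getpid(), syscall.SIGTERM)
	}()

	<-ctx.Done() // a signal arrived; begin shutting down

	drainCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	_ = srv.Shutdown(drainCtx) // steps 1-2: stop accepting, drain in-flight requests
	// step 3: close databases, zeroize secrets here
	return "shutdown complete" // step 4: exit
}

func main() {
	fmt.Println(run())
}
```

`Shutdown` handles steps 1 and 2 together: it closes the listeners immediately and then waits for active requests until the drain context expires.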

---

## Documentation

### Required Files

| File | Purpose | Audience |
|------|---------|----------|
| `README.md` | Project overview, quick-start, and contributor guide | Everyone |
| `CLAUDE.md` | AI-assisted development context | Claude Code |
| `ARCHITECTURE.md` | Full system specification | Engineers |
| `RUNBOOK.md` | Operational procedures and incident response | Operators |
| `deploy/examples/<service>.toml` | Example configuration | Operators |

### Suggested Files

These are not required for every project but should be created where applicable:

| File | When to Include | Purpose |
|------|-----------------|---------|
| `AUDIT.md` | Services handling cryptography, secrets, PII, or auth | Security audit findings with issue tracking and resolution status |
| `POLICY.md` | Services with fine-grained access control | Policy engine documentation: rule structure, evaluation algorithm, resource paths, action classification, common patterns |

### README.md

The README is the front door. A new engineer or user should be able to
understand what the service does and get it running from this file alone.
It should contain:

- Project name and a one-paragraph description.
- Quick-start instructions (build, configure, run).
- A link to `ARCHITECTURE.md` for full technical details.
- A link to `RUNBOOK.md` for operational procedures.
- License and contribution notes (if applicable).

Keep it concise. The README is not the spec — that's `ARCHITECTURE.md`.

### CLAUDE.md

This file provides context for AI-assisted development. It should contain:

- Project overview (one paragraph).
- Build, test, and lint commands.
- High-level architecture summary.
- Project structure with directory descriptions.
- Ignored directories (runtime data, generated code).
- Critical rules (e.g. API sync requirements).

Keep it concise. AI tools read this on every interaction.

### ARCHITECTURE.md

This is the canonical specification for the service. It should cover:

1. System overview with a layered architecture diagram.
2. Cryptographic design (if applicable): algorithms, key hierarchy.
3. State machines and lifecycle (if applicable).
4. Storage design.
5. Authentication and authorization model.
6. API surface (REST and gRPC, with tables of every endpoint).
7. Web interface routes.
8. Database schema (every table, every column).
9. Configuration reference.
10. Deployment guide.
11. Security model: threat mitigations table and security invariants.
12. Future work.

This document is the source of truth. When the code and the spec disagree,
one of them has a bug.

### RUNBOOK.md

The runbook is written for operators, not developers. It covers what to do
when things go wrong and how to perform routine maintenance. It should
contain:

1. **Service overview** — what the service does, in one paragraph.
2. **Health checks** — how to verify the service is healthy (endpoints,
   CLI commands, expected responses).
3. **Common operations** — start, stop, restart, seal/unseal, backup,
   restore, log inspection.
4. **Alerting** — what alerts exist, what they mean, and how to respond.
5. **Incident procedures** — step-by-step playbooks for known failure
   modes (database corruption, certificate expiry, MCIAS outage, disk
   full, etc.).
6. **Escalation** — when and how to escalate beyond the runbook.

Write runbook entries as numbered steps, not prose. An operator at 3 AM
should be able to follow them without thinking.

### AUDIT.md (Suggested)

For services that handle cryptography, secrets, PII, or authentication,
maintain a security audit log. Each finding gets a numbered entry with:

- A description of the issue.
- Severity (critical, high, medium, low).
- Resolution status: open, resolved (with a summary), or accepted (with a
  rationale for accepting the risk).

The priority summary table at the bottom provides a scannable overview.
Resolved and accepted items are struck through but retained for history.
See Metacrypt's `AUDIT.md` for the reference format.

### POLICY.md (Suggested)

For services with a policy engine or fine-grained access control, document
the policy model separately from the architecture spec. It should cover:

- Rule structure (fields, types, semantics).
- Evaluation algorithm (match logic, priority, default effect).
- Resource path conventions and glob patterns.
- Action classification.
- API endpoints for policy CRUD.
- Common policy patterns with examples.
- Role summary (what each MCIAS role gets by default).

This document is aimed at administrators who need to write policy rules,
not engineers who need to understand the implementation.

### Engine/Feature Design Documents

For services with a modular architecture, each module gets its own design
document (e.g. `engines/sshca.md`). These are detailed implementation plans
that include:

- Overview and core concepts.
- Data model and storage layout.
- Lifecycle (initialization, teardown).
- Operations table with auth requirements.
- API definitions (gRPC and REST).
- Implementation steps (file by file).
- Security considerations.
- References to existing code patterns to follow.

Write these before writing code. They are the blueprint, not the afterthought.

---

## Security

### General Principles

- **Default deny.** Unauthenticated requests are rejected. Unauthorized
  requests are rejected. If in doubt, deny.
- **Fail closed.** If the service cannot verify authorization, it denies the
  request. If the database is unavailable, the service is unavailable.
- **Least privilege.** Service processes run as non-root. systemd units
  restrict filesystem access, syscalls, and capabilities.
- **No local user databases.** Authentication is always delegated to MCIAS.
### Cryptographic Standards

| Purpose | Algorithm | Notes |
|---------|-----------|-------|
| Symmetric encryption | AES-256-GCM | 12-byte random nonce per operation |
| Symmetric alternative | XChaCha20-Poly1305 | For contexts needing nonce misuse resistance |
| Key derivation | Argon2id | Memory-hard; tune params to hardware |
| Asymmetric signing | Ed25519, ECDSA (P-256, P-384) | Prefer Ed25519 |
| CSPRNG | `crypto/rand` | All keys, nonces, salts, tokens |
| Constant-time comparison | `crypto/subtle` | All secret comparisons |

- **Never use RSA for new designs.** Ed25519 and ECDSA are faster, produce
  smaller keys, and have simpler security models.
- **Zeroize secrets** from memory when they are no longer needed: overwrite
  byte slices with zeros, nil out pointers.
- **Never log secrets.** Keys, passwords, tokens, and plaintext must never
  appear in log output.
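Two of those rules — constant-time comparison and zeroization — are one-liners with `crypto/subtle`, worth sketching because the naive versions (`bytes.Equal`, letting the GC collect key material) look correct but are not:

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// equalSecret compares two secrets without leaking timing information;
// bytes.Equal short-circuits on the first mismatch and must not be used.
func equalSecret(a, b []byte) bool {
	return subtle.ConstantTimeCompare(a, b) == 1
}

// zeroize overwrites key material in place once it is no longer needed,
// rather than waiting for the garbage collector.
func zeroize(b []byte) {
	for i := range b {
		b[i] = 0
	}
}

func main() {
	key := []byte("correct horse battery staple")
	fmt.Println(equalSecret(key, []byte("correct horse battery staple")))
	zeroize(key)
	fmt.Println(key[0] == 0)
}
```

Note that `subtle.ConstantTimeCompare` returns 0 immediately when lengths differ, so where the length itself is secret the inputs should be padded or hashed to fixed length first.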

### Web Security

- CSRF tokens on all mutating requests.
- `SameSite=Strict` on all cookies.
- `html/template` for automatic escaping.
- Validate all input at system boundaries.

---

## Development Workflow

### Local Development

```bash
# Build and run both servers locally:
make devserver

# Or build everything and run the full pipeline:
make all
```

The `devserver` target builds both binaries and runs them against a local
config in `srv/`. The `srv/` directory is gitignored — it holds your local
database, certificates, and configuration.

### Pre-Push Checklist

Before pushing a branch:

```bash
make all        # vet → lint → test → build
make proto-lint # if proto files changed
```

### Proto Changes

1. Edit `.proto` files in `proto/<service>/v<N>/` (the current version).
2. Run `make proto` to regenerate Go code.
3. Run `make proto-lint` to check for linting violations and breaking changes.
4. Update REST routes to match the new/changed RPCs.
5. Update gRPC interceptor maps for any new RPCs.
6. Update the `ARCHITECTURE.md` API tables.

### Adding a New Feature

1. **Design first.** Write or update the relevant design document. For a new
   engine or major subsystem, create a new doc in `docs/` or `engines/`.
2. **Implement.** Follow existing patterns — the design doc should reference
   specific files and line numbers.
3. **Test.** Write tests alongside the implementation.
4. **Update docs.** Update `ARCHITECTURE.md`, `CLAUDE.md`, and route tables.
5. **Verify.** Run `make all`.

### CLI Commands

Every service uses cobra for CLI commands. Standard subcommands:

| Command | Purpose |
|---------|---------|
| `server` | Start the service |
| `init` | First-time setup (if applicable) |
| `status` | Query a running instance's health |
| `snapshot` | Create a database backup |

Add service-specific subcommands as needed (e.g. `migrate-aad`, `unseal`).
Each command lives in its own file in `cmd/<service>/`.