Initial import.
.gitignore (vendored, new file, 12 lines)
@@ -0,0 +1,12 @@
# infrastructure / secrets
/ca

# project directories: these are separate git repos
/mcat
/mcias
/mc-proxy
/mcr
/metacrypt
/mcdsl
/mcns
CLAUDE.md (new file, 76 lines)
@@ -0,0 +1,76 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

Metacircular is a multi-service personal infrastructure platform. This root repository is a workspace container — each subdirectory is a separate Git repo (gitignored here). The authoritative platform-wide standards live in `engineering-standards.md`.

## Project Map

| Directory | Purpose | Language |
|-----------|---------|----------|
| `mcias/` | Identity and Access Service — central SSO/IAM, all other services delegate auth here | Go |
| `metacrypt/` | Cryptographic service engine — encrypted secrets, PKI/CA, SSH CA, transit encryption | Go |
| `mc-proxy/` | TLS proxy and router — L4 passthrough or L7 terminating, PROXY protocol, firewall | Go |
| `mcr/` | OCI container registry — integrated with MCIAS for auth and policy-based push/pull | Go |
| `mcat/` | MCIAS login policy tester — lightweight web app to test and audit login policies | Go |
| `mcdsl/` | Standard library — shared packages for auth, db, config, TLS servers, CSRF, snapshots | Go |
| `ca/` | PKI infrastructure and secrets for dev/test (not source code, gitignored) | — |

Each subproject has its own `CLAUDE.md`, `ARCHITECTURE.md`, `Makefile`, and `go.mod`. When working in a subproject, read its CLAUDE.md first.

## Service Dependencies

MCIAS is the root dependency — every other service authenticates through it. No service maintains its own user database. The dependency graph:

```
mcias (standalone — no MCIAS dependency)
├── metacrypt (uses MCIAS for auth)
├── mc-proxy (uses MCIAS for admin auth)
├── mcr (uses MCIAS for auth + policy)
└── mcat (tests MCIAS login policies)
```
## Standard Build Commands (all subprojects)

```bash
make all         # vet → lint → test → build (the CI pipeline)
make build       # go build ./...
make test        # go test ./...
make vet         # go vet ./...
make lint        # golangci-lint run ./...
make proto       # regenerate gRPC code from .proto files
make proto-lint  # buf lint + buf breaking
make devserver   # build and run locally against srv/ config
make docker      # build container image
make clean       # remove binaries
```

Run a single test: `go test ./internal/auth/ -run TestTokenValidation`
## Critical Rules

1. **REST/gRPC sync**: Every REST endpoint must have a corresponding gRPC RPC, updated in the same change.
2. **gRPC interceptor maps**: New RPCs must be added to `authRequiredMethods`, `adminRequiredMethods`, and/or `sealRequiredMethods`. Forgetting this is a security defect.
3. **No CGo in production**: All builds use `CGO_ENABLED=0`. Use `modernc.org/sqlite`, not `mattn/go-sqlite3`.
4. **No test frameworks**: Use stdlib `testing` only. Real SQLite in `t.TempDir()`, no mocks for databases.
5. **Default deny**: Unauthenticated and unauthorized requests are always rejected. Admin detection comes solely from the MCIAS `admin` role.
6. **Proto versioning**: Start at v1. Only create v2 for breaking changes. Non-breaking additions go in-place.
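Rule 2's interceptor maps reduce to a lookup table consulted before dispatch. A minimal sketch — the RPC names here are hypothetical, the real interceptors also consult `sealRequiredMethods`, and the real code wires this into a gRPC unary interceptor. The point is that a method absent from every map falls through to default deny (rule 5), so a forgotten registration is a visible denial rather than a silent allow:

```go
package main

import (
	"errors"
	"fmt"
)

// Methods requiring an authenticated caller. New RPCs must be registered
// here (or in adminRequiredMethods) or they are rejected outright.
var authRequiredMethods = map[string]bool{
	"/metacrypt.v1.Metacrypt/GetSecret": true, // hypothetical RPC name
}

// Methods additionally requiring the MCIAS `admin` role.
var adminRequiredMethods = map[string]bool{
	"/metacrypt.v1.Metacrypt/PutPolicy": true, // hypothetical RPC name
}

var errUnregistered = errors.New("method not in any interceptor map: default deny")

// authorize is the check a unary interceptor would run before calling
// the handler. Unregistered methods are denied even for admins.
func authorize(fullMethod string, authenticated, admin bool) error {
	switch {
	case adminRequiredMethods[fullMethod]:
		if !admin {
			return errors.New("admin role required")
		}
		return nil
	case authRequiredMethods[fullMethod]:
		if !authenticated {
			return errors.New("authentication required")
		}
		return nil
	default:
		return errUnregistered
	}
}

func main() {
	fmt.Println(authorize("/metacrypt.v1.Metacrypt/GetSecret", true, false)) // <nil>
	fmt.Println(authorize("/metacrypt.v1.Metacrypt/Unknown", true, true))    // default deny
}
```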
## Architecture Patterns

- **Seal/Unseal**: Metacrypt starts sealed and requires a password to unlock (Vault-like pattern). Key hierarchy: Password → Argon2id → KWK → MEK → per-engine DEKs.
- **Web UI separation**: Web UIs run as separate binaries communicating with the API server via gRPC. No direct DB access from the web tier.
- **Config**: TOML with env var overrides (`SERVICENAME_*`). All runtime data in `/srv/<service>/`.
- **Policy engines**: Priority-based ACL rules, default deny, admin bypass. See metacrypt's implementation as reference.
- **Auth flow**: Client → service `/v1/auth/login` → MCIAS client library → MCIAS validates → bearer token returned. Token validation cached 30s keyed by SHA-256 of token.
## Tech Stack

- Go 1.25+, chi router, cobra CLI, go-toml/v2
- SQLite via modernc.org/sqlite (pure Go), WAL mode, foreign keys on
- gRPC + protobuf, buf for linting
- htmx + Go html/template for web UIs
- golangci-lint v2 with errcheck, gosec, staticcheck, revive
- TLS 1.3 minimum, AES-256-GCM, Argon2id, Ed25519
README.md (new file, 104 lines)
@@ -0,0 +1,104 @@
# Metacircular Dynamics

Metacircular Dynamics is a self-hosted personal infrastructure platform. The
name comes from the tradition of metacircular evaluators in Lisp — a system
defined in terms of itself — by way of SICP and Common Lisp projects that
preceded this work. The infrastructure is metacircular in the same sense: the
platform manages, secures, and hosts its own services.

Every component is self-hosted, every dependency is controlled, and the entire
stack is operable by one person. No cloud providers, no third-party auth, no
external databases. The platform is designed for a small number of machines — a
personal homelab or a handful of VPSes — not for hyperscale.

All services are written in Go and follow shared
[engineering standards](engineering-standards.md). Full platform documentation
lives in [docs/metacircular.md](docs/metacircular.md).

## Components

| Component | Purpose | Status |
|-----------|---------|--------|
| **MCIAS** | Identity and access — the root of trust. SSO, token issuance, role management, login policy. Every other service delegates auth here. | Implemented |
| **Metacrypt** | Cryptographic services — PKI/CA, transit encryption, encrypted secret storage behind a seal/unseal barrier. Issues TLS certificates for the platform. | Implemented |
| **MCR** | Container registry — OCI-compliant image storage with MCIAS auth and policy-controlled push/pull. | Implemented |
| **MC-Proxy** | Node ingress — TLS proxy and router. L4 passthrough or L7 terminating (per-route), PROXY protocol, firewall with rate limiting and GeoIP. | Implemented |
| **MCNS** | Networking — DNS and address management for the platform. | Planned |
| **MCP** | Control plane — operator-driven deployment, service registry, data transfer, master/agent container lifecycle. | Planned |

Shared library: **MCDSL** — standard library for all services (auth, db,
config, TLS server, CSRF, snapshots).

Supporting tool: **MCAT** — lightweight web app for testing MCIAS login
policies.

## Architecture

```
MCIAS (standalone — the root of trust)
├── Metacrypt (auth via MCIAS; provides certs to all services)
├── MCR (auth via MCIAS; stores images pulled by MCP)
├── MCNS (auth via MCIAS; provides DNS for the platform)
├── MCP (auth via MCIAS; orchestrates everything; owns service registry)
└── MC-Proxy (pre-auth; routes traffic to services behind it)
```

Each machine is an **MC Node**. On every node, **MC-Proxy** accepts outside
connections and routes by TLS SNI — either relaying raw TCP (L4) or
terminating TLS and reverse proxying HTTP/2 (L7), per-route. **MCP Agent** on
each node receives commands from **MCP Master** (which runs on the operator's
workstation) and manages containers via the local runtime. Core infrastructure
(MCIAS, Metacrypt, MCR) runs on nodes like any other workload.

```
┌──────────────────┐       ┌──────────────┐
│    Core Infra    │       │  MCP Master  │
│   (e.g. MCIAS)   │       │              │
└─────────┬────────┘       └──────┬───────┘
          │                       │ C2
Outside  ┌▼───────────────────────▼───────────────────┐
Client ─▶│                  MC Node                    │
         │                                             │
         │ ┌──────────┐                                │
         │ │ MC-Proxy │──┬──────┬──────┐               │
         │ └──────────┘  │      │      │               │
         │           ┌───▼┐  ┌──▼─┐  ┌─▼──┐ ┌─────┐    │
         │           │ α  │  │ β  │  │ γ  │ │ MCP │    │
         │           └────┘  └────┘  └────┘ │Agent│    │
         │                                  └──┬──┘    │
         │                                ┌────▼────┐  │
         │                                │Container│  │
         │                                │ Runtime │  │
         │                                └─────────┘  │
         └─────────────────────────────────────────────┘
```

## Design Principles

- **Sovereignty** — self-hosted end to end; no SaaS dependencies
- **Simplicity** — SQLite over Postgres, stdlib testing, pure Go, htmx, single binaries
- **Consistency** — every service follows identical patterns (layout, config, auth, deployment)
- **Security as structure** — default deny, TLS 1.3 minimum, interceptor-map auth, encrypted-at-rest secrets
- **Design before code** — ARCHITECTURE.md is the spec, written before implementation

## Tech Stack

Go 1.25+, SQLite (modernc.org/sqlite), chi router, gRPC + protobuf, htmx +
Go html/template, golangci-lint v2, Ed25519/Argon2id/AES-256-GCM, TLS 1.3,
container-first deployment (Docker + systemd).

## Repository Structure

This root repository is a workspace container. Each subdirectory is a separate
Git repo with its own `CLAUDE.md`, `ARCHITECTURE.md`, `Makefile`, and `go.mod`:

```
metacircular/
├── mcias/       Identity and Access Service
├── metacrypt/   Cryptographic service engine
├── mcr/         Container registry
├── mc-proxy/    TLS proxy and router
├── mcat/        Login policy tester
├── mcdsl/       Standard library (shared packages)
├── ca/          PKI infrastructure (dev/test, not source code)
└── docs/        Platform-wide documentation
```
docs/metacircular.md (new file, 927 lines)
@@ -0,0 +1,927 @@
# Metacircular Infrastructure

## Background

Metacircular Dynamics is a personal infrastructure platform. The name comes
from the tradition of metacircular evaluators in Lisp — a system defined in
terms of itself — by way of SICP and Common Lisp projects that preceded this
work. The infrastructure is metacircular in the same sense: the platform
manages, secures, and hosts its own services.

The goal is sovereign infrastructure. Every component is self-hosted, every
dependency is controlled, and the entire stack is operable by one person. There
are no cloud provider dependencies, no third-party auth providers, no external
databases. When a Metacircular node boots, it connects to Metacircular services
for identity, certificates, container images, and workload scheduling.

All services are written in Go and follow a shared set of engineering standards
(see `engineering-standards.md`). The platform is designed for a small number of
machines — a personal homelab or a handful of VPSes — not for hyperscale.

## Philosophy

**Sovereignty.** You own the whole stack. Identity, certificates, secrets,
container images, DNS, networking — all self-hosted. No SaaS dependency means
no vendor lock-in, no surprise deprecations, and no trust delegation to third
parties.

**Simplicity over sophistication.** SQLite over Postgres. Stdlib `testing` over
test frameworks. Pure Go over CGo. htmx over React. Single-binary deployments
over microservice orchestrators. The right tool is the simplest one that solves
the problem without creating a new one.

**Consistency as leverage.** Every service follows identical patterns: the same
directory layout, the same Makefile targets, the same config format, the same
auth integration, the same deployment model. Knowledge of one service transfers
instantly to all others. A new service can be stood up by copying the skeleton.

**Security as structure.** Security is not a feature bolted on after the fact.
Default deny is the starting posture. TLS 1.3 is the minimum, not a goal.
Interceptor maps make "forgot to add auth" a visible, reviewable omission
rather than a silent runtime failure. Secrets are encrypted at rest behind a
seal/unseal barrier. Every service delegates identity to a single root of
trust.

**Design before code.** The architecture document is written before
implementation begins. It is the spec, not the afterthought. When the code and
the spec disagree, one of them has a bug.

## High-Level Overview

Metacircular infrastructure is built from six core components, plus a shared
standard library (**MCDSL**) that provides the common patterns all services
depend on (auth integration, database setup, config loading, TLS server
bootstrapping, CSRF, snapshots):

- **MCIAS** — Identity and access. The root of trust for all other services.
  Handles authentication, token issuance, role management, and login policy
  enforcement. Every other component delegates auth here.

- **Metacrypt** — Cryptographic services. PKI/CA, SSH CA, transit encryption,
  and encrypted secret storage behind a Vault-inspired seal/unseal barrier.
  Issues the TLS certificates that every other service depends on.

- **MCR** — Container registry. OCI-compliant image storage. MCP directs nodes
  to pull images from MCR. Policy-controlled push/pull integrated with MCIAS.

- **MCNS** — Networking. DNS and address management for the platform.

- **MCP** — Control plane. The orchestrator. A master/agent architecture that
  manages workload scheduling, container lifecycle, service registry, data
  transfer, and node state across the platform.

- **MC-Proxy** — Node ingress. A TLS proxy and router that sits on every node,
  accepts outside connections, and routes them to the correct service — either
  as raw TCP passthrough or via TLS-terminating HTTP/2 reverse proxy.

These components form a dependency graph rooted at MCIAS:

```
MCIAS (standalone — the root of trust)
├── Metacrypt (uses MCIAS for auth; provides certs to all services)
├── MCR (uses MCIAS for auth; stores images pulled by MCP)
├── MCNS (uses MCIAS for auth; provides DNS for the platform)
├── MCP (uses MCIAS for auth; orchestrates everything; owns service registry)
└── MC-Proxy (pre-auth; routes traffic to services behind it)
```

### The Node Model

The unit of deployment is the **MC Node** — a machine (physical or virtual)
that participates in the Metacircular platform.

```
┌──────────────────┐       ┌──────────────┐
│  System / Core   │       │     MCP      │
│  Infrastructure  │       │    Master    │
│   (e.g. MCIAS)   │       │              │
└─────────┬────────┘       └──────┬───────┘
          │                       │ C2
          │                       │
Outside  ┌▼───────────────────────▼───────────────────┐
Client ─▶│                  MC Node                    │
         │                                             │
         │ ┌──────────┐                                │
         │ │ MC-Proxy │──┬──────┬──────┐               │
         │ └──────────┘  │      │      │               │
         │           ┌───▼┐  ┌──▼─┐  ┌─▼──┐ ┌─────┐    │
         │           │ α  │  │ β  │  │ γ  │ │ MCP │    │
         │           └────┘  └────┘  └────┘ │Agent│    │
         │                                  └──┬──┘    │
         │                                ┌────▼────┐  │
         │                                │ Docker/ │  │
         │                                │  etc.   │  │
         │                                └─────────┘  │
         └─────────────────────────────────────────────┘
```

Outside clients connect to **MC-Proxy**, which inspects the TLS SNI hostname
and routes to the correct service (α, β, γ) — either as a raw TCP relay or
via TLS-terminating HTTP/2 reverse proxy, per-route. The **MCP Agent** on each
node receives C2 commands from the **MCP Master** (running on the operator's
workstation) and manages local container lifecycle via the container runtime.
Core infrastructure services (MCIAS, Metacrypt, MCR) run on nodes like any
other workload.

### The Network Model

Metacircular nodes are connected via an **encrypted overlay network** — a
self-managed WireGuard mesh, Tailscale, or similar. No component has a hard
dependency on a specific overlay implementation; the platform requires only
that nodes can reach each other over encrypted links.

```
            Public Internet
                  │
      ┌───────────▼────────┐
      │   Edge MC-Proxy    │  VPS (public IP)
      │        :443        │
      └───────────┬────────┘
                  │ PROXY protocol v2
      ┌───────────▼─────────────────────────┐
      │ Encrypted Overlay (e.g. WireGuard)  │
      │                                     │
┌─────┴────────┐  ┌──────────┐  ┌──────────┐┴────────────┐
│ Origin       │  │ Node B   │  │ Node C   │ Operator    │
│ MC-Proxy     │  │ (MCP     │  │          │ Workstation │
│ + services   │  │  agent)  │  │ (MCP     │ (MCP        │
│ (MCP agent)  │  │          │  │  agent)  │  Master)    │
└──────────────┘  └──────────┘  └──────────┘─────────────┘
```

**External traffic** flows from the internet through an edge MC-Proxy (on a
public VPS), which forwards via PROXY protocol over the overlay to an origin
MC-Proxy on the private network. The overlay preserves the real client IP
across the hop.
**Internal traffic** (MCP C2, inter-service communication, MCNS DNS) flows
directly over the overlay. MCP's C2 channel is gRPC over whatever link exists
between master and agent — the overlay provides the transport.

The overlay network itself is a candidate for future Metacircular management
(a self-hosted WireGuard mesh manager), consistent with the sovereignty
principle of minimizing third-party dependencies.

---

## System Catalog

### MCIAS — Metacircular Identity and Access Service

MCIAS is the root of trust for the entire platform. Every other service
delegates authentication to it; no service maintains its own user database.

**What it provides:**

- **Authentication.** Username/password with optional TOTP and FIDO2/WebAuthn.
  Credentials are verified by MCIAS and a signed JWT bearer token is returned.
  Services validate tokens by calling back to MCIAS (cached 30s by SHA-256 of
  the token).

- **Role-based access.** Three roles — `admin` (full access, policy bypass),
  `user` (policy-governed), `guest` (service-dependent restrictions). Admin
  detection comes solely from the MCIAS `admin` role; services never promote
  users locally.

- **Account types.** Human accounts (interactive users) and system accounts
  (service-to-service). Both authenticate the same way; system accounts enable
  automated workflows.

- **Login policy.** Priority-based ACL rules that control who can log into
  which services. Rules can target roles, account types, service names, and
  tags. This allows operators to restrict access per-service (e.g., deny
  `guest` from services tagged `env:restricted`) without changing the
  services themselves.

- **Token lifecycle.** Issuance, validation, renewal, and revocation.
  Ed25519-signed JWTs. Short expiry with renewal support.
**How other services integrate:** Every service includes an `[mcias]` config
section with the MCIAS server URL, a `service_name`, and optional `tags`. At
login time, the service forwards credentials to MCIAS along with this context.
MCIAS evaluates login policy against the service context, verifies credentials,
and returns a bearer token. The MCIAS Go client library
(`git.wntrmute.dev/kyle/mcias/clients/go`) handles this flow.

**Status:** Implemented. v1.0.0 complete.

---

### Metacrypt — Cryptographic Service Engine

Metacrypt provides cryptographic resources to the platform through a modular
engine architecture, backed by an encrypted storage barrier inspired by
HashiCorp Vault.

**What it provides:**

- **PKI / Certificate Authority.** X.509 certificate issuance. Root and
  intermediate CAs, certificate signing, CRL management, ACME protocol
  support. This is how every service in the platform gets its TLS
  certificates.

- **SSH CA.** (Planned.) SSH certificate signing for host and user
  certificates, replacing static SSH key management.

- **Transit encryption.** (Planned.) Encrypt and decrypt data without exposing
  keys to the caller. Envelope encryption for services that need to protect
  data at rest without managing their own key material.

- **User-to-user encryption.** (Planned.) End-to-end encryption between users,
  with key management handled by Metacrypt.

**Seal/unseal model:** Metacrypt starts sealed. An operator provides a password
which derives (via Argon2id) a key-wrapping key, which decrypts the master
encryption key (MEK), which in turn unwraps per-engine data encryption keys
(DEKs). Each engine mount gets its own DEK, limiting blast radius — compromise
of one engine's key does not expose another's data.

```
Password → Argon2id → KWK → [decrypt] → MEK → [unwrap] → per-engine DEKs
```
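The chain above can be sketched with stdlib primitives. SHA-256 stands in for Argon2id so the example has no external dependencies, and all details (salts, nonce handling, key sizes) are illustrative — the shape is the point: the password-derived KWK decrypts the MEK, which unwraps per-engine DEKs, each wrap under AES-GCM:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// gcmSeal encrypts plaintext under key with AES-GCM, prefixing the nonce.
func gcmSeal(key, plaintext []byte) []byte {
	block, _ := aes.NewCipher(key)
	gcm, _ := cipher.NewGCM(block)
	nonce := make([]byte, gcm.NonceSize())
	rand.Read(nonce)
	return gcm.Seal(nonce, nonce, plaintext, nil)
}

// gcmOpen reverses gcmSeal, failing if the ciphertext was tampered with.
func gcmOpen(key, sealed []byte) ([]byte, error) {
	block, _ := aes.NewCipher(key)
	gcm, _ := cipher.NewGCM(block)
	n := gcm.NonceSize()
	return gcm.Open(nil, sealed[:n], sealed[n:], nil)
}

func main() {
	// Seal side: derive the KWK (SHA-256 standing in for Argon2id),
	// generate a MEK and one per-engine DEK, and wrap them.
	kwk := sha256.Sum256([]byte("operator password" + "per-install salt"))
	mek := make([]byte, 32)
	rand.Read(mek)
	dek := make([]byte, 32)
	rand.Read(dek)
	wrappedMEK := gcmSeal(kwk[:], mek)
	wrappedDEK := gcmSeal(mek, dek) // each engine mount gets its own DEK

	// Unseal side: password → KWK → MEK → DEK.
	kwk2 := sha256.Sum256([]byte("operator password" + "per-install salt"))
	mek2, err := gcmOpen(kwk2[:], wrappedMEK)
	if err != nil {
		panic("wrong password: MEK stays sealed")
	}
	dek2, _ := gcmOpen(mek2, wrappedDEK)
	fmt.Println(len(dek2) == 32) // true
}
```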
**Engine architecture:** Engines are pluggable providers that register with a
central registry. Each engine mount has a type, a name, its own DEK, and its
own configuration. The engine interface handles initialization, seal/unseal
lifecycle, and request routing. New engine types plug in without modifying the
core.

**Policy:** Fine-grained ACL rules control which users can perform which
operations on which engine mounts. Priority-based evaluation, default deny,
admin bypass. See Metacrypt's `POLICY.md` for the full model.
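The evaluation order can be sketched as a first-match walk over priority-sorted rules. Field names and matching granularity are illustrative, not Metacrypt's actual types (see its `POLICY.md` for the real model):

```go
package main

import (
	"fmt"
	"strings"
)

// rule is an illustrative ACL entry: lower priority values are
// considered first, and the first matching rule decides.
type rule struct {
	priority int
	user     string // "*" matches any user
	path     string // engine mount path prefix
	allow    bool
}

// evaluate implements the pattern: admin bypass, then first match over
// rules assumed sorted by ascending priority, then default deny.
func evaluate(rules []rule, user, path string, admin bool) bool {
	if admin {
		return true // admin bypass
	}
	for _, r := range rules {
		if (r.user == "*" || r.user == user) && strings.HasPrefix(path, r.path) {
			return r.allow
		}
	}
	return false // no rule matched: default deny
}

func main() {
	rules := []rule{
		{priority: 10, user: "kyle", path: "pki/", allow: true},
		{priority: 20, user: "*", path: "pki/", allow: false},
	}
	fmt.Println(evaluate(rules, "kyle", "pki/issue", false))  // true
	fmt.Println(evaluate(rules, "guest", "pki/issue", false)) // false
	fmt.Println(evaluate(rules, "guest", "transit/", true))   // true (admin bypass)
}
```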
**Status:** Implemented. CA engine complete with ACME support. SSH CA, transit,
and user-to-user engines planned.

---

### MCR — Metacircular Container Registry

MCR is an OCI Distribution Spec-compliant container registry. It stores and
serves the container images that MCP deploys across the platform.

**What it provides:**

- **OCI-compliant image storage.** Pull, push, tag, and delete container
  images. Content-addressed by SHA-256 digest. Manifests and tags in SQLite,
  blobs on the filesystem.

- **Authenticated access.** No anonymous access. MCR uses the OCI token
  authentication flow: clients hit `/v2/`, receive a 401 with a token
  endpoint, authenticate via MCIAS, and use the returned JWT for subsequent
  requests.

- **Policy-controlled push/pull.** Fine-grained ACL rules govern who can push
  to or pull from which repositories. Integrated with MCIAS roles.

- **Garbage collection.** Unreferenced blobs are cleaned up via the admin CLI
  (`mcrctl`).
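As a sketch of the client side of that token flow, the snippet below parses the `WWW-Authenticate: Bearer ...` challenge that accompanies the 401; a client would then obtain a JWT from the `realm` endpoint (MCIAS-backed here) and retry with it. The header values are illustrative, and this minimal parser assumes parameter values contain no commas:

```go
package main

import (
	"fmt"
	"strings"
)

// parseBearerChallenge extracts the key="value" parameters from a
// Bearer challenge header, e.g. realm, service, and scope.
func parseBearerChallenge(h string) map[string]string {
	params := map[string]string{}
	h = strings.TrimPrefix(h, "Bearer ")
	for _, kv := range strings.Split(h, ",") {
		if k, v, ok := strings.Cut(strings.TrimSpace(kv), "="); ok {
			params[k] = strings.Trim(v, `"`)
		}
	}
	return params
}

func main() {
	// Illustrative challenge a registry might return with its 401.
	h := `Bearer realm="https://mcr.example.net/token",service="mcr",scope="repository:app:pull"`
	p := parseBearerChallenge(h)
	fmt.Println(p["realm"]) // token endpoint to authenticate against
	fmt.Println(p["scope"]) // access being requested
}
```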
**How it fits in:** MCP directs nodes to pull images from MCR. When a workload
is scheduled, MCP tells the node's agent which image to pull and where to get
it. MCR sits behind an MC-Proxy instance for TLS routing.

**Status:** Implemented. Phase 12 (web UI) complete.

---

### MC-Proxy — TLS Proxy and Router

MC-Proxy is the ingress layer for every MC Node. It accepts TLS connections,
extracts the SNI hostname, and routes to the correct backend. Each route is
independently configured as either **L4 passthrough** (raw TCP relay, no TLS
termination) or **L7 terminating** (terminates TLS, reverse proxies HTTP/2 and
HTTP/1.1 including gRPC). Both modes coexist on the same listener.

**What it provides:**

- **SNI-based routing.** A route table maps hostnames to backend addresses.
  Exact match, case-insensitive. Multiple listeners can bind different ports,
  each with its own route table, all sharing the same global firewall.

- **Dual-mode proxying.** L4 routes relay raw TCP — backends see the original
  TLS handshake, MC-Proxy adds nothing. L7 routes terminate TLS at the proxy
  and reverse proxy HTTP/2 to backends (plaintext h2c or re-encrypted TLS),
  with header injection (`X-Forwarded-For`, `X-Real-IP`), gRPC streaming
  support, and trailer forwarding.

- **Global firewall.** Every connection is evaluated before routing: per-IP
  rate limiting, IP/CIDR blocks, and GeoIP country blocks (MaxMind GeoLite2).
  Blocked connections get a TCP RST — no error messages, no TLS alerts.

- **PROXY protocol.** Listeners can accept v1/v2 headers from upstream proxies
  to learn the real client IP. Routes can send v2 headers to downstream
  backends. This enables multi-hop deployments — a public edge MC-Proxy on a
  VPS forwarding over the encrypted overlay to a private origin MC-Proxy —
  while preserving the real client IP for firewall evaluation and logging.

- **Runtime management.** Routes and firewall rules can be updated at runtime
  via a gRPC admin API on a Unix domain socket (filesystem permissions for
  access control, no network exposure). State is persisted to SQLite with
  write-through semantics.
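The routing core of the first bullet reduces to a case-insensitive exact-match table. A sketch, with made-up backend addresses (the real route table also carries per-route PROXY protocol and TLS settings):

```go
package main

import (
	"fmt"
	"strings"
)

// route is an illustrative route-table entry: where to send the
// connection and whether to relay (l4) or terminate (l7).
type route struct {
	backend string
	mode    string // "l4" passthrough or "l7" terminating
}

// routeTable is keyed by lowercase SNI hostname; lookup is exact match,
// case-insensitive — no wildcards.
type routeTable map[string]route

func (rt routeTable) lookup(sni string) (route, bool) {
	r, ok := rt[strings.ToLower(sni)]
	return r, ok
}

func main() {
	rt := routeTable{
		"mcr.metacircular.net":       {backend: "10.0.0.5:5000", mode: "l7"},
		"metacrypt.metacircular.net": {backend: "10.0.0.6:8443", mode: "l4"},
	}
	r, ok := rt.lookup("MCR.Metacircular.NET") // case-insensitive
	fmt.Println(ok, r.backend, r.mode)         // true 10.0.0.5:5000 l7
	_, ok = rt.lookup("unknown.example.net")   // no route
	fmt.Println(ok)                            // false
}
```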
**How it fits in:** MC-Proxy is pre-auth infrastructure. It sits in front of
|
||||
everything on a node. Outside clients connect to MC-Proxy on well-known ports
|
||||
(443, 8443, etc.) and MC-Proxy routes to the correct backend based on the
|
||||
hostname the client is trying to reach. A typical production deployment uses
|
||||
two instances — an edge proxy on a public VPS and an origin proxy on the
|
||||
private network, connected over the overlay with PROXY protocol preserving
|
||||
client IPs across the hop.
|
||||
|
||||
**Status:** Implemented.
|
||||
|
||||
---
|
||||
|
||||
### MCNS — Metacircular Networking Service
|
||||
|
||||
MCNS provides DNS for the platform. It manages two internal zones and serves
|
||||
as the name resolution layer for the Metacircular network. Service discovery
|
||||
(which services run where) is owned by MCP; MCNS translates those assignments
|
||||
into DNS records.
|
||||
|
||||
**What it will provide:**
|
||||
|
||||
- **Internal DNS.** MCNS is authoritative for the internal zones of the
|
||||
Metacircular network. Three zones serve different purposes:
|
||||
|
||||
| Zone | Example | Purpose |
|
||||
|------|---------|---------|
|
||||
| `*.metacircular.net` | `metacrypt.metacircular.net` | External, public-facing. Managed outside MCNS (existing DNS). Points to edge MC-Proxy. |
|
||||
| `*.mcp.metacircular.net` | `vade.mcp.metacircular.net` | Node addresses. Maps node names to their network addresses (e.g. Tailscale IPs). |
|
||||
| `*.svc.mcp.metacircular.net` | `metacrypt.svc.mcp.metacircular.net` | Internal service addresses. Maps service names to the node and port where they currently run. |
|
||||
|
||||
The `*.mcp.metacircular.net` and `*.svc.mcp.metacircular.net` zones are
|
||||
managed by MCNS. The external `*.metacircular.net` zone is managed separately
|
||||
(existing DNS infrastructure) and is mostly static.
|
||||
|
||||
- **MCP integration.** MCP pushes DNS record updates to MCNS after deploy and
|
||||
migrate operations. When MCP starts service α on node X, it calls the MCNS
|
||||
API to set `α.svc.mcp.metacircular.net` to X's address. Services and clients
|
||||
using internal DNS names automatically resolve to the right place without
|
||||
config changes.
|
||||
|
||||
- **Record management API.** Authenticated via MCIAS. MCP is the primary
|
||||
consumer for dynamic updates. Operators can also manage records directly
|
||||
for static entries (node addresses, aliases).
|
||||
|
||||
**How it fits in:** MCNS answers "what is the address of X?" MCP answers "where
|
||||
is service α running?" and pushes the answer to MCNS. This separation means
|
||||
services can use stable DNS names in their configs (e.g.,
|
||||
`mcias.svc.mcp.metacircular.net` in `[mcias] server_url`) that survive
|
||||
migration without config changes.
|
||||
|
||||
**Status:** Not yet implemented.
|
||||
|
||||
---
|
||||
|
||||
### MCP — Metacircular Control Plane
|
||||
|
||||
MCP is the orchestrator. It manages what runs where across the platform. The
|
||||
deployment model is operator-driven: the user says "deploy service α" and MCP
|
||||
handles the rest. MCP Master runs on the operator's workstation; agents run on
|
||||
each managed node.
|
||||
|
||||
**What it will provide:**
|
||||
|
||||
- **Service registry.** MCP is the source of truth for what is running where.
|
||||
It tracks every service, which node it's on, and its current state. Other
|
||||
components that need to find a service (including MC-Proxy for route table
|
||||
updates) query MCP's registry.
|
||||
|
||||
- **Deploy.** The operator says "deploy α". MCP checks if α is already running
|
||||
somewhere. If it is, MCP pulls the new container image on that node and
|
||||
restarts the service in place. If it isn't running, MCP selects a node
(the operator can pin to a specific node but shouldn't have to), transfers
the initial config, pulls the image from MCR, starts the container, and
pushes a DNS update to MCNS (`α.svc.mcp.metacircular.net` → node address).

- **Migrate.** Move a service from one node to another. MCP snapshots the
  service's `/srv/<service>/` directory on the source node (as a tar.zst
  archive), transfers it to the destination, extracts it, starts the service,
  stops it on the source, and updates MCNS so DNS points to the new location.
  The `/srv/<service>/` convention makes this uniform across all services.

- **Data transfer.** The C2 channel supports file-level operations between
  master and agents: copy or fetch individual files (push a config, pull a
  log), and transfer tar.zst archives for bulk snapshot/restore of service
  data directories. This is the foundation for both migration and backup.

- **Service snapshots.** To snapshot `/srv/<service>/`, the agent runs
  `VACUUM INTO` to create a consistent database copy, then builds a tar.zst
  archive that includes the full directory but **excludes** live database
  files (`*.db`, `*.db-wal`, `*.db-shm`) and the `backups/` directory. The
  temporary `VACUUM INTO` copy is injected into the archive as `<service>.db`.
  The result is a clean, minimal archive that extracts directly into a
  working service directory on the destination.

- **Container lifecycle.** Start, stop, restart, and update containers on
  nodes. MCP Master issues commands; agents on each node execute them against
  the local container runtime (Docker, etc.).

- **Master/agent architecture.** MCP Master runs on the operator's machine.
  Agents run on every managed node, receiving C2 (command and control) from
  Master, reporting node status, and managing local workloads. The C2 channel
  is authenticated via MCIAS. The master does not need to be always-on —
  agents keep running their workloads independently; the master is needed only
  to issue new commands.

- **Node management.** Track which nodes are in the platform, their health,
  available resources, and running workloads.

- **Scheduling.** When placing a new service, MCP selects a node based on
  available resources and any operator-specified constraints. The operator can
  override with an explicit node, but the default is MCP's choice.

**How it fits in:** MCP is the piece that ties everything together. MCIAS
provides identity, Metacrypt provides certificates, MCR provides images, MCNS
provides DNS, MC-Proxy provides ingress — MCP orchestrates all of it, owns the
map of what is running where, and pushes updates to MCNS so DNS stays current.
It is the system that makes the infrastructure metacircular: the control plane
deploys and manages the very services it depends on.

**Container-first design:** All Metacircular services are built as containers
(multi-stage Docker builds, Alpine runtime, non-root) specifically so that MCP
can deploy them. The systemd unit files exist as a fallback and for bootstrap —
the long-term deployment model is MCP-managed containers.

**Status:** Not yet implemented.

---

### MCAT — MCIAS Login Policy Tester

MCAT is a lightweight diagnostic tool, not a core infrastructure component. It
presents a web login form, forwards credentials to MCIAS with a configurable
`service_name` and `tags`, and shows whether the login was accepted or denied
by policy. This lets operators verify that login policy rules behave as
expected without touching the target service.

**Status:** Implemented.

---

## Bootstrap Sequence

Bringing up a Metacircular platform from scratch requires careful ordering
because of the circular dependencies — the infrastructure manages itself, but
must exist before it can do so. The key challenge is that nearly every service
needs TLS certificates (from Metacrypt) and authentication (from MCIAS), but
those services themselves need to be running first.

During bootstrap, all services run as **systemd units** on a single bootstrap
node. MCP takes over lifecycle management as the final step.

### Prerequisites

Before any service starts, the operator needs:

- **The bootstrap node** — a machine (VPS, homelab server, etc.) with the
  overlay network configured and reachable.
- **Seed PKI** — MCIAS and Metacrypt need TLS certs to start, but Metacrypt
  isn't running yet to issue them. The root CA is generated manually using
  `github.com/kisom/cert` and stored in the `ca/` directory in the workspace.
  Initial service certificates are issued from this root. The root CA is then
  imported into Metacrypt once it's running, so Metacrypt becomes the
  authoritative CA for the platform going forward.
- **TOML config files** — each service needs its config in `/srv/<service>/`.
  During bootstrap these are written manually. Later, MCP handles config
  distribution.

### Startup Order

```
Phase 0: Seed PKI
    Operator creates or obtains initial TLS certificates for MCIAS
    and Metacrypt. Places them in /srv/mcias/certs/ and
    /srv/metacrypt/certs/.

Phase 1: Identity
    ┌──────────────────────────────────────────────────────┐
    │ MCIAS starts (systemd)                               │
    │ - No dependencies on other Metacircular services     │
    │ - Uses seed TLS certificates                         │
    │ - Operator creates initial admin account             │
    │ - Operator creates system accounts for other services│
    └──────────────────────────────────────────────────────┘

Phase 2: Cryptographic Services
    ┌──────────────────────────────────────────────────────┐
    │ Metacrypt starts (systemd)                           │
    │ - Authenticates against MCIAS                        │
    │ - Uses seed TLS certificates initially               │
    │ - Operator initializes and unseals                   │
    │ - Operator creates CA engine, imports root CA from   │
    │   ca/, creates issuers                               │
    │ - Can now issue certificates for all other services  │
    │ - Reissue MCIAS and Metacrypt certs from own CA      │
    │   (replace seed certs with Metacrypt-issued certs)   │
    └──────────────────────────────────────────────────────┘

Phase 3: Ingress
    ┌──────────────────────────────────────────────────────┐
    │ MC-Proxy starts (systemd)                            │
    │ - Static route table from TOML config                │
    │ - Routes external traffic to MCIAS, Metacrypt        │
    │ - No MCIAS auth (pre-auth infrastructure)            │
    │ - TLS certs for L7 routes from Metacrypt             │
    └──────────────────────────────────────────────────────┘

Phase 4: Container Registry
    ┌──────────────────────────────────────────────────────┐
    │ MCR starts (systemd)                                 │
    │ - Authenticates against MCIAS                        │
    │ - TLS certificates from Metacrypt                    │
    │ - Operator pushes container images for all services  │
    │   (including MCIAS, Metacrypt, MC-Proxy themselves)  │
    └──────────────────────────────────────────────────────┘

Phase 5: DNS
    ┌──────────────────────────────────────────────────────┐
    │ MCNS starts (systemd)                                │
    │ - Authenticates against MCIAS                        │
    │ - Operator configures initial DNS records            │
    │   (node addresses, service names)                    │
    └──────────────────────────────────────────────────────┘

Phase 6: Control Plane
    ┌──────────────────────────────────────────────────────┐
    │ MCP Agent starts on bootstrap node (systemd)         │
    │ MCP Master starts on operator workstation            │
    │ - Authenticates against MCIAS                        │
    │ - Master registers the bootstrap node                │
    │ - Master imports running services into its registry  │
    │ - From here, MCP owns the service map                │
    │ - Services can be redeployed as MCP-managed          │
    │   containers (replacing the systemd units)           │
    └──────────────────────────────────────────────────────┘
```

### The Seed Certificate Problem

The circular dependency between MCIAS, Metacrypt, and TLS is resolved by
bootstrapping with a **manually generated root CA**:

1. The operator generates a root CA using `github.com/kisom/cert`. This root
   and initial service certificates live in the `ca/` directory.
2. MCIAS and Metacrypt start with certificates issued from this external root.
3. Metacrypt comes up. The operator imports the root CA into Metacrypt's CA
   engine, making Metacrypt the authoritative issuer under the same root.
4. Metacrypt can now issue and renew certificates for all services. The `ca/`
   directory remains as the offline backup of the root material.

This is a one-time process. The root CA is generated once, imported once, and
from that point forward Metacrypt is the sole CA. MCP handles certificate
provisioning for all services.

### Adding a New Node

Once the platform is bootstrapped, adding a node is straightforward:

1. Provision the machine and connect it to the overlay network.
2. Install the MCP agent binary.
3. Configure the agent with the MCP Master address and MCIAS credentials
   (system account for the node).
4. Start the agent. It authenticates with MCIAS, connects to Master, and
   reports as available.
5. The operator deploys workloads to it via MCP. MCP handles image pulls,
   config transfer, certificate provisioning, and DNS updates.

### Disaster Recovery

If the bootstrap node is lost, recovery follows the same sequence as initial
bootstrap — but with data restored from backups:

1. Start MCIAS on a new node, restore its database from the most recent
   `VACUUM INTO` snapshot.
2. Start Metacrypt, restore its database. Unseal with the original password.
   The entire key hierarchy and all issued certificates are recovered.
3. Bring up the remaining services in order, restoring their databases.
4. Start MCP, which rebuilds its registry from the running services.
5. Update DNS (MCNS or external) to point to the new node.

Every service's `snapshot` CLI command and daily backup timer exist specifically
to make this recovery possible. The `/srv/<service>/` convention means each
service's entire state is a single directory to back up and restore.

---

## Certificate Lifecycle

Every service in the platform requires TLS certificates, and Metacrypt is the
CA that issues them. This section describes how certificates flow from
Metacrypt to services, how they are renewed, and how the pieces fit together.

### PKI Structure

Metacrypt implements a **two-tier PKI**:

```
Root CA (self-signed, generated at engine initialization)
├── Issuer "infra"     (intermediate CA for infrastructure services)
├── Issuer "services"  (intermediate CA for application services)
└── Issuer "clients"   (intermediate CA for client certificates)
```

The root CA signs intermediate CAs ("issuers"), which in turn sign leaf
certificates. Each issuer is scoped to a purpose. The root CA certificate is
the trust anchor — services and clients need it (or the relevant issuer chain)
to verify certificates presented by other services.

### ACME Protocol

Metacrypt implements an **ACME server** (RFC 8555) with External Account
Binding (EAB). This is the same protocol used by Let's Encrypt, meaning any
standard ACME client can obtain certificates from Metacrypt.

The ACME flow:

1. Client authenticates with MCIAS and requests EAB credentials from Metacrypt.
2. Client registers an ACME account using the EAB credentials.
3. Client places a certificate order (one or more domain names).
4. Metacrypt creates authorization challenges (HTTP-01 and DNS-01 supported).
5. Client fulfills the challenge (places a file for HTTP-01, or a DNS TXT
   record for DNS-01).
6. Metacrypt validates the challenge and issues the certificate.
7. Client downloads the certificate chain and private key.

A **Go client library** (`metacrypt/clients/go`) wraps this entire flow:
MCIAS login, EAB fetch, account registration, challenge fulfillment, and
certificate download. Services that integrate this library can obtain and
renew certificates programmatically.

### How Services Get Certificates Today

Currently, certificates are provisioned through Metacrypt's **REST API or web
UI** and placed into each service's `/srv/<service>/certs/` directory. This is
a manual process — the operator issues a certificate, downloads it, and
deploys the files. The ACME client library exists but is not yet integrated
into any service.

### How It Will Work With MCP

MCP is the natural place to automate certificate provisioning:

- **Initial deploy.** When MCP deploys a new service, it can provision a
  certificate from Metacrypt (via the ACME client library or the REST API),
  transfer the cert and key to the node as part of the config push to
  `/srv/<service>/certs/`, and start the service with valid TLS material.

- **Renewal.** MCP knows what services are running and when their certificates
  expire. It can renew certificates before expiry by re-running the ACME flow
  (or calling Metacrypt's `renew` operation) and pushing updated files to the
  node. The service restarts with the new certificate.

- **Migration.** When MCP migrates a service, the certificate in
  `/srv/<service>/certs/` moves with the tar.zst snapshot. If the service's
  hostname changes (new node, new DNS name), MCP provisions a new certificate
  for the new name.

- **MC-Proxy L7 routes.** MC-Proxy's L7 mode requires certificate/key pairs
  for TLS termination. MCP (or the operator) can provision these from
  Metacrypt and push them to MC-Proxy's cert directory. MC-Proxy's
  architecture doc lists ACME integration and Metacrypt key storage as future
  work.

### Trust Distribution

Every service and client that validates TLS certificates needs the root CA
certificate (or the relevant issuer chain). Metacrypt serves these publicly
without authentication:

- `GET /v1/pki/{mount}/ca` — root CA certificate (PEM)
- `GET /v1/pki/{mount}/ca/chain` — full chain: issuer + root (PEM)
- `GET /v1/pki/{mount}/issuer/{name}` — specific issuer certificate (PEM)

During bootstrap, the root CA cert is distributed manually (or via the `ca/`
directory in the workspace). Once MCP is running, it can distribute the CA
cert as part of service deployment. Services reference the CA cert path in
their `[mcias]` config section (`ca_cert`) to verify connections to MCIAS and
other services.

---

## End-to-End Deploy Workflow

This traces a deployment from code change to running service, showing how every
component participates. The example deploys a new version of service α that is
already running on Node B.

### 1. Build and Push

The operator builds a new container image and pushes it to MCR:

```
Operator workstation (vade)
  $ docker build -t mcr.metacircular.net/α:v1.2.0 .
  $ docker push mcr.metacircular.net/α:v1.2.0
          │
          ▼
MC-Proxy (edge) ──overlay──→ MC-Proxy (origin) ──→ MCR
                                                    │
                                              Authenticates
                                              via MCIAS
                                                    │
                                              Policy check:
                                              can this user
                                              push to α?
                                                    │
                                              Image stored
                                              (blobs + manifest)
```

The `docker push` goes through MC-Proxy (SNI routing to MCR), authenticates
via the OCI token flow (which delegates to MCIAS), and is checked against
MCR's push policy. The image is stored content-addressed in MCR.

### 2. Deploy

The operator tells MCP to deploy:

```
Operator workstation (vade)
  $ mcp deploy α        # or: mcp deploy α --image v1.2.0
          │
MCP Master
  │
  ├── Registry lookup: α is running on Node B
  │
  ├── C2 (gRPC over overlay) to Node B agent:
  │     "pull mcr.metacircular.net/α:v1.2.0 and restart"
  │
  ▼
MCP Agent (Node B)
  │
  ├── Pull image from MCR
  │     (authenticates via MCIAS, same OCI flow)
  │
  ├── Stop running container
  │
  ├── Start new container from updated image
  │     - Mounts /srv/α/ (config, database, certs all persist)
  │     - Service starts, authenticates to MCIAS, resumes operation
  │
  └── Report status back to Master
```

Since α is already running on Node B, this is an in-place update. The
`/srv/α/` directory is untouched — config, database, and certificates persist
across the container restart.

### 3. First-Time Deploy

If α has never been deployed, MCP does more work:

```
Operator workstation (vade)
  $ mcp deploy α --config α.toml
          │
MCP Master
  │
  ├── Registry lookup: α is not running anywhere
  │
  ├── Scheduling: select Node C (best fit)
  │
  ├── Provision TLS certificate from Metacrypt
  │     (ACME flow or REST API)
  │
  ├── C2 to Node C agent:
  │     1. Create /srv/α/ directory structure
  │     2. Transfer config file (α.toml → /srv/α/α.toml)
  │     3. Transfer TLS cert+key → /srv/α/certs/
  │     4. Transfer root CA cert → /srv/α/certs/ca.pem
  │     5. Pull image from MCR
  │     6. Start container
  │
  ├── Update service registry: α → Node C
  │
  ├── Push DNS update to MCNS:
  │     α.svc.mcp.metacircular.net → Node C address
  │
  └── (Optionally) update MC-Proxy route table
        if α needs external ingress
```

### 4. Migration

Moving α from Node B to Node C:

```
Operator workstation (vade)
  $ mcp migrate α --to node-c        # or let MCP choose the destination
          │
MCP Master
  │
  ├── C2 to Node B agent:
  │     1. Stop α container
  │     2. Snapshot /srv/α/ → tar.zst archive
  │     3. Transfer tar.zst to Master (or directly to Node C)
  │
  ├── C2 to Node C agent:
  │     1. Receive tar.zst archive
  │     2. Extract to /srv/α/
  │     3. Pull container image from MCR (if not cached)
  │     4. Start container
  │     5. Report status
  │
  ├── Update service registry: α → Node C
  │
  ├── Push DNS update to MCNS:
  │     α.svc.mcp.metacircular.net → Node C address
  │
  └── (If α had external ingress) update MC-Proxy route
        or rely on DNS change
```

### What Each Component Does

| Step | MCIAS | Metacrypt | MCR | MC-Proxy | MCP | MCNS |
|------|-------|-----------|-----|----------|-----|------|
| Build/push image | Authenticates push | — | Stores image, enforces push policy | Routes traffic to MCR | — | — |
| Deploy (update) | Authenticates pull, authenticates service on start | — | Serves image to agent | Routes traffic to service | Coordinates: registry lookup, C2 to agent | — |
| Deploy (new) | Authenticates pull, authenticates service on start | Issues TLS certificate | Serves image to agent | Routes traffic to service (if external) | Coordinates: scheduling, cert provisioning, config transfer, DNS update | Updates DNS records |
| Migrate | Authenticates service on new node | Issues new cert (if hostname changes) | Serves image (if not cached) | Routes traffic to new location | Coordinates: snapshot, transfer, DNS update | Updates DNS records |
| Steady state | Validates tokens for every authenticated request | Serves CA certs publicly, renews certs | Serves image pulls | Routes all external traffic | Tracks service health, holds registry | Serves DNS queries |

---

## Future Ideas

Components and capabilities that may be worth building but have no immediate
timeline. Listed here to capture the thinking; none are committed.

### Observability — Log Collection and Health Monitoring

Every service already produces structured logs (`log/slog`) and exposes health
checks (gRPC `Health.Check` or REST status endpoints). What's missing is
aggregation — today, debugging a cross-service issue means SSH'ing into each
node and reading local logs.

A collector could:

- Gather structured logs from services on each node and forward them to a
  central store.
- Periodically health-check local services and report status.
- Feed health data into MCP so it can make informed decisions (restart
  unhealthy services, avoid scheduling on degraded nodes, alert the operator).

This might be a standalone service or an MCP agent capability, depending on
weight. If it's just "tail logs and hit health endpoints," it fits in the
agent. If it grows to include indexing, querying, retention policies, and
alerting rules, it's its own service.

### Object Store

The platform has structured storage (SQLite), blob storage scoped to container
images (MCR), and encrypted key-value storage (Metacrypt's barrier). It does
not have general-purpose object/blob storage.

Potential uses:

- **Centralized backups.** Service snapshots currently live on each node in
  `/srv/<service>/backups/`. A central object store gives MCP somewhere to push
  tar.zst snapshots for offsite retention.
- **Artifact storage.** Build outputs, large files, anything that doesn't fit
  in a database row.
- **Data sharing between services.** Files that need to move between services
  outside the MCP C2 channel.

Prior art: [Nebula](https://metacircular.net/pages/nebula.html), a
content-addressable data store with capability-based security (SHA-256
addressed blobs, UUID entries for versioning, proxy references for revocable
access). Prototyped in multiple languages. The capability model is interesting
but may be more sophistication than the platform needs — a simpler
authenticated blob store with MCIAS integration might suffice.

### Overlay Network Management

The platform currently relies on an external overlay network (WireGuard,
Tailscale, or similar) for node-to-node connectivity. A self-hosted WireGuard
mesh manager would bring the overlay under Metacircular's control:

- Automate key exchange and peer configuration when MCP adds a node.
- Manage IP allocation within the mesh (potentially absorbing part of MCNS's
  scope).
- Remove the dependency on Tailscale's coordination servers.

This is a natural extension of the sovereignty principle but is low priority
while the mesh is small enough to manage by hand.

### Hypervisor / Isolation

A deeper exploration of environment isolation, message-passing between
services, and access mediation at a level below containers. Prior art:
[hypervisor concept](https://metacircular.net/pages/hypervisor.html). The
current platform achieves these goals through containers + MCIAS + policy
engines. A hypervisor layer would push isolation down to the OS level —
interesting for security but significant in scope. More relevant if the
platform ever moves beyond containers to VM-based workloads.

### Prior Art: SYSGOV

[SYSGOV](https://metacircular.net/pages/lisp-dcos.html) was an earlier
exploration of system management in Lisp, with SYSPLAN (desired state
enforcement) and SYSMON (service management). Many of its research questions —
C2 communication, service discovery, secure config distribution, failure
handling — are directly addressed by MCP's design. MCP is the spiritual
successor, reimplemented in Go with the benefit of the Metacircular platform
underneath it.
55399
docs/notebook.pdf
Normal file
File diff suppressed because it is too large
851
engineering-standards.md
Normal file
@@ -0,0 +1,851 @@

# Metacircular Dynamics — Engineering Standards

Source: https://metacircular.net/roam/20260314210051-metacircular_dynamics.html

This document describes the standard repository layout, tooling, and software
development lifecycle (SDLC) for services built at Metacircular Dynamics. It
incorporates the platform-wide project guidelines and codifies the conventions
established in Metacrypt as the baseline for all services.

## Platform Rules

These four rules apply to every Metacircular service:

1. **Data Storage**: All service data goes in `/srv/<service>/` to enable
   straightforward migration across systems.
2. **Deployment Architecture**: Services require systemd unit files but
   prioritize container-first design to support deployment via the
   Metacircular Control Plane (MCP).
3. **Identity Management**: Services must integrate with MCIAS (Metacircular
   Identity and Access Service) for user management and access control. Three
   role levels: `admin` (full administrative access), `user` (full
   non-administrative access), `guest` (service-dependent restrictions).
4. **API Design**: Services expose both gRPC and REST interfaces, kept in
   sync. Web UIs are built with htmx.

## Table of Contents

0. [Platform Rules](#platform-rules)
1. [Repository Layout](#repository-layout)
2. [Language & Toolchain](#language--toolchain)
3. [Build System](#build-system)
4. [API Design](#api-design)
5. [Authentication & Authorization](#authentication--authorization)
6. [Database Conventions](#database-conventions)
7. [Configuration](#configuration)
8. [Web UI](#web-ui)
9. [Testing](#testing)
10. [Linting & Static Analysis](#linting--static-analysis)
11. [Deployment](#deployment)
12. [Documentation](#documentation)
13. [Security](#security)
14. [Development Workflow](#development-workflow)

---

## Repository Layout

Every service follows a consistent directory structure. Adjust the
service-specific directories (e.g. `engines/` in Metacrypt) as appropriate,
but the top-level skeleton is fixed.

```
.
├── cmd/
│   ├── <service>/            CLI entry point (server, subcommands)
│   └── <service>-web/        Web UI entry point (if separate binary)
├── internal/
│   ├── auth/                 MCIAS integration (token validation, caching)
│   ├── config/               TOML configuration loading & validation
│   ├── db/                   Database setup, schema migrations
│   ├── server/               REST API server, routes, middleware
│   ├── grpcserver/           gRPC server, interceptors, service handlers
│   ├── webserver/            Web UI server, template routes, HTMX handlers
│   └── <domain>/             Service-specific packages
├── proto/<service>/
│   └── v<N>/                 Current proto definitions (start at v1;
│                             increment only on breaking changes)
├── gen/<service>/
│   └── v<N>/                 Generated Go gRPC/protobuf code
├── web/
│   ├── embed.go              //go:embed directive for templates and static
│   ├── templates/            Go HTML templates
│   └── static/               CSS, JS (htmx)
├── deploy/
│   ├── docker/               Docker Compose configuration
│   ├── examples/             Example config files
│   ├── scripts/              Install, backup, migration scripts
│   └── systemd/              systemd unit files and timers
├── docs/                     Internal engineering documentation
├── Dockerfile.api            API server container (if split binary)
├── Dockerfile.web            Web UI container (if split binary)
├── Makefile
├── buf.yaml                  Protobuf linting & breaking-change config
├── .golangci.yaml            Linter configuration
├── .gitignore
├── CLAUDE.md                 AI-assisted development instructions
├── ARCHITECTURE.md           Full system specification
└── <service>.toml.example    Example configuration
```

### Key Principles

- **`cmd/`** contains only CLI wiring (cobra commands, flag parsing). No
  business logic.
- **`internal/`** contains all service logic. Nothing in `internal/` is
  importable by other modules — this is enforced by Go's module system.
- **`proto/`** is the source of truth for gRPC definitions. Generated code
  lives in `gen/`, never edited by hand. Versions start at `v1`; a new
  version directory is only created when a breaking change is required — not
  as a naming convention or initial setup step.
- **`deploy/`** contains everything needed to run the service in production.
  A new engineer should be able to deploy from this directory alone.
- **`web/`** is embedded into the binary via `//go:embed`. No external file
  dependencies at runtime.

### What Does Not Belong in the Repository

- Runtime data (databases, certificates, logs) — these live in `/srv/<service>/`
- Real configuration files with secrets — only examples are committed
- IDE configuration (`.idea/`, `.vscode/`) — per-developer, not shared
- Vendored dependencies — the Go module proxy handles this

---

## Language & Toolchain

| Tool | Version | Purpose |
|------|---------|---------|
| Go | 1.25+ | Primary language |
| protoc + protoc-gen-go | Latest | Protobuf/gRPC code generation |
| buf | Latest | Proto linting and breaking-change detection |
| golangci-lint | v2 | Static analysis and linting |
| Docker | Latest | Container builds |

### Go Conventions

- **Pure-Go dependencies** where possible. Avoid CGo — it complicates
  cross-compilation and container builds. Use `modernc.org/sqlite` instead
  of `mattn/go-sqlite3`.
- **`CGO_ENABLED=0`** for all production builds. Statically linked binaries
  deploy cleanly to Alpine containers.
- **Stripped binaries**: Build with `-trimpath -ldflags="-s -w"` to remove
  debug symbols and reduce image size.
- **Version injection**: Pass `git describe --tags --always --dirty` via
  `-X main.version=...` at build time. Every binary must report its version.

### Module Path

Services hosted on `git.wntrmute.dev` use:

```
git.wntrmute.dev/kyle/<service>
```

---

## Build System

Every repository has a Makefile with these standard targets:

```makefile
.PHONY: build test vet lint proto proto-lint clean docker all

LDFLAGS := -trimpath -ldflags="-s -w -X main.version=$(shell git describe --tags --always --dirty)"

<service>:
	go build $(LDFLAGS) -o <service> ./cmd/<service>

build:
	go build ./...

test:
	go test ./...

vet:
	go vet ./...

lint:
	golangci-lint run ./...

proto:
	protoc --go_out=. --go_opt=module=<module> \
		--go-grpc_out=. --go-grpc_opt=module=<module> \
		proto/<service>/v<N>/*.proto

proto-lint:
	buf lint
	buf breaking --against '.git#branch=master,subdir=proto'

clean:
	rm -f <service>

docker:
	docker build -t <service> -f Dockerfile.api .

all: vet lint test <service>
```

### Target Semantics

| Target | When to Run | CI Gate? |
|--------|-------------|----------|
| `vet` | Every change | Yes |
| `lint` | Every change | Yes |
| `test` | Every change | Yes |
| `proto-lint` | Any proto change | Yes |
| `proto` | After editing `.proto` files | No (manual) |
| `all` | Pre-push verification | Yes |

The `all` target is the CI pipeline: `vet → lint → test → build`. If any
step fails, the pipeline stops.

---

## API Design

Services expose two synchronized API surfaces:

### gRPC (Primary)

- Proto definitions live in `proto/<service>/v<N>/`, where N starts at 1.
- **Versioning policy**: proto packages are versioned to protect existing
  clients from breaking changes. A new version directory (`v2/`, `v3/`, …)
  is only introduced when a breaking change is unavoidable. Non-breaking
  additions (new fields, new RPCs) are made in-place to the current version.
- Use strongly-typed, per-operation RPCs. Avoid generic "execute" patterns.
- Use `google.protobuf.Timestamp` for all time fields (not RFC 3339 strings).
- Run `buf lint` and `buf breaking` against master before merging proto
  changes.

### REST (Secondary)

- JSON over HTTPS. Routes live in `internal/server/routes.go`.
- Use `chi` for routing (lightweight, stdlib-compatible).
- Standard error format: `{"error": "description"}`.
- Standard HTTP status codes: `401` (unauthenticated), `403` (unauthorized),
  `412` (precondition failed), `503` (service unavailable).

### API Sync Rule

**Every REST endpoint must have a corresponding gRPC RPC, and vice versa.**
When adding, removing, or changing an endpoint in either surface, the other
must be updated in the same change. This is enforced in code review.

### gRPC Interceptors

Access control is enforced via interceptor maps, not per-handler checks:

| Map | Effect |
|-----|--------|
| `sealRequiredMethods` | Returns `UNAVAILABLE` if the service is sealed/locked |
| `authRequiredMethods` | Validates MCIAS bearer token, populates caller info |
| `adminRequiredMethods` | Requires admin role on the caller |

Adding a new RPC means adding it to the correct interceptor maps. Forgetting
this is a security defect.
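
The gating logic can be sketched without the gRPC plumbing. The method names and map contents below are illustrative assumptions, not the real maps; in the services these checks run inside a unary interceptor:

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified model of the interceptor maps: each set names the fully
// qualified RPC methods that require a given check.
var (
	sealRequiredMethods  = map[string]bool{"/metacrypt.v1.Secrets/Get": true}
	authRequiredMethods  = map[string]bool{"/metacrypt.v1.Secrets/Get": true}
	adminRequiredMethods = map[string]bool{"/metacrypt.v1.Policy/Put": true}
)

var (
	errUnavailable      = errors.New("UNAVAILABLE: service is sealed")
	errUnauthenticated  = errors.New("UNAUTHENTICATED: missing or invalid token")
	errPermissionDenied = errors.New("PERMISSION_DENIED: admin role required")
)

// checkAccess applies the three gates in order; the interceptor rejects
// the call before the handler ever runs.
func checkAccess(method string, sealed, authenticated, admin bool) error {
	if sealRequiredMethods[method] && sealed {
		return errUnavailable
	}
	if authRequiredMethods[method] && !authenticated {
		return errUnauthenticated
	}
	if adminRequiredMethods[method] && !admin {
		return errPermissionDenied
	}
	return nil
}

func main() {
	fmt.Println(checkAccess("/metacrypt.v1.Policy/Put", false, true, false))
}
```

An RPC absent from all three maps is effectively public, which is why forgetting a map entry is a security defect rather than a cosmetic bug.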

---

## Authentication & Authorization

### Authentication

All services delegate authentication to **MCIAS** (Metacircular Identity and
Access Service). No service maintains its own user database.

- Client sends credentials to the service's `/v1/auth/login` endpoint.
- The service forwards them to MCIAS via the client library
  (`git.wntrmute.dev/kyle/mcias/clients/go`).
- On success, MCIAS returns a bearer token. The service returns it to the
  client and optionally sets it as a cookie for the web UI.
- Subsequent requests include the token via `Authorization: Bearer <token>`
  header or cookie.
- Token validation calls MCIAS `ValidateToken()`. Results should be cached
  (keyed by SHA-256 of the token) with a short TTL (30 seconds or less).

### Authorization

Three role levels:

| Role | Meaning |
|------|---------|
| `admin` | Full access to everything. Policy bypass. |
| `user` | Access governed by policy rules. Default deny. |
| `guest` | Service-dependent restrictions. Default deny. |

Admin detection is based solely on the MCIAS `admin` role. The service never
promotes users locally.

Services that need fine-grained access control should implement a policy
engine (priority-based ACL rules stored in encrypted storage, default deny,
admin bypass). See Metacrypt's implementation as the reference.

---

## Database Conventions

### SQLite

SQLite is the default database for Metacircular services. It is simple to
operate, requires no external processes, and backs up cleanly with
`VACUUM INTO`.

Connection settings (applied at open time):

```sql
PRAGMA journal_mode = WAL;
PRAGMA foreign_keys = ON;
PRAGMA busy_timeout = 5000;
```

File permissions: `0600`. Created by the service on first run.

### Migrations

- Migrations are Go functions registered in `internal/db/` and run
  sequentially at startup.
- Each migration is idempotent — use `CREATE TABLE IF NOT EXISTS`, and guard
  `ALTER TABLE ... ADD COLUMN` by checking the existing schema first
  (SQLite does not support `IF NOT EXISTS` on `ADD COLUMN`).
- Applied migrations are tracked in a `schema_migrations` table.
- Never modify a migration that has been deployed. Add a new one.
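
The sequencing and tracking logic can be sketched with the storage abstracted away. In the real services `Apply` runs SQL against the SQLite handle and `applied` is the `schema_migrations` table; the names here are illustrative:

```go
package main

import "fmt"

// Migration pairs a sequential version number with a function that
// applies the change.
type Migration struct {
	Version int
	Apply   func() error
}

// runMigrations applies, in order, every migration whose version is not
// yet recorded in applied.
func runMigrations(migrations []Migration, applied map[int]bool) error {
	for _, m := range migrations {
		if applied[m.Version] {
			continue // already deployed: never modify it, add a new one
		}
		if err := m.Apply(); err != nil {
			return fmt.Errorf("migration %d: %w", m.Version, err)
		}
		applied[m.Version] = true
	}
	return nil
}

func main() {
	applied := map[int]bool{1: true}
	ms := []Migration{
		{Version: 1, Apply: func() error { return nil }},
		{Version: 2, Apply: func() error { fmt.Println("applying 2"); return nil }},
	}
	if err := runMigrations(ms, applied); err != nil {
		panic(err)
	}
}
```

Because each migration is also individually idempotent, a crash between applying a migration and recording it is safe: the re-run at next startup is a no-op.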

### Backup

Every service must provide a `snapshot` CLI command that creates a consistent
backup using `VACUUM INTO`. Automated backups run via a systemd timer
(daily, with retention pruning).

---

## Configuration

### Format

TOML. Parsed with `go-toml/v2`. Environment variable overrides via
`SERVICENAME_*` (e.g. `METACRYPT_SERVER_LISTEN_ADDR`).

### Standard Sections

```toml
[server]
listen_addr = ":8443"          # HTTPS API
grpc_addr = ":9443"            # gRPC (optional; disabled if unset)
tls_cert = "/srv/<service>/certs/cert.pem"
tls_key = "/srv/<service>/certs/key.pem"

[web]
listen_addr = "127.0.0.1:8080" # Web UI (optional; disabled if unset)
vault_grpc = "127.0.0.1:9443"  # gRPC address of the API server
vault_ca_cert = ""             # CA cert for verifying API server TLS

[database]
path = "/srv/<service>/<service>.db"

[mcias]
server_url = "https://mcias.metacircular.net:8443"
ca_cert = ""                   # Custom CA for MCIAS TLS
service_name = "<service>"     # This service's identity, as registered in MCIAS
tags = []                      # Tags sent with every login request (e.g. ["env:restricted"])
                               # MCIAS evaluates auth:login policy against these tags,
                               # enabling per-service login restrictions via policy rules.

[log]
level = "info"                 # debug, info, warn, error
```

#### Service context and login policy

`service_name` and `tags` in `[mcias]` are sent with every `POST /v1/auth/login`
request. MCIAS evaluates the `auth:login` action with the resource set to
`{service_name, tags}`. This allows operators to write deny rules that restrict
which roles or account types can log into specific services.

Example: deny `guest` and `viewer` human accounts from any service tagged
`env:restricted`:

```json
{
  "effect": "deny",
  "roles": ["guest", "viewer"],
  "account_types": ["human"],
  "actions": ["auth:login"],
  "required_tags": ["env:restricted"]
}
```

A service can also be targeted by name instead of (or in addition to) tags:

```json
{
  "effect": "deny",
  "roles": ["guest"],
  "actions": ["auth:login"],
  "service_names": ["meta-money-printer"]
}
```

MCIAS enforces the policy after credentials are verified; a policy-denied
login returns HTTP 403 (not 401) so the client can distinguish a bad password
from a service access restriction.
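
The matching idea behind these rules can be sketched as follows. This mirrors the JSON rule shape above but is not the MCIAS implementation; empty fields match anything, `required_tags` must all be present:

```go
package main

import "fmt"

// Rule mirrors the JSON policy rule shape from the examples above.
type Rule struct {
	Effect       string
	Roles        []string
	AccountTypes []string
	Actions      []string
	ServiceNames []string
	RequiredTags []string
}

func contains(set []string, v string) bool {
	for _, s := range set {
		if s == v {
			return true
		}
	}
	return false
}

// matches reports whether a login attempt is caught by the rule.
// An empty field is a wildcard; required_tags must all be present.
func (r Rule) matches(role, accountType, action, service string, tags []string) bool {
	if len(r.Roles) > 0 && !contains(r.Roles, role) {
		return false
	}
	if len(r.AccountTypes) > 0 && !contains(r.AccountTypes, accountType) {
		return false
	}
	if len(r.Actions) > 0 && !contains(r.Actions, action) {
		return false
	}
	if len(r.ServiceNames) > 0 && !contains(r.ServiceNames, service) {
		return false
	}
	for _, want := range r.RequiredTags {
		if !contains(tags, want) {
			return false
		}
	}
	return true
}

func main() {
	deny := Rule{
		Effect:       "deny",
		Roles:        []string{"guest", "viewer"},
		AccountTypes: []string{"human"},
		Actions:      []string{"auth:login"},
		RequiredTags: []string{"env:restricted"},
	}
	fmt.Println(deny.matches("guest", "human", "auth:login", "mcr", []string{"env:restricted"}))
	fmt.Println(deny.matches("admin", "human", "auth:login", "mcr", []string{"env:restricted"}))
}
```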

### Validation

Required fields are validated at startup. The service refuses to start if
any are missing. Do not silently default required values.

### Data Directory

All runtime data lives in `/srv/<service>/`:

```
/srv/<service>/
├── <service>.toml      Configuration
├── <service>.db        SQLite database
├── certs/              TLS certificates
└── backups/            Database snapshots
```

This convention enables straightforward service migration between hosts:
copy `/srv/<service>/` and the binary.

---

## Web UI

### Technology

- **Go `html/template`** for server-side rendering. No JavaScript frameworks.
- **htmx** for dynamic interactions (form submission, partial page updates)
  without full page reloads.
- Templates and static files are embedded in the binary via `//go:embed`.

### Structure

- `web/templates/layout.html` — shared HTML skeleton, navigation, CSS/JS
  includes. All page templates extend this.
- Page templates: one `.html` file per page/feature.
- `web/static/` — CSS, htmx. Keep this minimal.

### Architecture

The web UI runs as a separate binary (`<service>-web`) that communicates
with the API server via its gRPC interface. This separation means:

- The web UI has no direct database access.
- The API server enforces all authorization.
- The web UI can be deployed independently or omitted entirely.

### Security

- CSRF protection via signed double-submit cookies on all mutating requests
  (POST/PUT/PATCH/DELETE).
- Session cookie: `HttpOnly`, `Secure`, `SameSite=Strict`.
- All user input is escaped by `html/template` (the default).
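
The signed double-submit scheme can be sketched with stdlib crypto. This is a minimal illustration, not the shared-library implementation; the token format (nonce + HMAC, dot-separated) is an assumption:

```go
package main

import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"strings"
)

// issueCSRFToken returns a random nonce plus an HMAC over it. The pair
// is set as a cookie and must be echoed back in a form field or header
// on mutating requests.
func issueCSRFToken(key []byte) (string, error) {
	nonce := make([]byte, 16)
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	mac := hmac.New(sha256.New, key)
	mac.Write(nonce)
	return base64.RawURLEncoding.EncodeToString(nonce) + "." +
		base64.RawURLEncoding.EncodeToString(mac.Sum(nil)), nil
}

// verifyCSRFToken recomputes the HMAC, so a token cannot be forged
// without the server key.
func verifyCSRFToken(key []byte, token string) bool {
	noncePart, macPart, ok := strings.Cut(token, ".")
	if !ok {
		return false
	}
	nonce, err := base64.RawURLEncoding.DecodeString(noncePart)
	if err != nil {
		return false
	}
	got, err := base64.RawURLEncoding.DecodeString(macPart)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, key)
	mac.Write(nonce)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	key := []byte("0123456789abcdef0123456789abcdef")
	tok, err := issueCSRFToken(key)
	if err != nil {
		panic(err)
	}
	fmt.Println(verifyCSRFToken(key, tok))
}
```

Signing the cookie value is what distinguishes this from the naive double-submit pattern: an attacker who can set cookies on a sibling subdomain still cannot mint a token that verifies.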

---

## Testing

### Philosophy

Tests are written using the Go standard library `testing` package. No test
frameworks (testify, gomega, etc.) — the standard library is sufficient and
keeps dependencies minimal.

### Patterns

```go
func TestFeatureName(t *testing.T) {
	// Setup: use t.TempDir() for isolated file system state.
	dir := t.TempDir()
	database, err := db.Open(filepath.Join(dir, "test.db"))
	if err != nil {
		t.Fatalf("open db: %v", err)
	}
	defer func() { _ = database.Close() }()
	db.Migrate(database)

	// Exercise the code under test.
	// ...

	// Assert with t.Fatal (not t.Error) for precondition failures.
	if !bytes.Equal(got, want) {
		t.Fatalf("got %q, want %q", got, want)
	}
}
```

### Guidelines

- **Use `t.TempDir()`** for all file-system state. Never write to fixed
  paths. Cleanup is automatic.
- **Use `errors.Is`** for error assertions, not string comparison.
- **No mocks for databases.** Tests use real SQLite databases created in
  temp directories. This catches migration bugs that mocks would hide.
- **Test files** live alongside the code they test: `barrier.go` and
  `barrier_test.go` in the same package.
- **Test helpers** call `t.Helper()` so failures report the caller's line.

### What to Test

| Layer | Test Strategy |
|-------|---------------|
| Crypto primitives | Roundtrip encryption/decryption, wrong-key rejection, edge cases |
| Storage (barrier, DB) | CRUD operations, sealed-state rejection, concurrent access |
| API handlers | Request/response correctness, auth enforcement, error codes |
| Policy engine | Rule matching, priority ordering, default deny, admin bypass |
| CLI commands | Flag parsing, output format (lightweight) |

---

## Linting & Static Analysis

### Configuration

Every repository includes a `.golangci.yaml` with this philosophy:
**fail loudly for security and correctness; everything else is a warning.**

### Required Linters

| Linter | Category | Purpose |
|--------|----------|---------|
| `errcheck` | Correctness | Unhandled errors are silent failures |
| `govet` | Correctness | Printf mismatches, unreachable code, suspicious constructs |
| `ineffassign` | Correctness | Dead writes hide logic bugs |
| `unused` | Correctness | Unused variables and functions |
| `errorlint` | Error handling | Proper `errors.Is`/`errors.As` usage |
| `gosec` | Security | Hardcoded secrets, weak RNG, insecure crypto, SQL injection |
| `staticcheck` | Security | Deprecated APIs, mutex misuse, deep analysis |
| `revive` | Style | Go naming conventions, error return ordering |
| `gofmt` | Formatting | Standard Go formatting |
| `goimports` | Formatting | Import grouping and ordering |

### Settings

- `errcheck`: `check-type-assertions: true` (catch `x.(*T)` without ok check).
- `govet`: all analyzers enabled except `shadow` (too noisy for idiomatic Go).
- `gosec`: severity and confidence set to `medium`. Exclude `G104` (overlaps
  with errcheck).
- `max-issues-per-linter: 0` — report everything. No caps.
- Test files: allow `G101` (hardcoded credentials) for test fixtures.

---

## Deployment

### Container-First

Services are designed for container deployment but must also run as native
systemd services. Both paths are first-class.

### Docker

Multi-stage builds:

1. **Builder**: `golang:1.23-alpine`. Compile with `CGO_ENABLED=0`, strip
   symbols.
2. **Runtime**: `alpine:3.21`. Non-root user (`<service>`), minimal attack
   surface.

If the service has separate API and web binaries, use separate Dockerfiles
(`Dockerfile.api`, `Dockerfile.web`) and a `docker-compose.yml` that wires
them together with a shared data volume.

### systemd

Every service ships with:

| File | Purpose |
|------|---------|
| `<service>.service` | Main service unit (API server) |
| `<service>-web.service` | Web UI unit (if applicable) |
| `<service>-backup.service` | Oneshot backup unit |
| `<service>-backup.timer` | Daily backup timer (02:00 UTC, 5-minute jitter) |

#### Security Hardening

All service units must include these security directives:

```ini
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
PrivateDevices=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictSUIDSGID=true
RestrictNamespaces=true
LockPersonality=true
MemoryDenyWriteExecute=true
RestrictRealtime=true
ReadWritePaths=/srv/<service>
```

The web UI unit should use `ReadOnlyPaths=/srv/<service>` instead of
`ReadWritePaths` — it has no reason to write to the data directory.

### Install Script

`deploy/scripts/install.sh` handles:

1. Create system user/group (idempotent).
2. Install binary to `/usr/local/bin/`.
3. Create `/srv/<service>/` directory structure.
4. Install example config if none exists.
5. Install systemd units and reload the daemon.

### TLS

- **Minimum TLS version: 1.3.** No exceptions, no fallback cipher suites.
  Go's TLS 1.3 implementation manages cipher selection automatically.
- **Timeouts**: read 30s, write 30s, idle 120s.
- Certificate and key paths are required configuration — the service refuses
  to start without them.

### Graceful Shutdown

Services handle `SIGINT` and `SIGTERM`, shutting down cleanly:

1. Stop accepting new connections.
2. Drain in-flight requests (with a timeout).
3. Clean up resources (close databases, zeroize secrets if applicable).
4. Exit.

---

## Documentation

### Required Files

| File | Purpose | Audience |
|------|---------|----------|
| `README.md` | Project overview, quick-start, and contributor guide | Everyone |
| `CLAUDE.md` | AI-assisted development context | Claude Code |
| `ARCHITECTURE.md` | Full system specification | Engineers |
| `RUNBOOK.md` | Operational procedures and incident response | Operators |
| `deploy/examples/<service>.toml` | Example configuration | Operators |

### Suggested Files

These are not required for every project but should be created where applicable:

| File | When to Include | Purpose |
|------|-----------------|---------|
| `AUDIT.md` | Services handling cryptography, secrets, PII, or auth | Security audit findings with issue tracking and resolution status |
| `POLICY.md` | Services with fine-grained access control | Policy engine documentation: rule structure, evaluation algorithm, resource paths, action classification, common patterns |

### README.md

The README is the front door. A new engineer or user should be able to
understand what the service does and get it running from this file alone.
It should contain:

- Project name and one-paragraph description.
- Quick-start instructions (build, configure, run).
- Link to `ARCHITECTURE.md` for full technical details.
- Link to `RUNBOOK.md` for operational procedures.
- License and contribution notes (if applicable).

Keep it concise. The README is not the spec — that's `ARCHITECTURE.md`.

### CLAUDE.md

This file provides context for AI-assisted development. It should contain:

- Project overview (one paragraph).
- Build, test, and lint commands.
- High-level architecture summary.
- Project structure with directory descriptions.
- Ignored directories (runtime data, generated code).
- Critical rules (e.g. API sync requirements).

Keep it concise. AI tools read this on every interaction.

### ARCHITECTURE.md

This is the canonical specification for the service. It should cover:

1. System overview with a layered architecture diagram.
2. Cryptographic design (if applicable): algorithms, key hierarchy.
3. State machines and lifecycle (if applicable).
4. Storage design.
5. Authentication and authorization model.
6. API surface (REST and gRPC, with tables of every endpoint).
7. Web interface routes.
8. Database schema (every table, every column).
9. Configuration reference.
10. Deployment guide.
11. Security model: threat mitigations table and security invariants.
12. Future work.

This document is the source of truth. When the code and the spec disagree,
one of them has a bug.

### RUNBOOK.md

The runbook is written for operators, not developers. It covers what to do
when things go wrong and how to perform routine maintenance. It should
contain:

1. **Service overview** — what the service does, in one paragraph.
2. **Health checks** — how to verify the service is healthy (endpoints,
   CLI commands, expected responses).
3. **Common operations** — start, stop, restart, seal/unseal, backup,
   restore, log inspection.
4. **Alerting** — what alerts exist, what they mean, and how to respond.
5. **Incident procedures** — step-by-step playbooks for known failure
   modes (database corruption, certificate expiry, MCIAS outage, disk
   full, etc.).
6. **Escalation** — when and how to escalate beyond the runbook.

Write runbook entries as numbered steps, not prose. An operator at 3 AM
should be able to follow them without thinking.

### AUDIT.md (Suggested)

For services that handle cryptography, secrets, PII, or authentication,
maintain a security audit log. Each finding gets a numbered entry with:

- Description of the issue.
- Severity (critical, high, medium, low).
- Resolution status: open, resolved (with summary), or accepted (with
  rationale for accepting the risk).

The priority summary table at the bottom provides a scannable overview.
Resolved and accepted items are struck through but retained for history.
See Metacrypt's `AUDIT.md` for the reference format.

### POLICY.md (Suggested)

For services with a policy engine or fine-grained access control, document
the policy model separately from the architecture spec. It should cover:

- Rule structure (fields, types, semantics).
- Evaluation algorithm (match logic, priority, default effect).
- Resource path conventions and glob patterns.
- Action classification.
- API endpoints for policy CRUD.
- Common policy patterns with examples.
- Role summary (what each MCIAS role gets by default).

This document is aimed at administrators who need to write policy rules,
not engineers who need to understand the implementation.

### Engine/Feature Design Documents

For services with a modular architecture, each module gets its own design
document (e.g. `engines/sshca.md`). These are detailed implementation plans
that include:

- Overview and core concepts.
- Data model and storage layout.
- Lifecycle (initialization, teardown).
- Operations table with auth requirements.
- API definitions (gRPC and REST).
- Implementation steps (file-by-file).
- Security considerations.
- References to existing code patterns to follow.

Write these before writing code. They are the blueprint, not the afterthought.

---

## Security

### General Principles

- **Default deny.** Unauthenticated requests are rejected. Unauthorized
  requests are rejected. If in doubt, deny.
- **Fail closed.** If the service cannot verify authorization, it denies the
  request. If the database is unavailable, the service is unavailable.
- **Least privilege.** Service processes run as non-root. systemd units
  restrict filesystem access, syscalls, and capabilities.
- **No local user databases.** Authentication is always delegated to MCIAS.

### Cryptographic Standards

| Purpose | Algorithm | Notes |
|---------|-----------|-------|
| Symmetric encryption | AES-256-GCM | 12-byte random nonce per operation |
| Symmetric alternative | XChaCha20-Poly1305 | For contexts needing nonce misuse resistance |
| Key derivation | Argon2id | Memory-hard; tune params to hardware |
| Asymmetric signing | Ed25519, ECDSA (P-256, P-384) | Prefer Ed25519 |
| CSPRNG | `crypto/rand` | All keys, nonces, salts, tokens |
| Constant-time comparison | `crypto/subtle` | All secret comparisons |

- **Never use RSA for new designs.** Ed25519 and ECDSA are faster, produce
  smaller keys, and have simpler security models.
- **Zeroize secrets** from memory when they are no longer needed. Overwrite
  byte slices with zeros, nil out pointers.
- **Never log secrets.** Keys, passwords, tokens, and plaintext must never
  appear in log output.

### Web Security

- CSRF tokens on all mutating requests.
- `SameSite=Strict` on all cookies.
- `html/template` for automatic escaping.
- Validate all input at system boundaries.

---

## Development Workflow

### Local Development

```bash
# Build and run both servers locally:
make devserver

# Or build everything and run the full pipeline:
make all
```

The `devserver` target builds both binaries and runs them against a local
config in `srv/`. The `srv/` directory is gitignored — it holds your local
database, certificates, and configuration.

### Pre-Push Checklist

Before pushing a branch:

```bash
make all        # vet → lint → test → build
make proto-lint # if proto files changed
```

### Proto Changes

1. Edit `.proto` files in `proto/<service>/v<N>/` (the current version
   directory).
2. Run `make proto` to regenerate Go code.
3. Run `make proto-lint` to check for linting violations and breaking changes.
4. Update REST routes to match the new/changed RPCs.
5. Update gRPC interceptor maps for any new RPCs.
6. Update `ARCHITECTURE.md` API tables.

### Adding a New Feature

1. **Design first.** Write or update the relevant design document. For a new
   engine or major subsystem, create a new doc in `docs/` or `engines/`.
2. **Implement.** Follow existing patterns — the design doc should reference
   specific files and line numbers.
3. **Test.** Write tests alongside the implementation.
4. **Update docs.** Update `ARCHITECTURE.md`, `CLAUDE.md`, and route tables.
5. **Verify.** Run `make all`.

### CLI Commands

Every service uses cobra for CLI commands. Standard subcommands:

| Command | Purpose |
|---------|---------|
| `server` | Start the service |
| `init` | First-time setup (if applicable) |
| `status` | Query a running instance's health |
| `snapshot` | Create a database backup |

Add service-specific subcommands as needed (e.g. `migrate-aad`, `unseal`).
Each command lives in its own file in `cmd/<service>/`.