diff --git a/PLATFORM_EVOLUTION.md b/PLATFORM_EVOLUTION.md
index fa2a32c..815a452 100644
--- a/PLATFORM_EVOLUTION.md
+++ b/PLATFORM_EVOLUTION.md
@@ -28,9 +28,12 @@ But the wiring between services is manual:
   placed in `/srv/mc-proxy/certs/`, and referenced by path in the
   mc-proxy config.
 - **DNS**: records are manually configured in MCNS zone files.
-- **Container networking**: operators specify `network`, `user`, and
-  `restart` policy per component, even though these are almost always
-  the same values.
+- **Container config boilerplate**: operators specify `network`, `user`,
+  `restart`, full image URLs, and port mappings per component, even
+  though these are almost always the same values.
+- **mcdsl build wiring**: the shared library requires `replace`
+  directives or sibling directory tricks in Docker builds. It should
+  be a normally-versioned Go module fetched by the toolchain.
 
 Each new service requires touching 4-5 files across 3-4 repos. The
 process works but doesn't scale and is error-prone.
@@ -43,11 +46,7 @@ want, not **how** to wire it:
 ```toml
 name = "metacrypt"
 node = "rift"
-active = true
-path = "metacrypt"
-
-[build]
-uses_mcdsl = false
+version = "v1.0.0"
 
 [build.images]
 metacrypt = "Dockerfile.api"
@@ -55,9 +54,6 @@ metacrypt-web = "Dockerfile.web"
 
 [[components]]
 name = "api"
-image = "mcr.svc.mcp.metacircular.net:8443/metacrypt:v1.0.0"
-volumes = ["/srv/metacrypt:/srv/metacrypt"]
-cmd = ["server", "--config", "/srv/metacrypt/metacrypt.toml"]
 
 [[components.routes]]
 name = "rest"
@@ -71,20 +67,30 @@ mode = "l4"
 
 [[components]]
 name = "web"
-image = "mcr.svc.mcp.metacircular.net:8443/metacrypt-web:v1.0.0"
-volumes = ["/srv/metacrypt:/srv/metacrypt"]
-cmd = ["server", "--config", "/srv/metacrypt/metacrypt.toml"]
 
 [[components.routes]]
-name = "web"
 port = 443
 mode = "l7"
 ```
 
+Everything else is derived from conventions:
+
+- **Image name**: `` for the first/api component,
+  `-` for others.
Resolved against the registry
+  URL from global MCP config (`~/.config/mcp/mcp.toml`).
+- **Version**: the service-level `version` field applies to all
+  components. Can be overridden per-component when needed.
+- **Volumes**: `/srv/:/srv/` is the agent default.
+  Only declare additional mounts.
+- **Network, user, restart**: agent defaults (`mcpnet`, `0:0`,
+  `unless-stopped`). Override only when needed.
+- **Source path**: defaults to `` relative to the workspace
+  root. Override with `path` if different.
+
 `mcp deploy metacrypt` does the rest:
 
-1. Agent assigns a free host port per route (random, check availability,
-   retry on collision).
+1. Agent assigns a free host port per route (random, check
+   availability, retry on collision).
 2. Agent requests TLS certs from Metacrypt CA for
    `metacrypt.svc.mcp.metacircular.net`.
 3. Agent registers routes with mc-proxy via gRPC (mc-proxy persists
@@ -128,16 +134,27 @@ hostname = "docs.metacircular.net" # optional, public DNS
 If `hostname` is omitted, the route uses the default
 `.svc.mcp.metacircular.net`.
 
-### Fields Removed from Service Definitions
+### Multi-Node Considerations
 
-These become agent-level defaults or are derived automatically:
+This design targets single-node (rift) but should not prevent
+multi-node operation. Key design decisions that keep the door open:
 
-| Field | Current | Target |
-|-------|---------|--------|
-| `ports` | Manual port mapping | Agent-assigned via routes |
-| `network` | Per-component | Agent default (`mcpnet`) |
-| `user` | Per-component | Agent default (`0:0`) |
-| `restart` | Per-component | Agent default (`unless-stopped`) |
+- **Port assignment is per-agent.** Each node's agent manages its own
+  port space. No cross-node coordination needed.
+- **Route registration uses the node's address, not `127.0.0.1`.**
+  When mc-proxy and the service are on the same host, the backend is
+  loopback. When they're on different hosts, the backend is the node's
+  network address.
The agent registers the appropriate address for its
+  node. The mc-proxy route API already accepts arbitrary backend
+  addresses.
+- **DNS can have multiple A records.** MCNS can return multiple records
+  for the same hostname (one per node) for simple load distribution.
+- **The CLI routes to the correct agent via the `node` field.** Adding
+  a second node is `mcp node add orion
` and then services
+  can target `node = "orion"`.
+
+Nothing in the single-node implementation should hardcode assumptions
+about one node, one mc-proxy, or loopback-only backends.
 
 ---
 
@@ -157,11 +174,30 @@ These become agent-level defaults or are derived automatically:
 | MCNS DNS serving | Working |
 | MCR container registry | Working |
 | Service definitions in ~/.config/mcp/services/ | Working |
-| Image build pipeline (mcdeploy.toml, being folded into MCP) | Working |
+| Image build pipeline (being folded into MCP) | Working |
 
 ### What needs to change
 
-#### 1. MCP Agent: Port Assignment
+#### 1. mcdsl: Proper Module Versioning
+
+**Gap**: mcdsl is used via `replace` directives and sibling directory
+hacks. Docker builds require the source tree to be adjacent. This is
+fragile and violates normal Go module conventions.
+
+**Work**:
+- Tag mcdsl releases with semver (e.g., `v1.0.0`, `v1.1.0`).
+- Remove all `replace` directives from consuming services' `go.mod`
+  files. Services import mcdsl by URL and version like any other
+  dependency.
+- Docker builds fetch mcdsl via the Go module proxy / Gitea — no local
+  source tree required.
+- `uses_mcdsl` is eliminated from service definitions and build config.
+
+**Depends on**: Gitea module hosting working correctly for
+`git.wntrmute.dev/kyle/mcdsl` (it should already — Go modules over
+git are standard).
+
+#### 2. MCP Agent: Port Assignment
 
 **Gap**: agent doesn't manage host ports. Service definitions specify
 them manually.
@@ -177,19 +213,20 @@ them manually.
 
 **Depends on**: nothing (can be developed standalone).
 
-#### 2. MCP Agent: mc-proxy Route Registration
+#### 3. MCP Agent: mc-proxy Route Registration
 
 **Gap**: mc-proxy routes are static TOML. The gRPC admin API exists but
 MCP doesn't use it.
 
 **Work**:
-- Agent calls mc-proxy gRPC API to register/remove routes on deploy/stop.
-- Route registration includes: hostname, host port (agent-assigned),
-  mode (l4/l7), TLS cert paths.
+- Agent calls mc-proxy gRPC API to register/remove routes on
+  deploy/stop.
+- Route registration includes: hostname, backend address (node address
+  + assigned port), mode (l4/l7), TLS cert paths.
 
-**Depends on**: port assignment (#1), mc-proxy route persistence (#4).
+**Depends on**: port assignment (#2), mc-proxy route persistence (#5).
 
-#### 3. MCP Agent: TLS Cert Provisioning
+#### 4. MCP Agent: TLS Cert Provisioning
 
 **Gap**: certs are manually provisioned and placed on disk. There is no
 automated issuance flow.
@@ -200,9 +237,9 @@ automated issuance flow.
   (`/srv/mc-proxy/certs/.pem`).
 - Cert renewal is handled automatically before expiry.
 
-**Depends on**: Metacrypt cert issuance policy (#6).
+**Depends on**: Metacrypt cert issuance policy (#7).
 
-#### 4. mc-proxy: Route Persistence
+#### 5. mc-proxy: Route Persistence
 
 **Gap**: mc-proxy loads routes from TOML on startup. Routes added via
 gRPC are lost on restart.
@@ -210,13 +247,13 @@ gRPC are lost on restart.
 **Work**:
 - mc-proxy persists gRPC-managed routes in its SQLite database.
 - On startup, mc-proxy loads routes from the database.
-- TOML route config is deprecated (kept for bootstrapping only, e.g.,
-  mc-proxy's own routes before MCP is fully operational).
-- mcproxyctl becomes the primary route management interface.
+- TOML route config is vestigial — kept only for mc-proxy's own
+  bootstrap before MCP is operational. The gRPC API and mcproxyctl
+  are the primary route management interfaces going forward.
 
 **Depends on**: nothing (mc-proxy already has SQLite and gRPC API).
 
-#### 5. MCP Agent: DNS Registration
+#### 6. MCP Agent: DNS Registration
 
 **Gap**: DNS records are manually configured in MCNS zone files.
 
@@ -225,9 +262,9 @@ gRPC are lost on restart.
   `.svc.mcp.metacircular.net`.
 - Agent removes records on service teardown.
 
-**Depends on**: MCNS record management API (#7).
+**Depends on**: MCNS record management API (#8).
 
-#### 6. Metacrypt: Automated Cert Issuance Policy
+#### 7.
Metacrypt: Automated Cert Issuance Policy
 
 **Gap**: no policy exists for automated cert issuance. The MCP agent
 doesn't have a Metacrypt identity or permissions.
@@ -235,13 +272,14 @@ doesn't have a Metacrypt identity or permissions.
 **Work**:
 - MCP agent gets an MCIAS service account.
 - Metacrypt policy allows this account to issue certs scoped to
-  `*.svc.mcp.metacircular.net` (and explicitly listed public hostnames).
+  `*.svc.mcp.metacircular.net` (and explicitly listed public
+  hostnames).
 - No wildcard certs — one cert per hostname per service.
 
 **Depends on**: MCIAS service account provisioning (exists today, just
 needs the account created).
 
-#### 7. MCNS: Record Management API
+#### 8. MCNS: Record Management API
 
 **Gap**: MCNS is a CoreDNS precursor serving static zone files. There
 is no API for dynamic record management.
@@ -257,7 +295,7 @@ is no API for dynamic record management.
 wrapper, not a full service. This may be the right time to build the
 real MCNS.
 
-#### 8. Application $PORT Convention
+#### 9. Application $PORT Convention
 
 **Gap**: applications read listen addresses from their config files.
 They don't check `$PORT` env vars.
@@ -279,32 +317,38 @@ The dependencies form a rough order:
 
 ```
 Phase A — Independent groundwork (parallel):
-  #1 MCP agent port assignment
-  #4 mc-proxy route persistence
-  #8 $PORT convention in applications
+  #1 mcdsl proper module versioning
+  #2 MCP agent port assignment
+  #5 mc-proxy route persistence
+  #9 $PORT convention in applications
 
 Phase B — MCP route registration:
-  #2 Agent registers routes with mc-proxy
-     (depends on #1 + #4)
+  #3 Agent registers routes with mc-proxy
+     (depends on #2 + #5)
 
 Phase C — Automated TLS:
-  #6 Metacrypt cert issuance policy
-  #3 Agent provisions certs
-     (depends on #6)
+  #7 Metacrypt cert issuance policy
+  #4 Agent provisions certs
+     (depends on #7)
 
 Phase D — DNS:
-  #7 MCNS record management API
-  #5 Agent registers DNS
-     (depends on #7)
+  #8 MCNS record management API
+  #6 Agent registers DNS
+     (depends on #8)
 ```
 
-After Phase B, the manual steps are: cert provisioning and DNS. After
-Phase C, only DNS remains manual. After Phase D, `mcp deploy` is fully
-declarative.
+After Phase A, mcdsl builds are clean and services can be deployed
+with agent-assigned ports (manually registered in mc-proxy).
 
-Each phase is independently useful. Phase A + B alone eliminates the
-most common source of manual wiring errors (port assignment and mc-proxy
-config).
+After Phase B, the manual steps are: cert provisioning and DNS. This
+is the biggest quality-of-life improvement — no more manual port
+picking or mc-proxy TOML editing.
+
+After Phase C, only DNS remains manual.
+
+After Phase D, `mcp deploy` is fully declarative.
+
+Each phase is independently useful and deployable.
 
 ---
 
@@ -317,14 +361,16 @@ config).
   in addition to the `.svc.mcp.metacircular.net` name. Public DNS is
   managed outside MCNS (Cloudflare? registrar?). How does the agent
   handle the split between internal and external DNS?
-- **mc-proxy bootstrap**: mc-proxy itself needs routes to be reachable.
-  If routes are in SQLite, how does mc-proxy start before MCP configures
-  it? A small set of static bootstrap routes (or self-configuration) may
-  be needed.
-- **Multi-node**: this design assumes single-node (rift). When a second
-  node is added, port assignment is still per-agent, but mc-proxy
-  routing, cert provisioning, and DNS need to account for multiple
-  backends. Not a v1 concern, but worth keeping in mind.
+- **mc-proxy bootstrap**: mc-proxy itself is a service that needs to be
+  running before other services can be routed. Its own routes (if any)
+  may need to be self-configured or seeded from a minimal static config
+  at first start. Once operational, all route management goes through
+  the gRPC API.
 - **Rollback**: if cert provisioning fails mid-deploy, does the agent
   roll back the port assignment and mc-proxy route? What's the failure
   mode — partial deploy, full rollback, or best-effort?
+- **Service discovery between components**: currently, components find
+  each other via config (e.g., mcr-web knows mcr-api's gRPC address).
+  With agent-assigned ports, components within a service need to
+  discover each other's ports. The agent could set additional env vars
+  (`$PEER_API_GRPC=127.0.0.1:9217`) or services could query the agent.
diff --git a/engineering-standards.md b/engineering-standards.md
index 4c608af..aa2f468 100644
--- a/engineering-standards.md
+++ b/engineering-standards.md
@@ -143,6 +143,40 @@ Services hosted on `git.wntrmute.dev` use:
 git.wntrmute.dev/kyle/
 ```
 
+### Shared Libraries (mcdsl)
+
+The `mcdsl` module (`git.wntrmute.dev/kyle/mcdsl`) is the platform's
+standard library — shared packages for auth, database, config,
+HTTP/gRPC servers, CSRF, snapshots, and other cross-cutting concerns.
+
+mcdsl is a normal Go module, versioned and tagged per standard SDLC
+conventions.
Services import it like any other dependency:
+
+```go
+import "git.wntrmute.dev/kyle/mcdsl/auth"
+```
+
+And reference it in `go.mod` with a tagged version:
+
+```
+require git.wntrmute.dev/kyle/mcdsl v1.2.0
+```
+
+**Rules:**
+
+- mcdsl follows semver. Breaking changes require a major version bump.
+- Services pin to a specific mcdsl version and upgrade deliberately.
+- `replace` directives in `go.mod` are not permitted in committed code.
+  They are acceptable only during local development when iterating on
+  mcdsl and a consuming service simultaneously — they must be removed
+  before committing.
+- Docker builds must not require the mcdsl source tree to be present.
+  The Go toolchain fetches the tagged module from Gitea like any other
+  dependency.
+- When releasing a new mcdsl version, update consuming services in a
+  follow-up change — not atomically. Each service upgrades on its own
+  schedule.
+
 ---
 
 ## Build System
@@ -635,11 +669,7 @@ file defines a service with one or more container components:
 ```toml
 name = "metacrypt"
 node = "rift"
-active = true
-path = "metacrypt"
-
-[build]
-uses_mcdsl = false
+version = "v1.0.0"
 
 [build.images]
 metacrypt = "Dockerfile.api"
@@ -647,29 +677,53 @@ metacrypt-web = "Dockerfile.web"
 
 [[components]]
 name = "api"
-image = "mcr.svc.mcp.metacircular.net:8443/metacrypt:v1.0.0"
-network = "mcpnet"
-user = "0:0"
-restart = "unless-stopped"
-ports = ["127.0.0.1:18443:8443", "127.0.0.1:19443:9443"]
 volumes = ["/srv/metacrypt:/srv/metacrypt"]
-cmd = ["server", "--config", "/srv/metacrypt/metacrypt.toml"]
+
+[[components.routes]]
+name = "rest"
+port = 8443
+mode = "l4"
+
+[[components.routes]]
+name = "grpc"
+port = 9443
+mode = "l4"
+
+[[components]]
+name = "web"
+volumes = ["/srv/metacrypt:/srv/metacrypt"]
+
+[[components.routes]]
+port = 443
+mode = "l7"
 ```
 
+The service definition is intentionally minimal. Most fields are
+derived from conventions:
+
+- **Image name**: `` for api components, `-`
+  for others.
The registry URL comes from global MCP config.
+- **Version**: service-level `version` applies to all components unless
+  overridden per-component.
+- **Volumes**: `/srv/:/srv/` is the default; only
+  declare additional mounts.
+- **Network, user, restart**: agent defaults (`mcpnet`, `0:0`,
+  `unless-stopped`); override only when needed.
+
 Top-level fields:
 
 | Field | Purpose |
 |-------|---------|
 | `name` | Service name (matches the project name) |
 | `node` | Target host to deploy to |
-| `active` | Whether MCP should keep this service running |
-| `path` | Source directory relative to the workspace (for builds) |
+| `version` | Image version tag (applies to all components) |
+| `active` | Whether MCP should keep this service running (default: true) |
+| `path` | Source directory relative to workspace (default: same as `name`) |
 
 Build fields:
 
 | Field | Purpose |
 |-------|---------|
-| `build.uses_mcdsl` | Whether the build requires the mcdsl module |
 | `build.images.` | Maps image name to its Dockerfile path |
 
 Component fields:
@@ -677,13 +731,19 @@ Component fields:
 | Field | Purpose |
 |-------|---------|
 | `name` | Component name within the service (e.g.
`api`, `web`) |
-| `image` | Full image reference including MCR registry and version tag |
-| `network` | Podman network to attach to |
-| `user` | Container user:group |
-| `restart` | Restart policy |
-| `ports` | Host-to-container port mappings |
-| `volumes` | Host-to-container volume mounts |
-| `cmd` | Command and arguments passed to the entrypoint |
+| `image` | Image name override (default: derived from service/component name) |
+| `version` | Version override for this component |
+| `volumes` | Host-to-container volume mounts (in addition to default) |
+| `cmd` | Command override (default: Dockerfile CMD) |
+
+Route fields:
+
+| Field | Purpose |
+|-------|---------|
+| `name` | Route name (used for `$PORT_` env var) |
+| `port` | External port on mc-proxy |
+| `mode` | `l4` (TLS passthrough) or `l7` (TLS termination) |
+| `hostname` | Public hostname override (default: `.svc.mcp.metacircular.net`) |
 
 #### Convention