Refine platform evolution and engineering standards

PLATFORM_EVOLUTION.md: rewrite with convention-driven service
definitions (derived image names, service-level version, agent
defaults), mcdsl as a proper Go module (gap #1), multi-node
design considerations, and service discovery open question.

engineering-standards.md: add shared libraries section establishing
mcdsl as a normally-versioned Go module (no replace directives in
committed code), update service definition example to convention-
driven minimal format with route declarations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 00:19:08 -07:00
parent 1146606208
commit 1def85f1eb
2 changed files with 196 additions and 90 deletions


@@ -28,9 +28,12 @@ But the wiring between services is manual:
 placed in `/srv/mc-proxy/certs/`, and referenced by path in the
 mc-proxy config.
 - **DNS**: records are manually configured in MCNS zone files.
-- **Container networking**: operators specify `network`, `user`, and
-`restart` policy per component, even though these are almost always
-the same values.
+- **Container config boilerplate**: operators specify `network`, `user`,
+`restart`, full image URLs, and port mappings per component, even
+though these are almost always the same values.
+- **mcdsl build wiring**: the shared library requires `replace`
+directives or sibling directory tricks in Docker builds. It should
+be a normally-versioned Go module fetched by the toolchain.
 Each new service requires touching 4-5 files across 3-4 repos. The
 process works but doesn't scale and is error-prone.
@@ -43,11 +46,7 @@ want, not **how** to wire it:
 ```toml
 name = "metacrypt"
 node = "rift"
-active = true
-path = "metacrypt"
-[build]
-uses_mcdsl = false
+version = "v1.0.0"
 [build.images]
 metacrypt = "Dockerfile.api"
@@ -55,9 +54,6 @@ metacrypt-web = "Dockerfile.web"
 [[components]]
 name = "api"
-image = "mcr.svc.mcp.metacircular.net:8443/metacrypt:v1.0.0"
-volumes = ["/srv/metacrypt:/srv/metacrypt"]
-cmd = ["server", "--config", "/srv/metacrypt/metacrypt.toml"]
 [[components.routes]]
 name = "rest"
@@ -71,20 +67,30 @@ mode = "l4"
 [[components]]
 name = "web"
-image = "mcr.svc.mcp.metacircular.net:8443/metacrypt-web:v1.0.0"
-volumes = ["/srv/metacrypt:/srv/metacrypt"]
-cmd = ["server", "--config", "/srv/metacrypt/metacrypt.toml"]
 [[components.routes]]
-name = "web"
 port = 443
 mode = "l7"
 ```
+Everything else is derived from conventions:
+- **Image name**: `<service>` for the first/api component,
+`<service>-<component>` for others. Resolved against the registry
+URL from global MCP config (`~/.config/mcp/mcp.toml`).
+- **Version**: the service-level `version` field applies to all
+components. Can be overridden per-component when needed.
+- **Volumes**: `/srv/<service>:/srv/<service>` is the agent default.
+Only declare additional mounts.
+- **Network, user, restart**: agent defaults (`mcpnet`, `0:0`,
+`unless-stopped`). Override only when needed.
+- **Source path**: defaults to `<service>` relative to the workspace
+root. Override with `path` if different.
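The image-name and version conventions in this list can be sketched in Go. This is only an illustration under stated assumptions, not MCP's implementation: the `imageRef` function and its parameters are hypothetical, and it assumes the "first/api component" rule is keyed on the component being named `api`.

```go
package main

import "fmt"

// imageRef derives a full image reference from the conventions in the
// service definition. The function name and parameters are hypothetical.
func imageRef(registry, service, component, serviceVersion, componentVersion string) string {
	name := service
	if component != "api" {
		// Assumption: only the component named "api" gets the bare service name.
		name = service + "-" + component
	}
	version := serviceVersion
	if componentVersion != "" {
		// A per-component version overrides the service-level one.
		version = componentVersion
	}
	return fmt.Sprintf("%s/%s:%s", registry, name, version)
}

func main() {
	// Registry URL as it appeared in the removed explicit `image` fields.
	const registry = "mcr.svc.mcp.metacircular.net:8443"
	fmt.Println(imageRef(registry, "metacrypt", "api", "v1.0.0", ""))
	fmt.Println(imageRef(registry, "metacrypt", "web", "v1.0.0", ""))
}
```

With the values from the example definition, this reproduces the two image references that the old explicit `image` fields spelled out by hand.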
 `mcp deploy metacrypt` does the rest:
-1. Agent assigns a free host port per route (random, check availability,
-retry on collision).
+1. Agent assigns a free host port per route (random, check
+availability, retry on collision).
 2. Agent requests TLS certs from Metacrypt CA for
 `metacrypt.svc.mcp.metacircular.net`.
 3. Agent registers routes with mc-proxy via gRPC (mc-proxy persists
@@ -128,16 +134,27 @@ hostname = "docs.metacircular.net" # optional, public DNS
 If `hostname` is omitted, the route uses the default
 `<service>.svc.mcp.metacircular.net`.
-### Fields Removed from Service Definitions
-These become agent-level defaults or are derived automatically:
-| Field | Current | Target |
-|-------|---------|--------|
-| `ports` | Manual port mapping | Agent-assigned via routes |
-| `network` | Per-component | Agent default (`mcpnet`) |
-| `user` | Per-component | Agent default (`0:0`) |
-| `restart` | Per-component | Agent default (`unless-stopped`) |
+### Multi-Node Considerations
+This design targets single-node (rift) but should not prevent
+multi-node operation. Key design decisions that keep the door open:
+- **Port assignment is per-agent.** Each node's agent manages its own
+port space. No cross-node coordination needed.
+- **Route registration uses the node's address, not `127.0.0.1`.**
+When mc-proxy and the service are on the same host, the backend is
+loopback. When they're on different hosts, the backend is the node's
+network address. The agent registers the appropriate address for its
+node. The mc-proxy route API already accepts arbitrary backend
+addresses.
+- **DNS can have multiple A records.** MCNS can return multiple records
+for the same hostname (one per node) for simple load distribution.
+- **The CLI routes to the correct agent via the `node` field.** Adding
+a second node is `mcp node add orion <address>` and then services
+can target `node = "orion"`.
+Nothing in the single-node implementation should hardcode assumptions
+about one node, one mc-proxy, or loopback-only backends.
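As a rough sketch of the backend-address rule described in this section (loopback when mc-proxy and the service share a host, the node's network address otherwise), one possible shape in Go follows. The `backendAddr` helper and the example addresses are hypothetical and are not taken from the MCP codebase.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// backendAddr returns the address an agent might register with mc-proxy
// for a route. Hypothetical helper: it compares the service's node address
// with the address of the node running mc-proxy.
func backendAddr(serviceNodeAddr, proxyNodeAddr string, port int) string {
	host := serviceNodeAddr
	if serviceNodeAddr == proxyNodeAddr {
		// Same host: mc-proxy can reach the service over loopback.
		host = "127.0.0.1"
	}
	return net.JoinHostPort(host, strconv.Itoa(port))
}

func main() {
	// 192.0.2.0/24 is a documentation range; these addresses are placeholders.
	fmt.Println(backendAddr("192.0.2.10", "192.0.2.10", 18443)) // same node
	fmt.Println(backendAddr("192.0.2.20", "192.0.2.10", 18443)) // different nodes
}
```

Keeping this decision in one function is one way to avoid the loopback-only assumption leaking into the rest of the agent.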
 ---
@@ -157,11 +174,30 @@ These become agent-level defaults or are derived automatically:
 | MCNS DNS serving | Working |
 | MCR container registry | Working |
 | Service definitions in ~/.config/mcp/services/ | Working |
-| Image build pipeline (mcdeploy.toml, being folded into MCP) | Working |
+| Image build pipeline (being folded into MCP) | Working |
 ### What needs to change
-#### 1. MCP Agent: Port Assignment
+#### 1. mcdsl: Proper Module Versioning
+**Gap**: mcdsl is used via `replace` directives and sibling directory
+hacks. Docker builds require the source tree to be adjacent. This is
+fragile and violates normal Go module conventions.
+**Work**:
+- Tag mcdsl releases with semver (e.g., `v1.0.0`, `v1.1.0`).
+- Remove all `replace` directives from consuming services' `go.mod`
+files. Services import mcdsl by URL and version like any other
+dependency.
+- Docker builds fetch mcdsl via the Go module proxy / Gitea — no local
+source tree required.
+- `uses_mcdsl` is eliminated from service definitions and build config.
+**Depends on**: Gitea module hosting working correctly for
+`git.wntrmute.dev/kyle/mcdsl` (it should already — Go modules over
+git are standard).
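One way to keep `replace` directives out of committed `go.mod` files is a small pre-commit or CI check. The sketch below is an assumption about how such a check could look, not something the commit describes: `hasReplaceDirective` is a hypothetical helper that scans `go.mod` text line by line, and the `metacrypt` module path is inferred from the `git.wntrmute.dev/kyle/<service>` pattern. A more thorough check could parse the file with `golang.org/x/mod/modfile` instead.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// hasReplaceDirective reports whether go.mod content contains a replace
// directive, in either the single-line or the block ("replace (") form.
// Hypothetical helper; comment lines are ignored because their first
// token starts with "//".
func hasReplaceDirective(gomod string) bool {
	sc := bufio.NewScanner(strings.NewReader(gomod))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue
		}
		if fields[0] == "replace" || strings.HasPrefix(fields[0], "replace(") {
			return true
		}
	}
	return false
}

func main() {
	clean := "module git.wntrmute.dev/kyle/metacrypt\n\nrequire git.wntrmute.dev/kyle/mcdsl v1.2.0\n"
	dirty := clean + "\nreplace git.wntrmute.dev/kyle/mcdsl => ../mcdsl\n"
	fmt.Println(hasReplaceDirective(clean))
	fmt.Println(hasReplaceDirective(dirty))
}
```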
+#### 2. MCP Agent: Port Assignment
 **Gap**: agent doesn't manage host ports. Service definitions specify
 them manually.
@@ -177,19 +213,20 @@ them manually.
 **Depends on**: nothing (can be developed standalone).
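The "random, check availability, retry on collision" strategy for port assignment can be sketched as follows. This is a minimal illustration, not the agent's actual code: `assignPort`, the port range, and the attempt limit are all assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"net"
)

// assignPort picks a random port in [lo, hi] and checks that it is free by
// briefly binding to it on loopback, retrying on collision. Hypothetical
// helper; the range and attempt count are illustrative choices.
func assignPort(lo, hi, attempts int) (int, error) {
	for i := 0; i < attempts; i++ {
		port := lo + rand.Intn(hi-lo+1)
		ln, err := net.Listen("tcp", fmt.Sprintf("127.0.0.1:%d", port))
		if err != nil {
			continue // port in use (or otherwise unavailable): try another
		}
		ln.Close()
		return port, nil
	}
	return 0, errors.New("no free port found")
}

func main() {
	port, err := assignPort(20000, 40000, 20)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("assigned host port:", port)
}
```

Note that a probe like this is inherently racy: another process could take the port between the check and the container start, so a real agent would likely also record its own assignments and handle a bind failure at start time.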
-#### 2. MCP Agent: mc-proxy Route Registration
+#### 3. MCP Agent: mc-proxy Route Registration
 **Gap**: mc-proxy routes are static TOML. The gRPC admin API exists but
 MCP doesn't use it.
 **Work**:
-- Agent calls mc-proxy gRPC API to register/remove routes on deploy/stop.
-- Route registration includes: hostname, host port (agent-assigned),
-mode (l4/l7), TLS cert paths.
+- Agent calls mc-proxy gRPC API to register/remove routes on
+deploy/stop.
+- Route registration includes: hostname, backend address (node address
++ assigned port), mode (l4/l7), TLS cert paths.
-**Depends on**: port assignment (#1), mc-proxy route persistence (#4).
+**Depends on**: port assignment (#2), mc-proxy route persistence (#5).
-#### 3. MCP Agent: TLS Cert Provisioning
+#### 4. MCP Agent: TLS Cert Provisioning
 **Gap**: certs are manually provisioned and placed on disk. There is no
 automated issuance flow.
@@ -200,9 +237,9 @@ automated issuance flow.
 (`/srv/mc-proxy/certs/<service>.pem`).
 - Cert renewal is handled automatically before expiry.
-**Depends on**: Metacrypt cert issuance policy (#6).
+**Depends on**: Metacrypt cert issuance policy (#7).
-#### 4. mc-proxy: Route Persistence
+#### 5. mc-proxy: Route Persistence
 **Gap**: mc-proxy loads routes from TOML on startup. Routes added via
 gRPC are lost on restart.
@@ -210,13 +247,13 @@ gRPC are lost on restart.
 **Work**:
 - mc-proxy persists gRPC-managed routes in its SQLite database.
 - On startup, mc-proxy loads routes from the database.
-- TOML route config is deprecated (kept for bootstrapping only, e.g.,
-mc-proxy's own routes before MCP is fully operational).
-- mcproxyctl becomes the primary route management interface.
+- TOML route config is vestigial — kept only for mc-proxy's own
+bootstrap before MCP is operational. The gRPC API and mcproxyctl
+are the primary route management interfaces going forward.
 **Depends on**: nothing (mc-proxy already has SQLite and gRPC API).
-#### 5. MCP Agent: DNS Registration
+#### 6. MCP Agent: DNS Registration
 **Gap**: DNS records are manually configured in MCNS zone files.
@@ -225,9 +262,9 @@ gRPC are lost on restart.
 `<service>.svc.mcp.metacircular.net`.
 - Agent removes records on service teardown.
-**Depends on**: MCNS record management API (#7).
+**Depends on**: MCNS record management API (#8).
-#### 6. Metacrypt: Automated Cert Issuance Policy
+#### 7. Metacrypt: Automated Cert Issuance Policy
 **Gap**: no policy exists for automated cert issuance. The MCP agent
 doesn't have a Metacrypt identity or permissions.
@@ -235,13 +272,14 @@ doesn't have a Metacrypt identity or permissions.
 **Work**:
 - MCP agent gets an MCIAS service account.
 - Metacrypt policy allows this account to issue certs scoped to
-`*.svc.mcp.metacircular.net` (and explicitly listed public hostnames).
+`*.svc.mcp.metacircular.net` (and explicitly listed public
+hostnames).
 - No wildcard certs — one cert per hostname per service.
 **Depends on**: MCIAS service account provisioning (exists today, just
 needs the account created).
-#### 7. MCNS: Record Management API
+#### 8. MCNS: Record Management API
 **Gap**: MCNS is a CoreDNS precursor serving static zone files. There
 is no API for dynamic record management.
@@ -257,7 +295,7 @@ is no API for dynamic record management.
 wrapper, not a full service. This may be the right time to build the
 real MCNS.
-#### 8. Application $PORT Convention
+#### 9. Application $PORT Convention
 **Gap**: applications read listen addresses from their config files.
 They don't check `$PORT` env vars.
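To make the gap concrete, here is one way an application could prefer `$PORT` over its configured listen address. The work items for this gap are not visible in this hunk, so the behavior below (keep the configured host, swap in the port from the environment) is an assumption, and `listenAddr` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// listenAddr returns the address to listen on. If $PORT is set it replaces
// the port from the configured address; otherwise the configured address is
// used unchanged. getenv is injected so the function is easy to test.
func listenAddr(configured string, getenv func(string) string) string {
	port := getenv("PORT")
	if port == "" {
		return configured
	}
	host, _, err := net.SplitHostPort(configured)
	if err != nil {
		host = "" // unparseable config value: fall back to all interfaces
	}
	return net.JoinHostPort(host, port)
}

func main() {
	// Run with PORT=31001 in the environment to see the override take effect.
	fmt.Println(listenAddr("127.0.0.1:8443", os.Getenv))
}
```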
@@ -279,32 +317,38 @@ The dependencies form a rough order:
 ```
 Phase A — Independent groundwork (parallel):
-#1 MCP agent port assignment
-#4 mc-proxy route persistence
-#8 $PORT convention in applications
+#1 mcdsl proper module versioning
+#2 MCP agent port assignment
+#5 mc-proxy route persistence
+#9 $PORT convention in applications
 Phase B — MCP route registration:
-#2 Agent registers routes with mc-proxy
-(depends on #1 + #4)
+#3 Agent registers routes with mc-proxy
+(depends on #2 + #5)
 Phase C — Automated TLS:
-#6 Metacrypt cert issuance policy
-#3 Agent provisions certs
-(depends on #6)
+#7 Metacrypt cert issuance policy
+#4 Agent provisions certs
+(depends on #7)
 Phase D — DNS:
-#7 MCNS record management API
-#5 Agent registers DNS
-(depends on #7)
+#8 MCNS record management API
+#6 Agent registers DNS
+(depends on #8)
 ```
-After Phase B, the manual steps are: cert provisioning and DNS. After
-Phase C, only DNS remains manual. After Phase D, `mcp deploy` is fully
-declarative.
-Each phase is independently useful. Phase A + B alone eliminates the
-most common source of manual wiring errors (port assignment and mc-proxy
-config).
+After Phase A, mcdsl builds are clean and services can be deployed
+with agent-assigned ports (manually registered in mc-proxy).
+After Phase B, the manual steps are: cert provisioning and DNS. This
+is the biggest quality-of-life improvement — no more manual port
+picking or mc-proxy TOML editing.
+After Phase C, only DNS remains manual.
+After Phase D, `mcp deploy` is fully declarative.
+Each phase is independently useful and deployable.
 ---
@@ -317,14 +361,16 @@ config).
 in addition to the `.svc.mcp.metacircular.net` name. Public DNS is
 managed outside MCNS (Cloudflare? registrar?). How does the agent
 handle the split between internal and external DNS?
-- **mc-proxy bootstrap**: mc-proxy itself needs routes to be reachable.
-If routes are in SQLite, how does mc-proxy start before MCP configures
-it? A small set of static bootstrap routes (or self-configuration) may
-be needed.
-- **Multi-node**: this design assumes single-node (rift). When a second
-node is added, port assignment is still per-agent, but mc-proxy
-routing, cert provisioning, and DNS need to account for multiple
-backends. Not a v1 concern, but worth keeping in mind.
+- **mc-proxy bootstrap**: mc-proxy itself is a service that needs to be
+running before other services can be routed. Its own routes (if any)
+may need to be self-configured or seeded from a minimal static config
+at first start. Once operational, all route management goes through
+the gRPC API.
 - **Rollback**: if cert provisioning fails mid-deploy, does the agent
 roll back the port assignment and mc-proxy route? What's the failure
 mode — partial deploy, full rollback, or best-effort?
+- **Service discovery between components**: currently, components find
+each other via config (e.g., mcr-web knows mcr-api's gRPC address).
+With agent-assigned ports, components within a service need to
+discover each other's ports. The agent could set additional env vars
+(`$PEER_API_GRPC=127.0.0.1:9217`) or services could query the agent.
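For the env-var option raised in the service discovery question, the variable could be built from the peer's component and route names. This is purely illustrative, since the question is still open: `peerEnvVar` is a hypothetical helper, and the `PEER_<COMPONENT>_<ROUTE>` naming is inferred from the single `$PEER_API_GRPC` example.

```go
package main

import (
	"fmt"
	"strings"
)

// peerEnvVar builds a KEY=VALUE pair advertising a sibling component's
// agent-assigned port on loopback. Hypothetical helper; the naming scheme
// is an assumption based on the $PEER_API_GRPC example.
func peerEnvVar(component, route string, port int) string {
	name := strings.ToUpper("PEER_" + component + "_" + route)
	name = strings.ReplaceAll(name, "-", "_") // env var names cannot contain hyphens
	return fmt.Sprintf("%s=127.0.0.1:%d", name, port)
}

func main() {
	fmt.Println(peerEnvVar("api", "grpc", 9217))
}
```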


@@ -143,6 +143,40 @@ Services hosted on `git.wntrmute.dev` use:
 git.wntrmute.dev/kyle/<service>
 ```
+### Shared Libraries (mcdsl)
+The `mcdsl` module (`git.wntrmute.dev/kyle/mcdsl`) is the platform's
+standard library — shared packages for auth, database, config,
+HTTP/gRPC servers, CSRF, snapshots, and other cross-cutting concerns.
+mcdsl is a normal Go module, versioned and tagged per standard SDLC
+conventions. Services import it like any other dependency:
+```go
+import "git.wntrmute.dev/kyle/mcdsl/auth"
+```
+And reference it in `go.mod` with a tagged version:
+```
+require git.wntrmute.dev/kyle/mcdsl v1.2.0
+```
+**Rules:**
+- mcdsl follows semver. Breaking changes require a major version bump.
+- Services pin to a specific mcdsl version and upgrade deliberately.
+- `replace` directives in `go.mod` are not permitted in committed code.
+They are acceptable only during local development when iterating on
+mcdsl and a consuming service simultaneously — they must be removed
+before committing.
+- Docker builds must not require the mcdsl source tree to be present.
+The Go toolchain fetches the tagged module from Gitea like any other
+dependency.
+- When releasing a new mcdsl version, update consuming services in a
+follow-up change — not atomically. Each service upgrades on its own
+schedule.
 ---
 ## Build System
@@ -635,11 +669,7 @@ file defines a service with one or more container components:
 ```toml
 name = "metacrypt"
 node = "rift"
-active = true
-path = "metacrypt"
-[build]
-uses_mcdsl = false
+version = "v1.0.0"
 [build.images]
 metacrypt = "Dockerfile.api"
@@ -647,29 +677,53 @@ metacrypt-web = "Dockerfile.web"
 [[components]]
 name = "api"
-image = "mcr.svc.mcp.metacircular.net:8443/metacrypt:v1.0.0"
-network = "mcpnet"
-user = "0:0"
-restart = "unless-stopped"
-ports = ["127.0.0.1:18443:8443", "127.0.0.1:19443:9443"]
 volumes = ["/srv/metacrypt:/srv/metacrypt"]
-cmd = ["server", "--config", "/srv/metacrypt/metacrypt.toml"]
+[[components.routes]]
+name = "rest"
+port = 8443
+mode = "l4"
+[[components.routes]]
+name = "grpc"
+port = 9443
+mode = "l4"
+[[components]]
+name = "web"
+volumes = ["/srv/metacrypt:/srv/metacrypt"]
+[[components.routes]]
+port = 443
+mode = "l7"
 ```
+The service definition is intentionally minimal. Most fields are
+derived from conventions:
+- **Image name**: `<service>` for api components, `<service>-<component>`
+for others. The registry URL comes from global MCP config.
+- **Version**: service-level `version` applies to all components unless
+overridden per-component.
+- **Volumes**: `/srv/<service>:/srv/<service>` is the default; only
+declare additional mounts.
+- **Network, user, restart**: agent defaults (`mcpnet`, `0:0`,
+`unless-stopped`); override only when needed.
 Top-level fields:
 | Field | Purpose |
 |-------|---------|
 | `name` | Service name (matches the project name) |
 | `node` | Target host to deploy to |
-| `active` | Whether MCP should keep this service running |
-| `path` | Source directory relative to the workspace (for builds) |
+| `version` | Image version tag (applies to all components) |
+| `active` | Whether MCP should keep this service running (default: true) |
+| `path` | Source directory relative to workspace (default: same as `name`) |
 Build fields:
 | Field | Purpose |
 |-------|---------|
-| `build.uses_mcdsl` | Whether the build requires the mcdsl module |
 | `build.images.<name>` | Maps image name to its Dockerfile path |
 Component fields:
@@ -677,13 +731,19 @@ Component fields:
 | Field | Purpose |
 |-------|---------|
 | `name` | Component name within the service (e.g. `api`, `web`) |
-| `image` | Full image reference including MCR registry and version tag |
-| `network` | Podman network to attach to |
-| `user` | Container user:group |
-| `restart` | Restart policy |
-| `ports` | Host-to-container port mappings |
-| `volumes` | Host-to-container volume mounts |
-| `cmd` | Command and arguments passed to the entrypoint |
+| `image` | Image name override (default: derived from service/component name) |
+| `version` | Version override for this component |
+| `volumes` | Host-to-container volume mounts (in addition to default) |
+| `cmd` | Command override (default: Dockerfile CMD) |
+Route fields:
+| Field | Purpose |
+|-------|---------|
+| `name` | Route name (used for `$PORT_<NAME>` env var) |
+| `port` | External port on mc-proxy |
+| `mode` | `l4` (TLS passthrough) or `l7` (TLS termination) |
+| `hostname` | Public hostname override (default: `<service>.svc.mcp.metacircular.net`) |
 #### Convention