Sync platform docs: Phases C+D complete, Phase E planned #5

Merged
kyle merged 7 commits from docs/platform-evolution-sync into master 2026-03-29 06:21:22 +00:00
3 changed files with 200 additions and 77 deletions
Showing only changes of commit 4722fdb0da


@@ -5,7 +5,7 @@ from its current manually-wired state to fully declarative deployment.
It is a living design document — not a spec, not a commitment, but a It is a living design document — not a spec, not a commitment, but a
record of where we are, where we want to be, and what's between. record of where we are, where we want to be, and what's between.
Last updated: 2026-03-27 (Phases A + B + C complete) Last updated: 2026-03-28 (Phases A + B + C + D complete)
--- ---
@@ -239,16 +239,16 @@ mc-proxy routes are fully persisted in SQLite and survive restarts:
bootstrap before MCP is operational. The gRPC API and mcproxyctl bootstrap before MCP is operational. The gRPC API and mcproxyctl
are the primary route management interfaces going forward. are the primary route management interfaces going forward.
#### 6. MCP Agent: DNS Registration #### 6. MCP Agent: DNS Registration — DONE
**Gap**: DNS records are manually configured in MCNS zone files. Agent automatically manages DNS records during deploy and stop:
- Deploy: calls MCNS API to create/update A records for
**Work**: `<service>.svc.mcp.metacircular.net` pointing to the node's address.
- Agent creates/updates A records in MCNS for - Stop/undeploy: removes DNS records before stopping containers.
`<service>.svc.mcp.metacircular.net`. - Config: `[mcns]` section in agent config with server URL, CA cert,
- Agent removes records on service teardown. token path, zone, and node address.
- Nil-safe: if MCNS is not configured, DNS registration is silently skipped (backward compatible).
**Depends on**: MCNS record management API (#8). - Authorization: mcp-agent system account can manage any record name.
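The nil-safe behaviour described for the agent's DNS registration can be sketched in Go as follows. This is only an illustration, not the agent's actual code: `DNSRegistrar`, `RecordStore`, `memStore`, and `recordName` are hypothetical names, and the sketch assumes the configured zone is `svc.mcp.metacircular.net` and that records are keyed by `<service>.<zone>`.

```go
package main

import (
	"errors"
	"strings"
)

// RecordStore is a hypothetical stand-in for an MCNS API client.
type RecordStore interface {
	Upsert(name, addr string) error
	Delete(name string) error
}

// DNSRegistrar carries the values the [mcns] section is described as
// holding (zone and node address); the field names are assumptions.
type DNSRegistrar struct {
	Store    RecordStore
	Zone     string
	NodeAddr string
}

// recordName builds "<service>.<zone>" without a trailing dot.
func recordName(service, zone string) string {
	return strings.ToLower(service) + "." + strings.TrimSuffix(zone, ".")
}

// Register creates or updates the A record for a service.
// A nil registrar means MCNS is not configured, so this is a no-op.
func (r *DNSRegistrar) Register(service string) error {
	if r == nil {
		return nil
	}
	if service == "" {
		return errors.New("empty service name")
	}
	return r.Store.Upsert(recordName(service, r.Zone), r.NodeAddr)
}

// Deregister removes the record; it is also a no-op on a nil registrar.
func (r *DNSRegistrar) Deregister(service string) error {
	if r == nil {
		return nil
	}
	return r.Store.Delete(recordName(service, r.Zone))
}

// memStore is an in-memory RecordStore used only for demonstration.
type memStore map[string]string

func (m memStore) Upsert(name, addr string) error { m[name] = addr; return nil }
func (m memStore) Delete(name string) error       { delete(m, name); return nil }
```

Putting the nil check on the receiver means deploy and stop paths can call `Register` and `Deregister` unconditionally, which is one way to keep nodes without an `[mcns]` section working unchanged.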
#### 7. Metacrypt: Automated Cert Issuance Policy — DONE #### 7. Metacrypt: Automated Cert Issuance Policy — DONE
@@ -259,31 +259,29 @@ issuance:
`*.svc.mcp.metacircular.net` `*.svc.mcp.metacircular.net`
- One cert per hostname per service — no wildcard certs - One cert per hostname per service — no wildcard certs
#### 8. MCNS: Record Management API #### 8. MCNS: Record Management API — DONE
**Gap**: MCNS v1.0.0 has REST + gRPC APIs and SQLite storage, but MCNS provides full CRUD for DNS records via REST and gRPC:
records are currently seeded from migrations (static). The API supports - REST: POST/GET/PUT/DELETE on `/v1/zones/{zone}/records`
CRUD operations but MCP does not yet call it for dynamic registration. - gRPC: RecordService with ListRecords, CreateRecord, GetRecord,
UpdateRecord, DeleteRecord RPCs
**Work**: - SQLite-backed with transactional writes, CNAME exclusivity enforcement,
- MCP agent calls MCNS API to create/update/delete records on and automatic SOA serial bumping on mutations
deploy/stop. - Authorization: admin can manage any record, mcp-agent system account
- MCIAS auth scoping to allow MCP agent to manage can manage any record name, other system accounts scoped to own name
`*.svc.mcp.metacircular.net` records. - MCP agent uses the REST API to register/deregister records on
deploy/stop
**Depends on**: MCNS API exists. Remaining work is MCP integration
and auth scoping.
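The authorization rule listed for the MCNS record API (admin manages anything, the `mcp-agent` system account manages any record name, other system accounts only their own name) might look roughly like this in Go. The `Account` type and `CanManage` function are hypothetical. The sketch also assumes that "own name" means the record name equals the account name, compared case-insensitively, and that non-admin human accounts are denied, which the source does not state.

```go
package main

import "strings"

// Account is a hypothetical view of an authenticated MCIAS caller.
type Account struct {
	Name   string
	Admin  bool
	System bool
}

// agentAccount is the system account named in the design notes.
const agentAccount = "mcp-agent"

// CanManage reports whether the caller may create, update, or delete
// the given record name under the rule described above.
func CanManage(a Account, record string) bool {
	switch {
	case a.Admin:
		return true // admins can manage any record
	case a.System && a.Name == agentAccount:
		return true // the agent can manage any record name
	case a.System:
		// Other system accounts are scoped to their own name.
		return strings.EqualFold(record, a.Name)
	default:
		return false // assumption: everyone else is denied
	}
}
```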
#### 9. Application $PORT Convention — DONE #### 9. Application $PORT Convention — DONE
mcdsl v1.2.0 adds `$PORT` and `$PORT_GRPC` env var support: mcdsl v1.2.0 added `$PORT` and `$PORT_GRPC` env var support:
- `config.Load` checks `$PORT` → overrides `Server.ListenAddr` - `config.Load` checks `$PORT` → overrides `Server.ListenAddr`
- `config.Load` checks `$PORT_GRPC` → overrides `Server.GRPCAddr` - `config.Load` checks `$PORT_GRPC` → overrides `Server.GRPCAddr`
- Takes precedence over TOML and generic env overrides - Takes precedence over TOML and generic env overrides
(`$MCR_SERVER_LISTEN_ADDR`) — agent-assigned ports are authoritative (`$MCR_SERVER_LISTEN_ADDR`) — agent-assigned ports are authoritative
- Handles both `config.Base` embedding (MCR, MCNS, MCAT) and direct - Handles both `config.Base` embedding (MCR, MCNS, MCAT) and direct
`ServerConfig` embedding (Metacrypt) via struct tree walking `ServerConfig` embedding (Metacrypt) via struct tree walking
- All consuming services upgraded to mcdsl v1.2.0 - All consuming services on mcdsl v1.4.0
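The precedence order above (TOML, then the generic env override, then `$PORT`) can be illustrated with a small Go function. This is a sketch, not the mcdsl `config.Load` implementation: `resolveListenAddr` is a hypothetical helper, and keeping the host part while replacing only the port is an assumption about how `$PORT` is applied.

```go
package main

import "net"

// resolveListenAddr applies the documented precedence: the TOML value
// is the base, the generic env override (for example
// MCR_SERVER_LISTEN_ADDR) replaces it, and PORT wins over both.
// getenv is injected so the function can be exercised without
// touching the process environment; pass os.Getenv in real use.
func resolveListenAddr(tomlAddr, genericKey string, getenv func(string) string) string {
	addr := tomlAddr
	if v := getenv(genericKey); v != "" {
		addr = v
	}
	if port := getenv("PORT"); port != "" {
		// Assumption: keep the configured host and swap in the
		// agent-assigned port.
		host, _, err := net.SplitHostPort(addr)
		if err != nil {
			host = ""
		}
		addr = net.JoinHostPort(host, port)
	}
	return addr
}
```

The same shape would apply to `$PORT_GRPC` and `Server.GRPCAddr`.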
--- ---
@@ -306,26 +304,85 @@ Phase C — Automated TLS: ✓ COMPLETE
#4 Agent provisions certs ✓ DONE #4 Agent provisions certs ✓ DONE
(depends on #7) (depends on #7)
Phase D — DNS: Phase D — DNS: ✓ COMPLETE
#8 MCNS record management API #8 MCNS record management API ✓ DONE
#6 Agent registers DNS #6 Agent registers DNS ✓ DONE
(depends on #8) (depends on #8)
Phase E — Multi-node agent management:
#10 Agent binary at /srv/mcp/mcp-agent on all nodes
#11 mcp agent upgrade (SSH-based cross-compiled push)
#12 Node provisioning tooling (Debian + NixOS)
(depends on #10)
``` ```
**Phases A, B, and C are complete.** Services can be deployed with **Phases A, B, C, and D are complete.** Services can be deployed with
agent-assigned ports, `$PORT` env vars, automatic mc-proxy route agent-assigned ports, `$PORT` env vars, automatic mc-proxy route
registration, and automated TLS cert provisioning from Metacrypt CA. registration, automated TLS cert provisioning from Metacrypt CA, and
No more manual port picking, mcproxyctl, TOML editing, or cert generation. automatic DNS registration in MCNS. No more manual port picking,
mcproxyctl, TOML editing, cert generation, or DNS zone editing.
The only remaining manual step is DNS registration (Phase D).
### Immediate Next Steps ### Immediate Next Steps
1. **Phase D: DNS** — MCNS record management API integration, then 1. **Phase E: Multi-node agent management** — see below.
agent registers DNS records during deploy.
2. **mcdoc implementation** — fully designed, no platform evolution 2. **mcdoc implementation** — fully designed, no platform evolution
dependency. Deployable now with the new route system. dependency. Deployable now with the new route system.
#### 10. Agent Binary Location Convention
**Gap**: The agent binary is currently NixOS-managed on rift (lives in
`/nix/store/`, systemd `ExecStart` points there). This doesn't work for
Debian nodes and requires a full `nixos-rebuild` for every MCP release.
**Work**:
- Standardize agent binary at `/srv/mcp/mcp-agent` on all nodes.
- NixOS config: change `ExecStart` from nix store path to
`/srv/mcp/mcp-agent`. NixOS still owns user, systemd unit, podman,
directories — just not the binary version.
- Debian nodes: same layout, provisioned by setup script.
#### 11. Agent Upgrade via SSH Push
**Gap**: Updating the agent requires manual, OS-specific steps. On
NixOS: update flake lock, commit, push, rebuild. On Debian: build, scp,
restart. With multiple nodes and architectures (amd64 + arm64), this
doesn't scale.
**Work**:
- `mcp agent upgrade [node]` CLI command.
- Cross-compiles agent for each target arch (`GOARCH` from node config).
- Uses `golang.org/x/crypto/ssh` to push the binary and restart the
service. No external tool dependencies.
- Node config gains `ssh` (hostname) and `arch` (GOARCH) fields.
- Upgrades all nodes by default to prevent version skew: when the CLI
calls an RPC that an older agent does not implement, the call fails with
`Unimplemented`.
**Depends on**: #10 (binary location convention).
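As a rough illustration of the cross-compilation step planned for `mcp agent upgrade`, the Go sketch below derives the set of architectures to build from the node list, so each arch is compiled once however many nodes share it. `Node`, `buildTargets`, and `buildEnv` are hypothetical names. `GOOS` and `GOARCH` are the standard Go toolchain variables, and targeting `linux` is an assumption based on the fleet table.

```go
package main

import "sort"

// Node mirrors the proposed node config fields (name, ssh, arch).
type Node struct {
	Name string
	SSH  string
	Arch string // a GOARCH value such as "amd64" or "arm64"
}

// buildTargets returns the distinct architectures in the fleet,
// sorted for stable output.
func buildTargets(nodes []Node) []string {
	seen := make(map[string]bool)
	var archs []string
	for _, n := range nodes {
		if n.Arch != "" && !seen[n.Arch] {
			seen[n.Arch] = true
			archs = append(archs, n.Arch)
		}
	}
	sort.Strings(archs)
	return archs
}

// buildEnv returns the extra environment for one cross-compile,
// suitable for appending to an exec.Cmd's Env before "go build".
func buildEnv(arch string) []string {
	return []string{"GOOS=linux", "GOARCH=" + arch}
}
```

With the current fleet (rift and svc on amd64, hyperborea on arm64) this yields two builds for three nodes.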
#### 12. Node Provisioning Tooling
**Gap**: Setting up a new node requires manual steps: create user,
create directories, install podman, write config, create systemd unit.
Different for NixOS vs Debian.
**Work**:
- Go-based provisioning tool (part of MCP CLI) or standalone script.
- `mcp node provision <name>` SSHs to the node and runs setup:
create `mcp` user with podman access, create `/srv/mcp/`, write
systemd unit, install initial binary, start service.
- For NixOS, provisioning remains in the NixOS config (declarative).
The provisioning tool targets Debian/generic Linux.
**Depends on**: #10 (binary location convention), #11 (SSH infra).
**Current fleet**:
| Node | OS | Arch | Status |
|------|----|------|--------|
| rift | NixOS | amd64 | Operational, single MCP agent |
| hyperborea | Debian (RPi) | arm64 | Online, needs agent provisioning |
| svc | Debian | amd64 | Runs MCIAS, needs agent for public edge services |
--- ---
## Open Questions ## Open Questions


@@ -1,6 +1,6 @@
# Metacircular Platform Status # Metacircular Platform Status
Last updated: 2026-03-27 Last updated: 2026-03-28
## Platform Overview ## Platform Overview
@@ -8,28 +8,30 @@ One node operational (**rift**), running core infrastructure services as
containers fronted by MC-Proxy. MCIAS runs separately (not on rift). containers fronted by MC-Proxy. MCIAS runs separately (not on rift).
Bootstrap phases 0–4 complete (MCIAS, Metacrypt, MC-Proxy, MCR all Bootstrap phases 0–4 complete (MCIAS, Metacrypt, MC-Proxy, MCR all
operational). MCP is deployed and managing all platform containers. MCNS is operational). MCP is deployed and managing all platform containers. MCNS is
deployed on rift, serving authoritative DNS. deployed on rift, serving authoritative DNS. Platform evolution Phases A–D
complete (automated port assignment, route registration, TLS cert
provisioning, and DNS registration). Multi-node deployment is being planned
(Phase E).
## Service Status ## Service Status
| Service | Version | SDLC Phase | Deployed | Node | | Service | Version | SDLC Phase | Deployed | Node |
|---------|---------|------------|----------|------| |---------|---------|------------|----------|------|
| MCIAS | v1.8.0 | Maintenance | Yes | (separate) | | MCIAS | v1.9.0 | Maintenance | Yes | (separate) |
| Metacrypt | v1.1.0 | Production | Yes | rift | | Metacrypt | v1.3.1 | Production | Yes | rift |
| MC-Proxy | v1.2.1 | Maintenance | Yes | rift | | MC-Proxy | v1.2.1 | Maintenance | Yes | rift |
| MCR | v1.2.0 | Production | Yes | rift | | MCR | v1.2.1 | Production | Yes | rift |
| MCAT | v1.1.0 | Complete | Unknown | — | | MCAT | v1.1.1 | Complete | Unknown | — |
| MCDSL | v1.2.0 | Stable | N/A (library) | — | | MCDSL | v1.4.0 | Stable | N/A (library) | — |
| MCNS | v1.1.0 | Production | Yes | rift | | MCNS | v1.1.1 | Production | Yes | rift |
| MCDoc | v0.1.0 | Production | Yes | rift | | MCDoc | v0.1.0 | Production | Yes | rift |
| MCP | v0.4.0 | Production | Yes | rift | | MCP | v0.7.6 | Production | Yes | rift |
| MCDeploy | v0.2.0 | Active dev | N/A (CLI tool) | — |
## Service Details ## Service Details
### MCIAS — Identity and Access Service ### MCIAS — Identity and Access Service
- **Version:** v1.8.0 (client library: clients/go/v0.2.0) - **Version:** v1.9.0 (client library: clients/go/v0.2.0)
- **Phase:** Maintenance. Phases 0-14 complete. Feature-complete with active - **Phase:** Maintenance. Phases 0-14 complete. Feature-complete with active
refinement. refinement.
- **Deployment:** Running in production. All other services authenticate - **Deployment:** Running in production. All other services authenticate
@@ -41,7 +43,7 @@ deployed on rift, serving authoritative DNS.
### Metacrypt — Cryptographic Service Engine ### Metacrypt — Cryptographic Service Engine
- **Version:** v1.1.0. - **Version:** v1.3.1.
- **Phase:** Production. All four engine types implemented (CA, SSH CA, transit, - **Phase:** Production. All four engine types implemented (CA, SSH CA, transit,
user-to-user). Active work on integration test coverage. user-to-user). Active work on integration test coverage.
- **Deployment:** Running on rift as a container, fronted by MC-Proxy on - **Deployment:** Running on rift as a container, fronted by MC-Proxy on
@@ -56,7 +58,8 @@ deployed on rift, serving authoritative DNS.
- **Version:** v1.2.1. - **Version:** v1.2.1.
- **Phase:** Maintenance. Stable and actively routing traffic on rift. - **Phase:** Maintenance. Stable and actively routing traffic on rift.
- **Deployment:** Running on rift. Fronts Metacrypt, MCR, and sgard on ports - **Deployment:** Running on rift. Fronts Metacrypt, MCR, and sgard on ports
443, 8443, and 9443. Prometheus metrics on 127.0.0.1:9091. 443, 8443, and 9443. Prometheus metrics on 127.0.0.1:9091. Routes persisted
in SQLite and managed via gRPC API.
- **Recent work:** Route persistence (SQLite), idempotent AddRoute (upsert), - **Recent work:** Route persistence (SQLite), idempotent AddRoute (upsert),
golangci-lint v2 compliance, module path migration to mc/ org. golangci-lint v2 compliance, module path migration to mc/ org.
- **Artifacts:** systemd units (service + backup timer), Docker Compose - **Artifacts:** systemd units (service + backup timer), Docker Compose
@@ -64,7 +67,7 @@ deployed on rift, serving authoritative DNS.
### MCR — Container Registry ### MCR — Container Registry
- **Version:** v1.2.0. All implementation phases complete. - **Version:** v1.2.1. All implementation phases complete.
- **Phase:** Production. Deployed on rift, serving container images. - **Phase:** Production. Deployed on rift, serving container images.
- **Deployment:** Running on rift as two containers (mcr API + mcr-web), - **Deployment:** Running on rift as two containers (mcr API + mcr-web),
fronted by MC-Proxy on ports 443 (web, L7), 8443 (API, L4), and fronted by MC-Proxy on ports 443 (web, L7), 8443 (API, L4), and
@@ -77,7 +80,7 @@ deployed on rift, serving authoritative DNS.
### MCAT — Login Policy Tester ### MCAT — Login Policy Tester
- **Version:** v1.1.0. - **Version:** v1.1.1.
- **Phase:** Complete. Diagnostic tool, not core infrastructure. - **Phase:** Complete. Diagnostic tool, not core infrastructure.
- **Deployment:** Available for ad-hoc use. Lightweight tool for testing - **Deployment:** Available for ad-hoc use. Lightweight tool for testing
MCIAS login policy rules. MCIAS login policy rules.
@@ -86,20 +89,21 @@ deployed on rift, serving authoritative DNS.
### MCDSL — Standard Library ### MCDSL — Standard Library
- **Version:** v1.2.0. - **Version:** v1.4.0.
- **Phase:** Stable. All 9 packages implemented and tested. Being adopted - **Phase:** Stable. All 9 packages implemented and tested. Being adopted
across the platform. across the platform.
- **Deployment:** N/A (Go library, imported by other services). - **Deployment:** N/A (Go library, imported by other services).
- **Packages:** auth, db, config, httpserver, grpcserver, csrf, web, health, - **Packages:** auth, db, config, httpserver, grpcserver, csrf, web, health,
archive. archive.
- **Adoption:** All services except mcias on v1.2.0. mcias pending. - **Adoption:** All services except mcias on v1.4.0. mcias pending.
### MCNS — Networking Service ### MCNS — Networking Service
- **Version:** v1.1.0. - **Version:** v1.1.1.
- **Phase:** Production. Custom Go DNS server replacing CoreDNS precursor. - **Phase:** Production. Custom Go DNS server replacing CoreDNS precursor.
- **Deployment:** Running on rift as a container managed by MCP. Serves two - **Deployment:** Running on rift as a container managed by MCP. Serves two
authoritative zones plus upstream forwarding. authoritative zones plus upstream forwarding. REST + gRPC APIs with MCIAS
auth and name-scoped system account authorization.
- **Recent work:** v1.0.0 implementation (custom Go DNS server), engineering - **Recent work:** v1.0.0 implementation (custom Go DNS server), engineering
review, deployed to rift replacing CoreDNS. review, deployed to rift replacing CoreDNS.
- **Artifacts:** Dockerfile, Docker Compose (rift), MCP service definition, - **Artifacts:** Dockerfile, Docker Compose (rift), MCP service definition,
@@ -117,34 +121,24 @@ deployed on rift, serving authoritative DNS.
### MCP — Control Plane ### MCP — Control Plane
- **Version:** v0.4.0. - **Version:** v0.7.6.
- **Phase:** Production. Phases 0-4 complete. Phase C (automated TLS cert - **Phase:** Production. Phases A–D complete. Deployed to rift, managing all
provisioning) implemented. Deployed to rift, managing all platform containers. platform containers.
- **Deployment:** Running on rift. Agent as systemd service under `mcp` user - **Deployment:** Running on rift. Agent as systemd service under `mcp` user
with rootless podman. Manages metacrypt, mc-proxy, mcr, and mcns containers. with rootless podman. Manages metacrypt, mc-proxy, mcr, mcns, and mcdoc
containers.
- **Architecture:** Two components — `mcp` CLI (thin client on vade) and - **Architecture:** Two components — `mcp` CLI (thin client on vade) and
`mcp-agent` (per-node daemon with SQLite registry, podman management, `mcp-agent` (per-node daemon with SQLite registry, podman management,
monitoring with drift/flap detection, route registration with mc-proxy during monitoring with drift/flap detection, route registration with mc-proxy,
deploy/stop, automated TLS cert provisioning for L7 routes via Metacrypt CA). automated TLS cert provisioning for L7 routes via Metacrypt CA, automated
gRPC-only (no REST). DNS registration in MCNS). gRPC-only (no REST). 15 RPCs, 17+ CLI commands.
- **Recent work:** Full v1 implementation (12 RPCs, 15 CLI commands), - **Recent work:** Phase C (automated TLS cert provisioning), Phase D
deployment to rift, container migration from kyle→mcp user, service (automated DNS registration via MCNS), undeploy command, logs command,
definition authoring. Phase C automated TLS cert provisioning for L7 routes, edit command, auto-login to MCR, system account auth model, module path
mc-proxy route registration during deploy, mc-proxy dependency updated to migration.
v1.2.0, module path migration.
- **Artifacts:** systemd service (NixOS), TLS cert from Metacrypt, service - **Artifacts:** systemd service (NixOS), TLS cert from Metacrypt, service
definition files, design docs. definition files, design docs.
### MCDeploy — Deployment CLI
- **Version:** v0.2.0.
- **Phase:** Active development. Tactical bridge tool for deploying services
while MCP is being built.
- **Deployment:** N/A (local CLI tool, not a server).
- **Recent work:** Initial implementation, Nix flake.
- **Description:** Single-binary CLI that shells out to podman/ssh/scp/git
for build, push, deploy, cert renewal, and status. TOML-configured.
## Node Inventory ## Node Inventory
| Node | Address (LAN) | Address (Tailscale) | Role | | Node | Address (LAN) | Address (Tailscale) | Role |
@@ -153,10 +147,14 @@ deployed on rift, serving authoritative DNS.
## Rift Port Map ## Rift Port Map
Note: Services deployed via MCP receive dynamically assigned host ports
(10000–60000). The ports below are for infrastructure services with static
assignments or well-known ports.
| Port | Protocol | Services | | Port | Protocol | Services |
|------|----------|----------| |------|----------|----------|
| 53 | DNS (LAN + Tailscale) | mcns | | 53 | DNS (LAN + Tailscale) | mcns |
| 443 | L7 (TLS termination) | metacrypt-web, mcr-web | | 443 | L7 (TLS termination) | metacrypt-web, mcr-web, mcdoc |
| 8080 | HTTP (all interfaces) | exod | | 8080 | HTTP (all interfaces) | exod |
| 8443 | L4 (SNI passthrough) | metacrypt API, mcr API | | 8443 | L4 (SNI passthrough) | metacrypt API, mcr API |
| 9090 | HTTP (all interfaces) | exod | | 9090 | HTTP (all interfaces) | exod |


@@ -608,6 +608,74 @@ Services follow a standard directory structure:
--- ---
## 10. Agent Management
MCP manages a fleet of nodes with heterogeneous operating systems and
architectures. The agent binary lives at `/srv/mcp/mcp-agent` on every
node — this is a mutable path that MCP controls, regardless of whether
the node runs NixOS or Debian.
### Node Configuration
Each node in `~/.config/mcp/mcp.toml` includes SSH and architecture
info for agent management:
```toml
[[nodes]]
name = "rift"
address = "100.95.252.120:9444"
ssh = "rift"
arch = "amd64"
[[nodes]]
name = "hyperborea"
address = "100.x.x.x:9444"
ssh = "hyperborea"
arch = "arm64"
```
### Upgrading Agents
After tagging a new MCP release:
```bash
# Upgrade all nodes (recommended — prevents version skew)
mcp agent upgrade
# Upgrade a single node
mcp agent upgrade rift
# Check versions across the fleet
mcp agent status
```
`mcp agent upgrade` cross-compiles the agent binary for each target
architecture, SSHs to each node, atomically replaces the binary, and
restarts the systemd service. All nodes should be upgraded together
because new CLI versions often depend on new agent RPCs.
### Provisioning New Nodes
One-time setup for a new Debian node:
```bash
# 1. Provision the node (creates user, dirs, systemd unit, installs binary)
mcp node provision <name>
# 2. Register the node
mcp node add <name> <address>
# 3. Deploy services
mcp deploy <service>
```
For NixOS nodes, provisioning is handled by the NixOS configuration.
The NixOS config creates the `mcp` user, systemd unit, and directories.
The `ExecStart` path points to `/srv/mcp/mcp-agent` so that `mcp agent
upgrade` works the same as on Debian nodes.
---
## Appendix: Currently Deployed Services ## Appendix: Currently Deployed Services
For reference, these services are operational on the platform: For reference, these services are operational on the platform: