Sync platform docs: Phases C+D complete, Phase E planned #5
@@ -5,7 +5,7 @@ from its current manually-wired state to fully declarative deployment.
It is a living design document — not a spec, not a commitment, but a
record of where we are, where we want to be, and what's between.

Last updated: 2026-03-27 (Phases A + B + C complete)
Last updated: 2026-03-28 (Phases A + B + C + D complete)

---

@@ -239,16 +239,16 @@ mc-proxy routes are fully persisted in SQLite and survive restarts:
bootstrap before MCP is operational. The gRPC API and mcproxyctl
are the primary route management interfaces going forward.

#### 6. MCP Agent: DNS Registration
#### 6. MCP Agent: DNS Registration — DONE

**Gap**: DNS records are manually configured in MCNS zone files.

**Work**:
- Agent creates/updates A records in MCNS for
`<service>.svc.mcp.metacircular.net`.
- Agent removes records on service teardown.

**Depends on**: MCNS record management API (#8).
Agent automatically manages DNS records during deploy and stop:
- Deploy: calls MCNS API to create/update A records for
`<service>.svc.mcp.metacircular.net` pointing to the node's address.
- Stop/undeploy: removes DNS records before stopping containers.
- Config: `[mcns]` section in agent config with server URL, CA cert,
token path, zone, and node address.
- Nil-safe: if MCNS not configured, silently skipped (backward compatible).
- Authorization: mcp-agent system account can manage any record name.
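
The nil-safe behaviour listed for DNS registration can be sketched in Go roughly as follows. This is a minimal illustration under stated assumptions, not the agent's actual code: `MCNSConfig`, `Registrar`, `Record`, and `PlanRegister` are hypothetical names, the zone value and node address are made up for the example, and the real MCNS API call is left out.

```go
package main

import "fmt"

// MCNSConfig loosely mirrors the [mcns] fields listed in the doc.
// The field names are assumptions, not the agent's real config struct.
type MCNSConfig struct {
	ServerURL string
	CACert    string
	TokenPath string
	Zone      string
	NodeAddr  string
}

// Record is a hypothetical A record the agent would ask MCNS to upsert.
type Record struct {
	Name    string
	Type    string
	Address string
}

// Registrar is left nil when no [mcns] section is configured.
type Registrar struct {
	cfg MCNSConfig
}

// PlanRegister returns the record to create for a service. On a nil
// Registrar it reports ok=false, so a deploy proceeds without DNS.
func (r *Registrar) PlanRegister(service string) (rec Record, ok bool) {
	if r == nil {
		return Record{}, false
	}
	return Record{
		Name:    fmt.Sprintf("%s.%s", service, r.cfg.Zone),
		Type:    "A",
		Address: r.cfg.NodeAddr,
	}, true
}

func main() {
	var none *Registrar // MCNS not configured
	_, ok := none.PlanRegister("mcr")
	fmt.Println("configured:", ok)

	r := &Registrar{cfg: MCNSConfig{Zone: "svc.mcp.metacircular.net", NodeAddr: "192.0.2.10"}}
	rec, _ := r.PlanRegister("mcr")
	fmt.Println(rec.Name, rec.Type, rec.Address)
}
```

Putting the nil check on the method receiver keeps deploy and stop call sites free of `if mcns != nil` branches, which is one way to get the backward-compatible skip described above.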

#### 7. Metacrypt: Automated Cert Issuance Policy — DONE

@@ -259,31 +259,29 @@ issuance:
`*.svc.mcp.metacircular.net`
- One cert per hostname per service — no wildcard certs

#### 8. MCNS: Record Management API
#### 8. MCNS: Record Management API — DONE

**Gap**: MCNS v1.0.0 has REST + gRPC APIs and SQLite storage, but
records are currently seeded from migrations (static). The API supports
CRUD operations but MCP does not yet call it for dynamic registration.

**Work**:
- MCP agent calls MCNS API to create/update/delete records on
deploy/stop.
- MCIAS auth scoping to allow MCP agent to manage
`*.svc.mcp.metacircular.net` records.

**Depends on**: MCNS API exists. Remaining work is MCP integration
and auth scoping.
MCNS provides full CRUD for DNS records via REST and gRPC:
- REST: POST/GET/PUT/DELETE on `/v1/zones/{zone}/records`
- gRPC: RecordService with ListRecords, CreateRecord, GetRecord,
UpdateRecord, DeleteRecord RPCs
- SQLite-backed with transactional writes, CNAME exclusivity enforcement,
and automatic SOA serial bumping on mutations
- Authorization: admin can manage any record, mcp-agent system account
can manage any record name, other system accounts scoped to own name
- MCP agent uses the REST API to register/deregister records on
deploy/stop
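
To make "CNAME exclusivity enforcement" concrete, here is a small Go sketch of the general DNS rule: a name that carries a CNAME may not carry any other record, including a second CNAME. It assumes nothing about how MCNS implements the check; `Record`, `ErrCNAMEConflict`, and `CheckCNAMEExclusive` are hypothetical names, and the example hostnames are invented.

```go
package main

import (
	"errors"
	"fmt"
)

// Record is a hypothetical, stripped-down DNS record (name and type only).
type Record struct {
	Name string
	Type string
}

// ErrCNAMEConflict is returned when a CNAME would share a name with
// any other record.
var ErrCNAMEConflict = errors.New("CNAME cannot coexist with other records at the same name")

// CheckCNAMEExclusive reports whether add may be stored next to the
// existing records. A conflict exists when a record at the same name
// is a CNAME, or when add itself is a CNAME and the name is in use.
func CheckCNAMEExclusive(existing []Record, add Record) error {
	for _, r := range existing {
		if r.Name != add.Name {
			continue
		}
		if r.Type == "CNAME" || add.Type == "CNAME" {
			return ErrCNAMEConflict
		}
	}
	return nil
}

func main() {
	zone := []Record{{Name: "mcr.svc.mcp.metacircular.net", Type: "A"}}

	err := CheckCNAMEExclusive(zone, Record{Name: "mcr.svc.mcp.metacircular.net", Type: "CNAME"})
	fmt.Println("add CNAME next to A:", err)

	err = CheckCNAMEExclusive(zone, Record{Name: "mcr.svc.mcp.metacircular.net", Type: "A"})
	fmt.Println("add second A:", err)
}
```

In a SQLite-backed store, a check like this would normally run inside the same transaction as the write, so that two concurrent mutations cannot both pass it.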

#### 9. Application $PORT Convention — DONE

mcdsl v1.2.0 adds `$PORT` and `$PORT_GRPC` env var support:
mcdsl v1.2.0 added `$PORT` and `$PORT_GRPC` env var support:
- `config.Load` checks `$PORT` → overrides `Server.ListenAddr`
- `config.Load` checks `$PORT_GRPC` → overrides `Server.GRPCAddr`
- Takes precedence over TOML and generic env overrides
(`$MCR_SERVER_LISTEN_ADDR`) — agent-assigned ports are authoritative
- Handles both `config.Base` embedding (MCR, MCNS, MCAT) and direct
`ServerConfig` embedding (Metacrypt) via struct tree walking
- All consuming services upgraded to mcdsl v1.2.0
- All consuming services on mcdsl v1.4.0
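
The precedence order above (TOML, then the generic env override, then `$PORT`) can be sketched in Go as follows. This is not mcdsl's `config.Load`: `ResolveListenAddr` is a hypothetical helper, and it assumes that `$PORT` replaces only the port while keeping whatever host the lower-precedence sources supplied.

```go
package main

import (
	"fmt"
	"net"
)

// ResolveListenAddr applies the three sources in increasing precedence:
// the TOML value, the generic <PREFIX>_SERVER_LISTEN_ADDR override, and
// finally $PORT. getenv is injected so the logic is easy to test.
func ResolveListenAddr(tomlAddr, envPrefix string, getenv func(string) string) string {
	addr := tomlAddr
	if v := getenv(envPrefix + "_SERVER_LISTEN_ADDR"); v != "" {
		addr = v
	}
	if port := getenv("PORT"); port != "" {
		// Assumption: keep the configured host and swap in the
		// agent-assigned port; fall back to all interfaces if the
		// configured address cannot be parsed.
		host, _, err := net.SplitHostPort(addr)
		if err != nil {
			host = ""
		}
		addr = net.JoinHostPort(host, port)
	}
	return addr
}

func main() {
	env := map[string]string{
		"MCR_SERVER_LISTEN_ADDR": "0.0.0.0:9000",
		"PORT":                   "10443",
	}
	lookup := func(k string) string { return env[k] }
	fmt.Println(ResolveListenAddr(":8443", "MCR", lookup))
}
```

With both variables set, the agent-assigned port wins and the result is `0.0.0.0:10443`; the same shape would apply to `$PORT_GRPC` and `Server.GRPCAddr`.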

---

@@ -306,26 +304,85 @@ Phase C — Automated TLS: ✓ COMPLETE
#4 Agent provisions certs ✓ DONE
(depends on #7)

Phase D — DNS:
#8 MCNS record management API
#6 Agent registers DNS
Phase D — DNS: ✓ COMPLETE
#8 MCNS record management API ✓ DONE
#6 Agent registers DNS ✓ DONE
(depends on #8)

Phase E — Multi-node agent management:
#10 Agent binary at /srv/mcp/mcp-agent on all nodes
#11 mcp agent upgrade (SSH-based cross-compiled push)
#12 Node provisioning tooling (Debian + NixOS)
(depends on #10)
```

**Phases A, B, and C are complete.** Services can be deployed with
**Phases A, B, C, and D are complete.** Services can be deployed with
agent-assigned ports, `$PORT` env vars, automatic mc-proxy route
registration, and automated TLS cert provisioning from Metacrypt CA.
No more manual port picking, mcproxyctl, TOML editing, or cert generation.

The only remaining manual step is DNS registration (Phase D).
registration, automated TLS cert provisioning from Metacrypt CA, and
automatic DNS registration in MCNS. No more manual port picking,
mcproxyctl, TOML editing, cert generation, or DNS zone editing.

### Immediate Next Steps

1. **Phase D: DNS** — MCNS record management API integration, then
agent registers DNS records during deploy.
1. **Phase E: Multi-node agent management** — see below.
2. **mcdoc implementation** — fully designed, no platform evolution
dependency. Deployable now with the new route system.

#### 10. Agent Binary Location Convention

**Gap**: The agent binary is currently NixOS-managed on rift (lives in
`/nix/store/`, systemd `ExecStart` points there). This doesn't work for
Debian nodes and requires a full `nixos-rebuild` for every MCP release.

**Work**:
- Standardize agent binary at `/srv/mcp/mcp-agent` on all nodes.
- NixOS config: change `ExecStart` from nix store path to
`/srv/mcp/mcp-agent`. NixOS still owns user, systemd unit, podman,
directories — just not the binary version.
- Debian nodes: same layout, provisioned by setup script.

#### 11. Agent Upgrade via SSH Push

**Gap**: Updating the agent requires manual, OS-specific steps. On
NixOS: update flake lock, commit, push, rebuild. On Debian: build, scp,
restart. With multiple nodes and architectures (amd64 + arm64), this
doesn't scale.

**Work**:
- `mcp agent upgrade [node]` CLI command.
- Cross-compiles agent for each target arch (`GOARCH` from node config).
- Uses `golang.org/x/crypto/ssh` to push the binary and restart the
service. No external tool dependencies.
- Node config gains `ssh` (hostname) and `arch` (GOARCH) fields.
- Upgrades all nodes by default to prevent version skew. New RPCs cause
`Unimplemented` errors if agent and CLI are out of sync.

**Depends on**: #10 (binary location convention).
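
The per-architecture build step in item 11 could look roughly like this in Go. It is a sketch only: `TargetEnv` and `CrossCompileCmd` are hypothetical helpers, the package path `./cmd/mcp-agent` and the output paths are assumptions, and `GOOS=linux` is inferred from the Debian and NixOS fleet. The command is built but not executed here.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// TargetEnv returns the Go toolchain variables for one node. arch is
// the node's GOARCH value from the node config (e.g. "amd64", "arm64").
func TargetEnv(arch string) []string {
	return []string{"GOOS=linux", "GOARCH=" + arch}
}

// CrossCompileCmd prepares (but does not run) a `go build` for the
// agent. The package path is an assumption about the repo layout.
func CrossCompileCmd(arch, outPath string) *exec.Cmd {
	cmd := exec.Command("go", "build", "-o", outPath, "./cmd/mcp-agent")
	// Later entries win when a key appears twice in Env, so the target
	// settings override any GOOS/GOARCH already set in the caller's
	// environment.
	cmd.Env = append(os.Environ(), TargetEnv(arch)...)
	return cmd
}

func main() {
	for _, arch := range []string{"amd64", "arm64"} {
		cmd := CrossCompileCmd(arch, "build/mcp-agent-"+arch)
		fmt.Println(arch, cmd.Args)
	}
}
```

Building once per distinct `arch` value, rather than once per node, avoids redundant compiles when several nodes share an architecture.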

#### 12. Node Provisioning Tooling

**Gap**: Setting up a new node requires manual steps: create user,
create directories, install podman, write config, create systemd unit.
Different for NixOS vs Debian.

**Work**:
- Go-based provisioning tool (part of MCP CLI) or standalone script.
- `mcp node provision <name>` SSHs to the node and runs setup:
create `mcp` user with podman access, create `/srv/mcp/`, write
systemd unit, install initial binary, start service.
- For NixOS, provisioning remains in the NixOS config (declarative).
The provisioning tool targets Debian/generic Linux.

**Depends on**: #10 (binary location convention), #11 (SSH infra).

**Current fleet**:

| Node | OS | Arch | Status |
|------|----|------|--------|
| rift | NixOS | amd64 | Operational, single MCP agent |
| hyperborea | Debian (RPi) | arm64 | Online, needs agent provisioning |
| svc | Debian | amd64 | Runs MCIAS, needs agent for public edge services |

---

## Open Questions

82
STATUS.md
@@ -1,6 +1,6 @@
# Metacircular Platform Status

Last updated: 2026-03-27
Last updated: 2026-03-28

## Platform Overview

@@ -8,28 +8,30 @@ One node operational (**rift**), running core infrastructure services as
containers fronted by MC-Proxy. MCIAS runs separately (not on rift).
Bootstrap phases 0–4 complete (MCIAS, Metacrypt, MC-Proxy, MCR all
operational). MCP is deployed and managing all platform containers. MCNS is
deployed on rift, serving authoritative DNS.
deployed on rift, serving authoritative DNS. Platform evolution Phases A–D
complete (automated port assignment, route registration, TLS cert
provisioning, and DNS registration). Multi-node deployment is being planned
(Phase E).

## Service Status

| Service | Version | SDLC Phase | Deployed | Node |
|---------|---------|------------|----------|------|
| MCIAS | v1.8.0 | Maintenance | Yes | (separate) |
| Metacrypt | v1.1.0 | Production | Yes | rift |
| MCIAS | v1.9.0 | Maintenance | Yes | (separate) |
| Metacrypt | v1.3.1 | Production | Yes | rift |
| MC-Proxy | v1.2.1 | Maintenance | Yes | rift |
| MCR | v1.2.0 | Production | Yes | rift |
| MCAT | v1.1.0 | Complete | Unknown | — |
| MCDSL | v1.2.0 | Stable | N/A (library) | — |
| MCNS | v1.1.0 | Production | Yes | rift |
| MCR | v1.2.1 | Production | Yes | rift |
| MCAT | v1.1.1 | Complete | Unknown | — |
| MCDSL | v1.4.0 | Stable | N/A (library) | — |
| MCNS | v1.1.1 | Production | Yes | rift |
| MCDoc | v0.1.0 | Production | Yes | rift |
| MCP | v0.4.0 | Production | Yes | rift |
| MCDeploy | v0.2.0 | Active dev | N/A (CLI tool) | — |
| MCP | v0.7.6 | Production | Yes | rift |

## Service Details

### MCIAS — Identity and Access Service

- **Version:** v1.8.0 (client library: clients/go/v0.2.0)
- **Version:** v1.9.0 (client library: clients/go/v0.2.0)
- **Phase:** Maintenance. Phases 0-14 complete. Feature-complete with active
refinement.
- **Deployment:** Running in production. All other services authenticate
@@ -41,7 +43,7 @@ deployed on rift, serving authoritative DNS.

### Metacrypt — Cryptographic Service Engine

- **Version:** v1.1.0.
- **Version:** v1.3.1.
- **Phase:** Production. All four engine types implemented (CA, SSH CA, transit,
user-to-user). Active work on integration test coverage.
- **Deployment:** Running on rift as a container, fronted by MC-Proxy on
@@ -56,7 +58,8 @@ deployed on rift, serving authoritative DNS.
- **Version:** v1.2.1.
- **Phase:** Maintenance. Stable and actively routing traffic on rift.
- **Deployment:** Running on rift. Fronts Metacrypt, MCR, and sgard on ports
443, 8443, and 9443. Prometheus metrics on 127.0.0.1:9091.
443, 8443, and 9443. Prometheus metrics on 127.0.0.1:9091. Routes persisted
in SQLite and managed via gRPC API.
- **Recent work:** Route persistence (SQLite), idempotent AddRoute (upsert),
golangci-lint v2 compliance, module path migration to mc/ org.
- **Artifacts:** systemd units (service + backup timer), Docker Compose
@@ -64,7 +67,7 @@ deployed on rift, serving authoritative DNS.

### MCR — Container Registry

- **Version:** v1.2.0. All implementation phases complete.
- **Version:** v1.2.1. All implementation phases complete.
- **Phase:** Production. Deployed on rift, serving container images.
- **Deployment:** Running on rift as two containers (mcr API + mcr-web),
fronted by MC-Proxy on ports 443 (web, L7), 8443 (API, L4), and
@@ -77,7 +80,7 @@ deployed on rift, serving authoritative DNS.

### MCAT — Login Policy Tester

- **Version:** v1.1.0.
- **Version:** v1.1.1.
- **Phase:** Complete. Diagnostic tool, not core infrastructure.
- **Deployment:** Available for ad-hoc use. Lightweight tool for testing
MCIAS login policy rules.
@@ -86,20 +89,21 @@ deployed on rift, serving authoritative DNS.

### MCDSL — Standard Library

- **Version:** v1.2.0.
- **Version:** v1.4.0.
- **Phase:** Stable. All 9 packages implemented and tested. Being adopted
across the platform.
- **Deployment:** N/A (Go library, imported by other services).
- **Packages:** auth, db, config, httpserver, grpcserver, csrf, web, health,
archive.
- **Adoption:** All services except mcias on v1.2.0. mcias pending.
- **Adoption:** All services except mcias on v1.4.0. mcias pending.

### MCNS — Networking Service

- **Version:** v1.1.0.
- **Version:** v1.1.1.
- **Phase:** Production. Custom Go DNS server replacing CoreDNS precursor.
- **Deployment:** Running on rift as a container managed by MCP. Serves two
authoritative zones plus upstream forwarding.
authoritative zones plus upstream forwarding. REST + gRPC APIs with MCIAS
auth and name-scoped system account authorization.
- **Recent work:** v1.0.0 implementation (custom Go DNS server), engineering
review, deployed to rift replacing CoreDNS.
- **Artifacts:** Dockerfile, Docker Compose (rift), MCP service definition,
@@ -117,34 +121,24 @@ deployed on rift, serving authoritative DNS.

### MCP — Control Plane

- **Version:** v0.4.0.
- **Phase:** Production. Phases 0-4 complete. Phase C (automated TLS cert
provisioning) implemented. Deployed to rift, managing all platform containers.
- **Version:** v0.7.6.
- **Phase:** Production. Phases A–D complete. Deployed to rift, managing all
platform containers.
- **Deployment:** Running on rift. Agent as systemd service under `mcp` user
with rootless podman. Manages metacrypt, mc-proxy, mcr, and mcns containers.
with rootless podman. Manages metacrypt, mc-proxy, mcr, mcns, and mcdoc
containers.
- **Architecture:** Two components — `mcp` CLI (thin client on vade) and
`mcp-agent` (per-node daemon with SQLite registry, podman management,
monitoring with drift/flap detection, route registration with mc-proxy during
deploy/stop, automated TLS cert provisioning for L7 routes via Metacrypt CA).
gRPC-only (no REST).
- **Recent work:** Full v1 implementation (12 RPCs, 15 CLI commands),
deployment to rift, container migration from kyle→mcp user, service
definition authoring. Phase C automated TLS cert provisioning for L7 routes,
mc-proxy route registration during deploy, mc-proxy dependency updated to
v1.2.0, module path migration.
monitoring with drift/flap detection, route registration with mc-proxy,
automated TLS cert provisioning for L7 routes via Metacrypt CA, automated
DNS registration in MCNS). gRPC-only (no REST). 15 RPCs, 17+ CLI commands.
- **Recent work:** Phase C (automated TLS cert provisioning), Phase D
(automated DNS registration via MCNS), undeploy command, logs command,
edit command, auto-login to MCR, system account auth model, module path
migration.
- **Artifacts:** systemd service (NixOS), TLS cert from Metacrypt, service
definition files, design docs.

### MCDeploy — Deployment CLI

- **Version:** v0.2.0.
- **Phase:** Active development. Tactical bridge tool for deploying services
while MCP is being built.
- **Deployment:** N/A (local CLI tool, not a server).
- **Recent work:** Initial implementation, Nix flake.
- **Description:** Single-binary CLI that shells out to podman/ssh/scp/git
for build, push, deploy, cert renewal, and status. TOML-configured.

## Node Inventory

| Node | Address (LAN) | Address (Tailscale) | Role |
@@ -153,10 +147,14 @@ deployed on rift, serving authoritative DNS.

## Rift Port Map

Note: Services deployed via MCP receive dynamically assigned host ports
(10000–60000). The ports below are for infrastructure services with static
assignments or well-known ports.

| Port | Protocol | Services |
|------|----------|----------|
| 53 | DNS (LAN + Tailscale) | mcns |
| 443 | L7 (TLS termination) | metacrypt-web, mcr-web |
| 443 | L7 (TLS termination) | metacrypt-web, mcr-web, mcdoc |
| 8080 | HTTP (all interfaces) | exod |
| 8443 | L4 (SNI passthrough) | metacrypt API, mcr API |
| 9090 | HTTP (all interfaces) | exod |

@@ -608,6 +608,74 @@ Services follow a standard directory structure:

---

## 10. Agent Management

MCP manages a fleet of nodes with heterogeneous operating systems and
architectures. The agent binary lives at `/srv/mcp/mcp-agent` on every
node — this is a mutable path that MCP controls, regardless of whether
the node runs NixOS or Debian.

### Node Configuration

Each node in `~/.config/mcp/mcp.toml` includes SSH and architecture
info for agent management:

```toml
[[nodes]]
name = "rift"
address = "100.95.252.120:9444"
ssh = "rift"
arch = "amd64"

[[nodes]]
name = "hyperborea"
address = "100.x.x.x:9444"
ssh = "hyperborea"
arch = "arm64"
```

### Upgrading Agents

After tagging a new MCP release:

```bash
# Upgrade all nodes (recommended — prevents version skew)
mcp agent upgrade

# Upgrade a single node
mcp agent upgrade rift

# Check versions across the fleet
mcp agent status
```

`mcp agent upgrade` cross-compiles the agent binary for each target
architecture, SSHs to each node, atomically replaces the binary, and
restarts the systemd service. All nodes should be upgraded together
because new CLI versions often depend on new agent RPCs.
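
The "atomically replaces the binary" step is usually done by writing the new file next to the old one and renaming it into place, since a rename within one filesystem is atomic on POSIX systems. The Go sketch below shows that pattern on a local path only; it is not the MCP implementation, the SSH transfer and service restart are omitted, and `AtomicReplace` and `Demo` are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// AtomicReplace writes data to a temporary file in dest's directory and
// renames it over dest, so readers see either the old or the new file,
// never a partial write.
func AtomicReplace(dest string, data []byte, mode os.FileMode) error {
	tmp, err := os.CreateTemp(filepath.Dir(dest), ".mcp-agent-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // fails harmlessly once the rename succeeded

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Chmod(mode); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), dest)
}

// Demo replaces a stand-in "binary" in a scratch directory and returns
// the file's content afterwards.
func Demo() (string, error) {
	dir, err := os.MkdirTemp("", "agent-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	dest := filepath.Join(dir, "mcp-agent")
	if err := os.WriteFile(dest, []byte("old"), 0o755); err != nil {
		return "", err
	}
	if err := AtomicReplace(dest, []byte("new"), 0o755); err != nil {
		return "", err
	}
	b, err := os.ReadFile(dest)
	return string(b), err
}

func main() {
	content, err := Demo()
	fmt.Println(content, err)
}
```

Creating the temporary file in the destination directory matters: a rename across filesystems is not atomic and may fail outright. On Linux the running agent keeps executing its old inode until systemd restarts the service.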

### Provisioning New Nodes

One-time setup for a new Debian node:

```bash
# 1. Provision the node (creates user, dirs, systemd unit, installs binary)
mcp node provision <name>

# 2. Register the node
mcp node add <name> <address>

# 3. Deploy services
mcp deploy <service>
```

For NixOS nodes, provisioning is handled by the NixOS configuration.
The NixOS config creates the `mcp` user, systemd unit, and directories.
The `ExecStart` path points to `/srv/mcp/mcp-agent` so that `mcp agent
upgrade` works the same as on Debian nodes.

---

## Appendix: Currently Deployed Services

For reference, these services are operational on the platform:
