# MCP v2 -- Multi-Node Control Plane

## Overview

MCP v2 introduces multi-node orchestration with a master/agent topology.
The CLI no longer dials agents directly. A dedicated **mcp-master** daemon
coordinates deployments across nodes, handles cross-node concerns (edge
routing, certificate provisioning, DNS), and serves as the single control
point for the platform.

### Motivation

v1 deployed successfully on a single node (rift) but exposed operational
pain points as services needed public-facing routes through svc:

- **Manual edge routing**: Exposing mcq.metacircular.net required hand-editing
  mc-proxy's TOML config on svc, provisioning a TLS cert manually, updating
  the SQLite database when the config and database diverged, and debugging
  silent failures. Every redeployment risked breaking the public route.

- **Dynamic port instability**: The route system assigns ephemeral host ports
  that change on every deploy. svc's mc-proxy pointed at a specific port
  (e.g., `100.95.252.120:48080`), which went stale after redeployment.
  Container ports are also localhost-only under rootless podman, requiring
  explicit Tailscale IP bindings for external access.

- **$PORT env override conflict**: The mcdsl config loader overrides
  `listen_addr` from `$PORT` when routes are present. This meant containers
  ignored their configured port and listened on the route-allocated one
  instead, breaking explicit port mappings that expected the config port.

- **Cert chain issues**: mc-proxy requires full certificate chains (leaf +
  intermediates). Certs provisioned outside the standard metacrypt flow
  were leaf-only and caused silent TLS handshake failures (`client_bytes=7
  backend_bytes=0` with no error logged).

- **mc-proxy database divergence**: mc-proxy persists routes in SQLite.
  Routes added via the admin API override the TOML config. Editing the TOML
  alone had no effect until the database was manually updated -- a failure
  mode that took hours to diagnose.

- **No cross-node coordination**: The v1 CLI talks directly to individual
  agents. There is no mechanism for one agent to tell another "set up a
  route for this service." Every cross-node operation was manual.

v2 addresses all of these by making the master the single coordination
point for deployments, with agents handling local concerns (containers,
mc-proxy routes, cert provisioning) on instruction from the master.

### What Changes from v1

| Concern | v1 | v2 |
|---------|----|----|
| CLI target | CLI dials agents directly | CLI dials the master |
| Node awareness | CLI routes by `node` field in service defs | Master owns the node registry |
| Service placement | Explicit `node` required | `tier` field; master auto-places workers |
| Edge routing | Manual mc-proxy config on svc | Master coordinates edge setup |
| Cert provisioning | Agent provisions for local mc-proxy only | Edge agent provisions its own public certs |
| DNS registration | Agent registers records on deploy | Master coordinates DNS across zones |
| Auth model | Token validation only | Per-RPC role-based authorization |

### What Stays the Same

The agent's core responsibilities are unchanged: it manages containers via
podman, stores its local registry in SQLite, monitors for drift, and alerts
the operator. The agent gains new RPCs for edge routing and health reporting
but does not become aware of other nodes -- the master handles all
cross-node coordination. Agents never communicate with each other.

---

## Topology

```
Operator workstation (vade)
  ┌──────────────────────────┐
  │  mcp (CLI)               │
  │                          │
  │  gRPC ───────────────────┼─── Tailnet ──┐
  └──────────────────────────┘              │
                                            ▼
Master + worker node (rift)
  ┌──────────────────────────────────────────────────────┐
  │  mcp-master                                          │
  │    ├── node registry (agents self-register)          │
  │    ├── service placement (tier-aware)                │
  │    ├── edge routing coordinator                      │
  │    └── SQLite state (edge routes, placements)        │
  │                                                      │
  │  mcp-agent                                           │
  │    ├── mcias container                               │
  │    ├── mcns container                                │
  │    ├── metacrypt container                           │
  │    ├── mcr container                                 │
  │    ├── mcq, mcdoc, exo, sgard, kls ...               │
  │    └── mc-proxy (rift)                               │
  └──────────┬──────────────────┬───────────┬────────────┘
             │                  │           │
          Tailnet            Tailnet     Tailnet
             │                  │           │
             ▼                  ▼           ▼
  Worker (orion)          Edge (svc)
  ┌──────────────────┐    ┌─────────────────────┐
  │  mcp-agent       │    │  mcp-agent          │
  │    ├── services  │    │    ├── mc-proxy     │
  │    └── mc-proxy  │    │    └── (routes only)│
  └──────────────────┘    └─────────────────────┘
  NixOS / amd64           Debian / amd64
```

### Node Roles

| Role | Purpose | Nodes |
|------|---------|-------|
| **master** | Runs mcp-master + mcp-agent. Hosts core infrastructure. Single coordination point. | rift |
| **worker** | Runs mcp-agent. Hosts application services. | orion |
| **edge** | Runs mcp-agent. Terminates public TLS, forwards to internal services. No application containers. | svc |

Every node runs an mcp-agent. Rift also runs mcp-master. The master's
local agent manages the infrastructure services (MCIAS, mcns, metacrypt,
mcr) the same way other agents manage application services.

### mc-proxy Mesh

Each node runs its own mc-proxy instance. They form a routing mesh:

```
mc-proxy (rift)
  ├── :443   L7 routes for internal .svc.mcp hostnames
  ├── :8443  L4 passthrough for API servers (MCIAS, metacrypt, mcr)
  └── :9443  L4 passthrough for gRPC services

mc-proxy (orion)
  ├── :443   L7 routes for services hosted on this node
  └── :8443  L4/L7 routes for internal APIs

mc-proxy (svc)
  └── :443   L7 termination for public hostnames
             → forwards to internal .svc.mcp endpoints over Tailnet
```

---

## Security Model

### Authentication and Authorization

All gRPC channels (CLI↔master, master↔agent, agent→master) use TLS 1.3
with MCIAS bearer tokens. Every entity has a distinct MCIAS identity:

| Entity | MCIAS Identity | Account Type |
|--------|---------------|--------------|
| Operator CLI | `kyle` (or personal account) | human |
| mcp-master | `mcp-master` | service |
| Agent on rift | `agent-rift` | service |
| Agent on orion | `agent-orion` | service |
| Agent on svc | `agent-svc` | service |

RPCs are authorized by **caller role**, not just authentication:

| RPC Category | Allowed Callers | Rejected |
|--------------|-----------------|----------|
| CLI→master (Deploy, Undeploy, Status, Sync) | human accounts, `mcp-master` (for self-management) | agent service accounts |
| Agent→master (Register, Heartbeat) | `agent-*` service accounts | human accounts, `mcp-master` |
| Master→agent (Deploy, SetupEdgeRoute, HealthCheck) | `mcp-master` only | all others |

The auth interceptor on both master and agent validates the bearer token
via MCIAS, then checks the caller's account type and service name against
the RPC's allowed-caller list. Unauthorized calls return
`PermissionDenied`.

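The allowed-caller table can be read as a lookup from RPC name to a predicate on the authenticated caller. The following is a minimal sketch of that check, assuming token validation has already produced the caller's identity and account type; `Caller`, `MASTER_RULES`, `AGENT_RULES`, and `authorize` are hypothetical names used for illustration, not the interceptor's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caller:
    name: str          # MCIAS identity, e.g. "agent-rift"
    account_type: str  # "human" or "service"


def is_human(c: Caller) -> bool:
    return c.account_type == "human"


def is_master(c: Caller) -> bool:
    return c.account_type == "service" and c.name == "mcp-master"


def is_agent(c: Caller) -> bool:
    return c.account_type == "service" and c.name.startswith("agent-")


def human_or_master(c: Caller) -> bool:
    return is_human(c) or is_master(c)


# Hypothetical rule tables mirroring the three rows of the table above.
MASTER_RULES = {
    "Deploy": human_or_master,
    "Undeploy": human_or_master,
    "Status": human_or_master,
    "Sync": human_or_master,
    "Register": is_agent,
    "Heartbeat": is_agent,
}
AGENT_RULES = {
    "Deploy": is_master,
    "SetupEdgeRoute": is_master,
    "HealthCheck": is_master,
}


def authorize(rules: dict, rpc: str, caller: Caller) -> str:
    """Return "OK" or "PermissionDenied"; unknown RPCs are denied."""
    rule = rules.get(rpc)
    if rule is None or not rule(caller):
        return "PermissionDenied"
    return "OK"
```

Denying RPCs that have no rule keeps the sketch fail-closed: a newly added RPC is rejected until someone decides who may call it.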
### Trust Assumptions

The master is a **fully trusted** component. A compromised master can
control the entire fleet: deploy arbitrary containers, exfiltrate data
via snapshots, redirect traffic via edge routes. This is inherent to
the master/agent topology and acceptable for a single-operator personal
platform. Mitigations: the master runs on the operator's always-on
machine (rift) behind Tailscale, authenticates to MCIAS with its own
service identity, and all communication is TLS 1.3.

### TLS Verification

All gRPC connections verify the peer's TLS certificate against the
Metacrypt CA cert. Agents configure the CA cert path in their config:

```toml
[tls]
ca_cert = "/srv/mcp/certs/metacircular-ca.pem"
```

When an agent starts before the master is available (e.g., svc's agent
starts before rift's boot sequence completes), the TLS connection fails
and the agent retries with exponential backoff. The CA cert itself is
pre-provisioned on all nodes -- it does not depend on Metacrypt being
running.

### Registration Security

Agents self-register with the master, but registration is **identity-bound**:

1. The master extracts the caller's MCIAS service name from the validated
   token (e.g., `agent-rift`).
2. The expected node name is derived by stripping the `agent-` prefix.
3. The `RegisterRequest.name` must match. `agent-rift` can only register
   `name = "rift"`. A rogue agent cannot impersonate another node.
4. The master maintains an allowlist of permitted agent identities:

```toml
[registration]
allowed_agents = ["agent-rift", "agent-svc", "agent-orion"]
```

Registration from unknown identities is rejected. Re-registration from the
same identity updates the entry (handles restarts) and logs a warning with
the previous address for audit.

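The identity-binding steps above reduce to an allowlist lookup followed by a prefix strip and a string comparison. A minimal sketch, assuming the identity has already been extracted from a validated token; `check_registration` is a hypothetical helper, not the master's real function.

```python
ALLOWED_AGENTS = {"agent-rift", "agent-svc", "agent-orion"}
AGENT_PREFIX = "agent-"


def check_registration(identity: str, requested_name: str,
                       allowed=frozenset(ALLOWED_AGENTS)) -> bool:
    """Accept a RegisterRequest only if the name matches the caller identity."""
    # Step 4: unknown identities are rejected outright.
    if identity not in allowed:
        return False
    # Step 2: derive the expected node name by stripping the prefix.
    if not identity.startswith(AGENT_PREFIX):
        return False
    expected = identity[len(AGENT_PREFIX):]
    # Step 3: the requested name must match exactly.
    return requested_name == expected
```

With this check, `agent-svc` presenting `name = "rift"` is refused even though both the identity and the node name are individually valid.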
### Edge Route Validation

When the master sets up an edge route, it validates both ends:

- **Public hostname**: must fall under an allowed domain
  (`metacircular.net`, `wntrmute.net`). Validation uses proper domain
  label matching -- `evilmetacircular.net` is rejected. Implementation:
  the hostname must either equal the allowed domain or end with the
  allowed domain preceded by a `.` (e.g., `mcq.metacircular.net`
  matches, `metacircular.net` matches, `xmetacircular.net` does not).

- **Backend hostname**: must end with `.svc.mcp.metacircular.net`
  (the internal DNS zone). The edge agent resolves it and verifies the
  result is a Tailnet IP (100.64.0.0/10). Non-Tailnet backends are
  rejected.

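Both checks are small enough to sketch directly. The example below assumes the backend hostname has already been resolved to an IP string by the caller; the function names are hypothetical, and only Python's standard `ipaddress` module is used.

```python
import ipaddress

ALLOWED_DOMAINS = ("metacircular.net", "wntrmute.net")
INTERNAL_ZONE = ".svc.mcp.metacircular.net"
TAILNET = ipaddress.ip_network("100.64.0.0/10")


def public_hostname_allowed(hostname: str, allowed=ALLOWED_DOMAINS) -> bool:
    """Label-aware match: equal to a domain, or ending in '.' + domain."""
    host = hostname.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in allowed)


def backend_allowed(hostname: str, resolved_ip: str) -> bool:
    """Backend must be in the internal zone and resolve to a Tailnet IP."""
    if not hostname.lower().endswith(INTERNAL_ZONE):
        return False
    return ipaddress.ip_address(resolved_ip) in TAILNET
```

Requiring the leading `.` in the suffix comparison is what distinguishes `mcq.metacircular.net` from `evilmetacircular.net`; a bare suffix match would accept both.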
### Certificate Issuance Policies

Per-identity restrictions in Metacrypt limit what each agent can issue:

| Agent | Allowed SANs | Denied SANs |
|-------|-------------|-------------|
| `agent-rift`, `agent-orion` | `*.svc.mcp.metacircular.net` | public domains |
| `agent-svc` | `*.metacircular.net`, `*.wntrmute.net` | `.svc.mcp.` names |

This ensures a compromised edge agent cannot issue certs for internal
names, and a compromised worker agent cannot issue certs for public
names. The Metacrypt CA is not publicly trusted, which limits blast
radius further.

### Rate Limiting

The master rate-limits agent RPCs:

- `Register`: 1 per minute per identity.
- `Heartbeat`: 1 per 10 seconds per identity.
- Maximum registered nodes: 16 (configurable).

Excess calls return `ResourceExhausted`.

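One simple way to enforce "1 per N seconds per identity" is to remember the time of the last accepted call for each identity. The sketch below takes that approach; it is an assumption about the mechanism (the source states only the limits), and `IntervalLimiter` is a hypothetical class. The clock is passed in explicitly so the behavior is easy to test.

```python
class IntervalLimiter:
    """Allow at most one call per `min_interval_s` seconds per identity."""

    def __init__(self, min_interval_s: float):
        self.min_interval_s = min_interval_s
        self._last_accepted = {}  # identity -> timestamp of last accepted call

    def allow(self, identity: str, now: float) -> bool:
        last = self._last_accepted.get(identity)
        if last is not None and now - last < self.min_interval_s:
            return False  # caller would see ResourceExhausted
        self._last_accepted[identity] = now
        return True


# Limits from the list above.
register_limiter = IntervalLimiter(60.0)
heartbeat_limiter = IntervalLimiter(10.0)
```

Rejected calls do not reset the timer here, so a client that retries too eagerly is not penalized beyond the original interval.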
### Tailscale ACLs

Network-level restriction (configured in Tailscale admin, not MCP):

- rift (master): can reach all agent gRPC ports (9444) on all nodes.
  The master process needs this to forward deploys and set up edge
  routes.
- svc: can reach master gRPC (9555), backend service ports (443, 8443,
  9443), and Metacrypt (8443). Blocked from MCIAS management, MCR push,
  and agent gRPC on other nodes.
- Workers: can reach master gRPC, MCR (pull), Metacrypt, MCIAS. Blocked
  from other workers' agent ports and svc's agent port.

---

## Node Infrastructure vs Deployed Services

Two categories of software run on the platform:

**Node infrastructure** runs on every node by definition, not because
the master placed it. Managed by systemd, outside the master's
deploy/undeploy/placement model:

| Component | Present on | Managed by |
|-----------|-----------|------------|
| mcp-agent | all nodes | systemd; upgraded via `mcp agent upgrade` |
| mc-proxy | all nodes | systemd; upgraded via binary replacement |

The master interacts with node infrastructure (calls agent RPCs,
manipulates mc-proxy routes) but does not deploy or place it. Node
infrastructure is not tracked in the `placements` table.

**Deployed services** are placed by the master on specific nodes.
Tracked in the `placements` table, managed through
`mcp deploy/undeploy/migrate`.

### Snapshot Paths

Node infrastructure is snapshotted per-node (the same component exists
on multiple nodes with different data):

```
/srv/mcp-master/snapshots/
  mc-proxy/rift/2026-04-02T00:00:00Z.tar.zst
  mc-proxy/svc/2026-04-02T00:00:00Z.tar.zst
  mcp-agent/rift/...
  mcp-agent/svc/...
  mcq/2026-04-02T00:00:00Z.tar.zst      # deployed service -- single node
  mcias/2026-04-02T00:00:00Z.tar.zst
```

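The layout above implies a simple rule: node infrastructure gets an extra `<node>/` directory, while deployed services do not. A hedged sketch of a path builder under that rule; `snapshot_path` and `NODE_INFRASTRUCTURE` are hypothetical names for illustration.

```python
from pathlib import PurePosixPath

SNAPSHOT_ROOT = PurePosixPath("/srv/mcp-master/snapshots")
# Components that run on every node and therefore need node-keyed paths.
NODE_INFRASTRUCTURE = {"mcp-agent", "mc-proxy"}


def snapshot_path(service: str, node: str, timestamp: str) -> PurePosixPath:
    """Build the archive path for a snapshot taken at `timestamp` (RFC 3339)."""
    base = SNAPSHOT_ROOT / service
    if service in NODE_INFRASTRUCTURE:
        base = base / node  # avoid collisions between per-node instances
    return base / f"{timestamp}.tar.zst"
```

For example, `snapshot_path("mc-proxy", "svc", "2026-04-02T00:00:00Z")` yields the second path in the listing, while the same call for `mcq` ignores the node.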
---

## Service Placement

Deployed services declare a **tier** that determines where they run:

- **`tier = "core"`** -- scheduled on the master node. Used for platform
  infrastructure: MCIAS, metacrypt, mcr, mcns.
- **`tier = "worker"`** (default) -- auto-placed on a worker node. The
  master selects the node based on container count and health.

Explicit node pinning is still supported via `node = "orion"` for cases
where a service must run on a specific machine. When `node` is set, it
overrides `tier`.

### Placement Algorithm

Worker placement is deliberately simple:

1. Filter eligible nodes: healthy workers.
2. Select the node with the fewest running containers.
3. Break ties alphabetically by node name (deterministic).

All v2 nodes are amd64, so architecture filtering is not needed.
Services do not declare resource requirements for v2. The heartbeat
reports available resources (CPU, memory, disk) which the master uses
for health assessment, but placement is container-count based.
Resource-aware bin-packing is future work.

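The three placement steps map onto a filter followed by a `min` over a `(containers, name)` key. A minimal sketch, assuming node records carry the same `role`, `status`, and `containers` fields as `NodeInfo`; the `Node` class and `place_worker` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Node:
    name: str
    role: str        # "master", "worker", or "edge"
    status: str      # "healthy", "unhealthy", or "unknown"
    containers: int  # running container count from the last heartbeat


def place_worker(nodes: Iterable[Node]) -> Optional[str]:
    """Pick the healthy worker with the fewest containers; ties by name."""
    eligible = [n for n in nodes if n.role == "worker" and n.status == "healthy"]
    if not eligible:
        return None  # nothing to place on; the deploy should fail
    # Tuple ordering gives "fewest containers, then alphabetical" for free.
    return min(eligible, key=lambda n: (n.containers, n.name)).name
```

Sorting on a tuple keeps the tie-break deterministic without a separate pass, which matters when two workers report the same container count.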
### Service Definition

```toml
name = "mcq"
tier = "worker"   # default; placed by master
active = true

[[components]]
name = "mcq"
image = "mcr.svc.mcp.metacircular.net:8443/mcq:v0.4.0"
volumes = ["/srv/mcq:/srv/mcq"]
cmd = ["server", "--config", "/srv/mcq/mcq.toml"]

# Internal route: handled by the local node's mc-proxy.
[[components.routes]]
name = "internal"
port = 8443
mode = "l7"

# Public route: master sets up edge routing on svc.
[[components.routes]]
name = "public"
port = 8443
mode = "l7"
hostname = "mcq.metacircular.net"
public = true
```

Core service example:

```toml
name = "mcias"
tier = "core"     # always on master node
active = true

[[components]]
name = "mcias"
image = "mcr.svc.mcp.metacircular.net:8443/mcias:v1.10.5"
volumes = ["/srv/mcias:/srv/mcias"]
cmd = ["mciassrv", "-config", "/srv/mcias/mcias.toml"]
```

### v1 Compatibility

Existing v1 service definitions with `node = "rift"` continue to work
(explicit pinning). New v2 fields (`tier`, `public`) default to their
zero values (`"worker"`, `false`) when absent. The validation rule
changes from "node required" to "either node or tier must be set;
tier defaults to worker if both are empty."

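The precedence between `node` and `tier` can be sketched as a small resolver. This is an illustration under assumptions: the service definition is taken as a plain dict, and the `("pin" | "master" | "auto", node)` return shape is invented for the example rather than taken from the implementation.

```python
def resolve_target(spec: dict) -> tuple:
    """Decide how a service is placed from its `node` and `tier` fields."""
    node = spec.get("node", "")
    tier = spec.get("tier", "") or "worker"  # absent or empty tier -> worker
    if node:
        return ("pin", node)      # explicit node overrides tier (v1 behavior)
    if tier == "core":
        return ("master", None)   # core services run on the master node
    if tier == "worker":
        return ("auto", None)     # master picks a worker
    raise ValueError(f"unknown tier: {tier!r}")
```

A v1 definition that sets only `node = "rift"` resolves to a pin, and a definition that sets neither field falls through to worker auto-placement.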
---

## Proto Definitions

### ServiceSpec and RouteSpec Updates

```protobuf
message ServiceSpec {
  string name = 1;
  bool active = 2;
  repeated ComponentSpec components = 3; // unchanged from v1
  string tier = 4; // "core" or "worker" (default: "worker")
  string node = 5; // explicit node pin (overrides tier)
  SnapshotConfig snapshot = 6; // snapshot method and excludes
}

message SnapshotConfig {
  string method = 1; // "grpc", "cli", "exec: <cmd>", "full", or "" (default)
  repeated string excludes = 2; // paths relative to /srv/<service>/ to skip
}

message RouteSpec {
  string name = 1;
  int32 port = 2;
  string mode = 3; // "l4" or "l7"
  string hostname = 4;
  bool public = 5; // triggers edge routing
}
```

### McpMasterService

```protobuf
service McpMasterService {
  // CLI operations.
  rpc Deploy(MasterDeployRequest) returns (MasterDeployResponse);
  rpc Undeploy(MasterUndeployRequest) returns (MasterUndeployResponse);
  rpc Status(MasterStatusRequest) returns (MasterStatusResponse);
  rpc Sync(MasterSyncRequest) returns (MasterSyncResponse);
  rpc Migrate(MigrateRequest) returns (MigrateResponse);

  // Fleet management.
  rpc ListNodes(ListNodesRequest) returns (ListNodesResponse);

  // Snapshots (CLI-triggered).
  rpc CreateSnapshot(CreateSnapshotRequest) returns (CreateSnapshotResponse);
  rpc ListSnapshots(ListSnapshotsRequest) returns (ListSnapshotsResponse);

  // Agent registration and health (called by agents).
  rpc Register(RegisterRequest) returns (RegisterResponse);
  rpc Heartbeat(HeartbeatRequest) returns (HeartbeatResponse);
}

message MasterDeployRequest {
  ServiceSpec service = 1;
}

message MasterDeployResponse {
  string node = 1; // node the service was placed on
  bool success = 2; // true only if ALL steps succeeded
  string error = 3;
  // Per-step results for operator visibility. Partial failure is
  // possible: deploy succeeds but edge routing fails. The CLI shows
  // exactly what worked and what didn't.
  StepResult deploy_result = 4;
  StepResult edge_route_result = 5;
  StepResult dns_result = 6;
}

message StepResult {
  string step = 1;
  bool success = 2;
  string error = 3;
}

message MasterUndeployRequest {
  string service_name = 1;
}

message MasterUndeployResponse {
  bool success = 1;
  string error = 2;
}

message MasterStatusRequest {
  string service_name = 1; // empty = all services
}

message MasterStatusResponse {
  repeated ServiceStatus services = 1;
}

message ServiceStatus {
  string name = 1;
  string node = 2;
  string tier = 3;
  string status = 4; // "running", "stopped", "unhealthy", "unknown"
  repeated EdgeRouteStatus edge_routes = 5;
}

message EdgeRouteStatus {
  string hostname = 1;
  string edge_node = 2;
  string cert_expires = 3;
}

message MasterSyncRequest {
  repeated ServiceSpec services = 1;
}

message MasterSyncResponse {
  repeated StepResult results = 1;
}

message ListNodesRequest {}

message ListNodesResponse {
  repeated NodeInfo nodes = 1;
}

message NodeInfo {
  string name = 1;
  string role = 2;
  string address = 3;
  string arch = 4;
  string status = 5; // "healthy", "unhealthy", "unknown"
  int32 containers = 6;
  string last_heartbeat = 7; // RFC3339
}

message RegisterRequest {
  string name = 1;
  string role = 2;
  string address = 3;
  string arch = 4;
}

message RegisterResponse {
  bool accepted = 1;
}

message HeartbeatRequest {
  string name = 1;
  int64 cpu_millicores = 2;
  int64 memory_bytes = 3;
  int64 disk_bytes = 4;
  int32 containers = 5;
}

message HeartbeatResponse {
  bool acknowledged = 1;
}
```

### Agent RPC Additions

```protobuf
// Health probe -- called by master on missed heartbeats.
rpc HealthCheck(HealthCheckRequest) returns (HealthCheckResponse);

// Edge routing -- called by master on edge nodes.
rpc SetupEdgeRoute(SetupEdgeRouteRequest) returns (SetupEdgeRouteResponse);
rpc RemoveEdgeRoute(RemoveEdgeRouteRequest) returns (RemoveEdgeRouteResponse);
rpc ListEdgeRoutes(ListEdgeRoutesRequest) returns (ListEdgeRoutesResponse);

message HealthCheckRequest {}

message HealthCheckResponse {
  string status = 1; // "healthy" or "degraded"
  int32 containers = 2;
}

message SetupEdgeRouteRequest {
  string hostname = 1; // public hostname
  string backend_hostname = 2; // internal .svc.mcp hostname
  int32 backend_port = 3; // port on worker's mc-proxy
  bool backend_tls = 4; // MUST be true; agent rejects false
}

message SetupEdgeRouteResponse {}

message RemoveEdgeRouteRequest {
  string hostname = 1;
}

message RemoveEdgeRouteResponse {}

message ListEdgeRoutesRequest {}

message ListEdgeRoutesResponse {
  repeated EdgeRoute routes = 1;
}

message EdgeRoute {
  string hostname = 1;
  string backend_hostname = 2;
  int32 backend_port = 3;
  string cert_serial = 4;
  string cert_expires = 5;
}
```

---

## Agent Registration and Health

### Registration

Agents self-register with the master on startup by calling
`McpMasterService.Register`. The master validates the caller's MCIAS
identity (see Security Model) and adds the node to its registry (SQLite).

If the master is unreachable at startup, the agent retries with
exponential backoff (1s, 2s, 4s, ... capped at 60s). Running containers
are unaffected -- registration is a management concern, not a runtime one.

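The retry schedule is a doubling sequence with a ceiling. A minimal sketch as a generator, assuming no jitter (the source does not mention any); `backoff_delays` is a hypothetical helper.

```python
import itertools


def backoff_delays(base: float = 1.0, cap: float = 60.0):
    """Yield retry delays in seconds: base, 2*base, 4*base, ... capped at `cap`."""
    delay = base
    while True:
        yield delay
        delay = min(delay * 2, cap)


# First eight delays an agent would wait between registration attempts.
first_eight = list(itertools.islice(backoff_delays(), 8))
```

An agent loop would sleep for each yielded delay between `Register` attempts and stop iterating as soon as one succeeds.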
### Heartbeats

Agents send heartbeats every 30 seconds via `McpMasterService.Heartbeat`.
Each heartbeat includes resource data (CPU, memory, disk, container count).
The master derives the agent's node name from the authenticated MCIAS
identity (same as registration) -- the `name` field in the heartbeat is
verified against the token, not trusted blindly.

If the master has not received a heartbeat from an agent in 90 seconds
(3 missed intervals), it probes the agent with `HealthCheck`. If the
probe fails (5-second timeout), the agent is marked unhealthy. Unhealthy
nodes are excluded from placement but their services continue running.

When a previously unhealthy agent sends a heartbeat, the master marks it
healthy again.

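The health transitions above can be sketched as a pure function of the current status, the heartbeat age, and a probe callback. Two points are assumptions not stated in the source: a successful probe leaves the status unchanged, and the probe is only invoked once the 90-second threshold is crossed. `evaluate_node` and the `probe` callable are hypothetical.

```python
from typing import Callable

HEARTBEAT_INTERVAL_S = 30
MISSED_INTERVALS_BEFORE_PROBE = 3
STALE_AFTER_S = HEARTBEAT_INTERVAL_S * MISSED_INTERVALS_BEFORE_PROBE  # 90 s


def evaluate_node(current: str, heartbeat_age_s: float,
                  probe: Callable[[], bool]) -> str:
    """Return the node's next status ("healthy" or "unhealthy")."""
    if heartbeat_age_s < STALE_AFTER_S:
        # A recent heartbeat marks the node healthy, even if it was unhealthy.
        return "healthy"
    # Heartbeats are stale: fall back to an active HealthCheck probe.
    if probe():
        return current  # assumption: a passing probe does not change status
    return "unhealthy"
```

Passing the probe in as a callable keeps the 5-second gRPC timeout out of the decision logic, so the transitions can be exercised without a network.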
### Node Identity

Each agent authenticates to MCIAS as a distinct service user:
`agent-rift`, `agent-svc`, `agent-orion`. Benefits:

- **Audit**: logs show which node performed an action.
- **Least privilege**: edge agents don't need image pull access.
- **Revocation**: a compromised node's credentials can be revoked
  without affecting the fleet.

---

## mcp-master

### Responsibilities

1. **Accept CLI commands** via gRPC (deploy, undeploy, status, sync).
2. **Maintain node registry** from agent self-registration (SQLite).
3. **Place services** on nodes based on tier, explicit node, and
   container count.
4. **Detect public routes** (`public = true`) and coordinate edge routing.
5. **Validate public hostnames** against allowed domain list.
6. **Assign edge nodes** for public routes (currently always svc).
7. **Coordinate undeploy** across nodes.
8. **Aggregate status** from all agents for fleet-wide views.

### What the Master Does NOT Do

- Store container state (agents own their registries).
- Manage container lifecycle directly (agents do this).
- Run containers (the co-located agent does).
- Replace the agent on any node.
- Talk to agents on behalf of other agents.

### Master State (SQLite)

The master maintains a SQLite database at `/srv/mcp-master/master.db`
with four tables:

```sql
-- Registered nodes. Populated by agent Register RPCs.
-- Rebuilt from agent re-registration on master restart.
CREATE TABLE nodes (
    name           TEXT PRIMARY KEY,
    role           TEXT NOT NULL,
    address        TEXT NOT NULL,
    arch           TEXT NOT NULL,
    status         TEXT NOT NULL DEFAULT 'unknown',
    containers     INTEGER NOT NULL DEFAULT 0,
    last_heartbeat TEXT
);

-- Service placements. Records which node hosts which service.
-- Populated on deploy, removed on undeploy.
CREATE TABLE placements (
    service_name TEXT PRIMARY KEY,
    node         TEXT NOT NULL REFERENCES nodes(name),
    tier         TEXT NOT NULL,
    deployed_at  TEXT NOT NULL
);

-- Edge routes. Records public routes for undeploy cleanup.
CREATE TABLE edge_routes (
    hostname         TEXT PRIMARY KEY,
    service_name     TEXT NOT NULL REFERENCES placements(service_name),
    edge_node        TEXT NOT NULL REFERENCES nodes(name),
    backend_hostname TEXT NOT NULL,
    backend_port     INTEGER NOT NULL,
    created_at       TEXT NOT NULL
);

-- Snapshot metadata. The archive files live on disk at
-- /srv/mcp-master/snapshots/<service>/<timestamp>.tar.zst
-- (node infrastructure uses <service>/<node>/; see Snapshot Paths).
CREATE TABLE snapshots (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    service_name TEXT NOT NULL,
    node         TEXT NOT NULL,
    filename     TEXT NOT NULL,
    size_bytes   INTEGER NOT NULL,
    created_at   TEXT NOT NULL
);
CREATE INDEX idx_snapshots_service ON snapshots(service_name, created_at DESC);
```

On master restart, the node registry is rebuilt as agents re-register
(within 30s via heartbeat). Placements and edge routes persist across
restarts. The master reconciles placements against actual agent state
on startup (see Reconciliation).

### Master Configuration

```toml
[server]
grpc_addr = "100.x.x.x:9555"   # master listens on Tailnet
tls_cert = "/srv/mcp-master/certs/cert.pem"
tls_key = "/srv/mcp-master/certs/key.pem"

[database]
path = "/srv/mcp-master/master.db"

[mcias]
server_url = "https://mcias.metacircular.net:8443"
service_name = "mcp-master"

[edge]
allowed_domains = ["metacircular.net", "wntrmute.net"]

[registration]
allowed_agents = ["agent-rift", "agent-svc", "agent-orion"]
max_nodes = 16

[timeouts]
deploy = "5m"
edge_route = "30s"
health_check = "5s"
undeploy = "2m"
snapshot = "10m"

# Bootstrap: master's own agent (can't self-register before master starts).
[[nodes]]
name = "rift"
address = "100.95.252.120:9444"
role = "master"
```

### Boot Sequencing

The master node runs core infrastructure that other services depend on.
On boot, these services must start in dependency order. Only the master
needs sequencing -- worker and edge nodes start their agent and wait
for registration with the master.

The master's agent config declares boot stages:

```toml
[[boot.sequence]]
name = "foundation"
services = ["mcias", "mcns"]
timeout = "120s"
health = "tcp"

[[boot.sequence]]
name = "core"
services = ["metacrypt", "mcr"]
timeout = "60s"
health = "tcp"

[[boot.sequence]]
name = "management"
services = ["mcp-master"]
timeout = "30s"
health = "grpc"
```

**Stage 1 -- Foundation**: MCIAS and MCNS start first. Every other
service needs authentication (MCIAS) and DNS resolution (MCNS).

**Stage 2 -- Core**: Metacrypt and MCR start once auth and DNS are
available. Agents need Metacrypt for cert provisioning and MCR for
image pulls.

**Stage 3 -- Management**: mcp-master starts last. It requires all
infrastructure services to be running before it can coordinate the fleet.

mcp-master runs as a container managed by the agent, just like any
other service. This means updates are a normal `mcp deploy` (or image
bump in the bootstrap config), and the agent handles restarts via
podman's `--restart unless-stopped` policy.

**Bootstrap (first boot):** On initial cluster setup, no images exist
in MCR yet. The boot sequence config references container images, but
MCR doesn't start until stage 2. Resolution:

- Stage 1 and 2 images (MCIAS, MCNS, Metacrypt, MCR) must be
  **pre-staged** into the local podman image store before first boot
  (`podman load` or `podman pull` from an external source).
- Once MCR is running (stage 2), stage 3 (mcp-master) can pull its
  image from MCR normally.
- Subsequent boots use cached images. Image updates go through the
  normal `mcp deploy` flow (which pulls from MCR).

The boot sequence config contains full service definitions (image,
volumes, cmd, routes) -- not just service names. This is the only
place where service definitions live on the agent rather than being
pushed from the CLI via the master.

**Health check types:**
- `tcp` -- connect to the container's mapped port. Success = connection
  accepted. Used for most services.
- `grpc` -- call the gRPC health endpoint. Used for services with gRPC.
- `http` -- GET a health endpoint. Future option.

**Timeout behavior:** Depends on the stage:
- **Foundation** (MCIAS, MCNS): failure **blocks** boot. The agent
  retries indefinitely with backoff and alerts the operator. All
  downstream services depend on auth and DNS -- proceeding is futile.
- **Core and management**: failure logs an error and proceeds. The
  operator can fix the failed service manually. Partial boot is
  better than no boot for non-foundation services.

The agent treats boot sequencing as a startup concern only. Once all
stages complete, normal operations proceed. If a foundation service
crashes at runtime, the agent restarts it independently via the
`--restart unless-stopped` podman policy.

**Boot config drift:** The boot sequence config contains pinned image
versions. When the operator updates a service via `mcp deploy`, the
boot config is NOT automatically updated. On reboot, the agent starts
the old version; the master then deploys the current version. This is
self-correcting for core and management services, but foundation
services (MCIAS, MCNS) run before the master exists. **When updating
foundation service images, also update the boot sequence config.**

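The stage-dependent timeout behavior described above can be sketched as a loop in which only the foundation stage is allowed to block. This is an illustration under assumptions: `start_stage` stands in for "start the stage's services and wait for their health checks within the stage timeout", and the 1s-to-60s doubling backoff is borrowed from the registration retry schedule because the source does not give the boot retry intervals.

```python
from typing import Callable, Iterable, List

BLOCKING_STAGES = {"foundation"}  # failure here blocks boot


def run_boot(stages: Iterable[str],
             start_stage: Callable[[str], bool],
             sleep: Callable[[float], None] = lambda seconds: None) -> List[str]:
    """Run stages in order; return the non-blocking stages that failed."""
    failed = []
    for stage in stages:
        delay = 1.0
        while not start_stage(stage):
            if stage not in BLOCKING_STAGES:
                # Core/management: record the error and move on.
                failed.append(stage)
                break
            # Foundation: retry indefinitely with (assumed) capped backoff.
            sleep(delay)
            delay = min(delay * 2, 60.0)
    return failed
```

Injecting `sleep` lets the retry path be tested without waiting; a real agent would pass `time.sleep` and alert the operator on each failed foundation attempt.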
### Reconciliation

On startup, the master actively probes all nodes it knows about from
its persisted `nodes` table; it does not wait for agents to
re-register. This means the master has a fleet-wide view within seconds
of starting, rather than waiting up to 30s per agent heartbeat cycle.

The initial probe cycle is a **warm-up** phase: the master builds its
fleet view but does not emit health alerts. Once all known nodes have
been probed (or the probe timeout expires), the master transitions to
**ready** and begins normal health alerting. This avoids noisy
"unhealthy" warnings for agents that simply haven't started yet.

1. **Probe known nodes**: For each node in the `nodes` table, the
   master calls `HealthCheck` (5s timeout). Nodes that respond are
   marked healthy; nodes that don't respond are marked unhealthy.
   Agent self-registration still runs in the background and updates
   addresses or adds new nodes, but reconciliation does not depend
   on it.
2. **Check placements**: For each placement in the database, query
   the hosting agent's `Status` RPC (bulk: one call per agent, not
   per service). If the agent reports a service is not running, mark
   the placement as stale (log warning, do not auto-redeploy).
3. **Detect orphans**: For each service running on an agent that has
   no matching placement record, log it as an orphan. Orphans may
   result from failed deploys, manual `podman run`, or v1 leftovers.
4. **Check edge routes**: For each edge route in the database, query
   the edge agent for route status.
5. **Check snapshot freshness**: Flag any service whose latest
   snapshot is older than 2x the snapshot interval (e.g., older than
   48 hours with a 24-hour cycle). Stale snapshots are a disaster
   recovery risk.
6. **Report**: All discrepancies (stale placements, orphans, missing
   edge routes, unhealthy nodes, stale snapshots) are reported via
   `mcp status` and structured logs.

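Steps 2 and 3 amount to a set comparison between the placement records and what each agent reports. The following is a minimal Python sketch of that comparison, not the master's actual code. It assumes placements are available as `(service, node)` pairs and that each agent's bulk `Status` response has been reduced to a set of running service names; both shapes, and the choice to leave unreachable nodes to the node-health check, are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Drift:
    """Discrepancies found in one read-only reconciliation pass."""
    stale: list = field(default_factory=list)    # placed but not running
    orphans: list = field(default_factory=list)  # running but not placed


def detect_drift(placements, running_by_node):
    """Compare placement records with what agents report.

    placements:      iterable of (service, node) pairs from the database.
    running_by_node: dict of node -> set of running service names, built
                     from one bulk Status call per agent (hypothetical shape).
    Nodes missing from running_by_node are treated as unreachable and
    skipped here (an assumption; node health reporting covers them).
    """
    drift = Drift()
    placed = set(placements)
    for service, node in sorted(placed):
        if node in running_by_node and service not in running_by_node[node]:
            drift.stale.append((service, node))
    for node, services in sorted(running_by_node.items()):
        for service in sorted(services):
            if (service, node) not in placed:
                drift.orphans.append((service, node))
    return drift


placements = [("mcq", "orion"), ("mcr", "rift")]
running = {"orion": set(), "rift": {"mcr", "scratch"}}
report = detect_drift(placements, running)
print(report.stale)    # [('mcq', 'orion')]
print(report.orphans)  # [('scratch', 'rift')]
```

The function only builds a report and never calls back into an agent, which matches the read-only nature of reconciliation.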
Reconciliation is read-only: it detects drift but does not
auto-remediate. The operator reviews `mcp status` output and takes
action. Auto-reconciliation is future work.

---

## Edge Routing

The core v2 feature: when a service declares `public = true` on a route,
the master automatically provisions the edge route.

### Deploy Flow with Edge Routing

When the master receives `Deploy(mcq)`:

1. **Place service**: Master selects the target node based on
   tier/node/container count. For mcq (tier=worker), master picks the
   least-loaded healthy worker.

2. **Deploy to worker**: Master sends `Deploy` RPC to the worker's agent
   (timeout: 5m). The agent deploys the container, provisions a TLS cert
   for `mcq.svc.mcp.metacircular.net` from Metacrypt, and registers the
   internal mc-proxy route.

3. **Register DNS**: Master registers an A record for the internal
   hostname (`mcq.svc.mcp.metacircular.net`) pointing to the worker's
   Tailnet IP via MCNS. This is the backend address that edge and
   internal clients resolve.

4. **Detect public routes**: Master inspects the service spec for routes
   with `public = true`.

5. **Validate hostname**: Master checks that `mcq.metacircular.net` falls
   under an allowed domain using proper domain label matching.

6. **Check public DNS**: Master resolves `mcq.metacircular.net` to
   verify it points to the edge node's public IP. Public DNS records
   are pre-provisioned manually at Hurricane Electric. If the hostname
   does not resolve, the master warns but continues, since the operator
   may be setting up DNS in parallel.

7. **Validate backend hostname**: Master verifies the internal hostname
   (`mcq.svc.mcp.metacircular.net`) ends with `.svc.mcp.metacircular.net`.
   The internal hostname is derived from the service and component name
   using the convention `<component>.svc.mcp.metacircular.net`.

8. **Assign edge node**: Master selects an edge node (currently svc).

9. **Set up edge route**: Master sends `SetupEdgeRoute` RPC to svc's
   agent (timeout: 30s):
   ```
   SetupEdgeRoute(
       hostname: "mcq.metacircular.net"
       backend_hostname: "mcq.svc.mcp.metacircular.net"
       backend_port: 8443
       backend_tls: true
   )
   ```

10. **Svc agent provisions**: On receiving `SetupEdgeRoute`, svc's agent:
    a. Validates that `backend_hostname` ends with `.svc.mcp.metacircular.net`.
    b. Resolves `backend_hostname` and verifies the result is a Tailnet IP
       (100.64.0.0/10).
    c. Provisions a TLS certificate from Metacrypt for the **public**
       hostname `mcq.metacircular.net` only. Internal names never appear
       on edge certs.
    d. Registers an L7 route in its local mc-proxy:
       `mcq.metacircular.net:443 → <worker-tailnet-ip>:8443`
       with `backend_tls = true`.

11. **Master records the edge route** in its SQLite database.

12. **Master returns structured result** to CLI with per-step status.

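The hostname checks in steps 5, 7, and 10 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the project's implementation: the function names are hypothetical, "falls under an allowed domain" is read as "is a strict subdomain", and the Tailnet check is a plain CIDR membership test against `100.64.0.0/10`.

```python
import ipaddress

INTERNAL_DOMAIN = "svc.mcp.metacircular.net"
TAILNET = ipaddress.ip_network("100.64.0.0/10")


def under_domain(hostname, domain):
    """True if hostname is a strict subdomain of domain, compared label by label.

    Label matching (instead of a raw string suffix test) is what rejects
    look-alikes such as "evilmetacircular.net".
    """
    host = hostname.lower().rstrip(".").split(".")
    base = domain.lower().rstrip(".").split(".")
    return len(host) > len(base) and host[-len(base):] == base and all(host)


def is_tailnet_ip(addr):
    """True if addr parses as an IP address inside 100.64.0.0/10."""
    try:
        return ipaddress.ip_address(addr) in TAILNET
    except ValueError:
        return False


print(under_domain("mcq.metacircular.net", "metacircular.net"))       # True
print(under_domain("evilmetacircular.net", "metacircular.net"))       # False
print(under_domain("mcq.svc.mcp.metacircular.net", INTERNAL_DOMAIN))  # True
print(is_tailnet_ip("100.95.252.120"))                                # True
print(is_tailnet_ip("192.168.1.10"))                                  # False
```

Splitting on `.` and comparing whole labels is the point of "proper domain label matching": a bare `endswith("metacircular.net")` would accept the second hostname above.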
**Failure handling:** If any step fails, the master returns the error
to the CLI with the step that failed. If the deploy succeeded but
edge routing failed, the service is running internally but not publicly
reachable. The operator can retry with `mcp deploy` (idempotent) or
fix the issue and run `mcp sync`.

If cert provisioning fails during deploy (step 2 or 10), the deploy
**fails**: the agent does not register an mc-proxy route pointing to
a nonexistent cert. This prevents the silent TLS failure from v1.

### Undeploy Flow

1. **Undeploy on worker first**: Master sends `Undeploy` RPC to the
   worker agent (timeout: 2m). The agent tears down the container,
   routes, DNS, and certs. This stops the backend, ensuring no traffic
   is served during edge cleanup.
2. **Remove edge route**: Master sends `RemoveEdgeRoute` to svc's agent.
   Svc removes the mc-proxy route and cleans up the cert.
3. **Master removes records** from placements and edge_routes tables.

Ordering rationale: undeploy the backend first so that if edge cleanup
fails, the service is already stopped and the edge route returns a
502 rather than serving stale content.

### Certificate Model

Two separate certs per public service; internal names never appear on
edge certs:

| Cert | Provisioned by | SAN | Used on |
|------|---------------|-----|---------|
| Internal | Worker agent → Metacrypt | `mcq.svc.mcp.metacircular.net` | Worker's mc-proxy |
| Public | Edge agent → Metacrypt | `mcq.metacircular.net` | Edge's mc-proxy |

Edge cert renewal is the edge agent's responsibility. The agent runs
the same `renewWindow` check as worker agents, renewing certs before
they expire (90-day TTL, renew at 30 days remaining).

---

## Snapshots

The master maintains periodic snapshots of every service's data.
Snapshots are the foundation for both migration and disaster recovery:
if a node dies, the master can restore a service to a new node from its
latest snapshot without the source node being alive.

All nodes have LUKS-encrypted disks. Snapshots are stored on the
master's encrypted disk, so service data is encrypted at rest at both
source and destination. An existing backup service on rift replicates
to external storage, covering the case where rift itself is lost.

### Snapshot Mechanism

The service definition declares how the agent should trigger a
consistent snapshot via the `method` field:

```toml
[snapshot]
method = "grpc"                     # preferred for Metacircular services
exclude = ["layers/", "uploads/"]   # paths to skip (optional)
```

**Methods:**

| Method | How it works | Best for |
|--------|-------------|----------|
| `grpc` | Agent calls the standard `Snapshot` gRPC RPC on the service's gRPC port. The service vacuums databases and confirms. Agent then tars. | Metacircular services with gRPC servers |
| `cli` | Agent runs `podman exec <container> <service> snapshot` (the engineering standard's snapshot CLI command). Agent then tars. | Metacircular services without gRPC |
| `exec: <cmd>` | Agent runs `podman exec <container> <cmd>`. Agent then tars. | Non-standard services with custom backup scripts |
| `full` | Agent tars the entire `/srv/<service>/` directory, auto-vacuuming any `.db` files found. | Services that need everything backed up |
| *(omitted)* | Agent collects only `*.toml`, `*.db`, and `*.pem` files from `/srv/<service>/` (config, database, and certs). `.db` files are auto-vacuumed. | Default; covers the essentials without configuration |

The **default** (no `[snapshot]` section) captures the minimum needed
to restore a service: config, database, and TLS certs. This keeps
snapshot sizes small and predictable. Services that need more data
(e.g., file uploads, state directories) opt into `full` or specify
paths explicitly.

**`exclude`** works with any method. MCR uses `exclude` to skip layer
blobs (which can be rebuilt from git) while still capturing its
database and config.

**Database consistency:** For `grpc` and `cli` methods, the service
owns its own vacuum logic. For `full` and the default, the agent
detects `.db` files and runs `VACUUM INTO` to a temp copy before
including them in the tar. WAL and SHM files are excluded (the
vacuumed copy is self-contained).

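To make the default method and the auto-vacuum step concrete, here is a hedged Python sketch. The helper names are hypothetical, the recursive search and the treatment of `exclude` entries as path prefixes are assumptions of the sketch, and the only SQLite feature relied on is the `VACUUM INTO` statement named above.

```python
import sqlite3
import tempfile
from pathlib import Path

DEFAULT_PATTERNS = ("*.toml", "*.db", "*.pem")


def default_snapshot_files(service_dir, exclude=()):
    """Relative paths the default method would pick up, honoring excludes.

    An exclude entry such as "layers/" is treated as a path prefix, and the
    search is recursive; both are assumptions of this sketch.
    """
    root = Path(service_dir)
    picked = set()
    for pattern in DEFAULT_PATTERNS:
        for path in root.rglob(pattern):
            rel = path.relative_to(root).as_posix()
            if path.is_file() and not any(rel.startswith(e) for e in exclude):
                picked.add(rel)
    return sorted(picked)


def vacuum_copy(db_path, dest_path):
    """Write a compact, self-contained copy of a SQLite database."""
    quoted = str(dest_path).replace("'", "''")  # escape for the SQL string literal
    conn = sqlite3.connect(str(db_path))
    try:
        conn.execute(f"VACUUM INTO '{quoted}'")
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    svc = Path(tmp) / "mcq"
    (svc / "layers").mkdir(parents=True)
    (svc / "mcq.toml").write_text("# config\n")
    (svc / "layers" / "cache.db").write_text("")
    conn = sqlite3.connect(str(svc / "mcq.db"))
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")
    conn.commit()
    conn.close()
    print(default_snapshot_files(svc, exclude=["layers/"]))  # ['mcq.db', 'mcq.toml']
    vacuum_copy(svc / "mcq.db", Path(tmp) / "mcq-vacuumed.db")
```

Because the archive would include the vacuumed copy rather than the live file, any `-wal` and `-shm` siblings can be left out without losing committed data.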
### Standard Snapshot gRPC Service (mcdsl)

The `grpc` snapshot method uses a standard RPC that Metacircular
services implement via the `mcdsl/snapshot` package, following the same
pattern as `mcdsl/health`:

```protobuf
service SnapshotService {
  rpc Snapshot(SnapshotRequest) returns (SnapshotResponse);
}

message SnapshotRequest {}

message SnapshotResponse {
  bool success = 1;
  string error = 2;
  string path = 3;  // path to the vacuumed backup (e.g. /srv/mcq/backups/...)
}
```

Services register the `SnapshotService` on their gRPC server. The
`mcdsl/snapshot` package provides a default implementation that reads
the database path from the service's config, runs `VACUUM INTO`, and
returns the backup path. Services with custom snapshot needs can
override the handler.

### Service Definition Examples

Metacircular service with gRPC (preferred):
```toml
[snapshot]
method = "grpc"
```

MCR (skip layer blobs):
```toml
[snapshot]
method = "grpc"
exclude = ["layers/", "uploads/"]
```

Non-Metacircular service with custom backup:
```toml
[snapshot]
method = "exec: /usr/local/bin/backup.sh"
```

Service with no snapshot config (default; captures `*.toml`, `*.db`, `*.pem`):
```toml
# No [snapshot] section needed
```

### Snapshot Storage

Snapshots are stored as flat files on the master node:

```
/srv/mcp-master/snapshots/
  mcq/2026-04-01T00:00:00Z.tar.zst             # deployed service
  mcias/2026-04-01T00:00:00Z.tar.zst
  mc-proxy/rift/2026-04-01T00:00:00Z.tar.zst   # node infra, per-node
  mc-proxy/svc/2026-04-01T00:00:00Z.tar.zst
```

Deployed services use `<service>/<timestamp>.tar.zst`. Node
infrastructure uses `<service>/<node>/<timestamp>.tar.zst` to avoid
collisions between instances on different nodes (see "Node
Infrastructure vs Deployed Services").

Format: tar.zst (tar archive with zstandard compression). One file per
snapshot, named by UTC timestamp.

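The two path shapes can be captured by one small helper. The sketch below is illustrative only: `snapshot_path` is a hypothetical name, and the timestamp format is inferred from the example filenames in the listing.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath

SNAPSHOT_ROOT = PurePosixPath("/srv/mcp-master/snapshots")


def snapshot_path(service, taken_at, node=None):
    """Build the archive path; pass node only for per-node infrastructure."""
    stamp = taken_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    parts = [service] if node is None else [service, node]
    return SNAPSHOT_ROOT.joinpath(*parts, f"{stamp}.tar.zst")


when = datetime(2026, 4, 1, tzinfo=timezone.utc)
print(snapshot_path("mcq", when))
# /srv/mcp-master/snapshots/mcq/2026-04-01T00:00:00Z.tar.zst
print(snapshot_path("mc-proxy", when, node="rift"))
# /srv/mcp-master/snapshots/mc-proxy/rift/2026-04-01T00:00:00Z.tar.zst
```

Converting to UTC before formatting keeps filenames sortable regardless of the master's local timezone.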
### Snapshot Scheduling

The master runs a scheduled job that snapshots all services every 24
hours. The master iterates over all placements and for each one:

1. Acquires a per-service lock (skips if deploy/migrate/undeploy is
   in progress).
2. Sends `ExportServiceData(service_name)` to the hosting agent
   (timeout: 10m).
3. The agent runs the snapshot command (if configured), creates a
   tar.zst archive of `/srv/<service>/` (respecting excludes), and
   streams it back.
4. The master writes the archive to the snapshots directory.
5. The master prunes old snapshots (keep last N, configurable).

Scheduled snapshots are **live**: the service keeps running. Database
consistency is ensured by the vacuum step, not by stopping the
container. Migration snapshots use a different flow (stop first, then
tar) for perfect consistency.

**Agent fallback rule:** If `ExportServiceData` is called and the
container is not running (migration case), the agent skips the
configured snapshot method (`grpc`/`cli`/`exec`) and falls back to a
direct tar with auto-vacuum of `.db` files. This is correct because
the container already vacuumed on shutdown (SIGTERM handler).

For v2, the master always requests a full snapshot, with no change
detection. Distinguishing dirty from clean services is a future
optimization.

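Step 5's retention pruning, together with the freshness rule from the Reconciliation section (a snapshot older than twice the interval is stale), can be sketched as follows. This is a Python illustration under assumptions: the function names are hypothetical, filenames follow the `<timestamp>.tar.zst` convention, and a service with no snapshot at all is treated as stale.

```python
from datetime import datetime, timedelta, timezone

STAMP = "%Y-%m-%dT%H:%M:%SZ"
SUFFIX = ".tar.zst"


def taken_at(filename):
    """Parse the UTC timestamp out of a name like 2026-04-01T00:00:00Z.tar.zst."""
    stamp = filename[: -len(SUFFIX)]
    return datetime.strptime(stamp, STAMP).replace(tzinfo=timezone.utc)


def to_prune(filenames, retain):
    """Names to delete so that only the newest `retain` snapshots remain."""
    newest_first = sorted(filenames, key=taken_at, reverse=True)
    return sorted(newest_first[retain:], key=taken_at)


def is_stale(filenames, interval, now):
    """True if the latest snapshot is older than 2x the interval, or none exists."""
    if not filenames:
        return True
    return now - max(map(taken_at, filenames)) > 2 * interval


names = [f"2026-04-0{day}T00:00:00Z.tar.zst" for day in range(1, 10)]
print(to_prune(names, retain=7))
# ['2026-04-01T00:00:00Z.tar.zst', '2026-04-02T00:00:00Z.tar.zst']
now = datetime(2026, 4, 10, 12, 0, tzinfo=timezone.utc)
print(is_stale(names, timedelta(hours=24), now))  # False (latest is 36h old)
```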
### Concurrency

The master holds a per-service lock for all operations that touch a
service (deploy, undeploy, migrate, snapshot). If a scheduled snapshot
overlaps with a deploy or migration, the snapshot waits. This prevents
capturing partial state during multi-step operations.

### Snapshot RPCs

```protobuf
// Service data export -- called by master on any agent.
// Authorization: mcp-master only.
rpc ExportServiceData(ExportServiceDataRequest)
    returns (stream DataChunk);

// Service data import -- called by master on any agent.
// Authorization: mcp-master only.
rpc ImportServiceData(stream ImportServiceDataChunk)
    returns (ImportServiceDataResponse);

message ExportServiceDataRequest {
  string service_name = 1;
  // Snapshot config is stored in the agent's registry at deploy time.
  // The agent uses its persisted config to determine the snapshot method
  // (grpc, cli, exec, full, default) and exclude patterns.
}

message DataChunk {
  bytes data = 1;
}

message ImportServiceDataChunk {
  // First message sets the service name; subsequent messages carry data.
  string service_name = 1;
  bytes data = 2;
  bool force = 3;  // overwrite existing /srv/<service>/ (first message only)
}

message ImportServiceDataResponse {
  int64 bytes_written = 1;
}
```

Note: `ExportServiceData`/`ImportServiceData` transfer full directory
archives. The existing `PushFile`/`PullFile` RPCs transfer individual
files and serve a different purpose (config distribution, cert
provisioning).

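The "first message sets the service name" convention on `ImportServiceDataChunk` is easy to get subtly wrong, so here is a small Python model of it. It is a sketch, not generated gRPC code: `ImportChunk` merely stands in for the protobuf message, the 64 KiB chunk size is an arbitrary illustrative value, and sending the header as its own data-free message is one reading of the comment in the proto.

```python
from dataclasses import dataclass

CHUNK_SIZE = 64 * 1024  # illustrative; the design does not fix a chunk size


@dataclass
class ImportChunk:
    """Stand-in for the ImportServiceDataChunk message."""
    service_name: str = ""
    data: bytes = b""
    force: bool = False


def import_chunks(service_name, archive, force=False, chunk_size=CHUNK_SIZE):
    """Yield stream messages: one header message, then data-only messages."""
    yield ImportChunk(service_name=service_name, force=force)
    for offset in range(0, len(archive), chunk_size):
        yield ImportChunk(data=archive[offset:offset + chunk_size])


def reassemble(chunks):
    """Receiver side: read the header, then concatenate the data messages."""
    stream = iter(chunks)
    header = next(stream)
    return header.service_name, header.force, b"".join(c.data for c in stream)


archive = bytes(150_000)
messages = list(import_chunks("mcq", archive, force=True))
print(len(messages))                               # 4 (1 header + 3 data)
print(reassemble(messages) == ("mcq", True, archive))  # True
```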
### Master Snapshot Config

```toml
[snapshots]
dir = "/srv/mcp-master/snapshots"
interval = "24h"
retain = 7   # keep last 7 snapshots per service
```

---

## Service Migration

Services can be migrated between nodes with `mcp migrate`. This is
essential for moving workloads off rift (which starts as both master
and worker) onto dedicated workers like orion as they come online.

Migration uses snapshots for data transfer. This means migration works
even if the source node is down (disaster recovery).

### Constraints

- **Core services cannot be migrated.** `tier = "core"` services are
  bound to the master node. Moving core services means designating a
  new master, which is a manual, deliberate operation outside the scope
  of `mcp migrate`.
- **Edge nodes are not migration targets.** Edge nodes run mc-proxy
  only, not application containers.

### Migration Flow

```
mcp migrate mcq --to orion
```

When the master receives `Migrate(mcq, orion)`:

1. **Validate**: Master verifies `orion` is a healthy worker. Rejects
   migration of `tier = "core"` services and migration to edge nodes.

2. **Stop on source** (if source is alive): Master sends `Stop` RPC
   to the source agent. The agent gracefully stops the container
   (SIGTERM). The service runs its shutdown handler, which vacuums
   databases per the engineering standard. If the source is down,
   skip this step.

3. **Snapshot** (if source is alive): Agent tars `/srv/<service>/`
   (now consistent, because the service vacuumed on shutdown) and
   streams it to the master. If the source is down, the master uses
   the most recent stored snapshot.

4. **Push snapshot to destination**: Master streams the snapshot to
   the destination agent via `ImportServiceData`. The agent creates
   `/srv/<service>/` (with correct permissions) and extracts the
   archive.

5. **Deploy on destination**: Master sends `Deploy` RPC to the
   destination agent (orion). The agent deploys the container using
   the restored data, provisions the internal TLS cert, and registers
   the mc-proxy route on the new node.

6. **Update DNS**: Master updates the internal A record
   (`mcq.svc.mcp.metacircular.net`) to point to orion's Tailnet IP.

7. **Update edge route** (if public): Master sends `SetupEdgeRoute`
   to svc's agent with the updated backend. The edge agent updates
   the mc-proxy route. No new cert is needed because the public
   hostname hasn't changed.

8. **Clean up source** (if source is alive): Master sends `Undeploy`
   to the source agent to remove the stopped container, old routes,
   old certs, and old DNS records.

9. **Update placement**: Master updates the `placements` table to
   reflect the new node. This step runs regardless of whether source
   cleanup succeeded.

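Which of the nine steps actually run depends on two facts: whether the source node is alive and whether the service has a public route. A compact way to see the branching is the Python sketch below. The step names are labels invented for this illustration, not RPC or function names from the codebase.

```python
def migration_steps(source_alive, public):
    """Ordered step labels for one migration, following the flow above."""
    steps = ["validate"]
    if source_alive:
        steps += ["stop_on_source", "snapshot_from_source"]
    else:
        steps += ["use_latest_stored_snapshot"]
    steps += ["push_snapshot", "deploy_on_destination", "update_dns"]
    if public:
        steps.append("update_edge_route")
    if source_alive:
        steps.append("clean_up_source")
    steps.append("update_placement")  # always runs, even if cleanup failed
    return steps


for step in migration_steps(source_alive=False, public=True):
    print(step)
```

With the source down, the stop and cleanup steps drop out and the stored snapshot replaces the fresh one, which is the disaster recovery path.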
### Disaster Recovery

If a node dies, the operator migrates its services to another node:

```
mcp migrate mcq --to orion                  # source is down, uses latest snapshot
mcp migrate --all --from rift --to orion    # evacuate all services
```

The master detects the source is unreachable (unhealthy in node
registry), skips the stop and cleanup steps, and restores from
the stored snapshot. Data loss is bounded by the snapshot interval
(24 hours).

### Batch Migration

Full node evacuation for decommissioning or disaster recovery:

```
mcp migrate --all --from rift --to orion
```

The master migrates each service sequentially. Core services are
skipped (they cannot be migrated). The operator sees per-service
progress. If any migration fails, the master stops and reports which
service failed; the operator can fix the issue and resume with
`--all` (already-migrated services are skipped since they no longer
have placements on the source node).

### Migration Safety

- The source data is not deleted until step 8 (cleanup). If migration
  fails mid-transfer, the source still has the complete data and the
  operator can retry or roll back.
- The master rejects migration if the destination already has a
  `/srv/<service>/` directory (prevents accidental overwrite).
  Use `--force` to override.
- Downtime window: from stop (step 2) to the new container starting
  (step 5). For a personal platform this is acceptable.
- Migration snapshots use stop-then-tar for perfect consistency.
  Scheduled daily snapshots use live vacuum (no downtime).

### Migration Proto

```protobuf
rpc Migrate(MigrateRequest) returns (MigrateResponse);

message MigrateRequest {
  string service_name = 1;
  string target_node = 2;
  bool force = 3;           // overwrite existing /srv/<service>/ on target
  bool all = 4;             // migrate all services from source
  string source_node = 5;   // required when all=true
  // Validation: reject if all=true AND service_name is set (ambiguous).
}

message MigrateResponse {
  repeated StepResult results = 1;
}

// Note: CreateSnapshot/ListSnapshots are master CLI commands.
// The mcdsl SnapshotService.Snapshot RPC is a separate, service-level
// RPC called by agents on individual services.
rpc CreateSnapshot(CreateSnapshotRequest) returns (CreateSnapshotResponse);
rpc ListSnapshots(ListSnapshotsRequest) returns (ListSnapshotsResponse);

message CreateSnapshotRequest {
  string service_name = 1;
}

message CreateSnapshotResponse {
  string filename = 1;
  int64 size_bytes = 2;
}

message ListSnapshotsRequest {
  string service_name = 1;
}

message ListSnapshotsResponse {
  repeated SnapshotInfo snapshots = 1;
}

message SnapshotInfo {
  string service_name = 1;
  string node = 2;          // node the snapshot was taken from
  string filename = 3;
  int64 size_bytes = 4;
  string created_at = 5;    // RFC3339
}
```

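The validation rules noted in the `MigrateRequest` comments can be written down as a small check. The Python sketch below mirrors the message as a dataclass for illustration only. Two of the rules come from the proto comments (`all` with `service_name` is rejected, `source_node` is required when `all=true`); the other two (a target is always required, and a service name is required for single-service migration) are assumptions that seem implied by the CLI forms.

```python
from dataclasses import dataclass


@dataclass
class MigrateRequest:
    """Illustrative mirror of the protobuf message."""
    service_name: str = ""
    target_node: str = ""
    force: bool = False
    all: bool = False
    source_node: str = ""


def validate(req):
    """Return a list of problems; an empty list means the request is acceptable."""
    problems = []
    if not req.target_node:
        problems.append("target_node is required")  # assumption
    if req.all and req.service_name:
        problems.append("all=true and service_name are mutually exclusive")
    if req.all and not req.source_node:
        problems.append("source_node is required when all=true")
    if not req.all and not req.service_name:
        problems.append("service_name is required unless all=true")  # assumption
    return problems


print(validate(MigrateRequest(service_name="mcq", target_node="orion")))
# []
print(validate(MigrateRequest(service_name="mcq", target_node="orion", all=True)))
# ['all=true and service_name are mutually exclusive',
#  'source_node is required when all=true']
```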
### CLI

```
mcp migrate <service> --to <node>              # migrate single service
mcp migrate <service> --to <node> --force      # overwrite existing data
mcp migrate --all --from <node> --to <node>    # evacuate all services
mcp snapshot <service>                         # take an on-demand snapshot
mcp snapshot list <service>                    # list available snapshots
```

---

## Agent Changes for v2

### New RPCs

See Proto Definitions section above for full message definitions.

- `HealthCheck`: called by master on missed heartbeats.
- `SetupEdgeRoute`: called by master on edge nodes.
- `RemoveEdgeRoute`: called by master on edge nodes.
- `ListEdgeRoutes`: called by master on edge nodes.

All new RPCs require the caller to be `mcp-master` (authorization check).

### Cert Provisioning on All Agents

All agents need Metacrypt configuration:

```toml
[metacrypt]
server_url = "https://metacrypt.svc.mcp.metacircular.net:8443"
ca_cert = "/srv/mcp/certs/metacircular-ca.pem"
mount = "pki"
issuer = "infra"
token_path = "/srv/mcp/metacrypt-token"
```

Worker agents provision certs for internal hostnames. Edge agents
provision certs for public hostnames. Both use the same Metacrypt API
but with different identity-scoped policies.

### mc-proxy Management

The agent is the sole manager of mc-proxy routes via the gRPC admin API.
TOML config is not used for route management; this avoids the
database/config divergence problem from v1. mc-proxy's TOML config
only sets listener addresses and TLS defaults.

On mc-proxy restart, routes survive in mc-proxy's own SQLite database.
If mc-proxy's database is lost, the agent detects missing routes during
its monitoring cycle and re-registers them.

### Deploy Failure on Cert Error

If cert provisioning fails during deploy, the agent **must** fail the
deploy and must not register an mc-proxy route pointing to a
nonexistent cert. It returns an error to the master, which reports it
to the CLI. The current v1 behavior (log warning, continue) is a bug.

---

## CLI Changes for v2

The CLI gains a `[master]` section and retains `[[nodes]]` for direct
access:

```toml
[master]
address = "100.x.x.x:9555"

# Retained for --direct mode (bypass master when it's down).
[[nodes]]
name = "rift"
address = "100.95.252.120:9444"

[[nodes]]
name = "svc"
address = "100.x.x.x:9444"

[mcias]
server_url = "https://mcias.metacircular.net:8443"
service_name = "mcp"

[auth]
token_path = "/home/kyle/.config/mcp/token"

[services]
dir = "/home/kyle/.config/mcp/services"
```

By default, all commands go through the master. The `--direct` flag
bypasses the master and dials agents directly (v1 behavior):

```
mcp deploy mcq                    # → master
mcp deploy mcq --direct -n rift   # → agent on rift (v1 mode)
mcp ps                            # → master aggregates all agents
mcp ps --direct                   # → each agent individually (v1 mode)
```

`--direct` is the escape hatch when the master is down. In direct mode,
deploy requires an explicit `--node` flag (the CLI cannot auto-place
without the master).

### Sync Semantics

`mcp sync` is **declarative**: the service definitions on the operator's
workstation are the source of truth. The master converges the fleet:

- New definitions → deploy.
- Changed definitions → redeploy.
- Definitions present in the master's placement table but absent from
  the sync request → undeploy.

This makes the services directory a complete, auditable declaration of
what should be running. Use `mcp sync --dry-run` to preview what sync
would do without executing.

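The three convergence rules reduce to set arithmetic over service names. The Python sketch below shows one way a `--dry-run` plan could be computed. Detecting a "changed" definition by comparing a digest of the definition file is an assumption of this sketch; the design only states that changed definitions are redeployed.

```python
def plan_sync(desired, placed):
    """Compute sync actions without executing them.

    desired: dict of service name -> definition digest from the workstation.
    placed:  dict of service name -> digest recorded at the last deploy.
    Both shapes are hypothetical.
    """
    return {
        "deploy": sorted(desired.keys() - placed.keys()),
        "redeploy": sorted(
            name for name in desired.keys() & placed.keys()
            if desired[name] != placed[name]
        ),
        "undeploy": sorted(placed.keys() - desired.keys()),
    }


desired = {"mcq": "d1", "mcr": "d7", "mcdoc": "d3"}
placed = {"mcq": "d1", "mcr": "d6", "legacy": "d0"}
print(plan_sync(desired, placed))
# {'deploy': ['mcdoc'], 'redeploy': ['mcr'], 'undeploy': ['legacy']}
```

Note that the `undeploy` set is exactly why an incomplete services directory is dangerous: anything missing from the request is scheduled for removal.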
### Direct Mode Caveat

Services deployed via `--direct` (bypassing the master) are invisible
to the master because no placement record exists. Reconciliation detects
them as orphans. To bring a directly-deployed service under master
management, redeploy it through the master.

### New Commands

```
mcp edge list      # list all public edge routes
mcp edge status    # health of edge routes (cert expiry, backend reachable)
mcp node list      # fleet status from master
```

Service definition files remain on the operator's workstation. The CLI
pushes them to the master on `mcp deploy` and `mcp sync`.

---

## Agent Upgrades

The fleet is heterogeneous (NixOS + Debian, amd64 + arm64). NixOS flake
inputs don't work as a universal update mechanism.

MCP owns the binary at `/srv/mcp/mcp-agent` on all nodes.

```
mcp agent upgrade [node]   # cross-compile, SCP, restart via SSH
```

- CLI cross-compiles for the target's GOARCH.
- Copies via SCP to `/srv/mcp/mcp-agent.new`.
- Restarts via SSH. The restart command is OS-aware: `doas` on NixOS
  (rift, orion), `sudo` on Debian (svc). Configurable per node.
- Running containers survive the restart: rootless podman containers
  are independent of the agent process. `--restart unless-stopped` means
  podman handles liveness.
- The upgrade window (agent down for ~2s) only affects management
  operations. The master marks the agent as temporarily unhealthy until
  the next heartbeat.

All nodes: binary at `/srv/mcp/mcp-agent`, systemd unit
`mcp-agent.service`.

---

## Migration Plan

### Phase 1: Agent on svc

Deploy mcp-agent to svc (Debian):

- Create `mcp` user, install binary via SCP, configure systemd.
- Configure with Metacrypt access and mc-proxy gRPC socket access.
- Migrate existing mc-proxy TOML routes to agent-managed routes:
  export current routes from mc-proxy SQLite, import via agent
  `AddProxyRoute` RPCs.
- Verify with `mcp node list` (svc shows up).

### Phase 2: Edge routing RPCs

Implement `SetupEdgeRoute`, `RemoveEdgeRoute`, `ListEdgeRoutes` on the
agent. Test by calling directly from the CLI (temporary `mcp edge setup`
scaffolding command, removed after phase 3).

### Phase 3: Build mcp-master

Core coordination loop. Uses bootstrap `[[nodes]]` config for agent
addresses (dynamic registration comes in phase 4):

1. gRPC server with `McpMasterService`.
2. SQLite database for placements and edge routes.
3. Accept `Deploy` / `Undeploy` from CLI.
4. Place service on a node (tier / container-count).
5. Forward deploy to the correct agent.
6. Register DNS via MCNS.
7. Detect `public = true` routes, validate, call `SetupEdgeRoute`.
8. Return structured per-step results to CLI.

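Item 4 (placement by tier and container count) is the one piece of phase 3 with real decision logic, so a sketch may help. The Python below is a guess at a minimal version, not the planned implementation: the `Node` shape and role labels are hypothetical, core services are pinned to the master as the Constraints section describes, other tiers go to the healthy worker with the fewest containers, and ties are broken by node name purely to make the result deterministic.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical view of one row in the master's node registry."""
    name: str
    roles: frozenset   # e.g. {"master", "worker"} or {"edge"}
    healthy: bool
    containers: int


def place(tier, nodes):
    """Pick a node: core stays on the master, everything else by load."""
    wanted = "master" if tier == "core" else "worker"
    candidates = [n for n in nodes if wanted in n.roles and n.healthy]
    if not candidates:
        raise LookupError(f"no healthy {wanted} node for tier {tier!r}")
    return min(candidates, key=lambda n: (n.containers, n.name))


fleet = [
    Node("rift", frozenset({"master", "worker"}), True, 9),
    Node("orion", frozenset({"worker"}), True, 2),
    Node("svc", frozenset({"edge"}), True, 1),
]
print(place("worker", fleet).name)  # orion
print(place("core", fleet).name)    # rift
```

Edge nodes never appear among the candidates because they carry neither role, which matches the rule that edge nodes do not run application containers.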
### Phase 4: Agent registration and health

- Agents self-register on startup (identity-bound).
- Heartbeat loop (30s interval, resource data).
- Master probe on missed heartbeats (90s threshold, 5s timeout).
- Fleet status aggregation for `mcp ps` and `mcp node list`.
- Reconciliation on master startup.
- Master transitions from bootstrap `[[nodes]]` to dynamic registry.

### Phase 5: Snapshots and migration

- Implement `ExportServiceData` / `ImportServiceData` on agents.
- Implement `mcdsl/snapshot` standard gRPC service.
- Add snapshot scheduling to master (24h cycle, retention pruning).
- Implement `CreateSnapshot`, `ListSnapshots`, `Migrate` on master.
- Add `mcp snapshot`, `mcp snapshot list`, `mcp migrate` CLI commands.
- Test migration between rift and orion.

### Phase 6: Cut over

- Update CLI config to add `[master]` section.
- Update service definitions with `tier` and `public` fields.
- Deploy agent to orion.
- Verify all services via `mcp ps` and public endpoint tests.
- Keep `[[nodes]]` config and `--direct` flag as escape hatch.

---

## Hostname Convention for Public Services

Services with public routes have two hostnames:

| Hostname | Purpose | Example |
|----------|---------|---------|
| `<svc>.metacircular.net` | Public: browser access, SSO login | `mcq.metacircular.net` |
| `<svc>.svc.mcp.metacircular.net` | Internal: API clients, service-to-service | `mcq.svc.mcp.metacircular.net` |

**SSO always uses the public hostname.** The service's `[sso].redirect_uri`
and the MCIAS SSO client registration both point to the public hostname
(e.g., `https://mcq.metacircular.net/sso/callback`). SSO state cookies
are bound to the domain they are set on, so the entire browser-based
login flow must stay on a single hostname.

**API clients use the internal hostname.** Service-to-service calls,
CLI tools, and MCP server communication authenticate with bearer tokens
(not SSO) and use the internal `.svc.mcp.` hostname. These do not
involve browser cookies and are unaffected by the SSO hostname
constraint.

This means:
- Human users bookmark `mcq.metacircular.net`, not the `.svc.mcp.` URL.
- The web UI's SSO "Sign in" button always initiates the flow on the
  public hostname.
- API endpoints on both hostnames accept the same bearer tokens;
  the hostname distinction is a routing and cookie concern, not an
  auth concern.

---

## Superseded Documents

`docs/edge-routing-design.md` is superseded by this document. It used
agent-to-agent communication, a single shared cert, private key
transmission over gRPC, and an `edge` field instead of `public`. None
of these design choices carried forward to v2.

---

## Open Questions

1. **Master HA**: mcp-master is a single point of failure. For v2, this
   is acceptable because the operator can use `--direct` to bypass the
   master. Future work could add master replication.

2. **Auto-reconciliation**: The master detects drift but does not
   auto-remediate. Future work could add automatic redeploy on drift.

## v2 Scope

v2 targets amd64 nodes only: rift (master+worker), orion (worker),
svc (edge). All images are single-arch amd64.

## Fast-Follow: arm64 Support

Immediate follow-up after v2 to onboard Raspberry Pi workers
(hyperborea and others):

1. **MCR manifest list support**: Accept and serve OCI image indexes
   (`application/vnd.oci.image.index.v1+json`) so a single tag
   references both amd64 and arm64 variants.
2. **`mcp build` multi-arch**: Build `linux/amd64` + `linux/arm64`
   images and push manifest lists to MCR.
3. **Onboard RPi workers**: Deploy agents, add to registration
   allowlist. Placement remains arch-agnostic because podman pulls the
   correct variant automatically.

## What v2 Does NOT Include

These remain future work beyond the arm64 fast-follow:

- Auto-reconciliation (master-driven redeploy on drift)
- Zero-downtime live migration (v2 migration stops the service)
- Web UI for fleet management
- Observability / log aggregation
- Object store
- Multiple edge nodes with load-based assignment
- Master replication / HA
- Resource-aware bin-packing (requires resource declarations in service defs)