# MCP -- Metacircular Control Plane

## Overview

MCP is the orchestrator for the Metacircular platform. It manages container
lifecycle, tracks what services run where, and transfers files between the
operator's workstation and managed nodes.

MCP has two components:

- **The CLI** (`mcp`) is a thin client that runs on the operator's
  workstation. It reads local service definition files — the operator's
  declaration of what should be running — and pushes that intent to agents.
  It has no database and no daemon process.

- **The agent** (`mcp-agent`) is a smart per-node daemon. It receives
  desired state from the CLI, manages containers via the local runtime,
  stores the node's registry (desired state, observed state, deployed specs,
  events), monitors for drift, and alerts the operator. The agent owns the
  full loop: it knows what should be running, observes what is running, and
  can act on the difference.

The agent's container runtime interaction (podman/docker CLI) is an internal
subcomponent — the "dumb" part. The agent itself is the smart coordinator
that wraps it with state tracking, monitoring, and a gRPC API.

### v1 Scope

v1 targets a single-node deployment (one agent on rift, CLI on vade). The
core operations are:

- **Deploy** -- push service definitions to the agent; agent pulls images
  and starts (or restarts) containers.
- **Component-level deploy** -- deploy individual components within a
  service without disrupting others (e.g., update the web UI without
  restarting the API server).
- **Container lifecycle** -- stop, start, restart services.
- **Monitoring** -- agent continuously watches container state, records
  events, detects drift and flapping, alerts the operator.
- **Status** -- query live container state, view drift, review events.
- **File transfer** -- push or pull individual files between CLI and nodes
  (config files, certificates), scoped to service directories.
- **Sync** -- push service definitions to the agent to update desired state
  without deploying.

Explicitly **not in v1**: migration (snapshot/tar.zst transfer), automatic
scheduling/placement, certificate provisioning from Metacrypt, DNS updates
to MCNS, multi-node orchestration, auto-reconciliation (agent restarting
drifted containers without operator action).

---

## Architecture

```
Operator workstation (vade)
┌──────────────────────────────┐
│ mcp (CLI)                    │
│                              │
│ ~/.config/mcp/services/      │
│   metacrypt.toml             │
│   mcr.toml                   │
│   mc-proxy.toml              │
│                              │
│ gRPC client ────────────────┼──── overlay ────┐
└──────────────────────────────┘                 │
                                                 │
MC Node (rift)                                   │
┌────────────────────────────────────────────────┼──┐
│                                                │  │
│  ┌──────────────────────────────────────────┐  │  │
│  │ mcp-agent                                │◄─┘  │
│  │                                          │     │
│  │  ┌─────────────┐  ┌──────────────────┐   │     │
│  │  │ Registry    │  │ Monitor          │   │     │
│  │  │ (SQLite)    │  │ (watch loop,     │   │     │
│  │  │             │  │  events,         │   │     │
│  │  │ desired     │  │  alerting)       │   │     │
│  │  │ observed    │  │                  │   │     │
│  │  │ specs       │  │                  │   │     │
│  │  │ events      │  │                  │   │     │
│  │  └─────────────┘  └──────────────────┘   │     │
│  │                                          │     │
│  │  ┌──────────────────────────────────┐    │     │
│  │  │ Container runtime (podman)       │    │     │
│  │  │                                  │    │     │
│  │  │ ┌───────┐ ┌───────┐ ┌───────┐    │    │     │
│  │  │ │ svc α │ │ svc β │ │ svc γ │    │    │     │
│  │  │ └───────┘ └───────┘ └───────┘    │    │     │
│  │  └──────────────────────────────────┘    │     │
│  └──────────────────────────────────────────┘     │
│                                                   │
│  /srv/<service>/ (config, db, certs, backups)     │
└───────────────────────────────────────────────────┘
```

### Components

| Component | Binary | Where | Role |
|-----------|--------|-------|------|
| CLI | `mcp` | Operator workstation (vade) | Thin client. Reads service definitions, pushes intent to agents, queries status. |
| Agent | `mcp-agent` | Each managed node (rift) | Smart daemon. Manages containers, stores registry, monitors, alerts. |

### Communication

The CLI communicates with agents over gRPC with server-side TLS. The
transport is the encrypted overlay network (Tailscale/WireGuard). The CLI
authenticates by presenting an MCIAS bearer token in gRPC metadata. The
agent validates the token by calling MCIAS and checking for the `admin`
role.

Client certificates (mTLS) are not used. The overlay network restricts
network access to platform participants, MCIAS tokens are short-lived with
role enforcement, and the agent's TLS certificate is verified against the
Metacrypt CA. The scenarios where mTLS adds value (stolen token, MCIAS
compromise) already imply broader platform compromise. mTLS remains an
option for future security hardening.

---

## Authentication and Authorization

MCP follows the platform authentication model: all auth is delegated to
MCIAS.

### Agent Authentication

The agent is a gRPC server with a unary interceptor that enforces
authentication on every RPC:

1. CLI includes an MCIAS bearer token in the gRPC metadata
   (`authorization: Bearer <token>`).
2. Agent extracts the token and validates it against MCIAS (cached 30s by
   SHA-256 of the token, per platform convention).
3. Agent checks that the caller has the `admin` role. All MCP operations
   require admin -- there is no unprivileged MCP access.
4. If validation fails, the RPC returns `UNAUTHENTICATED` (invalid/expired
   token) or `PERMISSION_DENIED` (valid token, not admin).
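
Steps 1-4 can be sketched as a plain authorization function (a minimal
sketch: gRPC metadata is modeled as a map, the MCIAS call is a stubbed
callback, and `cacheKey` illustrates the SHA-256 cache-key convention; the
real agent would wrap this in a gRPC unary interceptor and map the sentinel
errors to status codes):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"strings"
)

var (
	errUnauthenticated  = errors.New("UNAUTHENTICATED")
	errPermissionDenied = errors.New("PERMISSION_DENIED")
)

// cacheKey derives the 30s validation-cache key: SHA-256 of the raw token.
func cacheKey(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

// authorize implements steps 1-4: extract the bearer token, validate it
// against MCIAS (stubbed as a callback returning the caller's roles), and
// require the admin role.
func authorize(md map[string][]string, validate func(token string) ([]string, error)) error {
	vals := md["authorization"]
	if len(vals) == 0 || !strings.HasPrefix(vals[0], "Bearer ") {
		return errUnauthenticated
	}
	token := strings.TrimPrefix(vals[0], "Bearer ")
	roles, err := validate(token) // real impl: MCIAS call, cached by cacheKey(token)
	if err != nil {
		return errUnauthenticated
	}
	for _, r := range roles {
		if r == "admin" {
			return nil // authorized
		}
	}
	return errPermissionDenied
}
```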

### CLI Authentication

The CLI authenticates to MCIAS before issuing commands. The token can be
obtained by:

1. `mcp login` -- interactive login, stores the token locally.
2. Environment variable (`MCP_TOKEN`) for scripted use.
3. System account credentials in the CLI config file.

The stored token is used for all subsequent agent RPCs until it expires.

---

## Services and Components

A **service** is a logical unit of the platform (e.g., "metacrypt"). A
service has one or more **components** -- the containers that make it up
(e.g., "api" and "web"). Components within a service:

- Share the same node.
- Share the same `/srv/<service>/` data directory.
- Are deployed together by default, but can be deployed independently.

This models the real constraint that components like an API server and its
web UI are co-located and share state, but have different operational
characteristics. For example, restarting Metacrypt's API server requires
unsealing the vault, but the web UI can be redeployed independently without
disrupting the API.

Services with a single component (e.g., mc-proxy) simply have one
`[[components]]` block.

The unique identity of a component is `node/service/component`.

### Container Naming Convention

Containers are named `<service>-<component>`:

- `metacrypt-api`, `metacrypt-web`
- `mcr-api`, `mcr-web`
- `mc-proxy` (single-component service)

This convention enables `mcp adopt <service>` to match all containers
for a service by prefix and derive component names automatically
(`metacrypt-api` → component `api`, `metacrypt-web` → component `web`).

---

## CLI

### Commands

```
mcp login                          Authenticate to MCIAS, store token

mcp deploy <service>               Deploy all components from service definition
mcp deploy <service>/<component>   Deploy a single component
mcp deploy <service> -f <file>     Deploy from explicit file
mcp stop <service>                 Stop all components, set active=false
mcp start <service>                Start all components, set active=true
mcp restart <service>              Restart all components

mcp list                           List services from all agents (registry, no runtime query)
mcp ps                             Live check: query runtime on all agents, show running
                                   containers with uptime and version
mcp status [service]               Full picture: live query + drift + recent events
mcp sync                           Push service definitions to agent (update desired
                                   state without deploying)

mcp adopt <service>                Adopt all <service>-* containers into a service

mcp service show <service>         Print current spec from agent registry
mcp service edit <service>         Open service definition in $EDITOR
mcp service export <service>       Write agent registry spec to local service file
mcp service export <service> -f <file>   Write to explicit path

mcp push <local-file> <service> [path]   Copy a local file into /srv/<service>/[path]
mcp pull <service> <path> [local-file]   Copy a file from /srv/<service>/<path> to local

mcp node list                      List registered nodes
mcp node add <name> <address>      Register a node
mcp node remove <name>             Deregister a node
```

### Service Definition Files

A service definition is a TOML file that declares the components for a
service. These files live in `~/.config/mcp/services/` by default, one
per service. They are the operator's declaration of intent -- what should
exist, with what spec, in what state.

Example: `~/.config/mcp/services/metacrypt.toml`

```toml
name = "metacrypt"
node = "rift"
active = true

[[components]]
name = "api"
image = "mcr.svc.mcp.metacircular.net:8443/metacrypt:latest"
network = "docker_default"
user = "0:0"
restart = "unless-stopped"
ports = ["127.0.0.1:18443:8443", "127.0.0.1:19443:9443"]
volumes = ["/srv/metacrypt:/srv/metacrypt"]

[[components]]
name = "web"
image = "mcr.svc.mcp.metacircular.net:8443/metacrypt-web:latest"
network = "docker_default"
user = "0:0"
restart = "unless-stopped"
ports = ["127.0.0.1:18080:8080"]
volumes = ["/srv/metacrypt:/srv/metacrypt"]
cmd = ["server", "--config", "/srv/metacrypt/metacrypt.toml"]
```

### Active State

The `active` field is the operator's desired state for the service:

- `active = true` → CLI tells agent: all components should be `running`.
- `active = false` → CLI tells agent: all components should be `stopped`.

Lifecycle commands update the service definition file:

- `mcp stop <service>` sets `active = false` in the local file and tells
  the agent to stop all components.
- `mcp start <service>` sets `active = true` and tells the agent to start.
- `mcp sync` pushes all service definitions — the agent stops anything
  marked inactive and keeps active services running.

The service definition file is always the source of truth. Lifecycle
commands modify it so the file stays in sync with the operator's intent.

### Deploy Resolution

`mcp deploy <service>` resolves the component spec through a precedence
chain:

1. **Service definition file** -- if `-f <file>` is specified, use that
   file. Otherwise look for `~/.config/mcp/services/<service>.toml`.
2. **Agent registry** (fallback) -- if no file exists, use the spec from
   the last successful deploy stored in the agent's registry.

If neither exists (first deploy, no file), the deploy fails with an error
telling the operator to create a service definition.

The CLI pushes the resolved spec to the agent. The agent records it in its
registry and executes the deploy. The service definition file on disk is
**not** modified -- it represents the operator's declared intent, not the
deployed state. To sync the file with reality, use `mcp service export`.

### Spec Lifecycle

```
              ┌─────────────┐
  write       │ Service     │   mcp deploy
 ───────────► │ definition  │ ──────────────┐
              │ (.toml)     │               │
              └─────────────┘               ▼
                     ▲             ┌─────────────────┐
                     │             │ Agent registry  │
  mcp service        │             │ (deployed       │
  export             │             │  spec)          │
                     │             └─────────────────┘
                     │                      │
                     └──────────────────────┘
```

- **Operator writes** the service definition file (or copies one from
  the service's repo).
- **`mcp deploy`** reads the file, pushes to the agent, agent records the
  spec in its registry and deploys.
- **`mcp service export`** reads the agent's registry and writes it back to
  the local file, incorporating any changes since the file was last edited.

`mcp service edit <service>` opens the service definition in `$EDITOR`
(falling back to `$VISUAL`, then `vi`). If no file exists yet, it exports
the current spec from the agent's registry first, so the operator starts
from the deployed state rather than a blank file. After the editor exits,
the file is saved to the standard path in the services directory.

### Where Definition Files Come From

Service definition files can be:

- **Written by hand** by the operator.
- **Copied from the service's repo** (a service could ship a
  `deploy/mcp-service.toml` as a starting point).
- **Generated by `mcp adopt` + `mcp service export`** -- adopt existing
  containers, then export to get a file matching the running config.
- **Generated by converting from mcdeploy.toml** during initial MCP
  migration (one-time).

---

## Agent

The agent is the smart per-node daemon. It owns the full lifecycle:
receives desired state, manages containers, stores the registry, monitors
for drift, and alerts the operator.

### gRPC Service Definition

The agent exposes a single gRPC service. All RPCs require admin
authentication. The agent is gRPC-only -- it is internal C2 infrastructure,
not a user-facing service, so the platform's REST+gRPC parity rule does not
apply.

```protobuf
syntax = "proto3";
package mcp.v1;

import "google/protobuf/timestamp.proto";

service McpAgent {
  // Service lifecycle
  rpc Deploy(DeployRequest) returns (DeployResponse);
  rpc StopService(ServiceRequest) returns (ServiceResponse);
  rpc StartService(ServiceRequest) returns (ServiceResponse);
  rpc RestartService(ServiceRequest) returns (ServiceResponse);

  // Desired state
  rpc SyncDesiredState(SyncRequest) returns (SyncResponse);

  // Status and registry
  rpc ListServices(ListServicesRequest) returns (ListServicesResponse);
  rpc GetServiceStatus(ServiceStatusRequest) returns (ServiceStatusResponse);
  rpc LiveCheck(LiveCheckRequest) returns (LiveCheckResponse);

  // Adopt
  rpc AdoptContainer(AdoptRequest) returns (AdoptResponse);

  // File transfer
  rpc PushFile(PushFileRequest) returns (PushFileResponse);
  rpc PullFile(PullFileRequest) returns (PullFileResponse);

  // Node
  rpc NodeStatus(NodeStatusRequest) returns (NodeStatusResponse);
}

// --- Service lifecycle ---

message ComponentSpec {
  string name = 1;
  string image = 2;
  string network = 3;
  string user = 4;
  string restart = 5;
  repeated string ports = 6;    // "host:container" mappings
  repeated string volumes = 7;  // "host:container" mount specs
  repeated string cmd = 8;      // command and arguments
}

message ServiceSpec {
  string name = 1;
  bool active = 2;
  repeated ComponentSpec components = 3;
}

message DeployRequest {
  ServiceSpec service = 1;
  string component = 2;  // deploy single component (empty = all)
}

message DeployResponse {
  repeated ComponentResult results = 1;
}

message ComponentResult {
  string name = 1;
  bool success = 2;
  string error = 3;
}

message ServiceRequest {
  string name = 1;
}

message ServiceResponse {
  repeated ComponentResult results = 1;
}

// --- Desired state ---

message SyncRequest {
  repeated ServiceSpec services = 1;  // all services for this node
}

message SyncResponse {
  repeated ServiceSyncResult results = 1;
}

message ServiceSyncResult {
  string name = 1;
  bool changed = 2;  // desired state was updated
  string summary = 3;
}

// --- Status and registry ---

message ListServicesRequest {}

message ServiceInfo {
  string name = 1;
  bool active = 2;
  repeated ComponentInfo components = 3;
}

message ComponentInfo {
  string name = 1;
  string image = 2;
  string desired_state = 3;   // "running", "stopped", "ignore"
  string observed_state = 4;  // "running", "stopped", "exited", "removed", "unknown"
  string version = 5;         // extracted from image tag
  google.protobuf.Timestamp started = 6;
}

message ListServicesResponse {
  repeated ServiceInfo services = 1;
}

message ServiceStatusRequest {
  string name = 1;  // empty = all services
}

message DriftInfo {
  string service = 1;
  string component = 2;
  string desired_state = 3;
  string observed_state = 4;
}

message EventInfo {
  string service = 1;
  string component = 2;
  string prev_state = 3;
  string new_state = 4;
  google.protobuf.Timestamp timestamp = 5;
}

message ServiceStatusResponse {
  repeated ServiceInfo services = 1;
  repeated DriftInfo drift = 2;
  repeated EventInfo recent_events = 3;
}

message LiveCheckRequest {}

message LiveCheckResponse {
  repeated ServiceInfo services = 1;  // with freshly observed state
}

// --- Adopt ---

message AdoptRequest {
  string service = 1;  // service name; matches <service>-* containers
}

message AdoptResult {
  string container = 1;  // runtime container name
  string component = 2;  // derived component name
  bool success = 3;
  string error = 4;
}

message AdoptResponse {
  repeated AdoptResult results = 1;
}

// --- File transfer ---
// All file paths are relative to /srv/<service>/ on the node.
// The agent resolves the full path and rejects traversal attempts.

message PushFileRequest {
  string service = 1;  // service name (-> /srv/<service>/)
  string path = 2;     // relative path within service dir
  bytes content = 3;
  uint32 mode = 4;     // file permissions (e.g. 0600)
}

message PushFileResponse {
  bool success = 1;
  string error = 2;
}

message PullFileRequest {
  string service = 1;  // service name (-> /srv/<service>/)
  string path = 2;     // relative path within service dir
}

message PullFileResponse {
  bytes content = 1;
  uint32 mode = 2;
  string error = 3;
}

// --- Node ---

message NodeStatusRequest {}

message NodeStatusResponse {
  string node_name = 1;
  string runtime = 2;  // "podman", "docker"
  string runtime_version = 3;
  uint32 service_count = 4;
  uint32 component_count = 5;
  uint64 disk_total_bytes = 6;
  uint64 disk_free_bytes = 7;
  uint64 memory_total_bytes = 8;
  uint64 memory_free_bytes = 9;
  double cpu_usage_percent = 10;
  google.protobuf.Timestamp uptime_since = 11;
}
```

### Container Runtime

The agent manages containers by executing the local container runtime CLI
(`podman`). The runtime is configured in the agent's config file. The agent
shells out to the CLI for simplicity and debuggability -- the operator can
always run the same commands manually.

The agent runs as a dedicated `mcp` system user. Podman runs rootless under
this user. All containers are owned by `mcp`. The NixOS configuration
provisions the `mcp` user with podman access.

#### Deploy Flow

When the agent receives a `Deploy` RPC:

1. Record the service spec in the registry (desired state, component specs).
2. For each component being deployed (all, or the one named in the request):
   a. Pull the image: `podman pull <image>`
   b. Stop and remove the existing container (if any):
      `podman stop <name>` and `podman rm <name>`
   c. Start the new container (named `<service>-<component>`):
      `podman run -d --name <service>-<component> [flags] <image> [cmd]`
   d. Verify the container is running: `podman inspect <name>`
   e. Update observed state in the registry.
3. Set desired state to `running` for deployed components.
4. Extract version from the image tag (e.g., `mcr.../metacrypt:v1.7.0`
   → `v1.7.0`) and record it in the registry.
5. Return success/failure per component.
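
Step 4 has a small subtlety worth pinning down: the registry host already
contains a colon (`:8443`), so the tag must be taken from the final path
segment, not from the first colon. A sketch of the extraction:

```go
package main

import "strings"

// versionFromImage extracts the tag from an image reference, e.g.
// "mcr.svc.mcp.metacircular.net:8443/metacrypt:v1.7.0" -> "v1.7.0".
// Only the final path segment is inspected, so the port colon in the
// registry host is never mistaken for the tag separator.
func versionFromImage(image string) string {
	last := image
	if i := strings.LastIndex(image, "/"); i >= 0 {
		last = image[i+1:]
	}
	if j := strings.LastIndex(last, ":"); j >= 0 {
		return last[j+1:]
	}
	return "" // no explicit tag
}
```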

The flags passed to `podman run` are derived from the `ComponentSpec`:

| Spec field | Runtime flag |
|------------|-------------|
| `network` | `--network <network>` |
| `user` | `--user <user>` |
| `restart` | `--restart <restart>` |
| `ports` | `-p <mapping>` (repeated) |
| `volumes` | `-v <mapping>` (repeated) |
| `cmd` | appended after the image name |
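
The table above translates mechanically into an argv builder (a sketch;
`runArgs` is an illustrative name, and the container name follows the
`<service>-<component>` convention):

```go
package main

// ComponentSpec mirrors the fields of the gRPC ComponentSpec message.
type ComponentSpec struct {
	Name    string
	Image   string
	Network string
	User    string
	Restart string
	Ports   []string
	Volumes []string
	Cmd     []string
}

// runArgs builds the `podman run` argv for one component of a service,
// per the flag table above. The argv is handed to an exec-style call,
// never to a shell.
func runArgs(service string, c ComponentSpec) []string {
	args := []string{"run", "-d", "--name", service + "-" + c.Name}
	if c.Network != "" {
		args = append(args, "--network", c.Network)
	}
	if c.User != "" {
		args = append(args, "--user", c.User)
	}
	if c.Restart != "" {
		args = append(args, "--restart", c.Restart)
	}
	for _, p := range c.Ports {
		args = append(args, "-p", p)
	}
	for _, v := range c.Volumes {
		args = append(args, "-v", v)
	}
	args = append(args, c.Image) // cmd, if any, follows the image
	return append(args, c.Cmd...)
}
```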

### File Transfer

The agent supports single-file push and pull, scoped to a specific
service's data directory. This is the mechanism for deploying config files
and certificates to nodes.

Every file operation specifies a **service name** and a **relative path**.
The agent resolves the full path as `/srv/<service>/<path>`. This scoping
ensures that a file operation for service A cannot write into service B's
directory.

**Push**: CLI sends the service name, relative path, file content, and
permissions. The agent resolves the path, validates it (no `..` traversal,
no symlinks escaping the service directory), creates intermediate
directories if needed, and writes the file atomically (write to temp file,
then rename).

**Pull**: CLI sends the service name and relative path. The agent resolves
the path, validates it, reads the file, and returns the content and
permissions.

```
# Push mcr.toml into /srv/mcr/mcr.toml
mcp push mcr.toml mcr

# Push a cert into /srv/mcr/certs/mcr.pem
mcp push cert.pem mcr certs/mcr.pem

# Pull a config file back
mcp pull mcr mcr.toml ./mcr.toml
```

When the relative path is omitted from `mcp push`, the basename of the
local file is used.

File size is bounded by gRPC message limits. For v1, the default 4MB gRPC
message size is sufficient -- config files and certificates are kilobytes.
If larger transfers are needed in the future, streaming RPCs or the v2
tar.zst archive transfer will handle them.

### Desired State vs. Observed State

The agent's registry tracks two separate pieces of information for each
component:

- **Desired state** -- what the operator wants: `running`, `stopped`, or
  `ignore`. Set by the CLI via deploy, stop, start, sync, or adopt.
- **Observed state** -- what the container runtime reports: `running`,
  `stopped`, `exited`, `removed`, or `unknown`.

These can diverge. A component with desired=`running` and observed=`exited`
has crashed. The agent flags this as **drift**. Components with
desired=`ignore` are tracked but never flagged as drifting.

| Desired | Observed | Status |
|---------|----------|--------|
| running | running | OK |
| running | stopped | **DRIFT** -- stopped unexpectedly |
| running | exited | **DRIFT** -- crashed |
| running | removed | **DRIFT** -- container gone |
| stopped | stopped | OK |
| stopped | removed | OK |
| stopped | running | **DRIFT** -- running when it shouldn't be |
| ignore | (any) | OK -- not managed |
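
The table reduces to a small predicate (a sketch; observed states not
listed in the table, such as `unknown`, are treated conservatively as
drift for desired=`running`):

```go
package main

// isDrift reports whether a desired/observed pair is flagged as drift,
// per the table above. desired "ignore" never drifts; desired "stopped"
// tolerates both "stopped" and "removed".
func isDrift(desired, observed string) bool {
	switch desired {
	case "ignore":
		return false
	case "running":
		return observed != "running"
	case "stopped":
		return observed != "stopped" && observed != "removed"
	}
	return false
}
```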

For v1, the agent reports drift but does not auto-reconcile. The operator
decides whether to `mcp start`, `mcp deploy`, or investigate.
Auto-reconciliation (agent restarting drifted containers without operator
action) is a v2 concern.

### Registry Reconciliation

The agent reconciles its registry against the container runtime on three
occasions: during the monitor loop (continuous), on `mcp ps` / `mcp status`
(on demand), and on `mcp sync` (when new desired state is pushed).

Reconciliation:

1. Agent queries the container runtime for all containers.
2. Compares the runtime's report against the registry:
   - **Component in registry, seen in runtime**: update observed state.
   - **Component in registry, not in runtime**: set observed state to
     `removed`.
   - **Container in runtime, not in registry**: add to registry with
     desired state `ignore`. These are containers the agent sees but
     MCP didn't deploy.
3. Record state-change events for any transitions.
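
The comparison step can be sketched with the registry and the runtime
snapshot modeled as maps from container name to state (illustrative types;
the real registry is the SQLite tables):

```go
package main

// transition is a state change to record as an event.
type transition struct {
	Name       string
	Prev, Next string
}

// reconcile merges a runtime snapshot into the registry. Known components
// get their observed state updated; components missing from the runtime
// become "removed"; containers unknown to the registry are added with
// desired state "ignore". It returns the transitions to log as events.
func reconcile(observed, desired, runtime map[string]string) []transition {
	var events []transition
	for name, state := range runtime {
		if _, known := desired[name]; !known {
			desired[name] = "ignore" // seen but not deployed by MCP
		}
		if prev := observed[name]; prev != state {
			events = append(events, transition{name, prev, state})
		}
		observed[name] = state
	}
	for name, prev := range observed {
		if _, ok := runtime[name]; !ok && prev != "removed" {
			events = append(events, transition{name, prev, "removed"})
			observed[name] = "removed"
		}
	}
	return events
}
```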

### Adopting Unmanaged Containers

On first sync, every container on rift will appear with desired state
`ignore` -- MCP didn't deploy them and doesn't know their intended service
grouping.

`mcp adopt <service>` claims unmanaged containers by prefix:

1. Find all containers matching `<service>-*` (plus `<service>` itself
   for single-component services).
2. Create the service in the registry if it doesn't exist.
3. Add each container as a component, stripping the service name prefix
   to derive the component name: `metacrypt-api` → `api`,
   `metacrypt-web` → `web`.
4. Set desired state to `running` (or `stopped` if the container is
   currently stopped).
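
The prefix matching and name derivation are mechanical (a sketch; reusing
the container name as the component name for a bare single-component match
is an assumption of this sketch, not stated by the design):

```go
package main

import "strings"

// componentForContainer maps a container name to its component name under
// the <service>-<component> convention. A bare match on the service name
// covers single-component services like mc-proxy; ok is false when the
// container does not belong to the service.
func componentForContainer(service, container string) (component string, ok bool) {
	if container == service {
		return service, true // single-component service
	}
	if strings.HasPrefix(container, service+"-") {
		return strings.TrimPrefix(container, service+"-"), true
	}
	return "", false
}
```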

This lets the operator bring existing containers under MCP management
without redeploying them. The typical bootstrap flow: `mcp sync` to
discover containers, `mcp adopt` to group them into services,
`mcp service export` to generate service definition files from the
adopted state.

### Monitoring

The agent runs a continuous monitor loop that watches container state and
alerts the operator when problems are detected. Monitoring is a core
function of the agent, not a separate process.

#### Event Log

Every state transition is recorded in the `events` table (see Database
Schema for the full DDL). Events accumulate over time and support rate
queries:

```sql
-- How many times has metacrypt-api exited in the last hour?
SELECT COUNT(*) FROM events
WHERE component = 'api' AND service = 'metacrypt'
  AND new_state = 'exited'
  AND timestamp > datetime('now', '-1 hour');
```

Old events are pruned at the start of each monitor iteration (default:
retain 30 days).

#### Monitor Loop

Each iteration of the monitor loop:

1. Query the container runtime for all container states.
2. Reconcile against the registry (update observed states).
3. For each state transition since the last iteration, insert an event.
4. Evaluate alert conditions against the current state and event history.
5. If an alert fires, execute the configured alert command.
6. Sleep for the configured interval.

#### Alert Conditions

The monitor evaluates two types of alert:

- **Drift alert**: a managed component's observed state does not match its
  desired state. Fires on the transition, not on every iteration.
- **Flap alert**: a component has changed state more than N times within a
  window. Default threshold: 3 transitions in 10 minutes.

Each alert has a **cooldown** per component. Once an alert fires for a
component, it is suppressed for the cooldown period regardless of further
transitions. This prevents notification spam from a flapping service.

```toml
[monitor]
interval = "60s"
alert_command = []   # argv to exec on alert; empty = log only
cooldown = "15m"     # suppress repeat alerts per component
flap_threshold = 3   # state changes within flap_window = flapping
flap_window = "10m"
retention = "30d"    # event log retention
```

#### Alert Command

When an alert fires, the agent executes the configured command using
exec-style invocation (no shell). The command is an argv array; context
is passed via environment variables on the child process:

| Variable | Value |
|----------|-------|
| `MCP_COMPONENT` | Component name |
| `MCP_SERVICE` | Parent service name |
| `MCP_NODE` | Node name |
| `MCP_DESIRED` | Desired state |
| `MCP_OBSERVED` | Observed state |
| `MCP_PREV_STATE` | Previous observed state |
| `MCP_ALERT_TYPE` | `drift` or `flapping` |
| `MCP_TRANSITIONS` | Number of transitions in the flap window (for flap alerts) |

The alert command is the operator's choice. MCP does not ship with or
depend on any notification system.

```toml
# Push notification
alert_command = ["/usr/local/bin/ntfy", "publish", "mcp-alerts"]

# Custom script (reads MCP_* env vars)
alert_command = ["/usr/local/bin/mcp-notify"]

# Syslog
alert_command = ["/usr/bin/logger", "-t", "mcp"]
```

The command receives all context via environment variables. No shell
expansion occurs, eliminating command injection via crafted container
names or other metadata.
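
The exec-style invocation might look like this (a sketch; `alertContext`
and the helper names are illustrative):

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// alertContext carries the fields exported as MCP_* variables.
type alertContext struct {
	Service, Component, Node string
	Desired, Observed, Prev  string
	AlertType                string
	Transitions              int
}

// alertEnv renders the MCP_* variables appended to the child environment.
func alertEnv(a alertContext) []string {
	return []string{
		"MCP_SERVICE=" + a.Service,
		"MCP_COMPONENT=" + a.Component,
		"MCP_NODE=" + a.Node,
		"MCP_DESIRED=" + a.Desired,
		"MCP_OBSERVED=" + a.Observed,
		"MCP_PREV_STATE=" + a.Prev,
		"MCP_ALERT_TYPE=" + a.AlertType,
		fmt.Sprintf("MCP_TRANSITIONS=%d", a.Transitions),
	}
}

// runAlert execs the configured argv directly (no shell, so container
// names can never be interpreted as shell syntax). Empty argv = log only.
func runAlert(argv []string, a alertContext) error {
	if len(argv) == 0 {
		return nil // log-only configuration
	}
	cmd := exec.Command(argv[0], argv[1:]...)
	cmd.Env = append(os.Environ(), alertEnv(a)...)
	return cmd.Run()
}
```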
|
||
|
||
---
|
||
|
||
## Database Schema
|
||
|
||
The agent's SQLite database stores the node-local registry. Each agent
|
||
has its own database. Component identity is scoped to the node -- there
|
||
are no cross-node name collisions because each node has a separate
|
||
database.
|
||
|
||
```sql
CREATE TABLE services (
    name TEXT PRIMARY KEY,
    active INTEGER NOT NULL DEFAULT 1,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE TABLE components (
    name TEXT NOT NULL,
    service TEXT NOT NULL REFERENCES services(name) ON DELETE CASCADE,
    image TEXT NOT NULL,
    network TEXT NOT NULL DEFAULT 'bridge',
    user_spec TEXT NOT NULL DEFAULT '',
    restart TEXT NOT NULL DEFAULT 'unless-stopped',
    desired_state TEXT NOT NULL DEFAULT 'running',
    observed_state TEXT NOT NULL DEFAULT 'unknown',
    version TEXT NOT NULL DEFAULT '',
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now')),
    PRIMARY KEY (service, name)
);

CREATE TABLE component_ports (
    service TEXT NOT NULL,
    component TEXT NOT NULL,
    mapping TEXT NOT NULL,
    PRIMARY KEY (service, component, mapping),
    FOREIGN KEY (service, component) REFERENCES components(service, name) ON DELETE CASCADE
);

CREATE TABLE component_volumes (
    service TEXT NOT NULL,
    component TEXT NOT NULL,
    mapping TEXT NOT NULL,
    PRIMARY KEY (service, component, mapping),
    FOREIGN KEY (service, component) REFERENCES components(service, name) ON DELETE CASCADE
);

CREATE TABLE component_cmd (
    service TEXT NOT NULL,
    component TEXT NOT NULL,
    position INTEGER NOT NULL,
    arg TEXT NOT NULL,
    PRIMARY KEY (service, component, position),
    FOREIGN KEY (service, component) REFERENCES components(service, name) ON DELETE CASCADE
);

CREATE TABLE events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    service TEXT NOT NULL,
    component TEXT NOT NULL,
    prev_state TEXT NOT NULL,
    new_state TEXT NOT NULL,
    timestamp TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE INDEX idx_events_component_time ON events(service, component, timestamp);
```

### State Values

**Desired state** (set by operator actions via CLI):

| State | Meaning |
|-------|---------|
| `running` | Operator wants this component running |
| `stopped` | Operator deliberately stopped this component |
| `ignore` | Unmanaged -- MCP sees it but is not responsible for it |

**Observed state** (set by container runtime queries):

| State | Meaning |
|-------|---------|
| `running` | Container is running |
| `stopped` | Container exists but is not running |
| `exited` | Container exited (crashed or completed) |
| `removed` | Container no longer exists |
| `unknown` | State has not been queried yet |

---

## Configuration

### CLI Config

```toml
[services]
dir = "/home/kyle/.config/mcp/services"

[mcias]
server_url = "https://mcias.metacircular.net:8443"
ca_cert = ""
service_name = "mcp"

[auth]
token_path = "/home/kyle/.config/mcp/token"
# Optional: for unattended operation (scripts, cron)
# username = "mcp-operator"
# password_file = "/home/kyle/.config/mcp/credentials"

[[nodes]]
name = "rift"
address = "100.95.252.120:9444"
```

`mcp node add/remove` edits the `[[nodes]]` section. `mcp node list`
reads it. The CLI routes commands to agents based on the node addresses
here and the `node` field in service definition files.

Directory layout on the operator's workstation:

```
~/.config/mcp/
├── mcp.toml        CLI config
├── token           Cached MCIAS bearer token (0600)
└── services/       Service definition files
    ├── metacrypt.toml
    ├── mcr.toml
    ├── mc-proxy.toml
    └── ...
```

The CLI has no database. Service definition files are the operator's source
of truth for desired state. The agent's registry is the operational truth.

### Agent Config

```toml
[server]
grpc_addr = "100.95.252.120:9444"  # bind to overlay interface only
tls_cert = "/srv/mcp/certs/cert.pem"
tls_key = "/srv/mcp/certs/key.pem"

[database]
path = "/srv/mcp/mcp.db"

[mcias]
server_url = "https://mcias.metacircular.net:8443"
ca_cert = ""
service_name = "mcp-agent"

[agent]
node_name = "rift"
container_runtime = "podman"

[monitor]
interval = "60s"
alert_command = []
cooldown = "15m"
flap_threshold = 3
flap_window = "10m"
retention = "30d"

[log]
level = "info"
```

The agent binds to the overlay network interface, not to all interfaces.
It does **not** sit behind MC-Proxy -- MCP manages MC-Proxy's lifecycle,
so a circular dependency would make the agent unreachable when MC-Proxy
is down. Like MC-Proxy itself, the agent is infrastructure that must be
directly reachable on the overlay.

The agent's data directory follows the platform convention:

```
/srv/mcp/
├── mcp-agent.toml    Agent config
├── mcp.db            Registry database
├── certs/
│   ├── cert.pem      Agent TLS certificate
│   └── key.pem       Agent TLS key
└── backups/          Database snapshots
```

---

## Deployment

### Agent Deployment (on nodes)

The agent is deployed like any other Metacircular service:

1. Provision the `mcp` system user via NixOS config (with podman access
   and subuid/subgid ranges for rootless containers).
2. Set `/srv/` ownership to the `mcp` user (the agent creates and manages
   `/srv/<service>/` directories for all services).
3. Create `/srv/mcp/` directory and config file.
4. Provision TLS certificate from Metacrypt.
5. Create an MCIAS system account for the agent (`mcp-agent`).
6. Install the `mcp-agent` binary.
7. Start via systemd unit.

The agent runs as a systemd service. Container-first deployment is a v2
concern -- MCP needs to be running before it can manage its own agent.

```ini
[Unit]
Description=MCP Agent
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
ExecStart=/usr/local/bin/mcp-agent server --config /srv/mcp/mcp-agent.toml
Restart=on-failure
RestartSec=5

User=mcp
Group=mcp

NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
PrivateDevices=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictSUIDSGID=true
RestrictNamespaces=true
LockPersonality=true
MemoryDenyWriteExecute=true
RestrictRealtime=true
ReadWritePaths=/srv

[Install]
WantedBy=multi-user.target
```

Note: `ReadWritePaths=/srv` (not `/srv/mcp`) because the agent writes
files to any service's `/srv/<service>/` directory on behalf of the CLI.

### CLI Installation (on operator workstation)

The CLI is a standalone binary with no daemon.

1. Install the `mcp` binary to `~/.local/bin/` or `/usr/local/bin/`.
2. Create `~/.config/mcp/mcp.toml`.
3. Create `~/.config/mcp/services/` directory.
4. Run `mcp login` to authenticate.
5. Run `mcp sync` to push service definitions and discover existing
   containers.

### MCP Bootstrap (first time)

When bringing MCP up on a node that already has running containers:

1. Deploy the agent (steps above).
2. `mcp sync` with no service definition files -- the agent discovers all
   running containers and adds them to its registry with desired state
   `ignore`.
3. `mcp adopt <service>` for each service -- groups matching containers
   into the service and sets desired state to `running`.
4. `mcp service export <service>` for each service -- generates service
   definition files from the adopted state.
5. Review and edit the generated files as needed.

From this point, the service definition files are the source of truth and
`mcp deploy` manages the containers.

Existing containers on rift currently run under kyle's podman instance.
As part of MCP bootstrap, they will need to be re-created under the `mcp`
user's rootless podman. This is a one-time migration. Containers should
also be renamed to follow the `<service>-<component>` convention (e.g.,
`metacrypt` → `metacrypt-api`) before adoption.

#### Rootless Podman and UID Mapping

The `mcp` user's subuid/subgid ranges (configured via NixOS) determine
how container UIDs map to host UIDs. With `user = "0:0"` inside the
container, the effective host UID depends on the mapping. Files in
`/srv/<service>/` must be accessible to the mapped UIDs. The NixOS
configuration should provision appropriate subuid/subgid ranges when
creating the `mcp` user.

---

## Security Model

### Threat Mitigations

| Threat | Mitigation |
|--------|------------|
| Unauthorized C2 commands | Agent requires admin MCIAS token on every RPC |
| Token theft | Tokens have short expiry; cached validation keyed by SHA-256 |
| Agent impersonation | CLI verifies agent TLS certificate against Metacrypt CA |
| Arbitrary file write via push | Agent restricts writes to `/srv/<service>/` for the named service |
| Arbitrary file read via pull | Agent restricts reads to `/srv/<service>/` for the named service |
| Cross-service file access | File ops require a service name; agent resolves to that service's directory only |
| Container runtime escape | Rootless podman under `mcp` user; containers follow platform hardening |
| Network eavesdropping | All C2 traffic is gRPC over TLS over encrypted overlay |
| Agent exposure on LAN | Agent binds to overlay interface only, not all interfaces |
| Alert command injection | Alert command is exec'd as argv array, no shell interpretation |
| Unaudited operations | Every RPC is logged at info level with method, caller identity, and timestamp |

### Security Invariants

1. Every agent RPC requires a valid MCIAS admin token. No anonymous or
   unprivileged access.
2. Every RPC is audit-logged at `info` level via the auth interceptor:
   method name, caller identity (from MCIAS token), timestamp. Uses
   `log/slog` per platform convention.
3. File operations are scoped to `/srv/<service>/` for the named service.
   Path traversal attempts (`../`, symlinks outside the service directory)
   are rejected.
4. The agent never executes arbitrary commands. It only runs container
   runtime operations and file I/O through well-defined code paths.
   Alert commands are exec'd as argv arrays with no shell interpretation.
5. TLS 1.3 minimum on the agent's gRPC listener. The agent binds to the
   overlay interface only.
6. The CLI's stored token is file-permission protected (0600).
7. The agent runs as a dedicated `mcp` user with rootless podman. `/srv/`
   is owned by the `mcp` user. No root access required.

---

## Project Structure

```
mcp/
├── cmd/
│   ├── mcp/                  CLI
│   │   ├── main.go
│   │   ├── login.go
│   │   ├── deploy.go
│   │   ├── lifecycle.go      stop, start, restart
│   │   ├── status.go         list, ps, status
│   │   ├── sync.go           sync desired state
│   │   ├── adopt.go          adopt unmanaged containers
│   │   ├── service.go        service show/edit/export
│   │   ├── transfer.go       push, pull
│   │   └── node.go           node add/list/remove
│   └── mcp-agent/            Agent daemon
│       ├── main.go
│       └── snapshot.go       Database backup command
├── internal/
│   ├── agent/                Agent core
│   │   ├── agent.go          Agent struct, setup, gRPC server
│   │   ├── deploy.go         Deploy flow
│   │   ├── lifecycle.go      Stop, start, restart
│   │   ├── files.go          File push/pull with path validation
│   │   ├── sync.go           Desired state sync, reconciliation
│   │   ├── adopt.go          Container adoption
│   │   └── status.go         Status queries
│   ├── runtime/              Container runtime abstraction
│   │   ├── runtime.go        Interface
│   │   └── podman.go         Podman implementation
│   ├── registry/             Node-local registry
│   │   ├── db.go             Schema, migrations
│   │   ├── services.go       Service CRUD
│   │   ├── components.go     Component CRUD
│   │   └── events.go         Event log
│   ├── monitor/              Monitoring subsystem
│   │   ├── monitor.go        Watch loop
│   │   └── alerting.go       Alert evaluation and command execution
│   ├── servicedef/           Service definition file parsing
│   │   └── servicedef.go     Load, parse, write TOML service defs
│   ├── auth/                 MCIAS integration
│   │   └── auth.go           Token validation, interceptor
│   └── config/               Configuration loading
│       ├── cli.go
│       └── agent.go
├── proto/mcp/
│   └── v1/
│       └── mcp.proto
├── gen/mcp/
│   └── v1/                   Generated Go code
├── deploy/
│   ├── systemd/
│   │   ├── mcp-agent.service
│   │   ├── mcp-agent-backup.service
│   │   └── mcp-agent-backup.timer
│   ├── examples/
│   │   ├── mcp.toml          CLI config example
│   │   └── mcp-agent.toml    Agent config example
│   └── scripts/
│       └── install-agent.sh
├── Makefile
├── buf.yaml
├── .golangci.yaml
├── CLAUDE.md
└── ARCHITECTURE.md
```

---

## Future Work (v2+)

These are explicitly out of scope for v1 but inform the design:

- **Auto-reconciliation**: the agent detects drift but does not act on it
  in v1. v2 adds configurable auto-restart for drifted components (with
  backoff to avoid restart storms). This is the path to fully declarative
  operation -- the agent continuously reconciles toward desired state.
- **Migration**: snapshot `/srv/<service>/` as tar.zst (with VACUUM INTO
  for clean DB copies), stream to destination node, restore. Requires
  streaming gRPC and archive assembly logic.
- **Scheduling**: automatic node selection based on resource availability
  and operator constraints. The agent already reports disk, memory, and CPU
  in `NodeStatus` to support this.
- **Certificate provisioning**: MCP provisions TLS certs from Metacrypt
  during deploy via the ACME client library.
- **DNS updates**: MCP pushes record updates to MCNS after deploy/migrate.
  Requires MCNS to have an API (or, as a stopgap, zone file editing).
- **Multi-node orchestration**: deploy across multiple nodes, rolling
  updates, health-aware placement.
- **Web UI**: a web interface for registry browsing and operations. Would
  be a separate binary communicating with agents via gRPC, following the
  platform's web UI pattern.
|