Add purge design to architecture doc

Purge removes stale registry entries — components that are no longer
in service definitions and have no running container. Designed as an
explicit, safe operation separate from sync: sync is additive (push
desired state), purge is subtractive (remove forgotten entries).

Includes safety rules (refuses to purge running containers), dry-run
mode, agent RPC definition, and rationale for why sync should not be
made destructive.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 22:22:27 -07:00
parent ea8a42a696
commit 1afbf5e1f6


@@ -207,6 +207,7 @@ mcp sync Push service definitions to agent (update
state without deploying)
mcp adopt <service> Adopt all <service>-* containers into a service
mcp purge [service[/component]] Remove stale registry entries (--dry-run to preview)
mcp service show <service> Print current spec from agent registry
mcp service edit <service> Open service definition in $EDITOR
@@ -1195,6 +1196,147 @@ mcp/
---
## Registry Cleanup: Purge
### Problem
The agent's registry accumulates stale entries over time. A component
that was replaced (e.g., `mcns/coredns` → `mcns/mcns`) or a service
that was decommissioned remains in the registry indefinitely with
`observed=removed` or `observed=unknown`. There is no mechanism to tell
the agent "this component no longer exists and should not be tracked."
This causes:
- Perpetual drift alerts for components that will never return.
- Noise in `mcp status` and `mcp list` output.
- Confusion about what the agent is actually responsible for.

The existing `mcp sync` compares local service definitions against the
agent's registry and updates desired state for components that are
defined. But it does not remove components or services that are *absent*
from the local definitions — sync is additive, not declarative.
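
A minimal Go sketch of this additive behavior, assuming the registry can be modeled as a map from `service/component` to desired state (`Registry`, `additiveSync`, and `sortedKeys` are hypothetical names, not the agent's code). Replacing `mcns/coredns` with `mcns/mcns` in the local definition leaves the old key behind:

```go
package main

import (
	"fmt"
	"sort"
)

// Registry is a hypothetical stand-in for the agent's registry:
// "service/component" mapped to desired state.
type Registry map[string]string

// additiveSync upserts desired state for each defined component.
// It never deletes, so keys missing from defs stay in the registry.
func additiveSync(reg Registry, defs map[string]string) {
	for key, desired := range defs {
		reg[key] = desired
	}
}

// sortedKeys returns the registry keys in a stable order for display.
func sortedKeys(reg Registry) []string {
	keys := make([]string, 0, len(reg))
	for k := range reg {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	reg := Registry{"mcns/coredns": "running"}
	// The local definition for mcns now names mcns/mcns instead of mcns/coredns.
	additiveSync(reg, map[string]string{"mcns/mcns": "running"})
	fmt.Println(sortedKeys(reg)) // [mcns/coredns mcns/mcns]
}
```
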
### Design: `mcp purge`
Purge removes registry entries that are both **unwanted** (not in any
current service definition) and **gone** (no corresponding container in
the runtime). It is the garbage collector for the registry.
```
mcp purge [--dry-run]                        Purge all stale entries
mcp purge <service> [--dry-run]              Purge stale entries for one service
mcp purge <service>/<component> [--dry-run]  Purge a specific component
```
#### Semantics
Purge operates on the agent's registry, not on containers. It never
stops or removes running containers. The rules:
1. **Component purge**: a component is eligible for purge when:
- Its observed state is `removed`, `unknown`, or `exited`, AND
- It is not present in any current service definition file
     (i.e., `mcp sync` would not recreate it).

   Purging a component deletes its registry entry (from `components`,
`component_ports`, `component_volumes`, `component_cmd`) and its
event history.
2. **Service purge**: a service is eligible for purge when all of its
components have been purged (or it has no components). Purging a
service deletes its `services` row.
3. **Safety**: purge refuses to remove a component whose observed state
is `running` or `stopped` (i.e., a container still exists in the
runtime). This prevents accidentally losing track of live containers.
The operator must `mcp stop` and wait for the container to be removed
before purging, or manually remove it via podman.
4. **Dry run**: `--dry-run` lists what would be purged without modifying
   the registry. This is the safe way to preview the operation before
   committing to it.
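
The rules above can be sketched in Go as follows. This is an illustration under stated assumptions, not the agent's implementation: `Entry`, `Result`, `eligible`, and `purge` are hypothetical; the registry is modeled in memory; deletion is abstracted as a `deleteFn` callback standing in for removing the `components`, `component_ports`, `component_volumes`, `component_cmd`, and event rows; and any observed state not named in rules 1 and 3 is treated conservatively as ineligible.

```go
package main

import (
	"fmt"
	"sort"
)

// Entry is a hypothetical in-memory view of one component row.
type Entry struct {
	Service, Component, Observed string
}

// Result mirrors the fields of the PurgeResult message.
type Result struct {
	Service, Component string
	Purged             bool // true if removed, or would be in dry-run
	Reason             string
}

// eligible applies rules 1 and 3. defined holds "service/component" keys
// taken from the current service definitions.
func eligible(e Entry, defined map[string]bool) (bool, string) {
	switch e.Observed {
	case "running", "stopped":
		// Rule 3: a container still exists in the runtime.
		return false, "refused: container still exists (observed=" + e.Observed + ")"
	case "removed", "unknown", "exited":
		// Candidate under rule 1; still needs the definition check below.
	default:
		return false, "refused: unrecognized observed state " + e.Observed
	}
	if defined[e.Service+"/"+e.Component] {
		return false, "refused: still in service definitions"
	}
	return true, "observed=" + e.Observed + ", not in service definitions"
}

// purge evaluates each component, calls deleteFn for eligible ones unless
// dryRun is set (rule 4), and returns the services left with no
// components (rule 2). Deleting service rows is left to the caller.
func purge(services []string, comps []Entry, defined map[string]bool,
	dryRun bool, deleteFn func(Entry)) ([]Result, []string) {
	remaining := make(map[string]int, len(services))
	for _, s := range services {
		remaining[s] = 0
	}
	var results []Result
	for _, c := range comps {
		ok, reason := eligible(c, defined)
		results = append(results, Result{c.Service, c.Component, ok, reason})
		switch {
		case !ok:
			remaining[c.Service]++
		case !dryRun:
			deleteFn(c)
		}
	}
	var emptySvcs []string
	for s, n := range remaining {
		if n == 0 {
			emptySvcs = append(emptySvcs, s)
		}
	}
	sort.Strings(emptySvcs)
	return results, emptySvcs
}

func main() {
	services := []string{"mcns", "mcr"}
	comps := []Entry{
		{"mcns", "coredns", "removed"},
		{"mcns", "mcns", "running"},
		{"mcr", "api", "running"},
	}
	defined := map[string]bool{"mcns/mcns": true, "mcr/api": true}
	results, empty := purge(services, comps, defined, true, func(Entry) {})
	for _, r := range results {
		fmt.Printf("%s/%s purged=%v (%s)\n", r.Service, r.Component, r.Purged, r.Reason)
	}
	fmt.Println("services with no components left:", empty)
}
```

Checking the safety rule before the definition lookup means a component with a live container is always refused, whatever the definitions say.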
#### Interaction with Sync
`mcp sync` pushes desired state from service definitions. `mcp purge`
removes entries that sync would never touch. They are complementary:
- `sync` answers: "what should exist?" (additive)
- `purge` answers: "what should be forgotten?" (subtractive)

A full cleanup is: `mcp sync && mcp purge`.

An alternative design would make `mcp sync` itself remove entries not
present in service definitions (fully declarative sync). This was
rejected because:
- Sync currently only operates on services that have local definition
files. A service without a local file is left untouched — this is
desirable when multiple operators or workstations manage different
services.
- Making sync destructive increases the blast radius of a missing file
(accidentally deleting the local `mcr.toml` would cause sync to
purge MCR from the registry).
- Purge as a separate, explicit command with `--dry-run` gives the
operator clear control over what gets cleaned up.
#### Agent RPC
```protobuf
// Added to the agent's existing gRPC service definition.
rpc PurgeComponent(PurgeRequest) returns (PurgeResponse);

message PurgeRequest {
  string service = 1;    // service name (empty = all services)
  string component = 2;  // component name (empty = all eligible in service)
  bool dry_run = 3;      // preview only, do not modify registry
  // Hypothetical field name: "service/component" names from the CLI's
  // current service definitions (see below).
  repeated string defined_components = 4;
}

message PurgeResponse {
  repeated PurgeResult results = 1;
}

message PurgeResult {
  string service = 1;
  string component = 2;
  bool purged = 3;       // true if removed (or would be, in dry-run)
  string reason = 4;     // why eligible, or why refused
}
```
The CLI sends the set of currently-defined service/component names
alongside the purge request so the agent can determine what is "not in
any current service definition" without needing access to the CLI's
filesystem.
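
A sketch of how the CLI might flatten its local definitions into `service/component` keys before sending them, and how the agent might turn the received list into a lookup set. `ServiceDef`, `definedNames`, and `toSet` are hypothetical; parsing the TOML definition files (e.g. `mcr.toml`) is assumed to have happened already, and the exact wire field carrying the list is not specified here.

```go
package main

import (
	"fmt"
	"sort"
)

// ServiceDef is a hypothetical, already-parsed local service definition;
// only the names matter for purge.
type ServiceDef struct {
	Name       string
	Components []string
}

// definedNames flattens local definitions into sorted, de-duplicated
// "service/component" keys for the CLI to send with the purge request.
func definedNames(defs []ServiceDef) []string {
	seen := make(map[string]bool)
	var out []string
	for _, d := range defs {
		for _, c := range d.Components {
			key := d.Name + "/" + c
			if !seen[key] {
				seen[key] = true
				out = append(out, key)
			}
		}
	}
	sort.Strings(out)
	return out
}

// toSet is what the agent might build on receipt for constant-time lookups.
func toSet(names []string) map[string]bool {
	set := make(map[string]bool, len(names))
	for _, n := range names {
		set[n] = true
	}
	return set
}

func main() {
	defs := []ServiceDef{
		{Name: "mcr", Components: []string{"web", "api"}},
		{Name: "mcns", Components: []string{"mcns"}},
	}
	names := definedNames(defs)
	fmt.Println(names)
	set := toSet(names)
	fmt.Println(set["mcns/coredns"], set["mcns/mcns"])
}
```

Using the same `service/component` form as the CLI argument lets the agent compare keys directly against registry rows.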
#### Example
After replacing `mcns/coredns` with `mcns/mcns`:
```
$ mcp purge --dry-run
would purge mcns/coredns (observed=removed, not in service definitions)
$ mcp purge
purged mcns/coredns
$ mcp status
SERVICE    COMPONENT  DESIRED  OBSERVED  VERSION
mc-proxy   mc-proxy   running  running   latest
mcns       mcns       running  running   v1.0.0
mcr        api        running  running   latest
mcr        web        running  running   latest
metacrypt  api        running  running   latest
metacrypt  web        running  running   latest
```
#### Interaction with Adopt
Purge also cleans up after the `mcp adopt` workflow. When containers are
adopted and later removed (replaced by a proper deploy), the adopted
entries linger. Purge removes them once the containers are gone and the
service definition no longer references them.

---
## Future Work (v2+)
These are explicitly out of scope for v1 but inform the design: