Add purge design to architecture doc
Purge removes stale registry entries — components that are no longer in service definitions and have no running container. Designed as an explicit, safe operation separate from sync: sync is additive (push desired state), purge is subtractive (remove forgotten entries). Includes safety rules (refuses to purge running containers), dry-run mode, agent RPC definition, and rationale for why sync should not be made destructive. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ARCHITECTURE.md (+142 lines)
@@ -207,6 +207,7 @@ mcp sync Push service definitions to agent (update
                                   state without deploying)

mcp adopt <service>                Adopt all <service>-* containers into a service
mcp purge [service[/component]]    Remove stale registry entries (--dry-run to preview)

mcp service show <service>         Print current spec from agent registry
mcp service edit <service>         Open service definition in $EDITOR
@@ -1195,6 +1196,147 @@ mcp/

---

## Registry Cleanup: Purge

### Problem

The agent's registry accumulates stale entries over time. A component
that was replaced (e.g., `mcns/coredns` → `mcns/mcns`) or a service
that was decommissioned remains in the registry indefinitely with
`observed=removed` or `observed=unknown`. There is no mechanism to tell
the agent "this component no longer exists and should not be tracked."

This causes:

- Perpetual drift alerts for components that will never return.
- Noise in `mcp status` and `mcp list` output.
- Confusion about what the agent is actually responsible for.

The existing `mcp sync` compares local service definitions against the
agent's registry and updates desired state for components that are
defined. But it does not remove components or services that are *absent*
from the local definitions: sync is additive, not declarative.

### Design: `mcp purge`

Purge removes registry entries that are both **unwanted** (not in any
current service definition) and **gone** (no corresponding container in
the runtime). It is the garbage collector for the registry.

```
mcp purge [--dry-run]                          Purge all stale entries
mcp purge <service> [--dry-run]                Purge stale entries for one service
mcp purge <service>/<component> [--dry-run]    Purge a specific component
```

#### Semantics

Purge operates on the agent's registry, not on containers. It never
stops or removes running containers. The rules:

1. **Component purge**: a component is eligible for purge when:
   - Its observed state is `removed`, `unknown`, or `exited`, AND
   - It is not present in any current service definition file
     (i.e., `mcp sync` would not recreate it).

   Purging a component deletes its registry entry (from `components`,
   `component_ports`, `component_volumes`, `component_cmd`) and its
   event history.

2. **Service purge**: a service is eligible for purge when all of its
   components have been purged (or it has no components). Purging a
   service deletes its `services` row.

3. **Safety**: purge refuses to remove a component whose observed state
   is `running` or `stopped` (i.e., a container still exists in the
   runtime). This prevents accidentally losing track of live containers.
   The operator must run `mcp stop` and wait for the container to be
   removed before purging, or remove the container manually with podman.

4. **Dry run**: `--dry-run` lists what would be purged without modifying
   the registry. Use it to preview the operation before running it for real.

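
Rules 1 to 4 can be sketched against an in-memory SQLite database. This is a minimal Python sketch, not the agent's implementation.

- **From the text above:** the table names `services`, `components`, `component_ports`, `component_volumes`, and `component_cmd`.
- **Assumed:** every column, the `events` table standing in for event history, and the `purge` function itself.
- **Input:** `defined` is the set of `"service/component"` names found in current service definitions.
- **Open question:** states outside the two lists in rules 1 and 3 are treated as not eligible, which is a guess.

```python
import sqlite3

# Hypothetical schema: only the table names come from the design.
SCHEMA = """
CREATE TABLE services          (name TEXT PRIMARY KEY);
CREATE TABLE components        (service TEXT, name TEXT, observed TEXT,
                                PRIMARY KEY (service, name));
CREATE TABLE component_ports   (service TEXT, component TEXT, port TEXT);
CREATE TABLE component_volumes (service TEXT, component TEXT, volume TEXT);
CREATE TABLE component_cmd     (service TEXT, component TEXT, arg TEXT);
CREATE TABLE events            (service TEXT, component TEXT, message TEXT);
"""

PURGEABLE = {"removed", "unknown", "exited"}  # rule 1
LIVE = {"running", "stopped"}                 # rule 3: a container still exists
# Per-component rows deleted on purge; "events" stands in for event history.
CHILD_TABLES = ("component_ports", "component_volumes", "component_cmd", "events")


def purge(db, defined, service="", component="", dry_run=False):
    """Apply rules 1-4. Returns (service, component, purged, reason) tuples
    mirroring PurgeResult; component == "" marks a service-level result."""
    sql, where, params = "SELECT service, name, observed FROM components", [], []
    if service:
        where.append("service = ?")
        params.append(service)
    if component:
        where.append("name = ?")
        params.append(component)
    if where:
        sql += " WHERE " + " AND ".join(where)

    results, purged = [], set()
    with db:  # one transaction: commit on success, roll back on error
        for svc, comp, observed in db.execute(sql, params).fetchall():
            if observed in LIVE:  # safety check wins over everything else
                results.append((svc, comp, False, f"refused: observed={observed}"))
            elif f"{svc}/{comp}" in defined:
                results.append((svc, comp, False, "still in service definitions"))
            elif observed not in PURGEABLE:  # state not covered by the rules
                results.append((svc, comp, False, f"not eligible: observed={observed}"))
            else:
                results.append((svc, comp, True,
                                f"observed={observed}, not in service definitions"))
                purged.add((svc, comp))
                if not dry_run:
                    for table in CHILD_TABLES:
                        db.execute(f"DELETE FROM {table} WHERE service = ? AND component = ?",
                                   (svc, comp))
                    db.execute("DELETE FROM components WHERE service = ? AND name = ?",
                               (svc, comp))

        # Rule 2: a service whose components are all purged (or that has none)
        # is purged too. In dry-run mode "purged" means "would be purged".
        svc_sql, svc_params = "SELECT name FROM services", ()
        if service:
            svc_sql, svc_params = svc_sql + " WHERE name = ?", (service,)
        for (svc,) in db.execute(svc_sql, svc_params).fetchall():
            names = [c for (c,) in db.execute(
                "SELECT name FROM components WHERE service = ?", (svc,)).fetchall()]
            if all((svc, c) in purged for c in names):
                results.append((svc, "", True, "no remaining components"))
                if not dry_run:
                    db.execute("DELETE FROM services WHERE name = ?", (svc,))
    return results
```

The reason strings are modeled on the dry-run output in the Example section. Running the deletes inside a single transaction means a failure partway through leaves the registry unchanged.
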
#### Interaction with Sync

`mcp sync` pushes desired state from service definitions. `mcp purge`
removes entries that sync would never touch. They are complementary:

- `sync` answers: "what should exist?" (additive)
- `purge` answers: "what should be forgotten?" (subtractive)

A full cleanup is: `mcp sync && mcp purge`.

An alternative design would make `mcp sync` itself remove entries not
present in service definitions (fully declarative sync). This was
rejected because:

- Sync currently only operates on services that have local definition
  files. A service without a local file is left untouched, which is
  desirable when multiple operators or workstations manage different
  services.
- Making sync destructive increases the blast radius of a missing file
  (accidentally deleting the local `mcr.toml` would cause sync to
  purge MCR from the registry).
- Purge as a separate, explicit command with `--dry-run` gives the
  operator clear control over what gets cleaned up.

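
The blast-radius argument can be made concrete with a toy model that tracks only which components are registered. This Python sketch makes three assumptions:

- The registry and the local definitions are both plain dicts mapping a service name to a set of component names.
- `additive_sync` and `declarative_sync` are hypothetical functions contrasting the two designs; they are not MCP code.
- Desired-state updates are ignored.

```python
def additive_sync(registry, local_defs):
    """Current sync: services with a local definition are updated; nothing
    is removed, and services without a local file are left alone."""
    updated = {svc: set(comps) for svc, comps in registry.items()}
    for svc, comps in local_defs.items():
        updated.setdefault(svc, set()).update(comps)
    return updated


def declarative_sync(registry, local_defs):
    """Rejected alternative: the registry becomes exactly the local files.
    The existing registry is ignored, so a service whose file is missing
    disappears along with its components."""
    return {svc: set(comps) for svc, comps in local_defs.items()}


registry = {"mcr": {"api", "web"}, "mcns": {"coredns", "mcns"}}
local_defs = {"mcns": {"mcns"}}  # mcr.toml accidentally deleted

print(sorted(additive_sync(registry, local_defs)))     # ['mcns', 'mcr']
print(sorted(declarative_sync(registry, local_defs)))  # ['mcns']
```

With the additive model, MCR survives the missing file. The stale `mcns/coredns` entry also survives, and that is the entry an explicit `mcp purge` is meant to remove.
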
#### Agent RPC

```protobuf
rpc PurgeComponent(PurgeRequest) returns (PurgeResponse);

message PurgeRequest {
  string service   = 1;  // service name (empty = all services)
  string component = 2;  // component name (empty = all eligible in service)
  bool   dry_run   = 3;  // preview only, do not modify registry
}

message PurgeResponse {
  repeated PurgeResult results = 1;
}

message PurgeResult {
  string service   = 1;
  string component = 2;
  bool   purged    = 3;  // true if removed (or would be, in dry-run)
  string reason    = 4;  // why eligible, or why refused
}
```

The CLI sends the set of currently-defined service/component names
alongside the purge request so the agent can determine what is "not in
any current service definition" without needing access to the CLI's
filesystem.

#### Example

After replacing `mcns/coredns` with `mcns/mcns`:

```
$ mcp purge --dry-run
would purge mcns/coredns (observed=removed, not in service definitions)

$ mcp purge
purged mcns/coredns

$ mcp status
SERVICE    COMPONENT  DESIRED  OBSERVED  VERSION
mc-proxy   mc-proxy   running  running   latest
mcns       mcns       running  running   v1.0.0
mcr        api        running  running   latest
mcr        web        running  running   latest
metacrypt  api        running  running   latest
metacrypt  web        running  running   latest
```

#### Interaction with Adopt

Purge also cleans up after the `mcp adopt` workflow. When containers are
adopted and later removed (replaced by a proper deploy), the adopted
entries linger. Purge removes them once the containers are gone and the
service definition no longer references them.

---

## Future Work (v2+)

These are explicitly out of scope for v1 but inform the design: