Add purge design to architecture doc
Purge removes stale registry entries: components that are no longer in service definitions and have no running container. Designed as an explicit, safe operation separate from sync; sync is additive (push desired state), purge is subtractive (remove forgotten entries). Includes safety rules (refuses to purge running containers), dry-run mode, agent RPC definition, and rationale for why sync should not be made destructive.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
142
ARCHITECTURE.md
@@ -207,6 +207,7 @@ mcp sync Push service definitions to agent (update
                                 state without deploying)

mcp adopt <service>              Adopt all <service>-* containers into a service
mcp purge [service[/component]]  Remove stale registry entries (--dry-run to preview)

mcp service show <service>       Print current spec from agent registry
mcp service edit <service>       Open service definition in $EDITOR
@@ -1195,6 +1196,147 @@ mcp/

---

## Registry Cleanup: Purge

### Problem

The agent's registry accumulates stale entries over time. A component
that was replaced (e.g., `mcns/coredns` → `mcns/mcns`) or a service
that was decommissioned remains in the registry indefinitely with
`observed=removed` or `observed=unknown`. There is no mechanism to tell
the agent "this component no longer exists and should not be tracked."

This causes:
- Perpetual drift alerts for components that will never return.
- Noise in `mcp status` and `mcp list` output.
- Confusion about what the agent is actually responsible for.

The existing `mcp sync` compares local service definitions against the
agent's registry and updates desired state for components that are
defined. But it does not remove components or services that are *absent*
from the local definitions; sync is additive, not declarative.

### Design: `mcp purge`

Purge removes registry entries that are both **unwanted** (not in any
current service definition) and **gone** (no corresponding container in
the runtime). It is the garbage collector for the registry.

```
mcp purge [--dry-run]                        Purge all stale entries
mcp purge <service> [--dry-run]              Purge stale entries for one service
mcp purge <service>/<component> [--dry-run]  Purge a specific component
```
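
The three invocation forms map onto the `service` and `component` fields of `PurgeRequest` (see Agent RPC), where an empty string means "all". A minimal Python sketch of that mapping, assuming the CLI receives the target as a single optional argument. `PurgeTarget` and `parse_purge_target` are hypothetical names, not part of `mcp`, and rejecting malformed targets such as `mcns/` is an assumption rather than documented behavior:

```python
from typing import NamedTuple


class PurgeTarget(NamedTuple):
    service: str    # "" means all services
    component: str  # "" means all eligible components in the service


def parse_purge_target(arg: str = "") -> PurgeTarget:
    """Split an optional `service[/component]` argument into its parts."""
    if not arg:
        # `mcp purge` with no target: consider every service.
        return PurgeTarget("", "")
    service, sep, component = arg.partition("/")
    # Assumed validation: no empty service, no trailing slash, one slash at most.
    if not service or (sep and not component) or "/" in component:
        raise ValueError(f"invalid purge target: {arg!r}")
    return PurgeTarget(service, component)
```

Returning empty strings instead of `None` keeps the result directly assignable to the protobuf string fields, which cannot be null.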

#### Semantics

Purge operates on the agent's registry, not on containers. It never
stops or removes running containers. The rules:

1. **Component purge**: a component is eligible for purge when:
   - Its observed state is `removed`, `unknown`, or `exited`, AND
   - It is not present in any current service definition file
     (i.e., `mcp sync` would not recreate it).

   Purging a component deletes its registry entry (from `components`,
   `component_ports`, `component_volumes`, `component_cmd`) and its
   event history.

2. **Service purge**: a service is eligible for purge when all of its
   components have been purged (or it has no components). Purging a
   service deletes its `services` row.

3. **Safety**: purge refuses to remove a component whose observed state
   is `running` or `stopped` (i.e., a container still exists in the
   runtime). This prevents accidentally losing track of live containers.
   The operator must `mcp stop` and wait for the container to be removed
   before purging, or manually remove it via podman.

4. **Dry run**: `--dry-run` lists what would be purged without modifying
   the registry. This is the safe way to preview the operation.
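
The eligibility and safety rules reduce to a small decision function. The following Python sketch is illustrative only: `component_verdict` and `service_eligible` are hypothetical names, the reason strings are made up, and treating an unrecognized observed state as a refusal is an assumption the rules above do not spell out.

```python
GONE_STATES = {"removed", "unknown", "exited"}  # rule 1: eligible states
LIVE_STATES = {"running", "stopped"}            # rule 3: container still exists


def component_verdict(observed: str, defined: bool) -> tuple[bool, str]:
    """Return (eligible, reason) for one registry entry.

    `defined` is True when the component appears in a current service
    definition, i.e. `mcp sync` would recreate it.
    """
    if observed in LIVE_STATES:
        return False, f"refused: observed={observed}, container still exists"
    if defined:
        return False, "refused: present in a current service definition"
    if observed in GONE_STATES:
        return True, f"observed={observed}, not in service definitions"
    # Assumption: fail closed on states the rules do not mention.
    return False, f"refused: unrecognized observed state {observed!r}"


def service_eligible(remaining_components: int) -> bool:
    """Rule 2: a service can be purged once it has no components left."""
    return remaining_components == 0
```

Checking the safety rule first means a live container is never purged, even if its definition file has been deleted.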

#### Interaction with Sync

`mcp sync` pushes desired state from service definitions. `mcp purge`
removes entries that sync would never touch. They are complementary:

- `sync` answers: "what should exist?" (additive)
- `purge` answers: "what should be forgotten?" (subtractive)

A full cleanup is: `mcp sync && mcp purge`.
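
To make the additive/subtractive split concrete, here is a toy Python model of the two operations over an in-memory registry. It is a sketch under stated assumptions, not the agent's implementation: the dict layout, the `sync` and `purge` functions, and the initial `observed="unknown"` for new entries are all hypothetical.

```python
Key = tuple[str, str]  # (service, component)


def sync(registry: dict[Key, dict[str, str]], definitions: dict[Key, str]) -> None:
    """Additive: upsert desired state for defined components, never delete."""
    for key, desired in definitions.items():
        entry = registry.setdefault(key, {"observed": "unknown"})
        entry["desired"] = desired


def purge(registry: dict[Key, dict[str, str]], definitions: dict[Key, str]) -> list[Key]:
    """Subtractive: drop entries that are undefined and have no container."""
    gone = {"removed", "unknown", "exited"}
    stale = [key for key, entry in registry.items()
             if key not in definitions and entry["observed"] in gone]
    for key in stale:
        del registry[key]
    return stale


# mcns/coredns was replaced by mcns/mcns; its container is already gone.
registry = {("mcns", "coredns"): {"desired": "running", "observed": "removed"}}
definitions = {("mcns", "mcns"): "running"}

sync(registry, definitions)
after_sync = sorted(registry)           # the stale entry survives sync
purged = purge(registry, definitions)   # only purge forgets it
after_purge = sorted(registry)
```

After `sync` the registry holds both `mcns/coredns` and `mcns/mcns`; only the subsequent `purge` removes the stale `mcns/coredns` entry.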

An alternative design would make `mcp sync` itself remove entries not
present in service definitions (fully declarative sync). This was
rejected because:

- Sync currently only operates on services that have local definition
  files. A service without a local file is left untouched, which is
  desirable when multiple operators or workstations manage different
  services.
- Making sync destructive increases the blast radius of a missing file
  (accidentally deleting the local `mcr.toml` would cause sync to
  purge MCR from the registry).
- Purge as a separate, explicit command with `--dry-run` gives the
  operator clear control over what gets cleaned up.

#### Agent RPC

```protobuf
rpc PurgeComponent(PurgeRequest) returns (PurgeResponse);

message PurgeRequest {
  string service = 1;    // service name (empty = all services)
  string component = 2;  // component name (empty = all eligible in service)
  bool dry_run = 3;      // preview only, do not modify registry
}

message PurgeResponse {
  repeated PurgeResult results = 1;
}

message PurgeResult {
  string service = 1;
  string component = 2;
  bool purged = 3;       // true if removed (or would be, in dry-run)
  string reason = 4;     // why eligible, or why refused
}
```

The CLI sends the set of currently-defined service/component names
alongside the purge request so the agent can determine what is "not in
any current service definition" without needing access to the CLI's
filesystem.
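
A rough Python sketch of how an agent-side handler could evaluate such a request. The dataclasses mirror the protobuf messages; everything else is an assumption made for illustration: the `purge_component` function, the registry modeled as a dict of observed states, and the `defined` set passed as a plain argument (how that set travels on the wire is not specified here).

```python
from dataclasses import dataclass


@dataclass
class PurgeRequest:
    service: str = ""      # empty = all services
    component: str = ""    # empty = all eligible in service
    dry_run: bool = False


@dataclass
class PurgeResult:
    service: str
    component: str
    purged: bool
    reason: str


GONE = {"removed", "unknown", "exited"}


def purge_component(req, registry, defined):
    """registry: {(service, component): observed}; defined: set of such keys."""
    results = []
    # sorted() copies the items, so deleting from registry below is safe.
    for (svc, comp), observed in sorted(registry.items()):
        if req.service and svc != req.service:
            continue
        if req.component and comp != req.component:
            continue
        if (svc, comp) in defined:
            results.append(PurgeResult(svc, comp, False, "in a current service definition"))
        elif observed not in GONE:
            results.append(PurgeResult(svc, comp, False, f"observed={observed}, container still exists"))
        else:
            if not req.dry_run:
                del registry[(svc, comp)]
            results.append(PurgeResult(svc, comp, True, f"observed={observed}, not in service definitions"))
    return results
```

This sketch reports refusals for every matched entry; a CLI could show only the `purged` results for an untargeted run and surface refusal reasons when a specific component was requested.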

#### Example

After replacing `mcns/coredns` with `mcns/mcns`:

```
$ mcp purge --dry-run
would purge mcns/coredns (observed=removed, not in service definitions)

$ mcp purge
purged mcns/coredns

$ mcp status
SERVICE    COMPONENT  DESIRED  OBSERVED  VERSION
mc-proxy   mc-proxy   running  running   latest
mcns       mcns       running  running   v1.0.0
mcr        api        running  running   latest
mcr        web        running  running   latest
metacrypt  api        running  running   latest
metacrypt  web        running  running   latest
```

#### Interaction with Adopt

Purge also cleans up after the `mcp adopt` workflow. When containers are
adopted and later removed (replaced by a proper deploy), the adopted
entries linger. Purge removes them once the containers are gone and the
service definition no longer references them.

---

## Future Work (v2+)

These are explicitly out of scope for v1 but inform the design: