Add undeploy command: full inverse of deploy

Implements `mcp undeploy <service>` which tears down all infrastructure
for a service: removes mc-proxy routes, DNS records, TLS certificates,
stops and removes containers, releases allocated ports, and marks the
service inactive.

This fills the gap between `stop` (temporary pause) and `purge` (registry
cleanup). Undeploy is the complete teardown that returns the node to the
state before the service was deployed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 21:45:42 -07:00
parent b2eaa69619
commit f932dd64cc
8 changed files with 610 additions and 150 deletions


@@ -198,6 +198,7 @@ mcp build <service>/<image> Build and push a single image
mcp deploy <service> Deploy all components from service definition
mcp deploy <service>/<component> Deploy a single component
mcp deploy <service> -f <file> Deploy from explicit file
mcp undeploy <service> Full teardown: remove routes, DNS, certs, containers
mcp stop <service> Stop all components, set active=false
mcp start <service> Start all components, set active=true
mcp restart <service> Restart all components
@@ -453,6 +454,7 @@ import "google/protobuf/timestamp.proto";
service McpAgent {
// Service lifecycle
rpc Deploy(DeployRequest) returns (DeployResponse);
rpc UndeployService(UndeployRequest) returns (UndeployResponse);
rpc StopService(ServiceRequest) returns (ServiceResponse);
rpc StartService(ServiceRequest) returns (ServiceResponse);
rpc RestartService(ServiceRequest) returns (ServiceResponse);
@@ -714,6 +716,40 @@ The flags passed to `podman run` are derived from the `ComponentSpec`:
| `volumes` | `-v <mapping>` (repeated) |
| `cmd` | appended after the image name |

#### Undeploy Flow

`mcp undeploy <service>` is the full inverse of deploy. It tears down all
infrastructure associated with a service. When the agent receives an
`UndeployService` RPC:

1. For each component:
   a. Remove mc-proxy routes (traffic stops flowing).
   b. Remove DNS A records from MCNS.
   c. Remove TLS certificate and key files from the mc-proxy cert
      directory (for L7 routes).
   d. Stop and remove the container.
   e. Release allocated host ports back to the port allocator.
   f. Update component state to `removed` in the registry.
2. Mark the service as inactive.
3. Return success/failure per component.

The CLI also sets `active = false` in the local service definition file
to keep it in sync with the operator's intent.

Undeploy differs from `stop` in four ways:

| Aspect | `stop` | `undeploy` |
|--------|--------|------------|
| Container | Stopped (still exists) | Stopped and removed |
| TLS certs | Kept | Removed |
| Ports | Kept allocated | Released |
| Service active | Unchanged | Set to inactive |

After undeploy, the service can be redeployed with `mcp deploy`. The
registry entries are preserved (desired state `removed`) so `mcp status`
and `mcp list` still show the service existed. Use `mcp purge` to clean
up the registry entries if desired.
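
The ordering of the per-component steps matters: routes come out first so
traffic drains before anything else is touched, and bookkeeping comes last.
A minimal Go sketch of that loop, where `Component` and the helper are
illustrative stand-ins rather than the agent's actual types:

```go
package main

import "fmt"

// Component is a simplified stand-in for the agent's component record;
// the field names here are illustrative, not the actual schema.
type Component struct {
	Name  string
	Ports []int
}

// undeployComponent returns the teardown actions in the order described
// above: routes, DNS, TLS material, container, port release, registry
// update. Each string stands in for a call into the real subsystem.
func undeployComponent(c Component) []string {
	steps := []string{
		fmt.Sprintf("remove mc-proxy routes for %s", c.Name),
		fmt.Sprintf("remove DNS A records for %s", c.Name),
		fmt.Sprintf("remove TLS cert/key for %s", c.Name),
		fmt.Sprintf("stop and remove container %s", c.Name),
	}
	for _, p := range c.Ports {
		steps = append(steps, fmt.Sprintf("release port %d", p))
	}
	steps = append(steps, fmt.Sprintf("mark %s as removed in registry", c.Name))
	return steps
}

func main() {
	for _, s := range undeployComponent(Component{Name: "web", Ports: []int{18080}}) {
		fmt.Println(s)
	}
}
```

Because routes are removed before the container stops, clients see
connection refusals from mc-proxy rather than half-closed streams from a
dying container.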

### File Transfer

The agent supports single-file push and pull, scoped to a specific
@@ -1203,6 +1239,102 @@ container, the effective host UID depends on the mapping. Files in
configuration should provision appropriate subuid/subgid ranges when
creating the `mcp` user.

**Dockerfile convention**: Do not use `USER`, `VOLUME`, or `adduser`
directives in production Dockerfiles. The `user` field in the service
definition (typically `"0:0"`) controls the runtime user, and host
volumes provide the data directories. A non-root `USER` in the
Dockerfile maps to a subordinate UID under rootless podman that cannot
access files owned by the `mcp` user on the host.
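
The last point can be made concrete with a little arithmetic. Under
rootless podman with a single subordinate range, container UID 0 maps to
the invoking user, and container UID n (n ≥ 1) maps to
`subuid_start + n - 1`. A sketch using illustrative UID values, not the
actual host configuration:

```go
package main

import "fmt"

// effectiveHostUID computes which host UID a container UID lands on under
// a rootless podman user namespace with a single subuid range: container
// UID 0 maps to the invoking user, container UID n (n >= 1) maps to
// subuidStart + n - 1.
func effectiveHostUID(containerUID, invokingUID, subuidStart int) int {
	if containerUID == 0 {
		return invokingUID
	}
	return subuidStart + containerUID - 1
}

func main() {
	// Assume the mcp user is UID 1001 with subuid range 100000:65536
	// (illustrative values).
	fmt.Println(effectiveHostUID(0, 1001, 100000))    // USER 0 runs as the mcp user itself
	fmt.Println(effectiveHostUID(1000, 1001, 100000)) // USER 1000 lands deep in the subuid range
}
```

With these assumed values, a `USER 1000` directive runs the process as
host UID 100999, which cannot read files owned by UID 1001 — hence the
convention of leaving the Dockerfile at root and setting `user` in the
service definition instead.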

#### Infrastructure Boot Order and Circular Dependencies

MCR (container registry) and MCNS (DNS) are both deployed as containers
via MCP, but MCP itself depends on them:

- **MCR** is reachable through mc-proxy (L4 passthrough on `:8443`).
  The agent pulls images from MCR during `mcp deploy`.
- **MCNS** serves DNS for internal zones. Tailscale and the overlay
  network depend on DNS resolution.

This creates circular dependencies during cold-start or recovery:

```
mcp deploy → agent pulls image → needs MCR → needs mc-proxy
mcp deploy → agent dials MCR → DNS resolves hostname → needs MCNS
```
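
The cold-start order that follows is just a topological sort of these
dependency edges. A minimal sketch, with the edges as described above and
an illustrative helper (not agent code):

```go
package main

import "fmt"

// deps maps each infrastructure service to the services that must be
// running first: mc-proxy needs DNS, MCR is reached through mc-proxy,
// and the agent needs both MCR (image pulls) and MCNS (resolution).
var deps = map[string][]string{
	"mcns":     {},
	"mc-proxy": {"mcns"},
	"mcr":      {"mc-proxy", "mcns"},
	"agent":    {"mcr", "mcns"},
}

// bootOrder returns a start order in which every service comes after its
// dependencies (depth-first topological sort). Cycles are not handled:
// the cycle in the diagram is broken by starting the base services
// manually with raw podman, which removes their incoming edges.
func bootOrder(svcs []string) []string {
	var order []string
	seen := map[string]bool{}
	var visit func(string)
	visit = func(s string) {
		if seen[s] {
			return
		}
		seen[s] = true
		for _, d := range deps[s] {
			visit(d)
		}
		order = append(order, s)
	}
	for _, s := range svcs {
		visit(s)
	}
	return order
}

func main() {
	fmt.Println(bootOrder([]string{"agent"}))
	// → [mcns mc-proxy mcr agent]
}
```

This yields exactly the sequence of the cold-start procedure: MCNS, then
mc-proxy, then MCR, then the agent.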

**Cold-start procedure** (no containers running):

1. **Build images on the operator workstation** for mc-proxy, MCR, and
MCNS. Transfer them to rift via `docker save` / `scp` / `podman load`,
since the registry is not yet available:
```
docker save <image> -o /tmp/image.tar
scp /tmp/image.tar <rift-lan-ip>:/tmp/
# on rift, as mcp user:
podman load -i /tmp/image.tar
```
Use the LAN IP for scp, not a DNS name (DNS is not running yet).
2. **Start MCNS first** (DNS must come up before anything that resolves
hostnames). Run directly with podman since the MCP agent cannot reach
the registry yet:
```
podman run -d --name mcns --restart unless-stopped \
--sysctl net.ipv4.ip_unprivileged_port_start=53 \
-p <lan-ip>:53:53/tcp -p <lan-ip>:53:53/udp \
-p <overlay-ip>:53:53/tcp -p <overlay-ip>:53:53/udp \
-v /srv/mcns:/srv/mcns \
<mcns-image> server --config /srv/mcns/mcns.toml
```
3. **Start mc-proxy** (registry traffic routes through it):
```
podman run -d --name mc-proxy --network host \
--restart unless-stopped \
-v /srv/mc-proxy:/srv/mc-proxy \
<mc-proxy-image> server --config /srv/mc-proxy/mc-proxy.toml
```
4. **Start MCR** (API server, then web UI):
```
podman run -d --name mcr-api --network mcpnet \
--restart unless-stopped \
-p 127.0.0.1:28443:8443 -p 127.0.0.1:29443:9443 \
-v /srv/mcr:/srv/mcr \
<mcr-image> server --config /srv/mcr/mcr.toml
```
5. **Push images to MCR** from the operator workstation now that the
registry is reachable:
```
docker push <registry>/<image>:<tag>
```
6. **Start the MCP agent** (systemd service). It can now reach MCR for
image pulls.
7. **`mcp adopt`** the manually-started containers to bring them under
MCP management. Then `mcp service export` to generate service
definition files.

From this point, `mcp deploy` works normally. The manually-started
containers are replaced by MCP-managed ones on the next deploy.

**Recovery procedure** (mc-proxy or MCNS crashed):

If mc-proxy or MCNS goes down, the agent cannot pull images (registry
unreachable or DNS broken). Recovery:

1. Check if the required image is cached locally:
`podman images | grep <service>`
2. If cached, start the container directly with `podman run` (same
flags as the cold-start procedure above).
3. If not cached, transfer the image from the operator workstation via
`docker save` / `scp` / `podman load` using the LAN IP.
4. Once the infrastructure service is running, `mcp deploy` resumes
normal operation for other services.

---

## Security Model