Add undeploy command: full inverse of deploy
Implements `mcp undeploy <service>`, which tears down all infrastructure for a service: removes mc-proxy routes, DNS records, and TLS certificates; stops and removes containers; releases allocated ports; and marks the service inactive. This fills the gap between `stop` (temporary pause) and `purge` (registry cleanup): undeploy is the complete teardown that returns the node to its state before the service was deployed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ARCHITECTURE.md (+132)
@@ -198,6 +198,7 @@ mcp build <service>/<image> Build and push a single image
mcp deploy <service>              Deploy all components from service definition
mcp deploy <service>/<component>  Deploy a single component
mcp deploy <service> -f <file>    Deploy from explicit file
mcp undeploy <service>            Full teardown: remove routes, DNS, certs, containers
mcp stop <service>                Stop all components, set active=false
mcp start <service>               Start all components, set active=true
mcp restart <service>             Restart all components
@@ -453,6 +454,7 @@ import "google/protobuf/timestamp.proto";
service McpAgent {
  // Service lifecycle
  rpc Deploy(DeployRequest) returns (DeployResponse);
  rpc UndeployService(UndeployRequest) returns (UndeployResponse);
  rpc StopService(ServiceRequest) returns (ServiceResponse);
  rpc StartService(ServiceRequest) returns (ServiceResponse);
  rpc RestartService(ServiceRequest) returns (ServiceResponse);
@@ -714,6 +716,40 @@ The flags passed to `podman run` are derived from the `ComponentSpec`:
| `volumes` | `-v <mapping>` (repeated) |
| `cmd` | appended after the image name |

#### Undeploy Flow

`mcp undeploy <service>` is the full inverse of deploy. It tears down all
infrastructure associated with a service. When the agent receives an
`UndeployService` RPC:

1. For each component:
   a. Remove mc-proxy routes (traffic stops flowing).
   b. Remove DNS A records from MCNS.
   c. Remove TLS certificate and key files from the mc-proxy cert
      directory (for L7 routes).
   d. Stop and remove the container.
   e. Release allocated host ports back to the port allocator.
   f. Update component state to `removed` in the registry.
2. Mark the service as inactive.
3. Return success/failure per component.
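
The per-component loop above can be sketched as ordinary sequential code. This is a minimal Python sketch of the orchestration and its per-component result reporting; every helper name here is illustrative, not the agent's actual API, and the teardown actions are stubbed out so only the control flow is shown:

```python
# Illustrative sketch: sequential teardown per component, with one bad
# component reported as failed instead of aborting the whole undeploy.
def undeploy_service(service, components):
    """Tear down every component; report success/failure per component."""
    results = {}
    for comp in components:
        try:
            # Steps a-f: routes, DNS, certs, container, ports, registry state.
            for step in (remove_routes, remove_dns, remove_certs,
                         remove_container, release_ports, mark_removed):
                step(service, comp)
            results[comp] = "ok"
        except Exception as exc:
            results[comp] = f"failed: {exc}"
    mark_service_inactive(service)  # step 2
    return results                  # step 3: per-component outcome

# Stubs standing in for the real teardown actions:
def remove_routes(service, comp): pass
def remove_dns(service, comp): pass
def remove_certs(service, comp): pass
def remove_container(service, comp): pass
def release_ports(service, comp): pass
def mark_removed(service, comp): pass
def mark_service_inactive(service): pass
```

The try/except around each component reflects step 3: one component failing to tear down should still yield a per-component report rather than an aborted RPC.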

The CLI also sets `active = false` in the local service definition file
to keep it in sync with the operator's intent.

Undeploy differs from `stop` in four ways:

| Aspect | `stop` | `undeploy` |
|--------|--------|------------|
| Container | Stopped (still exists) | Stopped and removed |
| TLS certs | Kept | Removed |
| Ports | Kept allocated | Released |
| Service active | Unchanged | Set to inactive |
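
The `Ports` row is the subtle one: `stop` keeps host ports reserved so a later `start` reuses them, while `undeploy` returns them to the pool. A toy Python allocator illustrating that distinction (the class, its methods, and the port range are illustrative, not the agent's real allocator):

```python
class PortAllocator:
    """Toy host-port allocator: undeploy releases ports, stop does not."""

    def __init__(self, low=20000, high=20100):
        self.free = set(range(low, high))
        self.used = {}  # component -> allocated port

    def allocate(self, component):
        port = min(self.free)       # pick the lowest free port
        self.free.remove(port)
        self.used[component] = port
        return port

    def release(self, component):
        # Called on undeploy only; `stop` leaves the allocation in place
        # so a later `start` gets the same port back.
        self.free.add(self.used.pop(component))

alloc = PortAllocator()
port = alloc.allocate("web")  # deploy: port reserved
# stop: allocation untouched; `start` reuses `port`
alloc.release("web")          # undeploy: port returns to the pool
```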

After undeploy, the service can be redeployed with `mcp deploy`. The
registry entries are preserved (desired state `removed`) so `mcp status`
and `mcp list` still show the service existed. Use `mcp purge` to clean
up the registry entries if desired.

### File Transfer

The agent supports single-file push and pull, scoped to a specific
@@ -1203,6 +1239,102 @@ container, the effective host UID depends on the mapping. Files in
configuration should provision appropriate subuid/subgid ranges when
creating the `mcp` user.

**Dockerfile convention**: Do not use `USER`, `VOLUME`, or `adduser`
directives in production Dockerfiles. The `user` field in the service
definition (typically `"0:0"`) controls the runtime user, and host
volumes provide the data directories. A non-root `USER` in the
Dockerfile maps to a subordinate UID under rootless podman that cannot
access files owned by the `mcp` user on the host.

#### Infrastructure Boot Order and Circular Dependencies

MCR (container registry) and MCNS (DNS) are both deployed as containers
via MCP, but MCP itself depends on them:

- **MCR** is reachable through mc-proxy (L4 passthrough on `:8443`).
  The agent pulls images from MCR during `mcp deploy`.
- **MCNS** serves DNS for internal zones. Tailscale and the overlay
  network depend on DNS resolution.

This creates circular dependencies during cold-start or recovery:

```
mcp deploy → agent pulls image → needs MCR → needs mc-proxy
mcp deploy → agent dials MCR → DNS resolves hostname → needs MCNS
```

**Cold-start procedure** (no containers running):

1. **Build images on the operator workstation** for mc-proxy, MCR, and
   MCNS. Transfer to rift via `docker save` / `scp` / `podman load`,
   since the registry is not yet available:

   ```
   docker save <image> -o /tmp/image.tar
   scp /tmp/image.tar <rift-lan-ip>:/tmp/
   # on rift, as mcp user:
   podman load -i /tmp/image.tar
   ```

   Use the LAN IP for scp, not a DNS name (DNS is not running yet).

2. **Start MCNS first** (DNS must come up before anything that resolves
   hostnames). Run directly with podman since the MCP agent cannot reach
   the registry yet:

   ```
   podman run -d --name mcns --restart unless-stopped \
     --sysctl net.ipv4.ip_unprivileged_port_start=53 \
     -p <lan-ip>:53:53/tcp -p <lan-ip>:53:53/udp \
     -p <overlay-ip>:53:53/tcp -p <overlay-ip>:53:53/udp \
     -v /srv/mcns:/srv/mcns \
     <mcns-image> server --config /srv/mcns/mcns.toml
   ```

3. **Start mc-proxy** (registry traffic routes through it):

   ```
   podman run -d --name mc-proxy --network host \
     --restart unless-stopped \
     -v /srv/mc-proxy:/srv/mc-proxy \
     <mc-proxy-image> server --config /srv/mc-proxy/mc-proxy.toml
   ```

4. **Start MCR** (API server, then web UI):

   ```
   podman run -d --name mcr-api --network mcpnet \
     --restart unless-stopped \
     -p 127.0.0.1:28443:8443 -p 127.0.0.1:29443:9443 \
     -v /srv/mcr:/srv/mcr \
     <mcr-image> server --config /srv/mcr/mcr.toml
   ```

5. **Push images to MCR** from the operator workstation now that the
   registry is reachable:

   ```
   docker push <registry>/<image>:<tag>
   ```

6. **Start the MCP agent** (systemd service). It can now reach MCR for
   image pulls.

7. **`mcp adopt`** the manually-started containers to bring them under
   MCP management. Then `mcp service export` to generate service
   definition files.

From this point, `mcp deploy` works normally. The manually-started
containers are replaced by MCP-managed ones on the next deploy.
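
The manual start order above is just a topological order of the dependency graph once `podman load` has removed the image-pull edges. A small Python sketch that derives the same order; the graph contents are assumptions read off the text (e.g. that mc-proxy needs MCNS for name resolution), not a spec:

```python
from graphlib import TopologicalSorter

# Runtime dependencies after the registry edge is broken by `podman load`.
# Each entry maps a service to the set of services it depends on.
deps = {
    "mcns":      set(),                 # DNS first; resolves nothing itself
    "mc-proxy":  {"mcns"},              # assumed: needs DNS up
    "mcr":       {"mc-proxy", "mcns"},  # reachable only through mc-proxy
    "mcp-agent": {"mcr", "mcns"},       # pulls images from MCR via DNS
}

order = list(TopologicalSorter(deps).static_order())
print(order)  # mcns before mc-proxy before mcr before mcp-agent
```

With this graph the order is fully determined (each step has exactly one ready node), which is why the cold-start procedure can be written as a fixed numbered list.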

**Recovery procedure** (mc-proxy or MCNS crashed):

If mc-proxy or MCNS goes down, the agent cannot pull images (the
registry is unreachable or DNS is broken). Recovery:

1. Check if the required image is cached locally:
   `podman images | grep <service>`
2. If cached, start the container directly with `podman run` (same
   flags as the cold-start procedure above).
3. If not cached, transfer the image from the operator workstation via
   `docker save` / `scp` / `podman load` using the LAN IP.
4. Once the infrastructure service is running, `mcp deploy` resumes
   normal operation for other services.
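
The recovery steps reduce to one decision: run the cached image directly, or sideload it first. A Python sketch of that decision as a plan builder (the function and the step strings are illustrative, not an MCP command):

```python
def recovery_plan(image, cached_images):
    """Return the ordered actions to bring an infra container back up."""
    steps = []
    if image not in cached_images:
        # No local copy and the registry is unreachable:
        # sideload over the LAN instead of pulling.
        steps += [f"docker save {image}",
                  "scp over LAN IP",
                  f"podman load {image}"]
    steps.append(f"podman run {image}")  # same flags as cold start
    return steps

print(recovery_plan("mcns", {"mcns"}))      # cached: run directly
print(recovery_plan("mc-proxy", {"mcns"}))  # not cached: transfer first
```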

---

## Security Model