Kyle Isom 8fb6374257 Document SSO login flow in packaging and deployment guide
Add SSO redirect flow alongside direct credentials, MCIAS client
registration steps, [sso] config section, and updated service versions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 23:49:36 -07:00

# Packaging and Deploying to the Metacircular Platform
This guide provides everything needed to build, package, and deploy a
service to the Metacircular platform. It assumes no prior knowledge of
the platform's internals.
---
## Platform Overview
Metacircular is a multi-service infrastructure platform. Services are
Go binaries running as containers on Linux nodes, managed by these core
components:
| Component | Role |
|-----------|------|
| **MCP** (Control Plane) | Deploys, monitors, and manages container lifecycle via rootless Podman |
| **MCR** (Container Registry) | OCI container registry at `mcr.svc.mcp.metacircular.net:8443` |
| **mc-proxy** (TLS Proxy) | Routes traffic to services via L4 (SNI passthrough) or L7 (TLS termination) |
| **MCIAS** (Identity Service) | Central SSO/IAM — all services authenticate through it |
| **MCNS** (DNS) | Authoritative DNS for `*.svc.mcp.metacircular.net` |
The operator workflow is: **build image → push to MCR → write service
definition → deploy via MCP**. MCP handles port assignment, route
registration, and container lifecycle.
---
## Prerequisites
| Requirement | Details |
|-------------|---------|
| Go | 1.25+ |
| Container engine | Docker or Podman (for building images) |
| `mcp` CLI | Installed on the operator workstation |
| MCR access | Credentials to push images to `mcr.svc.mcp.metacircular.net:8443` |
| MCP agent | Running on the target node (currently `rift`) |
| MCIAS account | For `mcp` CLI authentication to the agent |
---
## 1. Build the Container Image
### Dockerfile Pattern
All services use a two-stage Alpine build. This is the standard
template:
```dockerfile
FROM golang:1.25-alpine AS builder
RUN apk add --no-cache git
WORKDIR /build
COPY go.mod go.sum ./
RUN go mod download
COPY . .
ARG VERSION=dev
RUN CGO_ENABLED=0 go build -trimpath \
    -ldflags="-s -w -X main.version=${VERSION}" \
    -o /<binary> ./cmd/<binary>
FROM alpine:3.21
RUN apk add --no-cache ca-certificates tzdata
COPY --from=builder /<binary> /usr/local/bin/<binary>
WORKDIR /srv/<service>
EXPOSE <ports>
ENTRYPOINT ["<binary>"]
CMD ["server", "--config", "/srv/<service>/<service>.toml"]
```
### Dockerfile Rules
- **`CGO_ENABLED=0`** — all builds are statically linked. No CGo in
production.
- **`ca-certificates` and `tzdata`** — required in the runtime image
for TLS verification and timezone-aware logging.
- **No `USER` directive** — containers run as `--user 0:0` under MCP's
rootless Podman. UID 0 inside the container maps to the unprivileged
`mcp` host user. A non-root `USER` directive creates a subordinate
UID that cannot access host-mounted volumes.
- **No `VOLUME` directive** — causes layer unpacking failures under
rootless Podman. The host volume mount is declared in the service
definition, not the image.
- **No `adduser`/`addgroup`** — unnecessary given the rootless Podman
model.
- **`WORKDIR /srv/<service>`** — so relative paths resolve correctly
against the mounted data directory.
- **Version injection** — pass the git tag via `--build-arg VERSION=...`
so the binary can report its version.
- **Stripped binaries** — `-trimpath -ldflags="-s -w"` removes debug
symbols and build paths.
### Split Binaries
If the service has separate API and web UI binaries, create separate
Dockerfiles:
- `Dockerfile.api` — builds the API/gRPC server
- `Dockerfile.web` — builds the web UI server
Both follow the same template. The web binary communicates with the API
server via gRPC (no direct database access).
### Makefile Target
Every service includes a `make docker` target:
```makefile
docker:
	docker build --build-arg VERSION=$(shell git describe --tags --always --dirty) \
		-t <service> -f Dockerfile.api .
```
---
## 2. Write a Service Definition
Service definitions are TOML files that tell MCP what to deploy. They
live at `~/.config/mcp/services/<service>.toml` on the operator
workstation.
### Minimal Example (Single Component, L7)
```toml
name = "myservice"
node = "rift"
[build.images]
myservice = "Dockerfile"
[[components]]
name = "web"
image = "mcr.svc.mcp.metacircular.net:8443/myservice:v1.0.0"
[[components.routes]]
port = 8443
mode = "l7"
```
### API Service Example (L4, Multiple Routes)
```toml
name = "myservice"
node = "rift"
[build.images]
myservice = "Dockerfile"
[[components]]
name = "api"
image = "mcr.svc.mcp.metacircular.net:8443/myservice:v1.0.0"
volumes = ["/srv/myservice:/srv/myservice"]
cmd = ["server", "--config", "/srv/myservice/myservice.toml"]
[[components.routes]]
name = "rest"
port = 8443
mode = "l4"
[[components.routes]]
name = "grpc"
port = 9443
mode = "l4"
```
### Full Example (API + Web)
```toml
name = "myservice"
node = "rift"
[build.images]
myservice = "Dockerfile.api"
myservice-web = "Dockerfile.web"
[[components]]
name = "api"
image = "mcr.svc.mcp.metacircular.net:8443/myservice:v1.0.0"
volumes = ["/srv/myservice:/srv/myservice"]
cmd = ["server", "--config", "/srv/myservice/myservice.toml"]
[[components.routes]]
name = "rest"
port = 8443
mode = "l4"
[[components.routes]]
name = "grpc"
port = 9443
mode = "l4"
[[components]]
name = "web"
image = "mcr.svc.mcp.metacircular.net:8443/myservice-web:v1.0.0"
volumes = ["/srv/myservice:/srv/myservice"]
cmd = ["server", "--config", "/srv/myservice/myservice.toml"]
[[components.routes]]
port = 443
mode = "l7"
```
### Conventions
A few fields are derived by the agent at deploy time:
| Field | Default | Override when... |
|-------|---------|------------------|
| Source path | `<service>` relative to workspace root | Directory name differs from service name (use `path`) |
| Hostname | `<service>.svc.mcp.metacircular.net` | Service needs a public hostname (use route `hostname`) |
All other fields must be explicit in the service definition.
### Service Definition Reference
**Top-level fields:**
| Field | Required | Purpose |
|-------|----------|---------|
| `name` | Yes | Service name (matches project name) |
| `node` | Yes | Target node to deploy to |
| `active` | No | Whether MCP keeps this running (default: `true`) |
| `path` | No | Source directory relative to workspace (default: `name`) |
**Build fields:**
| Field | Purpose |
|-------|---------|
| `build.images.<name>` | Maps build image name to Dockerfile path. The `<name>` must match the repository name in a component's `image` field (the part after the last `/`, before the `:` tag). |
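The name-matching rule above can be sketched in Go. This is an illustration of the rule, not MCP's actual code; `repoName` is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"strings"
)

// repoName extracts the repository name that must match a
// [build.images] key: the part of the image reference after the
// last '/', before the ':' tag.
func repoName(ref string) string {
	s := ref[strings.LastIndex(ref, "/")+1:]
	if i := strings.IndexByte(s, ':'); i >= 0 {
		s = s[:i]
	}
	return s
}

func main() {
	// Matches the build image name "myservice-web".
	fmt.Println(repoName("mcr.svc.mcp.metacircular.net:8443/myservice-web:v1.0.0"))
}
```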
**Component fields:**
| Field | Required | Purpose |
|-------|----------|---------|
| `name` | Yes | Component name (e.g. `api`, `web`) |
| `image` | Yes | Full image reference (e.g. `mcr.svc.mcp.metacircular.net:8443/myservice:v1.0.0`) |
| `volumes` | No | Volume mounts (list of `host:container` strings) |
| `cmd` | No | Command override (list of strings) |
| `env` | No | Extra environment variables (list of `KEY=VALUE` strings) |
| `network` | No | Container network (default: none) |
| `user` | No | Container user (e.g. `0:0`) |
| `restart` | No | Restart policy (e.g. `unless-stopped`) |
**Route fields (under `[[components.routes]]`):**
| Field | Purpose |
|-------|---------|
| `name` | Route name — determines `$PORT_<NAME>` env var |
| `port` | External port on mc-proxy (e.g. `8443`, `9443`, `443`) |
| `mode` | `l4` (TLS passthrough) or `l7` (TLS termination by mc-proxy) |
| `hostname` | Public hostname override |
### Routing Modes
| Mode | TLS handled by | Use when... |
|------|----------------|-------------|
| `l4` | The service itself | Service manages its own TLS (API servers, gRPC) |
| `l7` | mc-proxy | mc-proxy terminates TLS and proxies HTTP to the service (web UIs) |
### Version Pinning
Component `image` fields **must** pin an explicit semver tag (e.g.
`mcr.svc.mcp.metacircular.net:8443/myservice:v1.1.0`). Never use
`:latest`. This ensures deployments are reproducible and `mcp status`
shows the actual running version. The version is extracted from the
image tag.
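The tag extraction can be sketched as follows. This is illustrative only (`versionFromImage` is a hypothetical name); MCP's actual parsing may differ:

```go
package main

import (
	"fmt"
	"strings"
)

// versionFromImage pulls the version tag out of a pinned image
// reference. The registry's port colon is not mistaken for the tag
// colon because a port is always followed by a '/' path segment.
func versionFromImage(ref string) string {
	i := strings.LastIndex(ref, ":")
	if i < 0 || strings.ContainsRune(ref[i+1:], '/') {
		return "" // untagged: the last colon belonged to the registry port
	}
	return ref[i+1:]
}

func main() {
	fmt.Println(versionFromImage("mcr.svc.mcp.metacircular.net:8443/myservice:v1.1.0"))
}
```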
---
## 3. Build, Push, and Deploy
### Tag the Release
```bash
git tag -a v1.0.0 -m "v1.0.0"
git push origin v1.0.0
```
### Build and Push Images
```bash
mcp build <service>
```
This reads the `[build.images]` section of the service definition,
builds each Dockerfile, tags the images with the version from the
definition, and pushes them to MCR.
The workspace root is configured in `~/.config/mcp/mcp.toml`:
```toml
[build]
workspace = "~/src/metacircular"
```
Each service's source is at `<workspace>/<path>` (where `path` defaults
to the service name).
### Sync and Deploy
```bash
# Push all service definitions to agents, auto-build missing images
mcp sync
# Deploy (or redeploy) a specific service
mcp deploy <service>
```
`mcp sync` checks whether each component's image tag exists in MCR. If
missing and the source tree is available, it builds and pushes
automatically.
`mcp deploy` pulls the image on the target node and creates or
recreates the containers.
### What Happens During Deploy
1. Agent assigns a free host port (10000–60000) for each declared route.
2. Agent starts containers with `$PORT` / `$PORT_<NAME>` environment
variables set to the assigned ports.
3. Agent registers routes with mc-proxy (hostname → `127.0.0.1:<port>`,
mode, TLS cert paths).
4. Agent records the full state in its SQLite registry.
On stop (`mcp stop <service>`), the agent reverses the process: removes
mc-proxy routes, then stops containers.
---
## 4. Data Directory Convention
All runtime data lives in `/srv/<service>/` on the host. This directory
is bind-mounted into the container.
```
/srv/<service>/
├── <service>.toml # Configuration file
├── <service>.db # SQLite database (created on first run)
├── certs/ # TLS certificates
│ ├── cert.pem
│ └── key.pem
└── backups/ # Database snapshots
```
This directory must exist on the target node before the first deploy,
owned by the `mcp` user (which runs rootless Podman). Create it with:
```bash
sudo mkdir -p /srv/<service>/certs
sudo chown -R mcp:mcp /srv/<service>
```
Place the service's TOML configuration and TLS certificates here before
deploying.
---
## 5. Configuration
Services use TOML configuration with environment variable overrides.
### Standard Config Sections
```toml
[server]
listen_addr = ":8443"
grpc_addr = ":9443"
tls_cert = "/srv/<service>/certs/cert.pem"
tls_key = "/srv/<service>/certs/key.pem"
[database]
path = "/srv/<service>/<service>.db"
[mcias]
server_url = "https://mcias.metacircular.net:8443"
ca_cert = ""
service_name = "<service>"
tags = []
[log]
level = "info"
```
For services with SSO-enabled web UIs, add:
```toml
[sso]
redirect_uri = "https://<service>.svc.mcp.metacircular.net/sso/callback"
```
For services with a separate web UI binary, add:
```toml
[web]
listen_addr = "127.0.0.1:8080"
vault_grpc = "127.0.0.1:9443"
vault_ca_cert = ""
```
### $PORT Convention
When deployed via MCP, the agent assigns host ports and passes them as
environment variables. **Applications should not hardcode listen
addresses** — they will be overridden at deploy time.
| Env var | When set |
|---------|----------|
| `$PORT` | Component has a single unnamed route |
| `$PORT_<NAME>` | Component has named routes |
Route names are uppercased: `name = "rest"` → `$PORT_REST`,
`name = "grpc"` → `$PORT_GRPC`.
**Container listen address:** Services must bind to `0.0.0.0:$PORT`
(or `:$PORT`), not `localhost:$PORT`. Podman port-forwards go through
the container's network namespace — binding to `localhost` inside the
container makes the port unreachable from outside.
Services built with **mcdsl v1.1.0+** handle this automatically —
`config.Load` checks `$PORT` → overrides `Server.ListenAddr`, and
`$PORT_GRPC` → overrides `Server.GRPCAddr`. These take precedence over
TOML values.
Services not using mcdsl must check these environment variables in their
own config loading.
### Environment Variable Overrides
Beyond `$PORT`, services support `$SERVICENAME_SECTION_KEY` overrides.
For example, `$MCR_SERVER_LISTEN_ADDR=:9999` overrides
`[server] listen_addr` in MCR's config. `$PORT` takes precedence over
these.
---
## 6. Authentication (MCIAS Integration)
Every service delegates authentication to MCIAS. No service maintains
its own user database. Services support two login modes: **SSO
redirect** (recommended for web UIs) and **direct credentials**
(fallback / API clients).
### SSO Login (Web UIs)
SSO is the preferred login method for web UIs. The flow is an OAuth
2.0-style authorization code exchange:
1. User visits the service and is redirected to `/login`.
2. Login page shows a "Sign in with MCIAS" button.
3. Click redirects to MCIAS (`/sso/authorize`), which authenticates the
user.
4. MCIAS redirects back to the service's `/sso/callback` with an
authorization code.
5. The service exchanges the code for a JWT via a server-to-server call
to MCIAS `POST /v1/sso/token`.
6. The JWT is stored in a session cookie.
SSO is enabled by adding an `[sso]` section to the service config and
registering the service as an SSO client in MCIAS.
**Service config:**
```toml
[sso]
redirect_uri = "https://<service>.svc.mcp.metacircular.net/sso/callback"
```
**MCIAS config** (add to the `[[sso_clients]]` list):
```toml
[[sso_clients]]
client_id = "<service>"
redirect_uri = "https://<service>.svc.mcp.metacircular.net/sso/callback"
service_name = "<service>"
```
The `redirect_uri` must match exactly between the service config and
the MCIAS client registration.
When `[sso].redirect_uri` is empty or absent, the service falls back to
the direct credentials form.
**Implementation:** Services use `mcdsl/sso` (v1.7.0+) which handles
state management, CSRF-safe cookies, and the code exchange. The web
server registers three routes:
| Route | Purpose |
|-------|---------|
| `GET /login` | Renders landing page with "Sign in with MCIAS" button |
| `GET /sso/redirect` | Sets state cookies, redirects to MCIAS |
| `GET /sso/callback` | Validates state, exchanges code for JWT, sets session |
### Direct Credentials (API / Fallback)
1. Client sends credentials to the service's `POST /v1/auth/login`.
2. Service forwards them to MCIAS via `mcdsl/auth.Authenticator.Login()`.
3. MCIAS validates and returns a bearer token.
4. Subsequent requests include `Authorization: Bearer <token>`.
5. Service validates tokens via `ValidateToken()`, cached for 30s
(keyed by SHA-256 of the token).
Web UIs use this mode when SSO is not configured, presenting a
username/password/TOTP form instead of the SSO button.
### Roles
| Role | Access |
|------|--------|
| `admin` | Full access, policy bypass |
| `user` | Access governed by policy rules, default deny |
| `guest` | Service-dependent restrictions, default deny |
Admin detection comes solely from the MCIAS `admin` role. Services
never promote users locally.
---
## 7. Networking
### Hostnames
Every service gets `<service>.svc.mcp.metacircular.net` automatically.
Public-facing services can declare additional hostnames:
```toml
[[components.routes]]
port = 443
mode = "l7"
hostname = "docs.metacircular.net"
```
### TLS
- **Minimum TLS 1.3.** No exceptions.
- L4 services manage their own TLS — certificates go in
`/srv/<service>/certs/`.
- L7 services have TLS terminated by mc-proxy — certs are stored at
`/srv/mc-proxy/certs/<service>.pem`.
- Certificate and key paths are required config — the service refuses
to start without them.
### Container Networking
Containers join the `mcpnet` Podman network by default. Services
communicate with each other over this network or via loopback (when
co-located on the same node).
---
## 8. Command Reference
| Command | Purpose |
|---------|---------|
| `mcp build <service>` | Build and push images to MCR |
| `mcp sync` | Push all service definitions to agents; auto-build missing images |
| `mcp deploy <service>` | Pull image, (re)create containers, register routes |
| `mcp undeploy <service>` | Full teardown: remove routes, DNS, certs, and containers |
| `mcp stop <service>` | Remove routes, stop containers |
| `mcp start <service>` | Start previously stopped containers |
| `mcp restart <service>` | Restart containers in place |
| `mcp ps` | List all managed containers and status |
| `mcp status [service]` | Detailed status for a specific service |
| `mcp logs <service>` | Stream container logs |
| `mcp edit <service>` | Edit service definition |
---
## 9. Complete Walkthrough
Deploying a new service called `myservice` from scratch:
```bash
# 1. Prepare the target node
ssh rift
sudo mkdir -p /srv/myservice/certs
sudo chown -R mcp:mcp /srv/myservice
# Place myservice.toml and TLS certs in /srv/myservice/
exit
# 2. Tag the release
cd ~/src/metacircular/myservice
git tag -a v1.0.0 -m "v1.0.0"
git push origin v1.0.0
# 3. Write the service definition
cat > ~/.config/mcp/services/myservice.toml << 'EOF'
name = "myservice"
node = "rift"
[build.images]
myservice = "Dockerfile.api"
[[components]]
name = "api"
image = "mcr.svc.mcp.metacircular.net:8443/myservice:v1.0.0"
volumes = ["/srv/myservice:/srv/myservice"]
[[components.routes]]
name = "rest"
port = 8443
mode = "l4"
[[components.routes]]
name = "grpc"
port = 9443
mode = "l4"
EOF
# 4. Build and push the image
mcp build myservice
# 5. Deploy
mcp deploy myservice
# 6. Verify
mcp status myservice
mcp ps
```
The service is now running, with mc-proxy routing
`myservice.svc.mcp.metacircular.net` traffic to the agent-assigned
ports.
---
## Appendix: Repository Layout
Services follow a standard directory structure:
```
.
├── cmd/<service>/ CLI entry point (server, subcommands)
├── cmd/<service>-web/ Web UI entry point (if separate)
├── internal/ All service logic (not importable externally)
│ ├── auth/ MCIAS integration
│ ├── config/ TOML config loading
│ ├── db/ Database setup, migrations
│ ├── server/ REST API server
│ ├── grpcserver/ gRPC server
│ └── webserver/ Web UI server (if applicable)
├── proto/<service>/v1/ Protobuf definitions
├── gen/<service>/v1/ Generated gRPC code
├── web/ Templates and static assets (embedded)
├── deploy/
│ ├── <service>-rift.toml Reference MCP service definition
│ ├── docker/ Docker Compose files
│ ├── examples/ Example config files
│ └── systemd/ systemd units
├── Dockerfile.api API server container
├── Dockerfile.web Web UI container (if applicable)
├── Makefile Standard build targets
└── <service>.toml.example Example configuration
```
### Standard Makefile Targets
| Target | Purpose |
|--------|---------|
| `make all` | vet → lint → test → build (the CI pipeline) |
| `make build` | `go build ./...` |
| `make test` | `go test ./...` |
| `make vet` | `go vet ./...` |
| `make lint` | `golangci-lint run ./...` |
| `make docker` | Build the container image |
| `make proto` | Regenerate gRPC code from .proto files |
| `make devserver` | Build and run locally against `srv/` config |
---
## 10. Agent Management
MCP manages a fleet of nodes with heterogeneous operating systems and
architectures. The agent binary lives at `/srv/mcp/mcp-agent` on every
node — this is a mutable path that MCP controls, regardless of whether
the node runs NixOS or Debian.
### Node Configuration
Each node in `~/.config/mcp/mcp.toml` includes SSH and architecture
info for agent management:
```toml
[[nodes]]
name = "rift"
address = "100.95.252.120:9444"
ssh = "rift"
arch = "amd64"
[[nodes]]
name = "hyperborea"
address = "100.x.x.x:9444"
ssh = "hyperborea"
arch = "arm64"
```
### Upgrading Agents
After tagging a new MCP release:
```bash
# Upgrade all nodes (recommended — prevents version skew)
mcp agent upgrade
# Upgrade a single node
mcp agent upgrade rift
# Check versions across the fleet
mcp agent status
```
`mcp agent upgrade` cross-compiles the agent binary for each target
architecture, SSHs to each node, atomically replaces the binary, and
restarts the systemd service. All nodes should be upgraded together
because new CLI versions often depend on new agent RPCs.
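The per-node build settings implied above can be sketched as follows. This is a hypothetical helper — `mcp`'s internals are not shown in this guide — assuming a Linux-only fleet with CGo disabled, per the Dockerfile rules:

```go
package main

import "fmt"

// crossBuildEnv returns the Go build environment for a node's
// architecture, as recorded in the [[nodes]] arch field.
func crossBuildEnv(arch string) []string {
	return []string{"GOOS=linux", "GOARCH=" + arch, "CGO_ENABLED=0"}
}

func main() {
	fmt.Println(crossBuildEnv("arm64")) // settings for e.g. hyperborea
}
```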
### Provisioning New Nodes
One-time setup for a new Debian node:
```bash
# 1. Provision the node (creates user, dirs, systemd unit, installs binary)
mcp node provision <name>
# 2. Register the node
mcp node add <name> <address>
# 3. Deploy services
mcp deploy <service>
```
For NixOS nodes, provisioning is handled by the NixOS configuration.
The NixOS config creates the `mcp` user, systemd unit, and directories.
The `ExecStart` path points to `/srv/mcp/mcp-agent` so that `mcp agent
upgrade` works the same as on Debian nodes.
---
## Appendix: Currently Deployed Services
For reference, these services are operational on the platform:
| Service | Version | Node | Purpose |
|---------|---------|------|---------|
| MCIAS | v1.9.0 | (separate) | Identity and access |
| Metacrypt | v1.4.1 | rift | Cryptographic service, PKI/CA |
| MC-Proxy | v1.2.1 | rift | TLS proxy and router |
| MCR | v1.2.1 | rift | Container registry |
| MCNS | v1.1.1 | rift | Authoritative DNS |
| MCDoc | v0.1.0 | rift | Documentation server |
| MCQ | v0.4.0 | rift | Document review queue |
| MCP | v0.7.6 | rift | Control plane agent |