Document SSO login flow in packaging and deployment guide
Add SSO redirect flow alongside direct credentials, MCIAS client
registration steps, [sso] config section, and updated service versions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -385,7 +385,14 @@ tags = []
 level = "info"
 ```

-For services with a web UI, add:
+For services with SSO-enabled web UIs, add:
+
+```toml
+[sso]
+redirect_uri = "https://<service>.svc.mcp.metacircular.net/sso/callback"
+```
+
+For services with a separate web UI binary, add:

 ```toml
 [web]
@@ -433,18 +440,72 @@ these.
 ## 6. Authentication (MCIAS Integration)

 Every service delegates authentication to MCIAS. No service maintains
-its own user database.
+its own user database. Services support two login modes: **SSO
+redirect** (recommended for web UIs) and **direct credentials**
+(fallback / API clients).

-### Auth Flow
+### SSO Login (Web UIs)

+SSO is the preferred login method for web UIs. The flow is an OAuth
+2.0-style authorization code exchange:
+
+1. User visits the service and is redirected to `/login`.
+2. Login page shows a "Sign in with MCIAS" button.
+3. Clicking the button redirects to MCIAS (`/sso/authorize`), which
+   authenticates the user.
+4. MCIAS redirects back to the service's `/sso/callback` with an
+   authorization code.
+5. The service exchanges the code for a JWT via a server-to-server call
+   to MCIAS `POST /v1/sso/token`.
+6. The JWT is stored in a session cookie.
+
+SSO is enabled by adding an `[sso]` section to the service config and
+registering the service as an SSO client in MCIAS.
+
+**Service config:**
+
+```toml
+[sso]
+redirect_uri = "https://<service>.svc.mcp.metacircular.net/sso/callback"
+```
+
+**MCIAS config** (add to the `[[sso_clients]]` list):
+
+```toml
+[[sso_clients]]
+client_id = "<service>"
+redirect_uri = "https://<service>.svc.mcp.metacircular.net/sso/callback"
+service_name = "<service>"
+```
+
+The `redirect_uri` must match exactly between the service config and
+the MCIAS client registration.
+
+When `[sso].redirect_uri` is empty or absent, the service falls back to
+the direct credentials form.
+
+**Implementation:** Services use `mcdsl/sso` (v1.7.0+), which handles
+state management, CSRF-safe cookies, and the code exchange. The web
+server registers three routes:
+
+| Route | Purpose |
+|-------|---------|
+| `GET /login` | Renders the landing page with the "Sign in with MCIAS" button |
+| `GET /sso/redirect` | Sets state cookies, redirects to MCIAS |
+| `GET /sso/callback` | Validates state, exchanges code for JWT, sets session |
+
+### Direct Credentials (API / Fallback)
+
 1. Client sends credentials to the service's `POST /v1/auth/login`.
-2. Service forwards them to MCIAS via the client library
-   (`git.wntrmute.dev/mc/mcias/clients/go`).
+2. Service forwards them to MCIAS via `mcdsl/auth.Authenticator.Login()`.
 3. MCIAS validates and returns a bearer token.
 4. Subsequent requests include `Authorization: Bearer <token>`.
-5. Service validates tokens via MCIAS `ValidateToken()`, cached for 30s
+5. Service validates tokens via `ValidateToken()`, cached for 30s
    (keyed by SHA-256 of the token).

+Web UIs use this mode when SSO is not configured, presenting a
+username/password/TOTP form instead of the SSO button.
+
 ### Roles

 | Role | Access |
@@ -685,10 +746,10 @@ For reference, these services are operational on the platform:
 | Service | Version | Node | Purpose |
 |---------|---------|------|---------|
 | MCIAS | v1.9.0 | (separate) | Identity and access |
-| Metacrypt | v1.3.1 | rift | Cryptographic service, PKI/CA |
+| Metacrypt | v1.4.1 | rift | Cryptographic service, PKI/CA |
 | MC-Proxy | v1.2.1 | rift | TLS proxy and router |
 | MCR | v1.2.1 | rift | Container registry |
 | MCNS | v1.1.1 | rift | Authoritative DNS |
 | MCDoc | v0.1.0 | rift | Documentation server |
-| MCQ | v0.2.0 | rift | Document review queue |
+| MCQ | v0.4.0 | rift | Document review queue |
 | MCP | v0.7.6 | rift | Control plane agent |
docs/phase-e-plan.md (new file, 103 lines)
@@ -0,0 +1,103 @@
# Phase E: Multi-Node Orchestration

Phase D (automated DNS registration) is complete. Phase E extends MCP from
a single-node agent on rift to a multi-node fleet with a central master
process.

## Goal

Deploy and manage services across multiple nodes from a single control
plane. The operator runs `mcp deploy` and the system places the workload on
the right node, provisions certs, registers DNS, and configures routing --
same as today on rift, but across the fleet.

## Fleet Topology

| Node | OS | Arch | Role |
|------|----|------|------|
| desktop (TBD) | NixOS | amd64 | Control plane -- runs master + MCIAS + MCNS |
| rift | NixOS | amd64 | Compute -- application services |
| orion | NixOS | amd64 | Compute |
| hyperborea | Debian | arm64 | Compute (Raspberry Pi) |
| svc | Debian | amd64 | Edge -- mc-proxy for public traffic, no containers |

Tailnet is the interconnect between all nodes. Public traffic enters via
mc-proxy on svc, which forwards over Tailnet to compute nodes.

## Components

### Master (`mcp-master`)

Long-lived orchestrator on the control plane node. Responsibilities:

- Accept CLI commands and dispatch them to the correct agent
- Aggregate status from all agents (fleet-wide view)
- Node selection when `node` is omitted from a service definition
- Health-aware scheduling using agent heartbeat data

The master holds no durable state -- it rebuilds its world view from the
agents on startup. If the master goes down, running services continue
unaffected; only new deploys and rescheduling stop.

### Agent upgrades

The fleet is heterogeneous (NixOS + Debian, amd64 + arm64), so NixOS flake
inputs don't work as a universal update mechanism.

**Design:** MCP owns the binary at `/srv/mcp/mcp-agent` on all nodes.

- `mcp agent upgrade [node]` -- the CLI cross-compiles for the target's
  GOARCH, SCPs the binary, and restarts the service via SSH
- Node config gains `ssh` (user@host) and `arch` (amd64/arm64) fields
- rift's NixOS `ExecStart` changes from a nix store path to
  `/srv/mcp/mcp-agent`
- All nodes: binary at `/srv/mcp/mcp-agent`, systemd unit
  `mcp-agent.service`

Upgrades must be coordinated -- new RPCs cause `Unimplemented` errors on
old agents.

### Edge agents

svc runs an agent but does NOT run containers. Its agent manages mc-proxy
routing only: when the master provisions a service on a compute node, svc's
agent updates mc-proxy routes to point at the compute node's Tailnet
address.

### MCIAS migration

MCIAS moves from the svc VPS to the control plane node, running as an
MCP-managed container with an independent lifecycle. Bootstrap order:

1. MCIAS image pre-staged or pulled unauthenticated
2. MCIAS starts (L4 passthrough through mc-proxy -- it manages its own TLS)
3. All other services bootstrap after MCIAS is up

## Scheduling

Three placement modes, in order of specificity:

1. `node = "rift"` -- explicit placement on a named node
2. `node = "pi-pool"` -- master picks a node within a named cluster
3. `node` omitted -- master picks any compute node with capacity

Placement is resource-aware, using agent heartbeats (CPU, memory, disk).
Raspberry Pis with 4-8 GB of RAM need resource tracking more than the
larger servers do.

## Open Questions

- **Control plane machine**: which desktop becomes the always-on node?
- **Heartbeat model**: agent push vs. master poll?
- **Cluster definition**: explicit pool config in the master vs. node labels/tags?
- **MCIAS migration timeline**: when to cut over from svc to the control plane?
- **Agent on svc**: what subset of agent RPCs does an edge-only agent need?

## What Phase E Does NOT Include

These remain future work:

- Auto-reconciliation (agents auto-restarting drifted containers)
- Migration (snapshot streaming between nodes)
- Web UI for fleet management
- Observability / log aggregation
- Object store